In logistic regression with \(K\) classes, \(P(G=k|X=x)\) is modelled as follows:
For \(k = 1, \ldots, K-1\),
\[ P(G=k|X=x) = \frac{ \exp \left\{ \beta_k^Tx \right\} } {1 + \sum_{i=1}^{K-1} \exp \left\{ \beta_i^Tx \right\} } \]
and for the final class,
\[ P(G=K|X=x) = \frac{ 1 } {1 + \sum_{i=1}^{K-1} \exp \left\{ \beta_i^Tx \right\} } \]
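Here \(x\) and each \(\beta_k\) are \((d+1)\)-vectors, where \(d\) is the dimension of the input data. The additional component accounts for the intercept term (or bias): \(x\) is extended with a constant 1, and the matching entry of \(\beta_k\) is the intercept.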
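As a minimal sketch of the two formulas above, the following pure-Python function evaluates all \(K\) class probabilities for one input. The function name `class_probabilities`, the example coefficients, and the choice to store the intercept component as the first entry of `x` are illustrative assumptions, not something the text prescribes. The maximum score is subtracted before exponentiating; this is a common numerical safeguard and does not change the resulting probabilities.

```python
import math

def class_probabilities(betas, x):
    """Return [P(G=1|x), ..., P(G=K|x)] for the model described above.

    betas: list of K-1 coefficient vectors, each of length d+1
    x:     input of length d+1 (assumed to already contain the constant 1)
    """
    # Linear scores beta_k^T x for k = 1..K-1; the final class K has score 0,
    # which corresponds to the "1" in the numerator and denominator.
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas] + [0.0]
    # Shifting all scores by the maximum leaves the ratios unchanged
    # but keeps exp() from overflowing for large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: d = 2 input dimensions, K = 3 classes.
betas = [[0.5, 1.0, -1.0], [-0.25, 0.0, 2.0]]
x = [1.0, 0.2, 0.4]  # leading 1.0 is the intercept component (assumed position)
probs = class_probabilities(betas, x)
```

The returned list has \(K\) entries that sum to 1, with the last entry being the probability of the final class \(K\).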
Let \( \beta \) be the vector formed by concatenating all the \( \beta_k \), so that it has \((K-1)(d+1)\) entries, i.e.
\[ \beta^T = \{ \beta_1^T, \beta_2^T, \ldots, \beta_{K-1}^T \} \]
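The combined array can be illustrated with a short sketch that concatenates the per-class vectors and splits them apart again. The helper names `stack` and `unstack` are hypothetical, and the flat layout with \( \beta_1 \) first is only an assumption based on the ordering shown above.

```python
def stack(betas):
    """Concatenate beta_1, ..., beta_{K-1} into one flat list."""
    return [b for beta in betas for b in beta]

def unstack(beta, d):
    """Split a flat list of length (K-1)(d+1) into K-1 lists of length d+1."""
    n = d + 1
    if len(beta) % n != 0:
        raise ValueError("length of beta must be a multiple of d + 1")
    return [beta[i:i + n] for i in range(0, len(beta), n)]

# Hypothetical example: d = 2, K = 3, so beta has (K-1)(d+1) = 6 entries.
beta = stack([[0.5, 1.0, -1.0], [-0.25, 0.0, 2.0]])
blocks = unstack(beta, d=2)
```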