Friday, July 10, 2015

Machine Learning: Logistic Regression

Logistic Regression is one of the linear classifiers. Here I would like to describe the process of implementing a multi-class logistic regression learning algorithm.

In logistic regression, \(P(G=k|X=x)\) is modelled as follows:
For k = 1 to K-1, 
\[  P(G=k|X=x) = \frac{ \exp \left\{ \beta_k^Tx  \right\} } {1 + \sum_{i=1}^{K-1} \exp \left\{ \beta_i^Tx \right\} }   \]
and for the final class,
\[ P(G=K|X=x) = \frac{ 1 } {1 + \sum_{i=1}^{K-1} \exp \left\{ \beta_i^Tx \right\} } \]
Here \(x\) and \(\beta_k\) are all \((d+1)\)-vectors, where \(d\) is the dimension of the input data and the extra dimension accounts for the intercept term (or bias).

Let \( \beta \) be the vector that stacks all of the \( \beta_k \), i.e.
\[ \beta^T = \{ \beta_1^T, \beta_2^T, \ldots, \beta_{K-1}^T   \}   \]
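To make the model concrete, here is a minimal sketch (in Python with numpy, not the exact code from my implementation) of how the posteriors \(P(G=k|X=x)\) can be computed from a stacked parameter array; the function name and example numbers are made up for illustration:

import numpy as np

def class_posteriors(beta, x):
    """Posterior probabilities P(G=k | X=x) for k = 1..K.

    beta : (K-1, d+1) array, one row per beta_k (intercept included).
    x    : (d+1,) input vector whose first entry is 1 (the bias term).
    Returns a length-K array; the last entry is the reference class K.
    """
    scores = np.exp(beta @ x)              # exp(beta_k^T x) for k = 1..K-1
    denom = 1.0 + scores.sum()             # 1 + sum_i exp(beta_i^T x)
    return np.append(scores, 1.0) / denom  # last entry is P(G=K | X=x)

# Example with K = 3 classes and d = 2 input dimensions (arbitrary values):
beta = np.array([[0.1, 0.5, -0.3],
                 [0.2, -0.1, 0.4]])
x = np.array([1.0, 2.0, -1.0])             # leading 1 is the intercept term
print(class_posteriors(beta, x))           # K probabilities summing to 1

Note that the returned probabilities always sum to 1 by construction, since class K serves as the reference class in the denominator.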

Saturday, May 2, 2015

Machine Learning: Logistic Regression for Classification

Well, today I spent a significant amount of time teaching my Mac to distinguish 0 from 1. The motivation was mostly to check my own understanding of logistic regression.

The logistic regression model for two classes can be summarised as follows:
Let \(P(Y=1 | X=x) = h_w(x) \) where \(h_w(x) = g(w^Tx)\) and \(g(z) = \frac{1}{1+e^{-z}}\). The function \(g(z)\) is called the sigmoid function and takes values between 0 and 1, so it maps a real value into a probability space, and it has many characteristics that make it well suited for this purpose. Since we are assuming only two classes, we naturally have \(P(Y=0|X=x) = 1-h_w(x)\).