
Tuesday, November 4, 2014

Machine Learning: Linear Regression

A small piece of machine learning: linear regression.
(More like a note to myself hehe)
(Prof. Andrew Ng of Stanford University is damn awesome!)

In linear regression, we want to come up with a hypothesis $h_\theta(x)=\sum_{i=0}^{n}\theta_i x_i$ that minimizes the least-squares cost function $J(\theta)=\frac{1}{2}\sum_{i=0}^{m}\left(h_\theta(x_i)-y_i\right)^2$. To do so, we employ the gradient descent algorithm, which seeks to minimize $J(\theta)$ by incrementally updating $\theta$ along the direction of steepest descent. For each $j$, we update
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$$
until $J(\theta)$ converges. $\alpha$ is called the learning rate and determines the rate of convergence (somewhat: if $\alpha$ is too big, the algorithm becomes very unstable and diverges). The update is usually repeated for several (up to thousands of) iterations.
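As a minimal NumPy sketch (the names `hypothesis` and `cost`, and the assumption that `X` already carries a leading column of ones for $x_0=1$, are mine, not from the course), the hypothesis and cost function look like this:

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = sum_{i=0}^{n} theta_i * x_i, evaluated for every row of X.
    # X is (m, n+1) with a leading column of ones (x_0 = 1); theta is (n+1,).
    return X @ theta

def cost(theta, X, y):
    # J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2
    residuals = hypothesis(theta, X) - y
    return 0.5 * np.sum(residuals ** 2)
```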



If we work out the partial derivatives, we get
$$\theta_j := \theta_j - \alpha \sum_{i=0}^{m}\left(h_\theta(x_i)-y_i\right)x_{i,j}$$
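To see where this comes from, apply the chain rule to a single term of $J(\theta)$:
$$\frac{\partial}{\partial\theta_j}\,\frac{1}{2}\bigl(h_\theta(x_i)-y_i\bigr)^2 = \bigl(h_\theta(x_i)-y_i\bigr)\frac{\partial}{\partial\theta_j}h_\theta(x_i) = \bigl(h_\theta(x_i)-y_i\bigr)\,x_{i,j}$$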
From here we have two choices for implementing the algorithm: batch gradient descent or stochastic gradient descent, depending on whether we want to iterate through all m examples in the training set for each update, or process one example at a time.
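A rough sketch of both variants, reusing the `hypothesis` helper and data layout from the earlier snippet (the function names are again made up for illustration):

```python
def batch_gradient_descent(X, y, alpha=0.01, iterations=1000):
    # Batch: every update sums the error over all m training examples.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (hypothesis(theta, X) - y)  # sum_i (h(x_i) - y_i) * x_i
        theta = theta - alpha * gradient
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, iterations=100):
    # Stochastic: update theta after looking at one training example at a time.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        for i in np.random.permutation(m):
            error = X[i] @ theta - y[i]
            theta = theta - alpha * error * X[i]
    return theta
```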
I think choosing an appropriate $\alpha$ and number of iterations is the key for this algorithm to run well and output a satisfactory hypothesis function.
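For instance, on a tiny made-up dataset (the numbers below are invented just to illustrate the point), a small $\alpha$ with enough iterations converges nicely, while a large $\alpha$ makes the cost blow up:

```python
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # leading ones for x_0
y = np.array([2.0, 4.0, 6.0, 8.0])                              # roughly y = 2x

theta_good = batch_gradient_descent(X, y, alpha=0.01, iterations=5000)
print(theta_good, cost(theta_good, X, y))   # approaches [0, 2] with near-zero cost

theta_bad = batch_gradient_descent(X, y, alpha=1.0, iterations=50)
print(cost(theta_bad, X, y))                # diverges to a huge value
```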

Follow up post:
Machine Learning: Closed form Linear Regression
