A small piece of machine learning: linear regression.
(More like a note to myself hehe)
(Prof. Andrew Ng of Stanford University is damn awesome!)
In linear regression, we want to come up with a hypothesis $h_\theta(\vec{x}) = \sum_{i=0}^{n} \theta_i x_i$ that minimizes the least-squares cost function $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(\vec{x}_i) - y_i \right)^2$. To do so, we employ an algorithm called gradient descent, which seeks to minimize $J(\theta)$ by incrementally updating $\theta$ along the direction of steepest descent. For each $j$, we update
$$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
until $J(\theta)$ converges. $\alpha$ is called the learning rate, which (somewhat) determines the rate of convergence; if $\alpha$ is too big, the algorithm becomes very unstable and diverges. The update is usually repeated for several (up to thousands of) iterations.
If we work out the partial derivative, we get
$$\theta_j = \theta_j - \alpha \sum_{i=1}^{m} \left( h_\theta(\vec{x}_i) - y_i \right) x_{ij}$$
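As a rough sketch of that update in code (my own assumptions, not from the lecture notes: NumPy, a design matrix X whose first column is all ones so that $\theta_0$ plays the role of the intercept, and zero-initialized $\theta$), the batch version looks something like this:

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Repeatedly apply the summed update rule above to all theta_j at once."""
    m, n_plus_1 = X.shape              # m examples, n+1 features (including the ones column)
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        errors = X @ theta - y         # h_theta(x_i) - y_i for every example
        theta -= alpha * (X.T @ errors)  # simultaneous update of every theta_j
    return theta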
From here we have two choices for implementing the algorithm: batch gradient descent or stochastic gradient descent, depending on whether we sum over all $m$ examples in the training set for each update, or process one example at a time.
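For contrast, a sketch of the stochastic variant (same assumed setup as above, with the examples visited in a random order on each pass) updates $\theta$ after every single example:

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10, seed=0):
    """Update theta one training example at a time."""
    rng = np.random.default_rng(seed)
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(epochs):
        for i in rng.permutation(m):       # shuffle the visiting order each epoch
            error = X[i] @ theta - y[i]    # h_theta(x_i) - y_i for one example
            theta -= alpha * error * X[i]  # update using just this example
    return theta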
I think choosing an appropriate $\alpha$ and number of iterations is the key for this algorithm to produce a satisfactory hypothesis.
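To illustrate that point (entirely made-up toy data, reusing the batch_gradient_descent sketch above): with a small enough $\alpha$ the parameters settle near the true values, while a large $\alpha$ makes them blow up within a few iterations.

import numpy as np

# Toy data: y = 1 + 2x plus noise, with a column of ones prepended for the intercept.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

print(batch_gradient_descent(X, y, alpha=0.01, iterations=2000))  # roughly [1, 2]
print(batch_gradient_descent(X, y, alpha=0.5, iterations=25))     # values explode: alpha too big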
Follow-up post:
Machine Learning: Closed form Linear Regression