machine learning - Why is the derivative of a function used to calculate a local minimum instead of the actual function?
In a machine learning regression problem, why is the local minimum computed from the derivative of a function instead of the actual function?
Example: http://en.wikipedia.org/wiki/gradient_descent
The gradient descent algorithm is applied to find a local minimum of the function
$$f(x) = x^4 - 3x^3 + 2, \qquad \text{(a)}$$
with derivative
$$f'(x) = 4x^3 - 9x^2. \qquad \text{(b)}$$
Here, to find the local minimum of function (a) using the gradient descent algorithm, the derivative of (a), namely function (b), is used. Why is the derivative used rather than (a) itself?
The reason is that if the function is convex (or concave if you're doing maximisation; these problems are equivalent), you know there is a single minimum (maximum). That means there is a single point where the gradient is equal to zero. There are techniques that use the function itself, but if you can compute the gradient, you can converge much faster, because you can think of the gradient as giving you information about how far you are from the optimum.
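As an illustration, here is a minimal sketch of gradient descent applied to function (a) using its derivative (b); the step size and starting point are illustrative choices, not values from the question:

```python
def f(x):
    return x**4 - 3 * x**3 + 2        # function (a)

def f_prime(x):
    return 4 * x**3 - 9 * x**2        # derivative (b)

x = 6.0           # illustrative starting point
gamma = 0.01      # illustrative step size
tolerance = 1e-8  # stop once steps become negligible

for _ in range(100_000):
    step = gamma * f_prime(x)
    x -= step                         # move against the gradient
    if abs(step) < tolerance:
        break

print(x)  # converges to ~2.25, the point where f'(x) = 0
```

Note that the update rule only ever evaluates f'(x), never f(x) itself: the gradient alone tells the algorithm which direction to move and, roughly, how far.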
As well as gradient descent, there's an optimisation method known as Newton's method, which requires the computation of the second derivative (the Hessian, in multi-variate optimisation). It converges faster still, but requires you to be able to invert the Hessian, which is not feasible if you have a lot of parameters. There are methods to get around this that compute a limited-memory approximation of the Hessian (such as L-BFGS). These methods converge faster still because they're using information about the curvature of the gradient: it's a simple tradeoff, where the more you know about the function you're trying to optimise, the faster you can find a solution.
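For comparison, here is a minimal sketch of Newton's method on the same one-dimensional example, where the second derivative $f''(x) = 12x^2 - 18x$ (computed from (a)) plays the role of the Hessian; the starting point is again an illustrative choice:

```python
def f_prime(x):
    return 4 * x**3 - 9 * x**2        # derivative (b)

def f_double_prime(x):
    return 12 * x**2 - 18 * x         # second derivative of (a)

x = 6.0  # illustrative starting point
for _ in range(100):
    step = f_prime(x) / f_double_prime(x)  # Newton step: f'(x) / f''(x)
    x -= step
    if abs(step) < 1e-10:
        break

print(x)  # ~2.25, typically reached in far fewer iterations than gradient descent
```

Using the curvature information in f'' lets each step land much closer to the optimum, which is exactly the tradeoff described above: more knowledge of the function buys faster convergence.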
Tags: machine-learning, linear-regression, derivative, differentiation, gradient-descent