Gradient Descent is a commonly used algorithm in Machine Learning for finding the parameter values that minimize a cost function, yielding the hypothesis function that best predicts a dataset.

The principle of Gradient Descent is to incrementally update parameters until a combination of parameter values creates a hypothesis function which accurately predicts the data. To predict the data accurately, the hypothesis function must have a small amount of error, and that error is measured by the cost function. Once the rate at which the cost function decreases falls below a certain threshold, the Gradient Descent algorithm has converged on a local minimum, which may not be the global minimum. To find the global minimum, different starting parameters need to be used.
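The convergence threshold and the restart-from-different-starting-points idea can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the cost function (chosen here to have two local minima), the learning rate, and the names `descend` and `tolerance` are all assumptions for the example.

```python
def cost(x):
    # Illustrative cost with two local minima: J(x) = x^4 - 2x^2 + 0.5x
    return x**4 - 2 * x**2 + 0.5 * x

def gradient(x):
    # Derivative of the cost above: J'(x) = 4x^3 - 4x + 0.5
    return 4 * x**3 - 4 * x + 0.5

def descend(x, learning_rate=0.05, tolerance=1e-9, max_iters=10_000):
    """Step downhill until the cost stops decreasing by more than tolerance."""
    for _ in range(max_iters):
        new_x = x - learning_rate * gradient(x)
        if abs(cost(x) - cost(new_x)) < tolerance:
            break
        x = new_x
    return x

# Different starting points can converge to different local minima;
# keeping the best result approximates the global minimum.
results = [descend(start) for start in (-1.5, 0.0, 1.5)]
best = min(results, key=cost)
```

Starting from 1.5 the algorithm settles in the shallow minimum near x ≈ 0.93, while starting from -1.5 it finds the deeper minimum near x ≈ -1.06, which is why trying several starting points matters.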

In a single variable, or single feature, problem, Gradient Descent starts with a cost function evaluated at some initial parameter values. The partial derivative of the cost function is found with respect to one, a *part*, of the parameters. Each parameter is then updated by subtracting that partial derivative, scaled by a learning rate, from its current value. The same is done for the remaining parameter, with all updates applied simultaneously using the old parameter values, and the iterations continue until parameter values are found that minimize the cost function.
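The single-feature case can be sketched as follows, assuming a linear hypothesis h(x) = theta0 + theta1·x and a mean-squared-error cost; the data, the learning rate of 0.1, and the names `gradient_step`, `theta0`, `theta1` are illustrative choices, not from the original text.

```python
# Toy single-feature dataset, generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def gradient_step(theta0, theta1, learning_rate=0.1):
    m = len(xs)
    # Prediction error of the current hypothesis on each example
    errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    # Partial derivatives of the mean-squared-error cost,
    # one per parameter
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update: both partials were computed from the
    # OLD parameter values before either parameter changed
    return (theta0 - learning_rate * d_theta0,
            theta1 - learning_rate * d_theta1)

theta0, theta1 = 0.0, 0.0
for _ in range(5000):
    theta0, theta1 = gradient_step(theta0, theta1)
```

After enough iterations the parameters approach theta0 ≈ 1 and theta1 ≈ 2, recovering the line the data was generated from.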

In multiple variable Gradient Descent, the process is the same except that more parameters are updated in each iteration.
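A sketch of the multivariable case, generalizing the same update to a list of parameters: each row of `X` holds a leading 1.0 for the bias term followed by the feature values, and the data here is an assumed toy set generated from y = 1 + 2·x1 + 3·x2.

```python
# Two-feature dataset; the leading 1.0 in each row is the bias term x0.
X = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 2.0]]
y = [3.0, 4.0, 6.0, 8.0, 9.0]   # from y = 1 + 2*x1 + 3*x2

def step(theta, learning_rate=0.1):
    m = len(X)
    # Error of the current hypothesis on each example
    errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
              for row, yi in zip(X, y)]
    # One partial derivative per parameter, all computed from the same
    # errors, so every theta[j] is updated simultaneously
    return [theta[j]
            - learning_rate * sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))]

theta = [0.0, 0.0, 0.0]
for _ in range(10_000):
    theta = step(theta)
```

The only change from the single-feature version is the loop over parameter indices; the iterations converge toward theta ≈ [1, 2, 3].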