Previously I described a machine learning single variable linear regression cost function algorithm which used a single feature to define a hypothesis. In this post, I describe how a cost function can be extended to use multiple features.

A single variable or linear equation takes the form of where are the parameters of the function and `x`

is a feature in the dataset. An equation which takes two or more variables are polynomial equations such as a quadratic or cubic equations ( or ).

Using Linear Algebra, the parameters and features can be contained in separate vectors. The parameter vector can be transposed and then multiplied with the features vector to get the hypothesis function prediction. This is a nice concise way to find the result of a hypothesis function without writing out the whole equation.

A cost function is used along with the Gradient Descent algorithm to find the best parameters .

Now that there are more variables, they may be of different scales. This is where feature scaling comes in. For a feature, find the maximum value and minimum value of the value range. Subtracting the minimum from the maximum results in a range size. Modify the cost function to divide by the range size. The result are feature values between -1 and 1.

A learning rate can be tuned by either decreasing or increasing the learning rate. A larger learning rate results in overshooting and not finding convergence. A small learning rate results in a slow convergence. It is a good idea to test out several learning rates in order to strike a good balance between convergence speed and accuracy.