Kaggle Housing Prices Competition Evaluation Metric

While reading the description of the Housing Prices Competition for Kaggle Learn Users, I wanted to better understand how users’ submissions are evaluated. What follows is an exploration of the metric used.

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)

The metric uses a logarithm to move from a scale measured in individual dollars to a logarithmic scale, where what counts is the proportional difference (the ratio) between the predicted and the observed sales price.
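As a rough sketch of the metric described above, the snippet below computes the RMSE between the logs of predicted and observed prices. The function name `rmse_of_logs` and the example prices are made up for illustration, and the natural logarithm is an assumption, since the competition description quoted here does not state the base.

```python
import math

def rmse_of_logs(predicted, observed):
    """RMSE between log(predicted) and log(observed).

    Hypothetical helper; the natural log is an assumption.
    """
    squared = [(math.log(p) - math.log(o)) ** 2
               for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Over-predict a cheap and an expensive house by the same 10%.
cheap = rmse_of_logs([110_000], [100_000])
expensive = rmse_of_logs([1_100_000], [1_000_000])

print(cheap, expensive)  # the two errors agree up to floating-point rounding
```

Both houses are off by the same ratio (1.1), so they contribute the same error even though the dollar errors differ by a factor of 10.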

For illustration, I’ll express prices as powers of 10, which can be scaled up to make the numbers more realistic. A number like 10^1 = 10 can be scaled to 100,000 by multiplying it by 10,000.

Let’s say two houses were sold: a cheap house and an expensive house. The cheap house sold for 10^{-1} and the expensive house sold for 10^1. If a prediction of $1 (10^0) was made for both houses, then the differences (observed minus predicted) without taking the logarithm would be -0.9 (10^{-1} - 10^0 = 0.1 - 1 = -0.9) and 9 (10^1 - 10^0 = 10 - 1 = 9) respectively. The corresponding RMSE, computed for each house on its own, would be 0.9 and 9.
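The arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name `rmse` and the variable names are my own.

```python
import math

def rmse(predicted, observed):
    """Plain RMSE on the dollar scale (no logarithm). Hypothetical helper."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

prediction = 10 ** 0          # $1 predicted for both houses
cheap_observed = 10 ** -1     # 0.1
expensive_observed = 10 ** 1  # 10

# Each RMSE is taken over a single house, so it equals the absolute error.
cheap_error = rmse([prediction], [cheap_observed])
expensive_error = rmse([prediction], [expensive_observed])

print(cheap_error, expensive_error)  # roughly 0.9 and 9
```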

Just by looking at the numbers, one may think the prediction was better for the cheaper house than for the more expensive house. In proportional terms, however, the two predictions are equally far from the observed sales price: each is off by a factor of 10. This can be seen by switching from measuring on a dollar scale to a proportional scale.

Going from 1 (10^0) to 10 (10^1) is a single step of a factor of 10, that is, one order of magnitude. The same goes for 0.1 (10^{-1}) to 1 (10^0). Since the prediction is the same relative distance from each observed sales price, the relative prediction error should also be the same.

This can be shown by |log(10^0) - log(10^{-1})| = |log(10^0) - log(10^1)| = 1, where log is the base-10 logarithm. The absolute value is taken because that is the effect of computing the RMSE over a single data point like this: \sqrt{\frac{(log(10^0) - log(10^{-1}))^2}{1}} = 1
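Here is a small sketch of the same check in Python, using base-10 logs so the numbers line up with the powers of 10 in this example. The helper name `log_rmse` is hypothetical.

```python
import math

def log_rmse(predicted, observed):
    """RMSE between base-10 logs of predicted and observed values.

    Hypothetical helper; base 10 is chosen to match the powers of 10 used here.
    """
    squared = [(math.log10(p) - math.log10(o)) ** 2
               for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

prediction = 1.0  # 10^0

cheap_error = log_rmse([prediction], [0.1])       # observed 10^-1
expensive_error = log_rmse([prediction], [10.0])  # observed 10^1

print(cheap_error, expensive_error)  # both are one order of magnitude
```

On the log scale both predictions come out equally wrong, which is the point of the metric.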

Here I use the geometric mean to find the prices halfway (on the logarithmic scale) between the prediction and one order of magnitude above it, and between the prediction and one order of magnitude below it. Both halfway points result in the same amount of error.

|log(10^0) - log(\sqrt{10^0 \cdot 10^1})| = |log(10^0) - log(\sqrt{10^0 \cdot 10^{-1}})| = 0.5
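The halfway points can also be computed numerically. In this sketch the geometric mean of two prices a and b is taken as sqrt(a * b), the logs are base 10, and the variable names are my own.

```python
import math

prediction = 10 ** 0

# Geometric mean of the prediction and the price one order of magnitude away.
halfway_up = math.sqrt(10 ** 0 * 10 ** 1)     # about 3.16
halfway_down = math.sqrt(10 ** 0 * 10 ** -1)  # about 0.316

error_up = abs(math.log10(prediction) - math.log10(halfway_up))
error_down = abs(math.log10(prediction) - math.log10(halfway_down))

print(error_up, error_down)  # both are half an order of magnitude
```

Note that the halfway prices are not symmetric in dollars (about 2.16 above versus about 0.68 below the prediction), yet the log-scale error is the same.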


The derivative is a concept introduced in calculus: it is the instantaneous rate of change, or slope, of a function at a given point. The slope of a line can be found by \frac{rise}{run} or \frac{height}{width} or \frac{\Delta y}{\Delta x}.

In the linear equation f(x) = 2x, the slope is 2 because each increase of 1 in x increases the output by 2: \frac{\Delta y}{\Delta x} = \frac{2}{1} = 2.

For curves such as a quadratic, the idea of a slope still applies, but the slope changes from point to point and is instead called the derivative. g(x) = x^2 has a derivative of 2x. To see why, take a small change \Delta x starting from x: \frac{\Delta y}{\Delta x} = \frac{(x + \Delta x)^2 - x^2}{\Delta x} = 2x + \Delta x, which approaches 2x as \Delta x shrinks toward 0. At x = 2, for example, the slope is 2 \cdot 2 = 4. We say the instantaneous rate of change, or the derivative, is 2x. Because the slope is no longer a single number for the whole equation, as it was for f(x) above, we can write it in Leibniz notation: \frac{dg}{dx} = 2x.
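To make the idea of an instantaneous rate of change concrete, here is a minimal numerical sketch. The function name `difference_quotient` and the step sizes are my own choices; it simply evaluates \frac{\Delta y}{\Delta x} for smaller and smaller steps.

```python
def difference_quotient(func, x, h):
    """Slope of func between x and x + h, i.e. (change in y) / (change in x)."""
    return (func(x + h) - func(x)) / h

def f(x):
    return 2 * x   # a line: the slope is the same everywhere

def g(x):
    return x ** 2  # a curve: the slope depends on x

# As the step h shrinks, the slope of g at x = 2 gets closer to 2 * 2 = 4,
# while the slope of f stays at 2 regardless of the step size.
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(f, 2.0, h), difference_quotient(g, 2.0, h))
```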

Further reading: Derivatives Wikipedia