While I was reading through the Housing Prices Competition for Kaggle Learn Users description, I wanted to get a better understanding of how users' submissions were evaluated. What follows is an exploration of the metric used.
Metric
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
The metric uses a logarithm to convert the error from a scale of absolute dollar amounts to a logarithmic scale of proportional differences between the predicted and the observed sales price.
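Kaggle's own implementation isn't shown, but a minimal sketch of the metric in Python might look like this (the function name rmse_log and the use of NumPy and natural logs are my own choices; a different log base would only rescale the result by a constant factor):

```python
import numpy as np

def rmse_log(predicted, observed):
    """Root-mean-squared-error between the logs of predicted and observed prices."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean((np.log(predicted) - np.log(observed)) ** 2))

# A $1 prediction for houses sold at $0.1 and $10 (see the example below):
print(rmse_log([1.0, 1.0], [0.1, 10.0]))  # 2.3025..., i.e. ln(10)
```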
For illustration, I'll denote the units as powers of 10, which can be scaled up to make the result more realistic. A number like $10$ can be scaled to $100{,}000$ by multiplying the result by $10^4$.
Let's say there were two houses sold: a cheap house and an expensive house. The cheap house was sold for $\$0.1$ ($10^{-1}$) and the expensive house was sold for $\$10$ ($10^{1}$). If a prediction of $\$1$ ($10^{0}$) was made for both houses, then the differences without taking the logarithm would be $0.9$ and $9$ respectively. The corresponding RMSE would be $0.9$ and $9$.
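As a quick check of these numbers (a sketch using NumPy with the example values above):

```python
import numpy as np

observed = np.array([0.1, 10.0])   # cheap house, expensive house
predicted = np.array([1.0, 1.0])   # the same $1 prediction for both

# Without logs, the per-house RMSE is just the absolute dollar difference.
print(np.abs(predicted - observed))  # [0.9 9. ]
```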
Just by looking at these numbers, one may think the prediction was better for the cheap house than for the expensive house. In fact, the predictions are equally far from the observed sales prices: each is off by a factor of 10. This can be seen by switching from measuring on a dollar-unit scale to a proportional scale.
Between the numbers $1$ and $10$ there are 9 discrete units ($1 \ldots 10$). The same goes for $0.1$ and $1$ in steps of $0.1$ ($0.1 \ldots 1$). Since the relative unit distances between the prediction and the observed sales price are the same, the relative prediction error should also be the same.
This can be shown by $\left| \log_{10}(1) - \log_{10}(0.1) \right| = \left| \log_{10}(1) - \log_{10}(10) \right| = 1$. The absolute value is taken since this is the effect of taking the RMSE of a single data point: $\sqrt{x^2} = |x|$.
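The same equality can be checked numerically (again a NumPy sketch; I use base-10 logs here to match the order-of-magnitude framing above):

```python
import numpy as np

observed = np.array([0.1, 10.0])
predicted = np.array([1.0, 1.0])

# On the log scale, both predictions miss by exactly one order of magnitude.
print(np.abs(np.log10(predicted) - np.log10(observed)))  # [1. 1.]
```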
Here I use a geometric mean to show this: the prediction $1$ is the geometric mean of the two observed prices, $\sqrt{0.1 \times 10} = 1$, so it lies exactly halfway between the order of magnitude below it and the order of magnitude above it, and the prediction error is the same in both directions.
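And a final numeric check of the geometric mean, using the example values above:

```python
import numpy as np

# The geometric mean of the two observed prices recovers the prediction,
# so on a log scale the $1 prediction is equidistant from both sale prices.
print(np.sqrt(0.1 * 10.0))                    # 1.0
print(np.exp(np.mean(np.log([0.1, 10.0]))))   # ~1.0, equivalent log-space form
```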