REGIONAL PRICE PREDICTION ERROR ANALYSIS

Introduction

We are now predicting the spot price of each super region via a historical regression based on comparable LIFFE & Hectare Trading bids. This can be found here.

But how accurate is this prediction? This analysis attempts to define some meaningful error metrics that can be used to evaluate the accuracy of the prediction, and outlines steps we may want to take next to improve it.


Establishing an error value

First we need to establish a single quantitative value that measures how far off the mark our prediction was. Since we are trying to predict the value of spot prices for each given day, let's define the error as the difference between the predicted price (based on the LIFFE price on a given day) and the actual spot bid prices that were placed on that day.

For example, on the 25th of April 2024 the regression predicted that the spot price in the Midlands & Wales super region would be £171.87, based on the LIFFE price at that point being £164.90.

On that day, we received two spot bid prices in the Midlands & Wales super region, priced at £175 and £172. Compared to our prediction, this gives us errors of ~£3.13 and ~£0.13 respectively. Let's call this value our predicted price error.

Now we can calculate the predicted price error for all 365 days on which we are predicting prices and use it to evaluate the prediction.
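As a rough sketch of this calculation (the DataFrame layout and column names below are assumptions for illustration, not the actual pipeline):

```python
import pandas as pd

# Hypothetical example: the two Midlands & Wales spot bids from 25 April 2024,
# each paired with the regression's predicted price for that day.
bids = pd.DataFrame({
    "date": ["2024-04-25", "2024-04-25"],
    "spot_price": [175.00, 172.00],
    "predicted_price": [171.87, 171.87],
})

# Predicted price error: the spot bid price minus the predicted price,
# so a positive error means the spot bid came in above our prediction.
bids["predicted_price_error"] = bids["spot_price"] - bids["predicted_price"]
print(bids)
```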

Analysing Error Values

Error Distribution

Let's start by plotting the distribution of the predicted price error, just for Midlands & Wales for now.
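A minimal plotting sketch, assuming the Midlands & Wales errors have been collected into a pandas Series (the values below are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder series of predicted price errors for Midlands & Wales.
errors = pd.Series([3.13, 0.13, -2.40, 1.75, -4.10, 0.85, -1.20])

plt.hist(errors, bins=30, edgecolor="black")
plt.axvline(0, color="red", linestyle="--", label="zero error")
plt.xlabel("Predicted price error (£)")
plt.ylabel("Number of spot bids")
plt.title("Midlands & Wales predicted price error distribution")
plt.legend()
plt.show()
```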

Cool, we can see that the error is approximately normally distributed, which means we can use some common error metrics:

Average Absolute Error

The simplest is the average error relative to the observed spot prices. We use the "absolute" error, meaning that we convert all negative error values into positive values and take the average of the whole set.

The average absolute error for the Midlands & Wales for example is £
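As a sketch, assuming the same placeholder errors series as above:

```python
import pandas as pd

# Placeholder predicted price errors (same assumed series as the plot above).
errors = pd.Series([3.13, 0.13, -2.40, 1.75, -4.10, 0.85, -1.20])

# Average absolute error: convert every error to its positive magnitude,
# then take the mean across all spot bids.
average_absolute_error = errors.abs().mean()
print(f"Average absolute error: £{average_absolute_error:.2f}")
```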

Standard Deviations

Another way to represent this error would be to use the standard deviation of the distribution. Simply put, for a normal distribution roughly 34% of error values fall within one standard deviation on either side of the mean, so taking both sides together about 68% of errors fall within ±1 standard deviation. For the Midlands & Wales for example 68% of spot prices fall within ±£ error.
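A sketch of the same calculation, again on a placeholder errors series:

```python
import pandas as pd

# Placeholder predicted price errors (same assumed series as above).
errors = pd.Series([3.13, 0.13, -2.40, 1.75, -4.10, 0.85, -1.20])

# Standard deviation of the error distribution: for a roughly normal
# distribution, about 68% of errors fall within ±1 standard deviation.
std_dev = errors.std()
print(f"68% of spot prices fall within ±£{std_dev:.2f} of the prediction")
```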

Static Threshold Value

We could also just set our own threshold of acceptable error. This makes quite a lot of sense in the context of what our prediction aims to achieve. Our sellers will always receive a range of prices when they create a listing; the important thing is that the predicted price gives a reasonable idea of what price range they can expect.

To test this we can set an error range that we deem acceptable and check how many spot prices would have fallen into that range. For example, let's say that we want spot prices to fall within ±£5 of our predicted price. If we predicted the price to be £180, any spot price received between £175 and £185 would be considered within our range, and anything outside it we'd consider an error.

For the Midlands & Wales for example, of spot prices fell within a £5 range of the predicted price.
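A sketch of how this rate could be computed, again on placeholder errors:

```python
import pandas as pd

# Placeholder predicted price errors (same assumed series as above).
errors = pd.Series([3.13, 0.13, -2.40, 1.75, -4.10, 0.85, -1.20])

# Share of spot bids whose error falls within the ±£5 acceptable range.
threshold = 5.0
within_range = (errors.abs() <= threshold).mean()
print(f"{within_range:.0%} of spot prices fell within ±£{threshold:.0f} of the prediction")
```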

Overall Results

For each of the regions this would leave us with the following error values:


Evaluating Forward Carry Offset Error

When building the pricing regressions, if we cannot compare a bid price to a LIFFE price directly because the bid's movement month falls outside the fixed movement months that the LIFFE provides pricing for, we scale down the next available LIFFE price so we can compare it to the bid price.

We do this by applying a £2 offset for every month further into the future the LIFFE price is compared to our bid price.

For example, if we received a price of £180 to move in May, we would compare this price directly to the LIFFE May price. However, if we received a price of £182 to move in June, we would take the LIFFE July price and subtract £2 from it to account for the extra carry the LIFFE price has for moving a month later than our price.
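A hedged sketch of how this scaling might look; the function name and month arithmetic are assumptions, and the LIFFE July price used in the example is hypothetical:

```python
# £2 offset for each month the LIFFE movement month sits after the bid's
# movement month, so that the two prices become comparable.
CARRY_OFFSET_PER_MONTH = 2.0

def comparable_liffe_price(liffe_price: float,
                           liffe_movement_month: int,
                           bid_movement_month: int) -> float:
    """Scale a later LIFFE movement month's price back to the bid's month."""
    months_forward = liffe_movement_month - bid_movement_month
    return liffe_price - CARRY_OFFSET_PER_MONTH * months_forward

# Example: a June bid compared against a hypothetical LIFFE July price of £184.
print(comparable_liffe_price(liffe_price=184.0,
                             liffe_movement_month=7,
                             bid_movement_month=6))  # -> 182.0
```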

Doing this allows us to increase our sample size and to gather more data in sparser regions. This means that our regressions will be more representative of changes in the regional biases that may occur over time, as there will be fewer gaps in our data. But how does this affect the accuracy of our predictions?

Without Forward Carry Offset Samples

With Forward Carry Offset Samples

There is some reduction in the standard deviation and the £5 error rate in some regions but not all. Overall, adding samples with forward carry offsets applied to the LIFFE data has a negligible effect on the accuracy of our predictions.

This shouldn't be seen as a failure to make the predictions more accurate, as:


Evaluating Error over Time

By plotting the error over time, we can see if there are any seasonal patterns in our prediction error:
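A minimal sketch of such a plot, assuming the errors and their dates are available in a DataFrame (the rows below are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder daily errors; in practice this would be every predicted price
# error across the year, keyed by the date the spot bid was received.
errors_over_time = pd.DataFrame({
    "date": pd.to_datetime(["2024-04-25", "2024-04-25", "2024-07-10", "2024-09-02"]),
    "predicted_price_error": [3.13, 0.13, -4.20, -6.50],
})

# Average the errors per day and plot them through the year.
daily_mean = errors_over_time.groupby("date")["predicted_price_error"].mean()
daily_mean.plot(marker="o")
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Date")
plt.ylabel("Mean predicted price error (£)")
plt.title("Predicted price error over time")
plt.show()
```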

A reminder that when the error is a negative value, the spot prices are lower than the predicted value. So we can see that from July to October, our regressions are consistently predicting a higher value than our incoming spot prices.

This may be because we are currently taking the next upcoming movement month's price from the LIFFE to base our prediction on. On dates where this price is based on a future movement month, there is some forward carry we need to account for. This is especially apparent in August and September, where the next available LIFFE data point is the November movement month.

To remedy this, we could add a £2 per month offset (you can read more on this here) to the predicted value each regression gives us, to account for how far forward the LIFFE price is compared to the prediction date.
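A sketch of how that offset could be applied to a prediction; the function and month arithmetic are assumptions, with the £2 per month figure taken from the carry offset described above:

```python
from datetime import date

CARRY_OFFSET_PER_MONTH = 2.0

def offset_predicted_price(predicted_price: float,
                           prediction_date: date,
                           liffe_movement_month_start: date) -> float:
    """Subtract £2 for each month the LIFFE movement month sits ahead of the
    date we are predicting for."""
    months_forward = ((liffe_movement_month_start.year - prediction_date.year) * 12
                      + (liffe_movement_month_start.month - prediction_date.month))
    return predicted_price - CARRY_OFFSET_PER_MONTH * max(months_forward, 0)

# Example: predicting in early September against the LIFFE November movement month.
print(offset_predicted_price(175.0, date(2024, 9, 2), date(2024, 11, 1)))  # -> 171.0
```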


However, this has a negligible effect on the overall accuracy of the model, as it pushes the model to slightly overestimate the price the majority of the time.

With Predicted Price Forward Carry Offset

Super Region     | Avg. Absolute Error | Std. Deviation | £5 Error Rate | £7.5 Error Rate | £10 Error Rate
South            | £4.39               | ±£2.98         | 71%           | 91%             | 97%
Midlands & Wales | £4.17               | ±£3.16         | 75%           | 87%             | 96%
North            | £4.96               | ±£4.58         | 70%           | 74%             | 81%
Scotland         | £3.73               | ±£3.20         | 78%           | 87%             | 97%

Without Predicted Price Forward Carry Offset

Super Region     | Avg. Absolute Error | Std. Deviation | £5 Error Rate | £7.5 Error Rate | £10 Error Rate
South            | £4.26               | ±£2.96         | 72%           | 90%             | 97%
Midlands & Wales | £4.01               | ±£3.11         | 76%           | 88%             | 97%
North            | £5.03               | ±£4.49         | 67%           | 72%             | 81%
Scotland         | £3.85               | ±£3.31         | 78%           | 88%             | 96%

It's worth noting that this period of overestimation came at a point of very low usage. To avoid overfitting, we may want to wait until we have more data for this period in the upcoming months before deciding whether we need to apply this offset.