Wednesday, April 14, 2021

Basketball Predictions Overview

There is a variety of methods to predict the outcome of basketball games. In most cases, models attempt to forecast the expected margin by which a team will win (score spread), and subsequently the winner. These forecasts are usually based on past performance data and proxy variables that measure the latent skills or strengths of each team. In this article we will analyze such data in Microsoft Excel, make simple calculations to predict future winners and scores, and evaluate the accuracy of those predictions in three major basketball leagues.


Winning Probability

The winning probability is the ultimate measure of a team’s winning potential. There are different ways to compute winning probabilities directly from past data. The most basic approach is to tally historical wins between the two opposing teams (head-to-head). Let’s see an example.

The table below shows the games played between Real Madrid (playing home) and Barcelona (away) in the Spanish ACB league since 2004. Real Madrid won 8 out of 15 games (53%) and lost the other 7 games (47%). The historical records show there is approximately a 50-50 chance for either team to win. However, the data spans a long period and may not reflect the current form of either team.

Let’s have a look at more recent data. The table below shows the stats for the Spanish ACB basketball league in 2018-2019. It is a simple Excel pivot table with calculated fields to show win/lose percentage (W/L), and the average number of points scored (GF) and allowed (GA) for teams playing home and away. The source data used in this example and basketball results for other leagues can be found here: Basketball Scores in Excel


During 2018-2019, Real Madrid won 88% of its 17 home games and Barcelona won 71% of its away games. Based on that data, we can estimate the probability of Real Madrid beating Barcelona at home as the average, weighted by games played, of two opposing probabilities: the winning probability of Real Madrid playing home (88%) and the losing probability of Barcelona playing away (29%).

Real Madrid win probability = (88% × 17 + 29% × 17) / 34 = 58.5%
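Outside Excel, the same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the function and parameter names are our own and do not come from the source workbook.

```python
def home_win_probability(home_win_pct, home_games, away_win_pct, away_games):
    """Games-weighted average of the home team's win rate and the away team's loss rate."""
    away_loss_pct = 1.0 - away_win_pct  # e.g. 71% away wins means 29% away losses
    total_games = home_games + away_games
    return (home_win_pct * home_games + away_loss_pct * away_games) / total_games


# Figures quoted in the text: Real Madrid at home, Barcelona away, 2018-2019.
p = home_win_probability(0.88, 17, 0.71, 17)
print(f"Real Madrid win probability: {p:.1%}")
```

With equal numbers of games the weights cancel out and the result is simply the mean of the two percentages.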

Real Madrid seems to have a slightly better chance to win; the result is not too different from what we’ve seen with head-to-head stats. We have done the same calculations for all games in recent seasons (pre-covid) for three major basketball leagues. Expecting the team with the higher probability to win, we observed that the predictions were right in only 52-65% of the games (lower end for NBA, upper end for ACB and BBL). That suggests that predictions based on winning probabilities from previous results are just slightly better than pure chance.

 

Point Estimates

We can use the number of points scored and allowed over a period of time to predict the expected score, and subsequently the winner. It is practical to use the average number of points per game as the basis for comparison (advanced models use points per 100 possessions instead; see later). Let’s predict the expected score for Real Madrid playing home against Barcelona. Real Madrid scored an average of 91.2 points per game and allowed 78.5 when playing home in 2018-2019 (see the table above). Barcelona scored 83.7 points per game and allowed 76.9 when playing away. We can now calculate the expected number of points for each team as the arithmetic mean of the average points it scored and the average points its opponent allowed during 2018-2019, as follows:

 

Real Madrid’s points = (91.2 × 17 + 76.9 × 17) / 34 = 84.1 points

Barcelona’s points = (83.7 × 17 + 78.5 × 17) / 34 = 81.1 points
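As a rough Python sketch of the two equations above (the function name is hypothetical and the 17 games per team are taken from the equations):

```python
def average_estimate(scored_avg, scored_games, opp_allowed_avg, opp_games):
    """Games-weighted mean of the points a team scores and the points its opponent allows."""
    total = scored_avg * scored_games + opp_allowed_avg * opp_games
    return total / (scored_games + opp_games)


# Real Madrid at home: scores 91.2; Barcelona away allows 76.9.
real_madrid = average_estimate(91.2, 17, 76.9, 17)
# Barcelona away: scores 83.7; Real Madrid at home allows 78.5.
barcelona = average_estimate(83.7, 17, 78.5, 17)

print(f"Real Madrid {real_madrid:.2f}, Barcelona {barcelona:.2f}")
print(f"Spread {real_madrid - barcelona:.2f}, total {real_madrid + barcelona:.2f}")
```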

 

We can also scale the estimate by the overall league points per game scored/allowed. The overall league points scored and allowed in 2018-2019 were on average 83.8 and 80.2 respectively. Each team’s scoring average is multiplied by the opponent’s average points allowed and divided by the corresponding league average, which gives the following equations for the expected number of points for Real Madrid and Barcelona:


Real Madrid’s home points = (91.2 × 76.9) / 83.8 = 83.7 points

Barcelona’s away points = (83.7 × 78.5) / 80.2 = 81.9 points
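The league-adjusted estimate can be sketched the same way in Python. The function name is our own, and the league averages are the ones used as divisors in the two equations above (83.8 and 80.2).

```python
def league_adjusted_points(team_scored_avg, opponent_allowed_avg, league_avg):
    """Scale a team's scoring average by how the opponent's defence compares to the league."""
    return team_scored_avg * opponent_allowed_avg / league_avg


real_madrid_home = league_adjusted_points(91.2, 76.9, 83.8)
barcelona_away = league_adjusted_points(83.7, 78.5, 80.2)

print(f"Real Madrid {real_madrid_home:.1f}, Barcelona {barcelona_away:.1f}")
print(f"Spread {real_madrid_home - barcelona_away:.1f}")
```

An opponent that allows fewer points than the league average pulls the estimate below the team's own scoring average, and vice versa.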

 

According to the estimates, Real Madrid seems more likely to win, but only by 2 or 3 points (score spread). The total number of points expected is around 165. This may help in making better-informed decisions around win/lose and total points over/under betting lines, but can we really trust these simple estimates?

We have performed the same calculations described above for all games in recent seasons (pre-covid) for three major basketball leagues, using the previous season’s data to forecast the following one. We observed that the estimated point spread (using either of the two methods explained) predicted the winning team correctly in 62-70% of the games. This indicates that point estimates are somewhat better indicators than win/lose percentages for forecasting the winner.

However, the estimated number of points (home, away, and total) was found to correlate poorly with the actual scores. The scatter charts below show home, away, and total points estimates on the x-axis (using 2017-2018 data) against the actual number of points (y-axis) for all games in the Spanish ACB league in 2018-2019. The coefficient of correlation (R) between the two variables is 0.37 for home points, 0.23 for away points, and 0.16 for total points.

 

The estimated score spread correlates better, as can be observed in the scatter chart below. The coefficient of correlation (R) for score spread estimates (using 2017-2018 data) against the actual score spreads in the Spanish ACB 2018-2019 is around 0.46.
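The two evaluation measures used in this section, the share of games where the predicted winner was right and the coefficient of correlation between estimated and actual spreads, can be sketched in Python as follows. The spreads listed below are made-up placeholder values for illustration only, not results from any league, and the function names are our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def winner_hit_rate(predicted_spreads, actual_spreads):
    """Share of games where predicted and actual spread have the same sign (same winner)."""
    hits = sum(1 for p, a in zip(predicted_spreads, actual_spreads) if p * a > 0)
    return hits / len(predicted_spreads)


# Hypothetical spreads (home minus away points), for illustration only.
predicted = [3.0, -1.5, 6.0, 2.0, -4.0]
actual = [8, 5, 2, -7, -10]

print(f"Winner predicted correctly: {winner_hit_rate(predicted, actual):.0%}")
print(f"Correlation R: {pearson_r(predicted, actual):.2f}")
```

In Excel, the built-in CORREL function returns the same coefficient for two ranges.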


There is yet another simple method often used to predict the score spread that uses the difference between points per game scored and allowed for each team as the basis for comparison (known as point differential). Let’s calculate the point differential for Real Madrid playing home and Barcelona playing away in 2018-2019:

Point Differential Real Madrid (home) = 91.2 – 78.5 = 12.7

Point Differential Barcelona (away) = 83.7 – 76.9 = 6.8

Score Spread Estimate = 12.7 – 6.8 = 5.9
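A minimal Python sketch of the point differential calculation above, with hypothetical function names:

```python
def point_differential(scored_avg, allowed_avg):
    """Average points scored minus average points allowed."""
    return scored_avg - allowed_avg


def spread_from_differentials(home_diff, away_diff):
    """Score spread estimate: home team's differential minus away team's differential."""
    return home_diff - away_diff


real_madrid_home = point_differential(91.2, 78.5)
barcelona_away = point_differential(83.7, 76.9)
spread = spread_from_differentials(real_madrid_home, barcelona_away)

print(f"Differentials: {real_madrid_home:.1f} and {barcelona_away:.1f}")
print(f"Score spread estimate: {spread:.1f}")
```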

The score spread estimate using point differential is 5.9, which is about twice the previous estimate in this particular example. Point differential predictions rendered similar results to those presented earlier. The score spread estimates again correlated rather poorly with actual spreads: the coefficient of correlation with the results of the Spanish ACB 2018-2019 is 0.47.

These basic score spread estimates are not sufficient to make accurate predictions. A score spread of 2 or 3 points is too narrow to be confident of the outcome. It is very difficult to predict the exact spread, and especially the exact number of points scored by each team. Therefore, a range of points or confidence interval within which we expect the final score to fall is more convenient, and is generally given instead.

 

Confidence Intervals

A confidence interval is a range of values within which we expect the population parameter to fall (in this case, the average number of points scored). It is usually a more informative representation of reality than just a point estimate. In basketball, the number of points scored is approximately normally distributed (unlike football goals, which follow a Poisson distribution; see this other article: Football Predictions with Poisson Distribution). The histogram below shows the frequency (number of games) for each interval of total points scored in the Spanish ACB league 2018-2019 season, which is approximately normally distributed with a mean of 163.9 points. Therefore, we can use the standard deviation of the estimates to get a range of points for a given confidence level.

 

The table below shows additional stats for teams in the Spanish ACB league playing home and away during 2018-2019, including the average (Avg), standard deviation (SD), minimum (Min) and maximum (Max) values for points scored and allowed by each team over the number of games played (GP). These can easily be calculated using the corresponding Excel built-in functions. When filters are applied, though, the SUBTOTAL function is used instead so that only the visible rows are included. For example, for values within column D that would be:

=SUBTOTAL(1, D:D)                         'for average (arithmetic mean)

=SUBTOTAL(7, D:D)                         'for standard deviation

=SUBTOTAL(5, D:D)                         'for minimum value

=SUBTOTAL(4, D:D)                         'for maximum value

 

In practice, we have done that using a VBA macro that filters the data for each team and calculates the respective statistics mentioned above.
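The per-team statistics can also be computed outside Excel. The sketch below is not the VBA macro mentioned above; it is a Python illustration of the same grouping idea, and the team names and scores in it are made-up placeholder rows.

```python
from collections import defaultdict
from statistics import mean, stdev


def team_stats(rows):
    """Group (team, points) rows by team and compute GP, Avg, SD, Min and Max."""
    by_team = defaultdict(list)
    for team, points in rows:
        by_team[team].append(points)
    return {
        team: {
            "GP": len(pts),
            "Avg": mean(pts),
            "SD": stdev(pts),  # sample standard deviation, like SUBTOTAL(7, ...)
            "Min": min(pts),
            "Max": max(pts),
        }
        for team, pts in by_team.items()
    }


# Hypothetical rows (team, points scored), for illustration only.
games = [
    ("Team A", 90), ("Team B", 78), ("Team A", 84),
    ("Team B", 82), ("Team A", 96), ("Team B", 74),
]

stats = team_stats(games)
for team, values in stats.items():
    print(team, values)
```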


Now we can calculate the confidence interval for points scored or allowed by a given team as the average of points (Avg) +/- the variation of the estimate, which consists of two components:

  1. The standard error (SE), which is the standard deviation (SD) divided by the square root of the sample size n (number of games played).
  2. The reliability factor, which is the z-score of the standard normal distribution for a/2, where 1-a is the confidence level.


Let’s see how to get the confidence interval of expected points scored by Real Madrid playing home in 2018-2019 for a 95% confidence level, which means that intervals built this way would contain the true average score in 95% of the cases. The average number of points scored (sample mean) is 91.2 (see table above). The standard deviation is 8.9, and the sample size is 17. The standard error of the mean would be 2.16:

Standard error (SE) = SD / √n = 8.9 / √17 = 2.16

For a confidence level of 95%, a is 0.05 (1-0.95). Then we can get the critical value for a/2 = 0.025 by looking up the value 0.975 (1-a/2) in the normal distribution table or simply using the Excel NORM.S.INV built-in function, i.e. =NORM.S.INV(0.975). The critical value or z-score is 1.96. The 95% confidence interval of points scored by Real Madrid can then be computed as follows:

95% confidence interval = Avg ± z × SE = 91.2 ± 1.96 × 2.16 = 91.2 ± 4.2, i.e. from 87.0 to 95.4 points

Now we can probably be more confident saying that Real Madrid is expected to score on average between 87 and 95 points playing home in the Spanish ACB league than just saying it will score around 91 points.
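The whole confidence interval calculation can be sketched in Python using only the standard library. The function name is our own; NormalDist().inv_cdf plays the role of Excel's NORM.S.INV here.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(avg, sd, n, confidence=0.95):
    """Normal-based confidence interval for an average: avg +/- z * SD / sqrt(n)."""
    standard_error = sd / sqrt(n)
    a = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - a / 2.0)  # about 1.96 for a 95% level
    margin = z * standard_error
    return avg - margin, avg + margin


# Real Madrid points scored at home in 2018-2019: Avg 91.2, SD 8.9, 17 games.
low, high = mean_confidence_interval(91.2, 8.9, 17)
print(f"95% confidence interval: {low:.1f} to {high:.1f} points")
```

Raising the confidence level widens the interval, while a larger number of games narrows it.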

 

Advanced Models

The number of points scored is the ultimate measure of the offensive performance of a team. On the other hand, the number of points allowed is an indicator of defensive performance. Advanced models often use the number of points per 100 possessions to determine the offensive and defensive efficiency. The concept of possession becomes key as the per-possession efficiency measurement provides a rating or index that can be used for direct comparison.
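As a small illustration of why possessions matter, the Python sketch below converts points into points per 100 possessions. The two teams and their numbers are hypothetical, and how possessions are counted or estimated is outside the scope of this sketch.

```python
def points_per_100_possessions(points, possessions):
    """Efficiency rating: points scored (or allowed) per 100 possessions."""
    return 100.0 * points / possessions


# Hypothetical example: two teams score the same 90 points at different paces.
fast_team = points_per_100_possessions(90, 80)  # more possessions per game
slow_team = points_per_100_possessions(90, 72)  # fewer possessions per game

print(f"Fast-paced team: {fast_team:.1f} points per 100 possessions")
print(f"Slow-paced team: {slow_team:.1f} points per 100 possessions")
```

Both teams average the same points per game, but the slower team is the more efficient one, which a per-game average cannot show.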

For example, ESPN’s NBA Basketball Power Index (BPI) is a measure of team strength (both offensive and defensive strength combined) that represents how many points above or below average a team is. BPI accounts for game-by-game efficiency, strength of schedule, pace, days of rest, game location and preseason expectations. Thus, BPI is constantly being updated to reflect the current form of each team and is designed to be a good predictor of performance going forward.

Most advanced models succeed in predicting the score spread and the expected winner. It is more challenging, though, to get the exact score in advance. In the next article, we will see how that can be improved when making score predictions in live basketball games.

 
