The field of soccer analytics has lately been grappling with expected goals models, which are algorithms for predicting how many goals a team will score given the characteristics of its shots. But as Simon Gleave likes to point out, what matters in soccer is points, not goals; indeed, Mark Taylor recently showed that teams with the same number of expected goals in a game may obtain different results. As it happens, connecting shots to points via expected goals is no easy task.
The core of the problem is that expected goals are the sum of probabilities, while expected points are the sum of products of probabilities. To understand the difference, start with a simple example. Team A takes one shot per game and has a 50% chance of scoring. Team B takes 50 shots per game, each with a 1% chance of scoring. Their expected goals – the total goals the two teams would score on average, given their scoring rates – are equal at 0.5. But their expected points are not.
Consider Team B. It will score zero goals about 60% of the time; this is just the 99% chance of missing raised to the 50th power. Team B will score exactly one goal about 31% of the time, which is 99% to the 49th power (for the 49 misses), multiplied by 1% (for the goal scored) and then by 50 (for the number of ways this could happen). That leaves about a 9% chance that Team B will score two or more goals, which is always enough to beat Team A.
But Team B can also beat Team A 1-0. This will happen about 15% of the time, since Team B scores exactly once about 31% of the time and Team A fails to score 50% of the time; because the two teams’ shots are independent, the chance of a 1-0 win for Team B is just the product of those probabilities. Team A wins only when it scores its one goal and Team B doesn’t score. This happens in half of the games in which Team B doesn’t score, or about 30% of the time overall.
To sum up, Team A has about a 30% chance of winning, and Team B’s chance of winning is about 9% + 15% = 24%. A draw occurs the remaining 46% or so of the time, so Team A’s expected points are about 1.36 to Team B’s 1.18. The reason for this difference, despite the identical expected goals, is that Team B’s expected goals include its chances of scoring three, four, five, or more times. Unfortunately for Team B, the only goals that matter here are the first two; there are no additional points available for scoring more.
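The worked example can be checked with a short script, computed without intermediate rounding. This is a sketch that assumes, as the example does, that every shot is independent and that the two teams’ goal counts are independent of each other; the function names are illustrative. It builds each team’s goal distribution from the binomial formula used above, compares every pair of scores to get win, draw and loss probabilities, and then weights them at 3, 1 and 0 points.

```python
from math import comb

def goal_distribution(shots: int, p: float) -> list[float]:
    """P(exactly k goals) for k = 0..shots, assuming independent shots
    that each score with the same probability p (a binomial model)."""
    return [comb(shots, k) * p**k * (1 - p) ** (shots - k) for k in range(shots + 1)]

def match_outcome(dist_a: list[float], dist_b: list[float]) -> tuple[float, float, float]:
    """Return (P(A wins), P(draw), P(B wins)) for independent goal counts."""
    win_a = draw = win_b = 0.0
    for goals_a, pa in enumerate(dist_a):
        for goals_b, pb in enumerate(dist_b):
            joint = pa * pb  # independence: the joint probability is a product
            if goals_a > goals_b:
                win_a += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                win_b += joint
    return win_a, draw, win_b

team_a = goal_distribution(1, 0.50)   # one shot at 50%
team_b = goal_distribution(50, 0.01)  # fifty shots at 1%

# Expected goals: a plain sum of per-shot scoring probabilities.
xg_a, xg_b = 1 * 0.50, 50 * 0.01

win_a, draw, win_b = match_outcome(team_a, team_b)
# Expected points: 3 for a win, 1 for a draw, 0 for a loss.
xp_a = 3 * win_a + draw
xp_b = 3 * win_b + draw

print(f"xG: A={xg_a:.2f}, B={xg_b:.2f}")
print(f"P(B scores 0, 1, 2+): {team_b[0]:.3f}, {team_b[1]:.3f}, {1 - team_b[0] - team_b[1]:.3f}")
print(f"A win {win_a:.3f}, draw {draw:.3f}, B win {win_b:.3f}")
print(f"xPts: A={xp_a:.2f}, B={xp_b:.2f}")
```

It should report expected goals of 0.50 for both teams and expected points of roughly 1.36 for Team A and 1.18 for Team B: equal sums of probabilities, unequal sums of products.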
The same is true in the English Premier League. Imagine a team that has a 10% chance of scoring with every shot. For each additional shot, its expected goals will rise by 0.1, no matter what. Yet no team has recently scored six goals and come away with fewer than three points. So, once six goals are virtually assured, any additional shots will not improve expected points at all. The following graph, based on the relationship between points and goals scored from 2009-10 through the present, illustrates this idea:
(Note that the data presented in this graph are independent of the opposition faced.)
Because just a few shots aren’t necessarily enough to grab all three points, the marginal value of a shot rises at first. But as three points become increasingly likely, the marginal value of a shot falls essentially to zero. The dropoff is not as steep, however, for shots with a 5% chance of scoring. Indeed, their value changes little for the first 20 shots or so:
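The shape of these curves can be explored with a toy model. The sketch below is not the method behind the graphs, which use observed Premier League points; instead it assumes, hypothetically, that the opponent’s goals follow a Poisson distribution with a mean of 1.4 (`OPP_RATE`, an illustrative value, not a fitted one) and that a team’s n shots each score independently with probability p. The marginal value of the n-th shot is then the expected points with n shots minus the expected points with n − 1.

```python
from math import comb, exp, factorial

OPP_RATE = 1.4      # hypothetical opponent goals per match (Poisson mean); not fitted to data
MAX_OPP_GOALS = 20  # truncate the Poisson tail; the omitted mass is negligible at this rate

# Probability that the opponent scores exactly j goals, j = 0..MAX_OPP_GOALS.
OPP_DIST = [exp(-OPP_RATE) * OPP_RATE**j / factorial(j) for j in range(MAX_OPP_GOALS + 1)]

def points_given_goals(goals: int) -> float:
    """Expected points for a team that scores `goals` against the assumed opponent."""
    win = sum(OPP_DIST[:goals])  # opponent scores fewer goals
    draw = OPP_DIST[goals] if goals <= MAX_OPP_GOALS else 0.0
    return 3 * win + draw

def expected_points(shots: int, p: float) -> float:
    """Expected points when `shots` independent shots each score with probability p."""
    return sum(
        comb(shots, k) * p**k * (1 - p) ** (shots - k) * points_given_goals(k)
        for k in range(shots + 1)
    )

def marginal_value(n: int, p: float) -> float:
    """Expected points added by the n-th shot."""
    return expected_points(n, p) - expected_points(n - 1, p)

for p in (0.10, 0.05):
    values = [marginal_value(n, p) for n in range(1, 81)]
    print(f"p={p:.2f}: first shot {values[0]:.3f}, peak {max(values):.3f}, "
          f"80th shot {values[-1]:.4f}")
```

Under these assumptions the value of a 10% shot rises slightly before falling toward zero, while a 5% shot’s value moves much less over the first 20 shots; the exact numbers depend entirely on the assumed opponent.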
Of course, both of these graphs are gross simplifications. In practice, almost every one of a team’s shots has a different chance of scoring. But the graphs do show that game situations are essential to the link between shots and points. In more rigorous terms, the expected points a shot adds are its chance of scoring multiplied by the chance, conditional on scoring, that the goal will be pivotal in determining the result, and by the points the goal would then swing.
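One way to make the last sentence precise is sketched below, assuming for simplicity that a missed shot leaves the match exactly as it would have been had the shot not been taken (ignoring, for example, lost possession). The notation is introduced here for illustration only: p is the shot’s chance of scoring, s the game situation (score and time remaining), s⁺ the same situation with one more goal for the shooting team, and xP(·) the expected points from a situation. A goal counts as pivotal if, with the rest of the match unchanged, it alters the result.

```latex
\Delta xP(s) \;=\; p \,\bigl[\, xP(s^{+}) - xP(s) \,\bigr]
\;=\; p \cdot \Pr(\text{pivotal} \mid \text{goal}) \cdot
      \mathbb{E}\bigl[\,\text{points swung} \mid \text{goal, pivotal}\,\bigr]
```

The bracketed term is zero whenever the result is already settled, which is why the marginal value of a shot collapses once three points are all but certain.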
The marginal value of each shot, measured in expected points, depends greatly on the situation in the game. Some situations have intuitive consequences. Expected points per shot are probably higher the closer in-game goal difference is to zero. They are also likely higher when less time is left in the game, and when the opposition is playing too poorly to answer a goal. For teams that have already scored or conceded six goals, expected points per shot at the margin are close to zero, since the result is all but certain.
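These intuitions can be checked against a simple game-state model. The sketch below is an assumption-laden toy, not a fitted model: it treats each side’s remaining goals as independent Poisson counts whose means scale with the minutes left (the default of 1.4 goals per 90 minutes is illustrative only), and it values a shot that scores with probability p as p times the change in expected points its goal would cause. All names are hypothetical.

```python
from math import exp, factorial

MAX_GOALS = 15  # truncate the Poisson tails; the omitted mass is negligible here

def poisson_pmf(rate: float) -> list[float]:
    """P(exactly k more goals), k = 0..MAX_GOALS, for a Poisson mean `rate`."""
    return [exp(-rate) * rate**k / factorial(k) for k in range(MAX_GOALS + 1)]

def result_points(goal_diff: int) -> int:
    """Points for a final goal difference: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 if goal_diff > 0 else 1 if goal_diff == 0 else 0

def expected_points(goal_diff: int, own_rate: float, opp_rate: float) -> float:
    """Expected points from the current goal difference, assuming each side's
    remaining goals are independent Poisson counts (a modelling assumption)."""
    own, opp = poisson_pmf(own_rate), poisson_pmf(opp_rate)
    return sum(
        p_own * p_opp * result_points(goal_diff + i - j)
        for i, p_own in enumerate(own)
        for j, p_opp in enumerate(opp)
    )

def shot_value(p: float, goal_diff: int, minutes_left: float,
               own_per_90: float = 1.4, opp_per_90: float = 1.4) -> float:
    """Marginal expected points of a shot that scores with probability p:
    p * (xP after a goal - xP if the shot misses and nothing else changes)."""
    scale = minutes_left / 90  # remaining scoring rates shrink with time left
    own_rate, opp_rate = own_per_90 * scale, opp_per_90 * scale
    return p * (expected_points(goal_diff + 1, own_rate, opp_rate)
                - expected_points(goal_diff, own_rate, opp_rate))

print(f"level, 10 min left:  {shot_value(0.10, 0, 10):.3f}")
print(f"2-0 up, 10 min left: {shot_value(0.10, 2, 10):.3f}")
print(f"level, 80 min left:  {shot_value(0.10, 0, 80):.3f}")
print(f"level, 30 min left, weak vs strong opponent: "
      f"{shot_value(0.10, 0, 30, opp_per_90=0.5):.3f} vs "
      f"{shot_value(0.10, 0, 30, opp_per_90=2.0):.3f}")
```

In this toy model a 10% shot is worth more at level scores than at 2-0 or 0-3, more with five minutes left than with eighty, more against a weak attack than a strong one, and essentially nothing with a six-goal lead.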
These observations may be useful for analysts who just want to predict the outcome of games. But for analysts who evaluate players, the picture is a little different. If a team wins a match 2-1 after trailing 0-1, was its first goal more pivotal than the second? Arguably the first goal was worth one point (for a draw), and the second was worth two more (for the win). If this is the case, then the shots taken at 1-1 were twice as valuable as those taken at 0-1. But perhaps each goal was really worth 1.5 points, since both were needed equally to take home the three points. Then the shots at 0-1 and 1-1 were equally valuable, too.
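The two readings of the 2-1 comeback can be made concrete. In the sketch below, a team’s scoring sequence is given as the score before its first goal and after each of its goals; `sequential_credit` gives each goal the points it added at the moment it went in, and `equal_credit` splits the final points evenly across the goals. Both helpers are hypothetical, written for this illustration, and the comparison of shot values assumes, as the paragraph implicitly does, that a shot was equally likely to score at 0-1 and at 1-1.

```python
def points_if_final(goals_for: int, goals_against: int) -> int:
    """Points the team would collect if the match ended at this score."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0

def sequential_credit(path: list[tuple[int, int]]) -> list[int]:
    """Credit each goal with the points it added when it went in.
    `path` lists the score before the first goal and after each of the team's goals."""
    pts = [points_if_final(f, a) for f, a in path]
    return [after - before for before, after in zip(pts, pts[1:])]

def equal_credit(path: list[tuple[int, int]]) -> list[float]:
    """Split the final points evenly across the team's goals."""
    n_goals = len(path) - 1
    return [points_if_final(*path[-1]) / n_goals] * n_goals

comeback = [(0, 1), (1, 1), (2, 1)]  # trailing 0-1, equalize, then win 2-1
seq = sequential_credit(comeback)
eq = equal_credit(comeback)

# If a shot scored equally often at 0-1 and at 1-1, its value is proportional
# to the credit its goal would receive, so the ratio of credits is the ratio
# of shot values.
print("sequential:", seq, "ratio", seq[1] / seq[0])
print("equal:", eq, "ratio", eq[1] / eq[0])
```

The first reading credits the goals with 1 and 2 points, so shots at 1-1 are worth twice those at 0-1; the second credits each with 1.5 points, so the shots are worth the same. Both readings hand out the same 3 points in total.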
Should the decision between these two interpretations affect the assessment of players? There may be no reason to believe that scoring was any easier at 0-1 than it was at 1-1. And even at 2-1, the winning team’s shots still had value, since an additional goal would have made it harder for the opposing team to come back. But how should the value of those shots, which didn’t win any points, compare to the values of the shots at 0-1 and 1-1?
An analyst’s answers to these questions may depend in large part on his or her objectives. If an analyst evaluating players gives an extremely high value to shots taken at 0-0, players may opt to take bad shots themselves at 0-0 rather than passing to teammates with better chances to score. To make the analyst’s evaluation incentive compatible, some game situations might have to be ignored. By contrast, an analyst trying to predict results might assign a different value to shots at every minute of the game and at every possible score. The same issue arises when deciding which variables to use in calculating expected goals.
I’ve been struck by how the discussion of expected goals models has overlooked this question of objectives. Analysts with different objectives will not necessarily value shots in the same way, either as expected goals or as expected points. In this case, there may be many truths rather than just one.