Why expected goals is important

A primer on the sport’s most popular advanced stat.

Arsenal v Leicester City - Premier League Photo by Shaun Botterill/Getty Images

Whatever your opinion of it, expected goals is pushing its way into the mainstream more than any other advanced statistic in soccer. Just this past Thursday, for example, the BBC published a lengthy article online explaining it.

So, what should we make of expected goals?

Like any metric or piece of information, quantitative or qualitative, expected goals is not without flaws. That does not mean, however, that it deserves to be discredited. In a sport where acceptance and understanding of data and advanced statistics still have a ton of room to grow, expected goals is currently the best we’ve got.

For reference, here is the expected goals data from Arsenal’s 4-3 win over Leicester City, courtesy of Michael Caley, a writer for FiveThirtyEight (and former Cartilage Free Captain blogger):

Based on the game’s chances, Arsenal were expected to score 2.4 goals and concede 1.2. These maps help in digesting what just took place on the pitch, and they provide a quick snapshot of who controlled the game and who deserved to win.
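To make the arithmetic behind a match total concrete, here is a minimal sketch. It assumes the usual approach of giving each shot a scoring probability and adding those up. The shot values below are invented for illustration and are not Caley's data from this match.

```python
# Hypothetical per-shot scoring probabilities for one team in one match.
# These numbers are made up for illustration; they are not real match data.
shot_probabilities = [0.05, 0.30, 0.10, 0.45, 0.10]


def expected_goals(shot_probs):
    """Add up per-shot scoring probabilities to get a match xG total."""
    return sum(shot_probs)


match_xg = expected_goals(shot_probabilities)
print(f"Expected goals from {len(shot_probabilities)} shots: {match_xg:.2f}")
```

A team can reach the same total with one big chance and a few hopeful efforts or with a steady stream of decent looks, which is why the shot maps are worth reading alongside the headline number.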

Expected goals makes sense

You may have seen this image pop up on your Twitter timeline in the last year:

This neatly sums up why people should, at the very least, be open to expected goals as a useful metric. While the image is purposefully simplistic, it shows that expected goals, in theory, makes a lot of sense.

Michael Bertin, in a very worthwhile critique of expected goals back in 2015 for Deadspin, offered a similarly helpful summary of the logic behind the statistic:

The basic intuition behind it - that some shots are more likely to go in than others - is super easy to understand...If your actual goals - what a player or team scored - is greater than expected goals, then you are over-performing, or efficient, or something...Even the biggest numerophobe would have a hard time poo-pooing something that tries to measure the one stat that matters in soccer (namely how many did you score?).

In other words, you care about expected goals, even if you didn’t know it already.

It’s better than other statistics

Valuing expected goals over other statistics to determine the quality of chances makes intuitive sense as well. Total Shot Ratio (TSR), for example, which compares your team’s shot attempts (even off-target ones) with the other team’s, isn’t as telling as expected goals. Blogger and soccer analytics consultant 11tegen11 spelled out where TSR falls short a few years ago: “The problem with TSR is that it all treats shots equal, which does not fit the fluency of football, where shots are not equal. Shots may come through a crowd of defenders from 40 yards out, or from the penalty spot in optimal circumstances...both influence TSR equally.”

11tegen11 also found that expected goals was more strongly correlated with results and more repeatable year-to-year than Total Shot Ratio.

TSR has its value for analysis at the team level. The more shots you take, the more chances you have to score. A shot also means your team has the ball, meaning that TSR can be a decent proxy for meaningful possession. But TSR can’t estimate the quality of chances, which is especially important in a sport like soccer where there are very few shots. In higher-event sports like hockey, TSR (a.k.a. Corsi in hockey) is more strongly correlated with future success.
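A small sketch shows the blind spot. It assumes the common definition of TSR as a team's share of all shots in a match, and the two shot lists are hypothetical: one team shoots only from distance, the other only from close range.

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots taken by one team (assumed definition of TSR)."""
    return shots_for / (shots_for + shots_against)


# Hypothetical match: both teams take 10 shots of very different quality.
# The per-shot probabilities are invented for illustration.
long_range_team = [0.03] * 10   # ten speculative efforts from distance
close_range_team = [0.25] * 10  # ten chances from near the penalty spot

tsr = total_shot_ratio(len(long_range_team), len(close_range_team))
xg_long = sum(long_range_team)
xg_close = sum(close_range_team)

print(f"TSR for the long-range team: {tsr:.2f}")
print(f"xG, long-range team: {xg_long:.2f}")
print(f"xG, close-range team: {xg_close:.2f}")
```

TSR calls this an even game because the shot counts match, while the expected goals totals are far apart.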

The same goes for conversion rate, a popular stat for evaluating individual attacking players. In 2013, StatsBomb writer Constantinos Chappas wrote, “If a striker has a 25% conversion rate, that does not mean he is a better finisher compared to someone with a 20% conversion rate. Perhaps his chances were from more favourable positions compared to the other striker’s chances.”
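Chappas's point can be put in numbers. The sketch below uses two hypothetical strikers with the 25% and 20% conversion rates from the quote. Their per-shot chance values are invented, and comparing goals to expected goals is only one simple way to frame finishing.

```python
from dataclasses import dataclass


@dataclass
class Striker:
    name: str
    goals: int
    shot_xg: list  # hypothetical per-shot scoring probabilities

    @property
    def conversion_rate(self):
        """Goals divided by shots taken."""
        return self.goals / len(self.shot_xg)

    @property
    def goals_minus_xg(self):
        """Positive means more goals than the chances suggested."""
        return self.goals - sum(self.shot_xg)


# Striker A converts 25% of shots, but from very good positions.
striker_a = Striker("Striker A", goals=5, shot_xg=[0.30] * 20)
# Striker B converts 20% of shots, from much harder positions.
striker_b = Striker("Striker B", goals=4, shot_xg=[0.15] * 20)

for s in (striker_a, striker_b):
    print(f"{s.name}: conversion {s.conversion_rate:.0%}, "
          f"goals minus xG {s.goals_minus_xg:+.1f}")
```

With these made-up inputs, the striker with the higher conversion rate scores fewer goals than his chances suggested, and the one with the lower rate scores more.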

This is not to say conversion rate or TSR are meaningless statistics. Everything, whether it’s a stat or an interesting anecdote, tells us something. Expected goals happens to provide us with a better method to judge shot quality than other metrics, and the logic behind it is quite easy to understand.

Expected goals is not perfect

The expected goals value for any single shot is based on a variety of inputs. They may vary depending on the model, but here are the factors Caley used when he originally outlined his methodology in 2014:

Courtesy of Michael Caley/Cartilage Free Captain

While the inclusion of each of these elements makes sense, expected goals is, unsurprisingly, not able to capture everything. Caley outlined a few of expected goals’ key issues in a subsequent post in 2015, including the fact that expected goals can’t directly account for where the defenders were positioned for each chance: “This is a big one. We simply do not have the data. Opta tracks ball actions, not player locations. And we know simply from experience that if you are well-defended and there are players between you and the goal, you are much less likely to score than if you’re free one-on-one with the keeper.”

Caley also noted that expected goals does not factor in who is shooting. It would assign the same value to an identical chance whether it fell to Alexis Sanchez or Per Mertesacker, even though we know Alexis is a much better finisher and is more likely to convert. Another problem with expected goals is that it only computes values for shots. So, for example, if Arsenal found themselves in a great position to score but made an extra pass rather than shooting (unrealistic, I know), we would not have expected goals data for what was still an excellent chance.

Bertin, in his Deadspin piece, built multiple models of his own that varied in complexity and found all of them to be poor predictors. Despite that, Bertin was not willing to write off expected goals, because so many of the inputs used to create his models still had value:

Some of my coefficients absolutely contain information that could give a team an advantage. In other words, I can put a number on which factors have a sizable impact on the chances a shot goes in, and many of them are extremely statistically significant...Even just numerical clues as to where to start looking for an edge can ultimately lead to points on the table.

Models are not supposed to be perfect. As Bertin alludes to in his piece, a perfect or near-perfect model should be a sign that something isn’t right. Consider this tweet from hockey analytics guru Garret Hohl:

That last sentence is the critical one: “The only question of interest is ‘is the model illuminating and useful?’”

By using and developing expected goals, we can quantify how important it is to shoot closer to the goal, or how through balls lead to better chances and more goals than cutbacks, and so on. This is all crucial information that enhances our understanding of the game and our ability to analyze it, even if the model as a whole isn’t perfect.
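As a rough illustration of how a model can put a number on shot distance, here is a toy sketch. It is not Caley's or Bertin's model. The logistic form with distance as the only input is an assumption, and the coefficients are invented rather than fitted to any data.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
INTERCEPT = 0.5
DISTANCE_COEF = -0.12  # change in log-odds per extra yard from goal


def shot_probability(distance_yards):
    """Toy logistic estimate of scoring probability from distance alone."""
    log_odds = INTERCEPT + DISTANCE_COEF * distance_yards
    return 1.0 / (1.0 + math.exp(-log_odds))


for distance in (6, 12, 18, 30):
    print(f"{distance:>2} yards: {shot_probability(distance):.3f}")
```

A real model adds many more inputs, but the idea is the same: each factor nudges the probability up or down, and the size of the coefficient tells you how much that factor matters.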

Expected goals is improving and will continue to do so

It is worth recognizing that expected goals will get better as more data becomes available. People like Michael Caley will have time to build better models. Our use and grasp of it, like anything else, will evolve over time. Caley’s model is a perfect example of this. In October 2015, Caley introduced a model that, notably, used a larger sample of shots and had new indicators to better estimate defensive positioning and individual player skill. Unsurprisingly, Caley’s model became more predictive. Comparing it with points, goal ratio, Total Shot Ratio, Shots on Target Ratio, and Shots on Target Difference, Caley concluded “that expected goals does the best at predicting future goal difference and future points total.”

Expected goals is not the answer to everything. Nothing is. And critiquing existing work is the right idea. But by using expected goals, we can establish how good a chance really was and how lucky or unlucky a team was, rather than debating it subjectively. At its very core, that is what expected goals is trying to accomplish. It is adding objective information that should enhance our ability to accurately discuss outcomes while also being the best metric we currently have to predict future results. While it would be irresponsible to blindly trust expected goals, it would be similarly unwise to discount it altogether or deny its capacity and potential to enhance our knowledge of the game.