What Do We Talk About When We Talk About Statistics?

Let's talk numbers. Or not.

This guy's looking for better stats too
Kevin Liles-USA TODAY Sports

Americans have always loved to quantify things, perhaps more so than any other nation. We know the size of things, their size relative to similar things, their ranking among the world's similar things, all that stuff. And we don't just know it, we care about it - box office ratings, music charts, company sizes/annual returns, you name it, it's all rated, ranked, and quantified to within an inch of its life.

This obsession with quantifying has long been a part of sports. Baseball's statistics are the most famous example, because the relative lack of major rule changes throughout baseball's history has produced a data set that stays consistent over time. But every sport keeps statistics - and in this day and age, every sport publishes them - so we as fans can obsess over them and pretend that, when it comes to building a team, we're as smart as the people who actually do sports for a living.

While box scores and rudimentary stats have always been around, the modern depth of sports numberism is relatively recent - thanks to the internet, it's pretty easy now to pull up some amazingly advanced stats for not just the biggest teams in the biggest leagues, but for lower-league teams in fringe sports in small countries around the world. You want Bangladeshi cricket info and stats? BOOM! You want hurling results? ZAP! And so on. Seemingly every sport has a fantasy game now, and every piece of information that can be found about that sport and its players is used by people to inform their fantasy-sports decisions and strategies.

Before I go on, I feel like I need to clarify - this is by no means a bad thing. But we happen to be fans of a sport that has, up to now, been somewhat resistant to quantification - not just culturally, but in actual practice as well. It's hard to quantify something as fluid and dynamic as soccer - there are very few discrete events to which we can pin statistical values, and it's really hard to be predictive or to be able to analyze over time with the numbers at our disposal. None of this is new information; we all know this, and have known it for a long time.

So this is the point I'm trying to get to. The human desire to quantify everything has made for a headlong rush to quantify soccer, without fully knowing what it is that is being (or needs to be) quantified. SBN's own Graham MacAree has done some outstanding work in this area, mostly to debunk myths around how statistics are used.

His point, and the point I'm reiterating here, is not that statistics are unnecessary - in fact, they're really useful and a helpful shorthand to accompany any game narrative. The problem with soccer statistics right now is their newness - people who are conditioned to know their fantasy running back's yards per carry and their fantasy utility guy's OBP are so desperate to fill the void and have soccer statistics that a lot of times, they take what's there and run with it to a completely unproductive degree, because "hey we have something!", forgetting that statistics have to have some sort of relevance for them to be of use.

Graham's linked piece above is one excellent example of this, but I want to talk about two others:

1. Heat Maps

Can somebody please, please tell me what the benefit of looking at a heat map is? Here's a heat map of Mesut Özil's game against Sunderland:
[Image: heat map of Mesut Özil's game against Sunderland]

via i.dailymail.co.uk

So, those blobs sure are....I can see that this tells me...after reading this I understand...what again? It tells me on what patches of ground Özil was standing at any given point in the match, and in aggregate, where he spent most of his time during the match. Which is definitely...information, I guess, but it's not information that's really useful in any way. Why? Because it lacks any sort of context.

So he spent a lot of time out there in that peanut on the left. That's great! But...what did he do when he was standing there? Did he get or make lots of passes? Turnovers? Anything? Why was he standing there, and had he stood somewhere else would he have scored, or created, more goals? We'll never know, these are just splodges on a green background, and yet people discuss them as if they had some sort of narrative value. They don't. Maps that show completed and incomplete passes are more useful, because they at least put actions into a spatial context (player X is good at passing out of the back, etc), but heat maps? They're pretty much useless. Ignore them.

EDIT 9/25/13: Per Graham MacAree, as Ted mentions in the comments, it has been made clear that this is not what heat maps show. Heat maps show only where players had touches during a match; they do not show where a player stood throughout the match. This changes my thinking in ways I do not have time to contemplate or elaborate on now, but will do in a post soon.

2. Misuse Of The "Key Pass" Metric

Opta defines "Key Pass" as "The final pass or pass-cum-shot leading to the recipient of the ball having an attempt at goal without scoring". Which, fine, that makes sense, right? Sure...but. What is tending to happen these days is that, now that there's a definition, there is an overabundance of weight being put on a "key pass", as if that were the single thing, the secret, the holy grail that would unlock all mysteries about How To Be Good At Being A Soccer Team.

The problem is that it isn't. The main flaw in "Key Pass" as a metric is that it assigns no relative weight to individual key passes - in Opta's world, a key pass is a key pass; as long as it fits the above definition, it's labeled "key". But in the real world, not all key passes are equal. If a pass is delivered sideways 5 yards goalside of the midfield line, and the recipient of the pass then has a belt from 30 yards out that goes 5 yards wide of goal into row R, Opta counts that as a key pass. If a pass is delivered from five yards behind the 18 yard box to a player just entering the area who's in full flight, who then takes two strides and hammers a shot that the keeper just manages to tip over the bar, Opta counts that as a key pass too. Which key pass would you rather see on a regular basis?

To be fair, Opta does provide geospatial data for all its events, so you can see where a player makes said pass and then hopefully assign a weight to passes (maybe by zone of the pitch or some other mechanism), and thus can determine how key "key" actually is; trouble is, no media outlet takes the time to pair up the key pass stat with the geospatial element to get a true picture of how key a pass really is. It is a significant, labor-intensive exercise to do this, but without it, key passes don't really tell us all that much more than "completed passes" does.
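To make the "weight by zone" idea a little more concrete, here's a rough sketch in Python. Big caveats: the zone boundaries, the weights, and the 0-100 pitch coordinates are all my own assumptions for illustration, not anything Opta publishes, and the two players and their passes are made up. It's a starting point to argue with, not a finished metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPass:
    player: str
    x: float  # where the pass was made: 0 = own goal line, 100 = opponent's goal line (assumed scale)
    y: float  # 0 to 100 across the width of the pitch (assumed scale)


def zone_weight(x: float, y: float) -> float:
    """Hypothetical weight for a key pass, based only on where it was made."""
    if x >= 83.0 and 21.0 <= y <= 79.0:
        return 1.0  # roughly inside the penalty area (approximate bounds)
    if x >= 66.7:
        return 0.6  # rest of the final third
    if x >= 50.0:
        return 0.3  # opponent's half, outside the final third
    return 0.1      # own half


def weighted_key_passes(passes: list[KeyPass]) -> dict[str, float]:
    """Sum the zone weights of each player's key passes."""
    totals: dict[str, float] = {}
    for p in passes:
        totals[p.player] = totals.get(p.player, 0.0) + zone_weight(p.x, p.y)
    return totals


# Made-up example: both players have two key passes in the raw count.
example = [
    KeyPass("Player A", 55.0, 50.0),  # sideways ball just past midfield
    KeyPass("Player A", 52.0, 30.0),
    KeyPass("Player B", 78.0, 50.0),  # ball played from just outside the box
    KeyPass("Player B", 85.0, 45.0),
]

for player, score in sorted(weighted_key_passes(example).items()):
    print(f"{player}: {score:.1f} weighted key passes")
```

In the raw count, Player A and Player B are identical at two key passes each. Once the passes are weighted by where they were made, Player B comes out well ahead, which is the distinction the plain stat can't show. Whether these are the right zones or the right weights is exactly the kind of thing that needs smarter people than me.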

So what's my point here? It's important to make clear that I'm not suggesting abandoning Key Pass, or possession, or any other Opta stat (I am, however, strongly suggesting you abandon heat maps). What I'm hoping is that, as you read more articles that refer to these concepts and metrics, you don't just accept them at face value. I'm hoping that you challenge them, and think about them, and maybe even evolve them and make them better.

I've been very public about my disdain for crowdsourcing; I think that crowdsourcing is one of the most misunderstood, misused concepts in modern discourse (spoiler alert: crowdsourcing doesn't actually mean "ask a bunch of people and you'll get the right answer in aggregate"). However, in something like trying to define and refine soccer statistics, crowdsourcing is a great idea - someone puts a metric out there like "key passes", then a whole bunch of super-smart people take the time to unpack it, figure out where it falls short, and make it better.

Soccer stats are young enough that a crowdsourcing effort can only help; I'm not arrogant enough to say "THIS IS MY CALL IN LIFE YOU SHALL GO FORTH AND STATISTICIZE THINGS", or even to use this space as a call to arms, driving the TSF army to revolutionize the game (although that'd be awesome!), because to be honest, my relationship with the ability to develop and advance the knowledge of statistics is much like a blind person's relationship with an unfamiliar space. I'm nobody's Bill James, in other words; I'm great at consuming stats and finding where they fall short, but I'm not so bueno at creating or improving them.

So, given my shortcomings in that arena, if I can do anything to call attention to where said soccer stats are lacking, and point people towards what I see as the holes in them in an effort to get people who are way smarter than me to refine these stats and make them more useful, then I will do that as much as I can.