Longtime readers will recall a discussion last summer about a stat I called xWSR, weighted expected success rate (I agree, it’s a mouthful, and I need some help in the PR department). My motivation for the stat was to evaluate teams based on the sum of their expected points change across the season. Simple enough, right?
I stole a lot of inspiration from baseball’s xwOBA stat, wherein a statistician has access to data regarding the quality of contact: exit velocity, launch angle, and direction of the hit. Putting those together, statisticians can attach an expected run value to each batted ball. Say a double is worth, on average, something like 1.45 runs. If you hit a ball that is a double 85% of the time, but you get out, then you are awarded .85*1.45 runs for that batted ball event. Your xwOBA for that event is 1.2325, whereas your wOBA is 0 and your batting average is 0. Still with me?
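The arithmetic in that example can be sketched in a few lines. The 1.45 run value and the 85% probability are the illustrative numbers from the paragraph above, not official weights, and the function name is made up for this sketch.

```python
def expected_run_value(hit_probability, run_value):
    """Runs credited to one batted ball, regardless of what actually happened."""
    return hit_probability * run_value


# Illustrative numbers from the text: a ball that is a double 85% of the time,
# with a double valued at roughly 1.45 runs.
credited = expected_run_value(0.85, 1.45)
actual = 0.0  # the batter was out, so the realized value is zero

print(f"expected: {credited:.4f}, actual: {actual:.4f}")
```

The point is the gap between the two numbers: the expected value rewards the quality of the contact, while the actual value only rewards the result.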
I want to translate that to football. I wrote a lot of initial thoughts in the linked piece, but as we are ramping up to the college football season (and we are ramping up, despite it feeling forever away), I want to revisit the idea of valuations by expectation in college football. That starts first with some descriptive work about expected points.
To talk about expected points, we first need to understand the landscape of points in general - how did people score, and from where? First, I had to correct a problem with the data that was a simple yet annoying fix. Here is a histogram from the uncorrected data of the yard line associated with each play from the 2018 season:
Astute readers will notice five separate peaks - one at the 0 yard line, and then one each at the 25, 35, 65, and 75 yard lines. The 0 yard line is a coding error - any time there is an end of the period in the data, the yard line is coded as 0, so I’m not worried about those (they’ll get tossed out with the bathwater when I make garbage time adjustments). The other peaks, though, are the most common starting points - touchbacks and kickoffs. The problem is that the yard lines are coded 0-100 for some teams (who would kick off from their own 35, and get a touchback at their own 25), and 100-0 for others (who would kick off from their own 65, and get a touchback at their own 75). I wasn’t aware this was a problem in the data when I started my xWSR work last summer, and so I calculated expected points incorrectly. Ideally, a touchdown from the 35 should mean that a team was on its own 35 and went 65 yards for a touchdown, but due to the coding, ‘35’ captured teams going 65 yards for a touchdown and also teams going 35 yards for a touchdown. Effectively, those numbers were nonsense as a result.
But, fear not - I have amended my errors and present to you the recoded data, so that every team is now on a 0-100 scale. This was, as I said before, a small problem, but an annoying one, and I have it all ironed out now.
So now, we have yard_norm (yards normalized), and that looks much more like a real picture of where teams started plays. Having clarified that error, now we can move forward with some descriptive work about the distribution of scoring plays across the field.
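For the curious, here is a minimal sketch of what that recoding might look like. It assumes each play carries a flag saying which scale it was coded on; the `coded_reversed` field is hypothetical (the real data may need that inferred from something else), and `yard_line` stands in for whatever the raw column is called.

```python
def normalize_yard_line(yard_line, coded_reversed):
    """Return yards from the offense's own goal line on a 0-100 scale.

    coded_reversed is a hypothetical flag: True when the raw row uses the
    100-0 scale (kickoff from '65', touchback at '75').
    """
    return 100 - yard_line if coded_reversed else yard_line


# Made-up rows covering the four peaks described above.
plays = [
    {"yard_line": 35, "coded_reversed": False},  # kickoff, 0-100 coding
    {"yard_line": 65, "coded_reversed": True},   # kickoff, 100-0 coding
    {"yard_line": 25, "coded_reversed": False},  # touchback, 0-100 coding
    {"yard_line": 75, "coded_reversed": True},   # touchback, 100-0 coding
]

for play in plays:
    play["yard_norm"] = normalize_yard_line(play["yard_line"], play["coded_reversed"])

yard_norms = [play["yard_norm"] for play in plays]
print(yard_norms)
```

After the flip, both kickoff rows land on the 35 and both touchback rows land on the 25, which is why the four peaks collapse into two.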
In 2018, there were 6,198 touchdowns scored by offenses, 51% of which were rushing touchdowns. The average distance of a touchdown play was 19.13 yards, and 653 scoring plays were longer than 50 yards (TCU had 7 of those plays, thank you very much).
In order to calculate an expected points measure, we first have to look at situational scoring. So, first, here are graphs of how many scoring plays happened at each yard line, by down and by period.
What do we learn? Well, predictably, the density of scoring plays gets higher the closer to the end zone (shocking, I know), and other than that, the only real difference tends to be that fourth down scores are much less likely at any point on the field. It doesn’t appear that down and period, on their own, are very informative as to refining our expectation of points for any given play. Overtime (the light blue circles in the second graph) is a different animal, and probably should be excluded for the purposes of calculating expected points - it’s too different, in terms of strategy and stakes.
Digging deeper, let’s see how rushing and passing plays differ.
A little more information here - we see some rushes breaking big from deep inside your own territory, and then it looks like passes are strictly more likely to be scoring plays until teams get inside the ten.
So, in constructing expected points estimates, we have three important components: down, distance, and play type.
Expected Points: A Naive Approach
This week, my first step in reconstructing a better valuation system for teams is the naive expected points approach. Here’s the methodology: for each yard line, down, and distance combination, I’m going to multiply the proportion of scoring plays times six. No field goals, no defensive scores, no special teams. Just offense, plain and simple.
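As a rough sketch of that methodology, assuming each play is a record with a normalized yard line, a down, a distance, and a flag for whether it was an offensive touchdown (the field names and the sample plays are made up for illustration):

```python
from collections import defaultdict

POINTS_PER_TOUCHDOWN = 6


def naive_expected_points(plays):
    """Map (yard_norm, down, distance) -> 6 * share of plays that scored."""
    counts = defaultdict(lambda: [0, 0])  # key -> [scoring plays, total plays]
    for play in plays:
        key = (play["yard_norm"], play["down"], play["distance"])
        if play["offensive_td"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {
        key: POINTS_PER_TOUCHDOWN * scored / total
        for key, (scored, total) in counts.items()
    }


# Made-up plays: four snaps on 1st and goal from the 5, one 3rd and 12 from the own 20.
sample = [
    {"yard_norm": 95, "down": 1, "distance": 5, "offensive_td": True},
    {"yard_norm": 95, "down": 1, "distance": 5, "offensive_td": False},
    {"yard_norm": 95, "down": 1, "distance": 5, "offensive_td": False},
    {"yard_norm": 95, "down": 1, "distance": 5, "offensive_td": True},
    {"yard_norm": 20, "down": 3, "distance": 12, "offensive_td": False},
]

ep = naive_expected_points(sample)
for situation, value in sorted(ep.items()):
    print(situation, value)
```

Situations with only a handful of plays will produce jumpy values under this approach, since one long touchdown can swing the proportion a lot.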
When we talk about a mathematical expectation, we are talking about the expected value of a random variable. Here, whether a play scores is a random variable with a binomial distribution (each play either scores or it doesn’t). The easiest and most straightforward way to take the expectation is the unconditional mean. Here, that’s just the expected points per play. That is not very informative, as it doesn’t mean much that the average play generates .8145 points. That’s where the conditional expectation comes in - we don’t care about the average play, we care about the points per play from a given situation. For example, we’d expect a much higher probability of scoring from a first and ten inside the opponents’ twenty than we would from third and long on your own 2. The conditional expectation will take two forms - first, the more-naive conditional expected points from yard lines, and second, the expected points conditional on down, yard line, and distance.
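To make the difference concrete, here is a small sketch comparing the unconditional mean, E[points], with the mean conditional on yard line, E[points | yard line]. The plays and field names are invented for illustration, and the `key` argument is just a convenience of this sketch so the same function could condition on other things.

```python
from collections import defaultdict


def unconditional_mean(plays):
    """Average points over every play, ignoring the situation."""
    return sum(play["points"] for play in plays) / len(plays)


def conditional_mean(plays, key):
    """Average points within each group defined by key(play)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for play in plays:
        group = key(play)
        totals[group] += play["points"]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up plays: two snaps from the opponent's 10, four from the own 10.
sample = [
    {"yard_norm": 90, "points": 6},
    {"yard_norm": 90, "points": 0},
    {"yard_norm": 10, "points": 0},
    {"yard_norm": 10, "points": 0},
    {"yard_norm": 10, "points": 0},
    {"yard_norm": 10, "points": 0},
]

overall = unconditional_mean(sample)
by_yard_line = conditional_mean(sample, key=lambda play: play["yard_norm"])
print(overall, by_yard_line)
```

The single overall number hides the fact that the two field positions behave completely differently, which is the whole argument for conditioning.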
Now, this simple conditional mean looks pretty useful - aside from a few outliers from small sample sizes and big plays in own territory, this looks about like you’d expect. The issue here is that we are losing information by not breaking this out situationally. So, let’s add in down and distance and see what changes.
Wow, so that got fun. This is a little too crazy, and maybe not a good visual representation of the data. Here’s a little bit cleaner representation, broken out between “short” and “long” distances to go on each play.
This graph shows just plays with > 5 yards to go; there seems to be some substantial heterogeneity (fancy word for differences) in this group, which validates the propriety of conditioning the expected points on down, distance, and yard line.
So, what have we accomplished today? We have corrected the play-by-play data and seen that a conditional expectation of points per play, based on down, distance, and yard line, can be very informative as to differential expected outcomes per play. Quality expected points data is the backbone of any worthwhile comparison of team expected outcomes. The next steps are to incorporate historical data to smooth out some of the noise, then connect those expected points to team outcomes, and then I can start aggregating across teams.