
Weighted Expected Success Rate: How the Data Stacks Up

The formula is out - does it work?


As regular readers recall, last week I unveiled the methodology behind another way to think about success in college football. Specifically, I stole from baseball’s expected weighted on-base average (xwOBA) to create a metric that considers both the change in expected points on any play and the number of successful plays a team runs. Sparing you the gory details (the uninitiated can read about the methodology here), I’ll remind you of the basic formula for weighted expected success rate:

wxSR = sum of (total change in expected points * successful plays)/total plays.
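For concreteness, here is a minimal sketch of one literal reading of that formula in pandas. The column names (ep_start, ep_end, success) are placeholders of my own, not the actual schema, and the formula leaves some room for interpretation - here I take the total change in expected points over a team’s successful plays, weight it by the count of successful plays, and divide by total plays.

```python
import pandas as pd

def team_wxsr(plays: pd.DataFrame) -> float:
    """One reading of wxSR for a single team's (garbage-time-filtered) plays.

    Assumes hypothetical columns: ep_start and ep_end (expected points before
    and after each play) and a boolean success flag.
    """
    ep_change = plays["ep_end"] - plays["ep_start"]
    successful = plays["success"]
    # total change in expected points on successful plays, weighted by the
    # number of successful plays, divided by total plays run
    return (ep_change[successful].sum() * successful.sum()) / len(plays)
```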

This week, I took a first raw attempt at the data, and I want to unpack that here briefly. The first element I had to consider was the expected points charts I made; in calculating them, I realized I had inadvertently grouped plays too broadly - that is, using yard-line bins really mucked things up. I recalculated the expected points charts, and they look much nicer: expected points increase linearly the closer you are to the opponent’s end zone, and they decrease exponentially with down and distance.
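Roughly speaking, building a chart like this looks something like the sketch below - not necessarily exactly how I built mine. It averages the value of the next score for every exact down/distance/yard-line state rather than binning yard lines; next_points and the other column names are illustrative.

```python
import pandas as pd

# Sketch of building an expected-points lookup from play-by-play data.
# next_points is assumed to hold the value of the next score in the half
# from the offense's perspective (negative if the defense scores next).
def expected_points_chart(pbp: pd.DataFrame) -> pd.DataFrame:
    return (
        pbp.groupby(["down", "distance", "yardline"], as_index=False)["next_points"]
           .mean()
           .rename(columns={"next_points": "expected_points"})
    )
```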

I attached those down-and-distance expected point values to each individual play and to the result of the play, so every play now had a starting expected point value and an ending expected point value. I had to do some spot-checking to make sure these made sense. For example, in the Texas-West Virginia game, the Longhorns had the ball with 5:29 left in the third quarter, down 14-7. Facing first and goal from the five, a college football team in 2017 could expect to score about 4.29 points. Sam Ehlinger threw a spectacular interception returned for a touchdown, a swing of just over 10 points - from the expected plus-4.29 to the actual minus-6. Looks like my numbers for change in expected points make sense.
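Continuing the sketch above, attaching those values to each play might look something like this. The next_* columns describing the post-play state are hypothetical, and scoring plays (like that pick-six) would get the actual point value written into ep_end instead of a lookup.

```python
# Starting expected points: look up the pre-snap down/distance/yard line.
plays = pbp.merge(expected_points_chart(pbp),
                  on=["down", "distance", "yardline"], how="left")
plays = plays.rename(columns={"expected_points": "ep_start"})

# Ending expected points: look up the state the play produced. Plays that
# end in a score would have ep_end overwritten with the actual points
# (e.g. -6 for a pick-six).
end_chart = expected_points_chart(pbp).rename(columns={
    "down": "next_down", "distance": "next_distance",
    "yardline": "next_yardline", "expected_points": "ep_end"})
plays = plays.merge(end_chart,
                    on=["next_down", "next_distance", "next_yardline"],
                    how="left")

plays["ep_change"] = plays["ep_end"] - plays["ep_start"]
```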

From there, I plugged in the formula for wxSR and sorted the results. My goal for this post is to look at the top ten offenses and defenses according to this system and evaluate this first attempt, using offensive and defensive S&P+ as a baseline for comparison.

Top 10 Offenses by wxSR (offensive S&P+ rank):

  1. OU (1)
  2. Arizona (8)
  3. Missouri (13)
  4. UCF (2)
  5. Penn State (10)
  6. Buffalo (53)
  7. SMU (11)
  8. West Virginia (26)
  9. North Texas (28)
  10. USF (27)

Top 10 Defenses by wxSR (defensive S&P+ rank):

  1. Fresno St (13)
  2. MTSU (47)
  3. Alabama (1)
  4. Nevada (121)
  5. Duke (41)
  6. SDSU (38)
  7. Boise State (30)
  8. Ohio State (8)
  9. Tennessee (68)
  10. Notre Dame (27)

So these aren’t great, but they’re not horrific either - it’s at least somewhat encouraging that, generally, good teams are near the top. I would suggest that the degree of separation with this metric is so small that decimal places are creating some weirdness here - in fact, the top 50 defenses are all within a hundredth of a point of one another. There are a couple of genuine head-scratchers, though, and it’s hard to be proud of a metric that lauds the Nevada defense or the Buffalo offense.
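One quick, admittedly crude way to put a number on "not great, not horrific" would be a rank correlation between the two orderings - something like the snippet below, where the two rank lists are hypothetical inputs, not numbers I have computed.

```python
from scipy.stats import spearmanr

# wxsr_ranks and sp_plus_ranks would hold the wxSR and S&P+ ranks for the
# same set of teams, in the same order; these names are placeholders.
rho, p_value = spearmanr(wxsr_ranks, sp_plus_ranks)
print(f"Spearman rank correlation with S&P+: {rho:.2f}")
```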

A Couple of Theories About Why wxSR is Lacking

1. As one commenter suggested last week, garbage-time filters favor bad teams. By taking out those plays where one team is dominating, we are certainly filtering out some meaningless information, but perhaps we are also rewarding bad teams: they rack up fewer negative expected-point changes and are recorded as “less bad,” artificially inflating their numbers relative to teams that played more challenging schedules and ran more meaningful plays, even if those teams were successful.
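For reference, a common style of garbage-time filter looks something like the sketch below; the quarter-by-quarter thresholds here are illustrative, not necessarily the cutoffs I used.

```python
# Illustrative garbage-time cutoffs by quarter: once the lead exceeds the
# threshold, subsequent plays are dropped from the sample.
GARBAGE_TIME_THRESHOLDS = {1: 38, 2: 28, 3: 22, 4: 16}

def is_garbage_time(quarter: int, score_margin: int) -> bool:
    return abs(score_margin) > GARBAGE_TIME_THRESHOLDS.get(quarter, 16)
```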

2. Opponent adjustment: in this iteration, plays against FCS teams and plays against Alabama count exactly the same. Clearly, teams like Buffalo - who were decently successful given their standing in the college football landscape - aren’t pound-for-pound as good as Penn State, so some measure of opponent quality is important. My immediate thought is to take a team’s total wxSR and scale it by how much harder than average its schedule was - so a team that played a schedule 25% more efficient than average would receive a 25% boost in wxSR. Rough, but immediately a step in the right direction.
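That rough adjustment would amount to something like this, with schedule_strength standing in for whatever opponent-quality measure gets settled on, normalized so the national average is 1.0.

```python
def adjusted_wxsr(raw_wxsr: float, schedule_strength: float) -> float:
    # schedule_strength = 1.25 means a schedule 25% tougher than average,
    # which earns a 25% boost to the raw number.
    return raw_wxsr * schedule_strength
```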

3. Sample size: the numbers for expected points still look a little noisy, and that is to be expected. The weirdness of a single season carries too much weight, so I’ll have to compile the expected points data historically. That raises the question of how far back to go. Ideally, the numbers reflect the current recruiting classes and no more, so three years may be a starting point for smoothing them out. As you can see below, the double-U shape of the graph suggests that single-season noise is driving the expected points numbers.

[Figure: Expected Points by Yardline, 2017 - the double-U shape suggests the noisiness of single-season data.]

4. Finally, maybe efficiency doesn’t matter that much if you can’t convert scoring opportunities. Is efficiency a virtue in itself? If you can move the ball into opposing territory but can’t ever score, are you a good offense? The “bend-don’t-break” mentality of good defenses seems built to blunt efficiency measures. While efficiency and expected points can tell us a lot, perhaps the difference between expected points and actual points is the better direction to go.
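If that is the direction, the simplest version would be a per-drive "points above expectation": actual points scored on the drive minus the expected points at the drive’s starting field position. A hypothetical sketch, with placeholder column names:

```python
# drives is assumed to have one row per drive, with the offense, the expected
# points at the drive's first snap, and the points actually scored.
drives["points_above_expected"] = drives["actual_points"] - drives["ep_at_drive_start"]
finishing = (drives.groupby("offense")["points_above_expected"]
                   .mean()
                   .sort_values(ascending=False))
```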

I’ll keep tweaking and monitoring these numbers throughout the season, as I’m disappointed with the first run. As always, I’m open to suggestions and feedback. Most importantly, I’m just ready for the season to start. Next week, I’ll have a primer on watching games with an analytical mindset, and then the real fun of actual football on the field gets started.