Friday, December 19, 2008

Point Total Analysis with Singular Value Decomposition

by Oberon Faelord

Any football fan can tell you how well a team did in a game. If they scored more points than the other guys, they won. Great.

That's really only the surface though. There are at least three follow-up questions, which are all a lot harder:

1) Who really played better? Remember, football involves luck and circumstance. Did the winning team really "deserve" it? Did they really play better than their opponents?

2) Who will play better next week? Even things that aren't luck (say, causing a fumble) may not be a repeatable skill. Even worse, everything about a team's performance is in the context of their opponent. Just because a team was mediocre, it doesn't mean they won't be great next week (against substantially worse competition).

3) Why did a team win? Defense? Offense? Running game? Pass rush?

Questions 1 and 2 have been heavily researched, but question 3 has not. Why? Because it's really, really hard. This will be the very first step of an attempt to answer that question.

First off, the challenges. Any statistical analysis is only as good as its input data, and the input data for this analysis is going to be...bad. Really bad, in truth. The only input is the final scores from a season of games. No play-by-play, no yardages, no information about when a game happened.

Another challenge of the approach in this analysis is that it provides numbers, but no names. For example, Miami had an Offensive Feature 1 of 4.03 in 2006. What's "Offensive Feature 1"? Good question.

So, with those caveats, what is the approach of this analysis? It's based on Singular Value Decomposition. More specifically, it's based on the gradient descent SVD-alike solver described at http://sifter.org/~simon/Journal/20061211.html . The inputs are the scores for each game. The two sets are offenses and defenses (as opposed to movies and users, as used in that article). As an example, let's look at the 2007 numbers.

The gradient descent solver has two numbers that affect its accuracy. The first is the number of "features" being calculated. A feature is an aspect of an offense or defense, and every team gets both an offensive and a defensive value for each feature. Each score is calculated by going through each feature, multiplying the offense's value by the opposing defense's value, then adding all these products up. One effect of this is that high numbers are good for offenses, while low numbers are good for defenses. Another aspect of the algorithm is that each feature is less important than the previous one, so there are diminishing returns in raising the number of features.
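The multiply-and-add rule can be sketched in a few lines of Python. The function name predict_score and the feature values below are made up for illustration; they are not taken from the actual output.

```python
def predict_score(offense, defense):
    """Predicted points: sum over features of offense value * defense value."""
    return sum(o * d for o, d in zip(offense, defense))


# Hypothetical two-feature example (values invented for illustration).
offense = [5.0, 1.0]    # one offense's values for features 1 and 2
defense = [4.0, -2.0]   # the opposing defense's values for the same features

print(predict_score(offense, defense))  # 5.0*4.0 + 1.0*(-2.0) = 18.0
```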

The second important number is how many refinements are made to each feature before it's considered "done". Once again, this has diminishing returns, as the numbers converge over time. For this analysis, 25 features were used, and 2500 refinements were made to each.
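For readers who want to see the shape of the solver, here is a rough Python sketch in the spirit of the linked article: features are fitted one at a time, each getting a fixed number of refinement passes over the scores. This is not the code used for the numbers in this post. The function name, the learning rate, the starting value, and the use of mean absolute error as the "average amount off" are all assumptions.

```python
def train_features(games, n_teams, n_features=25, n_refinements=2500,
                   learning_rate=0.001, init_value=0.1):
    """Fit offense/defense feature values one feature at a time.

    games: list of (offense_team, defense_team, points) rows, two per game.
    learning_rate and init_value are assumed settings, not the post's.
    Returns (off, dfn, errors); errors[k] is the average absolute error
    per score once features 0..k have been fitted.
    """
    off = [[init_value] * n_features for _ in range(n_teams)]
    dfn = [[init_value] * n_features for _ in range(n_teams)]
    fitted = [0.0] * len(games)  # prediction from features already finished
    errors = []
    for f in range(n_features):
        for _ in range(n_refinements):
            for i, (o, d, points) in enumerate(games):
                err = points - (fitted[i] + off[o][f] * dfn[d][f])
                o_val = off[o][f]  # keep the old value for the defense update
                off[o][f] += learning_rate * err * dfn[d][f]
                dfn[d][f] += learning_rate * err * o_val
        # Freeze this feature and fold it into the running predictions.
        for i, (o, d, _) in enumerate(games):
            fitted[i] += off[o][f] * dfn[d][f]
        errors.append(sum(abs(p - fit) for (_, _, p), fit in zip(games, fitted))
                      / len(games))
    return off, dfn, errors


# Toy data, invented for illustration: three games, each contributing two
# rows. The scores are exactly offense * defense for hidden values
# [2, 3, 4] and [5, 6, 7], so a single feature can fit them.
toy_games = [(0, 1, 12.0), (0, 2, 14.0), (1, 0, 15.0),
             (1, 2, 21.0), (2, 0, 20.0), (2, 1, 24.0)]
off, dfn, errors = train_features(toy_games, n_teams=3, n_features=2,
                                  n_refinements=2000)
print(errors)  # average absolute error after feature 1, then after feature 2
```

Both knobs described above show up as parameters: n_features controls how many passes through the outer loop are made, and n_refinements controls how long each feature is polished before it is frozen.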

So what are the numbers? First, let's look at what adding additional features does. These are the average amount the algorithm is off on each score after each feature is added:

6.867
5.848
4.864
4.029
3.299
2.790
2.341
1.991
1.645
1.334
1.091
0.896
0.781
0.670
0.579
0.489
0.391
0.341
0.305
0.267
0.226
0.205
0.178
0.165
0.161

To put this in perspective, Feature 1 is accurate enough to get each score of each game within a touchdown. By the time the last feature is added, the formula is less than a 5th of a point off, on average. So what does this formula look like? Here are the first (and most important) 5 features (these alone will get within 3.3 points on average):

          Feature 1       Feature 2       Feature 3       Feature 4       Feature 5
Team     off     def     off     def     off     def     off     def     off     def
ARI     4.63    5.54   -0.02    0.53   -0.22   -0.79   -0.33    0.31    3.32   -2.14
ATL     3.75    5.26   -0.79    1.74    3.38   -1.56   -4.93    0.74   -0.05    0.82
BAL     3.64    5.28    2.38   -1.05   -0.98   -0.18   -0.82   -4.31   -1.41     0.3
BUF     3.05    4.22       0   -0.95    0.14   -2.82    -0.2    1.04   10.56    -0.7
CAR     3.74    4.48   -5.14   -1.13    0.55    1.35    2.54     1.1   -0.61   -2.25
CHI     4.69    4.75    0.19    5.53   -1.35   -0.81   -0.45   -0.04       1    0.71
CIN     5.34    6.13    6.98    0.19   -2.11    1.27   -0.01   -4.51    0.42    1.36
CLE     5.55    5.75    3.79    0.54    2.85    -0.7   -0.37   -2.62    4.79   -1.52
DAL     6.46    4.64    1.57   -0.86     0.5   -2.62   -0.55    0.15    1.01    0.69
DEN     4.26    5.45    2.16    0.23   -0.42   -6.25     3.2    0.22     1.2    -0.2
DET     5.38    5.84    0.62   -9.94   -1.62     1.9   -7.67   -0.55    0.28    1.78
GB      6.12    4.06     0.3    0.08    2.13    0.28   -1.46   -0.94    1.05   -0.05
HOU     4.63    4.89   -2.47    -0.9   -0.41    2.01    1.34    0.71    0.17   -0.51
IND     6.17    3.38   -1.43    1.18   -0.59    0.53   -0.08    0.67   -1.16    -1.2
JAX     5.92    4.14   -7.89    0.27    0.79   -4.17    0.18   -1.25    0.91    0.22
KC      3.08    3.74   -0.21    0.65    1.54    0.21   -1.53    -0.4    0.33    5.43
MIA     3.83    5.33   -0.02    1.81   -1.51    2.33   -1.56    6.18   -1.87   -0.33
MIN     4.93    4.18    1.04    1.09    2.04   -1.24    4.12     0.8   -1.59    2.41
NE      7.32    3.54   -1.87    0.14   -0.58    0.06    2.21   -9.34    0.35   -0.59
NO      4.87    5.88    0.42    1.65   -2.62     0.9   -7.26    1.45   -0.25    0.99
NYG     5.13    4.28    0.15    8.87   -4.26    2.44   -0.07    1.51   -3.11    0.74
NYJ     3.53    5.04    0.35    0.83    3.96   -2.75   -0.04   -0.99   -0.55   -0.25
OAK     3.95    5.25   -0.16   -1.51   -0.33    0.86    2.11   -0.69   -0.74     0.1
PHI     5.18    4.08   -1.54    0.17     1.4   -0.69   -0.93    0.16    3.48    0.29
PIT     4.75    3.86  -10.29   -0.02   -0.67    2.41   -0.13    4.74    0.69   -0.33
SD       5.5    3.62   -1.97       0    1.17     6.6    0.69   -0.63   -0.01   -3.48
SF      2.46    4.96    0.33   -1.45   -0.25   -0.01    0.99   -0.06    -1.3   -1.37
SS      5.02    4.92   -0.48    0.03   -8.71    3.72    0.16   -2.04    0.34   -0.62
STL     3.45    5.61    7.16   -1.29    1.12    0.08    2.03    0.09   -1.55    0.64
TB      3.91    4.13    0.15   -1.22   -2.07   -0.24    2.74    0.64   -0.26   -2.07
TEN     3.63    3.92    1.45    0.94    0.69   -0.21    3.62    0.62    1.24   11.37
WAS      4.4    4.23   -0.36  -10.47   -1.23   -0.28     0.5    1.98    4.09    0.09


That first column is the team. Each feature is numbered, with the offense/defense numbers paired off for each team. So Atlanta's Defensive Feature 3 is -1.56, for example.
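As a worked example of reading the table, the sketch below takes Arizona's offensive values and Atlanta's defensive values for the first five features and keeps a running total of the products. The variable names are illustrative; the numbers are copied from the ARI and ATL rows.

```python
# Offense values for ARI and defense values for ATL, features 1-5 (2007).
ari_offense = [4.63, -0.02, -0.22, -0.33, 3.32]
atl_defense = [5.26, 1.74, -1.56, 0.74, 0.82]

running_total = 0.0
for feature, (o, d) in enumerate(zip(ari_offense, atl_defense), start=1):
    running_total += o * d
    print(f"after feature {feature}: {running_total:.2f}")
```

With only these five features, the formula puts Arizona's offense at roughly 27.1 points against Atlanta's defense. The other 20 features would adjust that figure further.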

So is this a real pattern? Absolutely. With 25 features from 2002-2007, the average error per game-score is:

2002: 0.167
2003: 0.155
2004: 0.166
2005: 0.152
2006: 0.164
2007: 0.161

So, what's left to do?

1) Get better inputs. Final game scores are interesting, but who knows what factors affect them. This ties into the other questions in the introduction to this analysis: numbers that better explain what happened, and what *will* happen, would be more useful. However, because individual matchups are taken into account by this algorithm, it's key that each number be per-game, and not adjusted for opponent.

2) Use a better implementation of the algorithm. The NetFlix researcher suggests a number of refinements, the most obvious being cutting off out-of-range values and non-linear per-feature output.

3) Figure out what these features are. In theory, they should correlate with something real and observable about each team. It's likely that this would be more obvious with better numbers (produced by improvements 1 and 2).
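To make item 2 a little more concrete, here is one possible reading of "cutting off out-of-range values", sketched in Python: the running prediction is clamped to a valid range after each feature is added, so no feature can push a score below zero. The function name and the bounds are assumptions for illustration, and none of this is in the current implementation.

```python
def clipped_prediction(offense, defense, low=0.0, high=None):
    """Sum offense*defense per feature, clamping the total after each one.

    low=0.0 reflects that a team cannot score negative points; high is an
    optional, assumed upper bound.
    """
    total = 0.0
    for o, d in zip(offense, defense):
        total += o * d
        total = max(total, low)
        if high is not None:
            total = min(total, high)
    return total


# Hypothetical values: the first feature alone would predict -6 points,
# which is clamped to 0 before the second feature adds 4.
print(clipped_prediction([2.0, 1.0], [-3.0, 4.0]))  # 4.0
```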

Full inputs, outputs, and code available upon request.

2 comments:

Anonymous said...

I was fascinated by this, but it raised a bunch of questions. Is the probability distribution of football scores Gaussian? Should the calculated features actually correspond to any real-world statistics? (Could that be determined by regression analysis?) Isn't it probable that a given feature would actually be a blend of several real-world statistics? Are the "feature" values for a given team relatively consistent over the course of the season? In other words, if I calculate the value of Offensive Feature 1 for AZ for weeks 1-12 in 2007, will it be "reasonably close" to the value I'd get if I calculated it for weeks 1-14 in 2007?

TK said...

This is awesome! Would love to get the full inputs and outputs and also the code.
