by Andy Steiner
In this analysis I wanted to see if there was any relationship between simply choosing to pass and winning. There have been many similar studies before, but I wanted to look at it from a slightly different perspective. Brian Burke of this site has written the article “Offenses Run Too Often on 1st Down”, which looks at the passing advantage using an expected points (EP) analysis. There seems to be an advantage in simply choosing to pass, but there is some wiggle room in that analysis because we can’t prove that EP is a perfect measure of utility. In my opinion it is a very good measure of utility as used in that study, that is, in the first and third quarters and when the score is within 10 points (testing that would make an interesting study for another time, if it hasn’t already been done!). Here I wanted to look at the direct effect on winning. If passing more relates directly to winning more, then some common arguments like “controlling the clock”, “keeping the opposing quarterback off the field”, and “tiring out the defense” lose a lot of credibility.
Essentially, this study is based on Brian Burke’s passing Expected Points Added (EPA) studies; I wanted to see if the situations where there was a known delta between passing and running (where passing was better) would relate to teams actually winning.
All data is from the play-by-play data made public on this site. I used all plays from the 2002 through 2009 seasons during the 1st and 3rd quarters, when the score was within 10 points. I included all 1st and 10s (excluding all other first downs) and all 2nd downs from 2nd and 10 down to and including 2nd and 2. I did not include 2nd and 1 because, based on Brian Burke’s “Run Pass Imbalance on 2nd and 3rd Down” study, the run and the pass have equal EPA there. I wanted to see if teams were rewarded for choosing the higher-EPA decision, and I predicted that 2nd and 1 would simply add noise.
Normalizing for 2nd down distance
I don’t know if I “normalized” in the true statistical sense; I simply compared each team’s propensity to pass with the league average for that 2nd down and distance situation. This is based on the idea that good teams are in 2nd and short more often than bad teams. The article “Predictability on 2nd and 10”, from this site, made me realize that teams rush more in shorter distance situations. I have no idea if this is optimal or not (should teams rush to try to get the easy first down, or throw a bomb and try to catch the defense sleeping?). I therefore chose to focus the study on how a team performs relative to its peers. If you pass more than your peers would in a given situation, do you win more than your peers would?
PPAN (Passing Plays Above Normal)
I made a metric called Passing Plays Above Normal (PPAN), which counts how many more passing plays a team called than the league-average passing rate would predict for the situations it faced. For example, if a team was in a situation where the league passed 50% of the time, and the team faced that situation 10 times and chose to pass 9 times, the league rate predicts 5 passes, so the PPAN for that situation would be 9 - 5 = 4. This is then added to the PPAN for all the other down and distance combinations (including the 1st down set) to arrive at a total PPAN. The total is then divided by the team’s total plays in the data set to give a per-play PPAN fraction. Each team-season is treated separately, so every team counts as a new team each year.
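As a rough illustration of the PPAN arithmetic described above, here is a minimal Python sketch. The function name `ppan`, the `(situation, is_pass)` tuple layout, and the dictionary of league rates are my own assumptions for illustration, not the format of the actual play-by-play data.

```python
from collections import defaultdict

def ppan(team_plays, league_pass_rate):
    """Return (total PPAN, PPAN per play) for one team-season.

    team_plays: iterable of (situation, is_pass) pairs (hypothetical layout).
    league_pass_rate: dict mapping situation -> league-average pass rate.
    """
    counts = defaultdict(lambda: [0, 0])  # situation -> [passes, plays]
    for situation, is_pass in team_plays:
        counts[situation][0] += int(is_pass)
        counts[situation][1] += 1

    total_ppan = 0.0
    total_plays = 0
    for situation, (passes, plays) in counts.items():
        # Actual passes minus the passes the league rate would predict.
        total_ppan += passes - league_pass_rate[situation] * plays
        total_plays += plays

    fraction = total_ppan / total_plays if total_plays else 0.0
    return total_ppan, fraction

# The example from the text: 10 plays in a situation where the league
# passes 50% of the time, with 9 passes called.
situation = (2, 5)  # hypothetical label: 2nd down, 5 yards to go
plays = [(situation, True)] * 9 + [(situation, False)]
total, per_play = ppan(plays, {situation: 0.5})
print(total, per_play)  # 4.0 0.4
```

With only one situation in the sample, the per-play fraction is just 4 divided by 10; with real data the sum runs over every included down and distance.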
Determining the league average for first down was easy. I only included 1st and 10, so I simply added up the 1st and 10 passes in the data set and divided by the total number of 1st and 10 plays. This number came to 43.6%.
For 2nd down plays, I at first used Brian Burke’s “Predictability on 2nd and 10” values directly, but because I wasn’t sure of the exact data set those came from, I decided to reconstruct my own using the same data set used in this analysis. This is shown in the graph below.
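The league baselines described above can be sketched as follows. This assumes the plays have already been filtered to the 1st and 3rd quarters with the score within 10 points, and the `(down, distance, is_pass)` layout and the sample rows are made up for illustration.

```python
from collections import defaultdict

def league_pass_rates(plays):
    """Return {(down, distance): pass rate} for the situations in the study.

    plays: iterable of (down, distance, is_pass) tuples (hypothetical layout),
    assumed to be pre-filtered by quarter and score margin.
    """
    tallies = defaultdict(lambda: [0, 0])  # (down, distance) -> [passes, plays]
    for down, distance, is_pass in plays:
        # Keep only 1st and 10, and 2nd and 2 through 2nd and 10.
        in_scope = (down == 1 and distance == 10) or (
            down == 2 and 2 <= distance <= 10
        )
        if not in_scope:
            continue
        tallies[(down, distance)][0] += int(is_pass)
        tallies[(down, distance)][1] += 1
    return {key: passes / n for key, (passes, n) in tallies.items()}

# Made-up rows, not real data: the 2nd and 1 play is dropped by the filter.
sample = [(1, 10, True), (1, 10, False), (2, 1, True), (2, 7, True)]
rates = league_pass_rates(sample)
print(rates)
```

The resulting dictionary is the kind of lookup table a per-team PPAN calculation would compare against.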
When the PPAN and PPAN per play data are compiled for each team-year and plotted against that team’s regular season winning percentage, the results are as shown below:
The R^2 value is very low, but it did come out to be statistically significant as shown below:
The formula used to compute the t value from the correlation coefficient r and the number of team-years n is:
t = r*sqrt((n-2)/(1-r^2))
Comparing that t value to a Student’s t table for 254 degrees of freedom, the result is significant at well above the 99.9% confidence level.
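For readers who want to reproduce the significance check, here is a small Python sketch of the t formula above. The value of r is a placeholder I picked for illustration, since the actual correlation appears only in the figure; the sample size follows from 32 teams over 8 seasons, which matches the 254 degrees of freedom.

```python
import math

def t_from_r(r, n):
    """t statistic for a Pearson correlation r computed from n data points."""
    return r * math.sqrt((n - 2) / (1 - r * r))

n = 32 * 8            # 256 team-years (2002 through 2009)
df = n - 2            # 254 degrees of freedom
r_assumed = 0.25      # hypothetical value, NOT the study's actual r
t = t_from_r(r_assumed, n)
print(df, round(t, 2))
```

The resulting t is then compared against the critical value from a Student’s t table; for a two-sided test at the 99.9% level with this many degrees of freedom, that critical value is roughly 3.3.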
The slope of 67.759 says that for each additional 0.1 of PPAN fraction (passing 10 percentage points more often than the league average), a team can expect a season winning percentage about 6.8 points higher.
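Spelled out, the slope reading is just a multiplication. This assumes the regression’s dependent variable is winning percentage on a 0 to 100 scale; the symbol names are mine.

```latex
\Delta(\text{win \%}) = \text{slope} \times \Delta(\text{PPAN fraction})
                      = 67.759 \times 0.1
                      \approx 6.8 \text{ percentage points}
```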
Potential shortcomings that I could see:
Teams that are good at passing pass more. At best this study shows that teams that have good passing systems (quarterback, receivers, good passing routes and schemes, etc.) win more. It doesn’t really say that any given team should start passing more and expect to win more. I would be very interested if somebody could think up a way to show only the effect of choosing to pass, separated from the effect of being good at passing.
There is no normalization for plays near the goal line.
One potential concern I had was that the optimum PPAN fraction might be near the middle of the range, in which case a linear trend line would simply be connecting the endpoints. I tested this by fitting a quadratic trend line to the data as well. The parabola sloped upward over the entire range of the data. If it were the proper fit, this would suggest that teams near the low end of PPAN fraction get minimal return for increasing PPAN fraction, while teams at the high end get very high returns. I decided that the linear fit was the most accurate.
I was not able to figure out an easy way to weed out the playoffs, so each season includes 11 additional games played by 12 good teams. Any effect from different playcalling or skewed team matchups should be small, simply because there are only 11 such games each season. Note that the PPAN fraction includes the playoffs but the actual winning percentage does not.
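The quadratic trend line check mentioned among the shortcomings above can be sketched like this. It is a plain least-squares fit via the normal equations, not necessarily how the original trend line was produced, and the function names and data points are made up for illustration.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        rest = sum(m[row][k] * coef[k] for k in range(row + 1, 3))
        coef[row] = (m[row][3] - rest) / m[row][row]
    return tuple(coef)

def rises_over_range(a, b, lo, hi):
    """True if the slope 2*a*x + b is positive at both ends of [lo, hi].

    The slope is linear in x, so positive at both ends means positive throughout.
    """
    return 2 * a * lo + b > 0 and 2 * a * hi + b > 0

# Made-up points lying exactly on y = 40x^2 + 60x + 50, not real data.
xs = [-0.2, -0.1, 0.0, 0.1, 0.2]
ys = [40 * x * x + 60 * x + 50 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(rises_over_range(a, b, min(xs), max(xs)))
```

If the optimum were in the middle of the range, the fitted slope would change sign somewhere between the smallest and largest PPAN fraction, and the second function would return False.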
Thanks for reading!
Friday, October 1, 2010