Friday, January 6, 2012

NFL Coach Quality: A Bayesian Approach To Approximating the Value of Coaches - UPDATED

by David Durschlag

You are currently viewing version 1 of this article. To view version two, please click here.

Summary

Evaluating NFL coaches is a difficult task, popular among fans and vitally important to franchises. This is a brief attempt at the task, using purely quantitative data.

Data

The numbers of regular season games each team won each year are treated as data points. No information beyond number of regular season wins was used.

While the "metagame" of the NFL continues to evolve, the data used herein is from 1993 onward, when the last Collective Bargaining Agreement was signed. While victories now come in different environments, they are all under (roughly) the same rules. Data from before this period could be skewed based on the different rules for control of players, so it was excluded.

Also excluded was the performance of any team in a year in which it had multiple head coaches. This was done to ensure that credit for a season was easy to assign.

In total, 107 coaches and 565 team-years of data were used.

Assumptions

• Each coach has a hidden "value". Better coaches have higher value.
• The number of games a team wins can be modeled as a draw from a normal distribution with a mean of the value of their coach, and an unknown variance dubbed "Season Variance". This variance is constant across all seasons for all coaches.
• The value of coaches is normally distributed across the population, with unknown mean and variance.
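The generative process implied by these assumptions can be sketched directly in a few lines of simulation. The numeric hyperparameters below are illustrative placeholders, not the fitted results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters -- placeholders, not the fitted values.
pop_mean, pop_sd = 8.0, 1.3   # distribution of coach "value" across the league
season_sd = 2.7               # year-to-year noise around a coach's value

n_coaches, n_seasons = 107, 5

# Each coach draws a hidden value from the population distribution...
coach_value = rng.normal(pop_mean, pop_sd, size=n_coaches)

# ...and each season's win total is a noisy draw centered on that value.
season_wins = rng.normal(coach_value[:, None], season_sd,
                         size=(n_coaches, n_seasons))

print(season_wins.shape)  # (107, 5)
```

Fitting the model is then a matter of inverting this process: given only `season_wins`, infer `coach_value` and the hyperparameters.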

Process

The above assumptions were encoded as a model for BUGS, which was run for an initial 10,000 iterations and then a further 10,000.

Results

The coach quality estimates had converged after the first 10,000 iterations.

The posterior distributions for the system constants were as follows:

Constant | Mean | Standard Deviation
Season Variance | 0.137 | 0.009
Coach Value Population Mean | 7.698 | 0.191
Coach Value Population Variance | 0.626 | 0.207

The posterior distributions for coach values were as follows:

Coach | Mean Value | Standard Deviation of Value
Bill Belichick | 10.100 | 0.633
Tony Dungy | 9.948 | 0.671
Mike Tomlin | 9.490 | 0.920
Bill Cowher | 9.350 | 0.636
Mike McCarthy | 9.344 | 0.865
John Harbaugh | 9.303 | 0.972
Sean Payton | 9.245 | 0.852
Marty Schottenheimer | 9.209 | 0.686
Mike Smith | 9.197 | 0.970
Andy Reid | 9.180 | 0.656
Mike Holmgren | 9.107 | 0.606
Wade Phillips | 9.051 | 0.792
Mike Shanahan | 8.970 | 0.606
Barry Switzer | 8.833 | 0.957
Mike Martz | 8.763 | 0.863
Jimmy Johnson | 8.751 | 0.892
Mike Sherman | 8.746 | 0.856
Tom Coughlin | 8.626 | 0.608
Jeff Fisher | 8.578 | 0.604
Jack Pardee | 8.549 | 1.221
Dennis Green | 8.507 | 0.684
Brian Billick | 8.505 | 0.738
Lovie Smith | 8.466 | 0.777
George Seifert | 8.412 | 0.812
Marv Levy | 8.397 | 0.905
Rex Ryan | 8.380 | 1.020
Don Shula | 8.375 | 1.017
Bill Parcells | 8.372 | 0.694
Jon Gruden | 8.370 | 0.692
Steve Mariucci | 8.216 | 0.778
Bobby Ross | 8.148 | 0.772
Jim Caldwell | 8.105 | 1.009
Wayne Fontes | 8.094 | 0.945
Jim Fassel | 8.077 | 0.805
Brad Childress | 8.066 | 0.908
Dick Vermeil | 8.058 | 0.778
John Fox | 7.978 | 0.743
Pete Carroll | 7.967 | 0.935
Al Groh | 7.954 | 1.210
Ken Whisenhunt | 7.876 | 0.897
Mike Tice | 7.845 | 0.944
Jim Mora | 7.833 | 0.956
Gunther Cunningham | 7.801 | 1.104
Gary Kubiak | 7.768 | 0.849
Jason Garrett | 7.766 | 1.203
Jack Del Rio | 7.759 | 0.746
Hue Jackson | 7.745 | 1.201
Tony Sparano | 7.734 | 0.943
Jim L. Mora | 7.731 | 0.960
Dave Wannstedt | 7.722 | 0.698
Dan Reeves | 7.705 | 0.707
Norv Turner | 7.704 | 0.634
Marvin Lewis | 7.695 | 0.748
Nick Saban | 7.631 | 1.094
Bill Callahan | 7.629 | 1.099
Joe Gibbs | 7.616 | 0.946
Mike White | 7.607 | 1.102
Jim Haslett | 7.584 | 0.855
Ray Rhodes | 7.546 | 0.887
Mike Singletary | 7.479 | 1.092
Mike Mularkey | 7.467 | 1.103
Art Shell | 7.423 | 1.016
Todd Haley | 7.403 | 1.022
Jerry Glanville | 7.373 | 1.210
Chan Gailey | 7.358 | 0.968
Tom Cable | 7.311 | 1.119
Rich Brooks | 7.307 | 1.100
Josh McDaniels | 7.155 | 1.102
Dick Jauron | 7.151 | 0.752
Tom Flores | 7.143 | 1.092
Buddy Ryan | 7.142 | 1.096
Steve Spurrier | 7.141 | 1.099
Lindy Infante | 7.139 | 1.119
Jim Zorn | 7.135 | 1.094
Eric Mangini | 7.124 | 0.906
Vince Tobin | 7.107 | 0.957
June Jones | 7.104 | 1.022
Dennis Erickson | 7.098 | 0.864
Herman Edwards | 7.090 | 0.787
Butch Davis | 7.002 | 0.947
Sam Wyche | 6.997 | 1.030
Scott Linehan | 6.983 | 1.104
Kevin Gilbride | 6.977 | 1.232
Jim E. Mora | 6.975 | 0.958
Richie Petitbon | 6.975 | 1.231
Joe Bugel | 6.967 | 1.113
Lane Kiffin | 6.962 | 1.231
Romeo Crennel | 6.870 | 0.957
Gregg Williams | 6.867 | 1.033
Raheem Morris | 6.864 | 1.035
Ted Marchibroda | 6.793 | 0.861
Mike Nolan | 6.722 | 1.025
Dave McGinnis | 6.717 | 1.030
Chuck Knox | 6.654 | 1.123
Bruce Coslet | 6.617 | 0.972
Dom Capers | 6.597 | 0.789
Mike Ditka | 6.573 | 1.050
Dave Campo | 6.569 | 1.032
Dick LeBeau | 6.496 | 1.124
Mike Riley | 6.443 | 1.052
Cam Cameron | 6.366 | 1.263
Dave Shula | 6.311 | 1.041
Rich Kotite | 6.263 | 0.995
Chris Palmer | 6.011 | 1.162
Marty Mornhinweg | 6.009 | 1.164
Steve Spagnuolo | 5.877 | 1.072
Rod Marinelli | 5.864 | 1.071

Conclusions and Analysis

Despite the small sample size and extremely limited data, the estimated values of coaches agree closely with conventional wisdom. Value estimates are relatively narrow, ranging from a roughly 68% chance (one standard deviation) that Jeff Fisher's value is between 7.974 and 9.182 to a roughly 68% chance that Cam Cameron's value is between 5.103 and 7.629. Applying Microsoft's μ - 3 * σ method of collapsing the parameters of a normal distribution to a single conservative rating (as used in TrueSkill), then renormalizing (so that the unit of value is approximately the win), the following list is obtained:
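The article does not spell out the renormalization step, so the sketch below makes an assumption: compute the conservative μ - 3σ score for each coach, then linearly rescale those scores to the mean and spread of the posterior means. With only the three example coaches from above, the resulting numbers will not match the full table, but the mechanics are the same:

```python
# Conservative TrueSkill-style rating: mu - 3*sigma, then linearly rescaled
# to the scale of the posterior means. The exact renormalization used in the
# article is not stated; this is one plausible reading.
coaches = {
    "Bill Belichick": (10.100, 0.633),
    "Jeff Fisher": (8.578, 0.604),
    "Cam Cameron": (6.366, 1.263),
}

means = [mu for mu, sigma in coaches.values()]
conservative = {name: mu - 3 * sigma for name, (mu, sigma) in coaches.items()}

def rescale(scores, target):
    """Affine map sending scores to the mean/spread of target."""
    s_mean = sum(scores) / len(scores)
    t_mean = sum(target) / len(target)
    s_sd = (sum((x - s_mean) ** 2 for x in scores) / len(scores)) ** 0.5
    t_sd = (sum((x - t_mean) ** 2 for x in target) / len(target)) ** 0.5
    return lambda x: t_mean + (x - s_mean) * t_sd / s_sd

f = rescale(list(conservative.values()), means)
normalized = {name: round(f(v), 3) for name, v in conservative.items()}
```

Because the map is affine with a positive slope, it preserves the μ - 3σ ordering; it only moves the numbers back onto a wins-like scale.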

Coach | Normalized Combined Rating
Bill Belichick | 10.060
Tony Dungy | 9.870
Bill Cowher | 9.520
Mike Holmgren | 9.410
Andy Reid | 9.360
Mike Shanahan | 9.310
Marty Schottenheimer | 9.310
Tom Coughlin | 9.070
Jeff Fisher | 9.040
Mike McCarthy | 9.030
Mike Tomlin | 9.010
Sean Payton | 8.980
Wade Phillips | 8.970
Dennis Green | 8.820
John Harbaugh | 8.770
Jon Gruden | 8.700
Bill Parcells | 8.700
Brian Billick | 8.700
Mike Smith | 8.700
Mike Sherman | 8.620
Mike Martz | 8.620
Lovie Smith | 8.590
Jimmy Johnson | 8.550
George Seifert | 8.480
Barry Switzer | 8.470
Steve Mariucci | 8.410
Bobby Ross | 8.380
Norv Turner | 8.350
John Fox | 8.320
Dick Vermeil | 8.300
Marv Levy | 8.270
Jim Fassel | 8.260
Dave Wannstedt | 8.230
Dan Reeves | 8.200
Jack Del Rio | 8.150
Marvin Lewis | 8.110
Brad Childress | 8.030
Don Shula | 8.010
Rex Ryan | 8.010
Wayne Fontes | 7.970
Gary Kubiak | 7.940
Ken Whisenhunt | 7.920
Pete Carroll | 7.900
Jim Caldwell | 7.840
Jim Haslett | 7.800
Mike Tice | 7.790
Jim Mora | 7.760
Tony Sparano | 7.720
Dick Jauron | 7.710
Ray Rhodes | 7.700
Jack Pardee | 7.700
Jim L. Mora | 7.680
Joe Gibbs | 7.630
Herman Edwards | 7.590
Dennis Erickson | 7.430
Gunther Cunningham | 7.420
Chan Gailey | 7.390
Eric Mangini | 7.360
Art Shell | 7.340
Nick Saban | 7.320
Todd Haley | 7.310
Bill Callahan | 7.310
Al Groh | 7.300
Mike White | 7.290
Vince Tobin | 7.240
Dom Capers | 7.240
Ted Marchibroda | 7.220
Mike Singletary | 7.220
Butch Davis | 7.190
Mike Mularkey | 7.190
Jason Garrett | 7.180
Hue Jackson | 7.170
Jim E. Mora | 7.150
June Jones | 7.100
Rich Brooks | 7.080
Romeo Crennel | 7.070
Tom Cable | 7.040
Sam Wyche | 7.010
Tom Flores | 6.980
Buddy Ryan | 6.970
Jim Zorn | 6.970
Josh McDaniels | 6.970
Steve Spurrier | 6.960
Lindy Infante | 6.920
Gregg Williams | 6.910
Raheem Morris | 6.900
Jerry Glanville | 6.890
Bruce Coslet | 6.860
Scott Linehan | 6.840
Mike Nolan | 6.820
Joe Bugel | 6.810
Dave McGinnis | 6.810
Dave Campo | 6.700
Mike Ditka | 6.660
Mike Riley | 6.570
Chuck Knox | 6.560
Richie Petitbon | 6.560
Kevin Gilbride | 6.560
Rich Kotite | 6.560
Lane Kiffin | 6.550
Dave Shula | 6.500
Dick LeBeau | 6.450
Steve Spagnuolo | 6.120
Rod Marinelli | 6.110
Cam Cameron | 6.060
Chris Palmer | 6.020
Marty Mornhinweg | 6.020

The top two coaches both spent much of their careers with transcendent quarterbacks. With the firings of Steve Spagnuolo and Raheem Morris, Romeo Crennel is now the lowest-ranked active head coach. If Bill Cowher says he'd like to get back into coaching and asks whether your organization is hiring, you say "yes." A similar model based on points for and against might not only produce slightly more accurate estimates, but could also reveal whether head coaches can be "defense-oriented" or "offense-oriented" and, if so, which ones are which.

Summary

Evaluating NFL coaches is a difficult task, popular among fans and vitally important to franchises. This is a brief attempt at the task, using purely quantitative data. The goal here is twofold: to increase our understanding of how (and how much) NFL teams are reflective of their coaches, and to introduce the dichotomy of frequentist and Bayesian analysis to the NFL statistics community, from which it is largely absent.

Data

The numbers of regular season games each team won each year are treated as data points. No information beyond number of regular season wins was used.

While the "metagame" of the NFL continues to evolve, the data used herein is from 1993 onward, when the last Collective Bargaining Agreement was signed. While victories now come in different environments, they are all under (roughly) the same rules. Data from before this period could be skewed based on the different rules for control of players, so it was excluded.

Also excluded was the performance of any team in a year in which it had multiple head coaches. This was done to ensure that credit for a season was easy to assign.

In total, 107 coaches and 565 team-years of data were used.

Assumptions

• Each coach has a hidden "value". Better coaches have higher value.
• The number of games a team wins can be modeled as a draw from a normal distribution with a mean of the value of their coach, and an unknown standard deviation which is constant across all seasons for all coaches. This will be referred to as the "Season Standard Deviation."
• The value of coaches is normally distributed across the population, with unknown mean and standard deviation. These will be referred to as the "Population Mean" and "Population Standard Deviation."

Process

The above assumptions were encoded as a model for BUGS, which was run for an initial 10,000 iterations and then a further 10,000.
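The assumptions translate into only a few lines of BUGS. The exact code used for the article is not shown, so the following is a reconstruction; in particular, the vague priors at the bottom are assumptions. Note that BUGS parameterizes dnorm by precision, the reciprocal of the variance:

```
model {
  for (i in 1:N) {
    # Season win totals: normal around the coach's hidden value.
    wins[i] ~ dnorm(value[coach[i]], season.tau)
  }
  for (j in 1:C) {
    # Coach values: normal across the population.
    value[j] ~ dnorm(pop.mean, pop.tau)
  }

  # Vague priors (assumed -- not specified in the article).
  pop.mean ~ dnorm(0, 1.0E-6)
  pop.tau ~ dgamma(0.001, 0.001)
  season.tau ~ dgamma(0.001, 0.001)

  # Report standard deviations rather than precisions.
  pop.sd <- 1 / sqrt(pop.tau)
  season.sd <- 1 / sqrt(season.tau)
}
```

Here `wins` is the vector of team-year win totals, `coach` maps each team-year to its coach's index, and `N` and `C` are the counts of team-years and coaches (565 and 107 in this data set).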

Results

The coach quality estimates had converged after the first 10,000 iterations.

The posterior distributions for the system constants were as follows. Posterior information for individual coaches is included in the large table in the next section.

Constant | Mean | Standard Deviation
Season Standard Deviation | 2.70 | 0.09
Population Mean | 7.69 | 0.19
Population Standard Deviation | 1.33 | 0.18

Conclusions and Analysis

Below is a table with a variety of data about each coach.

• Value Posterior Mean: The mean of the posterior distribution for the coach's value.
• Value Posterior StdDev: The standard deviation of the posterior distribution for the coach's value.
• Triple-Conservative Rating: μ - 3 * σ -- a conservative single-number rating. Microsoft uses this to collapse TrueSkill distributions to single ratings.
• Normalized Triple-Conservative Rating: The triple-conservative rating re-normalized to the same scale as posterior mean coach value.
• Raw Average Wins: The coach's raw average number of wins per season over the data set. Provided for comparison with the posterior means -- this is one way to gauge the benefit of the Bayesian model over a standard frequentist approach. Because the Bayesian model incorporates the idea of uncertainty, a coach with one season of 14 wins is not considered a 14-win coach. Sorting alternately by this column and by the posterior mean, then judging which list looks better, is a reasonable shortcut for evaluating the modeling approach taken here.
• 80% Confidence Range: The range of win totals the model expects with 80% confidence. In other words, the coach would be expected to win fewer games than the minimum 10% of the time, and more than the maximum 10% of the time. This can be used to gauge the amount of information added by the model -- the narrower these ranges are, the more information was available for judging that coach. The fact that these tend to be very wide indicates that the model cannot make strong predictions -- a result of both the minimal amount of data available and the unpredictability of the NFL. Taking all coaches together as a single data set yields an 80% confidence range of 4.27-11.91. Comparing this to the ranges for individual coaches is a good way to see how much information the model was able to add. In general, you'll find that the range has narrowed only very slightly, but has shifted by a win or two. This can be translated roughly as "the uncertainty created by minimal data and the general difficulty of predicting the NFL means it is hard to predict how many games an individual coach's team will win, but we have a pretty good idea who the 'better' coaches are."
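The shrinkage described above -- one 14-win season does not make a 14-win coach -- can be illustrated with the standard normal-normal conjugate update, holding the hyperparameters fixed at their posterior means from the Results table. This ignores hyperparameter uncertainty, so it only approximates the full model's output:

```python
# Conjugate normal-normal update for a single coach's value, conditioning on
# point estimates of the hyperparameters (an approximation -- the full model
# also integrates over uncertainty in these).
pop_mean, pop_sd = 7.69, 1.33   # Population Mean / Standard Deviation
season_sd = 2.70                # Season Standard Deviation

def posterior_value(wins, n_seasons):
    """Posterior mean/sd of a coach's value after n seasons averaging `wins`."""
    prior_prec = 1 / pop_sd**2
    data_prec = n_seasons / season_sd**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * pop_mean + data_prec * wins)
    return post_mean, post_var**0.5

# One 9-win season (Al Groh's record) lands near the table's 7.95, pulled
# well back toward the population mean.
mu, sd = posterior_value(wins=9, n_seasons=1)

# A single 14-win season shrinks to roughly 8.9, not 14.
mu14, _ = posterior_value(wins=14, n_seasons=1)
```

As more seasons accumulate, the data precision term grows and the posterior mean moves from the population mean toward the coach's raw average, which is exactly the pattern visible in the table below.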
Coach | Value Posterior Mean | Value Posterior StdDev | Triple-Conservative Rating | Normalized Triple-Conservative Rating | Raw Average Wins | 80% Confidence Range
Al Groh | 7.95 | 1.21 | 4.32 | 7.30 | 9 | 4.03-11.87
Andy Reid | 9.18 | 0.65 | 7.21 | 9.36 | 9.69 | 5.81-12.54
Art Shell | 7.42 | 1.01 | 4.37 | 7.34 | 7 | 3.70-11.14
Barry Switzer | 8.83 | 0.95 | 5.96 | 8.47 | 10 | 5.17-12.49
Bill Belichick | 10.1 | 0.63 | 8.20 | 10.06 | 10.8 | 6.76-13.43
Bill Callahan | 7.62 | 1.09 | 4.33 | 7.31 | 7.5 | 3.82-11.43
Bill Cowher | 9.35 | 0.63 | 7.44 | 9.52 | 9.85 | 6.00-12.69
Bill Parcells | 8.37 | 0.69 | 6.29 | 8.70 | 8.63 | 4.97-11.77
Bobby Ross | 8.14 | 0.77 | 5.83 | 8.38 | 8.37 | 4.67-11.62
Brad Childress | 8.06 | 0.90 | 5.34 | 8.03 | 8.4 | 4.45-11.67
Brian Billick | 8.50 | 0.73 | 6.29 | 8.70 | 8.88 | 5.06-11.94
Bruce Coslet | 6.61 | 0.97 | 3.70 | 6.86 | 5.5 | 2.93-10.29
Buddy Ryan | 7.14 | 1.09 | 3.85 | 6.97 | 6 | 3.34-10.94
Butch Davis | 7.00 | 0.94 | 4.16 | 7.19 | 6.25 | 3.34-10.65
Cam Cameron | 6.36 | 1.26 | 2.57 | 6.06 | 1 | 2.39-10.33
Chan Gailey | 7.35 | 0.96 | 4.45 | 7.39 | 7 | 3.68-11.03
Chris Palmer | 6.01 | 1.16 | 2.52 | 6.02 | 2.5 | 2.14-9.87
Chuck Knox | 6.65 | 1.12 | 3.28 | 6.56 | 4.5 | 2.82-10.48
Dan Reeves | 7.70 | 0.70 | 5.58 | 8.20 | 7.7 | 4.29-11.11
Dave Campo | 6.56 | 1.03 | 3.47 | 6.70 | 5 | 2.83-10.30
Dave McGinnis | 6.71 | 1.03 | 3.62 | 6.81 | 5.33 | 2.98-10.45
Dave Shula | 6.31 | 1.04 | 3.18 | 6.50 | 4.33 | 2.56-10.05
Dave Wannstedt | 7.72 | 0.69 | 5.62 | 8.23 | 7.72 | 4.31-11.12
Dennis Erickson | 7.09 | 0.86 | 4.50 | 7.43 | 6.66 | 3.52-10.66
Dennis Green | 8.50 | 0.68 | 6.45 | 8.82 | 8.81 | 5.11-11.89
Dick Jauron | 7.15 | 0.75 | 4.89 | 7.71 | 6.88 | 3.69-10.60
Dick LeBeau | 6.49 | 1.12 | 3.12 | 6.45 | 4 | 2.66-10.32
Dick Vermeil | 8.05 | 0.77 | 5.72 | 8.30 | 8.25 | 4.57-11.54
Dom Capers | 6.59 | 0.78 | 4.23 | 7.24 | 6 | 3.10-10.09
Don Shula | 8.37 | 1.01 | 5.32 | 8.01 | 9.33 | 4.65-12.09
Eric Mangini | 7.12 | 0.90 | 4.40 | 7.36 | 6.6 | 3.51-10.73
Gary Kubiak | 7.76 | 0.84 | 5.22 | 7.94 | 7.83 | 4.21-11.32
George Seifert | 8.41 | 0.81 | 5.97 | 8.48 | 8.85 | 4.89-11.93
Gregg Williams | 6.86 | 1.03 | 3.76 | 6.91 | 5.66 | 3.12-10.60
Gunther Cunningham | 7.80 | 1.10 | 4.48 | 7.42 | 8 | 3.99-11.61
Herman Edwards | 7.09 | 0.78 | 4.72 | 7.59 | 6.75 | 3.59-10.58
Hue Jackson | 7.74 | 1.20 | 4.14 | 7.17 | 8 | 3.83-11.65
Jack Del Rio | 7.75 | 0.74 | 5.52 | 8.15 | 7.77 | 4.30-11.21
Jack Pardee | 8.54 | 1.22 | 4.88 | 7.70 | 12 | 4.62-12.47
Jason Garrett | 7.76 | 1.20 | 4.15 | 7.18 | 8 | 3.85-11.67
Jeff Fisher | 8.57 | 0.60 | 6.76 | 9.04 | 8.81 | 5.26-11.88
Jerry Glanville | 7.37 | 1.21 | 3.74 | 6.89 | 6 | 3.45-11.28
Jim Caldwell | 8.10 | 1.00 | 5.07 | 7.84 | 8.66 | 4.39-11.82
Jim E. Mora | 6.97 | 0.95 | 4.10 | 7.15 | 6.25 | 3.31-10.63
Jim Fassel | 8.07 | 0.80 | 5.66 | 8.26 | 8.28 | 4.56-11.58
Jim Haslett | 7.58 | 0.85 | 5.01 | 7.80 | 7.5 | 4.02-11.14
Jim L. Mora | 7.73 | 0.96 | 4.85 | 7.68 | 7.75 | 4.06-11.39
Jim Mora | 7.83 | 0.95 | 4.96 | 7.76 | 8 | 4.17-11.49
Jim Zorn | 7.13 | 1.09 | 3.85 | 6.97 | 6 | 3.33-10.93
Jimmy Johnson | 8.75 | 0.89 | 6.07 | 8.55 | 9.6 | 5.15-12.34
Joe Bugel | 6.96 | 1.11 | 3.62 | 6.81 | 5.5 | 3.14-10.78
Joe Gibbs | 7.61 | 0.94 | 4.77 | 7.63 | 7.5 | 3.96-11.26
John Fox | 7.97 | 0.74 | 5.74 | 8.32 | 8.11 | 4.52-11.42
John Harbaugh | 9.30 | 0.97 | 6.38 | 8.77 | 11 | 5.62-12.98
Jon Gruden | 8.37 | 0.69 | 6.29 | 8.70 | 8.63 | 4.97-11.76
Josh McDaniels | 7.15 | 1.10 | 3.84 | 6.97 | 6 | 3.34-10.96
June Jones | 7.10 | 1.02 | 4.03 | 7.10 | 6.33 | 3.37-10.83
Ken Whisenhunt | 7.87 | 0.89 | 5.18 | 7.92 | 8 | 4.27-11.47
Kevin Gilbride | 6.97 | 1.23 | 3.28 | 6.56 | 4 | 3.03-10.91
Lane Kiffin | 6.96 | 1.23 | 3.26 | 6.55 | 4 | 3.02-10.89
Lindy Infante | 7.13 | 1.11 | 3.78 | 6.92 | 6 | 3.31-10.96
Lovie Smith | 8.46 | 0.77 | 6.13 | 8.59 | 8.87 | 4.98-11.94
Marty Mornhinweg | 6.00 | 1.16 | 2.51 | 6.02 | 2.5 | 2.13-9.87
Marty Schottenheimer | 9.20 | 0.68 | 7.15 | 9.31 | 9.75 | 5.81-12.60
Marv Levy | 8.39 | 0.90 | 5.68 | 8.27 | 9 | 4.78-12.00
Marvin Lewis | 7.69 | 0.74 | 5.45 | 8.11 | 7.66 | 4.24-11.14
Mike Ditka | 6.57 | 1.05 | 3.42 | 6.66 | 5 | 2.81-10.32
Mike Holmgren | 9.10 | 0.60 | 7.29 | 9.41 | 9.5 | 5.79-12.41
Mike Martz | 8.76 | 0.86 | 6.17 | 8.62 | 9.5 | 5.19-12.33
Mike McCarthy | 9.34 | 0.86 | 6.74 | 9.03 | 10.5 | 5.77-12.91
Mike Mularkey | 7.46 | 1.10 | 4.15 | 7.19 | 7 | 3.65-11.27
Mike Nolan | 6.72 | 1.02 | 3.64 | 6.82 | 5.33 | 2.99-10.45
Mike Riley | 6.44 | 1.05 | 3.28 | 6.57 | 4.66 | 2.68-10.20
Mike Shanahan | 8.97 | 0.60 | 7.15 | 9.31 | 9.31 | 5.65-12.28
Mike Sherman | 8.74 | 0.85 | 6.17 | 8.62 | 9.5 | 5.18-12.30
Mike Singletary | 7.47 | 1.09 | 4.20 | 7.22 | 7 | 3.68-11.27
Mike Smith | 9.19 | 0.96 | 6.28 | 8.70 | 10.75 | 5.52-12.87
Mike Tice | 7.84 | 0.94 | 5.01 | 7.79 | 8 | 4.19-11.49
Mike Tomlin | 9.49 | 0.92 | 6.72 | 9.01 | 11 | 5.86-13.11
Mike White | 7.60 | 1.10 | 4.30 | 7.29 | 7.5 | 3.79-11.41
Nick Saban | 7.63 | 1.09 | 4.34 | 7.32 | 7.5 | 3.83-11.43
Norv Turner | 7.70 | 0.63 | 5.80 | 8.35 | 7.71 | 4.36-11.04
Pete Carroll | 7.96 | 0.93 | 5.16 | 7.90 | 8.25 | 4.32-11.60
Raheem Morris | 6.86 | 1.03 | 3.75 | 6.90 | 5.66 | 3.12-10.60
Ray Rhodes | 7.54 | 0.88 | 4.88 | 7.70 | 7.4 | 3.95-11.13
Rex Ryan | 8.38 | 1.02 | 5.32 | 8.01 | 9.33 | 4.65-12.10
Rich Brooks | 7.30 | 1.1 | 4.00 | 7.08 | 6.5 | 3.50-11.11
Rich Kotite | 6.26 | 0.99 | 3.27 | 6.56 | 4.75 | 2.56-9.96
Richie Petitbon | 6.97 | 1.23 | 3.28 | 6.56 | 4 | 3.03-10.91
Rod Marinelli | 5.86 | 1.07 | 2.65 | 6.11 | 3.33 | 2.08-9.64
Romeo Crennel | 6.87 | 0.95 | 3.99 | 7.07 | 6 | 3.20-10.53
Sam Wyche | 6.99 | 1.03 | 3.90 | 7.01 | 6 | 3.26-10.73
Scott Linehan | 6.98 | 1.10 | 3.67 | 6.84 | 5.5 | 3.17-10.79
Sean Payton | 9.24 | 0.85 | 6.68 | 8.98 | 10.33 | 5.68-12.80
Steve Mariucci | 8.21 | 0.77 | 5.88 | 8.41 | 8.5 | 4.73-11.69
Steve Spagnuolo | 5.87 | 1.07 | 2.66 | 6.12 | 3.33 | 2.09-9.65
Steve Spurrier | 7.14 | 1.09 | 3.84 | 6.96 | 6 | 3.33-10.94
Ted Marchibroda | 6.79 | 0.86 | 4.21 | 7.22 | 6.16 | 3.22-10.35
Todd Haley | 7.40 | 1.02 | 4.33 | 7.31 | 7 | 3.67-11.13
Tom Cable | 7.31 | 1.11 | 3.95 | 7.04 | 6.5 | 3.48-11.13
Tom Coughlin | 8.62 | 0.60 | 6.80 | 9.07 | 8.87 | 5.31-11.93
Tom Flores | 7.14 | 1.09 | 3.86 | 6.98 | 6 | 3.34-10.94
Tony Dungy | 9.94 | 0.67 | 7.93 | 9.87 | 10.69 | 6.57-13.32
Tony Sparano | 7.73 | 0.94 | 4.90 | 7.72 | 7.75 | 4.08-11.38
Vince Tobin | 7.10 | 0.95 | 4.23 | 7.24 | 6.5 | 3.44-10.77
Wade Phillips | 9.05 | 0.79 | 6.67 | 8.97 | 9.75 | 5.55-12.54
Wayne Fontes | 8.09 | 0.94 | 5.25 | 7.97 | 8.5 | 4.44-11.74
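The 80% ranges in the table come from the model's posterior samples, but they can be approximated by folding a coach's value uncertainty and the season noise into a single predictive normal and taking its quantiles. The sketch below does this for Al Groh; it lands close to, but not exactly on, the sampled range in the table:

```python
from statistics import NormalDist

season_sd = 2.70  # Season Standard Deviation posterior mean

def approx_80pct_range(value_mean, value_sd):
    """Approximate 80% predictive range for a single season's win total."""
    # Predictive spread: value uncertainty plus season-to-season noise.
    pred_sd = (value_sd**2 + season_sd**2) ** 0.5
    z = NormalDist().inv_cdf(0.90)  # ~1.2816 for a central 80% interval
    return value_mean - z * pred_sd, value_mean + z * pred_sd

# Al Groh: posterior 7.95 +/- 1.21 gives roughly 4.2 to 11.7 wins, versus
# the table's sampled range of 4.03-11.87.
lo, hi = approx_80pct_range(7.95, 1.21)
```

The small discrepancy is expected: the plug-in normal ignores the correlation and non-normality present in the actual posterior samples.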

One thing to note about this information is that some of it points to possible violations of the model's assumptions. The top two coaches both spent much of their careers with transcendent quarterbacks, for example. That's OK -- the model will simply be off for coaches for whom the assumptions don't hold as well. This is not about finding an exact cause-and-effect relationship; it's about finding general correlative information, which means that some coaches will be easier to predict than others.

11 comments:

Anonymous said...

How is this a measure of coaching value? It's a (crude) measure of the quality of teams a given coach has had. The main assumption you are making is that all teams have the same skill level -- clearly false. If you want to attempt to measure the value of a coach, you have to do some kind of analysis of how the team performed just before and just after his tenure. Even that would be pretty severely biased by unaccounted-for personnel changes.

Anonymous said...

How come Norv Turner is 28th out of 107??? AFAIR, the guys from FO did some work before the 2008 season; there he was the worst coach in the history of the NFL (rightfully chosen, b/c even w/o numbers the human eye can witness this every given Sunday :-)). He lost the most games when entering the 4th Qtr and he had the fewest 4th Qtr comebacks. I doubt he improved in the last few years, b/c he is the same timid coach he always was. And we shall remember he inherited a 14-2 team which could have won some Super Bowls (I totally agree with Ryan here). Otherwise, I appreciate your work.

Karl, Germany

Boston Chris said...

Tony Dungy and his ultra-conservatism #2? I think what this ended up measuring is which teams won the most. Although I really appreciate the effort.

David Durschlag said...

I'll address the first response later, as it requires a somewhat more complex explanation, but the second is simple -- it's a matter of assumptions. The model assumes the difference between a coach's win count for a season and their actual skill or value can be modeled by Gaussian noise. If you believe the majority of Turner's career took place starting from a dominant team with a brilliant GM, which he gradually destroyed, well, that's not Gaussian noise. In other words, it's the same argument as "Belichick is only the top coach because he got lucky on Tom Brady" -- it may be true, but the model doesn't incorporate that information.

Also note that this is an attempt to evaluate a head coach in a way that considers team composition as part of a head coach's responsibility (which it is -- free agency, the draft, and trades are usually controlled by a team's head coach). I suspect the FO study was an attempt to measure coach value independent of player value -- what a coach actually does in terms of clock management, play design, etc. to take the same group of guys and make them better or worse.

To summarize -- the Chargers had a lot of Pro Bowlers under Norv. If you think he had nothing to do with that, then you will disagree with his ranking in the model. You may be right -- it is, after all, just a model.

David Durschlag said...

Boston Chris -- you are correct. This is largely based on which teams won the most. The idea is to attribute those wins to their head coach, since he has final say over the teams. I'll write more about how much information this adds (i.e. how strong predictions based on these rankings are) some point soon. The fact, however, that the model does not measure the things that cause a team to win does not mean it cannot predict which teams (i.e. coaches) will win in the future.

As to Tony Dungy himself -- one could argue as to whether the model applies well to him. To be honest, I think it does. His continuous success with both the Bucs and the Colts may have had extenuating circumstances, but all success involves a bit of luck. That he did it so consistently for so many years indicates that, given the opportunity, he'd probably do it again -- and that is what I set out to measure.

Anonymous said...

Yes, the Chargers had many Pro Bowlers, which Turner inherited from Schott (another timid coach, who not coincidentally never won a playoff game, but at least his teams were ready by September). After all, as you said, Turner "had nothing to do with that". All he did was waste the most talented team of the late 2000s.

No, Beli-Cheat, as we readers all know, is a great coach (from a pure football standpoint). Year in, year out he has more wins than expected by Brian's and other great models.

But as i said before, i still appreciate your good work.

Karl, Germany

Anonymous said...

Yes, assuming that a head coach is mostly responsible for drafting/trading/signing is a very poor assumption. Without this assumption the entire article becomes misleading at best.

David Durschlag said...

I hope that version 2 of the article answers some of the comments above, both in terms of the math and the goals of this piece.

Andrew Foland said...

I'll point out that the Bayesian results are 0.946 correlated with simply dividing wins by 16. So for most purposes, the straight-up winning percentage is probably sufficient.

One potential point of interest, is can you form any hypothesis as to what the remaining variation is due to? (By looking at which coaches over and underperform in the Bayesian value relative to the straight-up value, and trying to think about similarities in each set.)

If one is being careful, there is no need for the "season standard deviation". The draw should not be from a normal distribution with an unknown mean winning percentage and standard deviation, it should be from a binomial distribution with an unknown winning percentage. The variance of a binomial distribution is given by its mean, and so the extra parameter is unnecessary.

In fact, from a binomial perspective, it is impossible to get a "season standard deviation" (taking the square root of the variance) that is above 2. The fact that you found a value above 2 is probably the most interesting result of the analysis. It means there is some unknown additional factor (time-series non-invariance being the most likely culprit).

Anonymous said...

Probably need to explain how Bayesian analysis works... and whatever Microsoft algorithm you used to normalize.

Anonymous said...

And what is BUGS?? I think you need to explain what you are doing better.
