This contribution comes from the desk of Dan Schlauch.
The importance of an effective kicking game in the NFL is undeniable. Kickers are regularly the highest-scoring players on their teams and are repeatedly asked to perform in high-stakes situations. The difference between the best-performing kicker (Janikowski) and the worst-performing kicker (Brown) in 2009 totaled over 35 points, which would likely decide several games. Additionally, the mean salary of kickers was the lowest in the NFL at just over $1.5 million.
However, despite these considerations, this analysis shows that kickers receive a disproportionately high percentage of team salary and that money spent on kickers has a startlingly low return on investment.
The database used in this study was collected from various play-by-play sources using a Java parsing tool. The data covers detailed information on every NFL play in the modern era (since 2000). This totals 2,654 games and 429,000 plays.
The purpose of this study is to evaluate the production of kickers relative to their salary. We should be calculating the marginal benefit of each marginal dollar spent on kicking. Kickers receiving more than the average salary should produce more than average, and vice versa. Our goal is to evaluate the degree to which this is true. In order to evaluate a kicker’s performance relative to a standard benchmark (in this case, the expected value of average performance), we need to perform the non-trivial task of determining an expected kicker performance. Common kicker metrics, such as field goal percentage and points, are effectively useless for this study. Kickers face very different mixes of kick difficulty, and the sampling error this introduces into a raw success rate is enormous. A complete study needs to evaluate every kick performed by every kicker over the past decade and compare it to the expectation of success derived from the collection of all NFL kicks, based on a reasonable number of measurable variables.
First, we must build the expectation database. From here we can begin plotting field goal success against the most dominant variable: distance. At each yard line, success rates over the past 10 years were evaluated, and a local regression was applied to the resulting figures.
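As a rough illustration of this step, the sketch below tabulates the raw success rate at each distance and then smooths it with a simple local linear regression. The tricube kernel, the 8-yard bandwidth, the weighting by number of attempts, and all function names are assumptions made for the example; the study does not say which local regression settings were actually used.

```python
from collections import defaultdict


def success_rate_by_distance(kicks):
    """Return ({distance: success rate}, {distance: attempts}).

    `kicks` is an iterable of (distance_in_yards, made) pairs, made being 0 or 1.
    """
    made = defaultdict(int)
    tried = defaultdict(int)
    for distance, result in kicks:
        tried[distance] += 1
        made[distance] += result
    rates = {d: made[d] / tried[d] for d in tried}
    return rates, dict(tried)


def local_linear_estimate(xs, ys, weights, x0, bandwidth):
    """Fit a weighted least-squares line near x0 and evaluate it at x0.

    Points are weighted by a tricube kernel (an assumed choice) times the
    supplied weight; points at or beyond `bandwidth` from x0 are ignored.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in zip(xs, ys, weights):
        u = abs(x - x0) / bandwidth
        if u >= 1.0:
            continue
        k = w * (1.0 - u ** 3) ** 3
        sw += k
        swx += k * x
        swy += k * y
        swxx += k * x * x
        swxy += k * x * y
    if sw == 0.0:
        raise ValueError("no observations within bandwidth of x0")
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        # Only one distinct distance in the window: fall back to its mean.
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def expected_success(kicks, distance, bandwidth=8.0):
    """Smoothed success probability at `distance`, clamped to [0, 1]."""
    rates, attempts = success_rate_by_distance(kicks)
    xs = sorted(rates)
    ys = [rates[x] for x in xs]
    ws = [attempts[x] for x in xs]
    estimate = local_linear_estimate(xs, ys, ws, distance, bandwidth)
    return min(1.0, max(0.0, estimate))
```

Weighting each distance by its number of attempts keeps rarely attempted distances from pulling the curve around, which is one plausible reason to smooth the raw rates in the first place.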
For each yardage, we can calculate an expectation for each kick in the NFL based on the distance to the end zone. Each kick has an Expected Success (ES) between 0 and 1, and a binary result, 0 (miss) or 1 (good). For example, a kick from the 30 (a 47-yard FG) has an ES of .632. If the kick is good, the kicker will earn +.368 against his ES, or +.368 Expected Success Added (ESA). Conversely, he will earn -.632 ESA for a miss. The obvious benefit of this system is the low-bias, situation-specific evaluation of kicker production that normalizes for kick difficulty. Above-average kickers will have positive ESAs and below-average kickers will have negative ESAs. By comparing the ES with the result of every kick in a particular kicker’s season, we are able to compare every kicker’s season to the global average to determine what benefit, if any, that player had to his team.
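The ESA bookkeeping described above can be sketched in a few lines. The function names are hypothetical, and the ES values are assumed to come from whatever expectation model is in use.

```python
def expected_success_added(expected_success, made):
    """ESA for one kick: the result (1 for good, 0 for miss) minus its ES."""
    result = 1.0 if made else 0.0
    return result - expected_success


def total_esa(kicks):
    """Sum ESA over an iterable of (expected_success, made) pairs."""
    return sum(expected_success_added(es, made) for es, made in kicks)
```

For the 47-yard example, `expected_success_added(0.632, True)` gives +.368 and `expected_success_added(0.632, False)` gives -.632, matching the figures in the text.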
Our next step was to identify systemic differences (besides skill) that favored certain kickers over others in order to neutralize them. A kicker from a team that plays home games in a dome has the benefit of kicking under ideal environmental circumstances for half of his games. Ignoring uneven situational factors would likely result in a systemic bias toward kickers on certain teams, regardless of their skill.
Distance is, by far, the most important variable in this equation. For completeness, however, several others, including dome/outside, temperature, wind and humidity, also need to be evaluated. Each of these factors was found to impact the analysis in some way, though not uniformly according to distance and with varying degrees of impact. Whether or not the kick occurred in a dome was the most impactful variable and also the easiest to evaluate because of the binary nature of the data. Unfortunately, the effect is not uniform, as we can see from the graph below. The benefit of kicking in a dome increases as distance to the goal increases, but we can calculate the benefit for each distance and adjust the model accordingly. The likelihood of a 52-yard field goal in a dome was approximately equal to the likelihood of a 47-yard field goal outside, whereas the effect is negligible inside of 5 yards.
By including a variable for stadium design in the equation, we can better approximate the ES for a specific kick situation.
Wind, temperature and humidity also played a role, though a smaller one. From the comparison graph below we can see that of those three factors, temperature had the greatest effect on success rate. Intuitively, low wind, high temperature and low humidity helped kickers, while the opposite hurt them. Humidity produced an almost negligible effect, though the highest levels of humidity very slightly reduced kicking ES. These three factors were included in the data for outside kicks.
It is interesting to note that while cold temperature affects kicking linearly with distance, there is a distinct inflection point around the 20 yard line for the effect of wind. This is intuitive, as kicking in high winds has an exponential effect on difficulty with distance. Though the difference is slight, if given a choice, it appears that warm conditions are preferable for shorter (<38 yards) FGs, but calmer conditions are preferable for longer (>38 yards) FGs.
By combining the variables of distance, dome, temperature, wind and humidity, we can arrive at an ES that describes the difficulty of any kick with a high degree of precision. Kickers who repeatedly outperform their ES are undeniably valuable and should be considered premium talent in the league.
The next question is the following: Who, if anyone, is consistently outperforming ES? By simply observing kicker ESAs, we can know who has produced more or less than should have been expected. But we are also interested in knowing the confidence we can have in the repeatability of performance. Kicking, by nature, is a highly variable task. A kicker who was +3.0 ESA had an excellent year, producing 9 more points for his team than should have been expected of him given the circumstances of each of his kicks. However, in a sample size of only one season, it becomes difficult to know whether the success should be attributed to luck or skill, an extremely important distinction. Skill repeats itself while luck does not. To address this problem we need to assign a p-value to each score signifying its statistical significance.
We do this with an iterative approach, simulating each sample (one season, for example) and finding the percentage of average kickers who would be expected to outperform the real kicker given an identical set of kicks. We might have more confidence in the abilities of a kicker who is +6.0 ESA over 5 seasons with p-value = .01 (1% of average kickers would be expected to reach the same level of success over the same set of kicks) than in a kicker who is +3.0 ESA with p-value = .15 over a single season, despite the latter's higher rate of success.
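A minimal sketch of such a simulation follows, assuming each kick by an "average" kicker is an independent coin flip that succeeds with probability equal to its ES. The trial count, the fixed seed, and the choice to count simulated kickers who match or beat the observed ESA are assumptions for illustration, not details taken from the study.

```python
import random


def simulated_p_value(expected_successes, observed_esa, trials=10000, seed=0):
    """Share of simulated average kickers whose ESA matches or beats the real one.

    `expected_successes` holds the ES of every kick the real kicker attempted,
    so each simulated kicker faces the identical set of kicks.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total_es = sum(expected_successes)
    at_least_as_good = 0
    for _ in range(trials):
        # An average kicker makes each kick with probability equal to its ES.
        makes = sum(1 for es in expected_successes if rng.random() < es)
        if makes - total_es >= observed_esa:
            at_least_as_good += 1
    return at_least_as_good / trials
```

A small p-value means few average kickers would have done as well on the same kicks, which is the sense in which a +6.0 ESA over five seasons can be more convincing than a +3.0 ESA in one.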
We can look over entire careers (2000 onward) to find any statistically significant performances. Using the 79 kickers who attempted more than 5 field goals since 2000, we would expect the distribution of p-values to be uniform only if all kickers were essentially average. What we see is not far from it. Only 7 of the 79 kickers outperformed 95% of a comparable set of average kickers, and only one outperformed 99% of that set. If all kickers were essentially average, we would expect about 4 kickers to land in the top 5% and about 1 kicker in the top 1% purely by chance. What this indicates is that the distribution of skill among NFL kickers very strongly resembles the distribution of a set of completely average kickers with no discernible skill differences. The most productive kicker of the decade, Matt Stover, added only 4.2 points per season to his teams’ offensive production. Even this meager amount is likely due mostly to variance and is subject to regression to the mean in the coming years.
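The chance expectations quoted above are simple arithmetic on a uniform p-value distribution: 79 × .05 is about 4 and 79 × .01 is about 1. The sketch below reproduces that arithmetic and adds a binomial tail probability for judging how surprising a count such as 7 would be. Treating kickers as independent draws is an assumption of this example, and the function names are hypothetical.

```python
from math import comb


def expected_count(n_kickers, alpha):
    """Expected number of kickers with p-value below alpha if all are average."""
    return n_kickers * alpha


def prob_at_least(n_kickers, alpha, k):
    """Binomial probability that k or more of n_kickers fall below alpha by chance."""
    return sum(
        comb(n_kickers, i) * alpha ** i * (1 - alpha) ** (n_kickers - i)
        for i in range(k, n_kickers + 1)
    )
```

Calling `expected_count(79, 0.05)` and `expected_count(79, 0.01)` recovers the "about 4" and "about 1" figures, and `prob_at_least(79, 0.05, 7)` gives the chance of seeing seven or more top-5% kickers in a league of purely average ones.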
To put it all together, we need to compare the ESA of each kicker with his perceived worth. We use salary cap space entering the 2009 season for all starting kickers and measure their production based on ESA. The Pearson correlation between ESA and salary cap value is very small, at .13, indicating a very weak association between those who are paid more and those who produce more. When plotting ESA against kicker salary and applying a linear regression, we see almost no increase in ESA with additional salary expenditure. The exact calculations indicate that each additional point per season above the mean costs approximately $1.25 million when purchasing value through kickers.
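For readers who want to repeat this kind of comparison on their own salary and ESA figures, here is a small sketch of the two calculations named above: a Pearson correlation and a least-squares slope. The conversion from slope to cost per point assumes salary is measured in millions of dollars, ESA is per season, and each field goal is worth 3 points; whether the study computed its $1.25 million figure exactly this way is not stated, so treat that helper as a guess.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (e.g. ESA on salary)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


def cost_per_point(slope_esa_per_million, points_per_field_goal=3):
    """Millions of dollars per extra point, given ESA gained per $1M (assumed units)."""
    return 1.0 / (slope_esa_per_million * points_per_field_goal)
```

A nearly flat regression line means a tiny slope, and a tiny slope makes the cost of each extra point large, which is the pattern the study reports.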
A common argument is the value of “clutchness” for kickers. The belief is that certain kickers perform better under pressure than others and that they earn their salaries through excellence in a handful of key situations in their career. The very nature of clutch kicking requires the situations to be sparse, and consequently be subject to small sample bias. It would be very difficult to determine whether a kicker was truly excellent in the clutch or whether this perception developed from simple variance within small samples. We can approach this problem by observing the league as a whole. In clutch situations, defined as a deficit of 0-3 points with less than 3 minutes remaining or overtime, the league as a whole deviates less than half of a percent from normal game situation averages. This indicates that the sum of kickers is essentially clutch-neutral, but leaves open the possibility that there are clutch kickers and “chokers” in equal proportions.
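The clutch definition above can be written as a small predicate for filtering play-by-play data. The parameter names are hypothetical, and two readings are assumed here: that "less than 3 minutes remaining" refers to time left in regulation, and that every overtime kick counts as clutch.

```python
def is_clutch(kicking_team_margin, seconds_remaining, overtime=False):
    """True if a kick falls under the clutch definition used in the text.

    `kicking_team_margin` is the kicking team's score minus the opponent's,
    so a deficit of 0-3 points corresponds to a margin from -3 to 0.
    """
    if overtime:
        return True
    tied_or_trailing_by_three = -3 <= kicking_team_margin <= 0
    return tied_or_trailing_by_three and seconds_remaining < 180
```

Comparing the success rate of kicks where `is_clutch` is true against all other kicks is one way to reproduce the league-wide comparison described above.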
We can also borrow findings from other sports, such as baseball, where numerous studies have indicated that clutchness is not a quality inherent to an individual in professional sports. The perception of clutch and non-clutch players is a result of small sample variance. (http://www.baseballprospectus.com/article.php?articleid=2656)
Another consideration is the effect of kicker specialization. A kicker who excels in long-range FGs but under-performs in short-range situations would appear virtually average if he is used across a typical mix of kick distances. However, if it is possible to specifically take advantage of a particular kicker’s skill set, then his value will become more apparent through this model.
In any case, any justification for kicker expenses will require a value to be attributed to higher salaried players that is above and beyond their ability to exceed average kicker production. Are those kickers clubhouse leaders? Or do those kickers make their money by excelling at kickoffs or serving as backup QBs?
This study is an initial statistical analysis. While there remain areas for improvement and refinement, the conclusion is unlikely to change significantly. The conclusion is that the cost of kicker salaries relative to their expected returns is enormous. Though kickers play a high-pressure, vital part of the game, the evidence shows that there is little difference between the perceived best and worst kickers in the league. Past successes and struggles have been shown to be extremely poor predictors of future performance. It is this parity that should drive the market price for this position down. Teams should not pay $3 million for a “proven” veteran kicker when that kicker is scarcely more likely to succeed next season than the unknown who makes the league minimum. The value-over-replacement for kickers approaches 0. A more in-depth future analysis will calculate marginal costs and benefits of other positions for use as a direct comparison and evaluation of team salary distribution. A team should reduce the share of salary it devotes to its kicker only if it believes that the return on another investment is greater than 1 point per $1.25 million.
Sunday, September 19, 2010