Saturday, December 11, 2010

Win Probabilities and Points


by Bruce D

There are many win probability tables published each week for upcoming NFL games that calculate the percentage chance of each team to win versus another. Perhaps the best known to visitors of Advanced NFL Stats is the weekly probabilities at The Fifth Down.

I've seen several comments on these types of probability charts asking how the win probability percentage equates to points (margin of victory). I've yet to find a reply to any of them, so I did some analysis in this regard using data from the last six NFL seasons, 2004-2009.

The analysis works from completed games: add a fixed number of points to each team's score, then calculate the percentage of games that team would have won if given the extra points.

For each number of points tracked (VariablePoints, seen below), the following was performed for every team/game combination over the last six seasons.

If (points for + VariablePoints) - (points against) > 0, then it's a win, and all wins are counted (win count).

Then to get the average:

(win count) / (number of team/game combinations checked, excluding any ties)
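A minimal sketch of this procedure in Python. The function name and the flat list of (points for, points against) records are my own assumptions, not Bruce's actual code; any per-game record with points for and against would do:

```python
# Sketch of the method described above: for each candidate point value,
# add it to a team's score in every completed game and count how often
# that team would have won, excluding games the adjustment turns into ties.

def win_pct_with_extra_points(games, variable_points):
    """games: list of (points_for, points_against) tuples, one per
    team/game combination. Returns the empirical win% given the bonus."""
    wins = 0
    decided = 0
    for pf, pa in games:
        margin = (pf + variable_points) - pa
        if margin == 0:          # adjusted tie: drop from the sample
            continue
        decided += 1
        if margin > 0:
            wins += 1
    return wins / decided

# Toy example with made-up scores (not the 2004-2009 data set):
sample = [(17, 20), (24, 21), (13, 13), (10, 27)]
print(win_pct_with_extra_points(sample, 2))  # 2 of 4 decided games won
```

Run over every team/game combination in the six-season data set, this yields one win% per VariablePoints value.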

The following table shows the actual win% for each value of VariablePoints.

So, assuming these points-to-win% figures represent the value of points, the following may (tentatively) be inferred:

Given two equal teams, giving either team 2 points would increase its win probability from 50% to 52.65%.

and also

Given a team with a 52.65% win probability, our best estimate of the margin of victory is 2 points.

Who knows how accurate this margin-of-victory-to-win% mapping will be in the future, but it was based on quite a lot of actual results data.

Feel free to tear this analysis apart; I'm only human and may have missed something.

8 comments:

Tom said...

I've recently been working on a model for the NFL which takes drive data to simulate matches between teams. I can use this to simulate a game between two league average teams and produce a win percentage. I have added a variable that allows me to control what points advantage a team is given, and these are the values of win probability that I get. I give these to the nearest percent and I will explain why in a moment.

0 50%
1 54%
2 56%
3 59%
3.5 61%
4 63%
5 65%
6 67%
7 71%
8 74%
9 75%
10 78%
11 81%
12 82%
13 84%
14 87%
15 88%
16 89%
17 91%
21 95%

Looks odd in places, no? It agrees pretty strongly with your numbers, especially from three points up until about ten. I have some theories about the strangeness of certain 'jumps' like sixteen to seventeen, mostly to do with how scoring is done in the NFL.
Other than that, I would say the values for one and two do not agree well, and I'm not sure if that's my model or simply a lack of data on your part. These things do not settle very quickly: I set my model to simulate two hundred thousand games to get each of these percentages, and even then they only settle to the nearest percent. To get the first digit after the decimal point I would have to run seven million simulations for each one. Having seen this, I suspect the data is sparse and not well settled at the extremes of your points values, but that is just a guess.
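The settling Tom describes is ordinary binomial sampling error, sqrt(p(1-p)/n); a quick check of his figures (my arithmetic, not output from his model):

```python
import math

def win_pct_standard_error(p, n):
    """Standard error of a win probability estimated from n independent
    simulated games, each won with probability p."""
    return math.sqrt(p * (1 - p) / n)

# At p = 0.5, 200,000 games settle the estimate to about +/-0.11
# percentage points; 7,000,000 games reach roughly +/-0.02.
print(win_pct_standard_error(0.5, 200_000))
print(win_pct_standard_error(0.5, 7_000_000))
```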
I plan to look into this further.

Brett said...

Home teams win about 58% of the time, and home-field advantage is assumed to be 2 to 3 points, so this would be more in line with Tom's numbers than Bruce's.

Bruce, your method seems sound to me, so I'm not sure why there is a discrepancy. Perhaps the 6-season sample is too small? Or maybe it's because your method is based on actual results rather than expected results like Tom's simulation method. I don't think actual margins of victory accurately represent the teams' win probability, especially in close games. For example, a team down by 6 in FG range at the end of the game will obviously go for the TD rather than kick a FG. When they fail to score, the actual margin of 6 does not mean they were actually 6 points worse than the other team. In reality, they probably had close to a 50% chance of winning until the last play of the game.

Bruce D. said...

om and Neil,

Excellent posts.

To Tom: I do tend to lean towards "real-world" actual results. There are usually so many unknown variables in our endeavor that it seems safer to use what was proved to have happened (in the past) versus what should have happened. To me, all these hidden variables are hard to quantify.

To Brett: I agree with your specific situations, such as "a team down by 6 in FG range at the end of the game will obviously go for the TD rather than kick a FG. When they fail to score, the actual margin of 6 does not mean they were actually 6 points worse than the other team. In reality, they probably had close to a 50% chance of winning until the last play of the game." My thought is that "real-world results" have a better chance of including these situations than simulations do.

I'll be back with more analysis.

Thanks to you both.

Bruce D. said...

Apologies for the "om" Tom.

Tom said...

I understand your point, Bruce. The endgame tactical side of football is something that cannot readily be included in a model. A model such as mine will always assume that each offensive drive has the same aim, that is, to generate points, with the number of points generated being a random variable dependent on a team's ability to score touchdowns and field goals. That model is fine, perfect even, as long as we're not in the endgame, when a team may instead be trying to hold on to the ball and grind down the clock, or to just get into field goal range instead of going for the touchdown.
However, thinking about it now, I believe I can account for the behaviour of a team chasing three points on the final drive of the game rather than seven: if a team scores a touchdown on, say, twenty percent of normal drives, and a field goal on a further twenty percent, then their chance of scoring a field goal on that final drive can be decently approximated by summing the two, giving forty percent.
In fact, I may just go and try that now.
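Tom's proposed adjustment amounts to a one-line change in a drive model. The function below is illustrative: the 20% figures are his example numbers, the code itself is not his:

```python
def drive_outcome_probs(p_td=0.20, p_fg=0.20, final_drive_needs_fg=False):
    """Per-drive scoring probabilities. On a final drive where a field
    goal is enough, assume the offense settles for the kick whenever it
    would otherwise have scored either way: P(FG) becomes P(TD) + P(FG)."""
    if final_drive_needs_fg:
        return {"td": 0.0, "fg": p_td + p_fg, "none": 1 - (p_td + p_fg)}
    return {"td": p_td, "fg": p_fg, "none": 1 - (p_td + p_fg)}

print(drive_outcome_probs())                           # normal drive
print(drive_outcome_probs(final_drive_needs_fg=True))  # FG chance 0.4
```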

As for quantifying hidden variables... For the sake of considering the average team there is just no need. Every part of the game is wrapped up in the statistics that the game spits out, it's just a case of manipulating them, in which case creating theoretical data, i.e. using monte carlo methods, has proven more accurate than using standard statistical methods.

Bruce D. said...

Tom,

I'm lost about your post:

"As for quantifying hidden variables... For the sake of considering the average team there is just no need. Every part of the game is wrapped up in the statistics that the game spits out, it's just a case of manipulating them, in which case creating theoretical data, i.e. using monte carlo methods, has proven more accurate than using standard statistical methods. "

I don't understand what you mean, and I'd like to.

Expand if you have the time.

Bruce

Tom said...

I'd be happy to try to explain, Bruce.

If we want the values for a league average team, then we can use league average statistics. This is useful, as these are symmetrical statistics. That is, for instance, a league average team concedes as many points as it scores in a game, yards for yards, touchdowns for touchdowns...on average. So if the league average team allows 0.2 TDs per drive, they also score 0.2 TDs per drive. This symmetrical nature makes it far easier to compare the league average team to itself with minimal margin for error in the model (essentially none, in fact). As such there are no intangibles, no luck, nothing.

As for Monte Carlo methods, they are a relatively modern tool (only practical since the advent of modern computing) in which regression and model-fitting are not used to make predictions; instead, underlying statistics of the game that are strongly self-predictive and directly quantifiable are used within a simulation model of the game.
This model is not deterministic like a classical statistical model; it gives a different result every time. In my case, it gives a game score. I then use the model to predict the score 200,000 times, and that gives me win percentages, amongst other things.
Monte Carlo methods are incredibly powerful because they make no assumptions about past performance other than for the team being considered, whereas classical regression models say 'well, this team looked numerically like that team in some areas, and so this team is a good approximation for that team'...

Ok, that's not the best explanation, but I hope it helped.
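A stripped-down sketch of the kind of model Tom describes. The drive count (12 per team), the 0.2 touchdowns-per-drive figure from his earlier comment, and the matching field-goal rate are assumptions for illustration, not values from his actual model:

```python
import random

# Monte Carlo game simulation between two league-average teams: each
# team gets a fixed number of drives, and each drive independently
# yields a touchdown (7), a field goal (3), or nothing.

def sim_score(drives=12, p_td=0.2, p_fg=0.2, rng=random):
    score = 0
    for _ in range(drives):
        r = rng.random()
        if r < p_td:
            score += 7
        elif r < p_td + p_fg:
            score += 3
    return score

def win_probability(handicap=0.0, n_games=20_000, seed=1):
    """Fraction of simulated games team A wins when spotted `handicap`
    points; ties are split evenly."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_games):
        a = sim_score(rng=rng) + handicap
        b = sim_score(rng=rng)
        if a > b:
            wins += 1
        elif a == b:
            wins += 0.5
    return wins / n_games

print(win_probability(0))  # ~0.50 for two identical teams
print(win_probability(3))  # a 3-point handicap lifts the win probability
```

With more drives-per-game detail (and far more simulations), this is the shape of model that produces tables like the one in Tom's first comment.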

Here are a couple of tables I produced for this weekend: one showing probabilities for games (and spreads, since I am a betting man), and one ranking teams by their GWP as calculated by the model. Two notes: 1) The expected scores look strange for some games; they are averages rounded to whole numbers, not truly the most likely score in each case, due to the nature of point scoring in football. 2) The Vikes game was calculated as if it were being played in the dome; their WP would be 0.42 at a neutral site.

http://img823.imageshack.us/img823/5962/nflratingsweek15.png

http://img220.imageshack.us/img220/6093/nflprobabilitiesweek15.png

