Comments on Advanced NFL Stats Community: Win Probabilities and Points

I'd be happy to try to explain, Bruce. If we ...

2010-12-16T23:31:24.071-05:00

I'd be happy to try to explain, Bruce.

If we want the values for a league average team, then we can use league average statistics. This is useful, as these are symmetrical statistics. That is, for instance, a league average team concedes as many points as it scores in a game, yards for yards, touchdowns for touchdowns...on average. So if the league average team allows 0.2 TDs per drive, they also score 0.2 TDs per drive. This symmetrical nature makes it far easier to compare the league average team to itself with minimal margin for error in the model (essentially none, in fact). As such there are no intangibles, no luck, nothing.

As for Monte Carlo methods, they are a relatively modern tool (only practical since the advent of modern computing). Regression and model-fitting are not used to make predictions. Instead, underlying statistics of the game that are directly quantifiable and strongly predictive of themselves are fed into a simulation model of the game.
Unlike a classical statistical model, this model is not deterministic: it gives a different result every time it is run. In my case, each run gives a game score. I then run the model 200,000 times, and that gives me win percentages, amongst other things.
Monte Carlo methods are incredibly powerful because they make no assumptions about past performance other than for the team being considered, whereas classical regression models say 'well, this team looked numerically like that team in some areas, and so this team is a good approximation for that team'...
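
A minimal Python sketch of this kind of simulation might look like the following. It is not the model described in this thread, only an illustration of the idea. The 0.2 touchdown rate comes from the example above; the 0.2 field goal rate, the twelve drives per team, the even split of ties, and the names `simulate_score` and `win_probability` are all assumptions made for the sketch.

```python
import random

# Assumed per-drive outcome rates for a league-average team (illustrative only).
P_TD = 0.2            # touchdown rate per drive, as in the example above
P_FG = 0.2            # field goal rate per drive (assumption)
DRIVES_PER_TEAM = 12  # assumption: about a dozen drives per team per game


def simulate_score(rng, drives=DRIVES_PER_TEAM):
    """Simulate one team's points by sampling each drive independently."""
    points = 0
    for _ in range(drives):
        roll = rng.random()
        if roll < P_TD:
            points += 7  # touchdown, extra point assumed always good
        elif roll < P_TD + P_FG:
            points += 3  # field goal
    return points


def win_probability(head_start, games=200_000, seed=0):
    """Estimate the win probability of a team given a points head start.

    Both teams use the same league-average rates. Ties count as half a win,
    which is a crude stand-in for overtime.
    """
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(games):
        margin = simulate_score(rng) + head_start - simulate_score(rng)
        if margin > 0:
            wins += 1.0
        elif margin == 0:
            wins += 0.5
    return wins / games
```

Calling `win_probability(0)` should come out near one half, since the two teams are identical, and larger head starts should push the estimate upwards. The exact figures depend entirely on the assumed rates.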

Ok, that's not the best explanation, but I hope it helped.

I'd be happy to try to explain, Bruce. If we ...

2010-12-16T23:28:37.240-05:00

Tom, I'm lost about your post: "As for ...

2010-12-16T21:30:26.816-05:00

Tom,

I'm lost about your post:

"As for quantifying hidden variables... For the sake of considering the average team there is just no need. Every part of the game is wrapped up in the statistics that the game spits out; it's just a case of manipulating them. For that, creating theoretical data, i.e. using Monte Carlo methods, has proven more accurate than using standard statistical methods."

I don't understand what you mean, and I'd like to.

Expand if you have the time.

Bruce

I understand your point Bruce. The endgame tactica...

2010-12-16T17:27:50.421-05:00

I understand your point, Bruce. The endgame tactical side of football cannot readily be included in a model. A model such as mine, for instance, will always assume that each offensive drive has the same aim, that is, to generate points, and that the number of points generated is a random variable dependent on a team's ability to score touchdowns and field goals. That model is fine, perfect even, as long as we're not in the endgame, where a team may instead be trying to hold on to the ball and grind down the clock, or just to get into field goal range instead of going for the touchdown.
However, thinking about it now, I believe I can account for the behaviour of a team chasing three points rather than seven on the final drive of the game. If a team scores a touchdown on, say, twenty percent of normal drives and a field goal on a further twenty percent, then their chance of scoring a field goal on their final drive can be decently approximated by summing the two, giving forty percent.
In fact, I may just go and try that now.
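
As a rough Python sketch of that adjustment (the function name `drive_points` and the `chasing_three` flag are hypothetical, and the rule assumes every drive that would normally score at least reaches field goal range and the kick is made):

```python
import random


def drive_points(rng, p_td, p_fg, chasing_three=False):
    """Sample the points from one drive.

    On a normal drive the team scores 7 with probability p_td and 3 with
    probability p_fg. When chasing_three is set (a final drive needing only
    a field goal), any drive that would have scored is assumed to end in a
    field goal, so the chance of 3 points becomes p_td + p_fg.
    """
    roll = rng.random()
    if chasing_three:
        return 3 if roll < p_td + p_fg else 0
    if roll < p_td:
        return 7
    if roll < p_td + p_fg:
        return 3
    return 0
```

With `p_td=0.2` and `p_fg=0.2`, a final drive in this mode produces a field goal about forty percent of the time and never a touchdown.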

As for quantifying hidden variables... For the sake of considering the average team there is just no need. Every part of the game is wrapped up in the statistics that the game spits out; it's just a case of manipulating them. For that, creating theoretical data, i.e. using Monte Carlo methods, has proven more accurate than using standard statistical methods.

Apologies for the "om" Tom.

2010-12-16T15:22:26.421-05:00

Apologies for the "om" Tom.

om and Neil, Excellent posts. To Tom, I do tend...

2010-12-16T15:18:43.746-05:00

om and Neil,

Excellent posts.

To Tom, I do tend to lean towards "real world" actual results. There are usually so many unknown variables in our endeavor that it seems safer to use what was proven to have happened (in the past) vs. what should have happened. To me, all these hidden variables are hard to quantify.

To Brett, I agree with your specific situations, such as "a team down by 6 in FG range at the end of the game will obviously go for the TD rather than kick a FG. When they fail to score, the actual margin of 6 does not mean they were actually 6 points worse than the other team. In reality, they probably had close to a 50% chance of winning until the last play of the game." My thought is that "real world results" have a better chance of including these situations than simulations do.

I'll be back with more analysis.

Thanks to you both.

Home teams win about 58% of the time, and home-fie...

2010-12-15T16:39:58.027-05:00

Home teams win about 58% of the time, and home-field advantage is assumed to be 2 to 3 points, so this would be more in line with Tom's numbers than Bruce's.

Bruce, your method seems sound to me, so I'm not sure why there is a discrepancy. Perhaps the 6-season sample is too small? Or maybe it's because your method is based on actual results rather than expected results like Tom's simulation method. I don't think actual margins of victory accurately represent the teams' win probability, especially in close games. For example, a team down by 6 in FG range at the end of the game will obviously go for the TD rather than kick a FG. When they fail to score, the actual margin of 6 does not mean they were actually 6 points worse than the other team. In reality, they probably had close to a 50% chance of winning until the last play of the game.

I've recently been working on a model for the ...

2010-12-15T08:06:26.541-05:00

I've recently been working on a model for the NFL which takes drive data to simulate matches between teams. I can use this to simulate a game between two league average teams and produce a win percentage. I have added a variable that allows me to control what points advantage a team is given, and these are the values of win probability that I get. I give these to the nearest percent and I will explain why in a moment.

Points advantage, win probability:
0 50%
1 54%
2 56%
3 59%
3.5 61%
4 63%
5 65%
6 67%
7 71%
8 74%
9 75%
10 78%
11 81%
12 82%
13 84%
14 87%
15 88%
16 89%
17 91%
21 95%

Looks odd in places, no? It agrees pretty strongly with your numbers, especially from three points up until about ten. I have some theories about the strangeness of certain 'jumps' like sixteen to seventeen, mostly to do with how scoring is done in the NFL.
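
To make the jumps easier to see, the figures listed above can be turned into a gain in win probability per extra point between neighbouring rows. This is only a convenience sketch: the numbers are copied from the list, and the names `WIN_PCT` and `per_point_gains` are made up for the illustration.

```python
# Simulated win percentage by points advantage, copied from the list above.
WIN_PCT = {
    0: 50, 1: 54, 2: 56, 3: 59, 3.5: 61, 4: 63, 5: 65, 6: 67, 7: 71,
    8: 74, 9: 75, 10: 78, 11: 81, 12: 82, 13: 84, 14: 87, 15: 88,
    16: 89, 17: 91, 21: 95,
}


def per_point_gains(table):
    """Return (low, high, gain per point) for each pair of neighbouring rows."""
    points = sorted(table)
    gains = []
    for low, high in zip(points, points[1:]):
        gain = (table[high] - table[low]) / (high - low)
        gains.append((low, high, gain))
    return gains


for low, high, gain in per_point_gains(WIN_PCT):
    print(f"{low} -> {high}: {gain:.1f} percentage points per point")
```

Because each figure is only settled to the nearest percent, differences of a point or so between neighbouring rows may be rounding rather than anything real.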
Other than that, I would say that the values for one and two do not agree well, and I'm not sure if that's my model or simply a lack of data on your part. These things do not settle very quickly. I set my model to simulate two hundred thousand games to get each of these percentages, and even then it only settles to the nearest percent. To get the first digit after the decimal point I would have to do seven million simulations for each one. Having seen this, I suspect that your data are too sparse to have settled well at the extremes of your points values, but that is just a guess.
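
How slowly these percentages settle can be checked with the usual binomial standard error, treating each simulated game as an independent win or loss. The sketch below rests on that assumption and on a normal approximation; the helper names `standard_error` and `games_needed` are made up for the illustration.

```python
import math


def standard_error(p, n):
    """Standard error of a win fraction p estimated from n independent games."""
    return math.sqrt(p * (1.0 - p) / n)


def games_needed(p, half_width, z=1.96):
    """Games needed so that a z-sigma interval is within +/- half_width of p."""
    return math.ceil((z / half_width) ** 2 * p * (1.0 - p))


# Spread of a 50% estimate after two hundred thousand simulated games.
print(standard_error(0.5, 200_000))

# Games needed to pin a 50% estimate to within +/- 0.05 percentage points,
# at roughly 95% confidence and roughly 99% confidence.
print(games_needed(0.5, 0.0005))
print(games_needed(0.5, 0.0005, z=2.576))
```

Under these assumptions the standard error at two hundred thousand games is a little over 0.1 percentage points, which fits with the estimates only settling to the nearest percent. Pinning down the first decimal place takes about 3.8 million games at 95% confidence and about 6.6 million at 99%, which is the same order as the seven million mentioned above.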
I plan to look into this further.