Tuesday, January 20, 2009

In running model for the NFL.

by Denis O'Regan

(Before I start, can I just say that Brian's online "in running" calculator is a fantastic bit of kit, and I want to thank him for making it available to everyone. My approach is going to be different to Brian's... my calculator wouldn't know a first down if it got blindsided by one.)

Modeling in running soccer matches is a relatively easy undertaking and usually involves estimating in game scoring rates for both teams and using these averages to calculate the probability of individual scoring events occurring in the remainder of the game by way of the Poisson distribution.

I decided to try to apply the approach to the NFL by following the methods I use for soccer games and trying to work around problems caused by the differences of the two sports when they arose.

At first glance, American football is a much higher scoring sport than soccer. The average total in a soccer game hovers somewhere around 2.7 goals, compared to 40 points in gridiron. However, the 40 NFL points are scored as a result of only 8 or 9 scoring events, broken down as touchdowns and field goals, with the odd safety thrown in.

Therefore I used scoring events instead of points to create a scoring expectancy for the NFL teams. For example, Team A's offense averages A scoring events per game in a league where the league average is L scoring events per game. They play Team B, whose defense allows B scoring events per game. It should be possible to work out how many scoring events Team A should be able to manage on average when they host Team B.

Team A scores at A/L times the league average.
Team B concedes at B/L times the league average.

Multiplying these two rates together gives you a good idea of the scoring rate Team A will achieve against Team B's defense at a neutral venue. If we want to make Team A the home side, we further need to divide the average scoring rate of all home teams (call this H) by the league average and incorporate this.

The scoring rate for Team A at home to Team B can be calculated as

Team A = A/L * B/L * H/L

Lastly, to convert this scoring rate to actual scoring events, we multiply this rate by the league's average scoring events per game, namely L.

If we repeat this process for Team B, using the average scoring rate for away sides this time, we now have a scoring expectancy for each team. If you'd gone through this process for the NFC Championship game, you would have had Philly in for just over 5 scoring events and Arizona in for just over 4.
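A minimal sketch of the calculation above in Python. The function name and the example numbers are made up for illustration; only the formula (A/L * B/L * H/L, then multiplied by L) comes from the text.

```python
def scoring_expectancy(team_off, opp_def, venue_avg, league_avg):
    """Expected scoring events for one team in one game.

    team_off   -- scoring events per game the team's offense averages (A)
    opp_def    -- scoring events per game the opposing defense allows (B)
    venue_avg  -- average scoring events per game for all home sides (H),
                  or for all away sides when rating the visiting team
    league_avg -- league average scoring events per team per game (L)
    """
    rate = (team_off / league_avg) * (opp_def / league_avg) * (venue_avg / league_avg)
    # Convert the relative rate back into scoring events.
    return rate * league_avg


# Hypothetical inputs, not real team data.
home_expectancy = scoring_expectancy(team_off=5.0, opp_def=4.5,
                                     venue_avg=4.4, league_avg=4.2)
print(round(home_expectancy, 2))
```

A team that is exactly average in every respect comes out at L, which is a handy sanity check on any inputs you feed in.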

Armed with these team averages, we can now use the Poisson distribution to calculate the probability that each team will achieve exactly zero scoring events, 1, 2, 3, etc.
That further allows us to calculate the probability that, for example, the game will end with Team A scoring twice and Team B scoring just once... and here's where the problems start.
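Here is a rough sketch of that Poisson step in Python. It assumes the two teams' scoring events are independent of each other (the usual simplification with this approach), and the means of 5.1 and 4.1 are just illustrative stand-ins for "just over 5" and "just over 4".

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected number is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)


def score_grid(mean_a, mean_b, max_events=12):
    """grid[i][j] = P(Team A has i scoring events and Team B has j).

    Assumes the two teams' scoring events are independent.
    """
    return [[poisson_pmf(i, mean_a) * poisson_pmf(j, mean_b)
             for j in range(max_events + 1)]
            for i in range(max_events + 1)]


grid = score_grid(5.1, 4.1)   # illustrative means only
p_two_one = grid[2][1]        # Team A scores twice, Team B once
covered = sum(sum(row) for row in grid)
print(p_two_one, covered)
```

The `covered` total shows how much probability the grid captures; anything missing is the chance of a team going beyond `max_events`.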

In soccer, winning 2-1 is definitive: you win the game. In the NFL it merely gives you a very good chance to win the game. Even if we throw out safeties as a rarity and assume all touchdowns are followed by a single point conversion, you can still score two field goals and lose to one touchdown. Whereas in soccer you could safely add the probability of a team winning the whole game 2-1 to its overall win probability, you have to keep some back in the NFL.

Breaking down the 2-1 scoring events into different combinations of TDs and FGs quickly becomes unwieldy if you include 2 point conversions, safeties and missed extra points, as do the more common, higher scoring combinations. So I've tried various fudges. These range from doing nothing (what you gain in terms of win probability on the 2-1 you lose when it comes to a 1-2 scoreline) to incorporating a points per score factor and using real life data based on scoring events.

Putting aside these problems for a moment, we do now have a way to attach a probability to every scoring event combination from, say, 0-0 all the way to 12-12, which should cover most eventualities in the NFL.
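To show how much of a 2-1 "win" has to be kept back, here is one possible sketch. It is not the fudge used in the model; it assumes every score is either a 7 point touchdown or a 3 point field goal, and that each score is a touchdown with a made-up probability `p_td` of 0.6.

```python
from math import comb

TD_POINTS, FG_POINTS = 7, 3


def points_dist(n_events, p_td):
    """Map total points -> probability for a team with n_events scores."""
    dist = {}
    for tds in range(n_events + 1):
        prob = comb(n_events, tds) * p_td ** tds * (1 - p_td) ** (n_events - tds)
        points = TD_POINTS * tds + FG_POINTS * (n_events - tds)
        dist[points] = dist.get(points, 0.0) + prob
    return dist


def outcome_given_events(events_a, events_b, p_td=0.6):
    """Return (win, tie, loss) probabilities for Team A on points."""
    win = tie = loss = 0.0
    for pts_a, prob_a in points_dist(events_a, p_td).items():
        for pts_b, prob_b in points_dist(events_b, p_td).items():
            if pts_a > pts_b:
                win += prob_a * prob_b
            elif pts_a == pts_b:
                tie += prob_a * prob_b
            else:
                loss += prob_a * prob_b
    return win, tie, loss


win, tie, loss = outcome_given_events(2, 1)
print(win, tie, loss)
```

Under these assumptions the only way to lose a 2-1 is two field goals against one touchdown, so `loss` is the slice of the 2-1 probability that belongs to the other side.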

So far all we've got is a clunky pre match predictor. To turn it into a serviceable "in play" predictor we first need to predict how each team's pre game scoring expectancy decays with time. Again, soccer is quite easy. Scoring rate increases as the game goes on, and you can calculate a team's remaining goal expectancy by multiplying the pre game number by the proportion of time remaining raised to the power of 0.84. The NFL also looks straightforward. After an initial lull due to kick off field position, the scoring rate remains relatively steady, until peaking inside each two minute warning.
For example, 10 minutes into a game a team will still have 88% of its pre game scoring expectancy "left". By half time it's declined so that only 47% remains.
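The decay step might be sketched like this. The soccer function follows the 0.84 power rule as described. The NFL function is only a crude stand-in: it draws straight lines between the two figures quoted above (88% at 10 minutes, 47% at half time) plus the obvious end points, and ignores the two minute warning peaks entirely.

```python
def soccer_remaining(pregame, minutes_played, match_length=90.0):
    """Remaining goal expectancy: pre game figure times (time left) ** 0.84."""
    frac_left = max(0.0, 1.0 - minutes_played / match_length)
    return pregame * frac_left ** 0.84


# (minutes played, fraction of pre game expectancy left).
# Only the 10 and 30 minute values come from the text; joining them
# with straight lines is an assumption made for this sketch.
NFL_KNOTS = [(0.0, 1.00), (10.0, 0.88), (30.0, 0.47), (60.0, 0.00)]


def nfl_remaining(pregame, minutes_played):
    """Remaining scoring event expectancy by linear interpolation."""
    t = min(max(minutes_played, 0.0), 60.0)
    for (t0, f0), (t1, f1) in zip(NFL_KNOTS, NFL_KNOTS[1:]):
        if t0 <= t <= t1:
            frac = f0 + (f1 - f0) * (t - t0) / (t1 - t0)
            return pregame * frac
    return 0.0


print(soccer_remaining(1.5, 45))   # soccer side at half time
print(nfl_remaining(5.0, 20))      # NFL side 20 minutes in
```

A real decay curve would be fitted to play by play data; the point here is only that the in-running expectancy is the pre game number times a fraction that depends on the clock.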

We can now see what each team's pre game scoring expectation has decayed to at any point in a game. By entering these revised numbers into a Poisson calculation, we can calculate scoring event combinations and their probabilities for the remainder of any game. Together with the current score and the average points per score for each team, this is valuable information for determining the final outcome of the game.

A team that currently leads by 7 can now be assured of winning if they "win" the remaining mini match 2 scores to one. So for this particular combination of current score and predicted outcome, the team can be assigned the full probability. Other combinations still require "interpretation".

Here's the current version in action. I've stuck to win probability updates after each score, firstly to keep it brief, but also to avoid the need to add much of an additional field position correction.
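One very simple way to put the pieces together is sketched below. It is not the model's actual "interpretation" step: it just values every remaining score at an average points per score (4.7 here, from roughly 40 points over 8 or 9 events), treats the teams as independent, and splits exact ties down the middle. All function and parameter names are invented for the sketch.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected number is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)


def win_probability(lead, mean_a, mean_b, pps_a=4.7, pps_b=4.7, max_events=12):
    """Rough chance that Team A wins from here.

    lead           -- Team A's current lead in points (negative if trailing)
    mean_a, mean_b -- remaining scoring event expectancies for each team
    pps_a, pps_b   -- assumed average points per scoring event
    """
    win = tie = 0.0
    for i in range(max_events + 1):
        for j in range(max_events + 1):
            prob = poisson_pmf(i, mean_a) * poisson_pmf(j, mean_b)
            margin = lead + pps_a * i - pps_b * j
            if abs(margin) < 1e-9:
                tie += prob
            elif margin > 0:
                win += prob
    # Ties are shared equally here, which ignores overtime detail.
    return win + 0.5 * tie


# Hypothetical spot: leading by 7 with about one score each still expected.
print(win_probability(lead=7, mean_a=1.0, mean_b=1.0))
```

Because every score is worth the same average amount, this version cannot tell a field goal from a touchdown, which is exactly the gap the fudges described earlier are trying to fill.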

I've also added the in running probabilities from a UK betting site for comparison.

Philly@Arizona.

Pre game Philly were favoured at most places by about 4 points. To make Arizona the favourites, I think you had to take a very positive view about their home field advantage. The in running model favoured Philly pre game. I'll list the win probability of the current favourite and suffix it with a letter to denote who that fav was.

Score (Philly first)   Model     UK betting site
Pre game               60%(P)    62%(P)
0-7                    53%(A)    54%(A)
3-7                    51%(A)    53%(A)
3-14                   72%(A)    67%(A)
6-14                   60%(A)    62%(A)
6-21                   79%(A)    80%(A)
6-24                   91%(A)    87%(A)
13-24                  89%(A)    87%(A)
19-24                  67%(A)    65%(A)
25-24                  69%(P)    62%(P)
25-32                  93%(A)    85%(A)

And now for Baltimore@Pittsburgh.

Pre game the UK site was very bullish about Pittsburgh's chances, and they were favoured by around 6 points. The model also favoured the Steelers, but only by about 4.5 points.

Score (Balti first)    Model     UK betting site
Pre game               65%(P)    69%(P)
0-3                    74%(P)    73%(P)
0-6                    81%(P)    78%(P)
0-13                   95%(P)    89%(P)
7-13                   83%(P)    81%(P)
7-16                   93%(P)    89%(P)
14-16                  74%(P)    77%(P)
14-23                  99%(P)    98%(P)

The fascinating game for me is the Ravens/Steelers one. Despite getting within 2 points once the scoring started, the Ravens were still only a 26% chance to win immediately after that score. One of the strengths of this type of model is that it takes your pre game, long-term opinion of each team and sticks with it regardless. By the time they made it 14-16, just ten minutes remained. Baltimore's pre game scoring expectation had decayed to around half a score; Pittsburgh's was about a tenth higher. Plugging these new expectations into a Poisson, you find that the chance of Baltimore scoring and Pittsburgh not in what remained of the game was around 17%. Both teams scoring once each was about a 20% chance, but most "one score each" permutations still gave the Steelers an overall win. Most betting men seemed to agree that, despite the closeness of the scores, Pittsburgh were still big favourites... although I'll bet both sets of fans didn't feel quite so sure.
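As a quick check on the 17% figure, here is the arithmetic in Python. It reads "around half a score" as 0.5 and "a tenth higher" as 0.6, and takes "Baltimore scoring and Pittsburgh not" to mean exactly one Baltimore score and none for Pittsburgh; all three readings are assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected number is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)


ravens_left = 0.5     # assumed remaining expectancy for Baltimore
steelers_left = 0.6   # assumed remaining expectancy for Pittsburgh

# Baltimore score exactly once, Pittsburgh not at all.
p_ravens_only = poisson_pmf(1, ravens_left) * poisson_pmf(0, steelers_left)
print(round(p_ravens_only, 3))
```

With these inputs the result lands close to the 17% quoted above.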

1 comment:

dWj said...

In re Poisson distributions for football, cf. http://www.xanga.com/deanjens/689404492/item/