Friday, December 2, 2011

Betting Market Power Rankings

by Michael Beuoy

The purpose of this post is to use the point spreads from recent weeks of the season to derive an implied power ranking. Basically, the point is to try to figure out what the betting market thinks are the best and worst teams in the NFL. From a broader perspective, I hope to provide insight into how the betting market “thinks” in general. One result that emerged from this analysis was a measure of how much the betting market reacts to the result of a particular game.

The challenge in deriving a power ranking from the point spreads is that the point spread only tells you the relative strength of the two teams. For example, Green Bay is favored by 7.0 points on the road against the NY Giants this week. We know that home teams are favored on average by 2.5 points, so after removing the home team bias, the betting market appears to think that Green Bay is 9.5 points better than the Giants. New England is favored by 21(!) points at home against Indianapolis. So the betting market thinks that New England is 18.5 points better than Indianapolis.

The question is, does the betting market think New England or Green Bay is the better team? It’s impossible to answer just using the spreads from this week (you have 32 unknowns and only 16 equations). My approach below is to look back over the past five weeks of point spreads and results to come up with a best fit ranking, where the ranking is calibrated such that it best predicts the point spread according to the following formula:

Point Spread = Home Team Rank - Visiting Team Rank + 2.5
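
The home-field adjustment in the two examples above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up here, and spreads are expressed from the home team's point of view (positive means the home team is favored).

```python
HOME_FIELD = 2.5  # average home-team edge in points, as stated above


def predicted_spread(home_rating, away_rating, home_field=HOME_FIELD):
    """Points the home team is expected to be favored by."""
    return home_rating - away_rating + home_field


def neutral_site_edge(home_spread, home_field=HOME_FIELD):
    """Home team's edge over the visitor once home field is removed."""
    return home_spread - home_field


# Green Bay is favored by 7.0 on the road, so the Giants' home spread is -7.0.
giants_vs_packers = neutral_site_edge(-7.0)   # negative: Green Bay is the better team
# New England is favored by 21.0 at home against Indianapolis.
patriots_vs_colts = neutral_site_edge(21.0)
```

With one such equation per game and 32 unknown ratings, a single week cannot pin down every team, which is why several weeks of spreads are needed.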

I figured I would cut to the chase and provide the rankings themselves, and save the methodology explanation for the end. I followed the format of the Advanced NFL Stats (ANS) Team Efficiency rankings, and also provided the actual ANS rankings as a point of comparison.

Here is a glossary of terms:

LSTWK - The betting market rank as of the prior week (using the same methodology). It’s interesting to see who the big movers are.
GPF - Stands for Generic Points Favored. It’s what you would expect a team to be favored by against a league average opponent at a neutral site.
GWP - Stands for Generic Win Probability. I converted the GPF into a generic win probability using the following formula: GWP = 1/(1+exp(-GPF/7)). This gives a more direct comparison to the ANS rankings.
ANS RNK - The Advanced NFL Stats Team Efficiency rankings for the same week (week 12 in this case)
ANS GWP - The Advanced NFL Stats Generic Win Probability for the same week.
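
The GPF-to-GWP conversion in the glossary is a logistic curve with a scale of 7 points. A minimal Python sketch (the function names are illustrative, not taken from the R code mentioned in this post):

```python
import math


def generic_win_probability(gpf, scale=7.0):
    """GWP = 1 / (1 + exp(-GPF / 7)), as given in the glossary."""
    return 1.0 / (1.0 + math.exp(-gpf / scale))


def generic_points_favored(gwp, scale=7.0):
    """Inverse of the conversion above: recover GPF from a win probability."""
    return scale * math.log(gwp / (1.0 - gwp))


even_matchup = generic_win_probability(0.0)      # a league-average team
seven_point_team = generic_win_probability(7.0)  # a team 7 points better than average
```

A league-average team (GPF of 0) comes out at exactly 50%, and a team favored by 7 over an average opponent comes out at roughly 73%.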

Here are the rankings:


Some observations:
The top team and bottom team shouldn’t come as any surprise. In addition, there is the proverbial “50 feet of crap” (or 4 points) between the Colts and the next-worst team.
Despite San Francisco’s place near the top of the “conventional” power rankings (ESPN, CBS, etc.), the market has them ranked much lower at number 8; not as low as the ANS rank of 13, but in the ballpark.
I was surprised to see New England ranked so closely to Green Bay (they were actually a half point ahead of them last week).


The first step was to see how many prior weeks of point spreads I had to feed into the model in order to get an optimized estimate of the point spreads for future games. The drawback of using prior weeks is that you’re using stale information. A point spread from a few weeks ago will not accurately reflect the market’s latest assessment of each team’s strength. I attempted to address this somewhat by using a recency weighted average. If I was using 7 weeks of spreads, the most recent week would get a weight of 7, the week before a weight of 6, and so on. This allowed me to arrive at an answer while still giving preferential treatment to the more recent market estimates. Through trial and error optimization, I found that using the most recent five weeks of point spreads produced the lowest mean squared estimate error of the point spread for the coming week. The calculation itself is equivalent to a weighted linear regression with 32 dummy variables, 1 for each team.
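
As a rough illustration of the recency-weighted fit described above, here is a small Python sketch. It is not the R code mentioned in this post. Instead of a matrix regression it uses a simple iterative solve (each team's rating is repeatedly reset to the weighted average of what its games imply), which should converge to the same weighted least-squares fit when the schedule connects all teams. The function name, the three-team data, and the `weeks_ago` encoding are all hypothetical, and bye-week handling is left out.

```python
HOME_FIELD = 2.5  # assumed average home edge in points


def fit_ratings(games, home_field=HOME_FIELD, sweeps=500):
    """Fit zero-centered ratings to spreads.

    games: list of (home, away, home_spread, weeks_ago), where weeks_ago=0
    is the most recent week and home_spread > 0 means the home team is favored.
    """
    n_weeks = max(g[3] for g in games) + 1
    ratings = {team: 0.0 for g in games for team in g[:2]}
    for _ in range(sweeps):
        for team in ratings:
            num = den = 0.0
            for home, away, spread, weeks_ago in games:
                w = n_weeks - weeks_ago  # most recent week gets the largest weight
                if team == home:
                    num += w * (spread - home_field + ratings[away])
                    den += w
                elif team == away:
                    num += w * (ratings[home] - (spread - home_field))
                    den += w
            ratings[team] = num / den
        # Spreads only fix rating differences, so pin the league average at zero.
        mean = sum(ratings.values()) / len(ratings)
        for team in ratings:
            ratings[team] -= mean
    return ratings


# Hypothetical three-team league whose spreads are consistent with
# ratings A = +5, B = 0, C = -5.
toy_games = [
    ("A", "B", 7.5, 0),
    ("B", "C", 7.5, 1),
    ("C", "A", -7.5, 2),
]
toy_ratings = fit_ratings(toy_games)
```

Because the toy spreads are perfectly consistent, the fit recovers the underlying ratings regardless of the weights; with real spreads the weights decide how much the older, staler lines pull on the answer.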

How the Betting Market Reacts to Game Results

Although the approach above generated a set of rankings, it ignores some potentially useful information that could be used to better match the coming week’s point spreads. For example, the week 13 rankings used the point spreads from weeks 9 through 13. In week 9, New England was favored by 9.5 points over the New York Giants. However, the Giants ended up winning by 4 in that game. So, the outcome of the game deviated from the market’s expectation by 13.5 points. One would expect that the market would factor that result into future estimates of both New England’s and New York’s strength. I assumed that the betting market would recalibrate itself according to the following formula:

revised “best estimate” spread = original spread + (credibility coefficient) x (deviation from expected)

I then determined what that credibility coefficient (CC) was by trial and error optimization. I found that a coefficient of 15% generated the most accurate prediction of the coming week’s spreads. In other words, the betting market appears to treat the outcome of each game with 15% credibility when revising its estimates of each team’s strength. So, in the New England/New York example above, if those two teams had been scheduled to play each other at New England again, the new spread would have been revised down from 9.5 points to about 7.5 points ( = 9.5 + 0.15 * (-9.5 - 4)).
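
The credibility adjustment can be written out as a short Python sketch. The function name is illustrative, and both the spread and the final margin are taken from the home team's point of view.

```python
def revised_spread(original_spread, actual_margin, credibility=0.15):
    """Blend the original spread with the game result.

    deviation = actual margin minus the margin the spread implied.
    """
    deviation = actual_margin - original_spread
    return original_spread + credibility * deviation


# New England was favored by 9.5 at home and lost by 4.
new_spread = revised_spread(9.5, -4.0)
```

For the New England example the deviation is -13.5 points, and 15% of that moves the spread from 9.5 to 7.475, which rounds to the 7.5 quoted above.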

Prediction of This Week’s Point Spreads

See below for a comparison of how well the ranking methodology predicted this week’s point spreads. Note that this uses rankings that factor in the results from last week’s games, but does not factor in the spreads of this week’s games into the rolling 5 week average (this keeps the estimate independent):

Game         Predicted   Actual   Difference
ATL @ TEX          6.0     -2.0         -8.0
BAL @ CLE         -6.0     -6.5         -0.5
CAR @ TB           5.0      3.5         -1.5
CIN @ PIT          8.5      6.5         -2.0
DAL @ ARZ         -6.5     -4.5          2.0
DEN @ MIN          2.0      0.0         -2.0
DET @ NO          10.0      8.5         -1.5
GB @ NYG          -4.5     -7.0         -2.5
IND @ NE          21.5     21.0         -0.5
KC @ CHI           8.5      8.0         -0.5
NYJ @ WAS         -5.0     -3.0          2.0
PHI @ SEA         -4.0     -3.0          1.0
RAI @ MIA          2.5      3.0          0.5
SD @ JAC           1.0     -2.5         -3.5
STL @ SF          11.0     13.5          2.5
TEN @ BUF          0.5      1.5          1.0

The biggest miss in the line prediction is on the ATL/TEX game where it appears that the market values Matt Schaub’s talents over his replacement by a significant margin. I think this may also be inflating ATL’s overall ranking somewhat. Its favorable point spread over Houston is being compared (on a recency weighted basis) against point spreads when Matt Schaub was playing.
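
The size of the misses in the comparison above can be summarized with a mean absolute error. The Python sketch below assumes the first number in each row is the predicted home spread and the second is the actual spread; the variable names are illustrative.

```python
# (game, predicted home spread, actual home spread), copied from the comparison above
rows = [
    ("ATL @ TEX", 6.0, -2.0), ("BAL @ CLE", -6.0, -6.5),
    ("CAR @ TB", 5.0, 3.5), ("CIN @ PIT", 8.5, 6.5),
    ("DAL @ ARZ", -6.5, -4.5), ("DEN @ MIN", 2.0, 0.0),
    ("DET @ NO", 10.0, 8.5), ("GB @ NYG", -4.5, -7.0),
    ("IND @ NE", 21.5, 21.0), ("KC @ CHI", 8.5, 8.0),
    ("NYJ @ WAS", -5.0, -3.0), ("PHI @ SEA", -4.0, -3.0),
    ("RAI @ MIA", 2.5, 3.0), ("SD @ JAC", 1.0, -2.5),
    ("STL @ SF", 11.0, 13.5), ("TEN @ BUF", 0.5, 1.5),
]


def mean_absolute_error(rows):
    """Average size of the miss, ignoring direction."""
    return sum(abs(actual - predicted) for _, predicted, actual in rows) / len(rows)


mae = mean_absolute_error(rows)
biggest_miss = max(rows, key=lambda r: abs(r[2] - r[1]))
```

On these 16 games the mean absolute error comes out just under 2 points, and ATL @ TEX is the largest single miss.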

If there’s interest, I can produce these weekly. I’ve got this boiled down to a quick piece of R code (which anyone is welcome to if they’re curious about the details of the methodology).


Boston Chris said...

Very fascinating. I definitely wouldn't mind checking these out each week.

TBD said...

Definitely interested in seeing this weekly. The results look fairly similar to the "betting expert" rankings that ESPN insider posts from "Vegas gambling experts" every week

Steven said...

I might be interested in the R code; I'd definitely be interested in where to find easily-extracted historical point spreads.

Jeff Fogle said...

Very happy to see this becoming part of the mainstream analytics conversation. Quick notes:

*Hope you'll consider also tracking game win probabilities using the no-juice moneylines (splitting the difference between favorite price and dog payback). Talked about this in a comment to BB last week. Would allow for more direct comparisons to BB's work beyond just looking at the rankings.

*There's a problem here with the methodology, I think, when backup quarterbacks come into play. The market doesn't really "glide" over five weeks to the new place for the backup (though it can also glide based on developments with the backup). So, maybe, using the scale from the chart above:

Houston with Schaub: 5.0
Houston with Leinart: 3.0
Houston with Yates: -1.0

Chicago with Cutler:+2.0
Chicago with Hanie:-1.0

Oddsmakers have different power ratings for each starting quarterback...or at least have a mental adjustment ready (partial disclosure: I've been ghostwriting on and off for some oddsmakers over the last two decades). We're in the midst of a tricky sequence with a few teams in recent weeks. You might consider computing separate ratings for all 32 teams with the 2 QB's most likely to get starts.

*Atlanta looks to be a bit warped by what's happened with this week's Houston game. Based on their other recent games, and how the opponents rank above for you, Atlanta would be:

3.5 against Minnesota (-10 at home)
0.0 vs. Tennessee (-3.5 at home)
5.0 vs. New Orleans (-1 at home)
-0.5 vs. Indy (-6.5 on the road)

I'm using 3 rather than 2.5 because oddsmakers tell me they generally use a blanket 3...though I do think that some very poor teams may only get 2 to 2.5 in some cases. That would be a composite 2.0 above for Atlanta as a four-game average...which is consistent with where you say they ranked last week. I don't personally believe the market has Atlanta as high as you're representing.

*Just as an FYI thing as you go forward. There are different components within the market that tend to give away their thoughts depending on what stage you are in the week.

Opening Lines: Oddsmakers assessments
First Moves: Professional wagerers assessments (called "sharps" in Vegas lingo, they will bet for value on openers if they believe a number is bad...the public doesn't bet early, tending to wait until the weekend to really become a factor)
Weekend Moves: Generally those inspired by public money...though it can get messy because the biggest sharps will try to manipulate the line for value...and then will jump in with both fists if they get their number.

The point is that it can be tricky truly defining the "market" because it's in transition through the week. If you only use openers, it's mostly the oddsmakers. If you only use Thursday lines, the public hasn't cast much of a vote yet. If you only use closers, then they can get warped the wrong way if there's a final hour self-defense mechanism on an extremely one-sided game (in terms of money).

A common recommendation is to use what's called "widely available" Sunday morning lines a couple of hours before kickoff. Nothing's perfect, but that's a good spot to settle in. When you hear analytical types in the field talk about the perfected market theory, they're often referring to where a line kind of finally locks in after the opener has been shaped by the smart money. They would suggest that THAT line can't be beat because the value has been bet out of it. Not everyone believes in the perfected market theory though. The sharps who do attack the openers...and then attack late if public sentiment has moved a number away from what they believe the right spot is on game day.

Not sure if this helps the project...but wanted to throw it out there for you.

I would add my vote to those who would like to see this study or variations of it posted weekly...

Jeff Fogle said...

Sorry wasn't clear there in the Atlanta comment. I was using 3.0 for home field rather than 2.5 for home field based on what oddsmakers say they typically use...

Tom said...

May I suggest, further to Jeff's idea of also comparing moneyline implied probabilities, that you do not take said probabilities from a bookie, but take them from Betfair, as then the juice is not an issue and you can take the (rather more accurate) midpoint between back and lay prices.

For instance, Betfair has Atlanta with a 57% chance of beating the Texans. Betfair is (idealistically) driven by the efficient-market hypothesis, so it avoids the bias that bookies can play with.

Tom said...

And by bias, I mean the fact that the bookies will put the juice where they think makes most sense, skewing the probabilities if you just split the juice evenly.

Jim A said...

I've been publishing a similar ranking system based on the point spreads for the last few years (see link on my name). My methodology is just to use iterative SRS (as described on the p-f-r blog) with point spreads rather than game results. For the recency issue I found that a simple 3-2-1 weighting of the last three games gave the best fit. Any farther back made the fit worse in my experiments.

I believe another difference is that I don't factor in last week's game results, but I do include this week's point spreads. At the time I implemented this, I didn't really care about predicting the point spreads or measuring the market reaction to results as you have. Rather, I only wanted to compare teams that weren't playing each other this week. One problem this presents is dealing with bye weeks. I didn't want to simply carry over a team's rating through its bye week because, in theory, incorporating updated spreads for non-bye teams should give you additional information about the strengths of the bye teams via opponent adjustments. In practice, bye week teams' ratings tend to be a little too volatile. Another weakness is that the largest individual game spreads seem to skew the ratings more than they should and those teams tend to have the greatest error relative to the actual spreads. I never got around to playing with various solutions to these problems.

Anyway, I found this to be a very interesting exercise and am glad to see someone else has too!

Mac said...

I use the same point-spread-to-moneyline conversion as you (ml = exp(ps/7), or ps = 7*ln(ml)) in my power rankings (pointshare). However, this means that if team A is 5 points better than B, and B is 5 points better than C, and you then say A is 10 points better than C, the moneyline odds would give you a different answer if you used a GWP or log5 approach and then converted that win probability to a point spread. The relationship is not linear, so a moneyline approach would have team A as less than a 10-point favourite. Could you redo your analysis by first converting to odds, then working out the rankings, then converting back to point spreads?

j holz said...

To avoid including stale information, it's best to look at future lines, not past ones. Some websites and Vegas sportsbooks offer odds on next week's games and "games of the year"; these odds will reflect the cutler and schaub injuries and also whatever we learned by watching last week's games.

This is a good concept, but you're measuring the wrong target.

Jim A said...

j holz, that was my thought, too, but there isn't enough "interconnected" data between teams not playing each other to use only future lines. One idea I had that might help is to incorporate the odds to win the Super Bowl in addition to the current week's spreads, although you'd have to be careful to recognize that divisional/conference alignments can affect those. The tradeoff is you can either treat the future games spreads as representing immutable fact regarding the relative strengths of those teams and fit the rest of the teams around that the best you can. Or you can spread the errors around more evenly, which is the method I chose.

I found that my 3-2-1 weighting of 50% future lines and 50% past lines yielded a reasonably accurate approximation.

Justin said...

I would be +1 for making this a weekly feature :)!

Jim Glass said...

This is all very excellent, and I second the idea that it would be nice to see this ranking posted somewhere every week, making the weekly changes in Vegas's opinion visible.

One minor suggestion: remember when converting the Vegas point spread to win probability the over/under matters. A 6-point spread with an over/under of 37-points expected to be scored has a higher win probability than one with 47 points expected to be scored. So when converting the spread to win probability I don't use the exponential function but instead figure the projected score (spread applied to over-under) and then take the Pythagorean win expectation.

The difference can realistically be equivalent to a couple points of spread (more at the extremes) in a given game. Over a few weeks it washes out so I wouldn't worry about it. But if one is using a small sample of only three or four weeks plus over-weighting the last, it might make a visible difference for some teams.

That said, I probably still wouldn't worry about it. A difference significant enough to be visible isn't necessarily significant enough to be significant. False precision is something always to beware against.

IMHO, the value of objective ranking/rating systems like this isn't their great precision (which is impossible in a season of only 16 games, even less so only part way through the 16) but how they can make plainly visible to the naked eye something one might have missed otherwise. If "Vegas ratings" (or ANFL Stats ratings) rate a team by a bunch a points different than I would that's interesting, if by 1/2 a point or 1 1/2 points that's not so interesting.

So while I think it probably doesn't make any practical difference, I mention it just for the sake of logical consistency and because one might want to check the scale of the difference it makes, to be sure. After all, using a home field advantage of 2 points or 3 points doesn't make much difference and gets washed out quickly too, but people put a lot of effort into calculating that. (But then, they probably gamble a lot more money than I do.)

Michael Beuoy said...

Thanks to everybody for the feedback.

First off, my source for the spread information is It's also a useful site for game by game statistical information as well.

I'm glad there's interest in this. I will try to get these submitted to the site each week soon after the weekly ANS Efficiency rankings are published.

Here is a link to the R code. Any thoughts or suggestions on the methodology are welcome. Link:

Jim A - I did some googling while I was developing this approach to see if something like this existed. I wasn't able to find anything, but I'm not too surprised that I wasn't the first to try this.
I ran into the exact same difficulty when it came to bye weeks. Teams on bye seemed to have their ranking magnified (good teams moved up, bad teams moved down). The approach I eventually settled on was to normalize each team's weights to 1.0. If a team was missing a week due to a bye, the weights for the other weeks would get magnified to compensate. I wasn't thrilled with the solution, but it seemed to work well enough.
I took a look at your rankings. Your rankings better match the NO-NYG spread than mine, but mine got closer on the IND-NE spread. Care to put our approaches head to head for upcoming weeks? :)
I will try your weighting approach to see if I get a better fit. Like I said, mine was developed by trial and error and it's very possible I missed a better approach.

Jim Glass - I completely agree. I knew my approach was not perfect, and modelling error of a point or two was unavoidable. Fine tuning each decimal point was not my goal.

However, I am intrigued by the idea of using future weeks spreads to the extent they're available. I may take a deeper look at that.

Anonymous said...

Outstanding concepts. I would love to see this as a regular feature

The Wizard said...

Great stuff as always. I would love to see weekly breakdowns as well as any R code.

Mike D said...

I'd like to see them posted weekly as well.

I have some suggestions/questions (full disclosure: I am NOT a statistician & not even 100% sure I spelled it correctly).
Wouldn't it be possible to tweak the equation to its highest probability of accuracy by using only historical data (i.e., NFL 2010) then applying it to NFL 2011 to see if it translates? Instead of tweaking it week to week or including the previous 4 or 5 gms, is it possible to use the now static historical data as something like laboratory conditions to create a better model?

Is there a scientific reason why the reality of static historical data isn't the primary source? Scientifically, is comparing NFL 2009 & 2010 or 2010 & 2011 like comparing apples & oranges instead of apples to apples?

Just a lay person chiming in...

Anonymous said...

This should definitely be a weekly feature. Good stuff!

Michael Beuoy said...

Mike D - View the point spreads as stock prices. If you wanted to know the state of Google right now, you would look at their stock price today, you wouldn't average their stock price over the past few years.

Basically, all I'm doing here is trying to figure out the "stock price" of each team. But instead of getting direct quotes off of the NYSE, all I have available is the difference in stock prices between different companies. And those companies change each day.

By necessity, I'm forced to look back to "old" stock prices just so I have enough connections between the various teams in order to get a proper comparison.

Jeff Fogle said...

Market Price snapshot an hour before kickoff (with win probability estimated based on the no-juice moneyline taken from a prominent offshore locale)

Buffalo -1 over Tennessee (Buffalo 52%)
Chicago -8 over Kansas City (Chicago 78%)
Miami -3 (-120) over Oakland (Miami 62%)
Pittsburgh -7 over Cincy (Pittsburgh 75%)
Baltimore -6.5 over Cleveland (Balt 74%)
NY Jets -3 over Washington (NYJ 60%)
Atlanta -1 over Houston (Atlanta 52%)
Carolina -2 over Tampa Bay (Carolina 56%)
***Note that Freeman is out for TB
New Orleans -8.5 over Detroit (NO 78%)
Minnesota -1 over Denver (Minnesota 52%)
San Francisco -14 over St. Louis (SF 89%)
Dallas -4 over Arizona (Dallas 65%)
***Kolb is back for Arizona
Green Bay -6.5 over NYG (Green Bay 71%)
New England -20 over Indy (NE 95%)

The TB/Carolina line was TB -3 at home earlier this week, suggesting equality. A 5-point move would mean TB with Josh Johnson is 5 points worse than Carolina, and wherever anyone had them with Freeman.

The Dallas/AZ line was Dallas -6.5 when it was thought Skelton was still playing. So, Arizona is 2.5 points better with Kolb than Skelton in the market's view...

Looks like Yates got a little respect in the market today, as Houston is now only +1 instead of +2. They should be 4 points worse than Atlanta in a market snapshot at the moment rather than just one or two I'd think.

Don't have time at the moment to compare differences here to MB's very interesting work up above...or to the win probabilities for the week from BB (life can be busy in the hour before kickoffs!). Wanted to throw down a live market look in the last hour since so many have posted interest in this kind of material. Might influence future discussions at the very least. Enjoy the games!

SportsGuy said...

I've been doing spread ratings since the mid-80s. The error distribution you'll see over the long haul is precisely the shape you're seeing this week, recency adjustments or not.

I just don't understand the obsession with using ranks. Why convert ranks to points when you can use points to start with?

Michael Beuoy said...

SportsGuy - That's exactly what I did. The model output is the "Generic Points Favored". The points are converted to a ranking, not the other way around.

Tom said...

Jim Glass, in all my work on NFL stats I have found no evidence that the strength of a spread is total dependent, this is also true of the NBA and NHL. From my observations this is due to covariance between team scoring.

SportsGuy said...

"Point Spread = Home Team Rank - Visiting Team Rank + 2.5"

That is kinda where I get the idea you're figuring ranks first then converting to points.

Have you run your algorithm on past data?

Michael Beuoy said...

SportsGuy - Sorry about that. I was being loose with my terminology. That should read "Point Spread = Home Team GPF - Visiting Team GPF + 2.5".

If you're asking if I backtested the approach, the answer is yes (it's how I arrived at optimized credibility coefficient and number of weeks).

Running the algorithm on past data generates a Mean Absolute Error (MAE) of about 1.7 points in predicting the spread for the upcoming week. Unfortunately, I had no benchmark to compare that to.

SportsGuy said...

I guess what I meant was how far back you tested. I'm interested in your error distribution. Do you have that data handy?

Whamp said...

I would be interested in seeing this weekly

Michael Beuoy said...

I don't have anything handy on error distribution, but here's the MAE for the past 4 seasons:

2010 - 1.8
2009 - 1.7
2008 - 1.5
2007 - 1.6

Jim A said...

Mike, my ratings ended up on after I found a similar betting market system on that site. Some interesting discussions of methodology resulted in that site's owner asking me to contribute my own system to his site. This was around November 2009, as I recall. So I don't even claim to be the first to publish such a system. It wouldn't surprise me if there are others out there, too.

I've always thought I or someone else could come up with more accurate results. In particular, I wondered how much using SRS limited the results as opposed to a more complex computer rating system. Your previous work on opponent adjustments may be useful in this regard. I kind of lost interest in working on this myself after the initial thrill and have since moved on to other projects. Feel free to use my ratings as a benchmark or in any way that is helpful. Maybe I'll look at this again if I get a chance; as I recall my system's MAE was in the 0.7-0.8 range, but again my goal was slightly different than yours and using future games is, in a sense, cheating. It would be interesting to see a more detailed analysis of how the systems compare. I definitely look forward to seeing what you come up with next.

The Wizard said...

To Jim A, I don't quite follow your reasoning. You say you do not factor in last week's results (but do include this week's point spreads).
You state "At the time I implemented this, I didn't really care about predicting the point spreads or measuring the market reaction to results as you have. Rather, I only wanted to compare teams that weren't playing each other this week." In what way are you trying to compare them, then? By talent difference? I guess I miss your point.

and to j holz, you said to look at future spreads. Well, that to me seems to defeat the purpose. The whole purpose is to predict point spreads based on past point spreads, isn’t it? Yeah, a future one will essentially tell you what the bookies are thinking, but you want to see how you can predict what the bookies are thinking.
For the purpose intended, I would have done exactly what the author did with perhaps only weighting type differences.

Jim A said...

My point is that I was mainly interested in estimating how the bookmakers would set the line in a hypothetical game between any two teams. For example, my ratings estimate that if Green Bay and New England played on a neutral field right now, the Packers would be favored by 2.5 points. So basically, I'm letting the bookmakers do the work for me and using their expertise in setting lines. Future spreads are more up-to-date than past spreads and, in theory, should be more accurate in terms of predictive power (if for no other reason than they account for recent injuries).

Trying to predict future spreads based on past spreads is a similar but not identical exercise. That's closer to what bookmakers actually do, and such a model would be particularly useful if you were applying for a job with Las Vegas Sports Consultants (the company that provides initial lines to most books).

FYI, my rankings for week 14 are up. The MAE for the week is 0.41, which is particularly low because the teams are pretty well-connected--no bye weeks this week or previous two weeks.
