by Tom Baldwin
Back in 2008 Brian covered a well-known ratings system first conceived by the man whose name it now bears. That article offers great insight into both the good and bad points of Elo ratings, and is certainly worth a read in preparation for this one.
Recently I got thinking about Elo ratings again, and realized that one of their limitations, that they do not consider the score of games, can be overcome. When the original Elo system considers a game's outcome, it does so on the basis that the game ends with a binary result, a win or a loss, but that is not all of the information available. As we have seen, a team like last year's Falcons can appear far stronger than they truly are when only their win-loss record is considered, because they were on the right end of luck. Close wins are much more about luck than skill: winning by ten points is a far stronger indication of one team's supremacy than winning by one. But how can we quantify this? We need to answer quite a simple question: when two completely equal teams play each other, what is the chance that, by luck alone, team A beats team B by X or more points? In this instance it will always be by luck alone; since both teams are technically equal, their levels of 'skill' completely cancel each other out.
That question is simple to ask, but not so simple to answer. Luckily, I had already done the legwork for it months ago. Back in January, before the playoffs, I built a simulation model of an NFL game. Within that model I created two identical (league-average) teams, characterised by their chances of scoring on a drive, and I allowed for a variable that added X points to the score of whichever team I chose after the game had been simulated. By simulating thousands of games with different values of X, I was able to build a distribution describing how likely it is that a team winning by X points did so not because of luck, but because of a higher level of skill. For example, if I set X to 5, always added it to the score of team A, and observed in simulations that team B still won 40% of the time, that would mean team A could be expected to beat a completely equal-strength opponent by more than 5 points 40% of the time by luck alone. So, in the new Elo model, if team A wins by 6 points, rather than treating the win as absolute and awarding them 1 minus their expected win probability, I would instead award them 0.6 minus their expected win probability.
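The simulation described above can be sketched in a few lines. This is a minimal Monte Carlo version, not the article's actual model: the drive count and per-drive scoring probabilities here are illustrative assumptions (the fitted values are not given), with each drive independently ending in a touchdown, a field goal, or no score.

```python
import random

def simulate_game(drives=12, p_td=0.20, p_fg=0.13, rng=random):
    """Simulate one game between two identical, league-average teams.

    Each team gets `drives` possessions; each drive independently ends
    in a touchdown (7 points), a field goal (3 points), or no score.
    These drive counts and probabilities are illustrative guesses, not
    the article's fitted values.
    """
    def team_score():
        score = 0
        for _ in range(drives):
            r = rng.random()
            if r < p_td:
                score += 7
            elif r < p_td + p_fg:
                score += 3
        return score
    return team_score(), team_score()

def margin_credit(x, n_games=20000, seed=42):
    """Estimate the credit for a win by x points: the fraction of
    luck-alone games between equal teams decided by fewer than x
    points (team A's margin below x). A margin that luck alone rarely
    produces earns a credit near 1; a razor-thin win earns about 0.5.
    """
    rng = random.Random(seed)
    smaller = 0
    for _ in range(n_games):
        a, b = simulate_game(rng=rng)
        if a - b < x:
            smaller += 1
    return smaller / n_games
```

Because the two simulated teams are identical, the margin distribution is symmetric around zero, so a one-point win earns a credit close to 0.5 and a blowout approaches 1.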
That change can have dramatic implications for the treatment of a game's outcome. Where before a team that entered the game with an expected winning probability of 0.9 and went on to win by 6 points would see their rating increased by 0.1 (times the update factor), now they would see it reduced by 0.3, which is instead awarded to their opposition. What this method does, which the original method did not, is make a distinction between a team winning and a team performing to expectations.
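The adjusted update is just a standard Elo step in which the "actual" score is the margin-based credit instead of a binary 1 or 0. A minimal sketch, using the standard logistic Elo expected-score formula and an illustrative K-factor of 20 (the article does not specify one):

```python
def expected_score(rating_a, rating_b):
    """Standard logistic Elo expected score for team A."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, actual_credit, k=20.0):
    """One zero-sum Elo update where `actual_credit` is the
    margin-based score in [0, 1] rather than a binary win/loss.
    K is an illustrative update factor, not a value from the article.
    """
    change = k * (actual_credit - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change
```

With an expected score of 0.9 and a 6-point win worth a credit of 0.6, the update is K × (0.6 − 0.9) = −0.3K, matching the example above: the heavy favourite loses rating despite winning.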
In a follow-up to this article I will further improve the Elo system by separating home performance from away performance, and post results on the predictive power of the new system compared with the old one.
Tuesday, May 3, 2011