Thursday, January 13, 2011

Generalizing Matchup Win Probability

by Andrew Foland
GWP as Brian calculates it is the probability, before the game starts, that a team at a neutral site will win a game against an average team. Which is to say, it is the probability, based on information that existed before the game started, that the team wins a game that in all other respects would present a 50/50 chance to win.

We may find ourselves interested in a quantity that describes the time dependence of GWP as the game progresses. It is not necessarily obvious how to define such a quantity. For instance, one may believe that the GWP of a team is unchanging over the course of a game. However, it is certainly not the case that WP is unchanging over the course of a game. So it’s not obvious that GWP is unchanged. As we will see, in fact it is not.

Brian also calculates a matchup win probability based (more or less) on the GWP of the two teams, including factors such as home field advantage. Let us call this MWP.

Let us next stipulate that once a week, Brian calculates a quantity we will call MWP(0) for each game. That is, it is the win probability at time t=0 of the game. How can we consistently define an MWP at other times, and what might we mean by it?

Let us generalize the concept of MWP to MWP(t) as meaning, “the probability, based only on information prior to the start of the game, that the team will win the game, given that at time t in the game, all other generic indications are that it has a 50/50 chance of winning.” Now, how shall we estimate MWP(t), given MWP(0)?

Let us first stipulate that MWP(0), as defined by Brian, reflects an underlying unchanging quantity that does reflect team quality. (It may do so with more or less accuracy; let us just assume that it does reflect so on average.) The time-invariant quantity that best defines team quality in the advancednflstats world is EPA / play.

So, how shall we put together the concepts of EPA/play, time remaining, and MWP(0) to create MWP(t)?

In an earlier article, we established that we can connect MWP(0) to winning percentage through the medium of points. In particular, we showed that the model win probability is not really sensitive to whether we consider a team as, on average, likely to outscore its opponent by 9 points between now and the end of the game, or simply spot the team 9 points and expect it, on average, to neither outscore nor be outscored by its opponent. The crucial phrase there is on average.

Even though we spot the team a nine-point lead, and expect them to neither outscore nor be outscored on average, their winning percentage is not 100%. This is because there are still chance events to come in the course of the game.

There is an entire branch of mathematics devoted to handling these sorts of problems, known as “stochastic calculus”. It’s heavily used in pricing options and other financial derivatives, because the problem is the same: a stock might right now be $9 above the option's strike price, but the option value is different from $9 because of what can happen over the course of time.

We could write a stochastic differential equation to account for game scoring thus:

dPD=EPA/play * dplay + sigma*dRandom

where PD is the point differential, EPA/play is calculated in the databases, dRandom is an infinitesimal random-walk step, and sigma measures the magnitude of scoring uncertainty. Any number of financial calculus texts will show the solution to this equation as

PD(Nplay) = EPA/play * Nplay + Sigma * Sqrt(Nplay) * R

where R is a gaussian-distributed random number with zero mean and unit variance. The probability that the accumulated point differential exceeds a given amount P after Nplay plays is given by one minus the integral of the normal distribution up to a Z score of (P - EPA/play*Nplay)/[Sigma*Sqrt(Nplay)].
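
As a rough illustration, the exceedance probability just described can be written in a few lines of Python. This is only a sketch: the function name, the argument names, and the per-play numbers in the usage lines are made up for illustration and do not come from the databases.

```python
from math import sqrt
from statistics import NormalDist

def prob_exceeds(p, epa_per_play, n_plays, sigma):
    """Probability that the point differential after n_plays exceeds p.

    Assumes the differential is normal with mean epa_per_play * n_plays
    and standard deviation sigma * sqrt(n_plays), where sigma is the
    per-play scoring uncertainty.
    """
    z = (p - epa_per_play * n_plays) / (sigma * sqrt(n_plays))
    # One minus the normal integral up to z.
    return 1.0 - NormalDist().cdf(z)

# Hypothetical inputs: 0.05 EPA/play edge, 120 plays, per-play sigma of 1.3.
print(prob_exceeds(0.0, 0.05, 120, 1.3))   # chance of finishing ahead
print(prob_exceeds(10.0, 0.05, 120, 1.3))  # chance of finishing 10+ ahead
```

With no EPA/play edge and P = 0, the function returns exactly one half, which is the 50/50 case.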

This is getting close to allowing us to calculate an MWP(t). First: if only it had t instead of Nplay! This is easily fixed. Let us simply use some average value of plays / minute in the game; then we can define EPA / minute and rescale Sigma appropriately to time.

Second, and a little more subtly, we need to interpret the quantity PD-t*EPA/time in light of MWP(0). As described previously, we can convert MWP(0) into an average expected points to outscore by, over the course of the game; let’s call that Delta. Delta as a function of MWP(0), Delta(MWP(0)), was tabulated in an earlier article. We can calculate EPA/t as that Delta divided by 60 minutes.

Putting everything together, we find

MWP(t)=Pnorm([(60-t)*Delta(MWP(0))/60] / [Sigma * Sqrt(60-t)])

Where Pnorm(Z) is the integral of the normal distribution from –infinity to Z. (It’s related to erf through a collection of exceedingly stupid Sqrt(2)’s).
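
For readers who want to play with the formula, here is a minimal Python sketch of it. The names pnorm, mwp_t and sigma_per_sqrt_min are mine. The sketch assumes that Sigma in the formula is expressed in points per square-root minute, and that the 14.3-point figure estimated later in this article is a full-game standard deviation, so it is divided by Sqrt(60) to rescale it. Delta is passed in directly (9 points, as in the example earlier) rather than looked up from the earlier article's table.

```python
from math import erf, sqrt

def pnorm(z):
    """Integral of the standard normal from -infinity to z, via erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mwp_t(t, delta, sigma_per_sqrt_min):
    """MWP(t) for t in minutes (0 to 60), given the full-game edge delta."""
    remaining = 60.0 - t
    if remaining <= 0.0:
        # No time left: pre-game information no longer moves the result.
        return 0.5
    expected_margin = remaining * delta / 60.0
    spread = sigma_per_sqrt_min * sqrt(remaining)
    return pnorm(expected_margin / spread)

# Assumption: 14.3 points is a full-game sigma, rescaled to per-sqrt-minute.
sigma_min = 14.3 / sqrt(60.0)
for t in (0, 15, 30, 45, 59):
    print(t, round(mwp_t(t, 9.0, sigma_min), 3))
```

Under these assumptions the value at t=0 is just Pnorm(Delta/14.3), and the value drifts toward 0.5 as the clock runs down, since there is less and less time for the team's edge to show up.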

So, you ask, what is sigma? It can be estimated the following way. Look at pairs of divisional games within the same season, where the two teams play twice. The teams are the same, but the results are not. The differences in scoring margin in the pairings may be taken to estimate the uncertainty. Put all your square roots of two in the right place, and you measure a standard deviation for the uncertainty of 14.3 points.
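
The bookkeeping for that estimate might look like the sketch below. The margins are made-up numbers, not real game results, and the function name is hypothetical. It assumes the two games of a pair are independent draws with the same single-game sigma, so the difference of the two margins has variance 2*sigma^2, which is where the square root of two comes in.

```python
from math import sqrt
from statistics import stdev

def estimate_sigma(paired_margins):
    """Estimate single-game scoring sigma from (first, second) margin pairs.

    Each pair holds one team's scoring margin in the two meetings.
    Needs at least two pairs, because stdev is a sample estimate.
    """
    diffs = [first - second for first, second in paired_margins]
    # Var(first - second) = 2 * sigma^2 for independent games.
    return stdev(diffs) / sqrt(2.0)

# Made-up margins for five hypothetical divisional pairings.
pairs = [(7, -3), (-10, 14), (3, 3), (21, -6), (-4, 1)]
print(estimate_sigma(pairs))
```

Using the sample standard deviation rather than the raw mean square means a constant offset between the two games (for instance, if the first entry is always the home game) drops out of the estimate.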

Finally, we can test the whole procedure! The random stochastic walk model ought to be applicable to the average game WP! In particular, we replace Delta(MWP(0)) with the points found, at each point in time, needed to maintain probability isoclines. Using sigma=14.3, we find the following agreement between the WP model and the stochastic calculus model given here, shown in the plot. In the plot, the “WP Isoclines” come directly from the WP calculator, the “fit” comes from the MWP(t) model:

We find the agreement is satisfactory, capturing the main behaviors of the WP model with no fudge parameters whatsoever. Again, in general, we find things working to within roughly a point, indicating percentage accuracy of a few percent. This indicates we have a pretty adequate model of MWP(t).
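
One way to read the isocline comparison is sketched below: for a fixed win probability, find the lead a team would need at each minute if neither side is expected to outscore the other from then on. This is my own simplified reading under a zero-drift assumption, not necessarily the exact procedure behind the plot. The function name is hypothetical, and 14.3 points is again assumed to be a full-game sigma.

```python
from math import sqrt
from statistics import NormalDist

def lead_for_wp(p, t, sigma_per_sqrt_min):
    """Lead in points at minute t (0 to 60) that gives win probability p.

    Assumes zero expected scoring difference over the remaining time,
    so WP = Pnorm(lead / (sigma * sqrt(60 - t))).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    remaining = 60.0 - t
    return sigma_per_sqrt_min * sqrt(remaining) * NormalDist().inv_cdf(p)

# Assumption: 14.3 points is a full-game sigma, rescaled to per-sqrt-minute.
sigma_min = 14.3 / sqrt(60.0)
for t in (0, 15, 30, 45):
    print(t, [round(lead_for_wp(p, t, sigma_min), 1) for p in (0.6, 0.75, 0.9)])
```

In this sketch each isocline shrinks like Sqrt(60-t): the later in the game, the smaller the lead needed to hold a given win probability.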

1 comment:

James said...

These last two posts were fantastic.
