Sunday, June 8, 2014

Infinite Field Football

by Michael Nahas


Physicists are always talking about ideal environments, like infinite planes with zero friction. What if we extended that to football and had teams play on an infinite field with an infinite clock?

How would it work? One team would start with the ball, drive some number of yards downfield, and then turn the ball over to the other team. The other team would then take the ball, drive some number of yards in the other direction, and turn the ball back over to the first team. And so on.

They would go back and forth, with one strange exception: the ball carrier is the fastest guy on the field and he gets through the defense. Can he run for infinity? The short answer is yes. But, for the moment, let's assume that doesn't happen and, later, I'll explain how to handle it when it does.

It's pretty obvious that "winning" on the infinite field means that your team, when it possesses the ball, moves the ball farther than the other team does when it possesses the ball. So, the thing we want to compute is the expected length of a drive.

The most informative way to calculate that is to break the drive into series of downs. Each series either ends in a new first down or a turnover. So let's define:
P = probability of a series ending in a new first down
D1 = expected length of a series that ends in a new first down
D4 = expected length of a series that doesn't end in a first down

Now, we can build an equation for the expected length of a drive by looking at the number of first downs that occurred in it.

Exactly 0 first downs occur (1-P) of the time
Exactly 1 first down occurs P*(1-P) of the time
Exactly 2 first downs occur P*P*(1-P) of the time
...
Exactly N first downs occur P^N*(1-P) of the time

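In probability, this is the geometric distribution. The Wikipedia page gives us the mean: P/(1-P). (NOTE: if you check the Wikipedia page, their answer is (1-p)/p. This is because their p is equal to my 1-P.) Each of those first downs contributes D1 on average, and every drive ends with exactly one series that fails to reach a first down, which contributes D4. So, knowing the expected number of first downs, we can write the expected length of a drive: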

Expected length of a drive = (P/(1-P))*D1 + D4
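
As a quick sanity check on the formula above, here is a minimal Python sketch. The function names are mine, and the values D1 = 12 yards and D4 = 5 yards are made-up numbers chosen only for illustration, not measured data. The last function cross-checks the closed form P/(1-P) by summing N * P^N * (1-P) directly.

```python
def expected_first_downs(p):
    """Mean number of first downs in a drive: P / (1 - P)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def expected_drive_length(p, d1, d4):
    """Expected drive length: (P / (1 - P)) * D1 + D4."""
    return expected_first_downs(p) * d1 + d4


def expected_first_downs_by_sum(p, terms=10_000):
    """Cross-check: sum N * P^N * (1 - P) over N = 0 .. terms - 1."""
    return sum(n * p**n * (1 - p) for n in range(terms))


# D1 = 12 and D4 = 5 are hypothetical yardages, for illustration only.
for p in (0.75, 0.80, 0.85):
    print(p,
          round(expected_first_downs(p), 3),
          round(expected_first_downs_by_sum(p), 3),
          round(expected_drive_length(p, d1=12, d4=5), 1))
```

The two middle columns should agree, which is just the geometric-distribution mean showing up both ways.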

Now, suddenly, P becomes very interesting. As it gets bigger, the expected number of first downs, and with it the length of the drive, grows faster than linearly.

P=.75 ---> Expected first downs = 3
P=.80 ---> Expected first downs = 4
P=.85 ---> Expected first downs = 5.666

Regarding the other factors, I don't expect the "expected distance to a new first down" (D1) to vary much between teams. Teams with better offenses will produce more yards, but some of those yards will go to reaching the first down in fewer downs. The "expected distance without a first down" (D4) will probably only differ between teams that punt after 3 downs and those that "go for it" on fourth.

When should a team "go for it"? For that, we need to know the chance of making the first down. If that chance times the expected length of a drive is longer than your punt, the team should go for it. Teams that "go for it" on fourth down will have a larger probability of getting a first down, which, as we saw above, is the key metric in infinite field football.
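
The rule of thumb above can be written down directly. This is only a sketch of the rule as stated in this post: it ignores, for example, the yards you hand the other team when the attempt fails. The function and parameter names are mine, and the numbers in the example are invented.

```python
def should_go_for_it(p_convert, expected_drive_yards, punt_yards):
    """Apply the post's rule: go for it when the chance of converting,
    times the expected length of a drive, beats the length of the punt."""
    return p_convert * expected_drive_yards > punt_yards


# Hypothetical inputs: a 60-yard expected drive and a 40-yard punt.
print(should_go_for_it(0.5, 60, 40))  # 30 yards vs. 40: punt
print(should_go_for_it(0.7, 60, 40))  # 42 yards vs. 40: go for it
```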

What does this mean for finite field football?
The first conclusion to draw is that first down percentage and drive length are key stats to look at. We all knew they were important before, but I think they are worth considering as _the_ team metrics to look at. Likewise, for defenses, their effect on first down percentage is important.

My second conclusion is that first down percentage has a non-linear effect on drive length. A lot of analysis techniques assume linear relationships between terms. We need to be careful about how we apply those techniques.

My third conclusion is actually a supposition. I believe that EPA by field position is based on the Ps of the two teams. Brian Burke in 2008 said it wasn't linear. The graph he produced was based on data for all teams, but he also said "Another complication is that various teams have different curves." I think with Ps, we can determine what EPA by field position should look like from a theoretical point of view.

What, no data?
This post doesn't have any data. I spend my days programming, so that muscle is exhausted when I have time off. So, if you like the idea, I'd love to see data too.

If we took a team's downs between their own 10 and the opponent's 20, I think we'd see some clear trends. Monte Carlo techniques at the series-of-downs level (or at the down-by-down level) could be used to estimate the length of drives. Long plays that resulted in touchdowns (that is, those infinite runs on the infinite-length field) could be sidestepped by computing the median length of a drive rather than the expected length. I'll be interested to see if the data is significant enough to measure differences.
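
To make the series-of-downs Monte Carlo idea concrete, here is a rough Python sketch. Everything numeric in it is an assumption on my part: P = 0.75, and the two uniform yardage models (which stand in for D1 and D4) are placeholders rather than fitted data. It does not model breakaway plays; it only shows the mechanics of simulating drives and comparing the mean with the median.

```python
import random
import statistics


def hypothetical_gain(rng):
    """Yards on a series that earns a first down (made-up model, mean 12)."""
    return rng.uniform(10, 14)


def hypothetical_fail(rng):
    """Yards on a series that does not earn a first down (made-up, mean 5)."""
    return rng.uniform(0, 10)


def simulate_drive(p, rng):
    """Simulate one drive: keep earning first downs with probability p,
    then add the yards from the final, failed series."""
    yards = 0.0
    while rng.random() < p:
        yards += hypothetical_gain(rng)
    return yards + hypothetical_fail(rng)


def simulate_drives(p, n_drives, seed=0):
    """Simulate n_drives independent drives with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_drive(p, rng) for _ in range(n_drives)]


drives = simulate_drives(p=0.75, n_drives=100_000)
print("mean  :", round(statistics.mean(drives), 1))
print("median:", round(statistics.median(drives), 1))
```

With these placeholder numbers the formula (P/(1-P))*D1 + D4 gives 3*12 + 5 = 41 yards, so the simulated mean should land close to that. The median comes out lower because a few very long drives pull the mean up, which is the same reason the median is less sensitive to the occasional infinite run.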

Another important thing to measure is the finite field aspects. How well does an estimate of performance on an infinite field predict performance inside your own 10 and inside their 20? Brian Burke in 2008 said that QBs weren't statistically significantly better (or worse) inside the red zone. Good outside was good inside. Is that true for the whole offense? If so, I think the infinite field model is worth keeping around.
