Thursday, December 20, 2012

Kelly Criterion on 4th down

by Tunesmith
It's 4th and 1 from your opponent's 43-yard line. You're up 3 points, and there is 5:16 remaining in the 1st quarter. Should you go for it?

According to the 4th Down Calculator, the answer appears clear. Based on history, there is an estimated 74% chance of converting a 4th down in that scenario. Success yields a win probability (WP) of 0.68; punting yields 0.61, and failure yields 0.55.

These odds tell you that on average, it is a good decision to go for it - just the same as on average you'll make money if you take a bet with those odds and that probability of winning. Going for it is worth 0.74 × 0.68 + 0.26 × 0.55 = 0.6462 on average, against 0.61 for punting, so the "Expected Value" (EV) in this scenario is 3.62 percentage points. This means that on average, you will gain .0362 WPA (win probability added) by going for it.
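The arithmetic behind that 3.62 figure can be sketched in a few lines of Python. This is only an illustration using the numbers quoted above; the function and variable names are my own, not the 4th Down Calculator's.

```python
# Figures quoted in the post from the 4th Down Calculator.
P_CONVERT = 0.74   # estimated chance of converting the 4th down
WP_SUCCESS = 0.68  # value after a successful conversion
WP_FAIL = 0.55     # value after a failed attempt
WP_PUNT = 0.61     # value after punting

def value_of_going_for_it(p, wp_success, wp_fail):
    """Probability-weighted average of the two possible outcomes."""
    return p * wp_success + (1 - p) * wp_fail

wp_go = value_of_going_for_it(P_CONVERT, WP_SUCCESS, WP_FAIL)
edge_over_punt = wp_go - WP_PUNT

print(f"Going for it is worth {wp_go:.4f} on average")
print(f"Edge over punting: {edge_over_punt:.4f}")
```

The second line printed is the .0362 gain described above.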

However, the average doesn't always cut it: if the outcome isn't certain, you could still lose.

EV enthusiasts often object to that observation, but let's briefly consider an alternate scenario. Pretend that you come across a certain state lottery. For $1, you have a chance of winning $500,000,000 in profit. And your chances of winning are 1 in 350,000,000. (Also pretend, for the sake of argument, that there's a new identical lottery every second, and there can't be multiple winners in a round.) Since 500 > 350, those are good odds. Say you can play only once. Should you buy a ticket? What if you could play multiple times, or buy multiple tickets? Should you spend your $5,000 in hard-earned savings on lottery tickets?

The answer is, of course, no. If you buy one ticket, you'll probably lose. If you spend $5,000 on one lottery, you are still more than 99.99% likely to lose. If you spend five MILLION dollars on lottery tickets, you are still roughly 99% likely to lose. In all those cases, you're very likely to exhaust your entire bankroll before you can reach the expected value, unless you are already extremely rich. And this is all true even though the lottery game has an obviously positive EV.
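Here is a rough sketch of those lottery numbers. It assumes every ticket you buy is a distinct number in a single drawing with 350,000,000 equally likely outcomes, which is my simplification rather than anything the scenario spells out; the names are made up for the example.

```python
OUTCOMES = 350_000_000      # 1-in-350,000,000 chance per ticket
PRIZE_PROFIT = 500_000_000  # profit if your ticket wins
TICKET_PRICE = 1            # dollars lost if it doesn't

def ev_per_ticket():
    """Average dollars gained per $1 ticket."""
    p_win = 1 / OUTCOMES
    return p_win * PRIZE_PROFIT - (1 - p_win) * TICKET_PRICE

def prob_lose(tickets_bought):
    """Chance that none of your (distinct) tickets is the winner."""
    return 1 - tickets_bought / OUTCOMES

print(f"EV per ticket: ${ev_per_ticket():.2f}")
for spend in (1, 5_000, 5_000_000):
    print(f"${spend:>9,} in tickets -> {prob_lose(spend):.4%} chance of losing it all")
```

The EV per ticket comes out positive, yet the chance of losing stays overwhelming at every bankroll an ordinary person could bring.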

So clearly, a positive EV isn't enough. What else do we have to factor in when we make a decision based on EV? The lottery illustrates two key points about Expected Value:
• There has to be an expectation of multiple rounds, probably very many.
• You have to be able to afford the ideal bet.

In a normal bet, if you bet too big, you might lose enough of your bankroll that you can't bet again in a later round. One tool for calculating the ideal bet is the Kelly Criterion. If you are faced with a +EV scenario, such as a bet with good odds, the Kelly Criterion tells you how much you should bet as a percentage of your bankroll. The Kelly Criterion formula is (pb - q) / b. p and q are the probabilities of success and failure, respectively. b is the net odds: the profit you win per unit staked. The Kelly Criterion works because it is mathematically calibrated to maximize the expected long-run growth rate of your bankroll. Given enough rounds, it will eventually outperform every other strategy of how to bet when the odds are in your favor.

So how could this be used for football? Well, a football team wants to maximize their "football dominance", as in their chances of winning. Let's look again at the 4th-and-1 situation described above. Treating the punt as the default, risk-free outcome, the 4th-down calculation is in effect considering a bet, where you bet .06 WPA (failure) for a .07 WPA profit (success). The odds become 7/6.

The calculation becomes ((0.74 * 7/6) - 0.26) / (7/6) = 51.7%. In other words, in a bet with those odds and chances of winning, you should feel comfortable betting 51.7% of your bankroll - or less, to be conservative. But NOT more, because it can exhaust the bankroll too quickly. If you bet more than the Kelly Criterion suggests, your expected growth rate drops, and if you bet far enough above it, the growth rate turns negative even when the odds are in your favor!
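For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Kelly formula with the 4th-and-1 numbers. The growth-rate function uses the standard expected-log-growth form, p*log(1 + b*f) + q*log(1 - f), and the comparison stakes of 30% and 90% are arbitrary values I picked to show what under- and over-betting look like.

```python
import math

def kelly_fraction(p, b):
    """Kelly fraction (p*b - q) / b for win probability p and net odds b."""
    q = 1.0 - p
    return (p * b - q) / b

def growth_rate(f, p, b):
    """Expected log growth per round when staking fraction f of the bankroll."""
    q = 1.0 - p
    return p * math.log(1.0 + b * f) + q * math.log(1.0 - f)

P_CONVERT = 0.74
NET_ODDS = 0.07 / 0.06  # risk .06 WPA to win .07 WPA

f_star = kelly_fraction(P_CONVERT, NET_ODDS)
print(f"Kelly fraction: {f_star:.3f}")

# Compare an under-bet, the Kelly bet, and a heavy over-bet.
for f in (0.30, f_star, 0.90):
    print(f"stake {f:.3f} -> growth rate {growth_rate(f, P_CONVERT, NET_ODDS):+.4f}")
```

With these inputs the growth rate peaks at the Kelly stake, is lower but still positive at 30%, and is negative at 90%.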

So if you're John Fox facing 4th and 1 from Baltimore's 43-yard line, and your Broncos are up 3 points, and there is 5:16 remaining in the 1st quarter, you should feel comfortable betting 51.7% of your bankroll. But what is a football coach's bankroll?

That's tougher to gauge. Maybe a bankroll is defined as how many times a coach can make (and fail) a controversial football decision before being fired or losing credibility. That could make sense, because a secure coach would feel more latitude to make risky decisions than one who is on the hot seat. After all, "no one ever got fired for punting on 4th down", as the saying goes.

But maybe the bankroll is something else. Let's look again at what Expected Value really means. It means that on average, your results will be in line with Expected Value. If you miss some early on, that's okay, because you'll catch up later. It will average out in the long run. However, the long run is cold comfort if you miss while on the verge of the playoffs at 10-6, and "catch up" during a pointless 5-11 season.

The real rational goal of a football coach is to maximize the chances of winning *each* particular football game. If that is what the coach is trying to maximize, then the bankroll could be described as how much influence a coach has over one particular game. They are trying to convert their influence into WPA. And while EV assumes the presence of a long run, in reality, a coach's influence over a particular game is rather limited. One or two failures might be sufficient to guarantee a loss.

This gives new insight into why a coach might *justifiably* want to punt the ball even if the EV suggests he should go for it. And it doesn't rely on the common explanations of momentum, or gut feeling, or unquantifiable "context". It shows that if a coach's goal is to use Expected Value to maximize the odds of winning *each game*, the coach's "bankroll" of "coaching decisions" might not be large enough to justify taking the bet. After all, if there might only be one +EV 4th-down scenario in the game, that would be betting 100% of the bankroll. Additionally, to be reasonably sure that it is a safe bet, the coach would have to feel comfortable that they face a high enough number of "rounds" to have a reasonable chance of reaching their EV.

Even beyond the bankroll, when we're talking about the context of one game, we're starting to run afoul of one of the main requirements in paying attention to EV: The need for multiple rounds.

Whipping out the old probability calculator, we can see that our 74% bet has a high bar to clear.
• To be 99% sure that you will eventually win a 74% bet, you'd have to face the scenario not once or twice, but four times. And even then, if you lost the first three times and won the fourth, you would still be "under water" in terms of EV.
• To be 99% sure that you will break even in terms of EV (this just means a net positive, not that you will average your EV of 3.62 WPA points), you'd have to face the scenario eight times.
• To be only 90% sure that you will eventually reach cumulative EV (that's 3.62 WPA points for each round), you'd have to face the scenario sixteen times.
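Here is a rough Python sketch for experimenting with questions like these. It treats every round as an independent 74% bet that wins 7 hundredths of WPA or loses 6, and it reads "net positive" as total winnings exceeding total losses. Both are my assumptions, and the counts it produces depend on that reading, so they need not match the figures quoted above.

```python
from math import comb

P_WIN = 0.74
WIN_POINTS = 7   # WPA gained on a conversion, in hundredths
LOSS_POINTS = 6  # WPA lost on a failure, in hundredths

def rounds_for_one_win(p, confidence):
    """Smallest number of rounds with at least `confidence` chance of one or more wins."""
    n = 1
    while 1.0 - (1.0 - p) ** n < confidence:
        n += 1
    return n

def prob_net_positive(n, p=P_WIN):
    """Chance that n independent rounds leave you ahead overall."""
    total = 0.0
    for wins in range(n + 1):
        net = wins * WIN_POINTS - (n - wins) * LOSS_POINTS
        if net > 0:
            total += comb(n, wins) * p ** wins * (1.0 - p) ** (n - wins)
    return total

print("Rounds needed to be 99% sure of one win:", rounds_for_one_win(P_WIN, 0.99))
print("Net after three losses and one win:", 1 * WIN_POINTS - 3 * LOSS_POINTS)
for n in (4, 8, 16):
    print(f"{n} rounds -> {prob_net_positive(n):.4f} chance of being ahead")
```

Working in whole hundredths avoids floating-point ties when wins and losses exactly cancel.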

From *our* perspective, as stats admirers and football fans, we are looking at it in the context of many coaches, many plays, many seasons - many rounds. Our "bankroll" is effectively infinite and there is no cost if we're wrong, so we're right that plays should always be called in accordance with EV. But when you're a coach looking at it in terms of 1-2 seasons of employment, or just one game's worth of plays, it really does change the equation in completely valid ways. The next time you see a coach choose to kick on 4th down even when the EV says otherwise, he might actually be making a probabilistically valid decision.


tunesmith said...

Just got my question answered from . To be 99% sure of reaching cumulative expected value in that scenario, you have to take the bet 1520 times.

Mike said...

The other thing about the Kelly criterion, or bankroll management, is that it requires you to have an edge. The idea is that you give up maximum EV in the short run to reduce the volatility of a decision and bring your risk of ruin down to zero. The Kelly criterion is designed for an unlimited number of bets, maximizing your long-term growth as the number of bets approaches infinity. In the lotto example it would be more relevant if you could bet on the same exact lotto conditions $1 at a time, or millions at a time. But the ticket drawn would have to be random; otherwise you could guarantee a win by buying enough tickets and avoiding duplicates. Now if you had $1B, simply buying 500M tickets wouldn't guarantee you a win, or even $1B. So you would actually reduce the amount you risk. However, betting only a dollar wouldn't yield a very large return on that $1B bankroll even if the ticket itself is +EV. The idea of Kelly might be to find a balance. Sometimes you have an absolute minimum bet allowed, such as a $1 ticket, so if your bankroll takes a hit you can't start using $.01 tickets. Sometimes your bankroll isn't large enough to justify the minimum bet, because if you keep repeating that bet you will go broke due to volatility.

Mike said...

I guess in football the problem with using the Kelly criterion is that this is a one-size-fits-all model at the moment with regards to WP, and thus the team doesn't have a calculable edge. The scenario would be more apt if a team like the 2000 Ravens had a huge edge running the ball and stopping teams but were down 5 points on their own 20-yard line on 4th and 1, with time running out but still plenty left. Perhaps the 3rd quarter with 4 minutes left. They should punt even if it is lower EV than going for it, because of risk management. If they fail, they will likely be down 8 points, possibly even 12 or 13, and they would have to deviate from their gameplan and take more risks to get back to even in time. That means passing the ball, and they simply are not built to do that well. They would have a lower EV passing than running, I believe, and if their high-risk plays fail, they soon cut their effective bankroll down to where they can't recover. If they punt, they get the ball back, keep running, and have a chance to eventually put up a TD.
The Kelly criterion would be relevant in poker if you had your entire bankroll at risk. Say you were 80% to win with aces. You would want to risk 60% of your bankroll. To lose 95% requires a 1900% gain to compensate. If you win 4 times in a row and then lose the 5th, doubling up the amount at risk each time you win, where are you with 90% of your bankroll at risk each time? With 80%? 40%? You can try this yourself and see that "less" is sometimes more. I am not entirely convinced this plays a role in this particular example in football... A few reasons...
1) It's not as if a team has any traditional form of economic risk or value that can be wagered. Risk perhaps the same, but if you were 90% to convert a first down, how could you increase the weightings and risk more? If you were 30%, how could you risk less? You can take other or lower-variance strategies and balance the variance, but how do you measure that?
2) The bets cannot be made at will. If you fail a 4th down you can't simply take a loss and immediately try again. There is a time limit. True, WP adjusts and functions as a bankroll, and as time decreases the bankroll may decrease or increase depending on whether you have a lead, but still.
3) Defining the bankroll cannot easily be done.
I think risk management is still important, and Kelly criterion is a great tool of risk management, but finding actual utility for it in this case may be a bit more difficult.
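The "try this yourself" exercise above is easy to run. This is only a sketch of one reading of it: each bet pays even money (a win adds the staked amount, a loss removes it), the same fraction of the current bankroll is staked every time, and the sequence is four wins followed by one loss. Those assumptions and the function name are mine.

```python
def bankroll_after(fraction, wins=4, losses=1, start=1.0):
    """Bankroll after staking `fraction` of it on each even-money bet."""
    return start * (1.0 + fraction) ** wins * (1.0 - fraction) ** losses

for fraction in (0.90, 0.80, 0.60, 0.40):
    result = bankroll_after(fraction)
    print(f"risking {fraction:.0%} each time -> {result:.2f}x starting bankroll")
```

Under these assumptions the 60% stake ends up ahead of both 80% and 90%, which is the "less is sometimes more" point.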

Nate said...

I've thought about this a little, and may have posted to the effect elsewhere, but applying the Kelly Criterion to fourth down decisions is incorrect: The Kelly Criterion is appropriately used in situations where you can (1) choose how much to wager, and (2) make as many wagers as you like. For a fourth down decision, neither of those is true: You only get one chance (barring penalties) to convert, and it's always the entire game that's on the line.

In addition, the normal EV models (ones that are about the chance to win the game) by their nature already incorporate the potential future opportunities to win the game. Assuming that the conversion chance, EV model, and primacy of winning the game are all correct, there's no valid justification not to go for it (or punt, depending on the numbers).

Instead of a poker analogy, here's a blackjack one: Early in a deep deck game, you've got 10,6 and the dealer's showing a 7. Do you hit, or stand? Does it make any sense to stand as the "safe low risk option"? Or to conserve your "influence over this particular game"? Or to make some variance calculation using the marginal chance to win by hitting? Or do you think 'hitting only makes sense in the long run'?

No! Assuming you like money, as soon as you're confident that your chance to win by hitting is better than your chance to win by standing you hit.

The same way, if it increases your chance to win, you should go for it on fourth down. Now, in blackjack, our predictive model is very strong, and in fact, relatively common knowledge. By comparison, we're not really that certain how good our football model is, and it's relatively specialized knowledge.

After musing, I find myself thinking that the fourth down decision may be more about discounting behaviors than about variance aversion.

Topher Doll said...

Great look at the idea of "football capital" that coaches have and probabilities, interesting stuff tunesmith.

Anonymous said...

In blackjack, if the odds say you're better off hitting, you hit. The Kelly Criterion indicates how much you should bet - not whether you should hit. You don't conserve your influence over the game by electing not to hit - you conserve your influence by limiting the size of your bet. It's not that hitting only makes sense in the long run. It's that it doesn't make sense in the short run to blow your entire bankroll on a bet that has good odds but isn't certain.

Luther said...

I might be totally misunderstanding the concept of a 'bankroll' as used by the author, but could the bankroll simply be the number of remaining 4th down attempts in a given game?

Here's an example I was imagining: It's a tie game at the final two-minute warning and your team has a 4th and inches call on your own 30. WP total and EP total both indicate that you should definitely go for it. Personally, I'm about sticking to the numbers.

However, at this point, with only two minutes remaining, you likely only have one more (4th down) bet to make: this one. If you fail, you're likely to lose your stack.

If you were faced with the same decision on your opening drive, then you might be more tolerant of variance knowing that you're likely to have future opportunities to bet.
