Two-Minute Drill: Value of the Unexpected Run

Should offenses be running the ball during two-minute drills when no one sees it coming? An in-depth statistical analysis using R Programming

Aug 20, 2023

Introduction/Summary

Among the plethora of reasons you might remember the ludicrous final play of the 2022 Raiders-Patriots game, there’s one that might slip from your mind: before everything fell apart, New England ran the ball. At their own 45-yard line, with time for only one more snap, the Patriots eschewed the traditional Hail Mary or “hook-and-ladder” attempt in favor of a draw to Rhamondre Stevenson. And for all intents and purposes, that part of the play was reasonably successful. With the Raiders in a “Prevent” defense and only having five players lined up within 15 yards of the line of scrimmage, Stevenson picked up 23 yards before the portion of the play that Patriots fans will choose not to remember.

But as it pertains to that play specifically, there’s one key aspect worth noting: the Patriots were in a tie game. And while Jakobi Meyers completely disregarded this aspect (after all, they got off the bus tied), it evidently was a factor in New England’s initial play call, as the Patriots did not need to score to keep their chances of winning alive. But suppose we instead consider the more conventional two-minute drill, where the offense needs a score to either send the game to overtime or win it right there. Is there value in “zigging where the others zag” by running the ball in what the defense perceives as an extremely clear passing situation? Using data from NFLFastR, I attempted to find out.

I started with all of the play-by-play data available via NFLFastR for the past 17 completed seasons, including playoffs (2006 to 2022); I’ll explain why I specifically started at 2006 later. I removed all pre-snap penalties, QB kneels, spikes, special teams plays, and two-point conversions, resulting in a set called pbp_ProjectPlays (595,361 plays). I then filtered that into a set of plays where the offense was trailing by one possession in the final two minutes (10,148 plays). For each of these groups, I analyzed how variables including but not limited to yards per play, Win Probability Added, and eventual win rate varied, based on how obvious a passing situation the offense was in. The full R code used for this project can be seen on my GitHub profile here.
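For readers who want to see the shape of that filtering step, here is a minimal Python sketch (the project itself was written in R, so this is a stand-in, not the actual code). The field names are assumed to mirror NFLFastR's columns, the sample plays are made up, and treating "one possession" as a deficit of 1 to 8 points in the final 120 seconds of the game is my assumption about the cutoffs.

```python
# Illustrative only: field names are assumed to mirror NFLFastR columns,
# and every play below is invented for the example.

def is_project_play(play):
    """Keep designed runs and dropbacks; drop kneels, spikes, and two-point tries."""
    if play["qb_kneel"] or play["qb_spike"] or play["two_point_attempt"]:
        return False
    # In this toy data, special teams plays and pre-snap penalties carry neither flag.
    return play["pass"] == 1 or play["rush"] == 1


def is_two_minute_drill(play, max_deficit=8, window_seconds=120):
    """Offense trails by 1 to max_deficit points with window_seconds or fewer left."""
    trailing = -max_deficit <= play["score_differential"] <= -1
    late = play["game_seconds_remaining"] <= window_seconds
    return trailing and late


def make_play(is_pass, is_rush, diff, secs, kneel=0, spike=0, two_pt=0):
    """Build one toy play record."""
    return {
        "pass": is_pass,
        "rush": is_rush,
        "score_differential": diff,  # offense score minus defense score
        "game_seconds_remaining": secs,
        "qb_kneel": kneel,
        "qb_spike": spike,
        "two_point_attempt": two_pt,
    }


sample_plays = [
    make_play(1, 0, -4, 45),           # dropback, down 4, 0:45 left
    make_play(0, 1, 3, 1500),          # designed run while leading
    make_play(0, 0, 7, 30, kneel=1),   # kneel-down, excluded
    make_play(1, 0, -3, 10, spike=1),  # spike, excluded
    make_play(0, 1, -10, 90),          # down two possessions
    make_play(0, 1, -7, 120),          # designed run, down 7, 2:00 left
]

project_plays = [p for p in sample_plays if is_project_play(p)]
drill_plays = [p for p in project_plays if is_two_minute_drill(p)]
print(len(project_plays), "scrimmage plays,", len(drill_plays), "two-minute drill plays")
```

Keeping the two filters as separate functions mirrors the two data sets described above: the broad set of scrimmage plays, and the narrower two-minute drill subset drawn from it.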

The “too long, didn’t read” summary: as football intuition would suggest, designed runs tend to gain more yards in two-minute drills than they do otherwise, while dropbacks aren’t as effective during two-minute drills as they are during other points of a game. However, the increase in yardage on rushes in these situations isn’t always accompanied by an increase in win probability – partially because defenses play softer against the run when they know a high-yardage run won’t hurt them, and partially because designed runs kill significantly more game clock than dropbacks during two-minute drills. Despite this, even when controlling for variables like clock, score deficit and “xpass” (more on what that means later), teams have won more often following designed runs than they have following dropbacks during two-minute drills. And while coaches haven’t outright ignored the option of the designed run in these spots, they have not utilized them quite as often as is recommended.

How Does a Play’s Efficiency Vary in Clear Pass Situations?

Before diving into the isolated situations of two-minute drills, we’ll start with looking at all scrimmage plays altogether. Keep in mind that I use NFLFastR’s distinction between “pass” vs. “rush” for the entire project, which is based on the intent of a play rather than its result (i.e., QB scrambles and sacks still fall under the “pass” category). With that being said, both for intended passes and rushes, how has the play’s outcome depended on how clear of a pass situation the offense is in?

Fortunately, NFLFastR has a built-in way to help us answer that question. Thanks to the work of The Athletic’s Ben Baldwin, NFLFastR includes a variable called “xpass” (expected dropback rate) for every play since 2006 (which is why my project’s data begins there). You can check out this link for a thorough description of what it entails, but the basic summary is that it provides an estimation of how likely a pass is on any play based on several surrounding factors including but not limited to down, distance and score. (However, it does not account for individual coach tendencies, nor does it account for any personnel metrics, such as how many defenders are in the box.)

With this formula in our back pocket, we can use it for the remainder of the project to quantifiably evaluate how much of a passing situation the offense was in on any given play. And, consequently, we can use it to gauge whether passing and/or rushing become easier in clear pass situations, as shown below:

First, observe the blue curves, which represent yards per play (YPP) across the past 17 seasons. As the “xpass” goes up – in other words, as the offense enters a more obvious passing situation – the average yards per rush rises at a noticeably steep rate. Meanwhile, for dropbacks, the average yardage has slightly decreased as “xpass” has risen, albeit with a trend that’s not as direct as it has been for designed runs. On its own, this is by no means groundbreaking information. Any casual football fan would assume that running becomes easier when the defense is strongly expecting a pass.

But from there, observe the red curves, which represent Expected Points Added (also known as EPA). As it pertains to dropbacks, the EPA trend is similar to the YPP trend; a slight decrease as the “xpass” gets larger, suggesting that it becomes slightly more difficult for the offense to pass when the defense is expecting a pass. However, as it pertains to the designed runs, we see an interesting divergence. Even though YPP rises at a sharp rate as “xpass” increases, the average EPA of a run play is relatively consistent, hovering just below zero regardless of what the “xpass” is. This leads us to a fascinating conclusion, one that will be relevant for the remainder of this project as we focus on late-game plays: on designed runs, more yards does not always mean more EPA, because a lot of the high-yardage rushes might come in situations where the defense isn’t very negatively impacted by those yards (e.g. third-and-long, or the final play before halftime).
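The comparison behind those curves boils down to grouping plays by play type and “xpass” range, then averaging each metric within the group. Here is a rough Python sketch of that idea. The published charts are smoothed curves, so simple tenth-wide bins are a substitute technique, and the plays below are invented purely to show the mechanics.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_xpass_bin(plays, n_bins=10):
    """Average yards per play and EPA for each (play type, xpass bin) group."""
    groups = defaultdict(list)
    for play in plays:
        # Fold xpass == 1.0 into the top bin so every play lands in a bin.
        index = min(int(play["xpass"] * n_bins), n_bins - 1)
        play_type = "dropback" if play["pass"] == 1 else "designed run"
        groups[(play_type, index / n_bins)].append(play)

    summary = {}
    for key, group in groups.items():
        summary[key] = {
            "n": len(group),
            "ypp": mean([p["yards_gained"] for p in group]),
            "epa": mean([p["epa"] for p in group]),
        }
    return summary


# Made-up plays; real input would be the filtered play-by-play rows.
toy_plays = [
    {"pass": 0, "xpass": 0.95, "yards_gained": 6, "epa": 0.5},
    {"pass": 0, "xpass": 0.92, "yards_gained": 8, "epa": -0.5},
    {"pass": 0, "xpass": 0.25, "yards_gained": 3, "epa": 0.0},
    {"pass": 1, "xpass": 0.95, "yards_gained": 5, "epa": 0.25},
]

for key, stats in sorted(summarize_by_xpass_bin(toy_plays).items()):
    print(key, stats)
```

Putting yards and EPA side by side in the same summary is what makes the divergence visible: a bin can show a high average gain and still sit near zero in EPA.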

If we look at Win Probability Added (WPA) in place of EPA, a similar trend emerges, as seen below:

Once again, the most notable takeaway here is the discrepancy that exists on the left side: more yardage on designed runs does not always mean more WPA. Like the EPA charts, this also matches up with general football intuition; if it’s fourth-and-25, the defense might have eight men lined up 15+ yards off the line of scrimmage at the snap, unconcerned about the possibility of allowing 10-15 yards on a draw play because such a result would not impact the offense’s chances of scoring or winning. This concept is important to keep in mind as our set of plays becomes more concentrated later on.

How Do Time and Score Make an Impact?

We’ve established that when it comes to gaining yards, running the ball becomes significantly easier in clear passing situations, but those yards don’t necessarily correlate with the offense having an increased chance of scoring or winning. Now, how do those conclusions hold up when we factor in the time and score of the game? To dive into this, we can look into the same concepts we did above, but with the set of plays where the offense was trailing by one possession in the final two minutes. For this set of plays, we’ll focus on WPA instead of EPA, because late in a game, not all points are conducive to winning, so WPA gives a much better grasp of the offense’s objective. In other words, offenses enter almost all drives throughout a football game happy to score points of any kind, but if an offense is trailing by five points with one minute left, getting a field goal isn’t very valuable.

With all of that being said, here are graphs that show how YPP and WPA/Play, both for dropbacks and designed rushes, vary based on “xpass” in two-minute drill situations:

Because we limit ourselves to the final two minutes of a game here, the sample sizes are much smaller. This is especially true for the dropbacks in low “xpass” situations, as displayed by the very wide gray confidence interval. In fact, out of 9,182 dropbacks when offenses were trailing by one possession in the final two minutes, only 105 of those came when the “xpass” was below 0.5, which makes sense given that two-minute drills are typically passing situations. The sample size makes it difficult to have any significant takeaways regarding those dropbacks, but for the more statistically inclined readers, we can use two-sample t-tests to get some intel there:

The "true difference in means" refers to the gap between the average values in each stat for dropbacks throughout a game, compared to dropbacks in two-minute drills. The fact that we have very small p-values in both pictures means we can confidently say that the differences in those mean values are significant, rather than being due to random chance. Don't worry about this jargon if statistics is not an interest of yours, as the point remains the same: both regarding YPP and WPA/Play, dropbacks fare significantly worse during two-minute drill situations than they do throughout other times of a game. This shouldn’t be particularly surprising, as conventional wisdom suggests that passing becomes slightly more difficult when the defense expects the pass, such as during two-minute drills. But, as always, it’s valuable to diagnose whether the data either supports or counters what general football intuition would offer.
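For anyone curious what a two-sample t-test does under the hood, here is a short Python sketch of the Welch version, which does not assume the two groups have equal variance. Two caveats: the p-value here uses a normal approximation that is only reasonable for large samples, and the yardage lists are invented, so none of the numbers it prints correspond to the project's results.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, Welch degrees of freedom, approximate two-sided p-value)."""
    var_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    var_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    # Normal approximation to the t distribution; fine when dof is large.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, dof, p_approx


# Invented yards-per-dropback samples, just to exercise the function.
other_situations = [7, 0, 12, 5, 9, 3, 15, 6]
two_minute_drill = [4, 0, 8, 2, 6, 0, 11, 3]

t_stat, dof, p_approx = welch_t_test(other_situations, two_minute_drill)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, approx p = {p_approx:.3f}")
```

A small p-value says a gap this large between the two averages would be unlikely if the groups truly shared the same mean, which is the reasoning behind calling the difference significant.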

As for the designed runs, the graph shows us an extremely steep slope in YPP as “xpass” gets larger in two-minute drill situations, which aligns with what we’ve seen elsewhere in this article – if the defense expects a pass, it’s easier to gain yards on a run. But WPA actually has a slightly negative slope on the same graph. This means that, even though designed runs with a high “xpass” are more likely to gain big yardage during two-minute drills, those plays actually typically add less win probability to the offense than running in more conventional “run situations.” In other words, if the defense is playing very softly against the run, it’s because a run isn’t likely to benefit the offense even if it gains solid yardage. In two-minute drill situations specifically, this has a lot to do with the clock. Every second counts when the trailing team is scrambling to score late in the game, and run plays can kill valuable clock even when they result in large gains.

To further demonstrate how the clock can hurt the offensive team when running in two-minute drill situations, we can analyze the seconds elapsed between any two plays, as shown below (with the black brackets representing 95% confidence intervals):

The average designed run has more seconds run off the clock before the next play than the average dropback does by a significant margin. This shouldn’t be surprising on the left side, but it’s notable that the same trend exists even when just looking at plays where the offense has a timeout. One would think that, if an offense chooses to run during a two-minute drill, it’s often because it is aware that it can use a timeout immediately after the play ends. But the data shows that regardless of whether the offense has a timeout, designed runs kill significantly more clock than dropbacks do in two-minute drill situations.
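Measuring clock usage like this comes down to two steps: subtract the next play's time remaining from the current play's, then average those gaps by play type with an interval around the mean. A minimal Python sketch follows, assuming plays arrive in game order with hypothetical game_id and game_seconds_remaining fields, and using a normal-approximation 95% interval (the chart's exact interval method is not stated here). All numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def add_seconds_elapsed(plays):
    """Attach the clock time used before the next snap of the same game."""
    timed = []
    for current, following in zip(plays, plays[1:]):
        if current["game_id"] != following["game_id"]:
            continue  # the last play of a game has no next snap to compare with
        elapsed = current["game_seconds_remaining"] - following["game_seconds_remaining"]
        timed.append({**current, "elapsed": elapsed})
    return timed


def mean_with_ci(values, z=1.96):
    """Mean plus a normal-approximation confidence interval (needs 2+ values)."""
    center = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return center, center - half_width, center + half_width


# Invented sequence: two games, listed in the order the snaps happened.
toy_plays = [
    {"game_id": "A", "pass": 0, "game_seconds_remaining": 100},
    {"game_id": "A", "pass": 1, "game_seconds_remaining": 70},
    {"game_id": "A", "pass": 1, "game_seconds_remaining": 62},
    {"game_id": "B", "pass": 1, "game_seconds_remaining": 50},
]

timed_plays = add_seconds_elapsed(toy_plays)
run_gaps = [p["elapsed"] for p in timed_plays if p["pass"] == 0]
pass_gaps = [p["elapsed"] for p in timed_plays if p["pass"] == 1]
print("run gaps:", run_gaps, "dropback gaps:", pass_gaps)
print(mean_with_ci([10, 20, 30]))
```

Restricting the input to plays where the offense still holds a timeout would reproduce the second comparison described above.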

If the previous WPA charts are a little too abstract for you, we can instead look at how each team’s eventual win rate varies based on play type and “xpass,” as shown below:

The route of looking at which team won can often be a dangerous one to take in football discourse, since it's easy to get sucked into the faulty "running more often leads to winning" mindset. It’s somewhat safer here because we're isolating situations where the offensive team is trailing, but there’s still a necessary caveat that a low “xpass” likely means the offense is already in a decent position to win before the snap. For example, first-and-goal from the 3-yard line when trailing by two points with a minute left will have a far lower “xpass” than fourth-and-15 from the offense’s own 20 with the same score and clock. As such, it is best to take the following observation with a grain of salt, but the observation is there nonetheless: as “xpass” increases, the offense’s eventual win rate in two-minute drill situations declines at a sharper rate for designed runs than it does for dropbacks, despite designed runs with high “xpass” values often gaining significant yardage.

While that’s a noteworthy observation, it’s important to stress that the offense doesn’t control the “xpass” on any given play. What it does control is the play type it chooses. As such, a key question is: how does the offense’s eventual win rate vary based on play type, if we control for “xpass?” That can be seen in these two charts:

Here, we have some very interesting findings. The general football claim of “running leads to winning more than passing does” is misleading, because teams that run the ball are usually already in favorable situations (e.g., up by 20 in the second half). But now, take a glance at the top picture above. Keep in mind that we already have isolated situations where the offense trails by one possession in the final two minutes. From those given conditions, if we pick a fixed “xpass” value at any point on that graph, the eventual win rate for teams that have run the ball has almost uniformly been higher than it has been for teams that have passed, with an exception at approximately the 0.80-0.90 “xpass” range. It’s worth noting here that 77.9% of two-minute drill snaps since 2006 have had an “xpass” of at least 0.9, and another 9.7% have had an “xpass” between 0.8 and 0.9.

Does this mean we can outright say that designed runs are a better choice than dropbacks across the board? Of course not; among other things, the very small sample sizes for both play types (particularly dropbacks) in the sub-0.5 “xpass” range lead to some very wide confidence intervals, as shown by the gray segments of the top graph and the black brackets in the second graph. But nonetheless, the point holds that when controlling for “xpass” in two-minute drills, overall, teams have won more often following designed runs than they have following dropbacks. (A similar trend exists if we use WPA in place of eventual win rate, but the confidence intervals are even wider due to WPA’s larger volatility on a play-to-play basis.)
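To see why small samples produce such wide intervals, it helps to compute a confidence interval for a win rate directly. The Python sketch below uses the Wilson score interval, which is my choice for illustration and not necessarily the method behind the charts. The 105-play sample size matches the sub-0.5 “xpass” dropback count mentioned earlier, but the win totals and the 5,000-play comparison group are hypothetical.

```python
from math import sqrt


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate of wins / n (default is about 95%)."""
    rate = wins / n
    denom = 1 + z ** 2 / n
    center = (rate + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(rate * (1 - rate) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Same hypothetical 20% win rate, very different sample sizes.
small_low, small_high = wilson_interval(wins=21, n=105)
large_low, large_high = wilson_interval(wins=1000, n=5000)

print(f"n = 105:  {small_low:.3f} to {small_high:.3f}")
print(f"n = 5000: {large_low:.3f} to {large_high:.3f}")
```

With the win rate held fixed, the interval for the 105-play group comes out several times wider than the one for the 5,000-play group, which is the same effect the gray bands and black brackets show.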

Piece it All Together: Using Regression Models to Make the Right Call

We’ve gotten some very useful information already. Among other things, we’ve seen that designed runs gain significantly more yards in clear passing situations than they do otherwise, but those yards don’t necessarily correlate with increased EPA or WPA values. We’ve seen that passing efficiency drops in two-minute drills compared to other points of a game, and we’ve seen that, despite taking up more time than dropbacks even when the offense has a timeout, designed runs have led to eventual victories more often than dropbacks have in most two-minute drill situations. Now, how do we combine those ingredients to determine the right course of action going forward?

The answer lies in the models. Feel free to skip these next two paragraphs altogether if you aren’t here for the stat nerd terms. But should you care, here’s a glimpse into my methodology. I created two separate predictive models, pertaining to two-minute drill situations specifically: one that predicted eventual win rate for a designed run, and one that predicted eventual win rate for a dropback. From there, I could use those models to take any football situation – say, offense at midfield, first-and-10, trailing by four points with 0:45 to go – and predict the offense’s eventual win rate based on whether it ran or passed in that situation, therefore determining the best play choice for the offense in that spot.
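The decision rule itself is simple once both models exist: feed the same situation to each and pick the play type with the higher predicted win rate. Here is a Python sketch of that rule. Everything inside the two "models" is hypothetical: the logistic form, the feature names, and especially the coefficients are invented for illustration and are not the fitted values from this project.

```python
from math import exp

# HYPOTHETICAL coefficients, invented for this example only.
RUN_MODEL = {"intercept": 0.2, "yardline_100": -0.035, "seconds_left": 0.012,
             "timeouts": 0.45, "needs_td": -0.9}
PASS_MODEL = {"intercept": 0.1, "yardline_100": -0.030, "seconds_left": 0.010,
              "timeouts": 0.30, "needs_td": -0.8}


def predict_win_rate(model, situation):
    """Logistic prediction: a weighted sum of features squeezed into (0, 1)."""
    score = model["intercept"] + sum(model[name] * value for name, value in situation.items())
    return 1 / (1 + exp(-score))


def recommend(situation, run_model=RUN_MODEL, pass_model=PASS_MODEL):
    """Return the play type with the higher predicted win rate, plus both rates."""
    run_rate = predict_win_rate(run_model, situation)
    pass_rate = predict_win_rate(pass_model, situation)
    choice = "designed run" if run_rate > pass_rate else "dropback"
    return choice, run_rate, pass_rate


# Offense at midfield, trailing by four (so it needs a touchdown), 0:45 to go.
# The one remaining timeout is an extra assumption for the example.
situation = {"yardline_100": 50, "seconds_left": 45, "timeouts": 1, "needs_td": 1}
choice, run_rate, pass_rate = recommend(situation)
print(choice, round(run_rate, 3), round(pass_rate, 3))
```

Fitting separate models for each play type, instead of one model with a play-type flag, lets every predictor carry a different weight for runs than for dropbacks.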

I tested out dozens of multivariate models, including but not limited to generalized linear models (GLMs), xgboost models, gradient boosting machine models, exponential regression models, and polynomial models, all with various combinations of predictor variables. To evaluate each model and eventually choose the best ones, my validation metrics included: R-squared, mean error, standard deviation, RMSE, variance inflation factors (VIFs), deviance, analysis of variance (ANOVA), Precision, Accuracy, and Recall. But just as important as any of those metrics was the “common sense” test to make sure that any model was reasonable and interpretable. For example, if a model told me that worse field position was correlated with an increase in win rate, I would know there was some issue to address before putting it to use.
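A few of those validation metrics are easy to compute by hand, which helps clarify what they measure. The Python sketch below covers Accuracy, Precision, Recall, and RMSE for a model that outputs win probabilities. The 0.5 cutoff for calling a predicted win is an assumption, and the outcomes and predictions are invented.

```python
from math import sqrt


def classification_metrics(actual, predicted_prob, threshold=0.5):
    """Accuracy, precision, recall, and RMSE for binary win/loss outcomes."""
    predicted = [1 if prob >= threshold else 0 for prob in predicted_prob]
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    true_neg = sum(1 for a, p in pairs if a == 0 and p == 0)

    called_wins = true_pos + false_pos
    actual_wins = true_pos + false_neg
    squared_errors = [(a - prob) ** 2 for a, prob in zip(actual, predicted_prob)]
    return {
        "accuracy": (true_pos + true_neg) / len(pairs),
        # Of the plays the model called wins, how many really were wins?
        "precision": true_pos / called_wins if called_wins else float("nan"),
        # Of the real wins, how many did the model call?
        "recall": true_pos / actual_wins if actual_wins else float("nan"),
        "rmse": sqrt(sum(squared_errors) / len(squared_errors)),
    }


# Invented outcomes (1 = offense eventually won) and model probabilities.
outcomes = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
metrics = classification_metrics(outcomes, probabilities)
print(metrics)
```

Because most trailing offenses in these situations go on to lose, accuracy alone can flatter a model that simply predicts a loss every time, which is one reason to check precision and recall alongside it.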

How can these models tell whether a dropback or designed run is the better option based on the situation? Here are a few randomly picked examples of how they can be applied:

This is a straightforward glimpse at how the models can help. For example, take the top row: if you’re facing a third-and-4 from your own 15-yard line, trailing by at least four points with 20 seconds left and no timeouts, a pass is recommended. That should seem pretty obvious without a model’s help – after all, if you get tackled in bounds, that game might be over. But there are many more ambiguous situations where the models’ input can be vital. The “Down By 4+” and “Down By 1-3” classifications were made to simplify each situation by whether the offense needs a field goal or a touchdown. This improved the models’ accuracy, and beyond that, common sense supports grouping them together in such a way; being down by one isn’t any different for the offense than being down by two.
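The score grouping is easy to express as a small helper. This Python sketch assumes the deficit is measured from the offense's perspective and caps the “Down By 4+” bucket at eight points, since the project only covers one-possession games; that cap and the function name are my assumptions.

```python
def deficit_bucket(score_differential):
    """Map the offense's score differential to the bucket it falls in.

    A negative score_differential means the offense is trailing.
    Returns None for anything outside a one-possession deficit.
    """
    deficit = -score_differential
    if 1 <= deficit <= 3:
        return "Down By 1-3"  # a field goal ties or wins the game
    if 4 <= deficit <= 8:
        return "Down By 4+"   # the offense needs a touchdown
    return None


for diff in (-1, -2, -3, -4, -8, -9, 0):
    print(diff, deficit_bucket(diff))
```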

But, while those numbers are informative, they aren’t particularly aesthetically pleasing. What might be more effective in a real-game situation is a visually friendly graph telling you what your play-calling choice should be. Here’s an example of what such a chart would look like, with this specific chart referring to first down plays when the offense needs a touchdown. In a real-life game-planning situation, a coaching staff could be provided with various charts that combine to cover all down-and-distance combos, score deficits, field positions, etc., making use of whichever chart was relevant at that specific moment.

This is a direct way to tell coaches what play choice is most likely to optimize their win rate, based on data from the past 17 years. In the specific scenarios we are looking at here, we can see that running is generally a no-go on first down when the offense has no timeouts, but otherwise, designed runs are a relatively valid play even with the offense needing a touchdown. You may notice that this graph excludes plays taking place in the final 20 seconds and/or inside the opponent’s 10-yard line. This is because, in those “extreme” situations, models won’t be very practical. If an offense has 0:07 left from its own 30, it’s not considering what a model says to do; it’s going to pick whatever trick play it has practiced that has a prayer’s chance of gaining 70 yards. Likewise, if it’s fourth-and-goal from the 3, the offense will certainly have prepared some “two-point” plays for that scenario.

Graphs like this can tell us almost everything we need to know, but there’s still one more question to dive into.

Have Coaches Been Getting it Right?

We’ve done all this work to evaluate whether designed runs or dropbacks are the preferred choice in various two-minute drill situations. As such, we are now equipped to answer a very relevant question: have coaches generally been handling these situations properly, or have they been either too pass-happy or too pass-averse in these spots? Once again, we can dive into the data to find out.

Each individual dot here represents one play. While the designed runs aren’t quite non-existent, they are certainly very rare, more so than the aforementioned models advised. As such, in the specific scenario of first downs where the offense needs a touchdown on that drive, coaches have indeed run the ball less often than the models’ optimal rate. To evaluate whether this is true for all two-minute drill scenarios rather than just these specific parameters, we can look at the following table:

For the 10,148 two-minute drill plays in this project, this table gives both the actual ratio of dropbacks to designed runs, and what that ratio would have been if the teams followed the models’ recommendation on each play. The numbers align with what we saw in the graphs immediately above: while passes are still recommended more often than runs in two-minute drill situations, coaches have generally not taken advantage of the option to run as often as they should.

Conclusion/Sources of Error

Like any football analytics project, this shouldn't be blindly obeyed in all possible contexts. Analytics are used properly when they're helping teams make informed decisions in the moment, rather than forcing coaches to disregard all other factors at play. As it pertains to this project specifically, player personnel makes a major impact. This project looked at the NFL as a single entity rather than dissecting any individual teams, but not every offense or defense is exactly “average” in the real world, meaning a team should adjust to the strengths and weaknesses of itself and its opponent.

Additionally, both scouting of tendencies and in-game adjustments play major roles too. For example, if one team has discerned that the opposing defensive coordinator heavily blitzes when leading late in a game, it might exploit that defense’s aggressiveness by throwing the ball more often than a model would recommend. Likewise, during a game, if an offense notices pre-snap that the defense is playing with eight men lined up 15+ yards off the line of scrimmage, it has major incentive to run the ball even if the model, which doesn’t account for defensive alignment, suggests a pass.

The common caveat of small sample size is also worth noting. 10,148 is a large number in a lot of contexts – it wouldn’t be fun if you asked me to knock out that many push-ups – but as it pertains to training regression models, that’s not a particularly large data set to work with. This is especially true for the models that evaluated designed rushes, as there were only 966 of those in the project’s time span. While the models’ coefficients still did make sense – for example, we weren’t told that an increase in score deficit was correlated with an increased eventual win rate – they could’ve benefitted from a greater set of plays to work with.

Another important trait to point out is the unfortunate necessity of classifying every play as either a run or pass. Needless to say, not every play call is that black-and-white, particularly with the explosion of RPOs in recent seasons. It's not fair to label every play as a pass or run as if the categories are fully binary. Likewise, even among the pure designed runs, we don’t have the benefit of knowing what the specific play was. For example, an outside zone run or a speed option has a much better chance of getting out of bounds and stopping the clock than a play meant to operate between the tackles, like power or trap. But, as always, we have to do the best we can with the information supplied to us.

Cole Jacobson is a CFB Game Analyst for Championship Analytics, who has also worked for four seasons as a researcher for the NFL and been a guest analyst for outlets including Football Outsiders and SB Nation. He played varsity sprint football as a defensive lineman at the University of Pennsylvania, where he was a 2019 graduate as a mathematics major and statistics minor. With any questions, comments, or ideas, he can be contacted via email at jacole@alumni.upenn.edu and @ColeJacobson32 on Twitter.

Cole’s Substack
