A few weeks ago, I wrote about MLB’s new bat tracking metrics and how they might fit into a framework we can use to evaluate hitters. I also covered some of the pitfalls that can come with interpreting some of these metrics.
Today, I’ll dive a little deeper. I’ve put together a model that strips away some of the noise around bat speed. It’s not perfect, but I think it’s a helpful starting point to get into the nuances of what makes hitting so much more complicated to evaluate than pitching.
I’ll use that model to identify a few rough categories of swings: the pure “guess swing” and a second type that I’ll call the fooled/redirect/caught-in-between swing. And I’ll close with the Biggest Swing of the Year.
Modeling Bat Speed
There’s an analogy that I’ve seen thrown around recently: bat speed is to hitting what a radar gun reading is to pitching.
I don’t think that’s quite right, though. Hitting is so much more complicated to analyze than pitching is. Context is not super important to interpreting pitch velocity. If you were offered a choice between a 100 mph fastball and a 93 mph fastball, you’d take the 100 mph fastball more often than not. Just take a look at last year’s #1 overall pick.
But the same can’t be said for bat speed. Given a choice between two swings (one with top-end bat speed and one with mid-range bat speed), I'd say I need more information. What pitches are both hitters swinging at? In what counts? We need to know the hitter’s intent and the pitch he’s reacting to before we can really make an assessment.
The only way to truly know a hitter’s intent would be to ask him after each swing. But we can strip away some of the external factors that affect bat speed and are out of a hitter’s immediate control:
The pitch type (the speed and shape of the pitch)
The location of the pitch
The count
The contact depth/length of the swing. (A swing that catches a pitch out front will be faster than the same swing that catches a pitch deep. The longer the bat moves, the more it can accelerate. We’re just accounting for timing here.)
In addition to these important factors, we know that a hitter’s top-end bat speed is limited by his size, strength, and swing mechanics. For 6’7” Oneil Cruz, you might clock a partial swing at 70 mph. For 5’11” Andrés Giménez, 70 mph is what you’d see on a full swing.
To handle these situational factors and account for wide ranges in bat speeds across hitters, I set up a model for a hitter’s bat speed as a percentage of his own top-end bat speed [1]. The Cruz/Giménez example suggests that we can’t really predict the same bat speed across hitters for the same kind of swing. To better analyze individual swings, it makes more sense to translate a raw bat speed of 70 mph into a 90% swing for Giménez and a 75% swing for Cruz.
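Here’s a minimal sketch of that conversion in R. It uses the footnoted definition of top-end speed (the average of the top 5% of a hitter’s swings); the data frame, its column names, and the helper `top_end_speed()` are made up for illustration and aren’t taken from my actual pipeline.

```r
# Hypothetical swing-level data: one row per swing, bat speed in mph.
swings <- data.frame(
  batter    = rep(c("A", "B"), each = 40),
  bat_speed = c(seq(60, 79.5, by = 0.5), seq(50, 69.5, by = 0.5))
)

# Top-end speed: mean of the fastest 5% of a hitter's swings.
top_end_speed <- function(speeds, top_share = 0.05) {
  n_top <- max(1, ceiling(length(speeds) * top_share))
  mean(sort(speeds, decreasing = TRUE)[seq_len(n_top)])
}

# ave() applies the helper within each batter and returns one value per row.
swings$top_end <- ave(swings$bat_speed, swings$batter, FUN = top_end_speed)

# Each swing expressed as a percentage of that hitter's own top-end speed.
swings$pct_of_max <- 100 * swings$bat_speed / swings$top_end
```

Because the top-end is an average of the fastest swings, a hitter’s very hardest swings will come out slightly above 100%.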
The full model and the nerdy details can be found at the bottom of this post. But here’s the top-line takeaway: almost 50% of the variation in a hitter’s bat speed (on a given swing) can be explained by the situational factors referenced above [2]. After we control for these, we can get a better idea of what a hitter’s intent might’ve been.
Here’s a quick plot of the model’s predictions for bat speed (again, always as a percentage of a hitter’s top-end speed) compared to the actual reading:
The diagonal line tracks bat speeds that line up with the predicted value. Most swings fall into the 85%-95% range. Hitters also take check swings that are 25%-50% of their top-end speeds, though - I’ve cut those swings out of the visual. On the whole, the model doesn’t do a terrible job.
With this model in place, we can start to compare actual bat speeds to what we’d expect given the context. And that comparison can lead us to identify a few swing types, such as:
The guess swing
We’ll start with an easy one. 0-0 curveball, Yordan Alvarez:
The average left-handed hitter takes an “88% swing” at this 0-0 curveball. For Alvarez, that’s 73 mph. Alvarez takes an 82 mph swing, though. He was guessing curveball and he got one.
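To make the arithmetic explicit, here’s a quick sketch in R that uses only the numbers quoted above. The top-end speed is backed out from the 88% / 73 mph pair rather than pulled from the model, so treat the results as approximate.

```r
expected_pct <- 88   # expected swing, as a percentage of top-end speed
expected_mph <- 73   # what an 88% swing works out to for Alvarez
actual_mph   <- 82   # the swing he actually took

# Implied top-end speed (an approximation from rounded inputs): about 83 mph.
top_end_mph <- expected_mph / (expected_pct / 100)

# Actual swing as a percentage of top-end speed: roughly a 99% swing.
actual_pct <- 100 * actual_mph / top_end_mph

# Gap between actual and expected, in percentage points: about 11.
gap_points <- actual_pct - expected_pct
```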
This category (as I’m defining it, at least) covers a swing that reflects the following:
The hitter is “sitting on” a particular pitch
The swing reflects an “I don’t care how bad I miss if I’m wrong” type of intent
How do we quantify these attributes? To capture the “I don’t care how bad I miss if I’m wrong” piece, I’ve filtered to swings that are at least 10 percentage points harder than we’d expect for a hitter in the same situation. Hold this definition loosely, though - it’s probably a little high (plus, this is just for fun).
The first bullet point suggests that we should look at hitters’ counts, because a hitter only really has the luxury to guess when he’s in the driver’s seat.
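A rough sketch of how those two conditions could be applied in R. The data frame and its columns are hypothetical, and the list of counts that qualify as “the driver’s seat” is my assumption for this example, not a definition the leaderboard depends on.

```r
# Hypothetical inputs: actual and model-expected swing, both as a percentage
# of the hitter's top-end bat speed, plus the count as "balls-strikes".
swings <- data.frame(
  batter       = c("A", "A", "B", "B"),
  count        = c("0-0", "0-2", "2-0", "1-1"),
  actual_pct   = c(99, 97, 93, 80),
  expected_pct = c(88, 85, 90, 90)
)

# Assumption: which counts let a hitter guess is a judgment call.
hitter_counts <- c("0-0", "1-0", "2-0", "3-0", "2-1", "3-1")

# How much harder (+) or softer (-) than expected, in percentage points.
swings$gap_points <- swings$actual_pct - swings$expected_pct

# Filter 1: at least 10 points harder than expected.
# Filter 2: a count where the hitter can afford to guess.
is_guess <- swings$gap_points >= 10 & swings$count %in% hitter_counts
guess_swings <- swings[is_guess, ]

# Leaderboard-style tally of guess swings per hitter.
guess_counts <- table(guess_swings$batter)
```

Only the first row survives here: the 0-2 swing is hard enough but fails the count filter, and the other two aren’t far enough above expectation.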
For now, those are the two filters I’ll use. They’ll pick up some swings that aren’t guess swings, too - hitters swing hard for other reasons. But I think the leaderboard below is an interesting one:
I’d expect to see Bryce Harper’s name up here. Guessing is a big part of his game; he’ll take big cuts at pitches that most hitters would show no interest in. But as pitches like this 0-0 curveball demonstrate, a successful guess requires both guessing the pitch type correctly and getting that pitch in a hittable location.
This curveball would’ve been hammered if it were a strike - Harper guessed right. But it wasn’t thrown to a hittable location.
Corbin Carroll is an interesting name to see near the top. Quotes in April/May suggest that Carroll was chasing bat speed in his swing - maybe a little bit too much. By his own assessment, Carroll was over-rotating in his load (showing the pitcher more of the “Carroll” on the back of his uniform than he was in 2023). One effect of that coil-like move has been a flatter swing for Corbin, but his hitting coach identified something else, too:
“The more we turn back to the catcher, the more we gotta open up,” Mather explained. “We gotta make earlier decisions. All kinds of things can happen with that — velocity becomes more of a challenge. That was something that he was feeling.”
“Gotta make earlier decisions” equals “gotta guess”:
Carroll would’ve been done in by anything soft here.
A few other names at the top I find interesting: Marcell Ozuna’s approach sometimes mirrors a right-handed Harper. And Bobby Witt Jr. is having a breakout year, hitting the ball much harder than he did in 2023.
What about the other end of the spectrum? Who doesn’t guess?
Here are a few (number of “guess swings” in parentheses):
Freddie Freeman (1)
Manny Machado (1)
Austin Riley (1)
Freddie Freeman doesn’t make a decision to swing before the pitch is thrown. He doesn’t typically make a decision right as he gets his first look at the pitch, either. He waits a really long time relative to other hitters:
This is a mid-swing adjustment: the good kind. And it provides a good segue into another category of swing…
The catch-and-redirect swing
The list above was full of hitters who swung harder than the average hitter would (given the situation). When a hitter swings softer than we’d expect, we can probably assume one of the following:
The hitter is taking a defensive swing (a two-strike approach)
The hitter is fooled by the pitch and/or location and is making a mid-swing adjustment
The hitter is a unicorn and makes ultra-late decisions (like Freeman)
I think the second explanation is the most interesting, so let’s try to define the parameters of that one [3]. It will include swings that (a) were at least 10 percentage points softer than expected, (b) were harder than a hitter’s “70% swing” (to rule out check/half swings), and (c) did not occur in a two-strike count.
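In R, those three conditions might look something like the sketch below. The data frame and column names are hypothetical, and I’m reading the first condition as “at least 10 points softer,” which is an assumption about where the boundary sits.

```r
# Hypothetical inputs: swings as a percentage of top-end bat speed,
# plus the number of strikes in the count when the swing happened.
swings <- data.frame(
  swing_id     = 1:4,
  strikes      = c(1, 2, 1, 0),
  actual_pct   = c(80, 78, 45, 88),
  expected_pct = c(90, 90, 90, 90)
)

gap_points <- swings$actual_pct - swings$expected_pct

is_redirect <-
  gap_points <= -10 &        # (a) at least 10 points softer than expected
  swings$actual_pct > 70 &   # (b) harder than a 70% swing (no check swings)
  swings$strikes < 2         # (c) not a two-strike count

redirect_swings <- swings[is_redirect, ]
```

Swing 1 qualifies; swing 2 is a two-strike swing, swing 3 is a check swing, and swing 4 is close to what the model expects.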
This is the kind of swing I’m talking about:
Here, José Altuve is fooled. He starts his swing in this 1-1 count thinking he’s getting something else (a fastball, probably) and redirects to guide his barrel to the ball. According to the model, the average right-handed hitter who swings at this pitch takes a 90% swing at it (probably because the average hitter doesn’t make this type of adjustment). Altuve took an 80% swing (63 mph for him).
Altuve does this kind of thing more than any other hitter:
Altuve can put the ball in play on swings like this one, but for most hitters, the caught-in-between swing doesn’t usually end with a positive result. Here’s a more common outcome:
There’s a tiny hitch in Matt Olson’s swing as he adjusts to this 2-1 curveball. It might be hard to pick up if you aren’t looking for it, but Olson takes a 68 mph swing here (an 83% swing for him). The average hitter (one who doesn’t add the “adjust to breaking ball” hitch) takes a 94% swing - for Olson, that’s about 10 mph harder than this swing.
What next?
The swings above might prompt a question for you: which of these swings are “good”? Is it better to misidentify a slider and swing hard right through it, or to catch yourself and take an Olson-like swing (but still miss it)? Is guessing a good idea?
I think we have a long way to go before we can try to answer those questions. If anything, I hope these examples suggest that this new data can help us better understand how individualized hitting is.
That’s not a very inspiring conclusion, so I’ll close with the Biggest Swing of the Year After Accounting for Context and a Hitter’s Top-End Bat Speed. It’s Elehuris Montero on an 0-1 breaking ball that other hitters (those who swing at it, at least) would reconsider mid-swing. Not Montero:
Expected Bat Speed (for Montero): 63 mph
Actual Bat Speed: 86 mph
Big hack! Have a great weekend.
[1] How do I define each hitter’s top-end speed? I took the average of the top 5% of each hitter’s swings.
[2] The model outlined below explains 48% of the variation in a hitter’s bat speed (as a percentage of his top-end speed).
I used a generalized additive model (a GAM), the same type of model that’s used for other popular baseball metrics - like the probability that a pitch will be called a strike or whiffed at.
A few notes on the code pasted below:
plate_x and plate_z are the horizontal and vertical coordinates of the pitch’s location
The model’s second term is a combination of pitch break (horizontal and vertical) and velocity.
The third term is the identity of the pitcher (used to approximate the effect of a pitcher’s broader pitch mix on hitter decisions)
The last term is the hitter’s “swing length” (the new MLB metric) but measured relative to the hitter’s top-end (the longest 5% of his swings). This feature is used to account for the point at which the “bat speed” snapshot is taken in the swing, and it is (by far) the most important feature in the model.
I think handedness is so important that I split this model into two (one for RHH and one for LHH). To not over-complicate things (to end up with two models instead of four), I only used data from right-handed pitching in both models.
gam(hitter_bat_speed_percent_of_max ~ s(plate_x, plate_z) + te(pfx_x, pfx_z, release_speed) + s(pitcher, bs = "re") + count + hitter_swing_length_percent_of_max)
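For anyone who wants to try the formula, here’s a hedged, runnable sketch of how a fit and the “actual minus expected” comparison might be wired up with the mgcv package. Everything in the data frame is synthetic and made up for illustration (it won’t reproduce the 48% figure), and `method = "REML"` is my assumption for the example rather than a setting from the model above.

```r
library(mgcv)  # GAM fitting; usually installed with R as a recommended package

set.seed(42)
n <- 600

# Synthetic stand-in for swing-level data (NOT real Statcast values).
swings <- data.frame(
  plate_x       = rnorm(n, 0, 0.8),
  plate_z       = rnorm(n, 2.5, 0.8),
  pfx_x         = rnorm(n, 0, 0.8),
  pfx_z         = rnorm(n, 0.5, 0.8),
  release_speed = rnorm(n, 90, 5),
  pitcher       = factor(sample(paste0("p", 1:20), n, replace = TRUE)),
  count         = factor(sample(c("0-0", "1-1", "2-0", "0-2"), n, replace = TRUE)),
  hitter_swing_length_percent_of_max = runif(n, 0.6, 1)
)

# Made-up relationship: longer swings are faster, plus random noise.
swings$hitter_bat_speed_percent_of_max <-
  0.5 + 0.4 * swings$hitter_swing_length_percent_of_max + rnorm(n, sd = 0.03)

fit <- gam(
  hitter_bat_speed_percent_of_max ~ s(plate_x, plate_z) +
    te(pfx_x, pfx_z, release_speed) + s(pitcher, bs = "re") +
    count + hitter_swing_length_percent_of_max,
  data = swings, method = "REML"
)

# Expected swing for each row, and the gap in percentage points.
swings$expected_pct <- as.numeric(fitted(fit))
swings$gap_points <-
  100 * (swings$hitter_bat_speed_percent_of_max - swings$expected_pct)

# Share of deviance explained by the fit on this synthetic data.
dev_explained <- summary(fit)$dev.expl
```

The `gap_points` column is the quantity the guess-swing and redirect-swing filters are built on.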
[3] The third category is interesting too, but we can’t really separate those swings from the second one in the data. You’ll find Freddie on the leaderboard…