Mining Your Own Business Podcast

Episode 8 - Data Analytics for Better Baseball with the Cincinnati Reds ft. Nick Wan

Data Analytics for Better Baseball with the Cincinnati Reds

This episode features Nick Wan, Director of Analytics for the Cincinnati Reds Baseball Team.

Evan and Nick’s conversation covers the value sports analytics provides to everyone involved, from hiring personnel to coaches to players.

You’ll hear about how specific baseball rule changes affect analytics and the accessibility of sports data to analysts and fans alike.

You’ll also learn how Nick’s work landed him on the front page of the New York Times and what he envisions for the future of sports analytics.

In this episode you will learn:

  • The breadth of analytics usage in the sports world
  • The demand for data that is applicable and actionable
  • How public research can add value to the analytics world
  • The importance of critical thinking to ensure sound methodology

Learn more about why we created the Mining Your Own Business podcast.

Nick Wan | Guest

Nick Wan is the Director of Analytics for the Cincinnati Reds Baseball Team. Nick has previously worked as a data scientist for the Reds, and he has also helped start and manage the analytics team at KFC.

He has a PhD in Psychology from Utah State University.

Nick actively shares analytics content on YouTube, Twitch, and Twitter, and his work has been featured on the front page of the New York Times.

Evan Wimpey | Host

Evan Wimpey is the Director of Analytics Strategy at Elder Research where he works with organizations to transform deficient data into tangible business value that advances their mission.

He is uniquely suited for this challenge by pairing his professional experience in management and economics at high-functioning organizations like the Marine Corps and Goldman Sachs with his technical prowess in data science. His analytics skillset was strengthened while earning his MS in Analytics from the Institute for Advanced Analytics at NC State University.

Evan almost always has a smile on his face, which is at its widest when he is helping organizations use data in innovative ways to solve complex problems. He is also, in a strictly technical sense, a “professional” comedian.


Key Moments from this Episode

01:17 How Nick got into sports analytics from a neuroscience background
05:49 How sports data analytics teams are organized functionally
08:28 Who are the end users of sports analytics?
12:00 Demand for applicable data
13:56 Accessibility of sports data
18:21 The value of the public research field of sports analytics
24:18 How do rule changes in the game affect analytics?
29:33 Neuroscience: the future of sports analytics?

Show Transcript

Evan: Hello, and welcome to the Mining Your Own Business podcast. I’m your host, Evan Wimpey. Today I am excited to introduce Nick Wan. Nick is the director of analytics for the Cincinnati Reds baseball organization. He studied psychology where he got a Ph.D. from Utah State University. He’s worked as a data scientist with the Reds organization. He’s also helped start and lead the analytics team at KFC. And now he directs analytics for the Cincinnati Reds. Nick, thanks so much for coming on the show. How are you?

Nick: I’m good. Thanks for having me.

Evan: Hey, absolutely. I gave the folks a super brief background, but maybe you could give us a little bit more about how you got into analytics and how you got here to the Reds organization.

Nick: Sure. That journey started not too long ago, I suppose, in 2015. While I was in grad school, I was blogging, like a lot of us in the sports analytics community have done. And I was primarily just trying to learn how to convert some of the stuff I was learning in my grad program into something that people would be able to understand, something that wasn’t, you know, electroencephalograms and different things related to neuroscience. So to be able to communicate the stuff that I wanted to in my dissertation, I decided that a good way to really understand what I was working with was to use it with different data. So I applied a lot of the statistical methods I learned to sports data, primarily college basketball at the time. And I had this one blog post about Arizona State University’s Curtain of Distraction. For those wondering about the Curtain of Distraction, it’s a student section at ASU, and their whole purpose is to distract the free throw shooter as the free throw shooter is shooting into the student section. I don’t know if it’s the first half or the second half, but in any case, I wanted to know if there was an effect of their student section on away team shooting. And I put a blog post out there. Justin Wolfers, who is at the New York Times, read it and decided he wanted to put it on The Upshot. And then after it went on The Upshot, a few days later, call it a slow news day, it ended up being on the front page of the New York Times. So at that point, I figured maybe sports analytics as a career is actually a lot closer to my personal interest, and also a career that’s attainable for me, as opposed to neuroscience, even though I was already in the process of earning my PhD. So I decided to start going to conferences and talking to people in sports and analytics, and in 2016, 2017 I put my resume out there, got a bunch of calls back, and decided to join the Reds.

Evan: Yeah, super exciting. And I’m sure you’ve been on the front page of the New York Times on a weekly basis since then. For a lot of data scientists, that’s a great way to kick off the career: well, I was on the front page of the New York Times, so I figured maybe it was a good fit for me. Is there a quick resolution? Arizona State, was it useful, their fan antics?

Nick: I think it is debatable. Personally, I didn’t find a significant effect, but Justin did say that there is a significant effect based off his calculations. I lean more towards no effect, but who knows. It’s been, I don’t know, almost seven years now, so maybe there’s a larger sample size.

Evan: Yeah, got some good data refresh there. I appreciate the skepticism too, before jumping to conclusions, and we’ll link the blog post in the show notes of this page so folks can go and try to draw their own conclusions. But that’s a really cool intro story. You’ve gotten yourself to the Reds now, and I’m a baseball fan. When I think analytics at a baseball organization, I think the product that’s on the field and the analytics that can help drive decisions there. But it’s also an organization that has a lot of the same needs as other places. So I’m curious, do you touch just the baseball piece, or are you also working with sales, marketing, stadium operations, and traditional business things like that?

Nick: Not in my role. In baseball analytics in particular, it would be very rare if someone in baseball analytics was also doing business analytics-related projects. In other sports, that is definitely more common. I think it’s becoming less and less common, simply because, knowing our database team, which does work with both the business side and our baseball team, they always mention how there’s far more baseball data and requests coming from us than there are from business or ticketing or any of our pro shops or whatever. And that’s simply because the investment is in the baseball product. And so we have all sorts of different data that needs to be analyzed from the baseball side. Whereas the business side, they also have a bunch of data, but perhaps their needs aren’t necessarily, well, I don’t know. I don’t wanna speak for the business side, ’cause I really don’t know, but their needs definitely are different than our needs. And we tend not to cross paths. But I know I have friends in basketball and in hockey who do both. Like, I’ve had friends in hockey who are literally splitting their time between the business side and the hockey ops side. So it is more common than not. But I think as data grows, as more teams adopt more data sources and the demands from coaches and staff become larger in terms of player operations, then more than likely you’re gonna see that division in all sports.

Evan: Sure. Yep. That makes perfect sense. And yeah, we talk a lot with folks that are trying to start up a data science organization about whether to embed data scientists with a specific function or sort of have this broad centralized team that touches a lot of things. There’s probably not a whole lot of overlap between the product that’s on the field and the traditional sales or ticketing. But you mentioned demand coming from coaches or players, and I’m curious: who really are the end users of your analytics? My instinct is it’s sort of the front office that’s making personnel decisions, but I guess it could reach down to touch the coaches or even the players. So I’m curious what your interaction is like with those various folks.

Nick: I would say any analytics we produce, in some form, will touch everyone. So that’s everyone from the player all the way up to our GM. I think when a lot of teams start their analytics programs, just hearing from other people around the league and in different sports, they tend to focus on the front office first because it’s a smaller group of people. And the needs are, you know, pretty high when it comes to free agency acquisition or draft or player valuation going forward. So it tends to be easier to convince the one or two people who you might be working with, who make those decisions, on analytics’ behalf, as opposed to the entire coaching staff. And in baseball, it’s not just our major league staff, but it’s also all of our minor league affiliates and then all the trainers and all the assistant coaches as well. So we’re talking about a pretty large staff of just coaches, and they all have their own opinions of what makes great baseball players.
So getting alignment there is obviously its own job. And technically it’s my job, but the goal is really to produce work that is not just strong statistically, but also applicable, and there’s a tool that helps us apply it. And really, we have coaches and staff as part of our development process, because if they’re not there at the ground floor of the development cycles, then it’s hard to present someone with a brand new tool that’s supposed to do something when they don’t know what it is, right, or they weren’t expecting it. So our resources do touch everyone in different ways. Even all the way down to the player level, you know, they’re getting advance reports, and those advance reports, you know, they’re reading. And a lot of that information comes from analytics teams. So no one can avoid it, I suppose. No one can hide from the analytics here.

Evan: Yeah, I told myself I wasn’t gonna bring this up, but it’s so hard. And I think if you have a pop culture reference to analytics, Moneyball is the thing that so many people are familiar with. I know this is 20-plus years dated now, but just have the visions burned into my head of “here’s what the analytics says. Well, I’m a scout and I don’t believe that, I’m old school.” So I’m curious if you still experience it. Do you get a lot of pushback from folks that say “I don’t really care what the data says. I know that this is the way things should be”?

Nick: In a different way now. And that different way is simply because every single person wants data. And so there’s no one out there who says data is bad. Like, that just doesn’t exist. Call it, you know, a changing of the time or the era, or call it just, you know, people hiring specific knowledge sets or skill sets. I’d say the most important part of it is, like, we are hiring a lot more people who are analytically driven, and that’s industrywide, that’s not Reds-specific. So industrywide, you’re seeing a lot more people who can dive into numbers and understand things from a statistical point of view. But that’s not to say that every single person is as easily convinced by, like, a FanGraphs article that comes out there. A lot of people might want more, you know, evidence against a particular point. They might wanna see it over time. There’s so many new data sources that are being introduced in baseball. Every time a new data source comes in, it needs to be vetted. We need to understand, like, what is truly the signal from a data source? How meaningful is it in the world of everything else we consider in terms of data? And that’s where you typically get the pushback now. It’s “how does this help me make a better decision?” Not necessarily, you know, “why would I use this over something else?” It’s not black and white; it’s far more of a spectrum of information.

Evan: Certainly. Yeah, that makes perfect sense. And then I wanna ask about the data sources. Baseball is, I think, one of the places where it’s a hobby of a lot of analytically inclined folks, and there’s a lot of data that’s out there for Joe Fan who wants to pull in some data and do some analysis. So I’m curious how much of the data that you use and you’re trying to make decisions on is something that’s either publicly available for somebody like me to hop on, or is at least shared across teams, like you have the same information that the Cubs and the Cardinals and the Pirates have access to. Or, you know, for whatever you’re allowed to say, do you compete on the data that you make available as well, the data that you collect and use?

Nick: On the very latter part of it, we do. Every team has some sort of proprietary or internal type of data and also their own style of doing their own analyses. And so every team is working on a solution to questions that exist that don’t necessarily relate to on-field performance or things that you could track in the box score, so to say. And a lot of that information is publicly available. So you have, you know, Baseball Savant giving all sorts of different per-pitch tracking information now. So you’re essentially working with something that’s very similar to the quality that all the teams are working with. So you create things like, Eno Sarris has, you know, his Stuff+ model. And Driveline, you know, is a contractor; they also have their own proprietary models, but that’s mostly based off of league-provided data. And so the league does provide a large amount of data to all the teams equally. And that’s not just box score data, but that’s player tracking data and ball flight data and skeletal data. So there’s a ton of different data that all teams have, but then there is that mix of proprietary stuff that every team has. And some teams, like I mentioned, it’s sometimes just experimental, seeing what sticks, and they buy into a vendor for a year and then they, you know, leave the contract because they didn’t see any improvements in their stuff, to, you know, bringing on people who have particular ideas, and they set up their own in-house data collection team. And they work on a project for a really long time. And then they see whether or not to scale it after a lot of data collection in-house. So it varies between whether people go down, like, a vendor source and, you know, it’s a quick turnaround, or people bring in essentially people who used to be vendors and then have them do it exclusively for them, you know? So lots of different ways to get new data sources, lots of different ways to glean whether or not something is important.
But I always go back to the idea that if every team made all of their models open source, I don’t think it changes much. Because every single team would just look at everyone else’s code and they would think, like, well, they’re making these assumptions. And so why are they making certain assumptions versus, like, a team that isn’t making those assumptions? Like, how did you go about validating that this was the right type of parameter set? Or this was the right ensemble of models? Or, you know, you’re working with a four-layer neural net as opposed to a three-layer neural net. Everyone’s models are slightly different, and because they’re slightly different, I would assume that it actually wouldn’t change much of the operations anywhere.

Evan: Wow. Yeah. Maybe to that point, there are a lot of modeling efforts that are open source in the baseball world because of the hobbyist or the university student or whoever has access to some data and can build some models. Is it a waste of time, or is it ever useful to see what’s out there? I think maybe at least at some of the, you know, bigger publishers, like a FanGraphs article or a FiveThirtyEight prediction. Are those things that you guys can use internally, or are those things where it’s “Nope, they’ve got a different set of assumptions and a different set of incentives for what they’re trying to do”?

Nick: I’d say every analytics person in baseball, anyone who values analytics to some extent, they have a FanGraphs tab open. You know, whether it’s the first thing they read in the morning or something they read on the plane while going somewhere, everyone’s reading FanGraphs in the industry, and, you know, it truly is a bastion of the analytics community, not just in baseball, but in sports in general. And I would say no one I know necessarily goes on FanGraphs and completely yoinks an article and says, we’re gonna use this research. It does go back to that spectrum of verification, like, are the assumptions gonna be correct? Is this a replicable effect? If it is, to what extent does it replicate? ’Cause people do have access to all this major league data, but does that replicate at the minor leagues? Does that replicate at certain levels of the game? So that’s one issue, and then the other issue is less about how intricate a particular article may be, and more about just the concept. Like, was the concept found in a good methodological state? There’s a lot of articles that go out there and they say they, you know, have a different angle on a particular principle of baseball. And sure, the numbers all kind of sing together, but the methods were the things that they kind of made big leaps on at first. And so that’s sort of what’s turning the gears. So you do have to be a pretty sharp, critical analyst or analytics contributor to recognize when the methods are off a bit. And at that point it’s like, well, if this concept does hold water, then what would the test in a more controlled environment, or an environment with fewer assumptions, look like? So we’ve seen that too in the public sphere. You know, seam-shifted wake is a big topic of discussion in pitching. That kinda started from academic research over at Utah State. And then it’s kind of taken the pitching analytics realm by storm since then.
It’s pretty new. But it did require verification at first. It is like, okay, well, this was an interesting effect due to the spin of the ball and, you know, the fact that it’s not a perfect sphere. You have, like, these effects of the seams. How do we quantify that? How do we track that info? Does it actually have an effect? Is it scalable? What effect does it have when the balls change, or when the seams are raised versus flat? And so you have to do quite a bit of work, even though a very smart professor over at Utah State in the physics department, it’s not like he’s risking his reputation to put it all out there. But in the baseball world, the gains, if you do have a competitive advantage, are pretty gigantic. So it’s important to make sure that if you’re telling, say, your pitching coach, like, “Hey, there’s this new pitch type that comes out of this seam-shifted wake research. I think it’d be good for us to try to figure out a way to develop it,” you need to have a lot of evidence, because your pitching coach might be someone who’s coached former Cy Young winners, someone who’s made a lot of people millionaires, and he didn’t or she didn’t know anything about this. And so it’s up to you to be able to present that research, and it’s up to them to be able to take that in and figure out a way to grow a new skill set for themselves too. So it’s all-encompassing. I’d say the public research field is a lot more “concepts and methods” than it is “hard numbers”, but everyone’s taking a look at it.

Evan: Yeah. That makes perfect sense. And I think that maps well to analytics in any industry. There’s interesting academic research that comes out that is not plug and play into your industry into your field, but it’s just, here are some good ideas or some concepts or a new way of thinking about something and then see how well you can apply it. If it holds water, if it validates well in the place where you need to use it.
So we’ve got a couple of questions left for you, Nick. As a fan of baseball, there have been a few rule changes recently. There are a few more that are proposed. I think next year, the shift is slated to go away. I feel like the defensive infield shift is something that has been really analytically driven. And so I’m curious how much rule changes or proposed rule changes influence the type of work that you guys are focused on and the way you’re thinking about things. And is there anything else besides the shift? The shift is what comes to mind for me, but is there anything else like this that could really change the way we analyze this data?

Nick: I mean, I think the shift is the biggest one. There’s bans of the shift in the minor leagues already. I think people are calling it the pie wedge. So, like, they have these foul lines at second base that cut out kind of a triangle. And that triangle essentially says your fielders have to be on one side of the triangle or the other side, but they can’t be in it, and you can’t, obviously, have an imbalanced number of fielders on one side. So that’s how they’re kind of preventing the shift. You know, from the public standpoint, there’s a lot of research out there that suggests that the shift actually doesn’t do much in terms of run saving. There’s a few different researchers out there. There’s a few different points of research. I wanna say that it was either on The Athletic or on FanGraphs. Might have been on Prospectus. But there’s a pretty telling article that says over the course of the season you might be saving at most three to five runs. And so when you’re talking about things in terms of the level of runs and not necessarily in terms of the level of wins, then the effect tends to be pretty tight, because you could neutralize the effect of the shift over the course of an entire season, if you just hit, you know, over 500 or over five runs expected. So the idea that the shift is this, you know, evil thing that’s ruining baseball? It just looks different. And I think at the tails, there are players who are definitely affected by it, but over the course of your entire, like, average game, when you get shifted on, it tends to be no different than if you aren’t shifted on. So I would say that that particular rule, I don’t think it’s gonna have a gigantic effect, just from the concept of watching baseball or experiencing baseball. Analyzing it is a little harder, simply because now we have to code in a completely new rule. So being able to code in a new rule in baseball is its own difficulty.
You have probably a lot of teams out there who were trying to write that into their code this offseason. But again, I don’t really think it’s gonna change much. I think, you know, at the very least teams are gonna just play straight up, just as they’ve done since high school, since Little League. And other times, like, teams are gonna try to get creative with the shift, like bringing in a shallow outfield, you know, I don’t know. There could be more concepts about, like, you know, if a guy’s a ground hitter, you do something specific about their hit trajectories as opposed to, you know, the side that they hit towards. So I don’t know. I don’t think that that particular rule is gonna be that crazy. In my opinion, the rule that we’re all really excited about is, like, the pitch clock and more pitch clock enforcement, because games are gonna be, like, a lot shorter. So that’s a lot more interesting just from a pace-of-play perspective. I’m curious to see, if that is implemented at the major league level, what that ends up looking like.

Evan: Yeah. And sure. Certainly, from a fan I’ll complain about all the rule changes like an old man already. I welcome the clock, the pitch clock, let ’em speed up the games. That’s great. Yeah. And I’m already walking back off the ledge if you say that the shift is not a huge differentiator in expected runs.
So Nick, I wanna ask, you’ve got a team there. You’ve got organizational ideas and goals, but, you know, just as a thought experiment, it’s the Nick Wan show. You get to decide what your analytics team focuses on, what you think is an interesting problem. And everybody, from the GM down to the Single-A players who are trying to get better, is aligned to your vision. They say, yes, work on this, try to help us solve this problem. What’s the thing where you want to point your resources, where you want to focus?

Nick: I mean, I’m pretty biased, but it’d be neuroscience and psychology. I would love to put the degree to work. I mean, that’s an area of not just baseball, but all sports, where, you know, there is no solution for it. There’s no, you know, fancy metric that can quantify it quite yet. You know, a lot of the information we get in terms of the psychological profile of the player comes from scouts or comes from essentially relatively unfounded personality tests. So, you know, if you focused a ton on, say, the neuroscience of the game, maybe you get further, but the investment it would take, I mean, you’re essentially starting a neuroscience lab inside of a sports organization. No one’s done that yet. As a person who used to work in neuroscience labs, they’re not cheap. So certainly I can’t imagine that, you know, a team would have the luxury to start up such an endeavor. And so, you know, the state of neuroscience in baseball, it is still in its infancy. It’s maybe not even that. Probably still in the gestation period of it. We’re hoping that it exists sooner than later; everyone is. But how to collect the data, what it is, how do you implement it? If you pick the wrong stuff, do you just cut the program completely? Or how do you reinvent it? Who gets ahead in that space first? I’m very interested and curious. I would say if it was, you know, my reign, then that would be an area that I’d like to emphasize, but it would truly be, like, some sort of wizardry to convince everyone that that’s the most important thing in baseball, you know. There are plenty of things that we do that are extremely important that I also find important. I just definitely think psychology is a part of that equation. And I’m guessing no one else in this industry thinks the same way.

Evan: Hey, that would be a fun one. We’ll check back in a decade or so, once all the lowest-hanging fruit is gobbled up and we’re into sports psychology. I really wanna know: if you tell a pitcher that he’s got a no-hitter in the seventh or eighth inning, does that really jinx him? Does that really play with the mind? I’m not gonna mention it.
Anyways, Nick, you’ve been great. Thanks so much for coming out to the show. Super insightful.  Nick is active on Twitch, YouTube, Twitter. We’ll link all of those in the show notes. Nick, is there anything else that I missed?

Nick: No. Just a quick shout-out to those listeners in the Cincinnati or Ohio area. We do a meetup, me and Eric Deger over at Pro Football Focus. We do an in-person meetup the first Friday of every month. It’s been going throughout the summer now. I think we’re on our fourth one this Friday. But yeah, if you are interested in meeting locally with people, feel free to come out.

Evan: Awesome. Super exciting. Yeah. Hope you can get some folks out there, and for the folks that aren’t local, go to the next Reds game when they come to your city. Nick, thanks so much for coming on the show and we will talk to you later.

Nick: Appreciate it. Thanks for having me.