Evan: Hello and welcome to the Mining Your Own Business podcast. I’m your host, Evan Wimpey, and today I’m super excited to introduce Nechama Katan. Nechama is a director of data science at a large pharmaceutical company. She’ll be speaking with us today about analytics in pharma and analytics in general. Her background’s in math and statistics, and we’re excited to have her on the show. Nechama, welcome to the Mining Your Own Business podcast.
Nechama: Thank you. It’s great to be here.
Evan: Awesome. To get started, can you give us a little bit just about your background and how you got into analytics and pharma?
Nechama: Yeah, so, I call myself a wicked problem wizard. I used to be a data wizard with personality.
My degrees, as you said, are in math and statistics from Columbia and Courant. And then I took kind of a sideways path through Intel—in high tech for seven or eight years, really helping people use their data to solve hard problems. I left that industry, worked independently for a while, went back into industry, worked for a number of other companies—high tech, low tech, banking, et cetera—and joined the pharmaceutical industry to do risk-based monitoring implementation about six and a half years ago, and that’s what got me into the data space in pharma.
Evan: Awesome. Very exciting. Maybe for those of us outside the industry, you can give us a little hint of what—and I’ve already lost the term …
Nechama: Risk-based monitoring—yes.
Evan: Yeah.
Nechama: So I’m in the clinical trial space. So you bring in all of your clinical trial data from the sites, and there’s an assumption, a statistical assumption, that every drug trial site is following the same protocol. And then you can aggregate all that data and get the results from your study. If you have a site that doesn’t understand the protocol, or isn’t following it for some reason, or the protocol is confusing, or their software isn’t working and that’s causing different behavior at that site, then you end up with a problem with what’s called statistical power: you lose your ability to control false positives and false negatives, and so your study becomes at risk.
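(To make the power point concrete: a rough, hypothetical sketch using Python's statsmodels, where the effect size and per-arm enrollment are invented purely for illustration.)

```python
# Hypothetical illustration: a noncompliant site adds noise and dilutes
# the treatment effect the trial can actually observe, eroding power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# As designed: standardized effect size 0.5, 100 patients per arm, alpha 0.05.
planned = analysis.power(effect_size=0.5, nobs1=100, alpha=0.05)

# If protocol deviations shrink the observable effect to 0.4,
# the same trial detects it far less reliably.
degraded = analysis.power(effect_size=0.4, nobs1=100, alpha=0.05)

print(f"power as designed:     {planned:.2f}")   # roughly 0.94
print(f"power with site noise: {degraded:.2f}")  # roughly 0.80
```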
ICH, which is our governing regulatory body. We talk about ICH all the time: there’s ICH E this and Q that and so on. ICH E6(R1) is about 10 to 15 years old at this point, and the regulators said you need to apply statistical methods, identify the areas where there are high risks, and do more analysis there and less analysis where there’s less risk.
So pharma traditionally has had a model of verifying data where you go down to the trial site: someone opens up the case report form, which is where the data’s entered, someone opens up the medical record, and they verify that what was entered matches one to one.

And it feels like you’re doing a hundred percent checking. It sounds like a great idea. What could possibly go wrong? That process is called source data verification, or SDV. And what goes wrong with it is that when all you do is check each number against the other, all you’re really finding are typos and transcription errors, and what you miss are big process problems.
Evan: Sure.
Nechama: You won’t notice that the nurse always uses the same blood pressure.
Evan: Gotcha.
Nechama: Or sometimes you’ll notice that a site is never entering something, but you don’t get to see any trends. You don’t get to see the process that’s going on behind the site, and so you lose a lot. But you feel good because you’ve done a hundred percent of SDV.
I call it cleaning a bathroom with a toothbrush. You go through and you clean every single tile, but you ignore the fact that you’ve got a plumbing fixture that’s overflowing and needs to be fixed. So risk-based monitoring is an attempt to address that problem: to say, we’re not gonna do data cleaning, we’re gonna stop the toothbrush activity, and instead we’re gonna look at what are the systemic process problems that are causing these different behaviors.
Either it’s complexity of the trial; it’s a lack of understanding; sometimes it’s social complexity. I’ve seen trials where we know that adverse events get reported differently across different geographies. For example, you’ll get different adverse event reporting in the Far East versus in the United States versus let’s say the former Soviet Union. So you’ll get different reporting there.
But we’ll also see other types of interesting behaviors. I’ve had a site of older people in Puerto Rico where nobody was reporting any sexual activity, alcohol, or smoking. And I felt like the young site staff were afraid to ask these older patients those questions. I mean, nobody at the site had ever had a drink? Really? That’s not possible, right? That’s the kind of thing you can find when you look at the data holistically.
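(A minimal sketch of that kind of holistic, cross-site screen in pandas; the file name, column names, and thresholds are hypothetical.)

```python
# Hypothetical sketch: flag sites whose reporting rate for a lifestyle
# question (e.g., any alcohol use) is implausibly low versus the study.
import pandas as pd

# Invented file with columns: site_id, patient_id, alcohol_use (0 or 1)
df = pd.read_csv("lifestyle_responses.csv")

site_rates = df.groupby("site_id")["alcohol_use"].agg(["mean", "count"])
overall_rate = df["alcohol_use"].mean()

# A site with plenty of patients and a near-zero rate ("nobody here has
# ever had a drink") is a process signal that one-to-one SDV never surfaces.
suspicious = site_rates[
    (site_rates["count"] >= 20) & (site_rates["mean"] < 0.25 * overall_rate)
]
print(suspicious)
```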
Evan: Sure. Yeah, that makes perfect sense. And I love your analogy for it as well, the cleaning the bathroom. And I was thinking about scale: checking the box that you’re verifying the data—that’s gotta be impossible to scale too. You’re manually checking each thing, but by looking at these patterns, you’re able to ingest a lot more data, and folks with the data science chops can explore that.
So I’m curious. Folks with data science chops: your background is in statistics, but for the folks on your team who are looking at these types of things, do they have a background in the industry? Are they aware of pharmaceuticals and drug trials, or are they more stats and data science based?
Nechama: So when we first set up the team that I am part of, the intention was to hire people from outside of the industry or hire people from different roles.
And that was because we knew that there was a huge difference in how we were doing the job. Now those roles are being filled more with people from inside of the industry. They probably wouldn’t be considered traditional, hardcore data scientists. The way I think of it: there are companies where everything is still in Excel, so doing data science is really hard.
Then you have the really data-native companies where everything is a model. And then you kind of have those Fortune 500s in the middle where it’s more advanced analytics than data science, right? So there are people who have analytical skills.
In our case, we’re writing a lot of SAS code. S-A-S, not S-A-A-S.
Evan: Good clarification.
Nechama: Right, there’s some Python code to do data analysis, and so from that perspective, it’s gonna look a little bit like a data science role. But then you have to go and sell the results to the study team. You have to say, “Hey, I found this. Can you now go act on it?” And that’s not historically been a data science role. That’s a storyteller role. So we do both the storytelling (the story pushing, if not telling) and the actual analytics, leveraging industry tools.
Evan: Alright. Yeah, now just a quick sidebar. I hear SaaS, software as a service, all the time.
And when I hear SAS, I live in Cary, North Carolina. I’m looking out the window.
Nechama: Oh, you’re right. You’re looking at SAS.
Evan: I can’t quite see the campus, but I know it’s right over there. And so I have to translate in my head every time somebody says SAS, they’re not talking about SAS the company, but now you are. So it makes it easy for me.
Nechama: Yes.
Evan: Perfect. Yeah, and I want to go back to the way you opened this with, when you were first hiring for the team, when you were first building out the team, you were looking for folks outside of the industry. Can you talk about sort of what drove that decision? And then in hindsight is that—was that the right way to start the team?
Nechama: That’s a good question. We needed people who had a different mindset than traditional pharmaceuticals and clinical trial analysis, given how far we needed to go. We needed to go from, literally, people sitting and checking things manually, all the way to statistical methods. And to do that, it was necessary to take people from outside of the industry.
Where were we challenged? One of our best people, one of the original team members who’s still with us, had worked for a CRO, a contract research organization, but I think he was pre-law and had decided not to go to law school and went to do something else. It was a great example of where a liberal arts degree was really helpful.
So you had someone with liberal arts problem-solving: creative problem-solving, structured reasoning. That kind of stuff was really helpful, even though we then needed to teach them the pharma side. In data science in general, you can either think and reason about data or you can’t, and then you can either learn the business or you can’t.
And if you can learn the business, you can learn any business. If you’re motivated and curious enough about the business, you can learn the business. If all you care about is what’s the latest package in Python, then you’re not gonna learn the business. And that’s where you get stuck. So we had a couple people we brought in who were programmers.
We had one person, a great programmer who wrote great SAS code, but we couldn’t justify a hundred percent programmer, and so that didn’t end up working. It’s about finding that balance of people who can have the technical conversations but also have those other business conversations.
And over time the industry is really splitting those roles between the more technical people—I’m starting to call them story builders, the people who build out those pieces—and the storytellers who tell that compelling story.
Evan: Yeah, I think that’s a very useful way to think about it.
I think data storytelling is a pretty popular term, and I think rightfully so. Being able to express that is good, but I’d never heard “story builders,” and I think that’s a good characterization.
Nechama: Let’s build that story outta the data.
Evan: Yeah.
Nechama: And it’s not gonna be the same person who’s telling it to you typically.
Evan: Yeah, I think that’s great. I’ll mention, as you were talking about the advantages that an outside perspective brings in being able to learn the business, it made me think back. We had April Wilson on the show several episodes ago. She’s a graduate director for a master’s program in analytics, and she talked about her students who come in with backgrounds that are not related at all to data science and then get the master’s degree.
Often they’re the most in demand because of their ability to think and to learn the business. And certainly there’s some time and resource involved in teaching someone the business. But you’ve got these story builders and these storytellers now. And I think back to some of my experience at rather large companies, and, for lack of a better way to put it, there are often people who don’t want to listen to the story, and there’s often some friction in getting something that you’ve found in the data to make a change to the process—to fix that broken pipe. So can you talk about how you go about that? How do you get people to listen to your stories?
Nechama: So I had someone in a meeting last week say it takes a lot of courage, because you’re never in a position of authority in those teams when those conversations are happening.
So you need to have the courage to say, “Hey, something’s not going right here. Let’s stop, let’s have a conversation.” But you need to be speaking in their language. I was at a conference last Wednesday, and someone whose name has escaped me brought up a concept. She came from Saks, the clothing store.
So she brought up this concept: we talk about data literacy, but from a linguistics model, in order to understand a conversation, you need to know about 30% of the vocabulary of a particular language. And so often the data storytellers will sit at a table and not understand the business.
And the business also needs to understand. So if you think about it from that side, you say, all right, if I’m trying to explain something, I need to know at least 30% of the business language. But probably to influence people, I need to know 50%; I need to know more. Right? And then I need to teach the business that 30% of the technical speak so they know what I’m talking about and to be very aware of that and to talk about it.
What works for me: often nobody’s gonna raise their hand and say, “I have no idea what you just said.” I have some very strong technical people, and they go off on Bayesian models and this and that and the other, you name it. And I’m like, dude, I can vaguely guess at what he’s saying, but I can’t have a meaningful conversation with him on the topic, right?
So you need to be able to watch a room, and when they start glazing over, you know you’ve lost them. They don’t know what you’re talking about. They don’t care. They just want the problem to go away. It’s easier to dismiss it. So it’s that: learn their language, and then always be a teacher, because they’re not gonna go sit down and spend their free time reading books on data science.
So go teach them. Use everything as a learning opportunity. People learn with analogies; always have an analogy or two or three. I used to teach calculus like this too. Always have a simple example. They can say, “Oh yeah, okay, now you’re talking about toothbrushes again, aren’t you, Nechama?” And I’m like, “Uh-huh. Yep.”
So you have to step up and do that extra bit, from that 30 to 50%, to learn the business and use that language, and then you need to teach them your language.
Evan: Yeah …
Nechama: And it’s not about dumbing it down, but there is some amount of that as well.
Evan: Yeah, when you talk about the modeling process and a generative—some sort of Bayesian generative process, maybe you have to dumb it down a little bit.
Nechama: Explain it to me. Show me how it works, right? I want to use ChatGPT. Let’s pop it up and pull a few different prompts, right? Make it real. Show them a demo. Get them into the tool; get them engaged.
Evan: Yeah, that’s great. And I think the way you spoke about that—it sounds very generalizable. I’m thinking, you know, I work at a—
Nechama: There’s nothing pharma specific.
Evan: Okay. Yeah, that’s—I’ve lost the acronym, you gave an acronym for one of your regulatory …
Nechama: ICH E
Evan: … documents early on. Yeah, I’m curious how much—in my mind in the pharmaceutical industry—the industry language—there’s more specific industry language versus, I don’t know, other places.
Nechama: Not for analytics.
Evan: No.
Nechama: It’s trust-based culture, speaking up, statistical methods: really generic technical language.
Evan: Yeah, but I guess from the business speak, I feel like …
Nechama: From the business speak, yes. I have to phrase everything under GCP, Good Clinical Practice: it has to tie to a risk to a patient. And by the way, if I run a clinical trial and I have data that I’m not gonna be able to use, now I’ve dosed that patient without any possible benefit.
I’ve put that patient at risk for no reason. So I have to phrase things as: hey, if the site’s data isn’t gonna be any good, that’s not good GCP. All of a sudden: oh, GCP, we have to do something about GCP.
Evan: Oh, wow. That’s a great concrete example of knowing the business.
Nechama: What is that thing that they care about? If it’s money, time, quality—whatever that thing is that they care about—data privacy, data security. We all have those problems, right?
Evan: Yeah, yeah, that’s great. Yeah, it’s, you’ve learned and—
Nechama: No one wants to end up on the front page of a newspaper, particularly. I mean, the chances of it being good are really bad, so yeah.
Evan: Very fair. You mentioned, sort of offhand in jest, you know, hey, ChatGPT can do this. But generative AI, that whole suite of tools, is front and center, not just in our analytics world but available to everybody.
I’m curious if you’ve gotten requests like: you do analytics, so use ChatGPT to do something for us. Is there an increased appetite for some of these impressive technical tools?
Nechama: So I am trying to get a way of using it inside of the firewalls of my organization, and I haven’t gotten there yet. There are some people doing it. It’s more the technical people saying, “Guys, we could replace so much work with this,” right? I have personally seen an example outside of my day job, in marketing, where with the right prompts you can replace basically an entire marketing department writing copy: months’ worth of work in hours of time. It’s mind boggling.
So yeah, it’s incredibly powerful. It’s still being pushed more by the technologists than by the business users and business leaders in my space. And technologists are falling 50-50 between “oh, it’s not plagiarism” versus “yeah, it is.” And it depends on whether or not you’ve seen it used with the right prompts.
Evan: Do the right prompts make you more optimistic or more pessimistic?
Nechama: Oh, it’s going to eliminate an enormous number of jobs, but rightfully so. Everyone is always telling me they can’t do critical reasoning because they don’t have the time to sit and think. And I’m like, great, this is a great example of a tool that’s going to create a lot of time for you.
Evan: Yeah, that’s good. That’s a very positive spin on how to look at it.
Nechama: And so then you can do the interesting work. Americans have always had to compete on their ability to do critical reasoning and to think outside of the box. Because if it were just brute-force programming, that can always be done by someone who gets paid half as much as we do, or a third, or a quarter; I’m not gonna program twice as fast, right?
Evan: Yep.
Nechama: Programming doesn’t scale; thinking does. And so anything that cuts that out, and lets you think and leverage what we do well, is worth it.
Evan: I love it. That’s a great attitude. And maybe just very generally, you said it’s more embraced by the technologists still than on the business side. And my assumption, and you can maybe add flavor to this, is that pharmaceuticals in general are more conservative related to change: adopting a new tool or a new thing, or changing processes.
Nechama: Yeah, I am in the only regulated industry I have ever been in where the regulatory agencies are saying, “Would you guys please do things more, whatever, progressively?” And the industry’s going, oh, I like my toothbrush. I can control my toothbrush. I know what it’s doing. So yeah, there are clearly some challenges from the regulatory side: any place where you’ve got unexplainable models, those aren’t gonna fly. But by and large, I think the regulatory agencies are looking for ways to show that we’re doing more critical reasoning.
Evan: Awesome.
Nechama: They have it in writing. Any ICH document you wanna read will show it.
Evan: Yeah, that seems encouraging—from the outside, your perspective seems encouraging. Now I wanna ask you one last question, and feel free to give a pharmaceutical take on it or just your general perspective, but there are a lot of new tools. There’s a lot of new data that you have access to.
If you could sort of zoom out, you can focus analytic efforts, development, your think time wherever you want. And you’ve got the resources, and the stakeholders, the people that you need, are on board with your vision. They wanna support; they want to help where they can.
Where’s your interest? Where do you think there’s maybe good benefit, something that’s personally interesting, something that you would want to push forward on?
Nechama: Yeah, so it’s exploratory data analysis. What do I mean by that? The traditional IT stack says you do data pipelines, which is the modern way of saying cubes.
Cubes were a complete disaster, if you’re old enough to remember them. They took years to write, and then they were out of date before they got done. So from a data perspective, you have a transactional database and you have a reporting layer. You rewrite your reporting layer in a cube so that you can get access to the data, and you’ve done all the joins you need to do for your common analysis, okay?
So we’ve replaced that with the data pipeline and a data lake and a data mesh and a data fabric. But there’s still this whole data pipelining activity that in any reasonable-size organization will take six months to a year to kick off and six months to a year to do. So now you’ve got two years, right?
You have to justify the program, fund it, and then you have to do all this data pipelining work. Well, I come in as a data user, and I have people who are not at the highest skill level for data usage. They can do a VLOOKUP in an Excel file, and they can understand. Let’s say I teach them what a relational database looks like and what a join would look like.
And I say, all right, I’ve got some problems I need to solve. I’ve collected some data from a clinical trial, in this case, and this dashboard system has identified a problem down here, and that goes deep. I go deep. I find the problem. Now someone says, oh, that problem is because those patients are green and it’s Tuesday.
Now you need to go wide. You need to join a bunch of data sets together and ask, okay, is there a correlation between the weight of the patient and whether or not they’re green and whether or not it’s Tuesday? So I have to join in a bunch of data sets, and once I’ve joined them, I can plot it—X-Y plots—great. Plotting is done.
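(A sketch of that “go wide” step in pandas; the datasets, columns, and the green/Tuesday question are all hypothetical.)

```python
# Hypothetical "go wide" step: join patient-level and visit-level data,
# then plot. The munging is the real work; the plot at the end is trivial.
import pandas as pd
import matplotlib.pyplot as plt

demog = pd.read_csv("demography.csv")  # one row per patient: patient_id, weight_kg, is_green
visits = pd.read_csv("visits.csv")     # many rows per patient: patient_id, visit_date, metric

visits["visit_date"] = pd.to_datetime(visits["visit_date"])
visits["is_tuesday"] = visits["visit_date"].dt.dayofweek == 1  # Monday is 0

# Go wide: every visit row picks up that patient's demography.
wide = visits.merge(demog, on="patient_id", how="left")

# The easy part: an X-Y plot of weight against the metric, split by Tuesday.
for is_tue, grp in wide.groupby("is_tuesday"):
    plt.scatter(grp["weight_kg"], grp["metric"], alpha=0.5, label=f"Tuesday={is_tue}")
plt.xlabel("patient weight (kg)")
plt.ylabel("metric")
plt.legend()
plt.show()
```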
Everybody in fifth grade in a public school in America can do plotting in Google Sheets, right? So plotting the data isn’t a problem, but the data munging is still considered an IT function, and the tools are buried inside of pipeline tools. So there’s no place where I can just do it. And they say, we’ll just build out your pipeline.
I’m like, great. I can build out that data pipeline for my first why, but I’m doing five whys. So the first time: nope, it’s not because they’re green on Tuesday. And they say, well, what about this other condition and that other condition? I can’t build that out; the more questions you ask, the more factorial possibilities of data joins you could possibly have.
I can’t build out that pipeline. So what I really need is a chat tool, and five years from now it should exist, right? I can say, hey, the physician thinks green patients are supposed to have this metric. Is that true? Can you show me the statistical results that would support or disprove that particular fact?
And the tool would then say, okay, well, tell me, what are your data sets? And you’ll be like, well, I’ve got a demography data set, and I have a clinical endpoint data set, and I have a visit data set. And it says, well, okay, demography is one record per patient; the visits have multiple records per patient. How do you wanna join it?
Do you want to take your visits and add your demography? Do you wanna take your demography and add your visits, right? Is it a right join, a left join? Is it a one-to-many? Do we aggregate this one and then join it in? If I’m aggregating it, how do I wanna aggregate it to join it?
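(In pandas terms, that dialogue is the choice between two join shapes; a hypothetical sketch.)

```python
# Hypothetical sketch of the two join shapes the imagined tool asks about.
import pandas as pd

demog = pd.read_csv("demography.csv")  # one record per patient
visits = pd.read_csv("visits.csv")     # multiple records per patient

# Shape 1: keep one row per visit and attach demography to each visit.
per_visit = visits.merge(demog, on="patient_id", how="left")

# Shape 2: aggregate visits to one row per patient first, then join.
# Same data, a different question, and a very different row count.
per_patient = (
    visits.groupby("patient_id")
          .agg(n_visits=("visit_date", "count"), mean_metric=("metric", "mean"))
          .reset_index()
          .merge(demog, on="patient_id", how="left")
)

print(len(per_visit), len(per_patient))  # rows per visit vs. rows per patient
```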
What is the problem you’re asking? There are tools that guide people through this. And then they need, in the top right corner, this big flashing row count as you join data sets, right? And you’ll go all the way through it and you’re like, oh my, I did left joins off a key table and I’ve lost 10 rows. Where did I lose my 10 rows? And you’re like, oh, they don’t matter. It doesn’t matter how big your data set is; those 10 rows matter, because when you go to production, those 10 rows become a thousand rows, and now you’ve got a problem, right?
So you want just that defensive programming around it. JMP will do this on occasion: I had a situation where it was like, did you really mean to take a table with 4,000 rows and create one with 400,000 rows in it? What were you thinking, right? And the answer is, no, I didn’t. So can I hit cancel now?
So Cartesian joins the programs will protect you from, because of the compute power involved in them. But we need to have tools that democratize that data munging and accept the fact that the people who do exploratory data analysis do not have a spec, and the people running the data science tools all want a spec. By the time the business explains the spec to a data scientist, it’s easier to teach the business the simplified data science to play with stuff than it is to teach the technical people the business. Writing all those specs is just not worth your while.
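(That defensive row-count check is easy to sketch. In pandas, for instance, merge can validate the join’s cardinality and you can assert the row count yourself; the names here are hypothetical.)

```python
# Hypothetical sketch of defensive programming around a join.
import pandas as pd

demog = pd.read_csv("demography.csv")  # key table: one record per patient
visits = pd.read_csv("visits.csv")     # many records per patient

rows_before = len(visits)

# validate= makes pandas raise if the join isn't the shape you expect,
# which catches the accidental 4,000-rows-into-400,000-rows blowup.
joined = visits.merge(
    demog, on="patient_id", how="left", validate="many_to_one", indicator=True
)

# The "big flashing row count": this left join should never change it.
assert len(joined) == rows_before, f"row count changed: {rows_before} -> {len(joined)}"

# And the ten lost rows: visits whose patient never matched the key table.
orphans = (joined["_merge"] == "left_only").sum()
print(f"{len(joined)} rows; {orphans} visits with no demography match")
```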
I need to be able to, in two hours, go through 10 questions. I can’t do that without a tool, and I can’t do it with today’s tools, so I’m looking for one. I’m asking everyone, so if you’ve got a tool, let me know.
Evan: Alright. Nechama, I think you painted a very clear picture of a very common issue there: the time and effort and the un-scalability of that data profiling—your exploratory data analysis. So listeners, if you’ve got the answer, if you’ve got the perfect package or framework, try to sell it to Nechama. You can find her—she’s on LinkedIn. She’s also the Wicked Problems Wizard.
Nechama: Wizards, yeah.
Evan: Wicked Problem Wizards.
Nechama: WickedProblemWizards.com. Yep.
Evan: Okay, perfect. You can find her there. Nechama Katan has been our guest today. Nechama, thank you so much for coming on the show.
Nechama: Thank you.