
Be Smarter Than Your Devices: Learn About Big Data

Peter Bruce

October 4, 2019


When Apple CEO Tim Cook finally unveiled his company’s new Apple Watch in a widely publicized rollout, most of the press coverage centered on its cost ($349 to start) and whether it would be as popular among consumers as the iPod or iMac. Nitin Indurkhya saw things differently.

“I think the most significant revelation was that of ResearchKit,” Indurkhya said. “It allows the iWatch to gather huge amounts of health-related data from its sensors that could then be used for medical research, an area that has traditionally been plagued by small samples and inconsistent and costly data collection, and for preventive care.”

Indurkhya is in a perfect position to know. He teaches text mining and other online courses for Statistics.com and the Institute for Statistics Education. And if you’ve ever wondered about the origins of “Big Data,” a term we hear everywhere today, the mystery is over. Indurkhya, along with Sholom Weiss, first coined “Big Data” in a predictive data mining book in 1998. (“I never anticipated Big Data becoming a buzzword,” he said. “Although we did expect the concept to take off.”)

The ResearchKit already has five apps that link users to studies on Parkinson’s disease, diabetes, asthma, breast cancer and heart disease. Cook has touted other health benefits from Apple Watch, including its ability to tap users with a reminder to get up and move around if they have been sitting for a while. “We’ve taken (the mobile operating system) iOS and extended it into your car, into your home, into your health. All of these are really critical parts of your life,” Cook told a Goldman Sachs technology and Internet conference recently.

That helps explain the media fascination over another new Apple product. But it also tells us the importance of learning about Big Data. Having access to large amounts of raw numbers alone doesn’t necessarily change our lives. The transformation occurs when we master the skills needed to understand both the potential and the limitations of that information.

The Apple Watch exemplifies this because the ResearchKit essentially recruits test subjects for research studies through iPhone apps and taps into Apple Watch data. The implications for privacy, consent, sharing of data, and other ethical issues are enormous. The Apple Watch likely won’t be the only device in the near future to prompt these kinds of concerns. It all leads to the realization that we need to be far more familiar with how data is collected and used than we’ve ever had to be in the past.

“We are increasingly relying on decisions, often from ‘smart’ devices and apps that we accept and even demand, that arise from data-based analyses,” Indurkhya said. “So we do need to know when to, for example, manually override them in particular instances.

“Allowing our data to be pooled with others has benefits as well as risks. A person would need to understand these if they are to opt for a disclosure level that they are comfortable with. Otherwise the danger is that one would go to one or the other extreme, full or no participation, and have to deal with unexpected consequences.”

The Big Data questions raised by the Apple Watch are similar to the concerns over access to and disclosure of other reams of personal information. Edward Snowden’s leaks most famously brought these kinds of worries into play, publicizing the spying on ordinary Americans by the National Security Agency. There’s also a commonly expressed fear that Big Data is dehumanizing, and that it’s used more for evil than for good.

These fears, Indurkhya noted, have seeped into the popular culture. Consider this list of Big Data movies: WarGames, in which a supercomputer is given control of all United States defense assets. Live Free or Die Hard, in which a data scientist hacker hopes to eventually bring down the entire U.S. financial system. Even Batman gets into the act, hacking every cell phone in Gotham.

Little wonder people might shy away from studying Big Data. But that would be a mistake, said Indurkhya, who has a rebuttal for all the Hollywood-hyped fears.

First, he said, there are strong parallels between the Big Data revolution and the industrial revolution. Look at history. Despite all the dire predictions, machines aren’t “taking over the world” and neither will Big Data.

Second, it’s also helpful to appreciate what Big Data gives us. It provides us with better estimates: they are more accurate and our confidence in them is higher. Perhaps more importantly, it provides estimates in situations where, in the absence of Big Data, answers were not obtainable at all, or not readily accessible. Think about searching the web for “Little Red Riding Hood and Ricky Ricardo.” Even in the early days of the internet, you would have gotten lots of results individually for “Little Red Riding Hood” and “Ricky Ricardo,” but it was not until Google had accumulated a massive enough data set, and perfected its Big Data search techniques, that you could reliably get directed to the “I Love Lucy” episode where Ricky dramatically reenacts the story for Little Ricky.
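For readers who like to see the "better estimates" point in numbers, here is a minimal sketch, not anything taken from ResearchKit or from Indurkhya's work. It assumes a made-up population (values centered on 70 with a spread of 10, loosely like resting heart rates) and hypothetical function names, and it shows how the margin of error around an average shrinks as the sample grows.

```python
import math
import random
import statistics


def mean_with_margin(sample):
    """Return the sample mean and an approximate 95% margin of error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, 1.96 * std_err


def simulate(sizes, true_mean=70.0, true_sd=10.0, seed=0):
    """Draw one simulated sample per size and estimate the mean of each.

    true_mean and true_sd are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        results[n] = mean_with_margin(sample)
    return results


if __name__ == "__main__":
    for n, (mean, margin) in simulate([30, 3000, 300000]).items():
        print(f"n={n:>7}: estimate = {mean:.2f} +/- {margin:.2f}")
```

Each hundredfold increase in sample size cuts the margin of error by roughly a factor of ten, which is the sense in which a study fed by a very large number of devices can be more precise than one with a few dozen volunteers. The sketch assumes the sample is representative; more data does not fix a biased sample.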

Data specialists can set policies and procedures that protect us from some of the risks of Big Data. But we also need to become much more familiar with how our data is collected, analyzed, and distributed. If the Apple Watch rollout proves anything, it might be this: Going forward, we’ll all have to be as smart about data as our devices.






About the Author

Peter Bruce is Founder and President of The Institute for Statistics Education at Statistics.com. Previously he taught statistics at the University of Maryland, and served in the U.S. Foreign Service. He is a co-author of Data Mining for Business Analytics, with Galit Shmueli and Nitin R. Patel (Wiley, 3rd ed. 2016; also JMP version 2017, R version 2018, Python version 2019; plus translations into Korean and Chinese), Introductory Statistics and Analytics (Wiley, 2015), and Practical Statistics for Data Scientists, with Andrew Bruce (O'Reilly, 2016). His blogs on statistics are featured regularly in Scientific American online.