Big Data: is it an 800-pound gorilla?

In healthcare, marketing, finance, consumer behavior and social, every sector is excited about big data but also worried, because no one wants to be left out. Everyone is exploring the best way to leverage big data in their favor. So the million-dollar question is: does anyone really understand big data, or how it will be relevant to business or strategy? My answer is probably “no,” though it could be, as a typical MBA would say, “it depends.”

As someone who works in healthcare data analytics and was trained as a biologist, I will start with the classic example of big data: the Human Genome Project, in which the entire human genome was sequenced and tons of data and information were generated. At the start of the project, everyone was excited that it would revolutionize healthcare and lead to tailored, personalized medicines. And I agree, it should have! Recently President Obama launched a “precision medicine” initiative, which is more or less the same thing as personalized medicine. But even after fifteen years of decoding our whole genome, along with the genomes of hundreds of other species, do we have personalized medicine? No. Other examples are two historically data-rich fields: the stock market and the NBA. So can we pick better stocks now, or can we predict the champion team in the NBA? The answer, again, is “no!” Similarly, can human resources departments, with access to databases of millions of candidates and the analytics to go with them, hire better employees? Or can we pick a better life partner from a pool of millions of profiles with all the cool matchmaking analytics available? The answer, again, is “no!” There are many other such examples.

Then what’s wrong with the data? Why is the data not helping us predict future outcomes or performance in the stock market or in basketball games, hire better employees or find better life partners? Why do we still not have personalized or precision medicines despite having whole genome sequences figuratively at our fingertips? Why are we not able to predict more accurately despite having huge amounts of data, information and analytics? Does the problem lie with the data, or with the analytics approach?

My answer would be that there is nothing wrong with the data or the analytics. We just need to better understand big data and how it should be used. In simple terms, or rather in sophisticated statistical terms, “big data” means population data. Historically, we have not had the technology or resources to collect population data. The exception is the census, where we did, and still do, count the whole population and collect data on all of it. The census takes huge resources and time, but it is necessary, and we only need to do it once a decade. Before the “technology,” we relied on sample data, where a small but randomized, stratified, well-normalized, statistically significant and relevant set of data was good enough to analyze and predict a trend or outcome. The best example of the sample approach is the exit polls in U.S. presidential elections. These exit polls, usually based on 1,000 to 10,000 responses, have been pretty good recently at predicting the election results and the outcome of about 150 to 200 million votes!
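To make the sample-versus-population point concrete, here is a minimal sketch of the textbook margin-of-error formula for a sample proportion. It rests on assumptions that are not in the article: a simple random sample, a 95% confidence level (z = 1.96), the worst case of a 50/50 split, and an illustrative population of 150 million. The function name `margin_of_error` is hypothetical, and real exit polls use stratified and clustered designs, so their actual error is typically larger than this idealized figure.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Approximate margin of error for a proportion from a simple random sample.

    n          -- sample size
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- z-score for the confidence level (1.96 ~ 95%)
    population -- optional population size for the finite population correction
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: negligible when n is tiny relative
        # to the population, which is exactly the exit-poll situation.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


if __name__ == "__main__":
    # Illustrative population size only; assumed, not taken from real poll data.
    voters = 150_000_000
    for sample_size in (1_000, 10_000):
        moe = margin_of_error(sample_size, population=voters)
        print(f"n={sample_size:>6}: +/- {moe * 100:.2f} percentage points")
```

Under these assumptions the margin is roughly three percentage points for 1,000 responses and about one point for 10,000. The size of the population barely affects the result, which is why a well-designed small sample can stand in for a very large population.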

The problem with big data is that there is too much data and information to handle and analyze, and often most of that information is either repetitive or not significant or relevant. That’s why the big data approach is not much better than a “good” sample-based approach. The key is to design and construct a good sample. With big data, by contrast, we can skip the step of designing the sample and analyze and visualize the whole data set, which usually has lots of noise. Why? Because we try to automate big data analytics with technology, and in doing so we skip designing a good research methodology, as we would when designing and analyzing a good sample. But when the methodology is good, big data analytics has been useful. The best-known example is the Netflix hit series “House of Cards,” which was reportedly commissioned after analysis of data on millions of customers and their viewing behavior. Whether that approach holds up remains to be seen and validated as Netflix tries to keep producing “hit” shows and movies.

But most of the time it is difficult to design, or rather program, a good methodology for analyzing big data, and it requires more expensive resources than designing a methodology for analyzing sample data. Machines and technology can help drill down into big data to find trends and clusters, and can be a quicker and cheaper option than the sample design approach. For trend and cluster analysis, big data can be useful and efficient. But if trends and clusters are not part of your objective, then big data might not necessarily be significant or relevant to your business or part of your strategy, and sample-based analysis will still be more useful, effective and efficient than big data analytics. Even in the age of big data and analytics, human resources departments still rely heavily on good referrals when hiring, and you are still more likely to find a good life partner among friends and acquaintances. So unless you have unlimited resources and time, I would suggest taking a sample-based analytical approach rather than running after the mirage of big data.