February 1, 2016 - David M Fishman | Big Data Ecosystem

Top 12 Explanations You’ll Hear in 2016 for Why Big Data Isn’t Paying Off

Excerpt adapted from InsideBigData.com. Read the full article here.

To read the headlines, it’s easy to think that the only thing bigger than ‘big data’ is big talk about big data, but in fact, it’s been about 3 years since Gartner’s Svetlana Sicular called the start of Hadoop’s “trough of disillusionment.” So it should come as no surprise that big data skeptics with entrenched interests in the status quo — IT specialists and their suppliers alike — are working overtime in waving flags with question marks.

However, no less an authority than Gartner recognizes that the trough of disillusionment is not the end of the road; rather, it traverses the inflection point leading to the “slope of enlightenment”. And with the accumulation of big data and analytic experience in the last several years, the shift shows no signs of slowing down. In 2016, big data will quite literally be the elephant in the room. Here’s what I predict you’ll hear from those who think they can continue to ignore it.

#1 Business Users Can’t Get Access to Big Data

Until business users get well-structured access to ever-increasing sources of information, self-service tools are really an exercise in rearranging the slices on the pie chart. Without direct, well-managed, granular big data access, visualization puts the chart cart before the horse.

#2 BI Tools Can’t Process at Data-lake Scale

With the large scale of repositories made possible by Hadoop and the broad range of data sources it can cache, the classic data warehouse is beginning to look like modest lake-front property. BI tools need to start with billions of records and up.

#3 All the Data Scientists are Busy

Have you exchanged one IT labor shortage for another? You probably shouldn’t count on a PhD statistician in a white lab coat to show up and save the day.



#4 Big Data Isn’t Reliable, So We Can’t Rely On It

The pressure of direct access, consumption, and discovery will do far more to expose and address the risks in big data than taking a wait-and-see attitude. Putting big data in front of more people more often is likely the fastest way to expose its flaws and drive improvement. No data becomes reliable until you rely on it.

#5 Hadoop’s Not Mature Enough for Business Users

Open source pours gasoline on the Hadoop arms race with transparent roadmaps and code bases, a fundamentally different approach from the central planning behind the roadmaps for proprietary databases that predate the BlackBerry.

#6 ETL for that Much Data is Slow and Complex

Good news: business process automation; bad news: data silos. Seeking coherence across independently developed formats and schemas has always required coding. Now? Even Ralph Kimball, father of the Data Warehouse, argues that Hadoop will rapidly become a leading player in enterprise ETL.

#7 The Answers We Need Aren’t in the Places We’re Looking for Them

Looking for lost car keys under the streetlamp because it’s dark everywhere else is no different than using legacy BI tools that can’t span large-scale datasets with billions of data points. Collaboration and social media make it less difficult for people to find each other and ask each other questions. Why can’t we ask questions of the data just as easily?

#8 We Need to Move Our Data Out of Hadoop Before We Can Use It for BI/Analytics

It’s at the very least ironic to use Hadoop as the place you keep big data until you need to use it. Having to maintain a parallel data processing infrastructure to keep up with Hadoop is swamping traditional BI and data warehouse architectures. Using Hadoop’s native capabilities to run analytics at scale can be a critical success factor.

#9 We Can’t Set Up Data Until We Know What End Users Will Do

Hadoop’s fundamental shift to schema-on-read via HDFS provides a faster, more flexible mechanism for exposing the structure of the underlying data without tying it down — in effect, decoupling what the data does from how it does it.
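To make schema-on-read concrete, here is a minimal sketch in plain Python rather than Hadoop itself. The sample records, the two schemas, and the `read_with_schema` helper are all hypothetical, invented for illustration. The point it tries to show is that the raw data is stored once, untouched, and each reader decides at read time which structure to apply.

```python
import csv
import io

# Raw records are kept exactly as they arrived; nothing is enforced on write.
# (Hypothetical sample data, standing in for a file in the data lake.)
RAW_LINES = """2016-01-04,store-12,widget,3,19.99
2016-01-04,store-07,gadget,1,5.50
2016-01-05,store-12,widget,2,19.99
"""

# Two hypothetical schemas, each chosen by a different reader of the same bytes.
SALES_SCHEMA = [("day", str), ("store", str), ("product", str),
                ("units", int), ("price", float)]
TRAFFIC_SCHEMA = [("day", str), ("store", str)]  # ignores the trailing fields


def read_with_schema(raw_text, schema):
    """Parse raw delimited text, applying names and types only at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        # zip stops at the shorter input, so a narrow schema skips extra fields.
        rows.append({name: cast(value)
                     for (name, cast), value in zip(schema, fields)})
    return rows


# One reader cares about revenue...
sales = read_with_schema(RAW_LINES, SALES_SCHEMA)
revenue = sum(row["units"] * row["price"] for row in sales)

# ...another only cares about which stores were active.
visits = read_with_schema(RAW_LINES, TRAFFIC_SCHEMA)
stores = sorted({row["store"] for row in visits})
```

Neither reader had to be known when the data was written, which is the flexibility the objection above overlooks.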

#10 Just Putting Data in the Lake Has All the R.O.I. You Need

The Hadoop platform provides extensibility that’s comparable to an operating system: metadata via HDFS; native security with authentication, authorization and encryption services; coherent execution management via YARN; and there are many more examples. A cost-centered big data strategy is only a stepping stone to real big data advantage.

#11 Cube First, Ask Questions Later

Cubes have long been the cure for slow joins; one way they do this is to reduce granularity, a reasonable approach for small data. But in big data, you lose valuable insight. Hadoop delivers distributed compute horsepower to the data in place, rather than pre-diluting to fit dated data management methods.
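A small sketch can show what reduced granularity costs. The transactions and both helper functions below are hypothetical, and the "cube" is just a dictionary of daily totals per store, far simpler than a real OLAP cube. Under those assumptions, it illustrates a question the pre-aggregated totals cannot answer but the detailed records can.

```python
from collections import defaultdict

# Hypothetical transaction-level records: (day, store, amount).
TRANSACTIONS = [
    ("2016-01-04", "store-12", 20.0),
    ("2016-01-04", "store-12", 980.0),
    ("2016-01-04", "store-07", 500.0),
    ("2016-01-05", "store-12", 25.0),
]


def build_daily_cube(transactions):
    """Pre-aggregate to one total per (day, store), as a coarse cube would."""
    cube = defaultdict(float)
    for day, store, amount in transactions:
        cube[(day, store)] += amount
    return dict(cube)


def largest_transaction(transactions):
    """A question that needs the original, per-transaction granularity."""
    return max(transactions, key=lambda record: record[2])


cube = build_daily_cube(TRANSACTIONS)
biggest = largest_transaction(TRANSACTIONS)
```

The cube reports a single total for store-12 on 2016-01-04, but only the detailed records reveal that one sale accounted for nearly all of it.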

#12 I Can’t Safely Grant Access to the Hadoop Cluster to End Users and Ensure Security/Compliance

This is a myth. Hadoop actually supports best-in-class primitives for security with Kerberos, LDAP/AD, and file-level access control. Holding Hadoop to the same security threshold as your Data Warehouse means you’d be lowering your standards.


No More Excuses

2016 marks a decade since Google’s papers on MapReduce and BigTable inspired Doug Cutting and what became the Hadoop community to rethink what was once called “data processing”. You need only look back to where Oracle was in 1988, when it reached its tenth year — and what the relational model did to the data landscape in the decade that followed — to give you an excuse to rethink the objections to big data you’ll hear this year.
