September 20, 2016 - David Fishman | Big Data Ecosystem

Forrester Gives BI a Hadoop-scale Shot in the Arm for Big Data

It’s certainly gratifying that Arcadia Data has been recognized as a Strong Performer in the Forrester Wave for Hadoop-native Business Intelligence. The real credit goes to our customers – P&G, Neustar, Kaiser Permanente, RBC, to name just a few – who recognized the problem of visual analytics at scale and worked with us to hammer out the best way to make Native Hadoop Visualization succeed.

The Forrester Wave™: Native Hadoop BI Platforms, Q3 2016

Up-and-to-the-right notwithstanding, Forrester VP and Principal Analyst Boris Evelson has also delivered additional research that lays out a clear path for BI – a classic enterprise application – toward the needs and opportunities that Hadoop and Big Data present. You can download “How To Scale Business Intelligence With Hadoop-Based Platforms” here.

Self-service visualization has been the talk of the BI marketplace for the last couple of years. Peak self-service hype was reached when Gartner famously removed Oracle from its BI Magic Quadrant in February 2016.

Let there be zero doubt: it’s absolutely necessary that ‘business leaders get rapid access to intelligence unconstrained by central services bottlenecks,’ as that analysis suggested. Necessary, but not sufficient: 21st century self-service visualization cannot avoid dealing with data at massive scale.

What Evelson’s research articulates is the set of constraints that cause some of those ‘central services bottlenecks’, constraints that will undermine the agility and performance of any vendor in the upper right of the ‘classic’ BI Magic Quadrant. How do you know you need visual analytics at Hadoop scale? Evelson recommends you look at three questions.

  • Do our business requirements call for linear scalability?
  • Is network traffic causing a bottleneck in our BI applications?
  • Do we need to keep data and applications together?

To be a little contrarian, let’s take a closer look at the second question, network traffic – rarely at the top of a data architect’s agenda. Evelson’s explanation cuts right to the heart of the problem with the existing set of self-service BI tools.

“While storage, RAM, and CPU capacities have increased significantly over the last several decades, the rate at which applications can read data from drives has not kept up. … Most BI applications are smart and only request aggregate, not detailed, result sets, minimizing network traffic between the app and database servers. But as big data volumes increase and enterprises mature their data mining and exploration applications, there’ll be an increasing requirement to analyze data at a detailed level, putting strain on network bandwidth.”

Individually, end users don’t immediately see the infrastructure tradeoffs in creating aggregates. One user might create a join based on a primary key for geography, another for product lines, and another for customer types. Each is going to require its own aggregates; each aggregate lives on its own system (anyone remember datamarts?). But each time those three users meet to compare notes and come up with new analytic lines of inquiry to visualize, the structure and granularity of their aggregates will change, again and again. Each of those systems gets bigger, fatter, and slower; when they need to access each other’s data, the network effect (pun intended!) will force a compromise: either give up granularity or give up agility. Everyone wants agility with any data. But when it comes to Big Data insights, granularity is the new black.
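To make that concrete, here is a minimal sketch in Python (using pandas, with hypothetical column names and toy data) of how per-user aggregates multiply and still can’t answer the next cross-cut question, forcing yet another pass over the detail data:

    import pandas as pd

    # Hypothetical detail-level fact data, e.g. order lines landed in Hadoop.
    detail = pd.DataFrame({
        "geo":      ["US", "US", "EU", "EU"],
        "product":  ["A",  "B",  "A",  "B"],
        "customer": ["new", "repeat", "new", "repeat"],
        "revenue":  [100, 250, 80, 120],
    })

    # Each analyst builds an aggregate on their own key: three separate extracts,
    # each typically copied out to its own downstream system.
    agg_geo      = detail.groupby("geo")["revenue"].sum()
    agg_product  = detail.groupby("product")["revenue"].sum()
    agg_customer = detail.groupby("customer")["revenue"].sum()

    # A new line of inquiry (revenue by geography *and* customer type) can't be
    # answered from any of those aggregates; it requires another pass over detail.
    agg_geo_customer = detail.groupby(["geo", "customer"])["revenue"].sum()
    print(agg_geo_customer)

Keeping the query engine next to the detail data sidesteps that cycle: the new grouping becomes just another query against the cluster rather than another extract shipped across the network.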

Rethinking BI architecture is the only real way to make the most of data as a common resource at the massive scale generated by modern sources across all sides of the business. That brings us to keeping data and applications together, which provides another benefit: compliance. Evelson explains:

“Client/server-based BI tools with data and result sets residing in multiple physical locations cause a HIPAA-compliance nightmare. Because Arcadia Data runs entirely inside a Hadoop cluster, proper security and auditing of that cluster is all that’s needed for HIPAA compliance.”

Evelson’s analysis takes the agile self-service imperative beyond the small-data constraints that existing visual analytics vendors have already tackled. By taking this more modern, architecture-centric approach, Hadoop-native BI can deliver more immediate business impact, as cited in the report, to:

  • Reduce reliance on expensive data scientists.
  • Make compliance easier.
  • Leverage available data sources, including unstructured data.
  • Break free from the limitations of SQL.
  • Broaden the variety of data sources and use cases.
  • Deliver low-latency interactive access to big data.

In fact, there’s a way to summarize all that: linear scalability. Incremental spending on resources – in this case, putting resources behind ever-richer raw data about your customers and operations – ought to be matched by continuously generating ever-richer insights for everyone in your organization who needs to care about your customers and operations. (Hint: that’s not just a handful of accountants or executives.)

If your self-service visual analytics tools can’t do that, it’s time to give your BI a Hadoop shot in the arm.

