Chapter 5:
Common Approaches to Big Data Analytics

Let’s look at four popular approaches to business intelligence architecture incorporating big data and the strengths and weaknesses of each.

Native Visual Analytics and BI for Big Data

This is a relatively new approach to BI, made possible by distributed scale-out architectures such as Hadoop and Spark and by cloud-based object storage such as Amazon S3. As with SQL-on-Hadoop and big data OLAP, native visual analytics keeps data in the big data platform. It also uses the power and scalability of the platform to run large-scale analytics. The BI and visualization UI is written from the ground up to take advantage of the analytics platform, but the system can optionally work with existing BI tools as well. This approach benefits from lessons learned in prior BI approaches and addresses their known challenges, improving performance and scale while reducing complexity and latency.

Native Visual Analytics Architecture and Process Flow

Pros:

  • No dedicated servers required. Analyzing data directly in the big data platform leads to significant advantages. First, with no data movement from the platform to dedicated BI servers, the overall architecture is simpler and easier to maintain. Second, no external BI servers or external ETL means lower hardware and administrative costs. Third, all raw data is immediately available for fine-grained analysis, unlike the summaries and aggregations used in dedicated BI servers. Finally, governance and compliance frameworks are easier to support because no separate copies of data are created in external repositories.
  • Unified security. Related to the above, since there is no data movement, data can be secured by the platform’s security controls, which simplifies data protection and lowers the cost of securing your data. In the case of Hadoop, integration with security technologies like Apache Sentry and Apache Ranger enables the unified security model.
  • Lower learning curve. As an option, standard BI tools can be used with native visual analytics. That way, both “citizen data scientists” and RDBMS power users can take advantage of tools and skills they already have. This leads to easier adoption since users don’t have to learn unfamiliar new tools.
  • Maximize current investments. As an option, companies can leverage existing BI tools, skills, and training, thus avoiding additional expenditures on new BI tools.
  • Self-service. With minimal IT intervention to run new types of queries, and no time-consuming and resource-intensive cube building, native visual analytics can effectively provide self-service analytics that give much more flexibility to business users.
  • Query acceleration for performance and high user concurrency. Native analytics entails optimizations such as caching and pre-computing subsets of queries that accelerate a wide range of end-user queries. This also enables hundreds and even thousands of concurrent users on the system.
  • Cost-effective scalability. By deploying analytics that leverage a big data architecture, you can easily achieve cost-effective scale-out by incrementally adding more commodity nodes to a cluster.
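
The query-acceleration idea in the list above (caching results and pre-computing common aggregations) can be illustrated with a minimal Python sketch. It is a toy in-memory model, not how any particular product implements acceleration; the class name `CachedQueryEngine`, its methods, and the sample rows are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class CachedQueryEngine:
    """Hypothetical toy engine: aggregates raw rows and caches each result."""

    def __init__(self, rows):
        self.rows = rows   # raw, fine-grained records (stand-in for platform data)
        self.cache = {}    # (group_by, measure) -> aggregated totals
        self.scans = 0     # how many times the raw data was fully scanned

    def total_by(self, group_by, measure):
        """Sum `measure` per distinct value of `group_by`, using the cache if possible."""
        key = (group_by, measure)
        if key in self.cache:
            # Cache hit: answer without touching the raw data.
            return self.cache[key]
        # Cache miss: scan the raw rows once and remember the result.
        self.scans += 1
        totals = defaultdict(float)
        for row in self.rows:
            totals[row[group_by]] += row[measure]
        result = dict(totals)
        self.cache[key] = result
        return result

    def precompute(self, combos):
        """Warm the cache for aggregations that are expected to be popular."""
        for group_by, measure in combos:
            self.total_by(group_by, measure)


# Made-up sample data, for illustration only.
rows = [
    {"region": "east", "product": "a", "sales": 10.0},
    {"region": "west", "product": "a", "sales": 5.0},
    {"region": "east", "product": "b", "sales": 2.5},
]

engine = CachedQueryEngine(rows)
engine.precompute([("region", "sales")])          # pre-computed ahead of time
by_region = engine.total_by("region", "sales")    # served from the cache
by_product = engine.total_by("product", "sales")  # first request: scans raw data
by_product_again = engine.total_by("product", "sales")  # repeat: served from the cache
```

In this sketch, four requests cause only two scans of the raw rows. Repeated or anticipated queries are answered from the cache, which is the general reason this kind of optimization helps with high user concurrency. A real system would also need cache invalidation when the underlying data changes, which is omitted here.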

Cons:

  • Fear of the unknown. Since this approach is relatively new, organizations may encounter resistance due to a lack of understanding of why this approach is superior to others.
  • New skills and unfamiliarity with Hadoop. The concept of accessing large amounts of structured and unstructured data for analysis is new, and will require some ramp-up for users and IT organizations alike. Basic familiarity with Hadoop, NoSQL, or cloud systems is needed to set up and maintain the data store.

| Criterion | Traditional BI | SQL-on-Hadoop | OLAP on Big Data | Native Visual Analytics |
| Market maturity | Full | Quarter | Quarter | Quarter |
| Use of existing/common skills | Full | Half | Half | Half |
| Architectural simplicity | Quarter | Three-Quarters | Half | Full |
| Enables self-service BI with access to granular detail | None | Three-Quarters | Quarter | Full |
| Works with legacy BI software | Full | Full | Full | Full |
| Ad-hoc query performance at scale | Quarter | Half | Half | Half |
| Prepared/dashboard query performance at scale | Half | Three-Quarters | Full | Full |
| Cost-effective scale | None | Half | Half | Full |
| User concurrency | Quarter | Half | Full | Full |
| Data granularity support | Quarter | Quarter | Quarter | Full |
| Unified security support | None | Full | Quarter | Full |
