May 3, 2018 - Dale Kim | Big Data Ecosystem

How Native BI Platforms for Data Lakes Can Propel Your Analytics to the Next Level

If you are part of a large modern enterprise, you probably consider your company to be data-aware, meaning that you can transform raw data into useful information using a variety of data integration, data management, and business intelligence (BI) tools. But being data-aware doesn’t necessarily mean that you are insights-driven, and these days your organization must make that move. In our recent webinar, Take Your Enterprise Analytics to the Next Level with Native BI Platforms for Data Lakes, Boris Evelson, VP and Principal Analyst at Forrester, and Alex Gutow, Director of Product Marketing at Cloudera, joined our own Steve Wooledge to discuss what it takes to move a BI environment to the next level by harnessing the power of a data lake to drive new insights and business agility.


One of the main challenges that modern enterprises face today is that business and technology professionals are not in complete alignment. Business professionals just want to get their job done quickly and effectively. An objective like “single version of the truth” is not their top priority; “good enough but timely” information is often all they need. Technology professionals, on the other hand, have different priorities such as a single BI platform, a streamlined data architecture, and centralized support. In the overall scheme of things, it’s important that your IT professionals realize that getting the job done is the number one priority.

Interestingly, Boris asserts that a majority of BI analytical applications are not built on enterprise-grade BI platforms; most are still built using spreadsheets. In fact, according to Forrester Research, 66% of respondents reported that over 50% of their BI content was still being created using spreadsheets, while 15% reported that over 80% of their BI content was based in spreadsheets. Using spreadsheets for BI most certainly does not promote “getting the job done quickly and effectively.”

analytical apps

Dimensions of Business Agility

We have entered the Age of the Customer, which means that enterprises need to run from the “outside in.” Consumers don’t care about a company’s internal processes such as risk management or supply chain processes. Consumers use their mobile phones to easily access your competitors’ product information, and these empowered buyers demand a new level of customer obsession. Therefore, being agile and extremely responsive to customer needs is a key success factor in the age of the customer.

Age of the customer

In fact, agile enterprises are much more likely to be industry leaders. According to Forrester, there is a direct correlation between agile enterprises and higher performers (those that are growing faster than their competition and industry averages). These high performers are in the “formidable” category in the chart below, and they understand that business agility is a key capability. The left side of the chart shows low performers that don’t understand the value of business agility.

Agile enterprises

A Modern BI Data Architecture Is the Key to Gaining Insights From All Your Data

To get insights from all your data, it’s important to have a modern BI data architecture. That means employing different treatments for different data layers, based on each layer’s requirements for latency, data quality, and tolerance for risk. You’ll also need to figure out who will be accessing the data in a data lake versus a data warehouse. The chart below is a generalization of how organizations have traditionally enabled use cases across data lakes versus data hubs or data warehouses. Data warehouses are appropriate for mission-critical, low-latency applications, but not all data needs to live there. Data hubs are used for agile insights applications. Finally, the data lake is where all of your data lives: there you can have a staging area and perform data mining, searching, exploration, profiling, and cataloging.

insights from all your data

Depending on the type of BI architecture used, the data lake becomes capable of supporting a variety of analytical workloads.

Earlier Generation BI Architecture – Bring the Data to BI

Earlier generation BI architectures sit outside the data lake, so data still has to be brought into the BI platform. This means you’re moving data in and out of clusters through a single choke point. Any metadata that is used inside or outside the clusters is not shareable.

Earlier gen BI architecture

Next Generation BI Architecture (V1) – Bring BI to the Data

With the newer generation BI architecture, BI tools push more and more of their components into the cluster, so you are bringing BI directly to the data. This eliminates moving data in and out of the cluster: there’s no extra WAN/LAN traffic and no JDBC choke point, you’re not limited to SQL, and you now have rich metadata. A lot of the components that were outside of the dotted line in the chart above are now included inside the cluster. In this first version of the next generation architecture, however, you still run some of the components on an edge node, so there still is a bottleneck.

Next gen BI architecture

Next Generation BI Architecture (V2) – Bring BI to the Data

In version 2 of a next generation BI architecture, the only thing the edge node does is render visualizations. All of the semantic layers, cubes, indexes, and queries are pushed down to the individual data nodes, so it’s 100% distributed and 100% scalable. This modern data architecture allows you to analyze and derive insights from all of your data, both structured and semi-structured.

bring bi to the data
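To make the "bring BI to the data" idea concrete, here is a minimal sketch in Python of pushing an aggregation down to the data nodes and merging only the small partial results on the edge node. The node layout, the sample rows, and every function name are hypothetical, chosen purely for illustration; this is not how any particular BI product implements pushdown.

```python
from collections import Counter

# Hypothetical data: each inner list stands for the rows stored on one data node.
NODES = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 2), ("north", 8), ("east", 1)],
]


def local_aggregate(rows):
    """Runs on a data node: sum amounts per region over local rows only."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def merge_partials(partials):
    """Runs on the edge node: combine the per-node summaries into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


def pushdown_sum(nodes):
    """Next-generation style: aggregate where the data lives, then merge."""
    return merge_partials(local_aggregate(rows) for rows in nodes)


def centralized_sum(nodes):
    """Earlier-generation style: ship every raw row to one place, then aggregate."""
    all_rows = [row for rows in nodes for row in rows]
    return dict(local_aggregate(all_rows))
```

Both functions return the same totals. The difference is what crosses the network: the centralized version moves every raw row, while the pushdown version moves one summary per node, which stays small no matter how many rows each node holds.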

The Data Lake BI Architecture – The Native BI and Analytics Way

Arcadia Data was built to run natively within data lakes, so you can easily plug different processing engines into these scale-out storage architectures. Arcadia Data runs directly on the data nodes where the data resides, allowing you to create better-performing schemas and dimensional models, and to scale out linearly. Because of this, you don’t have to model data in multiple locations, move data into multiple tiers, or secure it in multiple ways.

Data lake bi architecture

Smart Acceleration Leverages What is Learned During Data Discovery

One of the challenges with legacy BI and middleware applications is that you wind up building cubes in advance based on what you think the business requirements are. That requires a lot of planning and setup time, and it locks you in to the dimensions and views you expect people to want. Smart Acceleration allows people to query data in its granular state, and then gives administrators suggestions on what accelerators to build. These accelerators are data structures known as Analytical Views, which speed up common queries and enable high end-user concurrency on the data lake. This is “just right” data engineering based on actual rather than assumed needs and usage, meaning that only the data and aggregates actually used by end users are accelerated.

Query acceleration
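The general idea of recommending accelerators from observed usage, rather than guessing in advance, can be sketched in a few lines of Python. This is an illustrative assumption about how such a recommendation could work, not a description of Smart Acceleration's actual algorithm; the query log format, the `min_count` threshold, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical query log: (group-by dimensions, measure) for each query users ran.
QUERY_LOG = [
    (("region", "month"), "revenue"),
    (("region", "month"), "revenue"),
    (("month", "region"), "revenue"),  # same dimensions, different order
    (("product",), "units"),
    (("region",), "revenue"),
    (("region", "month"), "revenue"),
]


def suggest_accelerators(query_log, min_count=3):
    """Suggest pre-aggregations for dimension/measure combinations seen often.

    Dimension order is ignored, so ("region", "month") and ("month", "region")
    count as the same usage pattern.
    """
    usage = Counter((frozenset(dims), measure) for dims, measure in query_log)
    return [
        (sorted(dims), measure, count)
        for (dims, measure), count in usage.most_common()
        if count >= min_count
    ]
```

With the sample log, only the revenue-by-region-and-month pattern clears the threshold, so it is the only aggregate suggested; the one-off queries keep running against the granular data.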

BI Native to Data Lakes Provides Faster Time to Value

With Arcadia Data, you can query unstructured data, perform analysis and discovery, and then optimize for performance after you have figured out what needs to be accelerated. Smart Acceleration gives you that information about what to accelerate. This is a much simpler process than the traditional data modeling life cycles that require continual back-and-forth discussions between IT and business teams, and thus are very time consuming and costly. Our native BI architecture allows value to be realized much faster than traditional BI architectures designed for relational databases.

BI native to data lakes

Check out the webinar for more details.
