In the blog post titled Beyond the Cube: Embrace Analytical Views, we discussed the limitations of traditional cubes and how analytical views provide more agile methods for analytics while preserving the performance benefits of cubes. In this blog post, we’ll take a closer look into analytical views and how they enable faster query provisioning.
Analytical views are a semantic caching mechanism that allows you to pre-compute and cache the results of expensive SQL operations, such as grouping and aggregation. Query acceleration on data lakes is an increasingly popular topic, especially with new business requirements around high volumes of data and high levels of concurrent users. Businesses are finding that without query acceleration, they can’t fully address the analytical demands of their end users. But using traditional approaches like moving data to a separate, dedicated BI platform is not a sustainable approach. Analytical views represent a unique, native approach to query acceleration in data lakes and provide you with significant performance benefits:
- Queries from apps are automatically routed to analytical views with matching SQL expressions.
- Predictable workloads (querying) can be optimized and completed within a few seconds.
- As the workload becomes more predictable, the automatic use of analytical views increases.
- Analytical views that are well-partitioned (and partitioned identically to the base tables) enable incremental refresh.
To give you an example of the performance boost, in recent lab tests we saw analytical views speed up queries in the range of 20x to 800x. These tests were run on a 7-node cluster with tables containing billions of rows using a range of typical queries that analysts would run in production. We expect to publish details soon on our test configuration and the results with independent third-party validation.
The beauty of analytical views is that you don’t need to create and continuously refine your own data cube. Analytical views kick in automatically and can optimize for joins, distinct counts, medians, etc. Since your BI applications are built against the base data, you work with a single unified view of the data with access to all fields, even though specific reports may be supported by different analytical views.
The core analytics engine in Arcadia Enterprise, known as ArcEngine, handles more than just delivering the accelerated query results. An embedded recommendation engine simplifies and mostly automates the creation of analytical views based on query history. It is an automated complement to manually creating analytical views. Its key advantages include:
- Suggesting analytical views that accelerate applications and reports.
- Maximizing application acceleration while minimizing the disk space occupied by the analytical views.
- Refreshing analytical views incrementally as data gets updated or incrementally added.
- Identifying dimensions/measures/filters across a large set of reports, ensuring that each set is handled by an analytical view.
Defining Analytical Views
An analytical view gathers and maintains aggregated data based on the query used to create it. Consider it to be a shadow to the base tables. It is built using syntax similar to creating a logical view. An analytical view tracks aggregates for columns represented in its query and keeps them updated. Queries using an analytical view have a significant performance benefit and utilize fewer system resources as compared to running the query against base tables.
When a query is run that can be partially or entirely answered by an analytical view, ArcEngine automatically uses that analytical view. The end user of the query is completely unaware of the analytical view. This is helpful since the dashboards that run the queries are built against base tables, and not separate data structures, so end users don’t need to think about how best to run their queries.
Here is an example of a base table and an analytical view:
CREATE EXTERNAL TABLE events
PARTITIONED BY (year INT, month INT, day INT);
CREATE ANALYTICAL VIEW events_month_platform_view
PARTITIONED BY (year, month) STORED AS PARQUET AS
(SELECT count(device_id) as count_device_id,
count(user_id) as count_user_id,
GROUP BY month, platform);
Refresh Analytical View:
REFRESH ANALYTICAL VIEW events_month_platform_view;
In this example, the “events” table is comprised of data generated by a sensor. This data can have multiple dashboards built on it. An analytical view called “events_month_platform_view” has been created which tracks the count of users and devices by month and platform. If an analytical view has been outdated due to new or modified data, it can be updated incrementally by refreshing it (as long as the partitions of the analytical view are a subset of the partitions of the base table). As you would expect, refreshing incrementally will reduce the time for making the analytical view fully up to date.
UI Options for Analytical Views
Defining analytical views starts with an understanding by a BI developer/analyst on which dashboards and the underlying queries will be deployed to business users. If you know which queries need acceleration, you can create analytical views manually. They can be created via the command line as shown in the example above, or using the Arcadia Data “Create Analytical View” UI as shown in Figure 1.
Figure 1. Arcadia Enterprise provides a UI for creating analytical views.
Automatically Defining Analytical Views
Arcadia Enterprise also provides a recommendation engine known as Smart Acceleration™ for building analytical views. This feature greatly reduces the planning effort required for accelerating end user queries, and is one of the ways customers can reduce the time to insight in their analytical lifecycle. So while the manual creation of analytical views works well when you know which queries are most used by the user community, Smart Acceleration helps to identify the analytical views that accelerate other common queries in production. No human examination of query logs is necessary, as the system will learn from end user behavior and then give administrators an easy way to build those analytical views.
In summary, Smart Acceleration helps to:
- Automatically identify the analytical views to build for any dashboard.
- Demonstrate average query runtime before acceleration.
- Simplify administration with a UI to create, delete, or refresh analytical views easily with a click of a button.
- Demonstrate performance benefits after analytical views are created and used.
- Consolidate analytical views where possible, instead of creating multiple ones.
By now you should have a better sense of what analytical views and Smart Acceleration can do for you. If you face long delays in getting data to an analytics-ready state for your business users due to extensive performance modeling, and/or you aren’t getting the query responsiveness that your end users need, then analytical views is likely the technology you need.
In an upcoming blog post, we will go into more detail on the Smart Acceleration recommendation engine. Not only can you get the query acceleration that your users need, but you also benefit from the ease of administration that reduces IT intervention and promotes self-service BI.