May 29, 2018 - Dale Kim | Big Data Ecosystem

Simplifying Big Data Analytics Acceleration

In the blog post titled, Beyond the Cube: Embrace Analytical Views, we discussed how analytical views represent a new way to accelerate queries in a production environment. The next blog in the series, A Closer Look at Query Acceleration with Analytical Views, discussed analytical views in more detail in how to set them up. In this blog post, we’ll discuss how easy Arcadia Data makes it to create your analytical views. We’ll explore details on the recommendation engine known as Smart Acceleration™ which defines analytical views for you based on actual usage.

First, let’s take a look at the two levels of acceleration technology in Arcadia Enterprise (not available in Arcadia Instant, which is only intended as a desktop-based visualization tool):

  1. Arcadia Enterprise Visualization Server (“ArcViz”) coherent cache. This caches in memory the query results from the analytics engine (“ArcEngine”).
  2. Analytical Views. These data structures are stored in the data platform alongside the raw data.

Both levels honor up-to-date permissions by rechecking the permissions on the underlying data prior to returning results. This ensures that any changes in user permissions will not be circumvented by the data residing in the cache or analytical views.

Queries that exactly match results data in the ArcViz cache will automatically use those cached results. If no exact cached results exist, then queries are checked against existing analytical views. Queries that have aggregations that can be answered by an analytical view will automatically use that analytical view. All other queries will be run on the unaccelerated source data.

ArcEngine Diagram

You probably understand the concepts behind caching, so you might not be surprised that ArcViz caching can boost user concurrency 10 times or more, while also greatly improving response times. But caching can’t address all queries, and that’s where analytical views come in. Analytical views pre-compute the most expensive types of operations in a known workload, so you can get faster responses on sets of similar, but not identical, queries. Benchmark tests have show performance improvements ranging from 2x to 780x, depending on the amount of data returned from the query.

Smart Acceleration Overview

When building a set of query accelerators, you can’t take a brute force approach with big data. For example, you can’t just pre-index every column in your tables. Nor can you pre-compute every query your enterprise runs. Nor can you simply load everything into memory. None of these approaches can realistically scale, as you’ll need significantly more resources and administrative effort to handle the extra brute force load.

This means you need to spend quite a bit of time with planning and modeling. This often entails a lot of time-consuming discussion and coordination between business and IT in an iterative/repetitive cycle. Or instead, you can leverage a system that can analyze live query usage and create optimal query accelerators for your entire workload.

As described in the aforementioned blog posts, you can define and create analytical views manually for the base tables and logical views, and the process is typically much easier than creating functionally similar data structures like OLAP cubes. But to simplify the process of creating analytical views even further, Arcadia Enterprise has a feature called “Smart Acceleration” to make the system define and recommend analytical views for you. The planning effort for creating accelerated queries is eliminated, and at the same time, there’s no brute force indexing, no excessing pre-computation, and no heavy expenditures on in-memory deployments. Simply build your dashboards in Arcadia Enterprise (or in an upcoming release, almost any BI tool), let a subset of users (or all of them if you’d like) take it for a spin, and soon enough with the collected usage data, you can easily accelerate your queries to give all your users the responsiveness they need.

Getting Your Recommendations

The recommendation process typically begins after users are given a chance to run some real-world scenarios on the dashboards. This lets the system learn what types of queries are run, and identify the commonalities across queries to build an efficient query acceleration model. Your administrators can revisit the state of analytical views at any time, to further refine the set of accelerators.

The recommendation UI lets you accelerate queries within the context of dashboards and their associated components (visuals). You can select existing dashboards or components that should be accelerated with analytical views. This recommendation process takes just a few simple steps:

  1. On the VISUALS page, you get a list of workspaces and applications in the left rail. Click on a workspace or application.workspace
  2. Next, in the main panel of the UI, click on the check mark in the upper right corner of the tiles for the dashboards you want to accelerate. In the Selected visuals menu bar that appears, click on the ellipses () menu item and then select Smart Acceleration.smart acceleration
  3. The Recommendation Manager page appears. On that page, you see a list of dashboards and their elements, along with their respective execution histories and recommendation summaries. The information includes:
  • The Local Cache column shows the number of times an app/visual request was served from the web-server local cache.
  • The Standard column shows the count and average run time for the app/visual requests that were served by the database connection but not accelerated by an analytical view.
  • The Accelerated column shows the count and average run time for those requests that were accelerated by analytical views.
  • The Last Engine Execution column shows whether or not the most recent ArcEngine request execution for the app/visual was accelerated by an analytical view.

The Recommendation Summary can be one of:

  • Accelerated. Already accelerated by existing analytical view.
  • Needs refresh. Would be accelerated by existing analytical view, but a refresh is required for the view to be usable.
  • Recommendation. Analytical view is recommended.
  • Unable to accelerate. The query does not meet the criteria for acceleration and provides the reason, such as not having an aggregation, grouping, or distinct operation.

Recommendation Manager

  1. Select the dashboards/elements for which you want recommendations, then click the GET ANALYTICAL VIEW RECOMMENDATIONS button.

Visual and filter components within each application are processed to yield a set of queries that are delivered to the ArcEngine recommendation logic. The ArcEngine response contains a set of analytical view definitions (both existing and recommended) as well as the acceleration status in the State column for each component query in the request. This status values are similar to the those in the Recommendation Summary mentioned above.

Recommendation Summary

In addition to recommending which analytical views to build, the engine also provides:

  • A star rating on the recommended analytical view strength, to provide more information on which views to build. Ratings are based on factors such as:
    • Degree of acceleration. Faster expected acceleration will get higher ratings.
    • Ability to incrementally refresh. If the underlying tables are partitioned, analytical views can be refreshed incrementally, resulting in a faster refresh.
  • Cardinality and size information of the analytical view’s underlying base tables or logical views.
  • Cardinality and size information of the analytical view.
  • The SQL statement for the analytical view.
  • The query or sub-query for which the analytical view is recommended.

 

Recommendation Details

Once analytical views are created, they are not automatically refreshed. They will be outdated if there are any updates to the underlying base tables. Remember to refresh them in the UI, or set up a scheduler job to automatically refresh. Incremental refreshes usually take much less time, since it is aggregating data from new or updated partitions. From this point onwards, your dashboards are ready to be operationalized.

There are a few requirements prior to using Smart Acceleration:

  • Overall access to the recommendation UI must be enabled via the site setting Enable Analytical View Recommendation.
  • Only application components (visuals and filters) that are built upon datasets sourced from an ArcEngine data connection can be accelerated.
  • For each component being accelerated, you must set the level permission to Manage Dataset AND the data connection level permission to Manage Analytical Views.
  • The recommendation functionality in ArcEngine requires ArcEngine V2.2.7 or later.

Example Recommended Analytical Views

As mentioned earlier, you don’t want each query to have its own pre-computed view, as that will lead to excessive consumption of resources that will limit the acceleration. Smart Acceleration makes sure you have the right set of analytical views for your entire query workload. To illustrate this with a simple example, suppose we have the following queries from two different visuals:

SELECT TA_0.`calendar_day` AS `calendar_day`,
       TA_0.`plant` AS `plant`,
       concat(CAST ((TA_0.`material`) AS string),' ',
       (TA_0.`material_description`)) AS `Material Expanded`,
       TA_0.`subsector` AS `subsector`,
       TA_0.`segment` AS `segment`,
       sum((TA_0.`mrp_available_dfc_site_`)) AS `MRP Available`
FROM `moneyball`.`isc_data_sv` TA_0
GROUP BY 1, 2, 3, 4, 5
LIMIT 10;

SELECT TA_0.`plant` AS `plant`,
       TA_0.`material` AS `material`,
       TA_0.`subsector` AS `subsector`,
       TA_0.`material_description` AS `material_description`,
       TA_0.`calendar_day` AS `calendar_day`,
       concat(CAST ((TA_0.`material`) AS string),' ',
       (TA_0.`material_description`)) AS `Material Expanded`,
       TA_0.`segment` AS `segment`,
       sum((TA_0.`buom_max_stock`)) AS `Max`,
       sum((TA_0.`buom_mrp_available__csp_agg__`)) AS `MRP Available`,
       sum((TA_0.`buom_safety_stock`)) AS `Safety`
FROM `moneyball`.`isc_data_sv` TA_0
GROUP BY 1, 2, 3, 4, 5, 6, 7
LIMIT 10;

 

Can you identify the commonality between these queries that would allow a single analytical view to accelerate both? You can calculate the ideal analytical view if you spend enough time studying the queries, but that’s the problem. You don’t want to spend a lot of time analyzing queries, especially when you have many to pore over, and they may change over time. You also need to check whether the queries are frequently used, and thus whether they are worth accelerating.

Rather than go through all that manual effort, Smart Acceleration will do the work for you. The analytical view definition below is the recommendation for the two queries above:


SELECT TA_0.calendar_day,
       TA_0.material,
       TA_0.material_description,
       TA_0.plant,
       TA_0.segment,
       TA_0.subsector,
       sum((TA_0.buom_max_stock)),
       sum((TA_0.buom_mrp_available__csp_agg__)),
       sum((TA_0.buom_safety_stock)),
       sum((TA_0.mrp_available_dfc_site_))
FROM moneyball.isc_data_sv ta_0
GROUP BY TA_0.calendar_day,
         TA_0.material,
         TA_0.material_description,
         TA_0.plant,
         TA_0.segment,
         TA_0.subsector;

Conclusion

Hopefully you now have a good sense of how the Arcadia Enterprise can help boost performance in a large-scale analytics environment, and how it can be done easily. You might have seen firsthand that running unaccelerated queries in a data lake is too slow, and that other approaches to query acceleration are unsustainable.

You should explore the power of analytical views and Smart Acceleration in your data lake. These two features are just part of the overall advantage of having a data-native BI/analytics platform. To learn more about getting the most out of your data lake with a data-native approach, check out our webinar, Data Lakes Are Worth Saving, and the accompanying white paper, Saving the Data Lake.


Related Posts