Today, the Arcadia Data team is proud to move modern BI beyond the basics with the release of Arcadia Enterprise 3.3. While this release has an impressive list of features that will delight business analysts and enterprise BI architects alike, I want to dive deep into two key capabilities that let BI users break free from the big data scale limits that legacy BI technology runs into when trying to access data in Hadoop, Apache Solr, and other data platforms.
1. Micro-Segmentation Analytics
The ability for analysts to slice a population of entities (customers, products, cars, set-top boxes, sensors, …) into a very large number of small segments is critical to understanding the behavior of these entities at a fine-grained level. However, the scale challenge with this type of analysis has been choosing between:
- creating a large number of small segments, or
- creating a small number of large segments
The latter, of course, is a level of coarseness that is not effective for understanding behavior in today's digital economy. But the former (micro-segments) quickly hits limits as the number of members in each segment (set) grows, and tracking each segment over time for meaningful historical analysis exacerbates the scale issue further. Most BI tools take a simplistic approach to this micro-segmentation problem: cache all the granular behavior data in the BI server and limit the analysis to the scale of data that server can handle.
As an example, one of our customers tracks about 20 million set-top boxes, each generating usage data, amounting to upwards of 125 TB of logs per week. Just analyzing last month's viewing history from these set-top boxes, and then segmenting those users for better ad targeting, pushes the data toward petabyte scale. Another Arcadia Data customer analyzes Omniture data dumps describing anonymous visitors to their web properties; performing segment analysis for even a single month requires crunching about 6.5 billion records.
Arcadia Enterprise 3.3 addresses this challenge by:
- Analyzing the granular data on the scalable Hadoop tier, instead of extracting it to an external system.
- Applying set/segment evaluations intelligently at runtime using joins and common table expressions, instead of storing them in temp tables in the database and duplicating granular behavior data.
- Allowing each segment to be tracked over a deep history of data, enabling marketers and product managers to see long-term trends that would otherwise be misinterpreted.
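To make the join-based approach concrete, here is a minimal sketch in plain Python. This is a hypothetical illustration of runtime segment evaluation, not Arcadia's actual engine: segment membership is computed from a predicate and joined back onto the granular data at query time, rather than materialized into temp tables that duplicate behavior rows.

```python
# Hypothetical granular viewing log: (device_id, hours_watched).
# In Arcadia this data would stay on the Hadoop tier; this sketch
# only illustrates the join-based segment evaluation idea.
behavior = [
    (1, 10), (2, 3), (3, 25), (4, 7), (5, 18), (6, 2),
]

# A segment is a predicate evaluated at runtime -- here, "heavy
# viewers" -- not a stored temp table of duplicated behavior rows.
heavy_viewers = {device for device, hours in behavior if hours >= 10}

# "Join" segment membership back onto the granular data at query
# time and aggregate, the way a SQL join plus GROUP BY would.
member_hours = [hours for device, hours in behavior if device in heavy_viewers]
count, mean = len(member_hours), sum(member_hours) / len(member_hours)

print(count, round(mean, 2))  # 3 members averaging 17.67 hours
```

Because the segment is re-evaluated against the source data on each query, the same pattern extends naturally over a deep history of records without copying them.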
Micro-segmentation applies equally whether you are analyzing Omniture data to understand visit-frequency-based cohorts, analyzing network endpoints for packet flows that can be maintained collectively, or using set-top box viewership data to personalize the customer experience.
Runtime segment evaluation, integrated through the Hadoop visualization interface
2. Direct S3 Visualizations
Organizations today are adopting data architectures that are a hybrid of on-premises and cloud infrastructure, and Amazon S3 is one of the most common low-cost data stores in use today. Analysts who want to visualize data stored in S3 now have a way to perform that analysis directly with Arcadia Enterprise, instead of waiting for the data to be moved into relational databases like Amazon Redshift. The benefits of accessing data on the object store are twofold:
- reducing the costly copies of data that would otherwise need to be made in databases like Amazon Redshift, and
- increasing data agility by providing big data discovery capabilities directly on data in the S3 object store.
With Arcadia Enterprise 3.3, analysts can now point directly at data stored in S3 for fast, ad hoc visualizations. For customers rightly concerned about the access latency of Amazon S3, Arcadia also offers a sampling mode that reads only a percentage of the underlying data, so discovery work progresses rapidly. As the analyst builds a stronger intuition of the data and completes the discovery process, sampling can be turned off to run against all the underlying data.
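The sampling idea can be sketched in a few lines of plain Python. This is a hypothetical illustration of the concept (scan a fraction of rows during discovery, then run over everything), not Arcadia's actual implementation; the function and parameter names are invented for the example:

```python
import random

def mean_metric(rows, sample_fraction=None, seed=7):
    """Compute a metric over rows, optionally over a random sample.
    A toy stand-in for a sampling-enabled scan; in the real product,
    sampling happens inside the query engine over S3-resident data."""
    if sample_fraction is not None:
        k = max(1, int(len(rows) * sample_fraction))
        rows = random.Random(seed).sample(rows, k)  # seeded for repeatability
    return sum(rows) / len(rows)

data = list(range(1_000_000))  # stand-in for an S3-resident column

approx = mean_metric(data, sample_fraction=0.01)  # fast discovery pass
exact = mean_metric(data)                         # sampling off: full scan
```

The discovery pass touches 1% of the rows, trading a small amount of accuracy for a large reduction in data scanned; turning sampling off reproduces the exact answer.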
Sampling mode on S3 data
We are super excited about the value customers will derive from Arcadia Enterprise 3.3's capabilities around real-time visualizations, cloud and Hadoop-native deployments, and our new Arcadia Smart Acceleration™. You can see some of these features in our demo video showcasing a connected car scenario.