We’re seeing enterprises choose two separate corporate standards for their business intelligence (BI) tools. They have one standard to handle analytics requirements on their traditional environments—data warehouses, data marts, etc. But they are also choosing a separate standard for their data lakes. In other words, they are using “data warehouse BI” for data warehouses, and “data lake BI” (aka native analytics and BI for big data) for their data lakes.
Why? The rationale is very similar to why they chose data lakes in the first place: not as a replacement, but as a complement to data warehouses. Data has evolved, requirements have changed, and you have to adapt. Data lakes represent a (relatively) new way of leveraging data to handle those changes. BI users hope that data lakes can give them greater information sharing and collaboration, richer data correlations, and more agility. But with the changing data landscape, you can’t simply shoehorn these new data platforms into your existing BI tools. Data lakes are built on new platforms that typically combine Apache Hadoop, Apache Spark, Amazon S3, and others. For BI, conventional wisdom says that if you already have a set of perfectly fine BI tools, there’s nothing wrong with using those same tools for your data lake. But we intuitively know that’s not the right approach, especially when there are advantages to be gained from an analytical platform that runs directly on your data platform.
Anyone looking to promote information sharing and agility with a data lake knows that traditional tools aren’t aligned with those objectives. The data movement for extracts, the extensive modeling, the security and governance challenges, the scale trade-offs, etc., all inhibit the value you seek.
Just as data warehouses, Hadoop/Spark, NoSQL, and search technologies are complementary to relational databases, data lake-oriented analytics and BI tools are complementary to traditional BI tools. By embracing a native approach to data lake analytics, you are recognizing the need for a natural way of leveraging your data lake. This approach still lets you retain your traditional BI tools as the solution for your data warehouses and data marts. “Data warehouse BI” and “data lake BI” are the two enterprise-wide standards you should adopt.
If your initial reaction to this approach is that it sounds redundant, let me elaborate. Understanding a native approach to big data and data lakes is important. Simply “bolting on” new technologies to your data lake as middleware for your traditional BI tools is a seemingly attractive, but suboptimal, option. On the one hand, it’s appealing because you can extend the reach of your traditional BI tools by pointing them at data lakes. It feels like you are amortizing the cost of your tools, as well as the learning curve, across multiple data platforms and use cases. On the other hand, bolt-on tools are not ideal: the overhead of semantic and performance modeling remains, so they still require quite a bit of IT intervention, and you also have to manage a separate set of dedicated nodes to run queries. The native approach entails a platform that delivers a seamless analytic lifecycle, from discovery and semantic modeling, to dashboard creation, to performance modeling, to production deployment of analytical applications.
With the latest release of Arcadia Enterprise, we’ve planted a stake in the ground: if you’re looking to build a modern data platform, you need a modern analytics and BI platform. We’ve already provided critical capabilities for data lake users—semantic layering, data discovery, easy-to-build dashboards and applications, recommendations-based query optimizations, etc.—and we are accelerating the analytic lifecycle for our customers with new features such as Instant Visuals and support for complex data types. Instant Visuals is an AI-driven feature that recommends charts by providing a side-by-side comparison of visuals on your live data. Support for complex data types helps avoid the time-consuming ETL and “data flattening” that duplicates data across rows, which not only introduces IT as a bottleneck, but also hurts query performance.
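To see why flattening inflates data, consider a generic sketch (plain Python for illustration only; not Arcadia Enterprise code, and the record shape is a made-up example): a parent record with a nested list must be exploded into one row per nested element, repeating the parent fields in every row.

```python
# Hypothetical nested record: one customer with a complex (array-of-struct) field.
customer = {
    "id": 42,
    "name": "Acme Corp",
    "orders": [
        {"order_id": "A1", "total": 100.0},
        {"order_id": "A2", "total": 250.0},
    ],
}

def flatten(record):
    """Expand each nested order into its own flat row, copying parent columns."""
    return [
        {"id": record["id"], "name": record["name"], **order}
        for order in record["orders"]
    ]

rows = flatten(customer)
# The parent fields "id" and "name" are now duplicated once per order row.
for row in rows:
    print(row)
```

Querying the complex type in place avoids producing (and maintaining) these duplicated rows at all.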
Another exciting feature is our early access integration with Confluent’s KSQL. With this integration, we provide non-technical users with access to real-time data for a variety of use cases that are driven by Apache Kafka. We hope our powerful visualizations on Kafka topics will open even more opportunities for analytics on real-time streams.
With this new release, we hope to provide even more value to our partner community as they continue to help customers solve their big data challenges. Arcadia Data is proud to partner with leading big data platform vendors such as Cloudera, Confluent, Hortonworks, and MapR, and we look forward to more work with them as we make big data value a reality for joint customers. This latest product release gives our customers more ways to quickly and easily find important business insights in their big data environments.
Read the press release on our latest product announcement here.
To check out our visualization capabilities, download the free Arcadia Instant for desktop analytics.