Apache Kudu® for BI and Analytics on a Modern Data Warehouse

Arcadia Data integrates with Kudu, which provides fast inserts, random data access, and high analytics throughput for visual analytics and BI.

What is Apache Kudu?

Apache Kudu is an open-source, columnar storage layer designed for fast analytics on fast data. In other words, Kudu is built for both rapid data ingestion and rapid analytics. This makes Kudu a great tool for addressing the velocity of data in a big data business intelligence and analytics environment.

How Does Apache Kudu Support BI?

Kudu offers three key advantages over conventional BI on Hadoop when deploying big data analytics.

Late Arriving Data

Apache Kudu enables fast inserts and updates against very large data sets, and it is ideal for handling late-arriving data for BI. Its efficient columnar scans enable multiple real-time analytic workloads on one storage layer. You get faster queries from Apache Impala and Apache Spark execution engines with Kudu, compared to other big data analytics platforms. For a detailed comparison, please see the Cloudera Engineering Blog article “Performance comparison of different formats and storage engines in the Apache Hadoop ecosystem.”

The capability of Apache Kudu to handle late-arriving data makes it a strong contender for IoT, machine analytics and other use cases where real-time analytics are critical.
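To illustrate why fast inserts and updates help with late-arriving data, here is a minimal sketch in plain Python. It does not use the Kudu client API. The `Reading` and `KeyedTable` names, and the choice of (sensor_id, event_time) as the primary key, are assumptions made only for this illustration. The point it shows is that a store keyed on the event's own timestamp can absorb out-of-order rows and corrections without rewriting earlier data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    event_time: int  # when the event happened, not when it arrived
    value: float


class KeyedTable:
    """Hypothetical in-memory stand-in for a table keyed on (sensor_id, event_time)."""

    def __init__(self):
        self._rows = {}

    def upsert(self, reading):
        # Insert the row if the key is new, otherwise overwrite the existing row.
        self._rows[(reading.sensor_id, reading.event_time)] = reading

    def scan(self, sensor_id):
        # Return one sensor's rows ordered by event time, regardless of arrival order.
        keys = sorted(k for k in self._rows if k[0] == sensor_id)
        return [self._rows[k] for k in keys]


table = KeyedTable()
table.upsert(Reading("s1", 100, 1.0))
table.upsert(Reading("s1", 300, 3.0))
table.upsert(Reading("s1", 200, 2.0))  # late arrival: happened before the previous row
table.upsert(Reading("s1", 300, 3.5))  # correction to a row that was already stored

values = [r.value for r in table.scan("s1")]
```

After these four writes, a scan of sensor `s1` returns three rows in event-time order, with the corrected value in place of the original one.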

Fast Changing Data

Many BI jobs involve fast-moving data, an area where Kudu excels. But data ingestion is not only about fast inserts. In many cases, we need to update fields while maintaining their historical values. Names and addresses change over time, as do product categories. Even our coordinates change as we drive through cities. At some point we may need to see what those values were at a specific point in time, for example when comparing real-time and historical data.

In other storage systems, such as Hadoop Distributed File System (HDFS), once a record is closed, you cannot modify it. In cases where your data changes, and especially where it changes quickly, consider Kudu for your storage layer.
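The idea of updating a field while keeping its earlier values can be sketched as a versioned lookup. The following is a plain Python illustration, not Kudu code: the `VersionedAttributes` class and its `record` and `as_of` methods are hypothetical names, and the design (one version per key and effective time) is just one common way to model changing attributes.

```python
import bisect
from collections import defaultdict


class VersionedAttributes:
    """Hypothetical store that keeps every value a key has had, by effective time."""

    def __init__(self):
        self._times = defaultdict(list)   # key -> sorted effective times
        self._values = defaultdict(list)  # key -> values aligned with _times

    def record(self, key, valid_from, value):
        times = self._times[key]
        i = bisect.bisect_left(times, valid_from)
        if i < len(times) and times[i] == valid_from:
            # Same key and effective time: treat as an update of that version.
            self._values[key][i] = value
        else:
            # New version: insert in time order, even if it arrives out of order.
            times.insert(i, valid_from)
            self._values[key].insert(i, value)

    def as_of(self, key, t):
        # Return the value that was in effect at time t, or None if none existed yet.
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, t)
        return self._values[key][i - 1] if i else None


addresses = VersionedAttributes()
addresses.record("cust-1", 50, "Bergen")
addresses.record("cust-1", 10, "Oslo")  # older version recorded later

current = addresses.as_of("cust-1", 99)
historical = addresses.as_of("cust-1", 49)
```

Here `current` is the latest address and `historical` is the address that applied before the change, which is the kind of point-in-time comparison described above.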

Simplified Architecture

Kudu is a strong contender for real-time use cases based on the characteristics outlined above. The columnar orientation and significant compression capabilities of Kudu also make analytics on historical data extremely fast. Therefore, Kudu can provide a simplified storage architecture for use cases involving real-time data alongside historical BI and analytics.

Kudu suits use cases such as predictive maintenance, in which you can identify not only the leading indicators of maintenance but also the patterns that historically precede those indicators.

See how Arcadia Data and Apache Kudu can easily integrate into your business intelligence environment.

Arcadia Data and Apache Kudu

For users of Arcadia Enterprise, Apache Kudu acts like a modern data warehouse. You can connect to Kudu tables in the same manner as any other storage layer to build visualizations that take advantage of the benefits of Kudu for BI described above.

Architects can deliver all of the benefits of Kudu while enabling business teams to interact seamlessly with fast-moving data, and uncover insights from it, via visual analytics and BI with Arcadia Enterprise.