April 23, 2018 - Dale Kim | Big Data Ecosystem

Visualizations on Apache Kafka for the Business Folks

For those of you who are always on the move, I won’t hold you up: you can easily get started with visualizations on Apache Kafka topics with KSQL without a big time investment, and you can get help from us in our new community forum at alexandria.arcadiadata.com.

For the rest of you, let me elaborate…

Visualizations for Apache Kafka Kit

As you might know, Apache Kafka is an extremely powerful platform for managing streaming data. You save data in its most granular form, and each data point contains details around specific events. An “event” can be anything from a temperature reading of a machine part, to the sale of a specific product, to the call record of a mobile phone user. These events typically have timestamps so you can naturally assess trends and patterns in your data points over time.

Processing and analyzing data in Kafka typically involves writing code, but I expect that story to change soon. Last fall, Confluent, the company founded by the creators of Kafka, released their KSQL product, which added a SQL processing engine on top of Kafka. Not only did that open up Kafka to a broader technical audience who could now use their SQL skills to query streaming data, but it also led to opportunities for non-technical users to get in on the fun. This is because KSQL opened doors for general-purpose analytics platforms to query streaming data. I’m not referring to just any analytics platform, of course, but to ones with architectures well-suited for streaming data. The platform needs to efficiently update visualizations to reflect real-time data, it needs to include windowing extensions to SQL, and it needs to treat streams in the interface almost as if they were standard tables.
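If "windowing" is a new term for you, here is a minimal sketch in Python of what a fixed (tumbling) window aggregation computes. This is not how KSQL or Arcadia Data implements it; the function name, the event fields (`timestamp`, `product`), and the sample data are all made up for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, product) using fixed, non-overlapping windows.

    Hypothetical helper for illustration only; timestamps are epoch seconds.
    """
    counts = Counter()
    for event in events:
        ts = event["timestamp"]
        # Round the timestamp down to the start of the window it falls in.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, event["product"])] += 1
    return dict(counts)


# Made-up sales events (assumed shape, not from a real Kafka topic).
events = [
    {"timestamp": 0, "product": "widget"},
    {"timestamp": 30, "product": "widget"},
    {"timestamp": 59, "product": "gadget"},
    {"timestamp": 60, "product": "widget"},
    {"timestamp": 125, "product": "gadget"},
]

result = tumbling_window_counts(events, window_seconds=60)
```

Each event lands in exactly one 60-second bucket, which is the kind of grouping a windowed SQL query expresses declaratively so that you don't have to write this loop yourself.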

Now here’s the kicker: having a visualization platform that’s tightly integrated with KSQL means that you no longer have to spend development time to ETL your streaming data into a database just to visualize it. You retain the real-time value of streaming data by directly querying the stream. You keep the value of granular event data that doesn’t get lost in a separate “state-based” database. You eliminate a backend ETL process that you used only because you didn’t previously have a Kafka-ready analytics platform. Arcadia Data is that platform.

As a Confluent partner, Arcadia Data made a commitment to help showcase KSQL, and we simultaneously announced our support for KSQL last fall. We shipped our latest version of Arcadia Enterprise on March 20, which offered the new KSQL integration, and the next logical step for us was to include KSQL integration in our freely downloadable product, Arcadia Instant.

Our goal was to show the world that streaming analytics had greater applicability beyond application development shops. But for all the streaming enthusiasts who want to get started with a desktop-based analytical tool on Kafka data, merely providing the visualization technology wouldn’t be enough. Having access to the data source was another key ingredient. Not surprisingly, we decided to provide enough bits so even those without a KSQL/Kafka cluster could get started. We’ve simplified the process by offering a freely available, downloadable package that includes KSQL and Kafka, both of which you can run in Docker containers. Exploring visualizations on Kafka topics no longer requires setting up a cluster of servers or provisioning cloud instances – you can do this on your desktop, and likely even your laptop. If you have about 8 GB of RAM on your system, then you have plenty of power to run Arcadia Data on Kafka. We also have a few walkthroughs that will guide you through the process of setting up a visualization environment.

Start with the Get Running with KSQL Guide which walks you through these steps:

    1. Install Arcadia Instant
    2. Install Docker
    3. Run a few commands in Docker, which will download the KSQL and Kafka components
    4. Connect Arcadia Instant to the KSQL/Kafka processes running in Docker containers
    5. Start visualizing Kafka topics

After you have the basic configuration set up, you can build your own visualizations by following along in this guide, A Day in the Life of a Business Analyst Using Streaming Analytics.

If you need any help along the way, I’m happy to announce our new community forum at alexandria.arcadiadata.com. This forum is not only about Kafka; it will be a resource for anyone who wants to participate in discussions on visualizations, analytics, BI, and all things big data.

Although a primary goal was to make visualizations on Kafka easy, it certainly was not our only goal. We’re providing capabilities to help you get the most value from streaming data. We are also putting in place additional enhancements to make visualizations on Kafka more powerful. One upcoming feature is support for complex data types in Kafka. This is important because more streaming data sources are publishing nested data, so you need an analytics platform that can read that data natively. If you can’t read the data natively, then you need to transform/flatten the data and then store it in an intermediary store for analysis. Such a configuration removes the real-time aspect of your Kafka streams, and also adds more dependence on IT. Support for complex data types was recently released in Arcadia Data on non-streaming data, so much of the heavy lifting is already in place, and our next step is to extend the support to cover streaming data.
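To make the transform/flatten step concrete, here is a rough Python sketch of the kind of work you take on when a tool can’t read nested data natively. The `flatten` helper, the dotted column names, and the sample event are assumptions for illustration; they are not part of Arcadia Data, KSQL, or Kafka.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys.

    Hypothetical helper: shows the extra transformation step that native
    support for complex types is meant to avoid.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up nested event, similar in spirit to a machine temperature reading.
nested_event = {
    "device": {"id": "pump-7", "location": {"site": "plant-a"}},
    "reading": {"temperature_c": 71.5},
    "timestamp": 1524441600,
}

flat_event = flatten(nested_event)
```

Every event would need to pass through a step like this and then be written somewhere else before anyone could chart it, which is exactly the delay and IT dependence described above.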

Another upcoming feature is what we refer to as “Time Warp,” which lets you specify any window of time in a Kafka topic to analyze. While streaming data is often associated with real-time processing, there is also value in the history of the data, so Time Warp lets you compare trends and patterns in the past to what’s happening in the present.
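As a rough idea of what comparing a past window to the present means, here is a small Python sketch. It is not the Time Warp feature or its interface; the `count_in_window` helper and the sample timestamps are invented for illustration.

```python
def count_in_window(events, start, end):
    """Count events whose timestamp falls in the half-open range [start, end).

    Hypothetical helper; timestamps are plain numbers (for example, epoch seconds).
    """
    return sum(1 for event in events if start <= event["timestamp"] < end)


# Made-up event timestamps spread over two consecutive 60-second periods.
events = [{"timestamp": t} for t in (10, 20, 35, 110, 115, 118, 119)]

past_count = count_in_window(events, 0, 60)       # an earlier window
present_count = count_in_window(events, 60, 120)  # the most recent window
change = present_count - past_count
```

Because the granular events are still available with their timestamps, any two windows can be compared this way after the fact.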

There’s much more to come from Arcadia Data with regard to visualizations on Kafka topics, so stay tuned. But in the meantime, get a jump on everyone else and check out our visualization starter kit.

