A Day in the Life of a Business Analyst Using Streaming Analytics

Published on April 23, 2018

Before using this walkthrough, you need a working Arcadia Data and KSQL environment. Read this knowledge base article to quickly set one up. The accompanying video for this article is available here.  

ARCADIA INSTANT – Online Fraud Analysis Dashboard

About Arcadia Data and Confluent

Streaming data environments related to the Internet of Things, change data capture, and other time-series data sources keep growing in popularity. A technology-based catalyst for such environments is Apache Kafka, a platform for managing ordered event data in a publish-subscribe model. And while Kafka has been vital for the data management aspect of streaming data, challenges remain around how business analysts can easily derive real-time data analytics insights.

The dashboard above is a final rendition of a Streaming Clickstream example using Arcadia Data. The dashboard highlights suspicious activities relating to click fraud. Arcadia Data makes it easy to build streaming analytic applications in a matter of hours and not weeks or months. Before building solutions like this one required highly skilled infrastructure and programming resources. Arcadia Data and Confluent KSQL have simplified the way to build streaming analytic applications.

In the following sections you will learn how simple it is to construct your own streaming analytic apps using Arcadia Data and Confluent KSQL.

PREAMBLE: Chief Security Office and Click Fraud Use Case

What is Click Fraud?

About one-in-five (19.9%) of desktop clicks were fraudulent in January 2017. These clicks may be done by real people who physically attack your account, also known as manual click fraud, or by robots designed to click on your ads over and over without you ever knowing it. ‘Clickbots’ can hijack everyday consumers’ identities by operating home machines remotely. The bots closely mimicked human activity, clicking only a little more frequently than an actual person would and often dragging a mouse over an image or placing an item in a shopping basket without purchasing it. In this way the bots could imitate the kind of purchases an advertiser most desired.

Travel Booking Site:  CSO Call to Action

The Chief Security Officer(CSO) has initiated a new program to understand how click fraud could be impacting our advertising partners within our travel booking site. The CSO knows this is an expensive problem and many of their advertising partners are complaining and not purchasing ad space on the booking site. One advertiser just pulled a six-figure ad plan and there are several others with similar intent. The CSO wants answers now and has a call to action to get a real-time solution up and running by the end of the month. He has brought in Confluent and Arcadia Data to get started but wants the business analysts to be able to build solutions due to ever-changing requirements and business goals.  

The Day in the Life:  Business Analysts and Streaming Analytics

The Confluent KSQL environment is all setup and Streaming Topics for the booking site have been established. Arcadia Data has also been setup and some light training was given to the IT and business analysts. The business analyst wants to test how easy it is to construct streaming visuals and then share results with partners in an agile fashion. The rest of this document will demonstrate how easy it is to work with Arcadia Data and Confluent.

ARCADIA INSTANT – BUILD YOUR FIRST STREAMING VISUAL

Start the Arcadia Instant Server and Launch a Browser

Click the ‘DATA Tab’ and Click ‘NEW CONNECTION’

Fill in the Create New Data Connection Details

The Create New Data Connection dialog box will appear. Fill in the details and click ‘CONNECT’.

 

Connection Type:  Confluent KSQL

Connection Name:  Demo KSQL

Hostname or IP address:  localhost

Port #: 8082

 

Username and Password:  {blank}

 

Once connected to a Kafka data system you can easily browse existing topics and review streams and tables. For this exercise we are interested in topic: CLICKSTREAM_TIME. CLICKSTREAM_TIME contains a real time stream of user clicks on a website. In the next section we will show you how easy it is to establish a connection and build a dataset.

What is a Dataset: In Arcadia Data, datasets are defined on the connections to your data, and provide access to the specific tables in the data store. In Arcadia Enterprise, visualizations are built from datasets.

Other than providing access to data, datasets enhance data access in many ways such as table joins, hiding fields, changing data types, changing default aggregations, and providing user-friendly names.

 

Create a Streaming Dataset

Click ‘Demo KSQL’ on the left side of the browser under ‘NEW CONNECTION’.

Click ‘Connection Explorer’.

Find and click ‘CLICKSTREAM_TIME’ – Notice the icon to the left of the object:

Click ‘CLICKSTREAM_TIME’ – Notice the icon to the left of the object.

 

You can now explore the Sample Data within the Kafka Stream.

Click ‘New dataset’ to the right of the ‘CLICKSTREAM_TIME’ Stream Name.

The ‘New Dataset’ dialog box appears.

Type ‘Clickstream’ as the Dataset title and click ‘CREATE’.

Success!  You have created the Clickstream Dataset.

 

Now, let’s make a few adjustments and build a dashboard.

Click on the ‘Clickstream’ Dataset.

Click on ‘Fields’ and click ‘EDIT FIELDS’ button.

Let’s set some of the fields from Measures to Dimensions by clicking on the ‘Mes’ icon to the left of the following fields:

 

ROWTIME

STATUS

USERID

Click ‘SAVE’ and we are now ready to build our first Streaming Dashboard.

 

Your screen should look like the one below:

Create Your First Streaming Dashboard

Click ‘NEW DASHBOARD’ on the upper right-side of the browser window.

The Arcadia Instant Visual Designer Interface appears. The default type of visual is a table, but let’s create something more visual. Double click on the table in the green box below and we will develop our first streaming visual.

We want to see if there are any particular IP addresses sending more requests than others. We also want to see if there are any outliers related to STATUS. Let’s create a stacked bar visual showing the IP address of the user on the Y-Axis and the Record Count measure on the X-Axis. We will then drag the STATUS Dimension field to the Colors shelf.

 

Click the ‘Bars Visual’ type.

Click and drag the ‘Record Count’ measure to the X-Axis shelf.

Click and drag the ‘IP’ Dimension to the Y-Axis shelf.

Click and drag the ‘STATUS’ dimension field to the ‘Colors’ shelf.

Click ‘REFRESH VISUAL’.

Next, we want to sort by ‘Record Count’ in descending order and show the top 100 IP addresses with the most clicks.

 

Click the ‘Record Count’ field in the X-Axis.  The ‘FIELD PROPERTIES’ dialog box appears.

Click and expand ‘Orders and TOP K’ in the FIELD PROPERTIES dialog box.

Click ‘Descending’.

Type 100 in the Top K field.

Click ‘SAVE’ and ‘CLOSE’ in the upper left corner of the Visual Designer.

You are now back at the Dashboard Visual Designer; your screen should look like the one below:

Click ‘SAVE’.

Let’s add a title ‘KSQL Clickstream’ to the dashboard.

Click ‘SAVE’ when finished.

In just three steps we created a streaming dashboard and we show this to the Security team which leads to more questions? The security team has recognized that users who generate 50kb or more data seem to be up to nefarious activities. They want to know ‘who’ is doing this and ‘where’ are these users located.

To that end we have to go back to our dataset ‘Clickstream’ and join the stream to the ‘WEB_USERS’ table and build some new visuals.

 

Modify the ‘Clickstream’ Dataset

Click the ‘DATA’ tab in the grey bar.

Click the ‘Clickstream’ Dataset inside of the ‘Demo KSQL’ connection.

Click ‘Data Model’ tab.

Click ‘EDIT DATA MODEL’ and let’s join to the ‘WEB_USERS’ table.

Click the ‘+’ sign inside of the CLICKSTREAM_TIME stream object.

Select the following Topic Name: ‘clickstream_users’.

The Table or Stream name will automatically populate as: ‘WEB_USERS’.

Click ‘SELECT’ when done.

The ‘Edit Join’ dialog box will appear.  

Click ‘CLEAR FIELDS’ to remove the default fields selected by the tool.

Select ‘USERID’ and ‘USER_ID’ fields to specify the ‘join’ fields.

Click ‘APPLY’ when complete.

Click ‘SAVE’ and ‘SHOW DATA’ to review the results of the join.

Your screen should look like the one below.

 

Modify ‘Clickstream’ Dataset – Edit the Schema

Next let’s hide some unnecessary fields and put modify other schema elements.

Click on ‘Fields’ on the left menu.

Click ‘EDIT FIELDS’.

First, let’s hide some fields we don’t need that could cause confusion with our business analyst.

To hide fields, you simply Click (eye) icon next to the field.  See below.

Let’s hide the following fields:

 

Dimensions>CLICKSTREAM_TIME

ROWTIME

ROWKEY

Dimensions>WEB_USERS

WEB_USERS ROWKEY

Measures>WEB_USERS

WEB_USERS ROWTIME

Next, we will flip some of the ‘Mes’ (Measures) into ‘Dim’ (Dimensions) fields.

Click the ‘Mes’ icon in the Measures: ‘WEB_USERS’.

 

WEB_USERS ROWTIME

USER ID

REGISTERED AT

 

Click ‘SAVE’ when complete.

Add Some Visuals – Modify ‘KSQL Clickstream’ Dashboard

Click on the ‘Related Dashboards’.

Click the KSQL Clickstream Dashboard.

This will take us to the Dashboard Visual Editor.

Click ‘EDIT’ in the Dashboard Visual Designer.

Click the ‘Visuals’ Tab on the right.

Click the ‘NEW VISUAL’ button.

A new table visual will appear to the right of our stacked bar chart.

Double click in the new table visual and let’s build a new visual.

In the Visual Designer we will do the following to build the Click Count by City by Minutes Line chart.

Notice that our new fields are on the DATA shelf based on our Dataset change.

 

Click ‘Lines Visual Type’.

Drag the Dimension field ‘ROWTIMEHHMM’ in the X-Axis shelf.

Drag the Measure field ‘Record Count’ into the Y-Axis shelf.

Drag the Dimension field ‘CITY’ into the ‘Colors’ shelf.

Click ‘REFRESH VISUAL’ to see the data start to stream in and appear.

Click ‘SAVE’ and ‘CLOSE’.

Click ‘SAVE’ to save the work in the Dashboard.

Let’s now add a third visual to our Dashboard. This visual will detail users of the website that have generated more than 50 kb of data. These are heavy users of the site and they could be up to something.

Click ‘NEW VISUAL’ on the right side of the screen.

You will see the default table visual appear below the stacked bar chart.

Double click in the new table visual to enter the Visual Designer.

Click ‘Packed Bubbles Visual Type’.

Drag the Dimension field ‘USERNAME in the Dimensions shelf.

Drag the Measure field ‘BYTES’ into the Measures shelf.

Drag the Dimension field ‘BYTES’ into the Filters shelf.

Click ‘REFRESH VISUAL’ to see the data start to stream in and appear.

The next step we are going to add a calculation to the Measure and the Filter fields. This will allow us to see our users that have used 50 kb or more bytes.

 

The calculation we enter will be:

Measure: sum([BYTES]/1024)

Filters: sum([BYTES]/1024)>50

 

First, we will add the expression to the Measure:

Click on the sum([BYTES]) field on the Measure shelf.

The FIELD PROPERTIES dialog box will appear.

Click on ‘[ ] Enter/Edit Expression’.

The ‘Enter/Edit Expression’ dialog box appears.

Enter ‘sum([BYTES]/1024)’.

Click ‘VALIDATE EXPRESSION’.

Click ‘SAVE’ when complete.

Next, we will add the expression to the Filter:

Click on the ‘BYTES’ field on the Filter shelf.

The FIELD PROPERTIES dialog box will appear.

Click on ‘[] Enter/Edit Expression’.

 

Enter sum([BYTES]/1024)>50

Back at the Visual Editor Click ‘SAVE’ and ‘CLOSE’.

Once back at the Dashboard Click ‘SAVE’ and ‘VIEW’ to see our work.

 

Customizing Visuals – Changing Colors and Other Visual Properties

Add a Visual Title and Subtitle

Click the ‘EDIT’ icon on the dashboard.

Click the ‘EDIT’ icon on the dashboard to enter the Visual Design Editor.

Double click the ‘Stacked Horizontal Bar Chart’ and let’s make some changes.

Click the ‘Pencil Icon’ next to the top title bar of the Visual. Pencil Icon: 

Enter the title: ‘IP by Status’.

Below the ‘Top Title Bar’ in the sub-title type:

‘Status codes are issued by a server in response to a client’s request made to the server’

Click ‘SAVE’.

Let’s Modify the Colors of the Visual

Note that we are still in the Visual Designer of the stacked bar chart. Now we will change the colors of the bars.

Click the ‘Colors’ icon on the right side of the screen. 

Feel free to select any color pallet setting you wish. In this example we selected the blue monochromatic style and selected the ‘Reverse order of colors in the palettes’ option.

Click ‘REFRESH VISUAL’ to review.

Click ‘SAVE’ when done.

Change the Axis Titles for Improved Readability

Finally, we will modify the X-Axis title: ‘Record Count [limit 100]’ so that it reads: ‘Click Count [limit 100]. This will more accurately set the context of our visual.

Click the ‘Record Count’ field in the X-Axis shelf.  The ‘FIELD PROPERTIES’ will show to the right of the field.

Click ‘Alias’ and replace ‘Record Count’ with ‘Click Count’ and press enter on the keyboard.

Click ‘REFRESH VISUAL’.

Click ‘SAVE’ and ‘CLOSE’ and let’s review our dashboard.

At the Dashboard click ‘SAVE’ and ‘VIEW.

 

Make It Your Own

Feel free to change visual types, change colors, and other settings to create your own look and feel of the dashboard.  

ARCADIA INSTANT – Importing Visual Artifacts

Import the ‘Online Fraud Analysis’ Dashboard

At the beginning of this article we reviewed the ‘Online Fraud Analysis’ dashboard and are providing it to you for your own review. In this section of the document we will demonstrate how to import this dashboard below:

Why is This Important

This set of features address the following real-world scenarios:

  1. Porting visual artifacts from development/testing environments to production deployments.
  2. Creating back-ups of ArcViz artifacts.
  3. Sharing these artifacts in communities.

In an enterprise environment, it is a common practice to develop and test applications in one environment before making them widely available in a client-facing production environment.

Arcadia Enterprise provides an easy UI method to export visual artifacts (visuals, dashboards, and apps, plus their dependencies) by saving them to a *.json format file. This file can then be used to import the visual artifacts to a new system, where they can be successfully deployed (provided the new environment has identical data connections).

Arcadia Data visual artifacts do not store the actual raw data, or results of queries; instead, they contain information about the layout and settings, the associated dataset, style customizations, and the query. When saving artifact information for export, Arcadia Enterprise also captures all these dependencies. For example, when you export a visual that uses a custom style, the export includes that custom style, as well as the dataset. As a result, when you import the visual into a new system, it looks exactly the same as it did at the source.

 

How to Import Visual Artifacts

Download the OnlineFraudAnalysis.json file to your local file system.

Click the ‘DATA’ tab on the top grey bar.

Click the ‘…’ icon and click ‘Import Visual Artifacts’.

The ‘Import Visual Artifacts’ dialog box appears.

Click ‘Choose File’ and select the ‘OnlineFraudAnalysis.json’ file you downloaded in the previous step.

Click ‘Import’.

The manifest of objects included in the JSON file will appear.

Click ‘ACCEPT AND IMPORT’.

When completed you should see the following message:

Click ‘VISUALS’ in the upper grey bar.

Click the ‘Online Fraud Analysis’ Dashboard.

If the data is streaming you should see the dashboard come to life: 

Summary: Arcadia Instant and Confluent

Real-time Streaming Visuals for Everyone

Let’s quickly summarize: in about 15 to 20 minutes we built a Streaming Analytic App using Arcadia Instant and Confluent KSQL. Until now real-time streaming applications and analytics were challenging to build. They required some very sophisticated programming skills and other talents that most business analysts never had time to master.  

Arcadia Data has lifted these barriers enabling business intelligence on streaming datasets. In this exercise we didn’t have to write a single line of code. Everything was drag and drop and provided through a single and easy to use web interface. This capability opens up a whole new set of use cases to explore in order to optimize your organization as the speed of data.

Arcadia Data allows you to focus on the business challenge and not all the hard stuff.  

By the way, Arcadia Data is not just for streaming data but is also a powerful business intelligence tool for your modern data lake environment. Give it a try for yourself with Arcadia Instant.