The Power and Flexibility of Derived Data, Part 2

Published on January 18, 2017

In part one of the Derived Data feature deep dive we looked at how to perform a multi-step analysis that classified airlines into bins based on departure delays at airports.  In this post, we will continue to explore the power of Derived Data with a marketing use case.  Unique Visitors and Pageviews are two important metrics that indicate the overall health of a website. Using data generated by a site analytics vendor, let’s study usage patterns for a given date range.

A marketer would like to group unique visitors by the number of pages they viewed; in other words, “How many visitors had x page views?” This kind of grouping quickly approximates how deeply users are interested in the website’s content. We’ll start by summing the page views by Visitor ID.
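As a rough illustration of this first pass outside the product, the per-visitor sum can be sketched in Python. The row layout and sample values here are made up for illustration and are not the analytics vendor's actual schema.

```python
from collections import defaultdict

# Hypothetical sample rows: (visitor_id, page_views) per analytics record.
ROWS = [
    ("v1", 3),
    ("v2", 1),
    ("v1", 2),
    ("v3", 1),
    ("v2", 4),
]


def pageviews_by_visitor(rows):
    """Sum page views for each visitor ID (the first pass of the analysis)."""
    totals = defaultdict(int)
    for visitor_id, page_views in rows:
        totals[visitor_id] += page_views
    return dict(totals)


totals = pageviews_by_visitor(ROWS)
```

Each visitor ends up with a single total, which is the value the rest of the analysis groups on.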

We want to use ‘Pageviews’ on the Y Shelf as an attribute in the final output. To do so, we’ll need to save this expression as Derived Data. The marketer has asked for the flexibility to analyze different date ranges, so we’re including a parameterized date filter in the expression. This feature is new in the Arcadia Data 3.3 release, and it increases the flexibility with which the marketer can wield the power of Derived Data.

We tell the product to accept parameter values for calendar_key by using double angle brackets like this: “<<parametername:defaultvalue>>”. The parametername corresponds to the Output Parameter value specified in the Application Filter configuration (in this example, it’s called “dateparam”). The defaultvalue is applied if nothing is selected in the Application Filter. Since we are filtering on a date column and using the calendar widget to specify a range, dateparam has attributes called “.start” and “.end” that identify the minimum and maximum date values.
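To make the placeholder behavior concrete, here is a minimal Python sketch of how a “<<parametername:defaultvalue>>” token could be resolved. It is not the product's implementation: the `resolve` helper, the filter expression, the quoting, and the default dates are all assumptions chosen for illustration, and the exact syntax the product accepts may differ.

```python
import re

# Matches placeholders of the form <<name:default>>; the name may carry an
# attribute suffix such as ".start" or ".end".
PLACEHOLDER = re.compile(r"<<([\w.]+):([^>]*)>>")


def resolve(expression, params):
    """Replace each <<name:default>> with params[name], or with the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(params.get(name, default))
    return PLACEHOLDER.sub(substitute, expression)


# Hypothetical filter expression: calendar_key and dateparam come from the
# post, the default dates are invented for this example.
EXPR = (
    "calendar_key BETWEEN '<<dateparam.start:2016-12-01>>' "
    "AND '<<dateparam.end:2016-12-02>>'"
)

# Nothing selected in the filter: the defaults apply.
with_defaults = resolve(EXPR, {})

# A range selected in the calendar widget overrides the defaults.
with_selection = resolve(
    EXPR,
    {"dateparam.start": "2017-01-01", "dateparam.end": "2017-01-07"},
)
```

The same expression produces a different filter each time the selection changes, which is what makes the saved Derived Data definition dynamic.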

When we view the saved expression, we see how it will accept the values sent to the visual by the Application Filters, thus enabling a dynamic Derived Data definition.  As the user filters for different date values in the Application, the query result will change based on those selections.

Now we can group the number of unique visitors by the number of page views that occurred in a given date range. We drag ‘Pageviews DD’ to the Dimensions Shelf and ‘Visitor ID’ to the Measures Shelf, where we apply a Count(Distinct) aggregation.
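The second pass groups on the result of the first. A small Python sketch of the idea, using invented per-visitor totals as input, might look like this:

```python
from collections import Counter

# Hypothetical first-pass result: total page views per visitor ID.
PAGEVIEWS_BY_VISITOR = {"v1": 5, "v2": 5, "v3": 1, "v4": 2, "v5": 1}


def visitors_by_pageviews(totals):
    """Count distinct visitors for each page-view total (the second pass)."""
    # Dictionary keys are unique, so each visitor is counted exactly once.
    return dict(Counter(totals.values()))


groups = visitors_by_pageviews(PAGEVIEWS_BY_VISITOR)
```

Here the page-view total plays the role of the dimension and the distinct visitor count plays the role of the measure.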

From this point we have numerous options for how we interpret the results and whether we augment the analysis with additional information. It’s likely that, for this particular website, no human, non-administrative visitor had over 200 page views in a two-day period (the range we filtered for in the original Derived Data query), so we’ll focus on page view counts below that number. The marketer would like to see the largest group of unique visitors first, so we sort by that measure descending, followed by the page views.
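The cutoff and sort order can be sketched the same way. The sample counts are invented, and sorting page views in ascending order as the tiebreaker is an assumption, since the post does not state the direction of the secondary sort.

```python
# Hypothetical second-pass result: page-view total -> unique visitor count.
VISITORS_BY_PAGEVIEWS = {1: 40, 2: 25, 3: 25, 86: 2, 450: 1}

# Cutoff taken from the post: keep only page-view totals below 200.
MAX_PAGEVIEWS = 200


def top_groups(groups, max_pageviews=MAX_PAGEVIEWS):
    """Drop groups at or above the cutoff, then sort by visitors descending.

    Ties on visitor count are broken by page views ascending (an assumption).
    """
    kept = [(pv, n) for pv, n in groups.items() if pv < max_pageviews]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


ranked = top_groups(VISITORS_BY_PAGEVIEWS)
```

The group with 450 page views is excluded as likely non-human or administrative traffic, and the largest visitor groups come first.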

Let’s include a count of videos started, also an indicator of interest and user engagement, as an additional measure.
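Adding a second measure to the same grouping could be sketched as follows. The per-visitor tuples are hypothetical, and summing video starts per page-view group is an assumption about how the measure is aggregated.

```python
from collections import defaultdict

# Hypothetical per-visitor totals: visitor_id -> (page_views, video_starts).
PER_VISITOR = {
    "v1": (5, 2),
    "v2": (5, 0),
    "v3": (1, 1),
    "v4": (2, 0),
}


def summarize(per_visitor):
    """For each page-view total, return (unique visitors, video starts)."""
    visitors = defaultdict(int)
    videos = defaultdict(int)
    for page_views, video_starts in per_visitor.values():
        visitors[page_views] += 1
        videos[page_views] += video_starts
    return {pv: (visitors[pv], videos[pv]) for pv in visitors}


summary = summarize(PER_VISITOR)
```

Both measures share the page-view dimension, so they can be read side by side for each group.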

When we add this visual to an app and include a date filter, we allow the marketer to choose a specific date range for which the analysis should occur.  By parameterizing the Derived Data expression that calculates page views to accept filter selections from the parent Application, we make that analysis seamlessly dynamic.

While this analysis does not imply a correlation between page views and video starts, it does highlight some interesting data points (for example, the spike of video starts for visitors with 86 page views) that prompt further analysis and may lead to valuable insights (“What content did we feature on those days?”, “Why was that content popular with more frequent visitors?”, etc.).

Summary

In this post, we covered how to gauge the general health of a website over a particular date range. Using parameterized Derived Data, we enabled a complex, multi-pass analysis that calculates one measure (Pageviews by Visitor ID) and then groups by those aggregated results to calculate a different measure (Number of Unique Visitors by Pageviews). By defining parameters in the Derived Data filter shelf, we created a dynamic expression that changes with the user’s filter selection in the parent Application. Parameterized Derived Data, available with Arcadia Enterprise 3.3, opens new levels of flexibility and depth for analyzing massive datasets. How will you choose to unleash its power? Try Arcadia Instant today.


Category: Big Data Analysts, How To