January 25, 2018 - Richard Tomlinson | Big Data Ecosystem

Big Data Discovery Mode vs. Production Mode

When users describe what they need from their BI tools for working with big data, their requirements typically fall under two overarching modes of analysis. These can be summarized as follows:

Discovery Mode: Explore and Experiment – Ask new business questions on unknown data.

Production Mode: Monitor and Adjust – Provide trusted answers from data that is understood.

This article discusses these two modes of analysis, the differences between them and how they ultimately work together for better decision making and new business innovation. These guidelines can also be used when evaluating new and existing BI tools for your big data environment.

Before we look in detail at how these modes of analysis differ, we should note that they are highly complementary and always work together. Typically, new insights resulting from efforts in discovery mode become operationalized in production mode. For example, in discovery mode, an online retailer develops a new way to segment and score potential customers based on their social media preferences. In production mode, purchasing behavior within the new customer segments is monitored to understand effectiveness so the segmentation strategy can be adjusted accordingly.

Conversely, what we learn in production mode generates new questions requiring new data that first needs prototyping in discovery mode before operationalizing. In our example above, we decide our segments need enhancing with some TV viewing data that recently became available. In this case, we need to re-enter discovery mode to decide how to best incorporate this data into our segmentation model.

Now that we have defined these modes of analysis and seen how they work together, it is worth looking at the differences between them, because those differences drive the need for different sets of capabilities in our BI tools.

To explain the differences, we will look at the modes of analysis from four distinct angles. These are:

  • The organizational factors driving the need for analytics
  • The types of applications and features being developed
  • The requirements for the data
  • The desired technology environment

Below is how modes of analysis differ across these categories:

Category     Attribute              Discovery Mode                            Production Mode
-----------  ---------------------  ----------------------------------------  ------------------------------------
Organization Project Sponsorship    Innovation Lab / CDO / Big Data Team      LOB / IT Supporting LOB
             Strategic Objective    Create New Innovations and Processes      Enhance Existing Processes
             Business Requirements  Mostly Unknown Upfront                    Mostly Understood In Advance
Application  Deliverable            Sandbox / Prototype                       Hardened / Live
             Usage                  One-off / Occasional                      Regular / Frequent
             Life Span              Short                                     Enduring
             User Personas          Citizen Data Scientist, Business Analyst  Business Analyst, Business User
             User Volume            Small (a few specialists)                 Large (hundreds or thousands)
             User Experience        Open Ended / Exploratory                  Mostly Curated / Light Interactivity
Data         Scope                  Samples / Slices                          Full / All
             Granularity            Atomic / Lowest                           Aggregated / Summarized
             Volume                 Many data sets / billions of rows         A few data sets / millions of rows
             Structure              Raw / Diverse                             Curated / Modelled
             Freshness              Static / Not Changing                     Up to Date / Current
             Security               Mostly Open / Dataset Level               Secure / Record and Attribute Level
Environment  Platform               Hadoop-Centric                            EDW-Centric
             HA / DR                Not Usually Required                      Usually Required
             Governance             Loose / Distributed                       Tight / Centralized

A BI tool should provide the capabilities to enable both of these modes of analysis. Modern BI tools need to support the full breadth of the analytics lifecycle, which often begins in discovery mode, moves into production mode later on, and iteratively returns to discovery mode as time passes (as discussed above). This is especially true when organizations aim to deliver brand new innovation through analytics on big data. These types of projects often start with little or no understanding of requirements upfront and no prior knowledge of the data sources and structures to be incorporated.

Organizations should not have to switch BI tools to move between discovery and production modes. There should be no artificial tradeoffs imposed, for example between data volume and user volume. Modern BI tools should have the fundamental capabilities in place to handle both types of analytic scenarios.

Will your existing BI tool stand up to these tests?

This article originally appeared on Datamation.
