January 25, 2018 - Richard Tomlinson | Big Data Ecosystem

Big Data Discovery Mode vs. Production Mode

When users talk about the features they require of their BI tools when working with big data, they typically end up describing their requirements in terms of two overarching themes relating to high-level analytical techniques, or modes of analysis. These can be summarized as follows:

Discovery Mode: Explore and Experiment – Ask new business questions on unknown data.

Production Mode: Monitor and Adjust – Provide trusted answers from data that is understood.

This article discusses these two modes of analysis, the differences between them and how they ultimately work together for better decision making and new business innovation. These guidelines can also be used when evaluating new and existing BI tools for your big data environment.

Before we look in detail at how these modes of analysis differ, we should note that they are highly complementary and always work together. Typically, new insights resulting from efforts in discovery mode become operationalized in production mode. For example, in discovery mode, an online retailer develops a new way to segment and score potential customers based on their social media preferences. In production mode, purchasing behavior within the new customer segments is monitored to understand effectiveness so the segmentation strategy can be adjusted accordingly.

Conversely, what we learn in production mode generates new questions requiring new data that first needs prototyping in discovery mode before operationalizing. In our example above, we decide our segments need enhancing with some TV viewing data that recently became available. In this case, we need to re-enter discovery mode to decide how to best incorporate this data into our segmentation model.

Now that we have defined these modes of analysis and seen how they work together, it is worth looking at the differences between them, because these differences drive the need for different sets of product capabilities in our BI tools.

To help us explain the differences, we will look at the modes of analysis from four distinct angles. These are:

  • The organizational factors driving the need for analytics
  • The types of applications and features being developed
  • The requirements for the data
  • The desired technology environment

Below is how the two modes of analysis differ across these categories:

| Category     | Attribute             | Discovery Mode                           | Production Mode                      |
|--------------|-----------------------|------------------------------------------|--------------------------------------|
| Organization | Project Sponsorship   | Innovation Lab / CDO / Big Data Team     | LOB / IT Supporting LOB              |
| Organization | Strategic Objective   | Create New Innovations and Processes     | Enhance Existing Processes           |
| Organization | Business Requirements | Mostly Unknown Upfront                   | Mostly Understood In Advance         |
| Application  | Deliverable           | Sandbox / Prototype                      | Hardened / Live                      |
| Application  | Usage                 | One-off / Occasional                     | Regular / Frequent                   |
| Application  | Life Span             | Short                                    | Enduring                             |
| Application  | User Personas         | Citizen Data Scientist, Business Analyst | Business Analyst, Business User      |
| Application  | User Volume           | Small (a few specialists)                | Large (hundreds or thousands)        |
| Application  | User Experience       | Open Ended / Exploratory                 | Mostly Curated / Light Interactivity |
| Data         | Scope                 | Samples / Slices                         | Full / All                           |
| Data         | Granularity           | Atomic / Lowest                          | Aggregated / Summarized              |
| Data         | Volume                | Many data sets / billions of rows        | A few data sets / millions of rows   |
| Data         | Structure             | Raw / Diverse                            | Curated / Modelled                   |
| Data         | Freshness             | Static / Not Changing                    | Up to Date / Current                 |
| Data         | Security              | Mostly Open / Dataset Level              | Secure / Record and Attribute Level  |
| Environment  | Platform              | Hadoop-Centric                           | EDW-Centric                          |
| Environment  | HA / DR               | Not Usually Required                     | Usually Required                     |
| Environment  | Governance            | Loose / Distributed                      | Tight / Centralized                  |
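
For readers who want to use the comparison as a checklist, one possible approach is to encode it as data and count which mode a project's characteristics match most often. The sketch below is only an illustration: the names `MODE_TRAITS`, `closest_mode`, and `project` are hypothetical, only a subset of the attributes is included, and the simple match-counting rule is an assumption rather than anything the comparison itself prescribes.

```python
# Hypothetical sketch: a subset of the comparison encoded as
# attribute -> (discovery mode value, production mode value).
MODE_TRAITS = {
    "Business Requirements": ("Mostly Unknown Upfront", "Mostly Understood In Advance"),
    "Deliverable": ("Sandbox / Prototype", "Hardened / Live"),
    "Usage": ("One-off / Occasional", "Regular / Frequent"),
    "Granularity": ("Atomic / Lowest", "Aggregated / Summarized"),
    "Structure": ("Raw / Diverse", "Curated / Modelled"),
    "Governance": ("Loose / Distributed", "Tight / Centralized"),
}


def closest_mode(profile):
    """Count how many attribute values match each mode.

    Returns 'discovery', 'production', or 'mixed' when the counts tie.
    The counting rule is an assumption made for illustration only.
    """
    discovery = production = 0
    for attribute, value in profile.items():
        discovery_value, production_value = MODE_TRAITS[attribute]
        if value == discovery_value:
            discovery += 1
        elif value == production_value:
            production += 1
    if discovery > production:
        return "discovery"
    if production > discovery:
        return "production"
    return "mixed"


# An invented project profile: two discovery traits, one production trait.
project = {
    "Business Requirements": "Mostly Unknown Upfront",
    "Deliverable": "Sandbox / Prototype",
    "Governance": "Tight / Centralized",
}
print(closest_mode(project))
```

A "mixed" result is not an error in this sketch. It reflects the point made earlier that the two modes are complementary and that a project often moves between them.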

A BI tool should provide product capabilities that enable both of these modes of analysis. It is important that modern BI tools support the full breadth of the analytics lifecycle. This lifecycle often begins in discovery mode, moves into production mode later on, and iterates back into discovery mode as time passes (as discussed above), especially when organizations aim to deliver brand-new innovation through analytics on big data. These types of projects often start with little or no understanding of requirements upfront and no prior knowledge of the data sources and structures to be incorporated.

Organizations should not have to switch BI tools to move between discovery and production modes. There should be no artificial tradeoffs imposed, for example between data volume and user volume. Modern BI tools should have the fundamental capabilities in place to handle both types of analytic scenarios.

Will your existing BI tool stand up to these tests?

This article originally appeared on Datamation.

