This post was guest-written by Dan Woods, CEO of Evolved Media & Chief Analyst of Early Adopter Research. Dan creates ideas about technology products, based on a broad technical understanding. By writing as an analyst in Forbes and working with Evolved Media’s clients, he sees the magic in technology and why it matters to IT buyers.
In one sense, all data analysis is about finding nuggets of information that are useful within a business context. That’s as much the foundation for the promise of AI and machine learning as it is for almost any other new data-related technology, including data lakes. The goal is always to extract the signal from the noise, so that businesses do not just collect data but have a way to put it to use.
We’re currently living in a big data world. Yet despite the advent of new technology to support this deluge of data, many companies are still operating from a small data mindset. Central to this mindset is asking the wrong type of questions of your big data and underutilizing the more potent technology designed to handle it.
The difference between big data and small data is not that you look for important signals only in big data. Rather, it’s the type of questions you’re answering, the type of signals you’re trying to uncover, and the ways you’re going about uncovering those insights. The world of the data warehouse has been with us for so long, and remains so dominant in our thinking even today, that it’s easy to view our big data world and data lakes through the lens of the data warehouse. But to fully operationalize big data from your data lake, you need to avoid this trap by grasping the difference between big and small data questions. Only then can you make sure you are using your data lake to its fullest potential.
Why Big Data Questions Matter for the Data Lake
Data lakes first appeared a few years ago as an antidote to the challenges of storing and analyzing big data. They promised a way to overcome the limitations of the data warehouse, which was neither designed nor equipped to handle the volume, variety, and velocity of big data. The promise had four parts: affordable storage of all big data in one place; the ability to extract insights from this data at scale; new technology to blend this data with existing workloads in real time; and a cheaper, more flexible architecture than that provided by the data warehouse.
However, many companies have grown frustrated over time by a perceived lack of ROI from their data lakes. Data lakes have seemingly failed to live up to their potential.
But one of the primary problems with data lakes is that companies and analysts have treated them with a data warehouse approach, rather than recognizing the new opportunities they open up. Nowhere is this clearer than when companies continue to ask small data questions of the big data within data lakes.
The difference between big and small data questions
To get the maximum value out of their data lakes, companies need to learn how to ask big data questions. The difference between big and small data questions boils down to the depth and granularity of what’s being asked. Big data questions require a deeper investigation of data. They are not just looking for summary answers that provide broad, general perspectives. For these reasons, big data questions, unlike small data questions, cannot be answered with simple descriptions. They are questions that can deeply affect processes and the understanding of the business.
By asking small data questions, companies are not only underutilizing the power of data lakes and the big data stored within them; they’re also resigning themselves to a low-resolution view of the business when a high-resolution view is available.
How to know you’re asking the right big data questions
Ensuring that your company is asking big data questions of the data in its data lake may take some practice. Many analysts have become so accustomed to asking only small data questions, and to making do with the answers and insights those questions offer, that they need to learn how to operate in a world where big data questions are possible. To make certain you’re asking the right questions, it’s helpful to compare real examples of big and small data questions:
Example 1: Sales
- Small data question: What is the trend for sales over the past two years?
- Big data question: What is the segment of customers active in the past two years who are most likely to churn?
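The contrast in Example 1 can be sketched in a few lines of Python. This is an illustration only: the transactions are made-up toy data, the function names `yearly_sales` and `churn_candidates` are hypothetical, and the rule used here (a customer has been silent for more than twice their own usual gap between purchases) is a simple stand-in heuristic, not a real churn model.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical toy transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("c1", date(2023, 1, 10), 120.0),
    ("c1", date(2023, 2, 9), 80.0),
    ("c1", date(2023, 3, 11), 95.0),
    ("c2", date(2023, 1, 5), 200.0),
    ("c2", date(2023, 6, 4), 150.0),
    ("c2", date(2023, 11, 1), 175.0),
    ("c3", date(2024, 1, 15), 60.0),
    ("c3", date(2024, 2, 14), 65.0),
    ("c3", date(2024, 3, 15), 70.0),
]


def yearly_sales(transactions):
    """Small data question: one summary total per year."""
    totals = defaultdict(float)
    for _, day, amount in transactions:
        totals[day.year] += amount
    return dict(totals)


def churn_candidates(transactions, as_of, factor=2.0):
    """Big data question (heavily simplified): which customers have been
    silent for much longer than their own usual gap between purchases?"""
    purchase_days = defaultdict(list)
    for customer, day, _ in transactions:
        purchase_days[customer].append(day)

    flagged = []
    for customer, days in purchase_days.items():
        days.sort()
        if len(days) < 2:
            continue  # not enough history to estimate a usual gap
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        silence = (as_of - days[-1]).days
        if silence > factor * median(gaps):
            flagged.append(customer)
    return sorted(flagged)


if __name__ == "__main__":
    print(yearly_sales(TRANSACTIONS))
    print(churn_candidates(TRANSACTIONS, as_of=date(2024, 4, 1)))
```

The first function collapses every transaction into a single number per year. The second has to keep each customer's individual history, which is the kind of granularity a data lake is meant to preserve.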
Example 2: Security
- Small data question: What is the login/logout behavior of our contractors?
- Big data question: What are the risks associated with the content our contractors are accessing? Answering this means correlating online actions across multiple sources (packets, logins, IP addresses, web pages, data accessed).
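For Example 2, a rough sketch of what "correlating multiple sources" might look like is shown below. Everything in it is an assumption made for illustration: the record layouts, the field names (`user`, `ip`, `resource`, `sensitive`), the toy events, and the hypothetical function `risky_accesses`. A real deployment would draw on far more sources and far richer risk signals.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are assumptions.
LOGINS = [
    {"user": "contractor_a", "ip": "10.0.0.5"},
    {"user": "contractor_b", "ip": "10.0.0.9"},
]
ACCESSES = [
    {"user": "contractor_a", "ip": "10.0.0.5",
     "resource": "wiki/onboarding", "sensitive": False},
    {"user": "contractor_a", "ip": "10.0.0.5",
     "resource": "finance/payroll.csv", "sensitive": True},
    {"user": "contractor_b", "ip": "203.0.113.7",
     "resource": "hr/reviews.xlsx", "sensitive": True},
]


def risky_accesses(logins, accesses):
    """Join access events against login events per user and list the
    reasons, if any, that each access looks risky."""
    login_ips = defaultdict(set)
    for event in logins:
        login_ips[event["user"]].add(event["ip"])

    findings = []
    for event in accesses:
        reasons = []
        if event["sensitive"]:
            reasons.append("sensitive resource")
        if event["ip"] not in login_ips.get(event["user"], set()):
            reasons.append("IP not seen at login")
        if reasons:
            findings.append((event["user"], event["resource"], reasons))
    return findings


if __name__ == "__main__":
    for user, resource, reasons in risky_accesses(LOGINS, ACCESSES):
        print(user, resource, ", ".join(reasons))
```

The login log alone answers the small data question. The risk picture only appears once it is combined with what was accessed and from where.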
Learning to ask big data questions is one of the best ways to ensure you’re getting the most value out of your data lake and big data. Data lakes have more power than older tools, but that power will remain dormant if your analysts are not asking questions that take advantage of it. This requires a change that is as much cultural as technological.