This post was guest-written by Dan Woods, CEO of Evolved Media & Chief Analyst of Early Adopter Research. Dan creates ideas about technology products, based on a broad technical understanding. By writing as an analyst in Forbes and working with Evolved Media’s clients, he sees the magic in technology and why it matters to IT buyers.
We have entered an interesting turning point in the history of data lakes. The first wave of implementation of data lakes was beset by a variety of problems that were typical of those encountered when a powerful new technology arrives on the market. It takes time for users to understand how best to maximize the value of a new technology and also to change business cultures and processes that have long been accustomed to operating with previous technology.
So, while the challenges companies have experienced thus far with data lakes have been real, it’s important to make sure that these early frustrations don’t obscure or prevent the ultimate victory data lakes can provide. That achievement will be about using data lakes to exploit big data and new forms of analysis to generate a new category of signals that were never before possible to obtain. As a result, many companies are now facing the task of retooling their data lake projects to ensure they are positioned to get the value they had originally hoped for.
This article lays out some recommendations on how you can make your data lake 2.0 finally live up to your expectations.
The Five Key Ingredients to Making Your Data Lake Thrive
It’s worth the effort to reinvest in your data lake because, used effectively, data lakes can have a tremendous impact on analytics. Data lakes can increase the value companies get out of their big data, something that cannot be done solely with a data warehouse. They enable users at every level of technical experience to work directly with data, which means more business context is added to data analysis. This leads to data more directly impacting business decisions.
There are five main things companies must do to get the most value from their data lakes.
- Take a data-native approach
- Ask big data questions
- Have a vision for operationalizing data
- Organize your data lake
- Change your culture
Each of these is outlined in more detail below.
Taking a Data-Native Approach
Data lakes offer technological capabilities that data warehouses simply do not possess. It’s not only that data lakes can store all big data, regardless of form or type. It’s also that they allow data to be stored and processed in the same place. Storing and processing data together in this way is what it means to take a data-native approach, and it is essential to saving your data lake.
A data-native approach gets the most out of data lakes because data is analyzed in place with big data tools. This allows companies to get high-resolution views of their data and to cut down on the data extracts that are hallmarks of small data approaches. Additionally, a data-native approach empowers business analysts to do semantic modeling on the data themselves, applying their own business context and understanding without needing to wait for IT. Business analysts can explore data and make semantic model adjustments immediately, without having to wait for feedback from other groups. This direct interaction with the data allows them, and the entire organization, to understand and operationalize data in a faster, more agile way. Having data and analytics in one place also ensures that the same data is being used across the organization.
Asking Big Data Questions
There’s no point in buying a more powerful tool if you’re only going to use it as you would the more limited one you had in the past. For instance, you don’t purchase a Lamborghini to drive 25 miles per hour to complete your daily errands.
In the same way, companies have failed to get the most out of their data lakes because they’ve approached them with a small data mindset. They have not recognized the increased granularity big data provides, and have therefore continued to ask small data questions of big data. As a result, the impact of the answers they’ve received has been limited.
Big data questions take advantage of the depth of big data and provide complex, descriptive answers. A small data question would be asking the total revenue for your Northeast Region. A big data question would be asking for detailed customer journey information on high-value customers who have contacted customer service over the past three years and who have churned. Both questions offer you some insight into your business, but only the big data question can supply transformative details.
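To make the contrast concrete, here is a minimal sketch in Python. The records, field names, and the two functions are all hypothetical and invented for illustration; a real data lake would run queries like these in place with big data tools rather than over in-memory lists.

```python
from datetime import date

# Hypothetical order records (illustrative data only).
orders = [
    {"customer_id": "c1", "region": "Northeast", "revenue": 1200.0},
    {"customer_id": "c2", "region": "Northeast", "revenue": 300.0},
    {"customer_id": "c3", "region": "West", "revenue": 900.0},
]

# Hypothetical event-level records: the granular detail a data lake retains.
events = [
    {"customer_id": "c1", "day": date(2023, 1, 5), "type": "purchase"},
    {"customer_id": "c1", "day": date(2023, 3, 2), "type": "support_contact"},
    {"customer_id": "c1", "day": date(2023, 6, 9), "type": "churn"},
    {"customer_id": "c2", "day": date(2023, 2, 1), "type": "support_contact"},
    {"customer_id": "c3", "day": date(2023, 4, 4), "type": "purchase"},
]


def northeast_revenue(orders):
    """Small data question: one aggregate number for one region."""
    return sum(o["revenue"] for o in orders if o["region"] == "Northeast")


def churned_high_value_journeys(orders, events, min_revenue, since):
    """Big data question: the ordered journey of each high-value customer
    who contacted customer service and churned on or after `since`."""
    # Total revenue per customer decides who counts as high-value.
    revenue = {}
    for o in orders:
        revenue[o["customer_id"]] = revenue.get(o["customer_id"], 0.0) + o["revenue"]

    # Build each customer's journey as a chronological list of event types.
    journeys = {}
    for e in sorted(events, key=lambda e: e["day"]):
        if e["day"] >= since:
            journeys.setdefault(e["customer_id"], []).append(e["type"])

    return {
        cid: steps
        for cid, steps in journeys.items()
        if revenue.get(cid, 0.0) >= min_revenue
        and "support_contact" in steps
        and "churn" in steps
    }
```

The first function returns a single number. The second returns a sequence of events per customer, which is the kind of descriptive answer that only event-level data can support.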
Moving to a place where you’re consistently asking big data questions takes time and involves a cultural change. Analysts and managers must learn how to ask these questions and overcome their experience with having to make do with only small data answers.
Have a Vision for Operationalizing Data
Many companies have become dissatisfied with their data lakes because they did not put a plan in place from the beginning for how they would operationalize the insights from big data. To experience the greatest impact, companies must have such a plan. Analysis and data exploration should lead to continuous improvement and adjustments that drive business results. Companies should also automate the incorporation of insights into transactional applications and business workflows, creating a steady flow of business benefits so that big data analytics becomes the de facto way of guiding decisions.
Organize Your Data Lake
Companies must ensure data lakes do not become data swamps. It’s a tremendous step forward that companies have a tool that can collect and store all their big data, but if businesses are simply using data lakes as a way to place data on a shelf, never to be looked at again, the value of their data lake will be undermined.
Thus, companies must adopt tools that help them organize the data within their data lakes so users can quickly find what they need. This helps to keep data fresh and encourages real-time use.
Change Your Culture
Getting the most out of your data lake involves not just technological innovation, but cultural innovation as well. Companies must recognize they have a Lamborghini, not a Toyota, at their disposal, and must change business practices and processes accordingly. Data lakes will never be valuable if they’re treated like data warehouses. Small data thinking, processes, and questions should not be carried over to the data lake. If users are trying to work with extracts of information from the data lake in Excel, that’s a sign of small data thinking.
To cultivate effective cultural change that empowers them to take full advantage of their data lake, companies should focus on:
- What tools to use
- What processes to put in place
- What types of questions you can answer
- What outputs you can expect
While this cultural shift will take time, it’s not all-or-nothing. An incremental, agile approach offers the benefit of short-term wins, which in turn fuel and inspire more people in the organization to use and contribute to data analytics. The positive results compound over time, increasing the value of your data lake.
Ultimately, by investing in a strategic way, companies will find that their data lakes can live up to their original potential and provide:
- A playground for analysts with all of the data they need
- Easy integration into analytic platforms and business applications
- Operationalized data across a broad range of timescales, from batch and interactive usage to real time
- Incorporation of all big data, including unstructured data, via schema on read
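For readers unfamiliar with the term, schema on read means raw data is stored as it arrives and a structure is applied only when the data is read. The following is a minimal sketch of the idea in Python; the sample records, the `read_with_schema` helper, and the `billing_schema` mapping are hypothetical and invented for illustration, not part of any particular data lake product.

```python
import json

# Raw records stored as they arrived, with inconsistent fields and types.
raw_lines = [
    '{"user": "a", "amount": "12.50", "note": "gift"}',
    '{"user": "b", "amount": 7}',
    '{"user": "c"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field -> (cast, default)) while reading raw records.

    Fields not named in the schema are ignored; missing fields get the default.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else default)
            for field, (cast, default) in schema.items()
        }


# One possible reader's view of the data; another reader could apply a different schema
# to the same raw lines without rewriting what is stored.
billing_schema = {"user": (str, ""), "amount": (float, 0.0)}
rows = list(read_with_schema(raw_lines, billing_schema))
```

Because the stored lines are never altered, different teams can read the same raw data through different schemas as their questions change.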