May 10, 2018 - Dan Woods - Evolved Media | Big Data Ecosystem

The Top Six Reasons Data Lakes Have Failed to Live Up To Expectations

This post was guest-written by Dan Woods, CEO of Evolved Media & Chief Analyst of Early Adopter Research. Dan creates ideas about technology products, based on a broad technical understanding. By writing as an analyst in Forbes and working with Evolved Media’s clients, he sees the magic in technology and why it matters to IT buyers.

It was just a few years ago that the excitement around the opportunities offered by data lakes peaked. But since that time, companies have struggled to capitalize on the potential of data lakes to derive new value for their business.

In most companies, data lake projects began based on the tantalizing idea that new signals could be gathered from the explosion of big data available to businesses of all sizes. The ability to store and process big data to find these signals using new analytic techniques brought about many victories across industries.

However, for many companies, these victories were essentially custom projects that functioned as pilots; they didn’t extend to making data lakes integral to the entire organization. It soon became clear that companies would face a difficult challenge in two respects: opening the data in data lakes to a wider range of people, and operationalizing that data efficiently so that new signals are easy to discover and the resulting insights are incorporated into business processes.

That challenge has left many companies with the belief that data lakes have failed altogether. That perception can be reversed, and data lakes can still provide immense value. But to ensure that companies can realize this value, it is important for us to understand what went wrong with data lakes initially, so that we can figure out how to fix the problems.

As companies found themselves overwhelmed with massive amounts of big data that their data warehouses could not handle or process, data lakes emerged as a potential solution. Data lakes promised to store all data, and to do so more cheaply than traditional data warehouses. They also offered the ability to find and scale insights from this data.

Looking back, data lakes have frustrated companies primarily because while they have served as valuable storage locations for data, the promised ability to perform analytics in the data lake hasn’t materialized.

For businesses that have implemented data lakes, here are the most important failure modes we’ve seen. If these sound familiar, it’s probably time to re-envision your data lake and retool it to remedy these challenges.

Failure mode 1: Data lakes become data swamps

Companies have struggled to use data lakes as more than just near-endless repositories of data. The result is that they end up hoarding data in data lakes without any organization or structure, so analysts who want to use the data have no idea how to do so. The data ends up just sitting in the data lake and is almost never used.

Failure mode 2: Data never put into production

Another main failure mode is that data is allowed to fester because most businesses’ data lakes are so disorganized. As a result, the process of extracting signals from it is cumbersome, and the data is never fresh enough, or relevant in real time, to actually be put into production. So the data in the data lake remains in pilot mode.

Failure mode 3: Asking small, not big data questions

Companies have also undermined the value of their data lakes by viewing them, and big data in general, from the confined perspective of the data warehouse. Users ask only the types of questions they already know could be answered in the past, and so fail to recognize how much more signal can be extracted from big data. Correcting this requires a change in mindset as much as a change in operations.

Failure mode 4: Failing to gain added value

When companies treat their data lakes like updated versions of their data warehouses, they fail to gain added value from them. Running data warehouse-era processes on data lakes, such as moving data across clusters to separate data marts or BI servers, creating schemas, or extracting subsets, limits the value of the data lake. These older processes lengthen the time it takes to process and analyze data. Data lakes require new processes that capitalize on the granularity of raw big data and the ability to perform analysis in the same place as storage.

Failure mode 5: A lack of business impact

The fifth failure mode that companies commonly have experienced with data lakes is an imbalance between the significant investment they’ve made in data lakes and the relative lack of impact that data from the data lake is having on business decisions. For the insights generated from data lakes to matter, they have to drive changes in behavior. For this to occur, companies must empower managers to act and make decisions based on analytics from the data lake. Additionally, companies must implement ways to operationalize data into real-time business processes so that data lakes can truly impact the bottom line.

Failure mode 6: An inability to mine data lakes for analytics

The final major failure mode is particularly problematic for analysts: the difficulty of exploiting the data in data lakes and putting it to use for analytics. Too often, the cause is that someone other than the analyst owns the data, and the analyst has to check with that owner before using it. Companies need to streamline processes and remove barriers for analysts. Additionally, businesses need to ensure that analysts have the tools they need to work with data in the data lake in a way that works best for them.

All six of these failure modes point to how companies need to rethink their processes and data architecture to fully utilize data lakes. Data must be better organized within the data lake and analyzed in place for it to really offer value. If these changes occur, the failure modes can be overcome and tremendous benefits can be derived from data lakes.

To learn more, download the white paper, “Saving the Data Lake.”
