Recent headlines would have you believe that Hadoop is falling or that big data growth is stalling because fewer companies are looking to invest (Gartner). But with two of the big three Hadoop / “big data” vendors now public, we have a lot more visibility into the real financials and investments companies are making in Apache Hadoop, Apache Spark, and other big data technologies, platforms, and services. It’s too early to call big data a winner or loser, just as you can’t call a winner in a baseball game in the 4th inning.
Cloudera released earnings showing Q1 subscription revenue up 59% year-over-year and a net expansion rate of 142%. MapR announced revenue growth of 81% year-over-year for fiscal Q1. This points to rapid growth in the data management and “data science platform” market. One could argue that this is just one part of the total market: it represents only the initial growth of the big data market, namely the data platform/storage foundation and perhaps the FIRST use case on the big data platform. A Gartner survey from late 2016 shows that the largest segment of organizations that are already investing or planning to invest in big data is in the “pilot” phase (30%), which suggests that they have not even begun to realize the value from their investment.
Yet a recent article in Forbes puts the CAGR for big data at around 50%, which is extremely strong, even if margins are lower than in previous software markets. Ovum Research pegs its big data forecast (including parts of data management and BI/analytics) at 48% in a recent report, but goes on to state (highlight is mine):
“This might appear conservative compared to market hype, but we believe that, given its nascence, we need to monitor this market closely; the likelihood is we will revise up the forecasted growth in future years.”
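To put a growth rate of roughly 50% per year in perspective, here is a minimal sketch of the compounding arithmetic behind a CAGR figure. The starting value of 100 and the five-year horizon are illustrative assumptions, not numbers from the Forbes or Ovum reports, and the function names are hypothetical.

```python
def project(value, cagr, years):
    """Compound `value` at an annual growth rate `cagr` for `years` years."""
    return value * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Recover the compound annual growth rate implied by two values."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Illustrative assumption: a market indexed at 100 growing 50% per year.
    for year in range(6):
        print(year, project(100, 0.50, year))
```

At that rate the index grows by a factor of 1.5 each year, so it more than doubles every two years (1.5 × 1.5 = 2.25).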
What happens when organizations have the tools and applications that unlock the value of big data for more internal audiences, along with customer-facing, data-centric applications? More value is yet to come.
Big Data is in the 3rd or 4th Inning – What’s Next?
John Schroeder, Executive Chairman and Founder at MapR Technologies, takes a long view on the big data market and said in late 2015 that we are in the 2nd or 3rd inning of the game, to use a baseball analogy. What excites me about the big data market is that I think we are entering the “big data analytics/applications” growth phase, where we will see an explosion of different uses and applications for data we have not traditionally been able to capture and process cost-effectively. This means we’re now in the 3rd or 4th inning, to use John’s analogy, and there is a lot more action and scoring yet to come as Hadoop and big data shift from being primarily a data storage / process / statistics platform to a true application platform. A new breed of business user / analyst tools is emerging that provides self-service access to, and analysis of, this new data, but (more importantly) enables easy development of business applications.
We have seen similar stair-step growth before. Relational databases initially proliferated in the 80s and 90s to capture transactional data from point-of-sale and other accounting-based financial applications. The next phase of growth was the data warehousing and BI market, which developed to draw other insights from this (primarily) transactional data beyond simply closing the accounting books. Hadoop and other big data platforms represent the next generation of data platforms, with the agility to handle modern data formats and volumes, and just like the RDBMS, they saw rapid early adoption for an initial set of use cases such as storage offload, data transformation, and deep data science. However, just as the data warehouse architecture has proven poorly suited (i.e., not cost-effective) for new data types and volumes, it is becoming apparent that the traditional client-server architecture of legacy BI tools is inadequate and is limiting the potential value of big data.
It’s a Parallel World – Bring on MPP Analytics
At the end of the day, we are in a parallel world (not to be confused with a parallel universe). What do I mean by that? We all know that the digitization of our lives has created massive volumes and variety of data, requiring a new approach. Google and Yahoo showed the way with massively parallel processing (MPP) or distributed computing platforms to accelerate search and the thousands of applications we now depend on in the digital age. The real explosion of growth and value came with applications for email, maps, translation services, advertising, and more. These applications took advantage of low-cost commodity hardware that scaled horizontally, as well as parallel processing frameworks such as MapReduce.
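For readers who have not seen the MapReduce pattern, the sketch below illustrates the idea with a word count in plain Python. It is only an illustration under simplifying assumptions: the partitions are small in-memory lists and a local thread pool stands in for a cluster of commodity machines. It does not use Hadoop's actual API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(partition):
    """Map step: emit a (word, 1) pair for every word in one partition."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_partitions):
    """Shuffle step: group the emitted values by key across all partitions."""
    grouped = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(partitions, workers=4):
    """Run the map step on each partition in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        mapped = list(executor.map(map_phase, partitions))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each inner list stands in for a block of data stored on a different node.
    partitions = [["big data big value"], ["data platform"], ["big platform"]]
    print(word_count(partitions))
```

The map step never needs to see more than its own partition, which is why adding machines (scaling horizontally) lets the same job handle more data.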
The industry and open source community have been busily working to create easier ways to access the data in systems such as HDFS using familiar, business-friendly access methods like SQL. Think Apache Hive, Apache Impala, and Apache Drill. But if you’re just treating Hadoop or other big data platforms as a data source and sucking data out into traditional reporting and BI tools which only scale “up” in a single server or desktop environment, isn’t that self-limiting? Can you even analyze the full set of data, or will you have to sample the data or build complex data transformation/aggregation pipelines? Or are you going to hire a team of data engineers to develop applications on these modern data platforms in order for your business users to actually gain insights?
What if you could easily visualize and analyze data as well as develop rich UX-driven applications directly on your big data platform? That’s what companies like Procter & Gamble and Kaiser Permanente are doing with Arcadia Data, and it’s why I am very excited about the “big data” future and the value we can deliver to industries as we couple modern UI / analytics / applications with modern data platforms built to handle the opportunity of big data.
As companies build their 2nd, 3rd, or 4th application on this shared and scalable data architecture, the value from their data compounds. For example, the data you collect on customers, products, promotions, financials, and location can help you understand how marketing is driving sales in different locations, and it can also inform supply chain and logistics optimization, ultimately enabling lower costs for consumers. There are economies of scale in supporting more applications and users on a single massive data lake / hub. The trick is making it easier to give more people secure access and to enable rapid application development and deployment to customers, suppliers, and partners. This is where Arcadia Data is focused.
In summary, big data vendors are showing very strong growth, and the market's CAGR is estimated at around 50%. The hype is slowing down, but the real value and growth are only just beginning, as we’re still in the early innings. As organizations “load the bases” with more applications on big data platforms and use next-generation visual analytics and application platforms like Arcadia Data that are native to big data, we can expect to see more runs and even a grand slam or two. Stay tuned.