Chapter 2:
BI and Analytics Meet Business Transformation

The Present and Future of Enterprise Reporting

Traditional data warehouses and enterprise data hubs have little (if any) support for streaming or unstructured data in their native format unless IT endures the pain and expense of preparing that data for the structured warehouse. Organizations are also finding it desirable to move some little-used warehouse data over to Hadoop where it can be stored more economically.

The bottom line is that Hadoop and RDBMS-based enterprise data warehouses each have unique, prized attributes. RDBMSs are the past and present. Big data analytics on Hadoop represents the present and future, especially given the massive growth of valuable data that businesses collect. As Forrester VP and Principal Analyst Boris Evelson summed it up, “Most BI applications are smart and only request aggregate, not detailed, result sets, minimizing network traffic between the app and database servers. But as big data volumes increase and enterprises mature their data mining and exploration applications, there’ll be an increasing requirement to analyze data at a detailed level, putting strain on network bandwidth.” Big data technologies like Hadoop were designed to handle this growing requirement.
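The difference between aggregate and detailed result sets in the quote above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and an in-memory table; the `sales` table, its columns, and the row count are hypothetical and chosen only for illustration, not taken from any system described here.

```python
import sqlite3

def build_demo_db(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory table of hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("east" if i % 2 == 0 else "west", float(i % 100))
            for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def aggregate_result(conn: sqlite3.Connection) -> list:
    """What a typical BI dashboard requests: one summary row per region."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()

def detailed_result(conn: sqlite3.Connection) -> list:
    """What detailed exploration requests: every individual row."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()

conn = build_demo_db()
aggregate = aggregate_result(conn)
detailed = detailed_result(conn)
print(len(aggregate), "aggregate rows vs", len(detailed), "detailed rows")
```

The aggregate query returns one row per region no matter how large the table grows, while the detailed query returns a result set that grows with the data. That growth is the source of the network strain the quote describes.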

Self-Service BI

Another important characteristic of this new BI era is self-service BI: an approach to analytics that lets business users analyze mission-critical enterprise data with minimal or no IT intervention. According to Gartner, by 2017 most business users will have access to the self-service tools they need to do this. Gartner also asserts that by 2020, self-service BI will make up 80% of all enterprise reporting.

It is important to note that self-service BI does not take humans out of the business decision equation. The specialized knowledge and insights business users have developed will be integral to solving business problems for years to come. Nor should self-service BI be confused with self-sufficient BI. IT or some other entity still has to provide trusted data to be analyzed, so data quality remains paramount. Organizations that did not purchase a cloud-based BI tool must also maintain and update the BI systems themselves. Still, the importance of self-service BI cannot be overstated, given the predictions of very substantial growth in this critical area of enterprise reporting.

Hadoop Analytics Case Study

Consider the case of a company known as a leader in information technology. This large vendor has a division that produces and sells efficient application-integrated data storage solutions. Those solutions generate a large volume of data in customer deployments, representing a wealth of information that could be mined to assist customer support, product design, and even marketing activities. This is a classic IoT analytics environment, where large streams of data from many remote sources are continually analyzed to benefit both the company and its customer base. For example, quick identification of a failed drive can enable rapid support, which provides a positive customer experience and helps avoid customer data loss. Also, an analysis of which product features are most used and which are underused can help the company give customers guidance on how to get the most value out of their deployment.
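As a rough illustration of the failed-drive example, the following sketch filters a stream of telemetry readings and flags drives whose error counters exceed a threshold. The case study does not describe how the company detects failures, so the `DriveReading` fields, the threshold values, and the function name are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class DriveReading:
    """One telemetry record from a remote server (hypothetical schema)."""
    server_id: str
    drive_id: str
    reallocated_sectors: int
    read_errors: int

def failing_drives(
    readings: Iterable[DriveReading],
    max_reallocated: int = 50,   # assumed threshold, for illustration only
    max_read_errors: int = 10,   # assumed threshold, for illustration only
) -> Iterator[DriveReading]:
    """Yield readings that cross either threshold, as they arrive."""
    for reading in readings:
        if (reading.reallocated_sectors > max_reallocated
                or reading.read_errors > max_read_errors):
            yield reading

sample = [
    DriveReading("srv-001", "d0", reallocated_sectors=3, read_errors=0),
    DriveReading("srv-001", "d1", reallocated_sectors=120, read_errors=2),
    DriveReading("srv-002", "d0", reallocated_sectors=0, read_errors=25),
]
flagged = [(r.server_id, r.drive_id) for r in failing_drives(sample)]
print(flagged)
```

Because `failing_drives` is a generator, it processes readings one at a time and never needs the whole stream in memory. A production system would more likely run this kind of rule inside the cluster and closer to the data.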

The company was collecting hundreds of millions of data points in half a million files per day from tens of thousands of servers. Data collection, however, was not the real problem. The company used Hadoop as its data platform to scale cost-effectively as data volumes grew. The division was new to Hadoop, but the harder challenge was to offer the data to end users in a way that was granular, comprehensive, consistent, and quickly accessible across business teams. When the division implemented an in-cluster analytics solution architected for Hadoop, it was able to identify potential sales opportunities, underutilized and poorly provisioned systems, historical and real-time details on equipment reliability, customer usage patterns, and more. Users were able to create interactive data applications with no coding required, which gave them a powerful self-service BI environment that enabled agility and collaboration across teams.
