Common Approaches to Big Data Analytics
Let’s look at four popular approaches to business intelligence architecture incorporating big data and the strengths and weaknesses of each.
The Dedicated BI Server (a.k.a. “Traditional BI”)
This architecture is common to legacy BI platforms such as SAP BusinessObjects, IBM Cognos, OBIEE, and MicroStrategy, and, more recently, QlikView and Tableau. Traditional BI employs a dedicated middle-tier BI server with connectors to back-end data sources. Users then access data via a local desktop application or web browser that is fed primarily by the BI server. In terms of the data architecture, at the end of a long and complex ETL process, the most granular data is typically stored in the data warehouse, then aggregated (usually for performance reasons) into data marts. The BI server then queries the data marts or the data warehouse directly, and caches results locally (often in memory) for consumption by end-user clients as needed.
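The warehouse-to-mart aggregation step can be sketched in a few lines. This is a minimal illustration using Python's standard-library sqlite3 module; the `sales_fact` and `sales_mart` table names and columns are hypothetical, standing in for a real warehouse fact table and its pre-aggregated data mart.

```python
import sqlite3

# Hypothetical granular fact table, as it might land in the warehouse after ETL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE sales_fact (
        sale_date TEXT, store_id INTEGER, customer_id INTEGER, amount REAL
    )
""")
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", 1, 101, 20.0),
        ("2024-01-01", 1, 102, 35.0),
        ("2024-01-02", 2, 101, 15.0),
    ],
)

# The aggregation step: pre-summarize by date and store, as a data mart would.
cur.execute("""
    CREATE TABLE sales_mart AS
    SELECT sale_date, store_id, SUM(amount) AS total_sales, COUNT(*) AS txn_count
    FROM sales_fact
    GROUP BY sale_date, store_id
""")

# The BI server then queries the small mart table rather than the large fact table.
rows = cur.execute(
    "SELECT sale_date, store_id, total_sales FROM sales_mart "
    "ORDER BY sale_date, store_id"
).fetchall()
print(rows)  # daily per-store totals, much smaller than the raw fact table
```

The point of the pattern is that the BI server's queries hit a table whose row count is bounded by dates × stores, not by individual transactions.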
- Incumbent platform. Most organizations have already made BI investments and have the in-house resources to support and maintain them, so barriers to adoption are typically low. Often, BI tools are used simply to extract or embed results into other desktop tools like Excel or PowerPoint for convenience and easy sharing with others across the enterprise, which again leverages existing investments and reduces user training needs.
- Predictable environment. Since ETL processes produce clean (albeit limited) data sets, BI systems can be designed to efficiently answer the questions their user communities have defined in advance.
- Semantics makes “sense.” Because traditional BI clients (client-server or web-based) access data through a predefined semantic layer that enforces formal business rules and metadata definitions, they foster a common understanding across the enterprise.
- Solid performance. Performance is usually good on the desktop client and/or BI server, assuming the query results fit nicely within the physical resources of the desktop or BI server hardware.
- Significant scaling costs. An increased number of users means a significant increase in hardware, software, and administrative costs. Because traditional BI architectures require dedicated resources, scaling them out adds servers that complicate management and introduce new potential security and administration weaknesses, since they are additive to the security within the modern big data platform itself.
- Tradeoff between data granularity and performance. Traditional BI architectures force organizations to make frequent tradeoffs between data fidelity and end-user performance. Why? Because they are scale-up SMP servers that can, at best, be clustered, and are unable to handle the volume and complexity (i.e., semi-structured, schema-less data) available in massively parallel, distributed data platforms. As discussed above, these data volumes cause SQL queries to run in minutes or hours, so organizations typically create extracts and roll-ups (i.e., a cube or data mart) on top of the original data to enable fast BI performance and keep business users happy. This aggregation discards valuable granular detail along the way, which limits the value of the analytics and the ability to innovate. For example, a retail organization reporting on sales transactions might, for performance reasons, aggregate the underlying data and exclude individual customer detail. That aggregation would prevent an analyst from later joining in demographic or social media data collected on individual customers from other sources.
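The retail example above can be made concrete. The sketch below, again using Python's standard-library sqlite3 with hypothetical table and column names, shows that once `customer_id` has been aggregated away, a later demographic join is simply impossible against the mart.

```python
import sqlite3

# Hypothetical granular sales table (with customer_id) and demographic lookup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales_fact (sale_date TEXT, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                [("2024-01-01", 101, 20.0), ("2024-01-01", 102, 35.0)])
cur.execute("CREATE TABLE demographics (customer_id INTEGER, age_band TEXT)")
cur.executemany("INSERT INTO demographics VALUES (?, ?)",
                [(101, "18-34"), (102, "35-54")])

# The performance-driven mart keeps only date-level totals: customer_id is gone.
cur.execute("""
    CREATE TABLE sales_mart AS
    SELECT sale_date, SUM(amount) AS total_sales
    FROM sales_fact
    GROUP BY sale_date
""")

# Against the granular data, the demographic join works fine ...
by_age = cur.execute("""
    SELECT d.age_band, SUM(f.amount)
    FROM sales_fact f JOIN demographics d USING (customer_id)
    GROUP BY d.age_band ORDER BY d.age_band
""").fetchall()
print(by_age)

# ... but the mart no longer has a customer_id column to join on, so the
# same question cannot be answered from the aggregate alone.
mart_cols = [col[1] for col in cur.execute("PRAGMA table_info(sales_mart)")]
print("customer_id" in mart_cols)
```

No amount of later analysis recovers the dropped key; the only remedy is to re-run the aggregation from the original granular data, which is exactly the round trip the mart was built to avoid.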
- Architectural complexity. Because the typical BI configuration adds separate BI servers for extracts, more distinct data silos accumulate in the overall data architecture. This leads to administrative, maintenance, and security complexity, with more chance for error.