Chapter 6:
Making Big Data Actionable

Use Cases

Marketing

Clickstream analysis is a critical tool for understanding how users traverse a website, which paths lead to a transaction, and which lead to dead ends. The most effective form of clickstream analysis combines server analytics, such as load times and data transmission volumes, with e-commerce analytics, such as the time shoppers spend on certain pages, the items they add to or remove from their shopping carts, coupon use, and payment preferences. This involves huge amounts of data, which is why Hadoop is commonly used as a foundation for analysis. By working directly with data in Hadoop, marketers can more quickly identify sales opportunities or impediments. When streaming data is added, they can run live A/B and multivariate tests that compare multiple offers with each other and with historical performance.
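As a sketch of the live A/B testing described above, the following compares the conversion rates of two offers with a two-proportion z-test. The function name and sample counts are illustrative; in practice the counts would be aggregated from clickstream events in Hadoop.

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test comparing conversion counts of two offers.

    Returns the z statistic and a two-sided p-value; a small p-value
    suggests the difference in conversion rates is unlikely to be chance.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative counts: Offer A converts 120 of 2,400 sessions,
# Offer B converts 95 of 2,500 sessions.
z, p = ab_test_z(120, 2400, 95, 2500)
```

With these illustrative numbers the test finds a statistically significant edge for Offer A; the same comparison can be run continuously as streaming events arrive.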

Management of complex multi-channel marketing campaigns involves many moving parts, including clickstream analysis, offer codes, audience segmentation, time-series analysis, and multivariate testing. This complexity makes it all but impossible to anticipate which views and reports will be needed. Marketing analysts need to explore raw data and create visualizations on the fly, and the faster the better, particularly when time-sensitive actions like ad-bidding decisions need to be made in seconds.
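The kind of on-the-fly exploration described above can be sketched as a simple pivot over raw click events. The event fields, segments, and offer codes here are hypothetical; a real system would pull events from the campaign platform at far larger scale.

```python
from collections import defaultdict

# Hypothetical raw click events; in practice these would stream from the
# campaign platform's event log, not sit in an in-memory list.
events = [
    {"segment": "new_visitor", "offer": "SPRING10", "converted": True},
    {"segment": "new_visitor", "offer": "SPRING10", "converted": False},
    {"segment": "returning",   "offer": "SPRING10", "converted": True},
    {"segment": "returning",   "offer": "FREESHIP", "converted": True},
    {"segment": "returning",   "offer": "FREESHIP", "converted": False},
]

def pivot(events):
    """Ad-hoc pivot: conversion rate per (segment, offer) pair."""
    totals = defaultdict(lambda: [0, 0])        # pair -> [conversions, clicks]
    for e in events:
        cell = totals[(e["segment"], e["offer"])]
        cell[0] += e["converted"]               # True counts as 1
        cell[1] += 1
    return {pair: conv / n for pair, (conv, n) in totals.items()}

rates = pivot(events)
```

Because the grouping keys are chosen at query time rather than baked into a pre-built report, an analyst can re-slice the same raw events by any combination of fields.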

Customer 360 profiles, often called the Holy Grail of marketing, integrate demographic, psychographic, behavioral, and preference data derived from potentially thousands of sources. This holistic view lets companies spot new opportunities and avoid the redundant communications that cause customers to lose faith. The more data marketers can integrate, the richer the customer profile, and the better the customer experience.
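A minimal sketch of this kind of profile integration, assuming records from several hypothetical sources (CRM, web analytics, loyalty program) keyed to one customer, with earlier, more authoritative sources winning when fields conflict:

```python
def merge_profiles(customer_id, sources):
    """Fold per-source records into one profile; sources are ordered by
    trust, and later sources fill gaps rather than overwrite values."""
    profile = {"customer_id": customer_id}
    for record in sources:
        for field, value in record.items():
            profile.setdefault(field, value)    # keep first value seen
    return profile

# Hypothetical source records for one customer.
crm     = {"name": "A. Shopper", "email": "a@example.com"}
web     = {"last_visit": "2017-03-02", "email": "old@example.com"}
loyalty = {"tier": "gold"}

profile = merge_profiles("C-1001", [crm, web, loyalty])
```

Real integrations add identity resolution and conflict-scoring rules, but the core operation, folding many partial records into one keyed profile, is the same.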

Financial Services

Financial risk analysis can involve a nearly unlimited number of variables that affect the behavior of markets and investment vehicles. It demands both high data volumes and high speed, since time is money when making trading decisions. Analytic tools that work directly on live data give analysts a critical timing edge.
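One common way to quantify risk from observed data, offered here as an illustration rather than the specific method referred to above, is historical value-at-risk. The daily returns below are invented for the example.

```python
def historical_var(returns, confidence=0.95):
    """Historical value-at-risk: the loss threshold exceeded in only
    (1 - confidence) of observed periods, reported as a positive number."""
    ordered = sorted(returns)                   # worst returns first
    index = int((1 - confidence) * len(ordered))
    return -ordered[index]

# Illustrative daily returns for one instrument.
daily_returns = [0.012, -0.008, 0.004, -0.021, 0.009, -0.015,
                 0.003,  0.007, -0.002, 0.011, -0.030, 0.005,
                 0.001, -0.004,  0.006, -0.012, 0.008, -0.001,
                 0.010, -0.006]

var_95 = historical_var(daily_returns)          # 95% one-day VaR
```

The computation itself is trivial; the challenge the text describes is running it, and far richer models, across live market feeds rather than stale extracts.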

Fraud detection is a core practice in financial services, where seconds count when determining whether to approve or deny a transaction. Big data platforms like Hadoop are being applied to this task in innovative ways, such as integrating behavioral analytics, demographic profiling, pattern recognition, and trend analysis. The result is not only less fraud but also fewer unnecessary transaction declines and more sales for retailers.
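A toy illustration of combining behavioral and pattern signals into a single score for a real-time approve/deny decision. The signals, weights, and threshold are invented for illustration, not drawn from any production system.

```python
def fraud_score(txn, profile):
    """Combine simple behavioral and pattern signals into a score in [0, 1].
    Weights are illustrative, not tuned."""
    score = 0.0
    if txn["amount"] > 5 * profile["avg_amount"]:
        score += 0.4                            # unusually large purchase
    if txn["country"] != profile["home_country"]:
        score += 0.3                            # geographic anomaly
    if txn["hour"] not in profile["usual_hours"]:
        score += 0.2                            # out-of-pattern timing
    if txn["merchant_category"] in profile["past_categories"]:
        score -= 0.1                            # familiar merchant lowers risk
    return max(0.0, min(1.0, score))

# Hypothetical cardholder profile and incoming transaction.
profile = {"avg_amount": 60.0, "home_country": "US",
           "usual_hours": range(8, 23), "past_categories": {"grocery", "fuel"}}
txn = {"amount": 480.0, "country": "RO", "hour": 3,
       "merchant_category": "electronics"}

decision = "review" if fraud_score(txn, profile) >= 0.5 else "approve"
```

Production systems replace the hand-set weights with models trained on historical transactions, which is where the integrated analytics the text describes come in, but the shape of the decision, many signals folded into one thresholded score in milliseconds, is the same.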

Cybersecurity

Security operations centers (SOCs) make wide use of big data technologies for cybersecurity, which is increasingly becoming an analytical discipline as traditional endpoint defenses grow less effective. SOCs gather and comb through vast amounts of server, network, and database log data, looking for known patterns that indicate a breach as well as for anomalies that might signal one. Timing is critical to isolating and containing an intruder; SOC personnel don’t have the luxury of waiting for extract databases to run analyses.
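A minimal sketch of this kind of log combing: counting failed logins per source IP and flagging likely brute-force sources. The log format, addresses, and threshold are assumptions; a real SOC pipeline would read from HDFS or a streaming bus, not an in-memory list.

```python
from collections import Counter

# Hypothetical auth-log lines in a simplified sshd-like format.
log_lines = [
    "Mar 02 11:01:07 sshd: Failed password for root from 203.0.113.9",
    "Mar 02 11:01:09 sshd: Failed password for root from 203.0.113.9",
    "Mar 02 11:01:12 sshd: Failed password for admin from 203.0.113.9",
    "Mar 02 11:02:44 sshd: Accepted password for alice from 198.51.100.4",
    "Mar 02 11:03:01 sshd: Failed password for bob from 198.51.100.4",
]

def brute_force_suspects(lines, threshold=3):
    """Flag source IPs whose failed-login count reaches the threshold."""
    failures = Counter(
        line.rsplit(" ", 1)[-1]                 # source IP is the last token
        for line in lines if "Failed password" in line
    )
    return {ip for ip, count in failures.items() if count >= threshold}

suspects = brute_force_suspects(log_lines)
```

The same pattern, aggregate, threshold, alert, scales from five lines to billions; the platform's job is to keep that aggregation running at log-ingest speed.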

Systems monitoring/management also entails analysis of many log files. As data processing environments grow ever more complex, the number of variables that contribute to slowdowns and outages grows accordingly. In the same way that SOCs analyze log data to find intruders, system managers can use live analysis and comparative historical data to more quickly avoid or remedy performance and availability problems before they impact the business.
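The comparison against historical baselines might be sketched as follows, flagging metric samples that deviate sharply from a rolling window of recent history. The window size, threshold, and latency values are illustrative.

```python
from statistics import mean, stdev

def flag_anomalies(samples, window=10, sigmas=3.0):
    """Flag samples deviating more than `sigmas` standard deviations
    from the rolling baseline of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sd = mean(baseline), stdev(baseline)
        if sd and abs(samples[i] - mu) > sigmas * sd:
            flagged.append(i)                   # index of anomalous sample
    return flagged

# Response-time samples (ms): steady around 100 ms, then a spike.
latency = [101, 99, 100, 102, 98, 100, 101, 99, 100, 100, 250]
spikes = flag_anomalies(latency)
```

Catching the spike at the moment it occurs, rather than in tomorrow's report, is exactly the head start on remediation the paragraph describes.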

Operations (Internet of Things)

Streaming data analysis is increasingly critical in today’s business environment. To grasp the data volumes the IoT involves, consider one statistic: autonomous vehicles are expected to generate and consume 40 terabytes of data for every eight hours on the road. With 50 billion new connected devices expected to join the Internet over the next three years, the data management challenges will be unprecedented. The only practical way to manage and make sense of this volume of data is an analytics platform that accesses the storage layer directly, with no unnecessary data movement.
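Taking the 40-terabytes-per-eight-hours figure at face value, a quick back-of-the-envelope calculation shows the sustained throughput a single vehicle would imply (decimal units assumed):

```python
# Back-of-the-envelope throughput for one autonomous vehicle,
# assuming 40 TB per eight-hour driving shift (decimal terabytes).
TB = 10 ** 12                        # bytes per terabyte
data_per_shift = 40 * TB
seconds_per_shift = 8 * 3600

bytes_per_second = data_per_shift / seconds_per_shift
mb_per_second = bytes_per_second / 10 ** 6      # roughly 1,389 MB/s
```

Nearly 1.4 gigabytes every second, per vehicle, makes the case against shuttling copies of the data around: any architecture that moves it before analyzing it is already behind.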