Big Data changes things, and we know it. We rely on Big Data for even the simplest tasks, such as commuting to work or shopping for groceries. As data-savvy professionals, we often face the challenge of telling our stories through a powerful, convincing use of data, and these days, of Big Data. At work, we often turn to BI and visualization tools to help us tell a data story. Our success is best measured by the actions we can effect by telling that story.
Traditionally, journalism has been at the forefront of telling credible, fact-checked stories backed by data. Ask a journalist “Have you written a Big Data story?” and the answer is likely to be “Yes, I’ve been writing data stories since long before Big Data was around.” What, then, are the key challenges in writing a powerful, effective data story with Big Data?
In our latest Hadooponomics Podcast, we hear directly from someone who knows intimately what it takes to craft an exceptional data story. Mar Cabra, head of the Data and Research Unit at the International Consortium of Investigative Journalists (ICIJ), shares how she and her team discovered and told one of the most impactful data stories in recent history: the Panama Papers (and yes, we say that even with all that is going on in the current U.S. presidential election).
For those unfamiliar with the Panama Papers, this is a large-scale data leak and the resulting investigation, which “shows how a global industry of law firms and big banks sells financial secrecy to politicians, fraudsters and drug traffickers as well as billionaires, celebrities and sports stars.”
Cabra discusses three important aspects of how she led a team of about 400 reporters in nearly 80 countries to work together, in secret, for more than a year to break the news of the tax haven practices revealed in the Panama Papers.
Analyze All the Data – Structured, Semi-Structured or Unstructured
The first step was to analyze large-scale, multi-structured data. This data story started with a single email that led to a huge trove of data for Cabra and her colleagues: 2.6 terabytes of information, 11.5 million files, 40 years of activity, in more than 20 jurisdictions. Moreover, this was not just tabular data, which is what one tends to picture when taxes and tax evasion are under investigation, but a variety of formats, including PDF documents in which free text is intermixed with numbers. ICIJ’s experienced technology team indexed the data and made it accessible for analysis by journalists across the globe.
They spent months analyzing and processing all this data. The first challenge was to make these varied documents searchable; the second was to make the data available to hundreds of reporters working in nearly 80 countries. Optical character recognition (OCR) and the cloud were part of the answer. The team customized an open-source library search tool connected to Apache Solr and added security so that reporters could conduct their research from anywhere in the world.
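ICIJ’s actual stack (OCR, a customized open-source search front end and Apache Solr) is far larger than anything that fits in a blog post, but the core idea behind “making documents searchable” is an inverted index: a map from each word to the documents that contain it. The Python sketch below is a toy illustration of that idea, not ICIJ’s implementation. The class name `InvertedIndex` and the sample documents are hypothetical, and the sketch assumes the text has already been extracted from PDFs and images by OCR.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy inverted index: maps each term to the IDs of documents containing it.

    A production engine such as Apache Solr adds relevance ranking, phrase
    queries, sharding and much more; none of that is modeled here.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Assumes `text` is plain text already extracted (e.g. by OCR) from the source file.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        # AND semantics: return only documents that contain every query term.
        terms = tokenize(query)
        if not terms:
            return []
        matches = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            matches &= self.postings.get(term, set())
        return sorted(matches)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("memo-001", "Shell company registered in Panama")
    index.add("memo-002", "Wire transfer from a bank account in Panama")
    print(index.search("panama"))        # ['memo-001', 'memo-002']
    print(index.search("Panama shell"))  # ['memo-001']
```

Real search engines rank results rather than simply intersecting sets of document IDs; the set intersection keeps the example short while still showing why an index makes millions of files searchable without rescanning each one per query.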
Visualize the Data for Insights
Once this global team of journalists had access to the data, they searched for patterns that would lead them deeper into the story the data had to tell. Cabra describes how her team worked out the best ways to communicate their findings. An article or two would have been good, but with such a rich treasure trove of data and so many important stories buried in it, they needed more. In addition to a variety of articles, they created a number of ways for people to visualize and interact with both the structured and the unstructured data. They produced rich infographics with visuals representing the high-profile power players in the broader story. After they broke the story, they made the database searchable and open to the public. They even created a game, Stairway to Tax Heaven, to help people engage with the story.
Collaborate to Increase Coverage
Often missed in today’s data-siloed environment is the collective benefit of analyzing data in collaboration with like-minded experts. If your technology can’t keep up, or still relies on archaic methods of collaboration such as email, then you are falling behind. In this web- and mobile-first world, architectural limits on collaboration are no longer just an annoyance; they are a liability. When the first journalist received that first email, they realized that sharing the data with colleagues around the world would have a much bigger impact globally.
The ICIJ found that this new model, in which you collaborate and share the glory of breaking a story with others, empowers you to make a bigger impact. The Panama Papers show that people working together can go beyond what any of them could do alone. The same is true in business: having all the data, not just a subset, leads to more impactful results.
Whether or not your story is as evocative or as global as the Panama Papers, it certainly matters to you and your organization. Using data to shape and inform the thinking of those around you, to effect change and influence action, is key. Telling your story effectively depends as much on your visualization tool as on the subject matter expertise and flair you bring to the job. If your BI tool limits the type of data, size of data, speed of analysis, richness of visualizations or the collaborative nature of data analysis, then contact us to learn how Arcadia Enterprise helps you craft data stories with your Big Data.