May 22, 2018 - Dale Kim | Big Data Ecosystem

The New World of the Citizen Data Scientist

You might be familiar with the concept of citizen journalism, which refers to “common citizens who play an active role in the process of collecting, reporting, analyzing, and disseminating news and information.” The term became commonplace in the late 1980s, due to open publishing, collaborative editing, and online content. This use of the term “citizen” as an adjective to describe individuals empowered by technology can also be applied to a “citizen data scientist.”

Gartner coined the term “citizen data scientist” in 2015, defining it as someone “who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” According to Gartner, “More than 40 percent of data science tasks will be automated by 2020, resulting in increased productivity and broader usage of data and analytics by citizen data scientists.”

Speaking of which, in this third blog post in our series that’s based on the ebook, “Modern Business Intelligence: Leading the Way to Big Data Success,” we highlight the rise of the citizen data scientist and the need to make more data available to more users. We also touch on how Hadoop emerged as a solution to the roadblocks that littered the early big data landscape—namely cost, capacity, and scalability. And finally, we talk about how visualization tools are now a “must-have” for clearly communicating insights, and how these types of tools can convert insights into business action. If you missed the second post, which provided an overview of the history of BI and the present and future of enterprise reporting, you can read it here.

But the bigger story here is about making data more valuable to more users. The rise of the citizen data scientist is largely enabled by improvements in self-service big data discovery platforms. BI user applications are empowering a new generation of business data consumers, a group much broader than the technical specialist pool of data scientists, DBAs, and analysts.

The Imperative of User-Friendly Analytics

Big data analytics solutions can create significant business value by delivering relevant insights to those within the organization who can quickly act on them. Corporate giants like Uber, Lyft, Amazon, eBay, YouTube, and Hulu were all able to analyze data far better than their competitors. Uber and Lyft transformed transportation, Amazon and eBay revolutionized the retail industry, and YouTube and Hulu spawned a generation of “cord cutters.” But even smaller companies in traditional industries can benefit from using big data to accelerate their digital transformation and increase their competitiveness.

The main challenge for any data-driven company is putting into place the right platform and the right tools to allow more business analysts to take on these big data analytics projects. According to Gartner, the best way to get started is to:

  • Facilitate the ingestion, preparation, and analysis of complex data currently beyond the reach of business information analysts.
  • Increase the range of analytics capabilities available to users by deploying tools for data discovery, self-service data preparation, and behavioral analytics.

The use of data warehouses for these kinds of big data analytics projects has been limited to large companies due to costs. Data warehouses were traditionally managed on premises, using a storage area network (SAN) or network-attached storage (NAS). But as the number of data warehouse consumers grew, the amount of system resources required also grew. Soon, it was apparent that scaling such an environment was far too costly.

Hadoop as the Platform Game-Changer

Apache Hadoop changed these platform economics by leveraging much cheaper commodity hardware. Hadoop’s distributed processing manager enables each node on a Hadoop cluster to process its own stored data, which minimizes the movement of data and reduces latency. Distributing work across many nodes for parallel processing leads to significant throughput gains, meaning that Hadoop can perform like a traditional data warehouse but at a fraction of the cost.
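
To make the idea of processing data where it lives a bit more concrete, here is a minimal Python sketch. It is not Hadoop code and does not use any Hadoop API. It assumes a made-up set of event records split across three "nodes" (plain lists), uses threads to stand in for parallel machines, and all names (NODE_PARTITIONS, count_locally, merge_counts, distributed_count) are hypothetical. Each partition is summarized on its own, and only the small summaries are combined at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the records stored on one node.
NODE_PARTITIONS = [
    ["ride", "ride", "order"],
    ["order", "view", "ride"],
    ["view", "view", "order", "ride"],
]


def count_locally(records):
    """Stand-in for work done on one node: summarize only that node's records."""
    return Counter(records)


def merge_counts(partials):
    """Combine the small per-node summaries into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_count(partitions):
    # Each partition is processed independently and in parallel. Only the
    # compact Counter summaries are passed back, not the raw records.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(count_locally, partitions))
    return merge_counts(partials)


if __name__ == "__main__":
    print(dict(distributed_count(NODE_PARTITIONS)))
```

In this toy version the raw records never leave their partition. That is the property the paragraph above describes: less data movement, and more work done side by side.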

For the longest time, the main analysis tools on Hadoop were those used only by technical specialists such as data scientists. Although data analysts could use SQL and SQL-like processing engines, these engines weren’t mature enough to handle the machine-generated SQL that established BI tools produce, forcing analysts to write SQL by hand. It was clear that the more casual BI users who preferred a GUI simply couldn’t rely on traditional tools for self-service analysis of big data.

Data Visualization Comes to the Fore

There’s a clear and powerful relationship between visualizing data and communicating analyses and insights about that data. It’s simply easier for people to understand complex data relationships visually.

Data visualization is key when it comes to expanding big data analytics beyond data scientists to business analysts. In fact, Gartner believes that enterprise analytics has evolved to the point of being more business-centric and more user-friendly.

Companies are now replacing legacy BI products with more user-friendly tools such as Arcadia Enterprise, which supports sophisticated big data analytics and doesn’t require oversight from IT. This level of self-service enables companies to discover new insights from big data in a faster, more iterative way. This is in stark contrast to the traditional BI approach, where data must be carefully prepared and organized in order to answer a specific, known business question.

Data visualization tools allow business analysts to visualize the reasoning behind big data analyses and discover new insights much faster. It’s no wonder that Gartner and other analyst firms believe data visualization is a “must-have” for quickly communicating insights from big data and converting those insights into actionable business decisions.

In the next blog post in this series, we’ll talk about a larger movement that is called “democratizing big data,” which refers to enabling more and more business users to quickly access the data they need to perform analyses, independent of IT. The associated ebook chapter also includes an interesting case study of data democratization in action, as well as a handy democratization to-do list. Stay tuned for Chapter 4: Democratizing Big Data. Be sure to check out the entire ebook here either online or as a PDF.
