You’ve heard this before: “You can’t manage what you can’t measure.” Indeed, without visualization it’s extremely hard to make sense of your raw data, tell data stories, understand trends, or drive informed decision-making. Some companies have been mastering these aspects for years, pioneering self-service BI for the masses. However, data has been evolving even more quickly in the last few years. Initially it was relational data, delimited files, Excel documents, and business applications; now it’s evolving to multi-structured data from NoSQL databases and cloud sources such as Amazon S3. All of a sudden, a flood of new data types and forms has emerged through the Internet of Things (IoT), web applications, data providers, and the like, which has also driven new technologies, such as Apache Hadoop and Apache Spark, to store and process all this data at scale.
With this variety of data and exploding volumes, how can business users stay self-sufficient and continue analyzing their data at the speed of their business?
To respond to this need, data-savvy professionals turn to data preparation tools such as Trifacta, which convert raw data of any sort and volume into a clean, structured tabular format that visualization tools can consume easily. Self-service data preparation, also known as “data blending” or “data wrangling,” consists of several steps that turn raw data into a consumable asset that Arcadia Data’s real-time modern BI platform can use for visualization and analytics.
The typical steps to turn data into a trusted asset include:
Discover: This step helps the user understand what’s in the data. Getting an idea of the data’s composition and its possible anomalies tells the user what to focus on to make the data usable.
Structure: Not all data has the same format. Especially in the world of big data and IoT, you can find JSON formats, exotic logs and hierarchical structures that are not easily consumable. A data preparation tool will help convert data into a familiar tabular format.
Clean: The data can hide incorrect values, missing values, and various other inconsistencies that the user will want to correct or remove in order to report on trusted data.
Enrich: With big data, users can try out various combinations of data, merging, appending, and joining data sets in different ways to enrich the information or to uncover possible relationships between them.
Validate: Once a user has prepared the data in a consumable format, it is important to validate the data at scale, leveraging the underlying big data processing platform. Often, a data lake could have billions of rows, so it is important to provide an exhaustive quality view of the data before it is consumed.
Publish: Once the data has been prepared and certified for consumption, it can be exposed to Arcadia Data. Most preparation tools can export the data in a format that Arcadia Data understands (e.g., CSV files). Better yet, Arcadia Data can connect directly to big data platforms using traditional JDBC or native connectivity for real-time, ad hoc access to the information.
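To make these steps concrete, here is a minimal sketch in plain Python of what each one does to a handful of records. It is only an illustration under stated assumptions: the sample IoT events, the field names, the SITE_REGIONS lookup, and every function name are hypothetical, and none of this is Trifacta’s or Arcadia Data’s API. It uses only the Python standard library and runs in memory, whereas a real data preparation tool would do the same work on the cluster at scale.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical raw input: newline-delimited JSON events from IoT devices.
RAW_EVENTS = """\
{"device": {"id": "d1", "site": "NYC"}, "temp_c": "21.5"}
{"device": {"id": "d2", "site": "SFO"}, "temp_c": null}
{"device": {"id": "d3", "site": "NYC"}, "temp_c": "n/a"}
{"device": {"id": "d1", "site": "NYC"}, "temp_c": "22.0"}
"""

# Hypothetical reference data used in the Enrich step.
SITE_REGIONS = {"NYC": "East", "SFO": "West"}


def parse(raw):
    """Read one JSON record per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def discover(records):
    """Discover: profile which value types appear in the temp_c field."""
    return Counter(type(r.get("temp_c")).__name__ for r in records)


def structure(records):
    """Structure: flatten nested JSON into flat, tabular rows."""
    return [
        {
            "device_id": r["device"]["id"],
            "site": r["device"]["site"],
            "temp_c": r.get("temp_c"),
        }
        for r in records
    ]


def clean(rows):
    """Clean: convert temp_c to float; drop missing or non-numeric values."""
    cleaned = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except (TypeError, ValueError):
            continue  # None raises TypeError, text like "n/a" raises ValueError
        cleaned.append({**row, "temp_c": temp})
    return cleaned


def enrich(rows, regions):
    """Enrich: join each row to the site-to-region lookup."""
    return [{**row, "region": regions.get(row["site"], "unknown")} for row in rows]


def validate(rows):
    """Validate: check every row and return a list of problems found."""
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row["temp_c"], float):
            problems.append(f"row {i}: temp_c is not numeric")
        if row["region"] == "unknown":
            problems.append(f"row {i}: site has no region")
    return problems


def publish(rows):
    """Publish: export the prepared rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["device_id", "site", "region", "temp_c"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def prepare(raw):
    """Run all six steps and return the profile plus the published CSV."""
    records = parse(raw)
    profile = discover(records)
    rows = enrich(clean(structure(records)), SITE_REGIONS)
    problems = validate(rows)
    if problems:
        raise ValueError(problems)
    return profile, publish(rows)


if __name__ == "__main__":
    profile, csv_text = prepare(RAW_EVENTS)
    print(dict(profile))
    print(csv_text)
```

In this sketch, Discover runs on the parsed records before they are flattened, so the user sees the missing and non-numeric readings before deciding how to clean them. Dropping bad rows is just one possible cleaning policy; imputing or flagging them would be equally reasonable choices.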
At Arcadia Data, we’ve seen several of our customers successfully adopt the model of on-cluster data preparation and visualization. The two naturally go hand in hand. Think about it: you can’t fuel your car with crude oil; you need gasoline. In much the same way, your business intelligence and visual analytics platform produces better insights when the data in your Hadoop data lake is clean.
Data preparation and data visualization are complementary. A Trifacta + Arcadia Data partnership delivers faster insights to your end users. Together, they give end users a seamless, simple experience that lets them master their data themselves and spend more time on what matters most: managing what they’ve visually measured and driving their business based on trusted data.
To learn more about building out your data infrastructure, from your raw data to the broad range of business users who need access, take a read here.