March 26, 2019 - John Thuma | Big Data Ecosystem

Introducing the Arcadia Data Cloud-Native Approach

Arcadia Data is excited to announce an extension of our cloud-native visual analytics and BI platform with new support for AWS Athena, Google BigQuery, and Snowflake. This post will talk about each cloud service and (soon) link to example videos and how-to guides for connecting Arcadia Data to these services.

The data lake has evolved over the past couple of years from being primarily an on-premises design pattern to one that incorporates the flexibility and ease of cloud-based deployments. As we saw in Eckerson Research’s assessment of data lake analytics, more people are moving to cloud and hybrid-cloud data lake environments. As such, Arcadia Data is committed to providing the most innovative, in-data-lake BI capabilities for customers, regardless of their deployment preference.

Excerpt from Eckerson Research’s “Data Lakes for Business Users” Report. July 2018.

Our latest release of Arcadia Enterprise provides a modern BI and visual analytics platform with connections to Athena, BigQuery, and Snowflake. Arcadia Enterprise enables cloud customers to create pixel-perfect visualizations, support streaming data, run geospatial analysis with the MapBox integration, and of course, leverage our newest feature, natural language query, also known as “Search-Based BI.” The remainder of this blog will highlight some of the key advantages of Athena, BigQuery, and Snowflake.

AWS Athena

AWS Athena is a query service that enables users to analyze data resident in Amazon’s popular Simple Storage Service (Amazon S3) and other AWS services. At first blush, the process of using Athena is very simple. You select the S3 location where your data set is stored. You then create a table, and you have a couple of options for doing so: you can use the built-in Table Creation Wizard, or you can write your own data definition language (DDL) using the Hive dialect. Finally, you run your ANSI SQL query.
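The DDL-then-query flow described above can be sketched in a few lines of Python that only build the SQL text. This is an illustration, not Arcadia Data or AWS code: the helper `build_external_table_ddl`, the database `weblogs`, the table `requests`, the column names, and the bucket `s3://example-bucket/logs/` are all hypothetical, and the statements would still need to be submitted through the Athena console or a client of your choice.

```python
def build_external_table_ddl(database, table, columns, s3_location, stored_as="PARQUET"):
    """Return a Hive-dialect CREATE EXTERNAL TABLE statement as a string.

    Hypothetical helper for illustration only. It does no escaping or
    validation, so it should not be fed untrusted input.
    """
    # One "`name` type" entry per column, each on its own line.
    column_sql = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{s3_location}'"
    )


# Step 1 and 2: point at the data in S3 and define a table over it.
# All names below are made up for this example.
ddl = build_external_table_ddl(
    database="weblogs",
    table="requests",
    columns=[("request_time", "timestamp"), ("status", "int"), ("url", "string")],
    s3_location="s3://example-bucket/logs/",
)

# Step 3: an ordinary SQL query against the new table.
query = "SELECT status, COUNT(*) AS hits FROM weblogs.requests GROUP BY status"

print(ddl)
print(query)
```

The table is declared as external because the data stays where it is in S3; the DDL only describes its layout.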

Other nice features with respect to Athena include:

Integration with AWS Glue: Glue is a data catalog which acts as a unified repository across various data sources. Glue will allow you to scan your data sources and build up a library of data available to Athena.

Low Data Preparation: Athena has no ETL requirement, which means you don’t have to curate data or build a data warehouse before you start querying.

Works with a variety of data formats: Supported formats include Avro, CSV, ORC, JSON, and Parquet.

Pay by the Query: There is no EC2 instance to set up, and you are charged $5 per terabyte of data scanned by your queries. If you compress or partition your data, you can save up to 90% of the Athena costs.
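To make the pay-by-the-query arithmetic concrete, here is a minimal Python sketch. It uses the $5 per terabyte rate quoted above and assumes, for illustration, that a terabyte means 1024^4 bytes and that there are no per-query minimums or rounding; check the current AWS pricing page before relying on the numbers. The function name `athena_scan_cost` is hypothetical.

```python
TB_IN_BYTES = 1024 ** 4  # assumption: binary terabyte

def athena_scan_cost(bytes_scanned, price_per_tb=5.0):
    """Estimate the charge for one query from the bytes it scans.

    Simplified model: price is proportional to data scanned, with the
    $5/TB rate quoted in this post as the default. Minimum charges and
    rounding rules are deliberately ignored.
    """
    return bytes_scanned / TB_IN_BYTES * price_per_tb


# A query that scans a full terabyte of uncompressed data.
full_scan = athena_scan_cost(1 * TB_IN_BYTES)

# The same query if compression and partitioning cut the scan to 10%.
reduced_scan = athena_scan_cost(0.1 * TB_IN_BYTES)

print(f"full scan:    ${full_scan:.2f}")
print(f"reduced scan: ${reduced_scan:.2f}")
print(f"savings:      {100 * (1 - reduced_scan / full_scan):.0f}%")
```

Because the charge tracks bytes scanned rather than rows returned, anything that shrinks the scan (columnar formats, compression, partition pruning) lowers the bill by roughly the same proportion.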

Snowflake

Snowflake is a data warehouse built completely in the cloud. It has many key advantages: It offers a simple-to-use web interface for spinning up new environments and making changes to existing ones. It is completely elastic and allows you to grow, shrink, and turn off the environment as needed. It supports a variety of data types, including structured data and semi-structured data such as JSON. Finally, with Snowflake there is no need to spin up data marts or copies of data to offload processing for competing users or applications. Virtual Snowflake warehouses can be spun up quickly while sharing access to the same data and without impacting other workloads. Snowflake is resilient and redundant out of the gate and requires less maintenance, meaning fewer knobs and fewer operational DBA activities.

Snowflake is super easy to set up and get running. You select the cloud platform you want to use (AWS or Azure), select your region, and finally select from a variety of pre-packaged options. Snowflake also has a pay-as-you-go option that is billed on a per-second basis. It supports ANSI SQL and allows you to perform joins across structured and semi-structured data.

Google BigQuery

Google BigQuery is serverless, requires zero system provisioning, and is an extremely scalable data warehouse environment. The neatest aspect of BigQuery is its price. It is free up to 1 TB of data analyzed each month and up to 10 GB of data stored. It has a very clean and easy-to-use web interface for completing tasks like running queries and loading and exporting data. It is integrated into the Google Cloud Platform (GCP), which means you can combine it with other related features such as machine learning, data from object stores, and streaming ingestion. Another nice feature is that it comes with a ton of public datasets that you can blend with your own data.

Conclusion

Traditional BI tools force organizations to move data into their middle-tier server to gain performance. This is not what we want you to do! It is expensive and it does not scale! With any data movement, you need curation and other expensive data transformations. Don’t forget about the latency. With Arcadia Data, you don’t need this movement of data, as our acceleration will do most of the hard work for you. We also have a built-in cache mechanism that is turned on at the connection level and is transparent to the user. The results are very fast dashboards.

Also, because many of the cloud-based database storage systems charge by data scan size or time used, Arcadia Data has a built-in ROI. Instead of being charged for every query that scans your data, Arcadia Data analytical views reduce the overall data/time usage, and thus the overall charges. Arcadia Data will help control your costs as well as massively improve dashboard and reporting speed when using the cloud-based systems mentioned above.

These are exciting times in computing and big data analytics. Athena, BigQuery, and Snowflake provide many of the features and functions that we need as analytics professionals. The biggest problem you will have is deciding which provider is right for your organization. The good news is that it won’t matter when it comes to Arcadia Data. Arcadia Data provides tools that enable you to seamlessly connect to data in the cloud.

