Getting Started 2: Connections, Data Import, and Datasets

Published on March 17, 2018

Arcadia Instant and Arcadia Enterprise are our company’s tools for big data visual analytics. Arcadia Instant is a free, downloadable visual analytics tool, while Arcadia Enterprise is an enterprise-grade platform capable of supporting teams of users and a variety of data sources.

Arcadia Instant is available as a Windows or Mac download directly from Arcadia Data. It helps you quickly gain insight into your data by letting you explore and visualize your datasets with interactive visuals, dashboards, and apps.

You can install Arcadia Enterprise yourself with Cloudera Manager or Ambari Stacks. Detailed directions are available on our documentation portal.

In this article, we will take a closer look at how to connect to your datasets and import them into the Arcadia Data platform.

In the Getting Started 1: Exploring Arcadia Instant article, we looked at the Life Expectancy Dashboard. Now let us take the first step in creating a visual: connecting your data to Arcadia Data.

1) Start this process by clicking the DATA tab on the top navigation menu.

2) Click the NEW CONNECTION button in the top left corner of the screen.

3) This opens the Create New Data Connection window. From here, we can set up connections to the different data sources that we use to build our visuals. If you don’t know what values to specify, the IT person who maintains the data source can provide them.

Note: Arcadia Instant supports only five connection types:

  • The Arcadia Enterprise analytical platform
  • Apache Hive
  • Apache Impala
  • SQLite
  • Spark SQL

Arcadia Enterprise supports those plus several more connection types, such as:

  • Apache Drill
  • Apache Kudu
  • Apache Solr
  • AWS Redshift
  • AWS S3
  • MariaDB
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL
  • Teradata Aster

4) In the Create New Data Connection window:

  1. Choose the Connection type that corresponds to your data source configuration. Note that the Arcadia connection is for connecting to an Arcadia Enterprise deployment. If you don’t currently have an Arcadia Enterprise deployment, but wish to explore it on your cluster of nodes, please contact us for a trial version. If you choose SQLite, some of the information below is not needed.
  2. Type a Connection name of your choosing for the connection you are setting up.
  3. Enter the Hostname or IP address and Port # for your data source.
  4. Supply the Username/Password credentials you may need.
  5. Depending on your configuration, you may need to specify options on the Advanced and Parameters tabs of the interface. If you do not have this information, please contact the IT owner of the data source.
  6. Click on Connect, which will create a new data connection.
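
Before opening the dialog, it can help to collect the values you will need. The following is a minimal Python sketch of such a checklist. It is not part of Arcadia Data: the `ConnectionSettings` class, the `missing_fields` function, and the example host name are hypothetical, and the rule that a file-based SQLite connection needs no hostname or port is our assumption about which information "is not needed."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectionSettings:
    """Hypothetical container for the values entered in the dialog."""
    connection_type: str
    name: str
    host: Optional[str] = None
    port: Optional[int] = None
    username: Optional[str] = None
    password: Optional[str] = None


def missing_fields(settings: ConnectionSettings) -> List[str]:
    """Return the names of the fields that still need a value.

    Assumption: a file-based SQLite connection needs no host or port,
    while every other connection type does. Credentials are treated as
    optional because not every data source requires them.
    """
    missing = []
    if not settings.name:
        missing.append("name")
    if settings.connection_type != "SQLite":
        if not settings.host:
            missing.append("host")
        if settings.port is None:
            missing.append("port")
    return missing


# Example values only; the host name is made up.
impala = ConnectionSettings("Apache Impala", "sales cluster", host="impala.example.com")
print(missing_fields(impala))  # ['port']

sqlite_conn = ConnectionSettings("SQLite", "local samples")
print(missing_fields(sqlite_conn))  # []
```

Anything the checklist reports as missing is a good question for the IT owner of the data source.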

5) The new connection appears on the left navigation menu. Click it and then click the Connection Explorer tab to see the different data tables you can access through your newly created data connection. Selecting any of the tables shows a sample set of its data. In the example below, we are illustrating the tables of the samples connection, and previewing the us_counties table. (In the Arcadia Enterprise version, we can see additional column and table statistics information such as partition information, column cardinality, min/max values, and so forth.)
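
To make the idea of browsing a connection concrete, here is a rough sketch of the same two actions (listing tables, then previewing a few rows) using Python’s standard `sqlite3` module. This illustrates the concept only; it is not how the Connection Explorer is implemented. The in-memory database stands in for the samples connection, and the columns and rows of `us_counties` are invented for the example.

```python
import sqlite3


def list_tables(conn):
    """Return the names of the tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview(conn, table, limit=10):
    """Return the column names and up to `limit` sample rows of one table."""
    # Only accept names that really exist, since the table name is
    # placed into the SQL text as a quoted identifier.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in for the samples connection; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us_counties (state TEXT, county TEXT)")
conn.executemany(
    "INSERT INTO us_counties VALUES (?, ?)",
    [("California", "Alameda"), ("Texas", "Travis"), ("Ohio", "Franklin")],
)

print(list_tables(conn))
columns, rows = preview(conn, "us_counties", limit=2)
print(columns)
print(rows)
```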

6) To build the visuals and dashboards, you must first define a “dataset.” A dataset is the logical representation of data as the basis for building visuals, dashboards, and applications. Some of you might refer to this as the “semantic layer.” Datasets can refer to the contents of a single table or data from several tables in the same connection.

7) You can create a new dataset on any of the tables in your connection. For example, to create a new dataset for the census_pop table, click New Dataset either to the right of the census_pop name or at the top of the screen under the main menu items. (Note that depending on which button you click, the opened window will differ. The screenshot below appears when you click on the upper New Dataset button.)

8) In the New Dataset window, enter the name of the new dataset and click Create.

9) After the dataset appears under the Dataset tab, you can enrich it by editing its attributes, adding data joins to other tables, importing supplemental data, and so on. Let us explore the dataset further and see what we can do by clicking on the Population Census dataset.

10) This takes you to the Dataset Details screen, which shows the dataset’s base table, connection, description (if any), creation date, and so on.

11) Let us modify the description by clicking on the pencil icon next to the field and entering a brief description. Be sure to click SAVE afterward.

12) You can find the Fields link on the left navigation bar. Clicking on Fields shows the table columns. Arcadia Data categorizes the fields as Dimensions and Measures. Dimensions are descriptive aspects or features of a dataset; names, locations, and dates are good examples. Measures are quantitative data on which we would like to perform statistical operations; typical measures are costs, counts, distances, and weights. Note that when you create a dataset, the system makes a best guess as to which columns are dimensions and which are measures, so it is best to double-check that they are correct. You can change this attribute for each field by clicking on the EDIT FIELDS button.

You can see that the state field is under Dimensions and the year and population fields are under Measures. We’ll leave it as an exercise for you to change the year field to a dimension.
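
Why would year land under Measures? One plausible explanation is that the column holds numbers. The sketch below uses a simple assumed heuristic (all-numeric columns become measures, everything else becomes a dimension) to show how such a guess can go wrong. It is not Arcadia Data’s actual rule; the `guess_field_kind` function is hypothetical and the sample rows are illustrative values shaped like the census_pop columns.

```python
from numbers import Number


def guess_field_kind(sample_values):
    """Guess "measure" for all-numeric columns, otherwise "dimension".

    Assumed heuristic for illustration only; not Arcadia Data's actual rule.
    """
    values = [v for v in sample_values if v is not None]
    all_numeric = bool(values) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "measure" if all_numeric else "dimension"


# Illustrative rows shaped like the census_pop columns.
rows = [
    {"state": "California", "year": 2010, "population": 37253956},
    {"state": "Texas", "year": 2010, "population": 25145561},
]

kinds = {
    column: guess_field_kind([row[column] for row in rows])
    for column in rows[0]
}
print(kinds)
# {'state': 'dimension', 'year': 'measure', 'population': 'measure'}

# A manual override, like the one made through EDIT FIELDS, corrects year.
kinds["year"] = "dimension"
```

A year is numeric, but we normally group by it rather than sum it, which is why a type-based guess needs a human check.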

13) Click on Data Model below the Fields link, then click Show Data to display 10 rows of sample data.

14) One way to enrich the data is to join it with data from other tables already connected in the system. On this screen, you can create joins across multiple tables. Start by clicking on the EDIT DATA MODEL button.

15) This will show a “plus sign” icon in the census_pop icon.

16) Click on the “plus sign” icon to open the Table Browser window. Be sure to click on the “plus sign” only and not the link on the table name (census_pop in this example), or else you will open a Table Browser window that will change that base table.

As an example, if we want to see the life expectancy for a specific year, we would select the world_life_expectancy table as the table to join.

17) You can now define the join condition by selecting the corresponding column from each table. Note that you can view or hide the preview of the data under the column’s drop-down box by clicking on the sample data link.

Here we are joining by the year columns in each table.

18) After clicking on the APPLY button, you can see the joined table and view some sample data to verify that the join is what you want before saving.
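
Conceptually, the join defined in the steps above resembles a SQL join on the year columns. The sketch below reproduces that idea with Python’s standard `sqlite3` module. The assumptions are ours: the column names of `world_life_expectancy`, all of the data values, and the choice of an inner join are invented for illustration, and the query is not the SQL that Arcadia Data generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE census_pop (state TEXT, year INTEGER, population INTEGER);
    CREATE TABLE world_life_expectancy (
        country TEXT, year INTEGER, life_expectancy REAL
    );
    -- Invented sample values, for illustration only.
    INSERT INTO census_pop VALUES
        ('California', 2010, 37000000),
        ('California', 2000, 34000000);
    INSERT INTO world_life_expectancy VALUES
        ('United States', 2010, 78.5),
        ('Japan', 2010, 83.0);
    """
)

# Join the two tables on their year columns and preview up to 10 rows.
joined = conn.execute(
    """
    SELECT c.state, c.year, c.population, w.country, w.life_expectancy
    FROM census_pop AS c
    JOIN world_life_expectancy AS w ON c.year = w.year
    ORDER BY w.country
    LIMIT 10
    """
).fetchall()

for row in joined:
    print(row)
```

Joining on year alone pairs every census row with every life expectancy row from the same year, and with an inner join, rows without a matching year drop out. Previewing the sample data before saving is how you catch a join that returns more or fewer rows than you intended.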

That’s all for this brief introduction to connecting your data to Arcadia Data. For more details, you can visit the Working with Data section of our online documentation.

Now you can create your first dashboard while following our next article, Getting Started 3 – Working with Dashboards.

Other Getting Started Guides include:
Getting Started 1 – Getting Started with Arcadia Instant
Getting Started 4 – Building Visuals
Getting Started 5 – App Navigation