Webinar: Delivering on the Promise of Business Value from Data Lakes

The Bloor Group

Watch this webinar now

In this episode of Hot Technologies, industry thought leader Wayne Eckerson explains how a new wave of technology is enabling visual analysis and discovery of nearly any type of data.

He is joined by Steve Wooledge of Arcadia Data, who will showcase his company's visual analytics platform that provides native BI for data lakes.

Watch this webinar to learn about:

  • Key challenges of getting value from your data lake
  • How native business intelligence can drive more insights 
  • Customer success stories


Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to Hot Technologies of 2018. That's exactly right, my name is Eric Kavanagh, and I will be your moderator for today's event: Delivering on the Promise of Business Value from Data Lakes. That's what everybody wants from the data lake. I can tell you right now we have a great lineup for you today: yours truly, I have to start there, and my good buddy Wayne Eckerson of the Eckerson Group has dialed in today, as well as my other good friend Steve Wooledge of Arcadia Data. We'll do a couple of presentations and hopefully a demo as well.

But I do want to share some other quick thoughts with you. This webcast is part of a whole program, and that includes an assessment which is designed, frankly, to help users like yourselves understand where you are in the data lake journey, as they say in the business, and to really help you figure out what your next step should be, based upon feedback from yourself and from your peers. So many of you should have seen a survey, or an assessment, that pops up when you registered, and this is what you would see when you got there: how much value does your data lake provide to business users? What we've done here, and this is something that Wayne and his team came up with, called Rate My Data, is a very cool technology, a platform, again, for helping companies understand how they basically size up to other organizations and determine what the next best step forward is. Obviously any investment as serious as a data lake requires a lot of thought, a lot of time, a lot of resources, and of course some money thrown in there as well, and you want to make sure you're making the right decisions. So Wayne and his team took this concept and developed Rate My Data, which is an actual application, a web-based platform for assessments. And what happens when you take the assessment? It takes about five minutes and you get a personalized report. The report can take a number of different directions in terms of what you get, and you can see this is one of the things that you get after you take it: this was a medium score, this is a high score, and you can see how you compare to other companies. So the idea is to help you figure out, should you move this way or that way, which direction should you take for your organization to optimize the value of what you've got.

And I'd also like to push a poll, if I could, very quickly. Let me see here, I'm on the road myself today. I'll open this poll, and you can see the question is: how do you, or how do you plan to, give users access to your data lakes? You have five different options there, A, B, C, D and E. I'm going to give this a couple of minutes. Okay, folks are starting to answer. So A is development tools, B is direct SQL access, C is traditional BI tools, D is Hadoop-native tools, and E is other, and for "other," just go ahead and chat to us if you have something else going on. Once again, this is all part of our desire to understand what's going on out there in the marketplace. We are researchers and analysts, Wayne and myself, and of course companies like Arcadia are always very curious to understand what's actually happening out there in the real world. What are you folks doing, what are you seeing? We always want to understand your thoughts and your perspective on these things, because, like I said, this is a very challenging space and you want to make sure you do things right, and that's why we have this whole platform for you. I'll give this one more second. Maybe, Wayne Eckerson, I'll throw a quick question over to you, if you want to talk for a second about Rate My Data or about this particular poll. You guys spent some time working with Arcadia to put all this together; any thoughts on what you hope to learn from the survey as you read what people report in their personal assessments?

Wayne Eckerson: Right. So data lakes have traditionally been the domain of data scientists, and we wanted to explore how many regular Joes are actually using the data lake and getting value from it. It's about 20 questions, but it takes four minutes to complete because of the way we designed them: the rating scale doesn't change from question to question, and it spans the six categories that we looked at. Each category has about two questions, and then at the end we ask some demographic questions. The cool thing about this report, and we've had over a hundred people take the assessment so far, is that you can go in and filter it. The filter button, which you can see at the right-hand corner, is where you can benchmark yourself against a more targeted niche group based on company size, region and industry, among other things. So it's a pretty useful tool to give you a quick snapshot of where you stand in terms of data usage by regular Joes.

Eric Kavanagh: Okay, good stuff. Let me go ahead and close this poll; we got a good number of responses. I can share the results: 31% say traditional BI tools, 12% direct SQL access, 8% Hadoop-native tools, 4% development tools, 4% other, and no answer is 42% so far. Thank you very much for taking that, folks, well done. Now let me hop back over here. I'm going to push this next slide, and with that I'll give the keys to the castle to Mr. Eckerson. Wayne, you can share your screen or use the slides in the console.

Wayne Eckerson: I'm on a Mac, so it takes a little while. Do you see one screen or two?

Eric Kavanagh: Just one, looks good.

Wayne Eckerson: Okay. The topic today is the data lake and how you can get business value from it. I decided to start from the beginning, which is comparing data warehouses to data lakes, because in many ways data lakes are a response to the data warehouse. As background, this is the traditional data warehouse architecture. The benefit of it is that it's one place for all your data from multiple source systems; it's designed for queries, not transactions; it's designed in a way that simplifies user access and speeds queries across all of your data; and it gives you a single version of truth, with common metrics and standards. What we found is that it's ideal for supporting what we call reports and dashboards. That was the promise of the past, and it's still relevant today, but we hit a lot of speed bumps along the way, and as we'll see, data lakes in many ways are an answer to the data warehouse's shortcomings. It takes a long time to model and load the data, so a warehouse takes a long time to build and to change; it takes an army of people to maintain, so it can be costly; infrastructure built on relational databases tends to be costly as well, as are the skills; and it's not really designed for multi-structured data. So what we have found is that data warehouses are really good for answering known questions, the things that IT traditionally gathers requirements for and then develops reports and dashboards against, pointing those at the warehouse, with some drill-down to do root-cause analysis. But they're really less good for answering new questions with new types of data. In many ways, the problems with the data warehouse arose because we were asking it to do more than it was really designed for.

So around 2010, Hadoop, on which most data lakes are built, hit the scene hard, and in data lake circles people advocated wiping the data warehouse away and entirely replacing it with the data lake, because it solved a lot of those problems. Data lakes were infinitely scalable on a low-cost, scale-out, distributed architecture. They could take any kind of data, thanks to schema on read: basically you're just dumping data into a file system, you don't have to model it first, which gives users, especially those data-hungry power users and data scientists sick of waiting for the IT department to model the data, instant access to it. They're also built on open source, so the cost of software licenses is an order of magnitude less than a traditional data warehouse. And once you put data in there, you never have to move it, because you can bring different compute engines, different processing engines, into the cluster, so you never move the data out of the cluster into something else.
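The schema-on-read idea Wayne describes (dump raw records first, impose structure only at query time) can be sketched in a few lines of plain Python standing in for a data-lake engine; the event records and field names here are invented for illustration:

```python
import json

# Raw "landing zone": newline-delimited JSON, dumped with no upfront modeling.
# Records need not share a schema -- a hallmark of schema on read.
raw_lines = [
    '{"user": "ann", "event": "click", "ms": 120}',
    '{"user": "bob", "event": "view"}',                # missing "ms" is fine
    '{"user": "ann", "event": "click", "ms": 95}',
]

def query(lines, event_type):
    """Impose structure at read time: parse, filter, project."""
    rows = (json.loads(line) for line in lines)
    return [(r["user"], r.get("ms")) for r in rows if r.get("event") == event_type]

clicks = query(raw_lines, "click")
# → [('ann', 120), ('ann', 95)]
```

Contrast this with schema on write, where the second record would be rejected at load time for missing a column; here the absent field simply surfaces as `None` at read time.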

But for all the promise of data lakes, we found they have their own liabilities. For one, it's still relatively new technology, maturing quite fast, very fast as a matter of fact, but still working out issues. A lot of the software is built on open source from the Apache Foundation, and there are a lot of different projects starting, stopping and overlapping that are hard to keep track of, which is why we need distributions from companies like Cloudera, MapR and Hortonworks. It's also fairly complex to manage, especially the infrastructure and the hardware, which ends up costing a lot more money than people think, and the skills to do it are not cheap either. From a workload processing perspective, Hadoop can do really fast full table scans, but it's less good at complex multi-table joins and ad hoc BI access. It's really good for data scientists and power users who want instant access to raw data; it's going to be an open data dump. It's also good for offloading ETL workloads and for archiving large files of detailed data so you don't have to keep upgrading the warehouse. So what you're seeing is that data lakes in some ways were a response to the deficiencies of the data warehouse, but the lake has its own deficiencies, which in fact the warehouse is well suited to address.

So if you look at the underlying technologies behind the data warehouse and the data lake specifically, the relational database and the distributed file system called Hadoop, in 2010 the attributes of each were almost polar opposites. You can go down the list here and see that on every single characteristic they are completely different: one is interactive, the other batch; one SQL-based, the other Java-based; one schema on write, the other schema on read; and so on. But these two technologies have been playing both frenemy and competitor in the ecosystem, and as a result their capabilities are starting to converge: relational databases are starting to take on a lot of the capabilities of Hadoop, and vice versa, and both of them are going to the cloud.

So I've been trying to help companies figure out what the dividing line is between these two worlds, and it's a little bit difficult, and it is a moving target, but what we're seeing is that the data warehouse, the relational database, is great for supporting business people, the regular Joes as Eric was saying, and for supporting specific types of workloads that require complex multi-table joins and large numbers of concurrent users. It's really good for supporting existing reports and dashboards and doing analysis around those things. The data lakes, meanwhile, are really good for data scientists and power users who want instant access to the raw data, or to slightly scrubbed, clean data. They're really good for big table scans, large batch jobs, ETL offload, data offload and data science sandboxes. So when you put these two together, you realize: why should we have one versus the other? What we should be asking is how to unify these into a coherent ecosystem or architecture.

What I've figured out is that there are four options, and maybe more, but this is what I've come up with so far. Option one is that you keep these as distinct worlds: environments sitting side by side, a data lake and, next to it, an integrated data warehouse running on a relational database, with the data lake on Hadoop. That's how it exists in most companies today. But there are other options as well. Option two is to rebuild the warehouse on the lake; I would call these more data marts than data warehouses, but what you do is take SQL-on-Hadoop technologies, like Impala, build the tables, apply a dimensional schema of sorts, and then run queries against those tables in the data lake. A third option would be to use a BI tool to recreate a dimensional view of the data in the lake; in this case it's a virtual view: the tool sits outside the lake, queries the data inside the lake, and may pull back data into its own cache, where it can optimize that data for faster performance. And the last option is where the analytics tool actually sits inside the data lake, resides there natively, and queries the data from there, and I believe Steve is going to talk about that approach, since that's the one that Arcadia takes.
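Option two, a dimensional schema rebuilt on the lake, boils down to star-schema queries served by a SQL-on-Hadoop engine. As a minimal sketch, SQLite stands in below for an engine like Impala, and the fact and dimension tables are invented for illustration:

```python
import sqlite3

# SQLite stands in here for a SQL-on-Hadoop engine such as Impala;
# the fact/dimension tables are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales  VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# A classic star-schema query: join the fact table to a dimension and
# aggregate -- the shape of workload option two is meant to serve.
rows = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
# → [('books', 15.0), ('games', 7.5)]
```

In practice the same `JOIN`/`GROUP BY` shape would run in Impala against files in HDFS; only the engine and the storage change, not the modeling idea.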

So where are we today? Well, the picture from a couple of years ago was that you would put most of your data in Hadoop or in S3, as the cloud started to emerge as the platform of choice for many companies. You'd do a lot of your ETL work in Spark, and your data scientists might also use Spark libraries for doing machine learning. Then, once you refined the data through those zones on the left, you could push it into a relational database serving to support your big data warehouse. The benefits here are basically that you get the scalability, the support for multi-structured data and the schema on read that a data lake provides, whether on Hadoop or not, but you still have the disadvantage of copying and duplicating data into a data warehouse, and any time you duplicate large amounts of data and make a transition between orthogonal technologies, you can run into problems and expense. What we're now seeing is a slightly different environment, where companies are using big data analytics tools like Arcadia to directly query the data that's been transformed in the lake, whether that's Hadoop or S3. The transformations happen in Spark, oftentimes using Python, or in commercial tools as well; sometimes they're pushed all the way down into the landing area. So that's where we seem to be going in this big data world. The benefits here are the same as before, scalability, multi-structured data support, schema on read, but you don't copy the data: you keep it in one place, in the lake, and you give access to non-data-scientists, your regular Joes. The cons of this approach are that it's new and that there is no relational database, which might freak some people out, but it's something to consider, as some of the bleeding-edge and leading-edge companies are going in this direction, and maybe more than that.

This is a busier picture of that architecture with a little more detail, but it's basically saying the same thing, just showing the different pipelines that come out of those data hubs, supporting different types of users and applications. And this environment, we're seeing, can be built not only by traditional data engineers but also by big data engineers who prefer open-source libraries and tools. So, I thought I'd mention that the assessment is still running right now; it only takes four minutes of your time and it might give you some interesting insights into how you've progressed with your data lake. With that, I'm going to turn this back over to Eric.

Eric Kavanagh: All right, I'm going to turn it over to Steve Wooledge. Folks, feel free to ask questions; I'll push those two slides out in just a second. With that, Steve, take it away.

Steve Wooledge: Great, thanks. I want to share my screen as well. Can you see it okay?

Eric Kavanagh: Yes, I can.

Steve Wooledge: All right. Everyone, my name is Steve, and I work for Arcadia Data. Happy to be here. I've worked in the industry, along with Eric and Wayne, for, gosh, maybe 15 to 18 years now. I've worked at relational database companies, at analytics startups, at Hadoop vendors like MapR, and now here at Arcadia Data, and it's fun to see how the industry is evolving and how customers are using different technologies in different ways. What I'll talk about is that fourth option that Wayne pointed out: business users getting value from data lakes using what we call native BI and analytics.

As a quick snapshot: back in 2008, when I was at a small data startup company, everybody talked about big data, and it was all about moving from structured data to wilder, multi-structured data, things that didn't fit as neatly into rows and columns, things like JSON, weblog traffic, data off of sensors, those types of things; about batch workloads within Hadoop, as Wayne talked about, moving to more interactive and real-time as well; and of course about big data in terms of volume, but with a lot of complexity too. We're way past that now: the platforms have evolved and relational databases have evolved, but I think the need for agility on this data remains. People don't want to have to structure it all in advance; as Wayne said, they want to be able to query things as they lie without doing a lot of structuring. In some cases you want to be able to query search indices, events and documents, as in document databases, things like that, and you don't want to have to ETL and transform the data and have it modeled perfectly into the environment first. So you might want to do the transformation in place, in the data warehouse or in the data lake, or discover the data before you transform it into something ready for reporting. So there's been a lot of change, driven by the nature of hardware and costs coming down, but our observation at Arcadia has been that there really hasn't been a lot of innovation around the BI technology itself. SQL is still the language of choice for business users, but BI tools don't necessarily handle the scale or the complexity of the data on the platforms that are out there. That's really what we set out to address. And if you're a Game of Thrones fan, the question becomes: can you stand up to the big data analytics requirements that are out there? For people who don't know, that's Jon Snow, who decided to charge an entire army on his own. I'm just having some fun, but really, we founded the company with a mission of connecting business users to big data.

As Wayne said, data lakes today tend to be the realm of data scientists, developer tools, those types of things, where you want to go after the raw data; you don't necessarily want it structured, because you don't want to lose any signal in the noise, so to speak. But there's a lot of value in that data lake that business users can get access to as well. I think data lakes today often get treated like a development environment for finding and discovering information, but if the data is already there and you've found some insights, why not share them with a lot of people from where the data sits? You don't necessarily need to move it into a special-purpose system that handles concurrency and SLAs and dynamic workload management and that kind of stuff.

So that's kind of what we do. We've been around since 2012, and we've gotten some recognition from Gartner and Forrester in different technology areas; Forrester calls what we do "Hadoop-native BI," which is a different category from traditional BI. And we have a lot of big customers with data lakes treating us as the standard for the data lake for their BI, which is distinct from their data warehouse: this is not replacing data warehousing, these are new use cases, new data and new applications that companies like Citibank or Procter & Gamble are deploying, using Arcadia as a front end to the business user. So what I'd like to do is talk through the reasons why people are choosing a BI standard and the benefits of that, show you a quick product demo, and then we can get into questions and answers from there.

So again, the premise is that there is a whole host of BI tools that have been around for decades; I used to work for one. They're optimized for, and work extremely well on, relational technology, but they're not necessarily optimized for the openness and scale that's available within non-relational data lakes. That's not to say you can't conceptually build a data lake on a relational database, but I'm going to talk about the Hadoop-based and cloud-based object-store types of data lakes that are out there.

If you think about why people are choosing a BI standard for the enterprise, it's because the traditional relational database was highly optimized to take advantage of the hardware that was available at the time. These were closed environments, and I don't mean "closed" in a negative way; I worked for Teradata, and the amount of engineering and the performance you can squeeze out of a relational database is amazing, and the work they do to integrate with the hardware is fantastic. But you can't take a separate processing engine and run it on the same hardware where those databases are running, because it's just not designed to handle that kind of workload. So if you wanted to take a BI server and run it on the data warehouse, you couldn't really do that, and so BI servers grew up over time as a tiered model.

You've got data that sits on the server, or on a desktop; these are scale-up environments for the most part; you can cluster them, but they're not distributed systems. And what that means is: you've got to load data once into the warehouse and do your transformations, then you load it again into the BI server; you've got to secure it at multiple points; you've got a semantic layer that maps back to the schema defined in the database; and then you typically optimize the physical model, maybe twice, once in the data warehouse, if you want to do the optimization there, and again to optimize performance in the BI server. That's a choice people make from an architecture perspective, but oftentimes you're doing it in both places, so it becomes a bit of extra work. There's value in that, but you don't have a native connection, in many cases, to things like semi-structured data: if you take JSON files as an example, you're going to flatten them to put them into a table, or the BI tools require the data to be in more of a relational format before they can query against it. And these are not parallel environments, as I talked about.

So the idea with Arcadia was to take advantage of the openness of systems like Apache Hadoop, which allow you to have multiple processing engines running on the nodes where the data sits. The whole idea is to bring the processing to the data, not take the data to the processing, especially when you're talking about petabytes of data. So we took advantage of that: we built a BI server, essentially, that runs on the nodes in the data lake as a fully parallel, distributed system, with the performance that comes with that. The back-end side is also hugely valuable: we inherit the security that's already in place; we do the physical modeling in place; we give you a business semantic layer over the data, so you can look at and define business terms directly in place; and we understand where the data is located from a distribution perspective, hashing and so on, so that we can create query plans that are highly optimized for a distributed environment, and you only do it once. And you get native connectivity to those data types: we can handle complex types like JSON natively, and it's a fully parallel environment.
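The flattening step Steve says traditional BI tools impose on semi-structured data, versus addressing nested fields natively, can be seen in a small sketch (plain Python; the order record and its fields are invented for illustration):

```python
# A traditional BI tool wants flat rows, so a nested JSON order record
# like this must first be exploded -- one output row per line item.
order = {
    "order_id": 7,
    "customer": "ann",
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B9", "qty": 1},
    ],
}

def flatten(o):
    """Explode the nested 'items' array into relational-style rows."""
    return [
        {"order_id": o["order_id"], "customer": o["customer"], **item}
        for item in o["items"]
    ]

rows = flatten(order)
# → two flat rows, parent columns duplicated into each

# An engine that reads JSON natively can address the nested field
# directly, with no flattening pass and no schema baked in up front:
total_qty = sum(item["qty"] for item in order["items"])
```

The flattening pass both duplicates the parent columns per item and bakes in one particular relational shape; an engine that handles nested types natively skips it.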


Now, you might say, well, I don't necessarily want to have all my data in a data lake, and for sure, no company has everything in one place. I mean, it's mind-boggling: I was at the Gartner show just last week, and they showed this graph of the number of systems people have, hundreds of databases in a big organization. So yes, you can connect other systems into a system like Arcadia. One of the things we've been innovating on is around the Apache Kafka project and our partner Confluent, who have created a SQL interface to real-time streaming data called KSQL. We've integrated with that, so you can have real-time streaming data coming into your dashboard, which could trigger an alert, and then you can drill down into detail within the data lake, or within your data warehouse environment, or your MongoDB environment, or Solr, or other types of systems where you store data. So it's not just for the data lake, but the lake is where you get a lot of the performance benefits for people who want to discover information and then also productionize it within one system that they can trust.
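The continuous, windowed aggregates that KSQL maintains over a Kafka topic, feeding the kind of dashboard alert Steve mentions, can be imitated in miniature with plain Python (this is a toy stand-in, not the KSQL API; the key name and window size are invented):

```python
from collections import deque

# Toy stand-in for a KSQL-style windowed aggregate over a stream:
# keep the last `window` events per key and expose a running count.
class WindowedCount:
    def __init__(self, window=3):
        self.window = window
        self.events = {}          # key -> deque of recent events

    def ingest(self, key, value):
        q = self.events.setdefault(key, deque(maxlen=self.window))
        q.append(value)
        return len(q)             # current count within the window

stream = WindowedCount(window=3)
counts = [stream.ingest("login_fail", t) for t in range(5)]
# counts climbs to the window cap: [1, 2, 3, 3, 3]

# A dashboard alert could fire when the windowed count hits a threshold.
alert = counts[-1] >= 3
```

In a real deployment the events would arrive on a Kafka topic and KSQL would keep the windowed count server-side; the dashboard only subscribes to the result.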

This is another way of saying some of the same things: the data warehouse BI architecture is really a scale-up environment, again optimized for the technology of its time, but it requires data movement, multiple points of security management, et cetera. There are vendors out there that have come up with middleware applications, sort of a Band-Aid approach, to allow traditional data warehouse BI tools to connect to a data store within the cluster; they put this on an edge node, or a series of edge nodes, and that works okay, but you've still got multiple points of integration and security, and you really don't have the semantic knowledge about the data down on the data nodes. You're still pulling data out; you've lost information about what's where; the filters and aggregates, where should those be applied? You're simply passing SQL back and forth between the BI tool and that middleware box, which is interpreting things and pulling data back from the data nodes. And those cubes are typically built in a nightly batch run, and you've got to build them in advance based on what you think people will want to query, so you lose a little bit of the freestyle nature of being able to query ad hoc against the full dataset.

Versus data-native, or native, BI, which pushes down not only the processing but also the semantic knowledge. What we can do then is build dynamic caches of data based on the actual usage of the people issuing queries. They don't have to be built in advance based on what we think people will query; the system learns over time and builds ways to accelerate performance based on the actual usage of the cluster. And because we have that semantic knowledge, and everything else from the queries coming into the system, we like to call this "lossless": like high-fidelity, high-definition television or audio, you want your analytics to be high-definition as well, and if you lose granularity because you pre-aggregated to fit the lower scale of a BI server, you're not going to have that full-fidelity access.

And then the performance is something that really stands out as well. This is a benchmark from one of our customers, a telecommunications company that runs a webinar platform, something like the one we're on, who wanted to give business analysts and customer service reps really high performance on ad hoc queries for 30 concurrent users, so they can troubleshoot things like where the bottlenecks in the webinar platform are and answer different questions as they come up for a customer. The point of this is not to compare us with a SQL-on-Hadoop engine; we actually use those for connectivity to data. Rather, we are putting a proper BI server, if you will, within the data lake, which gives you much better concurrency and performance, the ability to return results in a reasonable amount of time. That's the kind of performance we see, and again, the way we do it is through some innovative technology we call Smart Acceleration; there's some patent-pending technology around this.

And again, in terms of agility, we want end users to be able to access the data lake cluster, get granular access to all the data, and ask any question they want. Then we have these analytical views that are recommended by the system based on machine learning: we look at which tables are being accessed and which queries are being run on a frequent basis, and we recommend back to the admin, "hey, you might want to pre-aggregate and create some aggregate tables," which we store back in HDFS or S3 and deploy, so that the next time a query comes in we can make a cost-based optimization decision about where to route that query for better performance. So you can get a hundred times better performance than just scanning the entire data lake and trying to bring back results; that's a big difference. And again, it's incremental, it's dynamic, it's based on actual usage; you don't have to build the entire cube in advance, which is a huge advantage from an admin perspective.
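The loop Steve describes (mine the query log, materialize an aggregate for the hot grouping, then route matching queries to it) can be sketched as follows; this is plain Python, and the log, fact rows and threshold are all invented for illustration:

```python
from collections import Counter

# Hypothetical query log: which column each past query grouped by.
query_log = ["channel", "channel", "show", "channel", "channel"]

# Step 1: mine the log for frequently used groupings -- the "analytical
# view" recommendation an admin would approve.
hot = [col for col, n in Counter(query_log).items() if n >= 3]

# Step 2: precompute an aggregate for the hot grouping.
fact = [("espn", 10), ("espn", 20), ("hbo", 5)]   # (channel, viewers)
agg = {}
for channel, viewers in fact:
    agg[channel] = agg.get(channel, 0) + viewers

# Step 3: route -- answer from the small aggregate when it matches,
# fall back to a full scan otherwise (a crude cost-based choice).
def viewers_by(col):
    if col == "channel" and col in hot:
        return agg        # served from the precomputed aggregate
    return None           # would trigger a full scan of `fact`

result = viewers_by("channel")
# → {'espn': 30, 'hbo': 5}
```

The real system makes a cost-based choice per query and keeps the aggregates refreshed incrementally; the sketch only shows the recommend-then-route shape.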

And really, the whole premise of the data lake was to provide more data agility, and again, whether it's built on relational technology or Hadoop doesn't really matter; the point is that you've got to be able to bring in data and iterate on it quickly. If you take a data lake and just treat it like another database, where you mirror it in the BI server, you're forced to take the data out of the data lake, secure it there, and do the performance modeling in the BI server before you can actually start doing data discovery in a coherent way. And then, oh, by the way, we forgot to put a dimension into that cube that someone wants to look at, so now you've got to go back to IT to add the dimension before you can go on to the second, or the nth, iteration of analysis that you want to do. I've lived this in my previous lives, and it takes a lot of time and cost to maintain an environment like that, and you lose the business agility, which is the whole promise of Hadoop and data lakes in the first place. So we've changed all that: we allow you to analyze data as it lies, if you will, in its original form. Yes, you do semantic modeling on it, so you can put business terms against it, and you can interpret JSON and look at the schema that's embedded within the metadata, but you go ahead and analyze, do the discovery, before you have to do productionization or optimization of that data structure. And you know what, a lot of business analysts might just want to find some insight and then go do something with it; they're not necessarily going to deploy it out to hundreds of concurrent users, so the final modeling step is optional. That gives you a lot of flexibility and faster time to value; we've essentially moved the analytic and visual discovery step from step six all the way up to step three. So that's delivering on the promise of agility that was the point of the data lake.

In summary, that's what we do: we provide business users access to all the data, complex schemas and all, on a native architecture that gives you data governance and integrated security on the data lake, and that allows you to deploy to hundreds and thousands of users in a highly concurrent workload. So with that, I will switch over and give you a demo of what we're talking about here.

I've got Arcadia date of running

here and a web browser environment

everything is HTML HTML5 browser

basis no browser plugins there's

no just stop download everything you see is just delivered

via the web the date of all sits back in the day the

lake which is huge from a governance

and the compliance perspective you not to worry about people

downloading date of the desktop and hotter than that

it said it's all browser-based and

What I'm going to do is show you a simple demo that gives you a sense of the tool, and then I'll show you a more robust application around cybersecurity, which is a big use case that we have with some of our clients. Some of them I can't mention; some I can include, like US agencies such as the Department of Agriculture, believe it or not. But what

I want to do in this case is show you how to connect to data and build a simple dashboard. So I'm going to click on Data, and it pulls up all my connections; you can see things like Solr and Kudu, and things in relational technology as well. I've got a very simple data set that was created on TV viewership data; I call it US TV viewership. Maybe it'll help find your next TV show, Wayne. And

all it's going to do is pull up a palette here, a dashboard, and bring in the tabular data in a way that doesn't necessarily do a lot for me as an analyst. So let me go ahead and look at this a different way. I'm going to edit this, and I

want to look at all viewership, all viewers over time. So I'll bring in a date dimension, as I mentioned, and for the measures I'll bring in the record count. Let me just refresh this; it's kind of filtering down, looking at the data. Okay, so for different dates over time, I see the number of total viewers at any point in time across a lot of different TV channels. As an advertiser, I might want to know what shows people are watching, what time of day they're watching, those types of things. So let's visualize this in a different way. What we've done

is we've embedded machine learning, not only into the back end for performance optimization, but also into the front end, to assist people with the right ways to visualize data. If I just click this button, Explore Visuals, it's actually showing me different visualization types using my data, and I can compare which is most useful to me. Do I want a standard bar chart, a scatter plot, a bubble chart? Or maybe this calendar heatmap would be interesting, since we're talking about time. So here I'm looking at the total numbers of viewers, and I've got hot spots on things like Sunday, when maybe sports are happening, or, you know, your favorite gospel show could be something on Thursday, but I'm not really sure what that is. One thing I can do is just save that away to my dashboard.
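For readers, the calendar heatmap being described is, under the hood, just a count of events per calendar day, which the chart then lays out by week and weekday. A toy version of that aggregation, with made-up viewing events (Python/pandas, not Arcadia's implementation):

```python
import pandas as pd

# Hypothetical viewership records: one row per viewing event with a timestamp.
events = pd.DataFrame({
    "ts": pd.to_datetime([
        "2018-01-07 20:00", "2018-01-07 21:00",  # a Sunday
        "2018-01-11 20:00",                      # a Thursday
        "2018-01-08 20:00",                      # a Monday
    ]),
    "channel": ["ESPN", "ESPN", "NBC", "ABC"],
})

# Events per calendar day: the numbers each heatmap cell would encode as color.
per_day = events.groupby(events["ts"].dt.date).size()

# Totals by weekday: this is where "hot spots on Sunday" would show up.
weekday_totals = events.groupby(events["ts"].dt.day_name()).size()
print(weekday_totals.to_dict())  # Sunday stands out with two events
```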

I'll close this, and I've got that visual. Now I want to look at something a little bit different, which would be to break things down by, you know, the channels and the different things that I want to look at, so I'll go back to my edit button.

So now I'll look at channel and program as my dimensions, and for the measures we'll look at record count again, and refresh the visuals. Now it's going to break things down a little bit more by what the top channels are and which programs are most popular. But again, I want to visualize that in a way that speaks to me a little bit better, and this again will recommend some different visual types. You've got your standard bar charts and scatter plots; we've got some things like network graphs down here, which are really interesting and dynamic but not something I want to use for TV viewership, so I'll take your traditional horizontal bar chart.
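The "top programs per channel, with a network filter" view being assembled here reduces to a grouped count plus a filter. A hedged sketch with invented rows (again generic pandas, not the product's query engine):

```python
import pandas as pd

# Hypothetical viewership rows: channel, program, and day of week.
views = pd.DataFrame({
    "channel": ["BET", "BET", "BET", "ESPN", "ESPN"],
    "program": ["Hip Hop Awards", "Hip Hop Awards", "Movie Night",
                "SportsCenter", "SportsCenter"],
    "day":     ["Wednesday", "Wednesday", "Friday", "Sunday", "Sunday"],
})

# Top programs per channel: the counts behind the horizontal bar chart.
top = views.groupby(["channel", "program"]).size()

# The dashboard filter: restrict to one network, then rank its shows.
bet = (views[views["channel"] == "BET"]
       .groupby("program").size()
       .sort_values(ascending=False))
print(bet.to_dict())  # {'Hip Hop Awards': 2, 'Movie Night': 1}
```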

Oh, you know what, I forgot to put the filter on, so that's going to take a while to come back. But the final result would be this bar chart on the right, which shows the different channels and which shows are popular, and I've added a filter that allows the user to select something like the BET network. Say I wanted to talk to an advertiser about the best days to advertise on BET and which shows: now you can see what those shows are pretty quickly, clicking around a hotspot and filtering by, okay, this day, Wednesday, when a bunch of people were watching. What were they actually watching? It was the BET Hip Hop Awards, it was Death at a Funeral, those types of things. So this is a very simple visual to show you what you can do connecting to data and visualizing it. Yeah, that's a very simple use case, but big enterprises want to

build applications that really help them do things like stopping cybersecurity attacks. So we've built an application with one of our partners, Cloudera, around the Apache Spot project. This is an open source project which brings together a community response on the best way to visualize threats from a network and user perspective, as well as endpoints in the network.

There are machine learning algorithms included as part of this project, and Arcadia's part in this is to contribute visualization types that can help people spot issues and anomalies in a visual way, to not only detect attacks but to do greenfield threat hunting and things like that. So, to show that a little bit better, the idea here is

that you can have something like an executive summary view: a dashboard that's created using machine learning to bubble up high-potential threats, from an end-user perspective and from endpoints, and there are some ways that you can feed back to that model so it learns over time. This gives you that bird's-eye view of what's happening across your entire enterprise.
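That feedback loop, where an analyst dismissing or confirming an alert re-ranks similar alerts later, can be illustrated with a toy score-update rule. To be clear, this is purely illustrative; it is not Apache Spot's or Arcadia's actual model, and the numbers are invented:

```python
# Toy sketch of an analyst-feedback loop: each verdict nudges the alert score
# toward 1.0 (confirmed threat) or 0.0 (dismissed as benign).

def update_score(score, analyst_says_threat, lr=0.5):
    """Move the alert score a fraction of the way toward the analyst's verdict."""
    target = 1.0 if analyst_says_threat else 0.0
    return score + lr * (target - score)

score = 0.9                         # the model initially flags a demo box as high risk
score = update_score(score, False)  # analyst dismisses it: 0.9 -> 0.45
score = update_score(score, False)  # dismissed again:      0.45 -> 0.225
print(round(score, 3))              # the same pattern now ranks much lower
```

The design point is simply that repeated human verdicts shift future rankings, which is what "the model learns over time" amounts to in the demo narrative.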

If you look into the network from a security analyst's perspective, I'm going to look at NetFlow data over time within my environment, and again machine learning is being used, in the bottom left, to bubble up suspicious activity. As a security analyst, I know a lot about the systems that are there, so I might look at this top threat and say, well, you know, this is a demo environment, so that's a pretty low score in terms of an actual threat, and I'm going to reject it; some of these other ones are maybe a little bit higher. That feeds back to the model, which can learn over time and improve the accuracy of what the machine is doing to detect potential threats. And you can do things like,

you know, pick the time slider here, and it's going to change the network graph. Over here we're looking at the flow of data between endpoints, end to end, and I selected too little data,

this being a demo application, but you can look at the thickness of the line to understand what the strong connections between systems are, identify for a specific endpoint what other endpoints it's connected to, and drill down to ultimately see all the detail that's there.
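The "edge thickness equals connection strength" idea is just total traffic per source/destination pair. A minimal sketch over made-up NetFlow-style rows (illustrative only):

```python
import pandas as pd

# Hypothetical NetFlow-style records: source, destination, bytes transferred.
flows = pd.DataFrame({
    "src":   ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "dst":   ["10.0.0.9", "10.0.0.9", "10.0.0.9", "10.0.0.3"],
    "bytes": [5000, 7000, 200, 100],
})

# Edge weight in the network graph: total volume per (src, dst) pair.
edges = flows.groupby(["src", "dst"])["bytes"].sum().sort_values(ascending=False)
print(edges.index[0])  # the thickest edge: ('10.0.0.1', '10.0.0.9')

# Drill-down on one endpoint: everything a suspicious host talks to.
peers = flows[flows["src"] == "10.0.0.1"]["dst"].unique().tolist()
print(peers)
```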

So for a specific IP address, we can click into that, and it takes me into an exploration view, where now there's some workflow defined for the security analyst: they can collect all the data in one place, click on the username, proxy actions, etcetera. I'm not a

security analyst by nature, but essentially they can do some analysis here, and then if you want to share that with other people, it's as simple as going up here: you can email it to somebody, or get the URL and copy and paste it into a case management system, and when someone logs in with the right authentication, they can see all this data in the context of where the analyst left off in the exploration. So that's the kind of thing that we want to do on a very large scale, for innovations around cybersecurity, IoT,


you know, just general marketing applications and things like that. Hopefully that gives you a flavor of what we do. Just to wrap things up from my perspective: if you want to learn more about Arcadia, there are some links we can leave up, and with that I'll turn it back to Eric to take any questions we've got.

Great, and we do have questions, a whole bunch of questions, so let me just dive right in. You showed a pretty cool demo there, by the way; I love that Spot stuff, and I love the machine learning there to surface things to look at. That seems to me one of the best use cases for machine learning: to kind of help separate the wheat from the chaff and point you in the right direction. So, lots of questions here. One is: the TV viewership data, is that structured or unstructured, what kind of data was that? Yeah,

I think that was just an open data set; I didn't set up the data set myself, but I believe that was structured data. We have examples of taking JSON and visualizing it on the fly without flattening it, but that was not the case with the TV viewership data.


And let's see, one of the users is saying that, according to their own policies at their company, they have access denied to external file sharing or storage. Do you guys have any ways around that, or what would your recommendations be there? I

didn't quite get that. So they're not allowed to do file sharing, was that the question, external file sharing or storage? I'm not exactly sure what they're asking, but I guess the point I would make is that, again, all the data stays in the data lake. You have the option to allow somebody to download that data to Excel or whatever they want to do with it, but in some cases, we have a very large healthcare organization whose challenge was that they had traditional BI tools where people were downloading data to their desktops, and they had to try to keep track of all that from a data governance perspective. That was one of the reasons they wanted a native BI approach for the data lake: people could still do their querying, reporting, and analysis in one environment, but they'd be restricted from pulling that data down to a separate system somewhere. Okay,

good. And here's another one; nothing wrong with really detailed questions, folks, and if we don't get to yours in this event, we will forward them on to our presenters today. Here's a question from an attendee asking what kind of measures there are in the architecture for data sensitivity, like masking sensitive information, etcetera. Can you speak to that, Steve?

Yeah, from a high level, we will interpret, or I'm sorry, we will inherit any existing security protocols that are in the underlying data platform. Meaning, if you've got Apache Sentry or Ranger or some security model within your environment, we inherit those role-based access controls. Now, last I checked, I don't know that those do much around masking, and that can be a lot of pain, so there are third-party systems, I'm forgetting some of the names now, but we partner with those third parties in the security space that would do the data masking and things like that. So we don't provide the full granularity of all those different security protocols within our system, but that's why you have a lot of these third-party providers. Anything written in a project like Sentry or Ranger, we leverage.
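For readers, the role-based access control being described (defined once in something like Sentry or Ranger, then inherited by the BI layer) can be pictured with a toy example. The roles, columns, and filtering rule below are invented purely for illustration, not taken from any of those products:

```python
# Toy illustration of inherited role-based column access: each role sees only
# the columns its policy allows; a BI layer applying the policy at read time
# never exposes the restricted fields.

ROLE_COLUMNS = {
    "analyst":   {"region", "diagnosis_code", "visit_date"},
    "marketing": {"region", "visit_date"},   # no access to sensitive columns
}

def visible_columns(row, role):
    """Return only the fields this role is allowed to see."""
    allowed = ROLE_COLUMNS[role]
    return {k: v for k, v in row.items() if k in allowed}

row = {"region": "east", "diagnosis_code": "E11", "visit_date": "2018-03-01"}
print(visible_columns(row, "marketing"))  # diagnosis_code is filtered out
```

Actual masking (showing `***-**-1234` instead of dropping a field) would be a variation on the same per-role rule, which is the piece Steve says third parties handle.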

Here's a really, really good question, and Wayne, feel free to chime in on it; I'll throw it over to Steve first and then Wayne if you want to weigh in. The question: what concept in the architecture replaces the cube or data mart, like Essbase? And I think that's just the massively parallel nature of the technology, right, Steve?

No, that's a very informed question, and I was careful, I tried to be careful, not to say the word "cube," because I think a cube has a very specific notion in people's heads, like Essbase, where again you're building that cube in advance; you can build multiple cubes, and it becomes an IT overhead and burden at some point. So we've tried to minimize that burden. As I was talking about, we call them analytical views, but they're really much more than a view: there's actually a notion of dimensionality and physical data structure and modeling, both on disk on the file system as well as some things we do in memory. So you could call it a dynamic cube if you want, but we don't force you to build it all in advance. We build it incrementally over time, and we'll recommend dimensionality to add to speed up query performance over time; we call that our Smart Acceleration process, and it works with you as you go.
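That incremental, build-it-as-queries-arrive idea can be sketched with a toy materialization cache: watch which dimensions actually get queried, and keep an aggregate around once a dimension is hot. The threshold and logic below are invented for illustration and are not Arcadia's actual Smart Acceleration algorithm:

```python
# Toy sketch of incremental acceleration: a grouped aggregate is materialized
# only after the same dimension has been queried often enough, instead of
# pre-building every cube in advance.

from collections import Counter

query_log = Counter()   # how often each dimension has been queried
materialized = {}       # "analytical views" kept around for hot dimensions

def run_query(rows, dim, threshold=3):
    query_log[dim] += 1
    if dim in materialized:                  # serve from the analytical view
        return materialized[dim]
    agg = {}
    for r in rows:                           # otherwise, scan the raw data
        agg[r[dim]] = agg.get(r[dim], 0) + 1
    if query_log[dim] >= threshold:          # hot enough: keep the aggregate
        materialized[dim] = agg
    return agg

rows = [{"channel": "BET"}, {"channel": "BET"}, {"channel": "ESPN"}]
for _ in range(3):
    result = run_query(rows, "channel")
print(result, "channel" in materialized)  # materialized after repeated use
```

Nothing was built in advance; the "view" appeared only once usage justified it, which is the contrast with an Essbase-style pre-built cube.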

But yeah, you could call it a cube if you want; it just doesn't have some of the legacy baggage of what people think about with Essbase, and I'm not trying to bash Essbase or anything like that, it was just designed for a different purpose, right? Yeah.

Wayne, do you want to comment on that real quick? Yeah, we're definitely moving away from the world of physical cubes, whether in Essbase or products like that, or even in the cloud. It seems that with all the horsepower we have with in-memory processing, we can build these dimensional views on the fly or maintain them in a dynamic cache, like Steve was saying. A lot of vendors are doing this kind of thing, each with their own twist on it, and many of them pull the data out into their own scale-out in-memory cache. What I like about Arcadia is that the data doesn't go anywhere; it just stays in Hadoop, not moving outside of the cluster. But yeah, there are a lot of ways to skin a cat, and the days of the physical cube seem to be pretty much over.

Yeah. We've got a bunch more good questions here, folks; thanks for sending these in. So an attendee here is asking about metadata: can you talk about metadata management and what kind of functionality you have there? There have been some open source projects that have tried to address that, and I know that, talking with other analysts, at least in the early days of the Hadoop ecosystem, it felt like they were all making some of the same old mistakes again by not really focusing on metadata. But Steve, can you talk about how metadata is handled with Arcadia?

Sure. Metadata is handled just like you would expect within a BI tool. We have the notion of a semantic layer, as one example, where the business person in, you know, finance can name tables and columns within the data lake based on the business terms that they're familiar with, and that could be a different term within a different department, let's say, that maps back to the same data. We can also leverage any metadata that's been defined, things that have been set up in the Hive metastore and other systems like that. We partner with companies like Trifacta and StreamSets to leverage any ingest and transformation types of things that they do, and data catalogs and things like that with Waterline. So there's a robust metadata environment around these things, which we all know is required to have a governed environment. Early on, Hadoop didn't have as much going on in that area, but I think there's a robust ecosystem around metadata management in Hadoop and in data lakes now. We take advantage of that as you would expect a BI tool to; you can define some of that with our tool and do some lightweight transformations and naming of metadata and things, but we again rely on third parties that specialize in those things, just like you would in a relational environment.

Okay, good. And folks, we will stop right at the top of the hour; yours truly has a hard stop, but I'll try to get to as many more of these questions as I can. There's a really good one that I think you guys can shed some light on. The question is around how you get data into Arcadia; of course, you guys sit right inside the cluster there, right? So the specific question is something like: how does a user define their own data lake or other input to Arcadia Data? And that's kind of a problem that you solved out of the box, right, by embedding right inside the cluster?

Yeah, exactly; there's no expensive importing and moving of data, we're just creating views. And it's kind of funny, we actually had a little internal discussion around the naming, because if I go to this Data tab here, I think I'm still sharing, what we call data sets are actually what I'd call a semantic layer: a view of data that's already in the cluster. So we just define what's in that data set through metadata when you pull it up. I'm not as up to speed on all these different connections and things like that, but yeah, there's no data movement; it's just creating views on top of the data that's in the environment and defining these different semantic layers, which you can again name with different terminology and measures and things like that.
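A semantic layer of this kind is, at its simplest, a mapping from physical column names to department-specific business terms, applied at read time rather than by copying data. A toy sketch with invented names, not Arcadia's actual metadata model:

```python
import pandas as pd

# Physical columns as they exist once in the data lake.
physical = pd.DataFrame({"cust_id": [1, 2], "rev_amt": [100.0, 250.0]})

# Hypothetical semantic layer: each department's preferred business terms
# for the very same underlying columns.
SEMANTIC = {
    "finance":   {"cust_id": "Account Number", "rev_amt": "Recognized Revenue"},
    "marketing": {"cust_id": "Customer ID",    "rev_amt": "Sales"},
}

def dataset_view(df, department):
    """Same data, presented under a department's terminology; nothing is copied."""
    return df.rename(columns=SEMANTIC[department])

print(list(dataset_view(physical, "finance").columns))
# ['Account Number', 'Recognized Revenue']
```

Two departments can use different terms that "map back to the same data," exactly as described above, because only the label changes, never the stored values.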

Okay, good. And Wayne, I'll just throw it over to you real quick. It's a very clever move by Arcadia, it seems to me, to embed right in there; the general movement in the industry is away from movement, right? Away from moving data. I remember, way back in the data warehousing days, Foster Hinshaw telling me about the whole concept of putting the processing where the data lives, and that's the direction we seem to be going. Obviously there's going to be a very long tail to the old way of doing things, but that's pretty clever. Wayne, what do you think?

Yeah, I've come out with a little manifesto about ten characteristics of a modern data architecture, and that's one of them: don't move the data. We're not there yet, almost there, because most people are still pushing data out into a BI cache, like some of Arcadia's competitors, or into a relational data warehouse, so we are still moving data around. What I like about Arcadia is that it does hold fast to that characteristic of a modern data architecture. People are definitely using the processing power of scale-out, in-memory architectures to reduce the need for a lot of back-end modeling, pre-processing of the data into a cube or a database, and they're spending more of their modeling time on the front end, where they're basically creating views of potentially complex data sets in the back end and simplifying them for users, relying on the power


of that platform to pull all that data together in real time, and caching data only where it's absolutely needed for performance. The modeling also gets minimized compared to what we used to do, where everything was pre-aggregated and there was no access to detail. That's right. Yeah, that's

a straw in the wind. One of the questions here, and I'm pretty sure the answer is yes, is whether the preferred storage format is Parquet. Yeah, that's our preferred structure and storage format; the data is in the Parquet format.

Yeah, I kind of figured that. Let's see, lots of other questions, though; I'll try to get to as many as possible. Here's a request that came in a while ago about real-time data, real-time streaming data, and custom-designed mechanisms in the data lake: how do you deal with streaming data?

Yes. For streaming data today, our integration works a couple of different ways. People will talk about Spark Streaming, right, as one mechanism, and in that case we wait for that streaming data to land from Spark into the system and then we visualize it from there; so it's not really real time, but it's sub-second once it lands and we can visualize it. Another really innovative thing is Kafka, or I should say Confluent released the KSQL interface to Kafka streams, Kafka topics I should say. That's now generally available, and we were one of the early adopters, in fact I believe the only BI tool right now that can visualize on top of KSQL; for us it's just a connection. We've got a demo up on our website, a video we can send out later showing that in action, but that's something that's in our latest release that people can download, explore, and try today, with support for real-time streaming within the dashboard, and then be able to take action.
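The kind of continuous aggregation KSQL provides over a Kafka topic can be mimicked in miniature: keep a running count per key as events arrive, and let a dashboard poll the current totals. This stand-in is pure Python with made-up events; it involves no actual Kafka or KSQL:

```python
# Toy stand-in for a streaming aggregation: a running count per key,
# updated event by event, that a dashboard tile could poll at any moment.

from collections import defaultdict

counts = defaultdict(int)

def on_event(event):
    """Called once per arriving event; returns a snapshot of the live totals."""
    counts[event["channel"]] += 1
    return dict(counts)

stream = [{"channel": "BET"}, {"channel": "ESPN"}, {"channel": "BET"}]
snapshot = None
for ev in stream:          # in a real pipeline these would arrive continuously
    snapshot = on_event(ev)
print(snapshot)  # {'BET': 2, 'ESPN': 1}
```

The contrast with the Spark Streaming path described above is that nothing has to land on disk first; the aggregate is updated in memory as each event arrives.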

If we've got time, I can actually pull up a quick little example of what I mean. So we've got this IoT demo; this is another one we built with one of our partners, and in this case it's an environment where you're looking at, say, a use case where a fleet manager is managing a fleet of cars, and they want to measure what's happening out in the field with those cars. So you've got an event stream; this is more of the real-time information about where a car is located, whether there are different incidents happening, and in real time you're sort of getting that information in here and you can see things update. In this case it's just writing data into the file system, or for us, Solr indexes for search, or some more real-time style of application; it's not a true stream, you're not reading it in memory, but it's pretty fast. Then you can drill to detail from here to look into one of those specific VIN numbers for a car that just got into a hazardous situation or something like that, and go into a detailed view of what's happening. Then you might want to look at correlation analyses for that VIN number and the different things that are happening; again, this is demo information, so it's not all there, but you have a real-time dashboard that can be updated, and then you can drill into detail because you've got all the information in one place. I

love it, I love this stuff. Folks, we're watching the future here. There's a great quote by William Gibson: the future is here already, it's just not evenly distributed, at least not yet. Like I said earlier, there is going to be a very long tail to the old way of doing things; you heard Wayne state that 95 percent of environments are still dealing with strictly batch processes and the other ways of getting the job done. But this is the future, this is the direction we're going. A big thanks to Wayne for his time today, and of course to Steve Wooledge of Arcadia Data.

You will get that assessment pop-up when we close out this WebEx, so by all means, folks, please do take the three to four minutes to go through that little puppy and let us know what you think. You can always email yours truly at info at insideanalysis.com. Hope to hear from you tomorrow on DM Radio, with some big news as well on that front: we're now coast to coast on AM radio, from Jacksonville to Atlanta and Chicago, all the way out to Los Angeles. Hope to hear you on the show sometime, and you can always tweet me with the hashtag DMRadio. And with that, we'll bid you farewell, folks. We do archive all these webcasts for later listening and viewing, so feel free to come back, share it with your colleagues, etcetera. Otherwise, we'll talk to you soon. Take care. Bye-bye.