Are Data Lakes for Business Users?

Hosted by DM Radio

Eric Kavanagh


Steve Wooledge

VP of Marketing

Wayne Eckerson

Founder and Principal Consultant

Data lakes took root because they provided an instant sandbox for data scientists to explore raw data sourced from operational, analytical, and external systems.

As data lakes have matured, organizations have begun to ask whether they can be used to support more traditional business users — executives, managers, and front-line workers who want to learn from curated data and dashboards, not wrangle with raw data and SQL.

Please join us for this joint webinar with the Eckerson Group and the Bloor Group. In this webinar, several industry experts will discuss:

  • The evolution of data lakes and analytical tools
  • Whether business users are really taking advantage of these new data constructs
  • How organizations can measure the efficacy of their data lakes with seven key metrics

More About the Presenters:

Eric Kavanagh

Eric has more than 20 years of experience as a career journalist with a keen focus on enterprise technologies. His mission is to help people leverage the power of software, methodologies and politics in order to get things done.

Steve Wooledge

Steve is a 15-year veteran of enterprise software in both large public companies and early-stage start-ups and has a passion for bringing innovative technology to market.

Wayne Eckerson

Wayne is a long-time thought leader in the BI and analytics field who has a passion for helping business and technical executives and managers strengthen their leadership skills and increase their effectiveness to drive positive change in their organizations.


Welcome, my name is Shannon Kempe, Chief Digital Manager at DATAVERSITY. I'd like to thank you for joining today's DM Radio webinar, "Are Data Lakes for Business Users?", sponsored by Arcadia Data, continuing the conversation from a live DM Radio broadcast a few weeks ago, which, if you missed it, you can listen to on demand at DMRadio.biz under Podcasts.

Just a couple of points to get us started. Because of the number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. To ask a question, use the Q&A section in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share comments and questions using hashtag #DMRadio. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar.

Oh, and welcome, welcome everybody. Thank you, Shannon. Yes indeed, it's time for another webinar: "Are Data Lakes for Business Users?" Obviously that is a slightly rhetorical question, and we have a great panel for you today: Steve Wooledge of Arcadia Data, of course, there in the middle, and my good buddy Wayne Eckerson of Eckerson Group. Wayne and I go way, way back; we have a long history together focused on things like data warehousing, and now of course data and data lakes.

There is a concept I wanted to touch on real quickly before I hand it over to Wayne to give us some results from his assessment, and that is the whole concept of data science. We keep hearing about data science, and it reminds me of one of my favorite quotes, from Esqueleto in the movie Nacho Libre. We hear all about science these days, and data science as well. We all know the numbers don't lie, but they sure can be misused or misrepresented.

To put this topic in perspective: what we're trying to accomplish here today, and what we're trying to accomplish in the broader business intelligence, analytics, and big data market, is to use data to get insights, to be able to make better decisions for our business. The mission is the same; the mission has not changed. The tools have gotten much more powerful. We now talk about data lakes as opposed to data warehouses, and they are in fact very different things. They are designed very differently; they were developed in different eras of this industry. And let's face it, there was a whole set of constraints many, many years ago, when data warehouses were designed, around which they were built: processors were slow, pipes were thin, and memory was expensive, for example. All these factors really dictated what had to happen in creating data warehouses. They were also extremely expensive. Compare a data warehouse deployment today versus one 25 years ago; it's astonishing. The price has gone from millions of dollars, to hundreds of thousands of dollars, to even $100,000 or less, depending upon the use case and the complexity.

Communication is hard to do. I think people take communication for granted, and at the end of the day, if you're not communicating clearly with your team and your business users what you can glean from the data, then really you've been on a fool's errand. So that's something to be considered here.

Data science: I think the term is applicable these days. I think it is an accurate term that we can use to describe some of the more robust, well-thought-out, and efficient environments for managing data. But I think there's a significant disconnect in our culture today when we think of this term, science. A lot of people believe that it represents a virtually unassailable version, or representation, of reality, and that's just not true at all. Science is a discipline, and it relies on a methodology, a.k.a. the scientific method, which, when applied appropriately and effectively and efficiently and responsibly, can give us great insights about the world around us. But remember what is axiomatic to the scientific method, what is fundamental and intrinsic to it: a commitment to forever question your data, your processes, your hypotheses, and even your conclusions.

So again, my point is that we need to take the term "science," or the concept of it, with a grain of salt here, because scientists change their minds. Remember how eggs were bad for you? Remember how scared everyone was of eating eggs about 10 or 15 years ago? All that cholesterol in eggs: you're going to get a heart attack. And then what happened? They came out with good cholesterol and bad cholesterol, right? What does that mean? I think the point is that science will change its mind about things sometimes. Frankly, scientists can also be paid by large organizations to say things that they probably don't believe, to distort the reality that we're all trying to better understand.

Here's just one quick example of how much we really don't know these days: when will the lava flow stop in Hawaii? We just don't know, and the reason we don't know is that the Earth is a really large environment, and things like volcanoes are extremely hard to predict. They're very powerful, very complex, and we just don't fully understand what's going on. Such is the magnitude of the problem space, if you will: trying to understand where the lava is going to flow, where the fissures will come from next, what that volcano may do next week. We just don't know.

So I think it just pays to remember that data will always require analysis. No matter how efficient you are with a data lake management project, for example, you're still going to have to analyze that data: to put it into context, to view it in reference to your current situation and to the historical data that you may have. No data is ever going to give you the complete and total answer, or the story, because you have to come up with the story yourself. So leveraging what you know is important. Data lakes, analytics, and big data are all very useful and valuable if we understand what we know and roughly what we're doing.

And so with that, I'm going to hand it off to my good buddy Wayne Eckerson, who is going to talk about the assessment that we've done, on behalf of Arcadia Data and our end users, about data lakes and the value they provide for business users. So with that, Wayne Eckerson, I hand it over to you.

Thank you, Eric. It's great to be here with you once again, and with DATAVERSITY and everyone in the audience. Whoa, that didn't work; that image, for some reason, is not showing up. That image of a data lake, and an actual lake.

So, this webcast is about the business value of data lakes. As Eric mentioned, data lakes arose a number of years ago, almost 10 years ago, when Cloudera was founded, really to address a lot of frustration on the business side with data warehousing: you know, too slow, too hard to design, too hard to change, very costly; scalability that capped out at a couple of terabytes; and it really didn't handle unstructured data very well. So fast forward to the data lake and Hadoop, and that made a lot of people happy.

However, it did not replace the data warehouse. What the data lake became, in essence, very quickly, was not a data warehouse replacement but really the ultimate, ideal sandbox for data scientists: power users who really wanted what they've always wanted historically, which is big, giant data dumps, and to get IT out of the way. The data lake's central idea was: just put all the data in one place, and then let me go in and navigate it, manipulate it, manage it, analyze it, and create models from it.

So the data lake, in its first incarnation, turned out to be great for data scientists and data analysts: power users who wanted access to the raw data. It really wasn't a replacement for the warehouse, which supported standard dashboards and reports for what I would call casual users: executives, managers, and front-line workers, who really needed tailored and curated information.

So we asked recently, when we got together with Arcadia: what about data lakes and regular users? Users who don't know SQL, Python, or Java, which were the tools of choice for Hadoop-type processing and analytics; who need a graphical interface to analyze data, also known as a BI tool; and who require clean, curated, aggregated data. In other words, someone typically needs to go in, take the raw data, and massage it, clean it, and integrate it, so that these casual users can make sense of it without having to do all that manipulation themselves. These are also users who need sub-second query performance, and who use data in reports and dashboards that are highly tailored to them, so they only see what they need and nothing that they don't: highly refined dashboards that really meet their needs, letting them glance at KPIs and take action if things go awry.


Wouldn't that be a good thing for regular users, regular Joes if you will: executives, managers, front-line workers, even customers and suppliers? But we wanted to look at the hard data. Let's test this; let's do an assessment and figure out if this is still the case, whether data lakes are still just for power users or not.

So we did an assessment. We came up with a survey of 22 questions that took about 5 minutes to complete. Once respondents completed it, the Eckerson Group assessment, our survey generator, produced a dynamic report, as you can see here on the right. It's personalized: it gives you a score comparison to everyone else, overall and by category, plus recommendations for next steps based on your rank in the scoring. The assessment is still running now, and I encourage you to go out and take it at the link below, to assess the value of your data lake, if you have one, for your regular business users.

So, as of April 20th, when I put these slides together, almost 200 people had started the assessment and 162 had completed it. A subset of those have a data lake in production, and those are the folks we really wanted to focus on. About 74% were from North America, and about half were fairly large organizations with more than 10,000 employees. The results that follow are based on that subset of the respondent base. I think we're up to almost 250 respondents now, and we'd love to have you get us up to 300, so write down that URL and go take it. It only takes 5 minutes or less to complete the assessment, and you get your own personalized free report.


So what did we find out? First of all, a little bit surprisingly, most people, almost two-thirds, are using Hadoop for the data lake. I hope that's not too surprising; the data lake has become synonymous with Hadoop. But in the last couple of years we've seen a real rush to move these data lakes into the cloud and replace Hadoop with cloud object stores, which are currently used by 14% of respondents in our pool. Not surprisingly, 17% are running their data lake in a relational database, and some of you might think that's an anomaly. But truly, if you used the classic method of designing a data warehouse, it always called for a staging area: essentially a place where you put your raw data before you turn it into third normal form, before you create the downstream structures and push the data out to the data marts from there.

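The staging-area pattern Wayne describes can be sketched in a few lines. This is a hypothetical illustration, not any tool from the webinar, using Python's built-in sqlite3: raw records land untouched in a staging table, and a cleaned, typed table is then derived from it for downstream marts.

```python
import sqlite3

# Hypothetical illustration of the staging-area pattern: raw data lands
# as-is, then is cleaned and conformed into a downstream table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- staging: raw data exactly as it arrived, everything as text
    CREATE TABLE stg_orders (order_id TEXT, amount TEXT, region TEXT);

    -- conformed table fed to the data marts
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         amount   REAL,
                         region   TEXT);
""")

# Raw feed: inconsistent casing, stray whitespace, amounts as strings
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)",
                 [("1", " 19.99", "west "), ("2", "5", "EAST")])

# Transform step: cast types, trim and normalize values
conn.execute("""
    INSERT INTO orders
    SELECT CAST(order_id AS INTEGER),
           CAST(amount AS REAL),
           UPPER(TRIM(region))
    FROM stg_orders
""")

rows = conn.execute(
    "SELECT order_id, amount, region FROM orders ORDER BY order_id"
).fetchall()
print(rows)  # cleaned, typed rows ready for a mart
```

The point of the separation is that the staging table stays a faithful copy of the source, so the transform can be re-run or audited at any time.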
Also, 6% said a NoSQL database. A NoSQL database is not an analytical database by any means, but it certainly can hold a heck of a lot of data that can be used for analysis.

The second question here: how do most users query the data lake? This was very surprising. You know, data scientists tend to prefer tools like Python, Perl, Java, and other coding-type languages, or, in the Hadoop world, Pig, Hive, and tools like that, or just plain SQL, since the data in Hadoop often sits inside Parquet files in columnar format. So we were actually pleasantly surprised to see that more than half are using a point-and-click visual BI tool to query the data lake. That was surprising. Now, I will say that both the Bloor Group and our group and Arcadia promoted this survey, and each delivered a number of respondents, so there may be some bias in there, but I don't think too much. I think we can trust that this data is generally representative of the market.

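As an aside on why columnar formats like Parquet matter for analytics: a typical analytic query touches a few columns of a wide table, and a columnar layout lets the engine scan only those. A toy pure-Python sketch of the difference (this is the layout idea only, not Parquet itself):

```python
# Toy sketch of row-oriented vs column-oriented storage (not Parquet itself).
rows = [
    {"user": "a", "country": "US", "ms": 120},
    {"user": "b", "country": "DE", "ms": 340},
    {"user": "c", "country": "US", "ms": 95},
]

# Columnar layout: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# "SELECT avg(ms)" in a row layout touches every field of every record...
avg_row = sum(r["ms"] for r in rows) / len(rows)

# ...while in a columnar layout the engine scans just the one column,
# which is also far friendlier to compression and vectorized execution.
ms = columns["ms"]
avg_col = sum(ms) / len(ms)

print(avg_row, avg_col)
```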
Okay, then we asked: where have you deployed your data lake? And you can see here that the largest percentage is still on premises. Public cloud ranges between 18 and 20 percent, so less than 20% there, and about 15.6% have a hybrid environment, both on premises and in the cloud.

I just said that we're seeing a large gravitation towards the cloud for running data lakes, but this chart actually contradicts that: it shows that companies that deployed data lakes in the last two years are more likely to deploy on premises. I'm not sure I quite understand that; it runs counter to what we're seeing generally out there, at least anecdotally. But numbers don't lie, as Eric likes to say, so we'll have to discuss that in a little bit.

Can business users explore data to get the views they want? This is part and parcel of what power users always do, and what casual users do to some extent, and you can see here that more than half, almost two-thirds, agree or strongly agree with that statement. So the data lake really is an exploration area, a discovery area, and if users are using BI tools, then we have to admit that a large percentage of those users are casual users who do want to do exploration.

We're also seeing here that the data lake, far from being a data swamp, is actually providing information and data that users find trustworthy, and that enables them to make better decisions. Of course, that's the whole point of using data: to improve your decision-making and improve outcomes for the business. So it's great to see that over 50% agree or strongly agree with that statement.

This is another surprising finding. We asked about query performance, and 50% agree or strongly agree with the statement that the data lake provides consistent performance. When you think about it, Hadoop began as a batch environment, and only recently has it become interactive with SQL-on-Hadoop engines. So things are moving very fast in the data lake world, and these platforms are able to support fast query performance and response times.

The next question was about the accuracy of analytics in the data lake, and that's another reinforcement of the notion that these data lakes aren't data swamps, and that people with BI tools not only make good decisions but trust the data that they're working with.

We also did a lot of analysis by company size, and we didn't find much variation between large and small companies, although this chart will show you that very large organizations, with over a hundred thousand employees, are a little bit more advanced: 47% strongly agree that business users can explore data to get the views they want, whereas at very small companies, with less than a hundred employees, a good 40% disagree with that statement.

We did a lot more analysis, and we're writing a report up on the results. But in general, what we're seeing, according to the data from this recent assessment, is that most data lakes today run on Hadoop on premises; that these data lakes are not swamps, whatever some gurus out there may say; that companies are able to maintain high-quality data in these lakes; and, most importantly, that they're not just for data scientists. Graphical BI tools are being used heavily, providing fast query performance for queries and exploration, and the quality of data in the lakes is suitable for regular business users.

I must admit, these results in summary (and we do have more details in the data) were a little bit surprising to me, but I think it's a good testament to how far and how fast we've come with this new technology, Hadoop, and now the cloud. And I think that is probably a good segue to our next speaker: Steve Wooledge can talk about how they're supporting both regular users and power users in data lakes using their visual BI tool.


But first a question, Wayne. I'm curious to know what you've found, or what your take is, on the people who are involved in these projects. In other words, do you find that the people who were on the data warehousing team are the same people who are working on data lakes, or are they different teams? Can you offer any context on that from your experience?

Yeah, you know, I think in the early days a lot of the data lakes were started by advanced analytics teams as experiments, to create an analytical sandbox to fast-track the delivery of analytical models, predictive models, what we'd call machine learning models today. I think very quickly those things either scaled up or failed; a lot of them did not work out. But IT took over that infrastructure, which makes sense, as an enterprise environment that can support either a lot of users across the enterprise or a very important segment of users: the power users and data scientists. So administering that environment became largely the domain of IT. Steve may disagree with me, but that's what I've seen.

And the governance teams: as these data lakes have matured, other groups have taken them on as well. You just mentioned data governance, and you see the BI competency centers being involved and setting standards for these platforms too. We'll talk more about that, but it definitely is becoming mainstream, woven into the fabric of the organization.

A lot of organizations struggle to reconcile their expenditures on data warehouses and their expenditures on Hadoop. Hadoop obviously is less expensive by the terabyte, and a lot of business people look at the budget and the bottom line of these environments and want to replace the data warehouse, but technically that has not really been feasible. There are things that companies are offloading from the data warehouse that probably never belonged there in the first place, like ETL processing or detailed data. So we're starting to see this bifurcation, at least for now, and things do change quickly: the data warehouse is well suited for supporting large numbers of concurrent users who need to do basic reporting and dashboarding, whereas the data lake is suitable for power users and for BI SWAT environments that build things really quickly: prototypes and experiments that they test and then deploy. But now we're starting to see a lot of standard applications and reporting applications also happening in Hadoop. So I think these two environments are co-opting each other; they are quickly developing the capabilities that the other one has, and they're becoming more and more alike. They'll never be the same, but the dividing line between them is getting fuzzier, and we are seeing Hadoop and the data lake taking over more and more of the functionality of analytics.

Then again, I guess, and let me just throw this over to you, Steve, real quick before you jump into your presentation: really, you do want these two environments to be coordinating and collaborating; you want there to be a lot of overlap between them. And it seems to me, and I know you guys are kind of playing in this space, but from my perspective, standing in the analyst space on the outside of all this, that I see a resurgence in business intelligence. It's almost like we went down the road of big data analytics and learned some interesting things, but maybe we weren't as tethered to the core business objectives as the world of business intelligence was. And I now see a resurgence of BI tools, enabled by more powerful infrastructure underneath, that can tap into traditional data warehouse environments as well as pull insights from data lakes and from these new environments. Is that what you're seeing? What's your take on all that?

Yeah, that's right. The power user, and there are terms like "citizen data scientist" floating around, is kind of interesting, because a lot of the excitement around Hadoop was getting at all the granular data: there's no IT department pre-processing it and telling you what you should be analyzing; it's sort of an exploration area. But I think what's been missing is how we give that same power to the business users. And then you've got things like machine learning being adopted by the BI and analytics tools out there, which can speed up the discovery process and put more power in the hands of these power users, or citizen data scientists, and things like that. So I think it's just been this natural evolution, as each new generation of people starts using technologies like Hadoop and finds what's missing. There is, I think, a generational growth of different technologies that need to keep up with the demands of these different types of users. But I do see it coming back to this: at the end of the day, SQL is the language that people want to speak, and if you've got a GUI-based tool that can generate SQL that can be utilized by the data warehouse or the data lake, I think that becomes the standard through which you can do your analysis.

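The idea of a GUI tool generating SQL that either a warehouse or a lake engine can execute can be made concrete with a small sketch. This is a hypothetical illustration, not any vendor's API: a chart definition (dimension, measure, aggregate) is compiled into a GROUP BY query, demonstrated here against an in-memory SQLite table standing in for either back end.

```python
import sqlite3

def chart_to_sql(table, dimension, measure, agg="SUM", where=None):
    """Compile a simple chart spec, the way a visual BI tool might,
    into a GROUP BY query any SQL engine can run."""
    sql = f"SELECT {dimension}, {agg}({measure}) AS value FROM {table}"
    if where:
        sql += f" WHERE {where}"
    sql += f" GROUP BY {dimension} ORDER BY {dimension}"
    return sql

sql = chart_to_sql("sales", "region", "amount")
print(sql)

# The same generated SQL could run against a warehouse or a
# SQL-on-Hadoop engine; here we just demonstrate it on SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("west", 5.0), ("east", 2.5)])
result = conn.execute(sql).fetchall()
print(result)  # one aggregated row per region
```

Because the tool emits standard SQL rather than calling a proprietary API, swapping the connection from one engine to another leaves the chart definitions untouched.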
Now, about that assessment which Wayne talked about, from which we got all the data we were just sharing a moment ago: you will get a link to it in your follow-up email later this week. We hope you take a look and dive right in, and use it not just to understand where you are, but to see how you compare to other companies. You can even do analysis against companies of your size, your region, and, what's more, your industry. It was designed to provide some really nice granular detail, to give you some perspective on where you are as an organization and some advice on which direction you should take. So I think it's a very powerful tool, and I would recommend checking out that assessment. And with that, Steve, why don't you take it away.

Yeah, thanks Eric and Wayne. I'm really pleased with the survey, because honestly, I've been in this industry 17 years overall, and I've been looking at Hadoop, big data, and data lakes for, I don't know, the past 8 to 10 years, and frankly I've never seen research that really gets into the adoption, the usage, and what platforms data lakes are on. So it's really cool to see this research coming out, and as Eric said, I think there are a lot more people out there using data lakes, so we'd love to get the perspective from folks.

What I'd like to talk a little bit about is what I've seen change in the technology. I've been at traditional BI companies in my past, I've worked for large database companies like Teradata, I've worked at Hadoop distribution vendors, and now I'm at Arcadia Data, which was really built to focus on that challenge: how do we put the power of BI into the hands of people that want to go after these modern data platforms, if you will?

And really, what we're starting to see now is that large enterprises, these BI competency centers I mentioned, are choosing new BI standards for their data lake which are separate from, and really not competitive with, their data warehouse BI infrastructure. Because, as Eric mentioned at the beginning, the technology for BI that came out around the warehouse era was really based on the processing power and hardware we had then, and I think there's a whole new world around big data: obviously the size of it, but also the variety of the data, the speed at which it comes in, the need for more real-time access, as well as just distributed systems. And there's the whole concept of figuring out what questions you need to ask, true data discovery, versus having the IT department try to curate and build cubes and things like that, which are based on business requirements but maybe don't open up all the granular, detailed data to the exploration that some of these citizen data scientists or power users want to do on all these new sources of information they now have access to.

All of which is a long way of saying that I think times are changing, and there is this inflection point. If you look at the technology history, it's kind of interesting, because data warehouse relational technology, as Eric mentioned, was built at a time when processing hardware was really expensive and memory was really expensive, and there was a lot of optimization done at the storage level to integrate very, very tightly with the hardware, to make sure you were maximizing resource utilization. So those systems tended to be proprietary, which is not a bad thing; they're actually super high performance. But you couldn't take the BI server software and run it in the same software layer where the database was running, because the database was so engineered for performance. That's why you've got traditional BI tools sitting on separate servers and desktops that access data in the database and pull it out, and there's nothing wrong with that.

That's how it was set up. So when you look at the analytical process, you've got to create physical optimizations of the data and how it's stored on disk, and there are aggregates that are created, and of course there are semantic layers at the BI tool level which connect to different data sources. But a lot of the time that data has to be secured and loaded in two different places, and when you start talking about real time, it's just the laws of physics: there is going to be latency as you move data across the wire from one system to the other, not to mention the overhead of multiple security layers, models, and role-based access controls that need to be connected and kept in sync between these different systems. And when you throw big data into that architecture, you've got semi-structured data, you've got these massively parallel systems like Hadoop and cloud object stores, and with the sheer volume of data and the time it takes to move it, you lose the ability to connect natively and do real-time analysis on the system.

When we founded Arcadia Data back in 2012, it was really to solve that problem: for people that have a data lake in place, how can we give large numbers of concurrent business users access to that information? And the big idea was, rather than having a separate BI server, let's do what Hadoop was all about: bring the processing to the data. Let's build a BI server that is fully distributed and runs in parallel across all the data nodes. So rather than a separate BI server, we said, let's use the servers that are already in place; we'll install and run our software natively on each of those data nodes. When we talk about native BI, that's what we're talking about: a BI server that takes advantage of the open nature of open-source software and of modern data architectures. Hadoop, for example, has lots of processing engines that can run on those data nodes and take advantage of low-cost commodity hardware. Sure, the resource utilization may not be as highly optimized, but the cost is so much lower that you can just keep adding machines at fairly low cost and scale extremely well. That was the big change we made on the architecture side.

It also has huge advantages on the overhead side: you don't have to optimize a physical layer twice, you create a semantic layer once, you can connect natively to semi-structured data, and security is done once. We inherit security from the underlying file system and from security systems like Apache Sentry and Ranger, so you don't have to bring the data into a separate analytical layer, which just by nature gives you more real-time access to the data.

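The "bring the processing to the data" idea reduces, for aggregates, to computing small partial results where each data block lives and shipping only those partials to a coordinator. A minimal sketch of that pattern, with hypothetical partitions and plain Python in place of real cluster nodes:

```python
# Sketch of distributed partial aggregation: each node computes a small
# partial result over its local partition; only partials cross the wire.
partitions = [  # hypothetical data blocks living on three nodes
    [3, 1, 4, 1, 5],
    [9, 2, 6],
    [5, 3, 5, 8],
]

def local_partial(block):
    # Runs on the node holding the block: ship back (sum, count), not rows.
    return sum(block), len(block)

partials = [local_partial(block) for block in partitions]

# The coordinator merges the tiny partials into the global average;
# no raw rows ever leave their node.
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
print(total / count)
```

The network cost is proportional to the number of partitions, not the number of rows, which is why this pattern scales where a pull-everything-to-the-BI-server design does not.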
That's really the architecture, and the results look like this. This is a proof of concept we did for someone who is not a current customer, a teleconferencing platform I'm not allowed to name. Their requirement was that they needed 30 concurrent customer success managers to be able to analyze the log information around the use of the teleconferencing service, to look for bottlenecks or issues when service would go bad, things like that. So these had to be complex queries, it had to be BI SQL, and it had to support 30 concurrent users. I took away the names of the different tools, because I'm not trying to point out issues with any specific engine, but the issue was that they were trying to take a traditional BI tool and connect it to a SQL-on-Hadoop engine. There are three different engines they tried, in blue, gray, and yellow, and once you got above 5 concurrent users, performance degraded significantly and results were not returning. So again, the contrast is that Arcadia Data, our native BI platform, is not just a SQL-on-Hadoop engine: it's not just doing scans, it's actually optimizing performance and thinking like a BI server that runs in the data platform. That gives you the ability to support lots of concurrent business users and accelerate existing BI tools, or we provide our own BI interface as well.

And of course, it doesn't only sit in the data lake, because, as we talked about, the data warehouse is not going away; it serves a very strong purpose, and workloads that didn't belong there are moving to other systems. You've also got things like event streaming, which is really popular now: people want to stream data from IoT sensors out in the field and from connected cars (I'll show a demo of that in a second), and they need to be alerted to, but also see, data as it's happening in real time and respond to business events as they happen, with the ability to drill to detail in the data lake or connect to other systems, whether NoSQL or relational. Being able to visualize all of that in one place is the requirement, and of course native BI tools are no different from any BI tool in being able to support that.

So just one last thing on Arcadia specifically: the other thing we really thought about was what we just discussed with cubes. You've seen this idea that IT builds cubes with the business, based on business requirements gathered in advance, and those can be fairly complex projects to take on. You build a cube and you're trying to teach people how to fish, but you're only handing them a certain number of fish within that cube, and every time they ask for more information you've got to go in and add more, or recreate the cube. So what we asked is: can we give end users direct access to all the data for ad hoc query in the data lake, and provide optimizations as we go, on the fly. We do this using some machine learning and a recommendation engine. It's actually looking at what queries people are writing, what tables they're accessing, what files they're accessing, and it will recommend to the administrator what we call analytical views. These are caching mechanisms, aggregates and physical models, that we'll build back on disk in the distributed file system, and we take advantage of memory on the machines to make sure that the next time those queries come in there's a cost-based optimization decision that routes the query the fastest way to bring results back. So there's no modeling in advance. A human is still involved in choosing the physical modeling strategies, but it's using AI, if you will, or machine learning, to recommend the best ways to speed up those queries in the future. Smart Acceleration is what we call it in our system. Again, cubes aren't bad, there's nothing wrong with OLAP cubes, but they do introduce a wait into the process.
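To make that idea concrete: the recommendation engine Steve describes watches the query log and suggests pre-built aggregates for the groupings users actually hit. Here is a minimal, hypothetical sketch of that loop in Python; the log format, table names, and threshold are all invented for illustration and are not Arcadia's actual implementation.

```python
from collections import Counter

def recommend_analytical_views(query_log, threshold=3):
    """Scan a log of (table, group_by_columns) pairs and recommend
    a pre-aggregation for any grouping queried at least `threshold` times."""
    counts = Counter((table, tuple(sorted(cols))) for table, cols in query_log)
    return [
        {"table": table, "group_by": list(cols), "hits": n}
        for (table, cols), n in counts.most_common()
        if n >= threshold
    ]

# Hypothetical query log: which tables and GROUP BY columns users hit
log = [
    ("events", ["channel"]), ("events", ["channel"]),
    ("events", ["channel"]), ("events", ["program", "channel"]),
]
print(recommend_analytical_views(log))
# [{'table': 'events', 'group_by': ['channel'], 'hits': 3}]
```

A real system would also weigh query cost and storage budget before materializing anything; the point here is only the monitor-and-recommend shape of the loop.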

Speaking of the process, I'm going pretty fast, but you'll get the slides afterwards. I think what we're seeing is, if you look at the bottom in white, a lot of people are taking the data lake and treating it just like another data warehouse or storage mechanism, and they're trying to take their BI server and connect it to the lake. There's nothing wrong with starting that way, but what we see is an analytical process that can really be delayed, because, as Wayne referred to, if you're following the Inmon model with third normal form and a staging area, there's a process by which you're going to land the data in the lake, transform it into some model, create the schema on it, and then connect your BI tool, which is running on a separate server. Just the modeling part of that can take weeks. So before you can even start to connect the BI server, you've got this modeling to get done, and then, oh, by the way, you're going to create cubes on the BI server to speed up performance there once you've brought in the data from the lake in step five, and then again you've got to secure it in two places. So before you get to step six it could be weeks or months before you're actually able to do any kind of analysis, to try out the data pipeline before you put it into production, and maybe do some additional modeling.

With the native approach, again, it's one system where the data is stored and the analytical processing is done. We land and secure the data once; you can normalize and create schema if you want to, there is a semantic layer, and you can also connect to semi-structured data, structs and arrays and those types of things, and the analytical discovery process is much faster because you're not moving data. You don't need to worry about the optimizations in advance, it'll run just fine for the discovery queries, and then the AI-driven modeling, the Smart Acceleration, can be done after the fact, when you decide you want to put something into production. So you're not moving the data, there's one security model, and you're taking advantage of next-generation technology to speed up the analytical process as well as the modeling on the back end. It greatly accelerates that time to insight, from weeks or months down to days. And I can tell you, having worked for big database companies, I've had customers who said, you know, any time we need to add a new dimension to the schema in the data warehouse it's literally six to twelve months of time and a million dollars of cost. So if you just want to bring in, I don't know, clickstream data into the warehouse for discovery, a lot of departments have been fed up with these systems, because they're so highly governed it's just a long process before you can get at some of that data. I think that's why you saw this need for data scientists and the resurgence of the creation of data lakes, which are more exploratory in nature.

So I know we're going to save some time for questions, so I'm going to do a quick demo flyby of what's possible with a data-lake-native technology. I'll come back to this; let me swap over here, and sorry, my email is up here, but this is Arcadia Data. In this instance we've created a couple of different demo environments. I've got one on connected cars, and there's a cybersecurity application that I won't show here. I'll go ahead and launch this; this is a demo environment around connected vehicles, which is a very hot topic now with automated, autonomous automobiles.

You can imagine, as a fleet manager for, let's say, I don't know, some service company like AT&T that's putting vehicles out in the world, you want to get notifications of things that are happening. So you can have real-time event streams coming in. This could be coming from something like Apache Kafka, it could be coming in from Spark Streaming; people use Solr and indexes and things like that for more real-time updates and analytics. We're requesting this information from the vehicles, and we're looking at lane departures in yellow, collisions in orange, and hazardous conditions in red, on a map that you can zoom into, and we can look at San Francisco for specific events that are happening. Or I can click on an individual VIN, or a car, and do more detailed analysis: the history of that car and what's been happening over time. This could be across different drivers, and for this VIN we see all these different events that happened. We've got some scores being calculated; these are the results of Spark jobs that are looking at, for example, the acceleration score. If you hover over one: for this vehicle, how much has it been accelerating, what's the force with which these drivers have been braking or steering. There are sensors on the device, accelerometers and things like that, and then you can start to do some correlation analysis for those drivers or cars, and look at things like: is there a correlation between people who drive really aggressively and the number of collisions when it's raining? It also gets into things like predictive maintenance: what's the correlation between acceleration and needing to replace brakes or transmissions, things like that. So as the fleet manager you've got the ability to monitor things in real time, but also drill to detail and look for correlations, all within a simple UI.
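In production those per-vehicle driving scores come out of Spark jobs, as Steve notes; as a toy illustration, the core aggregation could look like the following Python sketch. The field names and thresholds are hypothetical, not taken from the demo.

```python
def harsh_event_scores(readings, accel_limit=3.0, brake_limit=-4.0):
    """Aggregate per-vehicle counts of harsh acceleration and harsh braking
    from a stream of (VIN, longitudinal acceleration in m/s^2) readings."""
    scores = {}
    for vin, accel in readings:
        s = scores.setdefault(vin, {"harsh_accel": 0, "harsh_brake": 0})
        if accel >= accel_limit:
            s["harsh_accel"] += 1
        elif accel <= brake_limit:
            s["harsh_brake"] += 1
    return scores

# Hypothetical telemetry stream, e.g. consumed from a Kafka topic
telemetry = [("VIN123", 3.4), ("VIN123", -4.9), ("VIN999", 0.6), ("VIN123", 3.1)]
print(harsh_event_scores(telemetry))
# {'VIN123': {'harsh_accel': 2, 'harsh_brake': 1}, 'VIN999': {'harsh_accel': 0, 'harsh_brake': 0}}
```

These rolled-up counts are exactly the kind of per-VIN scores a dashboard would then plot or correlate against collision and maintenance history.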

That's a quick flyby of the types of things that are possible. I got one question that came in about what industries these data lakes are in, and I think we see it hugely in financial services, telecommunications, government, retail, CPG, all the traditional industries that have lots of products and lots of customers. IoT and sensor devices will be where a lot more of the growth is, but it really does span all different kinds of industries and different forms of use cases.

So just real quick, I wanted to show the tool itself and how easy it is to build things. This is an environment we've got running; it's connected to the data lake, I should say, from the distribution, and all I want to do is show you how I build a dashboard. I'm going to connect to a data source; this is just TV data on viewership across different channels from a TV network. I called it Eckerson TV because I know Wayne's going to get into TV or radio again someday, just kidding. I'm just going to take that dataset that's already been connected; I won't bore you with how we do the connections, but it connects to lots of different things. Now I'm going to build a dashboard for it, so I just click the button that says create dashboard, and it pulls in the data that's been connected. This is looking at session ID, user ID, etcetera. I just want to simplify that down and look at it over time: bring in the date string and the record count across all channels and programs. Really quickly, now I've got a nice simple date series, and I'm looking at the record count over time for all of that.

But you know, we're a visualization tool, so let's try something. We've got something like 30 different visualization types, and I'm lazy, I don't want to try them all myself, so I'm just going to click on this button called Explore Visuals. What this does is use some machine learning and best practices built into the product to recommend different visualization types based on the dimensions and measures I've selected. So here are some different options, like bubble charts and scatter plots and horizontal bar charts. There's a calendar heat map, which is kind of interesting, so I'll grab that, and this, again, is just all records over time, and you can see the hot spots, the days in the months that were really heavy in terms of people watching TV. We'd like to explore that, so I'll save and close.

Now I'm going to slice the data a little differently: I'm going to look at channels and programs, the measure will stay as record count, but I'll limit it to the top 50 just to speed up what we're looking at here, and save that down. So I refresh, and it's churning, but there we've got all the channels, the different programs, and the record count. Again, a nice tabular form, but I'd like to visualize it, so let's see what the system recommends. And these recommendations use the real data in the visualization, not just mocked-up thumbnails; I can actually see the results here. I'll go ahead and click the horizontal bar chart, and that looks good: channels, ranked and sorted, and I've got the top 15. So I'll save and close that.

The one thing I want to do is add a couple of filters real quick, and then I will open up the questions again. I'm just showing how you can connect to data and explore it just like you would expect with any BI tool. So it's asking for filters; I'll add a filter for channel and a filter for programs, and save that one more time. So here we have it: I've got some filters set up, and I can see what's happening over all time for all channels. Let's pick a channel, say Syfy, and see what the top shows, the top programs, on Syfy are: Face Off, Friday Night SmackDown, Bourne Ultimatum, X-Men, things like that. So that gives you a sense of how you can do this: again, Arcadia Data running directly in the data lake, giving you access to all the grain of the data.
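The query behind that filtered bar chart, record count by program for one channel, ranked and limited to the top N, boils down to a group-count-sort. A rough Python equivalent, with invented sample data standing in for the TV viewership records:

```python
from collections import Counter

def top_programs(records, channel, n=3):
    """Count viewing records for one channel and return the top-n programs,
    mirroring the channel filter and ranked bar chart in the demo."""
    counts = Counter(prog for chan, prog in records if chan == channel)
    return counts.most_common(n)

# Hypothetical (channel, program) viewing records
views = [("Syfy", "Face Off")] * 5 + [("Syfy", "Friday Night SmackDown")] * 3 + \
        [("USA", "Suits")] * 4 + [("Syfy", "X-Men")] * 2
print(top_programs(views, "Syfy"))
# [('Face Off', 5), ('Friday Night SmackDown', 3), ('X-Men', 2)]
```

In the native-BI setup this grouping would run as SQL inside the data lake rather than in client code; the sketch just shows the shape of the aggregation.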

So with that I will stop jabbering. We do have some good questions here, so let me just start throwing some over to you. One of the attendees is asking about data quality: where does the quality of the data being curated sit inside the Arcadia architecture?

We do not focus on data prep; that's something that our partners, like Trifacta, Paxata, StreamSets, folks like that, will get into. We have a little bit of data prep capability within the product for the business analyst, but we're really glad those partners provide solutions that run within the data lake to do all the standard preparation steps that you would want for more curated data.

Okay. And one of the attendees is asking about S3 as a possible destination; can you talk about your relationship with Amazon S3?

Yeah, absolutely. We have a number of customers that are fully on the cloud; I'm trying to think which names I can mention, Neustar is one, Turner Broadcasting is another. A lot of people are starting to store data directly in S3 but still leverage Hadoop in many cases, so Arcadia is able to run in the elastic tier and connect directly to data in S3 to visualize it. That's something we've had for a while, and we just announced support for Microsoft Azure Data Lake Store as well.

Okay, you must have been reading my mind, because that was my next question: does it work in Microsoft Azure? And the answer is now yes. Over to you, this is an interesting one. We've touched on it already, but one of the attendees notes that, more than likely, people used to working on a data warehouse are going to require a bit of a mindset shift. What have you observed that can help them reorient themselves to supporting a data lake versus a data warehouse?

Well, I think the good news is that a lot of the skills those people have, whether they're DBAs or what have you, are completely reusable. We're starting to see more and more analytical workloads also moving to the data lake as people want to build new applications there, and, as one of the callers asked about, data quality, cleansing, schema, all those things are still really valuable and important. I think what changes is just rethinking what's available in terms of BI tools. We were first to market to be able to connect to things like Apache Kafka natively, because we're just kind of in that space, and they've got a new KSQL interface that allows you to query streams of information. Or things like Apache Solr or Apache Kudu and other types of data platforms that have benefits in being able to explore data and take advantage of nested data, things like JSON structs and arrays, where you've got the metadata in the data format itself. So you may not need to build a lot of schema in advance, you can just give users more access to it, but you still need things like role-based access control and security, and I think those concerns about securing all of this have been solved by the community. The next wave is just providing tools that can take advantage of that robust stack. So I don't know if that answers the question; I think Wayne probably gets more involved in the practices of training and education.
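Steve's point about nested formats carrying their own metadata is easy to demonstrate: a JSON record with structs and arrays can be read and flattened on demand, with no table definition in advance. A minimal schema-on-read sketch (the vehicle-event fields are made up for illustration):

```python
import json

raw = '{"vin": "VIN123", "events": [{"type": "lane_departure"}, {"type": "collision"}]}'

def flatten(record):
    """Schema-on-read: pull nested event types out of a JSON struct
    without any table definition created in advance."""
    doc = json.loads(record)
    return [(doc["vin"], ev["type"]) for ev in doc["events"]]

print(flatten(raw))
# [('VIN123', 'lane_departure'), ('VIN123', 'collision')]
```

The field names come straight from the document itself, which is exactly why users can start exploring before anyone has modeled a schema.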


Yeah, I mean, I was going to say that you've still got to pay the piper at some point, and you have to create a schema for this data. The value of the data lake for power users was schema on read; they didn't have to wait for IT to model it. But at some point, especially when you're trying to get strong query performance for large numbers of concurrent users, you probably do want to model the data, and that raises the question I had for Steve. When you talked about your Smart Acceleration, you kind of insinuated that you really didn't need to model the data: that using machine learning your tool would eventually be able to create caches and aggregates automatically, so that you can get up and running pretty quickly, like in a matter of days, without having to do any modeling at all. The tool would essentially create structures on the fly based on the queries you feed it, maybe priming the pump, to deliver the kind of performance that users would want. I'm wondering if that's an accurate reflection of what you told us.

Yes, it can certainly work that way, but it's not pixie dust, right? I mean, you still need your metadata definitions and data stewards, data catalogs, business terms for the tables that people want to access; I think you still need that at some level, particularly once you've done some initial exploration and you want to provide a broader view to a broader set of people. Having those definitions and semantic layers in place is also important. So anything that someone's built in the Hive metastore, for example, we can take advantage of, or if they've got a data catalog in place we can certainly read from that and make it available as well. But yes, I think even for the queries that have been defined, or the tables that have been set up, there are going to be acceleration strategies based on actual usage that the administrator may not think about in advance, so we can monitor that and the system will recommend other ways to speed up those queries in the future. So yes, it can be used straight on raw data as it comes in, without any setup in advance, but it's also beneficial to have the more curated data that we'll leverage to support, say, end-user applications where you're talking about hundreds or thousands of users on the system.

But you don't necessarily require it? It certainly wouldn't hurt for users to create a schema, say using Hive or whatever, right, to support queries?

There's value to it, and we can also read it and take advantage of it, but yes, it's not required.


I mean, that seems to be the trend these days with a lot of these new technologies and tools: the processing power is so great that they can deal with the source schema, or a sloppy schema as it comes from the source system, and do something with it and give value pretty quickly, and you can only enhance that value by doing more design up front. And in your tool you actually help do that as well with the smart acceleration.


Okay, good. We've got a couple more things here that attendees have thrown in, and you kind of alluded to this a moment ago, but there's a specific question about data catalogs and semantic layers, and whether Arcadia sort of leverages those: how that happens, and where it happens in the process.

You cut out a little bit there, but I think you're asking where data catalogs play within all this, or where Arcadia fits, how that actually works.

Well, I was pausing just because I wanted to mention that there's this consortium, for what it's worth, that we're part of, called Make Big Data Work. It includes vendors like Trifacta, StreamSets, and Waterline Data. Waterline is a data catalog that was built specifically for data lakes, and Hadoop in particular, but there's also Alation and others out there. I'm not an expert on those things, but I understand that more and more, as people have multiple systems, like the data warehouse and the data lake together, they need common definitions of a customer and things like that, and to know where that is stored and what data is available where. So we can connect to any of those and surface back to the business user the definitions that have been set up, access that data, bring it in, things like that. And then in our own tool we have a semantic layer, which runs directly in the tool, in Hadoop, and business users can create their own data definitions for tables or data they're looking at that hasn't been defined yet. There could be a user A in sales who names the data one thing that makes sense to them, and a user B in, I don't know, engineering who might name it something else, so you can also do that at the BI tool level. There are obviously some concerns with that if you're a data governance purist and want a single definition for things, but yeah, there are lots of possibilities, and I would encourage people to go check out Make Big Data Work; we've done kind of a webinar education series around data catalogs and things like that in this world.

Okay, good. And here's a good question from an attendee. I think I know the answer, but if you would share with the audience, from your perspective, what's the main difference or differentiating feature between what Arcadia is doing and what you could do with a product like Tableau?

Yeah, the key differentiating feature is the fact that we're a massively parallel system that runs directly with the data. Tableau can connect to cluster environments, but our perspective is that there's a lot of knowledge about how the data is stored on the individual nodes, and with our software sitting there next to the data we can take advantage of that local knowledge. We're not just passing SQL back and forth through an ODBC driver or something like that; we run natively where the data is. That just gives us tremendous scale and performance, and it's a lower-TCO solution overall. So the architecture is really what makes the difference, but then also, as I talked about, the process: it really speeds up that time to insight, because you don't have any data latency over the wire, you're not needing to move data from one system to another, and there's the security, which we just inherit directly from the data platform, so you don't have to re-administer it in a separate BI silo. That's the philosophy of a native BI solution, which I think is becoming a thing.
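The architectural point, processing next to the data rather than shipping rows through an ODBC driver, is the classic scatter-gather pattern: each node reduces its local partition to a small partial result, and only those partials cross the wire. A schematic sketch, not Arcadia's code:

```python
def local_aggregate(partition):
    """Runs on the node that owns this partition: reduce many rows
    to one small partial result, so raw rows never leave the node."""
    return sum(partition)

def distributed_sum(partitions):
    """Coordinator: merge the tiny partials shipped back from each node."""
    return sum(local_aggregate(p) for p in partitions)

# Three 'nodes', each holding a slice of the data
nodes = [[1, 2, 3], [10, 20], [100]]
print(distributed_sum(nodes))  # 136
```

With a remote BI server, by contrast, all six rows would travel over the network before any aggregation happened; here only three small numbers do.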

Okay, good. And you run both on-premises and in the cloud, right? Can you talk to that real quick?

Yeah, absolutely. You can get into a long debate about the differences between deploying on-prem and in the cloud; for customers, I'd say it's just a deployment preference. A lot of people go to the cloud to start out, because they just don't want to manage their own data center, and you can stand up our software just as you would anything else in that environment. There are also some advantages we have in those elastic environments that I won't get into here, but with virtual machine instances and things like that there's some different thinking around how you architect software to run in those environments and scale precisely with the workloads. I'll just leave it at that, but there's a lot we do in the cloud that's very interesting. As for the breakdown of people that are in the cloud, in the survey results thus far from Wayne it's roughly 20 percent cloud, with a large majority still on-prem, but certainly a lot of people are interested in hybrid environments. But yes, we can run there.

Right. Someone is asking: is it similar to what Denodo does? Your key is giving direct access to the data through this highly parallelized environment, right? Literally taking the processing to the data in a highly parallel way, but you don't need to do virtualization. Is that right?

Correct, and you know, there is a need for data virtualization, or a value to it, but I think for us, when you have the physical copies in one place, that's where you're going to get the huge performance gains. Obviously we live in a world where data is all over the place, and there are needs for federation, virtualization, those types of things, but for production applications, where you want to deploy to hundreds or thousands of users, that's why you would look at something like a native architecture, in addition to the benefits for exploration and everything we just talked about.

Okay, good. Thank you, Steve, and thanks, Wayne, great stuff. We'll talk to you next time.

Thank you, Eric, and thank you, Wayne, what a great presentation, and thanks to our attendees for being so engaged and for all the great questions that have come in. Just a reminder: I will send a follow-up email by end of day Friday with links to the recording and links to the assessment, and I will see if I can get you a link to the additional demos and such from Arcadia Data. So thanks, everybody, and thanks to Arcadia Data for sponsoring today's webinar. I hope you all have a great day.