Webinar: Take Your Enterprise Analytics to the Next Level with Native BI Platforms for Data Lakes

Aired: April 19, 2018

Many large modern enterprises are data-aware – they deploy processes to transform raw data into information using a variety of data integration, data management, and business intelligence (BI) tools. But being data-aware, or even data-driven, does not necessarily mean being insights-driven.

Are your BI applications providing valuable insights? Are these insights prescriptive and actionable? Are these actions driving tangible business outcomes? In this webcast you will learn what it takes to move your BI environments to the next level by harnessing the power of a data lake to drive new insights and business agility.

Join our webinar, where our featured speakers Boris Evelson (Vice President and Principal Analyst at Forrester), Alex Gutow (Cloudera), and Steve Wooledge (Arcadia Data) will discuss:

  • Benefits and challenges of becoming an insights-driven business
  • Benefits of bringing BI to data (vs. bringing data to BI)
  • Evolution and best practices for modernizing BI through data lakes
  • Getting the full value of your data with agile BI
  • Real world customer successes

Find out how you can drive new insights now:


Boris Evelson

Vice President and Principal Analyst

Steve Wooledge

Chief Marketing Officer

Alex Gutow

Senior Product Marketing Manager


Welcome to today's webcast, brought to you by Arcadia Data. I'm Stephen Faig, Research Director, Database Trends and Applications and Unisphere Research, and I will be your host for today's broadcast. Our presentation today is titled "Take Your Enterprise Analytics to the Next Level with Native BI Platforms for Data Lakes." Before we begin, I want to explain how you can be a part of this broadcast. There will be a question-and-answer session; if you have a question during the presentation, just type it into the question box provided and click on the submit button. We will try to get to as many questions as possible, but if your question is not selected during the show, you will receive an email response. Plus, one lucky viewer today will win a $100 American Express gift card. The winner will be announced at the end of the event, so stay tuned to see if it's you.

Now let me introduce our star speakers for today: Boris Evelson, Vice President and Principal Analyst at Forrester; Alex Gutow, Senior Product Marketing Manager at Cloudera; and Steve Wooledge, Vice President of Marketing at Arcadia Data. For more information on our speakers, you can click on the arrow under their headshots on the console. Now I'm going to pass the event over to Boris.

Excellent. Good afternoon, good morning, good evening, everyone. I see people dialing in from multiple time zones, so thank you for taking time out of your busy days to listen to this presentation. The main topic of the presentation is transforming your organizations from data-driven to insights-driven. What does that really mean? It seems that most large enterprises today have some kind of data management strategy, architecture, platform, or plan. They integrate the data, they model the data, they collect the data, and they analyze the data with all sorts of BI, reporting, and data visualization platforms and solutions. So they are getting lots and lots of signals from the data, but are they getting actionable insights? Are those insights being transformed into tangible business outcomes? That's really the next level of maturity, in what we today call insights-driven businesses. With that in mind, just to clearly differentiate between older-generation, simply data-driven capabilities and the next-generation capabilities we now call insights-driven, we at Forrester like to use the term "systems of insight." Why did we choose that term? Because systems of record and systems of automation are terms already familiar to you, and we felt that "systems of insight" was a great term for talking about insights-driven businesses. The reason this is so infinitely important is what you'll see in the next couple of slides: what is being predicted out there is that, over the next few years, insights-driven companies are going to outpace their competition.

They are going to outpace the competition by eight to ten times. That's not 8 to 10 percent: eight to ten times faster than the competition. What that really means is that the companies that are insights-driven today, or will be, are going to take about $1.8 trillion away from their non-insights-driven competitors. That's a huge chunk of change. Today, and for the last couple of years, we at Forrester have been seeing a lot of proof points: many of our clients are already reporting benefits from their BI investments. Some are intangible, qualitative benefits such as better decision-making or better transparency; others are tangible: they are increasing margins and profitability, they are increasing their sales, or they are achieving some other top-line or bottom-line improvement. And income statement benefits are not the only ones playing a significant role here. Balance sheet benefits, in terms of improving your working capital utilization and reducing inventory through working capital optimization, are some of the others. So the benefits, and I'll use the term "tangible" for the rest of this presentation, are definitely all over the place.

But unfortunately, this is not as easy as it seems, and I'm sure most of you on the phone have lived through something like what you see on this slide: yes, a lot of us are drowning in data and still starving for insights. We've been hearing this for the last ten years, so what is it that has really changed? That should be the question, and that's precisely what we will address in the rest of this presentation.

Lots and lots of challenges are still out there. Just under half of all clients are still not realizing any kind of quantitative benefits from business intelligence, and of those that do, about half are taking more than a year to realize those benefits. That's not quick enough; we'll talk about that in a couple of slides. One of the key challenges that we continue to see in this market is a continuing disconnect between business and IT. This is not about who's right and who's wrong; it's just a realization that business and technology professionals, each for the right reasons (I'm not criticizing anyone here), do have somewhat conflicting priorities. Consolidating onto a single BI platform in an enterprise, creating a streamlined architecture, and supporting BI centrally, in hopes of obtaining that single version of the truth, is not a trivial effort: it takes time, it carries costs, and it's not plug-and-play. So these things do take time, and when our business partners, our business counterparts, just want to get their jobs done quickly, efficiently, and effectively, and IT professionals don't realize that getting the business job done is the first priority, that's where we start getting this disconnect.

A couple of very interesting data points on the next slide. On the one hand, and I'm sure all of you know this very well, the amount of data that we all store, process, and analyze is growing by leaps and bounds. What the right side of the screen is telling you is that the number of companies with a hundred terabytes of data doubled last year. But the portion of that data that we actually use really hasn't changed. As a matter of fact, the self-reported numbers, about a third of all data, structured and unstructured, being used for insights, analytics, and decision-making, are not very realistic, because guess what: you don't know what you don't know. Our clients tell us, "We think we use about 50% of our data." Yes, you may be using 50% of everything that you know about, but there is a lot out there in your transactional and operational, structured and unstructured data sources, internal and external, and at your partners, that you are just not aware of. It's also very interesting to note that we tend to get higher numbers from IT, and much lower numbers, much closer to the right side of this picture, when we ask the business users.

And one last slide, I promise, before we get to the good part of the presentation, and I'm sure this is near and dear to everyone's heart: I'm sure everyone on the phone still experiences that the majority of your BI and analytical insights applications are not done in enterprise-grade BI platforms; they are still done in shadow-IT types of applications. How can you close the disconnect between business and IT? How can we address and analyze most of the data that we have? And how can we finally get ourselves off of spreadsheets?

The key point to make here, before we get into any kind of discussion of machine learning or artificial intelligence or big data, all the terms that have become very popular over the last few years: very few people are talking about the fact that today we are in the age of the customer. This concept is much more important than big data or machine learning or anything else. What the age of the customer means is that most enterprises have to run and operate from what we call the outside in. In other words, your consumers, or your citizens if you are a public-sector agency, do not really care, nor should they care, about how you run your internal processes. They don't really care about your internal finance, risk management, supply chain, or any other processes. They have lots of options. These consumers are empowered with mobile phones and with cloud access to all of your competitors' products and services; literally with a single click of a button they can make a switch. Therefore, unless you as a business, as an organization, are prepared to do everything in your power to follow the customers, you're going to fall behind, and you are not going to take advantage of the modern, global, customer-driven economy. I just introduced the term agility: being agile and flexible in response to customer needs is really the key to success, not big data, not machine learning, not other technologies. That's a very important point: agility is really the key business capability that's going to allow you to be successful and win your customers away from the competition.

So with this in mind, a few years ago we at Forrester created this profile of how we measure business agility, and hopefully all of the attributes of business agility that you see here on the right are self-explanatory. Obviously, if your channels are integrated, you are more agile. If you can handle change management in an efficient manner, you're more agile. If your infrastructure is elastic and can grow and shrink depending on customer demand and customer requirements, you're going to be more agile. And I'm sure you're probably suspecting what I'm going to throw at you on the next slide: yes, indeed, there is a direct correlation, which we found a couple of years ago and continue to track. Higher performers, on the right side of this picture (remember, higher performers are those companies, as I showed a few slides back, that grow faster than their competition and faster than industry averages), cluster in what we call the "formidable" category: they are aware that agility is a key capability, and they are executing on it well. Versus, on the left side of this picture, the lower performers, those that fall behind industry averages and behind the competition, who cluster in what we call the "clueless" category: they don't know that this is important, and they are not executing well. Or they are aware but not executing, and therefore we call them "paralyzed." Or they are doing something about it without really understanding what they're doing, and we call them "dangerous." Obviously, you want to be in the formidable category; that's the point I wanted to make here. Now, what is it that we in IT and big data can do to create and deploy agile business models, so that we can practice agile big data analytics? (I use those terms interchangeably.)

As you can see, this is not just about agile software development. Yes, absolutely, agile concepts and rapid prototypes are extremely important, but it's not just that. It's agile organizations: finding the middle ground between organizational silos and overly centralized support for insights and analytics, because overly centralizing something creates bureaucratic structures, with layers of steering and planning committees, approval levels, and arguments about prioritization. Overly centralized support for BI often creates lots of bottlenecks. It's also agile processes: we recently published a report on BI governance, which is very different from data governance. BI governance is all about monitoring and understanding what users are doing in their BI and analytical sandboxes and data lakes, and then selectively hardening, or productionalizing, the most popular applications.

But what we really want to talk about in the next 30 to 40 minutes are agile BI platforms, because earlier generations of relational databases and earlier generations of SQL-only BI tools can certainly support mission-critical environments. Remember, though, the number I shared with you a few slides ago: you may think you're processing about 50% of your data, but we know you are really managing to process and analyze no more than 20% of it. So clearly, a different type of big data and agile technology is needed to support this newer generation of requirements. That is why we no longer advocate a simplistic data architecture. I'm sure you remember how we used to draw the slide with what we called the layer-cake architecture: at the bottom you had your data sources and data integration, in the middle you had a data warehouse and data marts, and at the top of the picture you had BI. The idealistic notion was that all of the data, at some point, was going to end up in the data warehouse. Well, guess what: ten or twenty years later, somewhere between 20 and 50% is where we are ending up today.

So how do we finally break through that barrier? How do we finally start processing and analyzing more than 20% of the data? Well, there are different treatments for different data layers. Not all data has to be in a data warehouse, and not all data has to have a single version of the truth. If I'm speaking to those of you in the finance organization: for you as a business user, two plus two always has to equal four, even when it takes a few days and a few long batch runs to calculate it, because the books of the company really need to reconcile, so you have no other option. But that's really a relatively small percentage of all of your enterprise data.

If I'm speaking to a CMO or a VP of sales: you wake up in the morning, you read The Wall Street Journal, and you realize that your competition just lowered prices, introduced a new product, or acquired a company, and you want to get out a preemptive campaign to your customers and prospects today. Figuring out the customer segmentation for that razor-sharp, razor-focused campaign is not where you need a single version of the truth or fully reconciled data; a world where two plus two equals 3.9 or 4.1 is good enough, because for this particular use case, time to data is what matters most.

Obviously, I just described the two ends of the extreme, and there are different layers in between. So take a look at your requirements and tolerances for latency, for data quality, for risk, et cetera, and then figure out who is going to be accessing the data in a data lake versus a data warehouse. In the data lake, you really want to ingest and store close to 100% of your data, and there is no way you can curate and govern one hundred percent of your data. The organizations that attempt to do that basically take their data warehouse and rename it a data lake, and then the same old challenges happen in that data lake environment.

But when you start treating your data lake differently, you still govern it, but you govern it differently: more likely, you allow only qualified data scientists and power analysts, who understand what they are doing, to access the data lake. As you go up the pyramid, you apply more of the older-generation best practices in terms of tighter governance; this is where your single version of the truth lives, and this is where you can open up access to your data hub and data warehouse to all of the casual users, with tightly controlled access. And it doesn't have to be three layers; your organization may be more complex, and you may need to create more layers. It could be just a file system at the bottom, some kind of flexible, schema-on-read SQL going up the stack, and then, as we go up the stack, we tighten the control, increase the governance, and open up access to more casual users, because this is where we have more control. This is the best practice we see, and this is the way you can start breaking through that barrier of analyzing just 20% of the data.
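The tiered approach described here can be sketched as a small policy table. This is purely an illustration: the tier names, roles, and rules below are hypothetical, not taken from any product discussed in this webinar.

```python
# Hypothetical sketch of tiered data-lake governance: access widens and
# governance tightens as data moves up the stack.
TIERS = {
    "raw_lake":  {"governance": "minimal",
                  "readers": {"data_scientist", "power_analyst"}},
    "curated":   {"governance": "moderate",
                  "readers": {"data_scientist", "power_analyst", "analyst"}},
    "warehouse": {"governance": "strict",
                  "readers": {"data_scientist", "power_analyst", "analyst",
                              "casual_user"}},
}

def can_query(role, tier):
    """True if the given role is allowed to query the given tier."""
    return role in TIERS[tier]["readers"]

print(can_query("casual_user", "raw_lake"))   # False: the raw zone is experts-only
print(can_query("casual_user", "warehouse"))  # True: tightly governed, broadly shared
```

The point of the sketch is only the shape: each layer trades openness of access for strictness of governance, and casual users appear only at the top.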

Interestingly enough, I do have quite a few clients who are beginning to architect their BI environments for processing all of the data, but are the technologies really set up to handle that? Yes, all of the earlier-generation, and even a lot of current-generation, BI technologies can access data in a data lake; it doesn't matter to them whether it's a data warehouse or a data lake. But these earlier-generation BI technologies are still sitting outside of the data lake. In other words, we are still bringing the data to BI. And when you bring data to BI, what you're doing is moving data in and out of the cluster, so all of that scalability inside the cluster basically now runs through a single bottleneck, a single choke point: a JDBC or ODBC connector. You are moving a lot of data across those connection points. And no matter how you store data in your data lake, most BI tools can still only access it via SQL, so even though the data lake can work with unstructured data sources via schema-on-read, by the time you architect everything that you see here, you are still doing schema-on-write and mandated, structured SQL.
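The difference between dragging rows through a connector and pushing the work down to the engine can be sketched in a few lines. This is an illustration only: sqlite3 stands in for a data-lake SQL engine, and the table and column names are made up.

```python
import sqlite3

# sqlite3 is a stand-in here for any SQL engine sitting under the BI layer.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (region TEXT, amount INT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [("US", 10), ("US", 20), ("EU", 5), ("EU", 7)])

# "Bringing data to BI": every row crosses the connector, and the client
# aggregates on its side -- the connector becomes the choke point.
pulled = {}
for region, amount in con.execute("SELECT region, amount FROM events"):
    pulled[region] = pulled.get(region, 0) + amount

# "Bringing BI to data": the aggregation is pushed down to the engine,
# so only the tiny result set crosses the wire.
pushed = dict(con.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region"))

print(pulled == pushed)  # True: same answer, very different data movement
```

With four rows the difference is invisible; with billions of rows, the first pattern moves the whole table through the JDBC/ODBC-style choke point while the second moves only a handful of aggregated rows.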

And any metadata used inside or outside of the cluster is not shared. A good example is any kind of multidimensional operation. Say you have some kind of relational OLAP engine sitting outside of the cluster, and you issue one query and then a second query, where the only thing you're changing is one of the dimensions. If you were doing this in a dimensionally aware relational database, the relational OLAP engine would be smart enough to know that it doesn't need to bring the whole result set back again; it doesn't need to re-execute the complete query, it just needs to adjust the query with the new dimension. But all of that is lost when the engine sits outside of the cluster, because the only thing that passes back and forth is SQL, without any kind of dimensional awareness.
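What that "dimensional awareness" buys can be sketched with a tiny pre-aggregated cube: once the cube exists, swapping the grouping dimension re-slices the aggregate instead of rescanning the base rows. The data and dimension names here are hypothetical.

```python
from collections import defaultdict

# Base fact rows: (region, channel, amount). Purely illustrative data.
facts = [("US", "web", 10), ("US", "store", 20), ("EU", "web", 5)]

# Scan the base rows once to build a small aggregate "cube"
# keyed by the full combination of dimensions.
cube = defaultdict(int)
for region, channel, amount in facts:
    cube[(region, channel)] += amount

def rollup(dim_index):
    """Answer a one-dimension query from the cube, without rescanning facts."""
    out = defaultdict(int)
    for key, total in cube.items():
        out[key[dim_index]] += total
    return dict(out)

print(rollup(0))  # {'US': 30, 'EU': 5}     -- grouped by region
print(rollup(1))  # {'web': 15, 'store': 20} -- swap the dimension, reuse the cube
```

An engine that only receives SQL strings cannot make this optimization across queries; an engine that holds the cube (the dimensional metadata) answers the second question without touching the base data again.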

Also, when you run the BI environment outside the cluster, you think you are linearly scalable, but you're really not: your queries are distributed and linearly scalable, but the rest of the BI environment is not. That is why more and more of the BI components, not just the actual queries and the data already sitting inside the cluster, but the semantic layer and the metadata, are being pushed down into the cluster itself. When you do that, you are not moving data in and out of the cluster, so there isn't all of that extra WAN and LAN traffic, and there is no JDBC/ODBC choke point, because the BI engine accesses the data natively, like any other data. And you're not limited to just SQL; you can analyze any kind of files, and all of this, data and metadata, now lives in the same place. This is what we call bringing BI to the data, as opposed to bringing data to BI. If you remember the previous slide, a lot of the components that were outside of that dotted line are now pushed down into the cluster.

So this is one version of this type of technology: you still run some of the components on an edge node, so there is still a little bit of room for single-threading here. What we really want to see is a 100% completely distributed architecture, where the only thing the edge node is doing is rendering, and everything else, the caches and cubes and indices and query processing, is pushed down to the individual data nodes, so that it is indeed 100% distributed and 100% scalable. It's a modern data architecture, and it's one of the approaches to start analyzing, and deriving insights from, all of your data, structured and unstructured, not just 20% but 100%, and to do that at a high level of scalability. Alex, let me turn it over to you.

Thank you very much, Boris. At this point in time, I'm going to introduce our next speaker: Alex Gutow, Senior Product Marketing Manager at Cloudera.

Awesome, thank you, Stephen, and thank you, Boris. So, as mentioned, and as Boris was speaking about, there are a lot of aspects to consider when you look at agile BI and how to get more value and insight from your data, and one of the things we'll be touching on in this part is the technology side. A lot of the time, this challenge around agility can actually be due to the limitations that we see in existing infrastructure, and many of them should be familiar to folks on the phone. With existing infrastructure, with its limitations on resources and prioritization, just ensuring that the reports and the SLAs you're supporting today can continue to run makes it difficult to start to bring on new user groups, more data, and more types of reports and use cases. There's always a risk in bringing those in, so it can either be a very lengthy process or a straight limitation, a cut-off, on what data can actually be analyzed, who has access, and what they can do with it. A lot of this can be linked to the cost of these systems: it can be a very costly expense to go and scale out the system, possibly including some downtime, and it's pretty difficult to justify those expenses if it's just to better support the workloads that are running today, versus also looking at how to bring in new workloads.

One of the things that we see with our clients quite a bit, as one way to address this: while you may have an enterprise data warehouse with a very specific set of reports and use cases, there's this proliferation of different data silos that has popped up, with multiple data copies being moved throughout the organization and different data marts addressing specific departmental needs or the use cases of specific users. That can be pretty difficult to maintain across a large enterprise: it ends up being a very lengthy process as you try to join data together or open up access to it, and maintaining all of those data copies itself carries a lot of inefficiencies. The other limitation that we've seen, in particular for enabling a lot more self-service, is the shift away from pre-canned reports toward more self-service provisioning: better empowering your end users to dig in and discover new insights. Oftentimes the cloud can be a great way to enable that and give folks dedicated resources as well.

So when we look at a modern platform, a modern approach, to tackle this, it is of course about providing the same performance and concurrency, and the same support for SQL skills and BI tools; that is a huge necessity here. But the true value comes in breaking beyond just the use cases you're already supporting, into having more data flexibility: very easily being able to land any and all types of data, consolidating data from these marts or different data sources, and not having to model it up front. Those are a lot of the initial benefits of the data lake that we tend to speak about: having data in its raw form, and then using that data for new use cases and new questions as they come in, while the data format remains open, so it's never locked into any proprietary format. The advantage here is that as you start to consolidate, all of this data gets stored together and multiple different user groups have access to it. The same data that is available for your BI and your reporting is also available for your data engineering teams to process or run ETL jobs over; it's available for your data science teams; and you can easily operationalize any reports or applications from it as well. Then, of course, there is being able to cost-effectively scale out these systems without it being a major maintenance operation, so you're not having to make trade-offs as to which data gets stored or which workloads and reports get run. And finally, there is the flexibility to leverage all of these modern benefits in whatever deployment architecture you choose: be it in on-premises environments, natively against any object store such as S3 or Microsoft ADLS, or a hybrid combination of those.
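The "one copy, many consumers" idea above can be sketched in a few lines. This is only an illustration: the in-memory CSV stands in for a file on shared storage, and the two "teams" are just functions reading the very same records.

```python
import csv
import io

# A single landing zone: one raw dataset, no per-team copies.
# io.StringIO stands in for a file on a shared storage layer.
raw = io.StringIO("user,amount\na,10\nb,5\na,3\n")
records = list(csv.DictReader(raw))

# BI / reporting consumer: aggregate the shared records for a dashboard.
totals = {}
for r in records:
    totals[r["user"]] = totals.get(r["user"], 0) + int(r["amount"])

# Data engineering consumer: transform the very same records for a pipeline.
cleaned = [{**r, "amount": int(r["amount"])} for r in records]

print(totals)        # {'a': 13, 'b': 5}
print(len(cleaned))  # 3
```

Both consumers see the same rows because there is one copy; in a siloed data-mart setup, each team would be reading (and maintaining) its own extract.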

And so what this looks like is a chance to consolidate a lot of these different data marts that you would start to see within your existing environment, with a modern platform that can support the wide range of use cases that Boris was talking about. One of the benefits of having this modern platform is being able to have more of a logical architecture to support use cases that have those varying degrees of governance: not just supporting ad hoc data science and exploration, but also being able, as you go, to open it up to regular reporting or other applications as well, and still integrating with a lot of the enterprise data warehouses that we see, where you might want to push out some subset of this data to the EDW to support, again, some of those higher-SLA, more heavily modeled reporting needs.

When we look at the steps to get there, of course the technology piece is only one aspect. The technology piece opens up the potential to really get more value out of your data with this insights-driven model, but it's also about reorienting and rethinking what your team looks like. These five steps here are taken from our work with a number of our customers and how we helped them be successful. What we mean by each of these: building a data-driven culture just means shifting to empower your end users, letting them discover what data is of value, ask new questions, and dig in and interact with any of the reports that you have, so they can discover what data may be of value and create as they go. Building the right teams and skills is really about evaluating what you have in your existing teams; there's always a lot of talk about data scientists and how difficult they may be to find, but a lot of the time the most valuable folks are the ones already sitting within your organization. Make sure that data engineers especially are part of this development; those folks have deep knowledge of the data and can often be trained up into the data science realm as well. And, as I've mentioned quite a bit, one of the big things here is that you don't need to over-architect for perfection; you don't need to wait for everything to be set in stone. Start small, take your first use case, get it to success, get folks excited, drive that adoption, and then add more use cases from there.

Then, in terms of how we look at these use cases: make sure that you're actually linking them to production value as well. It can be pretty hard to tie value to just having a sandbox environment, so make sure you're thinking of the end business value. And finally, when we look at right-sizing data governance: as Boris mentioned, there are going to be varying degrees of governance needed for each of these different applications, but that doesn't necessarily need to be a limitation of the platform itself. So make sure that, as you look at solutions and governance practices, you have a way to make governance a two-way street, so that you can get user-driven governance as well as highly curated governance at the same time.

And then I'll just end with a quick look at Cloudera's platform for machine learning and analytics, optimized for the cloud. It really takes this idea of bringing data together in a shared storage layer and opening it up to multiple different users and types of use cases, be it analytics, data science, operational use cases, or data engineering, and then also ensuring that each of these workloads not only has access to the same shared data, but to the same shared data experience: the same data catalog, security policies, and governance, to ensure that you can really provide the full breadth of data access and insight without any risk to the business, and without any data-copy or user inefficiencies within the platform. And with that, I'll pass it back over to Stephen.

Thank you very much, Alex. At this point in time, I'd like to introduce Steve Wooledge, Vice President of Marketing at Arcadia Data.

Great, thanks. Thanks for the context, Boris and Alex. What I'd like to do in the next ten minutes, and then we'll have some time for Q&A, is talk about the value that you can get from data lakes with the added ability of a visualization tool, or a BI platform, that can really take advantage of them: not only surfacing the information in that 80% of data that may not have been utilized already within the enterprise, but also granting access to a much larger audience beyond just the data scientists. I've been in this market for about 18 years; I've worked for companies like Teradata and Business Objects, traditional data and BI companies, and what we've seen over the past ten years is that data and platforms have changed. We've always talked about the velocity and variety of data, how that's changed, and the need to handle multi-structured data and get more access to unstructured data. Platforms like Hadoop have enabled a lot of that by supporting multiple storage engines and search, enabling people to do schema-on-read as well as schema-on-write: the ability to do transformation within the platform, or even "ELT," which is discovery before you figure out what you want to transform and build structure around for analysis.

really lot of organizations are

fine everything need to have both of data warehouse

and a data Lakes really Mabel was jelly
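Schema on read, as used above, means the table structure is applied when the data is queried rather than when it lands. Here is a minimal sketch of the idea in Python; the field names and records are invented for illustration, not taken from the webinar:

```python
import json

# Raw, multi-structured records land in the lake as-is: no upfront schema.
raw_events = [
    '{"user": "a1", "amount": 19.99, "country": "DE"}',
    '{"user": "b2", "amount": 5.00}',                      # missing field
    '{"user": "c3", "amount": "12.50", "country": "FR"}',  # type drift
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: coerce types, default missing fields."""
    rows = []
    for line in lines:
        rec = json.loads(line)
        rows.append({col: cast(rec.get(col, default))
                     for col, (cast, default) in schema.items()})
    return rows

schema = {"user": (str, ""), "amount": (float, 0.0), "country": (str, "unknown")}
rows = read_with_schema(raw_events, schema)
print(rows[1]["country"])   # -> unknown
print(rows[2]["amount"])    # -> 12.5
```

The same raw lines could later be read with a different schema, which is exactly the agility being described: structure is a choice made at analysis time, not at load time.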

Amid all that's happened, there really hasn't been a lot of innovation at the BI layer. A lot of the excitement initially has been around data science and machine learning and all the great things they do, but there's still an untapped need, a need that's not being served: to really enable more of the front-line users in organizations to also get value from these data lakes. That's really what Arcadia Data was designed to do.

And what we're seeing is that within organizations they're creating two BI standards: one for the warehouse, optimized for all that relational OLAP and the things that happen in a two-tier way, based on the architectures we built at the time; and one for these scale-out architectures, what people loosely refer to as data lakes, where you've got a new opportunity in terms of how you can plug BI, analytics, and visualization into that type of architecture and really enable new types of use cases. So let me double-click on that a little bit. If you think about the data warehouse and relational databases in particular,

the reason why there's a two-tier BI architecture is that you could never really install the BI software on the relational database: the database has been heavily optimized to work with the hardware. At one of my former companies you'd get tremendous resource utilization because those boxes were expensive at the time, and you really needed to design the software in a way that retained all that resource for the database itself. So of course the BI servers existed on a separate tier, and those scale up nicely, but they don't scale out quite as well. More importantly, from an analytics perspective, you've then got to optimize your physical data storage mechanisms, your semantic layer, securing that data, loading the data, and you're doing it in two separate locations: once on the data warehouse, and then defining connections and doing it again at the BI server.
the bi server and then when you talk about Big Data

if you want native connections to

things like solar

index for semi-structured data to

handle parallel processing in

the naval real-time insights by definition

if you're moving data from one

system to the next Thurs latency there so

you're missing out on opportunities for

real-time insights as well on the set of the system

so that that really hasn't

worked in the reason why an architecture

that is truly scale-out Lake

Arcadia is to Naval

are these things so I can you date it was built from inception

to run natively with in

data lakes and what I mean by that is if

you think about the open source movement in

a lot of the openness that's been traded

from the way sufferance

to Felts but the ability to plug in

different processing engines in

the descale a storage architectures we taking

advantage of that we let our suffer run

directly on the data

knows where the day that exists and it's

not just the query processing like Boris alluded

to but it's also all the knowledge about

how data is stored locally how

you can create better performance

schemas and international models

to take advantage of that and scale-out very

very linear way

So that's really what we do, and because of that you don't have to optimize performance in multiple locations, you're not moving data between multiple tiers, and you don't have to secure it in multiple ways. As an example, we can inherit security directly from Apache Sentry, which is a security project for Hadoop, so the administrator doesn't have to configure it in a separate place; it just inherits how security is defined at the Hadoop level. Similarly, we can handle things in real time: as data is streaming in, it's automatically available. Being able to connect to modern systems in the Hadoop ecosystem allows us to take advantage of some of these different capabilities that are out there.
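For context on the Sentry inheritance point: Sentry's role-based model (users belong to groups, groups are granted roles, roles carry privileges on databases and tables) is roughly what lets a BI tool reuse cluster-level access rules instead of redefining them. A simplified sketch of that resolution chain, with invented users, groups, roles, and tables:

```python
# Hypothetical Sentry-style mappings: user -> groups -> roles -> privileges.
user_groups = {"maria": ["analysts"], "dev1": ["engineers"]}
group_roles = {"analysts": ["read_sales"], "engineers": ["etl_writer"]}
role_privs = {
    "read_sales": [("sales.orders", "SELECT")],
    "etl_writer": [("sales.orders", "INSERT"), ("staging.raw", "ALL")],
}

def can_access(user, table, action):
    """Resolve a privilege by walking group -> role -> grant, the chain a
    Sentry-aware engine consults: defined once, inherited by every tool."""
    for group in user_groups.get(user, []):
        for role in group_roles.get(group, []):
            for tbl, priv in role_privs.get(role, []):
                if tbl == table and priv in (action, "ALL"):
                    return True
    return False

print(can_access("maria", "sales.orders", "SELECT"))  # -> True
print(can_access("maria", "staging.raw", "SELECT"))   # -> False
```

The point of "inheriting" security is that the dictionaries above live in one place on the cluster; a BI layer that consults them never needs its own copy of the rules.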

The other thing Arcadia has done, to really flip the idea of OLAP on its head, is to enable what we call smart acceleration. One of the challenges we've seen with legacy BI and middleware applications is that you wind up building cubes in advance, based on what you think the business requirements are, and that can be a lot of planning and setup time; not to mention you can lock yourself into particular dimensions and views people want to have. In reality, we can let people query data at a granular level and really do discovery in a much more agile way, looking for insights and for what questions should be asked. And we enable that through machine learning and artificial intelligence: we monitor those query patterns and what data is being accessed, and we recommend to the administrator smarter ways to aggregate, store, cache, and physically optimize that data in the cluster, stored directly back on the same storage tier. So the next time the queries come in, there's a cost-based optimization decision about how to speed up performance. This is really the last mile of getting value from a data lake, where you want it to scale to hundreds of thousands of users in a customer-facing situation, and we do it based on actual usage, not needing to build it all in advance. Again, this goes back to the agility Boris talked about; it has tremendous impact on how quickly you can get to insight.
quickly you can get to Insight example

of this one of our large retail

cpg companies was trying to

lock down or knock

down the silos across a bunch of different friends

from product marketing services

in shipment within their organization they really wanted

to improve at what points

on a geography basis as well

as digital media perspective how do we

influence have to purchase the

different products that we sell and if we

run a digital ad lets saying you're up how's

it impacting sales in different countries

and different zip codes within those countries

at cetera and they

chose to do this on a date of Lake architecture

happen to be Cloudera in this case and

now they're supporting hundreds of brand I'm just giving

them direct access and self-service visual

analytics across all the different

components of these marketing campaigns and

programs and what they say is

that you took them 3 years to find

a toilet really allows self service i

t having to go back and pull out other extracts

loaded into a BS service here and

then see if that answered the question for

the the business analyst you

can of course then be able to drill down quick

snapshot it was something like that would look like I'll

give a live down here in a second another

interesting area for this company who by the way said

they identified a billion dollars

but billion with a be of

instrumental value from their data

Lake by working with a business to identify

different areas where they could save expense

and find the revenue opportunities one of very interesting

use case was around supply chain optimization we're

used to be a six to eight

month project to bring in a consultancy

to map out all the ship points

and products and Freight rates

at cetera for the different

routes from manufacturers to wholesalers to

retailers and it was a six-month

process do this so if

you're trying to do what if analysis and

rates are changing and delivery

mechanisms potentially change right based

on those rates then it's hard to do in

iterate more quickly so they implemented away

visually to do that with the

Sankey diagram and path analysis

to figure out those Rasta Market in a much

more iterative fashion says pretty

fascinating know what can be done when you got

all the data in a visual way to explore

it and this is being done by business

analyst not Engineers or righty
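The data behind a Sankey diagram like the one described is just flow volume aggregated per source-and-destination edge, which can be recomputed on the fly as rates or routes change. A small sketch with made-up shipment records:

```python
from collections import defaultdict

# Hypothetical shipment records: (origin, destination, units).
shipments = [
    ("plant_a", "wholesaler_1", 500),
    ("plant_a", "wholesaler_2", 300),
    ("plant_b", "wholesaler_1", 200),
    ("wholesaler_1", "retailer_x", 600),
    ("wholesaler_2", "retailer_x", 300),
]

def sankey_links(records):
    """Aggregate units per (source, target) edge: the link weights a
    Sankey chart renders for manufacturer -> wholesaler -> retailer flows."""
    totals = defaultdict(int)
    for src, dst, units in records:
        totals[(src, dst)] += units
    return dict(totals)

links = sankey_links(shipments)
print(links[("wholesaler_1", "retailer_x")])  # -> 600
```

Re-running this aggregation against live freight data, instead of a six-month consulting engagement, is the what-if iteration the speaker is pointing at.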


The other part I would say is really interesting: if you're looking at modern data platforms, if you're looking at a data lake, and you treat it just like another database, you're going to fall into the same trap of how data pipelines have always been created. I don't have time to go into this in a whole lot of detail, but if you treat a data lake just like a data warehouse, you're going to wind up landing and securing that data, doing some physical transformation of it, then in a separate tier building your semantic layer, doing the performance optimization, and moving the data into that tier, and only then can you start to do analytics and discovery. That can be days and weeks. I've worked at organizations where adding a new dimension to this model could take six to twelve months and a million dollars of cost, and that's not a joke. It's once more around the sun, a year, before you can go back and do analytic discovery on that new dimension you wanted to add. It just becomes a slow process. Whereas if you take the data lake and enable that agility Boris talked about, in a much faster way, you can really shrink that time to value down to days. You can query unstructured data; you do discovery before you do performance modeling; you model after the fact, once you've figured out what needs to be modeled and optimized. That's really the approach we've taken with how we've implemented our BI software within the data lake. Again: one security model, no movement of data, et cetera, and you take that analytic discovery process from step six down to step three.

I think you get the point. We've got a lot of customers we work with, a lot of them with Cloudera, in a lot of different application areas: customer intelligence and insight is obviously one big area, financial services is big, telecommunications, IoT analytics is really interesting, but the example I wanted to give is around cybersecurity. So with that, I'm going to try to share my screen here and pull up a live application.

So, you know, don't try this at home, but we'll do a live demo here. This is Arcadia Data running on a server in our office. We built a demo with Cloudera around a project called Apache Spot. What Apache Spot does: it's a community-driven approach, an open-source project, to fighting cybersecurity threats, and it provides an open data model, a way to store data about all the different threat intelligence and endpoints, et cetera, for your organization, as well as machine learning algorithms to help identify suspicious activity. Arcadia is just providing the front end to this, and if you look at what we've built:

this is an executive summary view, and I won't go through all of it in detail, but it's taking the machine learning outputs and visualizing the top threats across users, endpoints, and networks. As a security analyst, being able to see this all in a single pane of glass and do your forensic analysis within one system is huge, because typically you're doing swivel-chair analytics, moving from one system to another, trying to look at threat intelligence, then Blue Coat proxy logs, then your Active Directory. Now you can look directly at all the network traffic in the organization. You can get a timeline view of what's happening over time; we've got a network graph to bubble up a couple dozen suspicious activities; and we might want to drill into an account on the bottom left, where we're using machine learning to bubble up specific threats that have been identified by source IP.
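The "bubble up the top threats" step shown here can be sketched as a simple ranking over ML-scored flow records. The scores and IP addresses below are invented; in Apache Spot the per-flow scores would come from its machine learning jobs, and this sketch treats a lower score as more anomalous:

```python
# Hypothetical output of an ML scoring job over network flows.
scored_flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.7", "score": 0.91},
    {"src_ip": "10.0.0.9", "dst_ip": "203.0.113.4",  "score": 0.02},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.4",  "score": 0.40},
    {"src_ip": "10.0.0.2", "dst_ip": "192.0.2.1",    "score": 0.15},
]

def top_threats(flows, n=2):
    """Rank source IPs by their most anomalous flow (lowest score) so an
    analyst sees the riskiest endpoints first: the 'bubble up' step."""
    best = {}
    for f in flows:
        ip = f["src_ip"]
        if ip not in best or f["score"] < best[ip]:
            best[ip] = f["score"]
    return sorted(best, key=best.get)[:n]

print(top_threats(scored_flows))  # -> ['10.0.0.9', '10.0.0.2']
```

The BI layer's job is exactly this kind of ranking, filtering, and drill-down over scores the data science pipeline has already produced.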

And this is interactive: I can go ahead and grab a time slider and narrow things down. Say I want to look at this specific IP address: I can just click on it, and it's going to pull up more information about that IP address; we can look at, in that network graph, all the different endpoints this IP address is connected to, and drill into detail by going down here. And again, because it's a security data lake, we've got all the data from the organization in one place, not just 20 percent: a hundred percent, and they're potentially bringing in data from outside sources as well. I don't have a lot of time, but you can see different domains here, in this case one from Russia that has bubbled up as suspicious; you can go down and look at the creation dates for these different things that were happening, with all the details right there. So I can drill in and get a lot of detail: which users it's connected to, et cetera, et cetera.

So, a very quick flyby, and we'll provide links to other videos and demos after the fact. I just wanted to share what this can look like from an end-user perspective. And again, this is not a data science workbench: we're taking the great work that data scientists have done and bringing it into an intuitive interface to do threat analysis, again from a cybersecurity perspective. So with that I'll turn it back over; I think we're going to put some slides up and wrap things up for questions.

Just one last thing here on this slide: there is some research Boris has done around what at the time was called native Hadoop BI, and that gets into more specifics about these distributed BI architectures and the differences from traditional ones. We've got some other demos we've built; this one's around connected-car fleet management. And if you want to get started, Arcadia Instant is a completely free download you can install on your desktop, with sample data sets, so you can start in your own time. Thanks for joining us today; now we'd love to take some questions.

Thank you very much, Steve. We're going to move into questions from our viewers today, and our first question is for Boris. Boris: I already have other BI tools; why do I need to consider Arcadia Data?

So I think it's part of exactly the same conversation we had earlier. If you are okay with looking at just the 20, 30, 40 percent of your data, and if you're performing structured data analysis only; in other words, if you are operating in a schema-on-write environment where everything is already predetermined, fixed, et cetera; then the existing tools, with their data replication, are definitely applicable to environments where that is really all you need. But I think once you start getting into terabytes of data, once you start getting into multiple data types, and, much more importantly, once you start getting into environments where you can't really wait even a few weeks for your relational database administrator to change a value in a column or to create a new join with a primary and foreign key; if your environment calls for a much more agile, much more responsive setup where you address user requirements within hours as opposed to days or weeks; I think that's when you really need to start to look elsewhere. And therefore, what we really are seeing today is that probably no one out there in any sizable enterprise organization actually has a single BI tool and lives happily ever after without any challenges.

Yeah, one point I'd add to what Boris was talking about: some of the large organizations we work with are choosing multiple standards, one standard for the data lake and one for the data warehouse. It's not that they want to replace all the good work they've done on the data warehouse; they want to open up these different mechanisms on top of modern data platforms like Cloudera. So that's what we're starting to see: these kinds of new BI standards emerging.

Understood. Our next question is for Alex. Alex, how do you balance the need for data governance as you shift to more self-service access and analytics?

Yeah, happy to speak to that. Like I mentioned, one of the key things around data governance is being able to have it become a two-way street. When you start moving to a modern platform, you have more data coming in, at different speeds as well: you may have data landing in real time, you may have data coming in as batch updates, and all of that data may not initially have a known value or use case right out of the gate. But some of that data, as we spoke to before, supporting some of your production needs, will definitely be supporting a known use case going in. And so you want to have this almost decentralized governance: having the platform, as well as the processes and the people, to steward and curate the data for the sets of known use cases, the sets of reports, maybe the regular dashboards going out to executive teams, so you can trust that the data being used there is accurate; and then also having the data be open for immediate access for users, without necessarily having to go through that very linear curation process. One of the benefits of this is that governance policies and metadata management can be added as you go. As you see data being used more often by different teams, as you see regular tables or columns being accessed and put into new dashboards, you can actually add governance policies as you go on the platform, and even open this up to have your end users participate in it: collaborative governance, as we call it. End users can add their own discoverability, their own tagging, their own stewardship to data, so they can better work across teams. One of the big aims is to really break free from that linear mindset, which works for a set number of use cases, but you don't want it to be a limiting factor as you start to open up broader access. So having a platform and a metadata management plan that can address all of those is what matters.
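The "add governance as you go" idea can be sketched as a catalog in which usage and user-contributed tags accrue, and a table is flagged for formal curation once it is used enough. The threshold, table names, and fields are invented for illustration:

```python
# Hypothetical catalog: governance metadata accrues as data gets used,
# rather than being fully defined before anyone may touch it.
catalog = {}

def record_access(table, user, tag=None):
    """Log usage and let end users attach their own tags (collaborative
    governance); flag the table for formal curation once it's popular."""
    entry = catalog.setdefault(table, {"hits": 0, "tags": set(), "curated": False})
    entry["hits"] += 1
    if tag:
        entry["tags"].add(tag)
    if entry["hits"] >= 3:       # threshold is arbitrary for this sketch
        entry["curated"] = True  # promote to the governed, trusted tier
    return entry

for user, tag in [("ana", "pii"), ("ben", None), ("carla", "finance")]:
    record_access("sales.orders", user, tag)
print(catalog["sales.orders"]["curated"])  # -> True
```

The two-way street is visible here: curated status flows down from policy, while tags and usage signals flow up from end users.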

Understood, thanks Alex. Steve, our next question is for you: where are artificial intelligence and machine learning leveraged in big data analytics, and how can BI tools leverage them?

Yeah, that's a really neat one. Part of what I showed in the demonstration was how you can visualize the output from machine learning and artificial intelligence, as with the Apache Spot project. The other thing that doesn't get a lot of press is that technology like Arcadia Data is using machine learning within the product itself. I talked about the smart acceleration technology, which uses machine learning to analyze and recommend different ways to speed up queries of the data on an ongoing basis. One thing I didn't get a chance to show: we also use that intelligence to make things easier for the end users. We can do what we call instant visuals: based on the dimensions you've selected, our system will recommend the best visualization technique, and you'll see a palette of six or nine different visual types displayed on the screen with your actual data. These are based on rules, and on learning from how people select visuals and which visuals are best to use. So we're absolutely using it to accelerate that and make it easier for people to analyze these big data sets.
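A rule-based chart chooser of the kind described as "instant visuals" might look like the following toy sketch; these rules are illustrative rules of thumb, not Arcadia's actual logic:

```python
def recommend_visuals(n_dims, n_measures, has_time=False):
    """Toy rule-based chart chooser: the kind of heuristic (refined in a
    real product by learning from what users pick) behind instant visuals."""
    recs = []
    if has_time:
        recs.append("line")                    # time series read best as lines
    if n_dims == 1 and n_measures == 1:
        recs += ["bar", "pie"]                 # one category, one value
    elif n_dims >= 2 and n_measures == 1:
        recs += ["heatmap", "stacked bar"]     # cross two categories
    if n_measures >= 2:
        recs.append("scatter")                 # compare two values
    return recs or ["table"]                   # fall back to raw rows

print(recommend_visuals(1, 1))                 # -> ['bar', 'pie']
print(recommend_visuals(2, 1, has_time=True))  # -> ['line', 'heatmap', 'stacked bar']
```

The "learning" part would adjust the ordering of this palette based on which visual types users actually keep.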

Understood, thanks Steve. Boris, with respect to you: how do I sell a BI project or platform to the business sponsors?

I guess the probably overly simplistic answer is that you should get another job, because if you are working for an organization where business executives don't see the value of this, there are plenty of others where they do. But kidding aside, and this obviously applies across different types of technology, not just this particular type: I am a huge fan of rapid proofs of concept. As Steve talked about, you can download the product and immediately put it into action, so that's precisely what I would recommend: take a day, or a couple of days, and build a proof of concept. See if it can provide an indication of the tangible value, the tangible outcomes, you're going to support. For example, in customer care: let's say you are struggling with customer churn and you don't understand the root causes. You do a bit of root-cause analysis, and if you find that root cause, you could potentially predict that X percent of your clients are going to stay with you or come back to you. That's a gold mine, or rather the opportunity: take it to your business executives and say, okay, can you fund this project so we can scale this out?
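A one-day proof of concept like the one Boris describes might start by simply comparing churn rates across customer segments to surface a candidate root cause. A sketch with invented data:

```python
from collections import defaultdict

# Hypothetical POC data: (customer segment, churned?) pairs.
customers = [
    ("long_wait_support", True), ("long_wait_support", True),
    ("long_wait_support", False), ("fast_support", False),
    ("fast_support", False), ("fast_support", True),
]

def churn_by_segment(rows):
    """Churn rate per segment: a first cut at root-cause analysis that a
    one-day proof of concept could put in front of a business sponsor."""
    totals, churned = defaultdict(int), defaultdict(int)
    for segment, did_churn in rows:
        totals[segment] += 1
        churned[segment] += did_churn
    return {seg: churned[seg] / totals[seg] for seg in totals}

rates = churn_by_segment(customers)
print(round(rates["long_wait_support"], 2))  # -> 0.67
print(round(rates["fast_support"], 2))       # -> 0.33
```

A gap like this between segments is exactly the "X percent of clients" argument a sponsor can fund: fix the slow-support experience, retain more customers.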

Understood. That's actually all the time we have for questions today. We apologize that we weren't able to get to all of your questions, but as I stated earlier, all questions will be answered via email. I'd like to thank our speakers today: Boris Evelson, vice president and principal analyst at Forrester; Alex Gutow, senior product marketing manager at Cloudera; and Steve Wooledge, vice president of marketing at Arcadia Data. If you would like to review this presentation or send it to a colleague, you can use the same URL you used for today's live event; it will be archived, and you'll receive an email tomorrow once the archive is posted. Now, as we stated earlier, just for participating in today's event, someone will win a $100 American Express gift card, and the winner today is Chris Webber. Chris, we will be in touch via email so you can claim your prize. Thank you, everyone, for joining us today, and we hope to see you again soon.