Unlocking the Power of Your Data Lake

Webinar Aired: May 31, 2018

You have a data lake — now it’s time to unlock its power. Watch the archived webinar “Unlocking the Power of the Data Lake” to learn how.

As Hadoop adoption in the enterprise continues to grow, so does commitment to the data lake strategy. Two-thirds of Database Trends and Applications readers are either implementing data lake projects this year or researching and evaluating solutions. Data security, governance, integration, and analytics have all been identified as critical success factors for data lake deployments.

To educate this growing audience about the enabling technologies and best practices for unlocking the power of the data lake, Database Trends and Applications is hosting a special roundtable webinar.


Welcome to today's roundtable webcast, brought to you by HVR, Arcadia Data, and Percona. I'm Stephen Faig, Research Director for Database Trends and Applications and Unisphere Research, and I will be your host for today's broadcast. Our presentation today is titled "Unlocking the Power of the Data Lake."

Before we begin, I want to explain how you can be a part of this broadcast. There will be a question-and-answer session; if you have a question during the presentation, just type it into the question box provided and click on the submit button. Also, one lucky viewer today will win a $100 American Express gift card. The winner will be announced at the end of the event, so stay tuned to see if it's you.

Now let me introduce our speakers for today. We have Mark Van de Wiel, Chief Technology Officer at HVR; Dale Kim, Senior Director of Products and Solutions at Arcadia Data; and Rick Golba, Product Marketing Manager at Percona. Mark, the floor is yours.

Thank you very much, Stephen, and hello, everybody. I'd like to talk about data integration, because we work with a lot of customers, and we see data lakes that have to handle a variety of data at high volumes. I think those are, if you like, the table-stakes challenges: everybody will need to address them, and the volumes often go beyond the capabilities of the technologies traditionally used.

As we engage with our clients, we get the opportunity to dive into the details of their data. I think a defining criterion of a data lake is that data needs to enter the lake with minimal latency, because that is what we see as one of the top challenges many of you face today.

Data lakes must support many users and many use cases: across marketing, machine operations, governance, it's all over the board, and data lakes cut across departments. Departmental use cases may in fact be using a scale-out database, on premises or a cloud-based equivalent, but we also see organizations using a file system as the destination, and I think that is by now the more dominant approach, because it gives more flexibility; we are talking about large, diverse sets of data, after all.

From that perspective, I would argue the cloud is even more attractive: it's available as a service, pay-as-you-go, scalable, and relatively easy to run an evaluation on.

On the source side, what does it look like? We see a variety of sources feeding the data lake. For your own data lake you may see only a subset of these, but if we look across the board, across the many use cases that we support, we see lots of use cases bringing in a variety of data along the lines of what's in this picture. We see XML file data feeds coming from external sources. There is sensor data from manufacturing, industrial kinds of data; it could be the sensor on your coffee machine, or the tons of sensors built into a modern automobile. There is streaming data coming in from a variety of sources, such as geographical, location-based data: multiple data points all feeding into the data lake. And then there are the more traditional sources, which in some cases hold some of the most important data organizations are trying to analyze, because they run some of the core processing of their operations: the relational databases behind applications such as ERP and supply chain management.

Now, if we look across those different data sources, and we look at what kind of data feeds into the lake from the various systems, the newer sources usually produce fairly unstructured or semi-structured data, coming out of sensors, social feeds such as Twitter, you name it: all new data points. But think about some of the traditional sources, whether it is the ERP systems or the other traditional databases behind applications. Those relational database technologies update and delete rows, and for those kinds of sources we're going to have to be able to manage that; we're going to have to be able to deal with updates and deletes in order to make the data lake consistent with the data coming in from those sources as well.
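Handling updates and deletes on top of append-only lake storage is usually done by merging change records into the current snapshot at read or compaction time. A minimal sketch of that merge idea, with purely illustrative data (this is not any specific vendor's implementation):

```python
def merge(snapshot, changes):
    """Apply insert/update/delete change records (in order) to a
    key -> row snapshot. A lake stores changes append-only and merges
    them like this to present a consistent current view."""
    result = dict(snapshot)
    for op, key, row in changes:
        if op == "delete":
            result.pop(key, None)
        else:                      # "insert" and "update" both upsert
            result[key] = row
    return result

snapshot = {1: "alice", 2: "bob"}
changes = [("update", 2, "bobby"), ("delete", 1, None), ("insert", 3, "carol")]
print(merge(snapshot, changes))   # {2: 'bobby', 3: 'carol'}
```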

Let's talk about some of the key data integration challenges for the data lake. One is security. Of course, with the data lake containing a consolidation of sources, in some cases it's a gold mine for somebody with malicious intent, so it has to be locked down in order to guarantee trust in the data. I'll dive into some of the details shortly.


I mentioned latency. Feeding the data lake needs to move away from some of the traditional approaches, where you run a heavy batch process on the source to retrieve the information. You want continuous, incremental capture: you don't miss any changes, and you don't have to rely on repeated heavy queries against your source. You want to get the changes as they happen, and you want to get the transactional changes as well.
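Continuous, incremental capture can be sketched in a simplified, query-based form; log-based CDC tools read the database transaction log instead, but the high-water-mark bookkeeping is similar. All names and data below are illustrative:

```python
# Simplified query-based change capture: track a high-water mark so each
# poll only fetches rows changed since the previous poll. Log-based CDC
# avoids even this query load, but the bookkeeping idea is the same.

def capture_changes(fetch_since, last_version):
    """fetch_since(version) -> list of (version, row) changed after it."""
    changes = fetch_since(last_version)
    new_version = max((v for v, _ in changes), default=last_version)
    return changes, new_version

# Toy "source table" keyed by primary key; each row carries a version number.
source = {1: (101, "alice"), 2: (102, "bob"), 3: (105, "carol")}

def fetch_since(version):
    return [(v, row) for v, row in source.values() if v > version]

changes, hwm = capture_changes(fetch_since, last_version=102)
print(changes, hwm)   # only carol's change (version 105) is fetched
```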

Transactional consistency is related to trust in the data, and it is very important. I think we underestimate some of the value we get from the traditional applications. Think of a bank transfer as an example: money leaves one account and arrives in another, and in a traditional application the database ensures the transactionality of your data, so you never see a half-completed transfer on a messed-up record. You also cannot simply pound those sources with heavy queries, because those traditional systems are often already very heavily loaded.

Then, security. We're talking about building a data lake, and with that we're creating a gold mine of data. We're often reaching for the data across a wide area network, and in many cases, I think in at least four out of five cases we see, the data lake is being built in the cloud. Well then, we had better encrypt the data as it moves across the wire, right? We're going to use SSL encryption to make sure the data is not exposed as it goes across the wire. But we also want to make sure the data is encrypted at rest: take advantage of the key management service that the cloud providers offer, and encrypt the data when it's there, as well as when it arrives. On top of that, you also want to ensure that there is very strong authentication into the system so that it's not easy to break into; using certificates and using keys is a way to ensure that kind of security.
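The in-transit half of this advice (TLS with certificate validation) takes only a few lines in most client stacks. A minimal sketch with Python's standard `ssl` module; the commented client-certificate paths are placeholders, and key-management-service encryption at rest would be configured separately on the cloud side:

```python
import ssl

# Encrypt data in motion and insist on verifying the server's certificate,
# so data is not exposed (or sent to an impostor) as it crosses the wire.
context = ssl.create_default_context()          # TLS with sane defaults
context.verify_mode = ssl.CERT_REQUIRED         # reject unverified servers
context.check_hostname = True                   # certificate must match host

# For mutual (certificate-based) authentication, the client presents its
# own certificate and key as well; these paths are placeholders:
# context.load_cert_chain(certfile="client.crt", keyfile="client.key")

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```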


Trust is crucial to the success of your data lake, and adoption may or may not happen: you need to make sure, as you're building a community of users, that whoever is going to rely on the lake can trust it. We're going to extract data from traditional applications onto a file system, and there the data looks very different; it's no longer stored the way it came out of, say, Oracle, it's just files on a file system. So having a data compare capability can give peace of mind; it can give the confidence that the data in the lake is in fact correct, that we can trust it, and that we can run our analytics and our data discovery against it.
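A data compare along these lines can be sketched by normalizing rows on both sides to a canonical form and comparing digests, so a table in a database and files on a file system can be checked against each other. Everything here is illustrative, not a specific vendor's implementation:

```python
import hashlib

def row_digest(row):
    """Canonical per-row digest: the same logical row yields the same
    digest, whether it came from a database or a file."""
    canonical = "|".join(str(v) for v in row)
    return hashlib.sha256(canonical.encode()).hexdigest()

def compare(source_rows, lake_rows):
    """Return digests present on only one side (order-insensitive)."""
    src = {row_digest(r) for r in source_rows}
    dst = {row_digest(r) for r in lake_rows}
    return src - dst, dst - src

source_rows = [(1, "alice"), (2, "bob")]
lake_rows = [(2, "bob"), (1, "alice"), (3, "carol")]   # extra row in the lake
missing_in_lake, extra_in_lake = compare(source_rows, lake_rows)
print(len(missing_in_lake), len(extra_in_lake))   # 0 1
```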

So, to summarize: make the integration continuous with change data capture, use encryption, certificates, and keys, and finally, validate the data so that the end users are going to trust it. With that, Stephen, I'm going to hand it back over to you.

Thank you very much, Mark. I would like to introduce our next speaker today, Dale Kim, Senior Director of Products and Solutions at Arcadia Data. Dale?

Thanks, Stephen. I want to talk about information agility: information sharing, data enrichment, being able to get a complete picture of your business data, and using the right tool for the job. Data warehouses emerged as the type of platform you look to for certain large-scale analytics, but the story is that there is no single standard technology; there are classes of technologies that help you address your specific needs, and you use the right tool for the job. The data lake, as we define it, was created for the environments we find in a variety of real-world situations: real-time requirements, larger volumes of data, and many sources, with analytics running alongside the data. So you get a lot of improvements over what came before.

Take a note from the BI tools of the past. A traditional BI tool might perform well with only a small set of data, or, vice versa, handle a large volume of data but only for a small set of users, and there is a lot of work required on the back end to prepare the data. And the truth about self-service in those tools is that it is often self-service in name only; they're not about self-service across the board, all the time. Self-service is interpreted as the ability to build a chart once someone else has prepared the data for you. The alternative you might consider is a native approach: take analytics out of the dedicated BI server, move it into the data lake, and bring it towards the end users, with security and governance.

A native approach is about a BI architecture built for the big data platform. Think about how you work with a traditional data warehouse: data modeling is an ongoing and time-consuming process. You have to frame the schema up front, look at it from different angles, and then physically remodel it for performance purposes. If your experience is with data warehouses, you might assume that this modeling process is someone else's problem, but it slows everyone down: you may discover that the information you need is missing only after a year and a lot of money spent. A native BI architecture instead runs in the platform, directly against your data, and is able to scale with both your current data volume and the concurrency demanded by your data discovery activity, without copying data out of the lake into a separate BI silo.

Let me give you a quick example. The diagram represents a native BI architecture with data blending capabilities, where users reach modern and traditional data sources directly, with the ability to model and remodel the data in place, and where the system can be deployed to many users without overloading the platform, using an acceleration engine under the covers. Finally, what are some recommendations for being successful? There's a video on data warehouse optimization that will give you some ideas of what can be offloaded from a data warehouse onto a big data platform, and you're welcome to download the free software that's available.

Thank you very much, Dale. At this time I'd like to introduce our final speaker today, Rick Golba, Product Marketing Manager at Percona.

Thank you, and thanks, Dale and Mark, as well. Companies call Percona into an organization to help them really see value out of their data, and the first thing I want to do is draw a comparison between the data warehouse and the concept of a data lake. With a warehouse, the data can be hard to get in, because you've got that very protective structure, and it can be harder to figure out how to get data out to read. But when we go to actually access that data, we have imposed some organization on it: we started out with the best intentions, and with a warehouse we can locate any specific piece of information and trust that we are getting the most current, most accurate information out of it. A data lake is much more open-ended in terms of the way you're putting the data in than what you would find in your traditional data warehouse, where obviously you know exactly which record everything is going into.
is going to go with

the day like this still a necessity for

having some for the organization structure


daylight is

the necessity for some sort of organization


want to have this don't

want to have a case where your day like this

you be looking at relational

and non-relational data normal


like teradata or Oracle or

something based

on the structure that already exist

but now I have

all of this non-relational data Internet

of things that

we can have all of that information coming in

and being made accessible and

that's the next week is being made

accessible you

may have non-relational data coming

in it still needs to be able to

be located in found in

and put in places where it is going to

be acceptable

That matters on the analytical side as well. Again, with what Dale talked about with the BI tools, we're looking at a variety of different ways we could go about accessing that data, and the right tool is very much a key piece of it. This is the idea of the BI tool evolving. When we understand the data in real time, the tool needs to allow immediate access and action: if a machine could be in danger of failure, you want to take the action and shut that process down right away. That's different from data from two months ago or four months ago, where you just want to consolidate it and keep that data in some form that is usable and accessible, but not necessarily instantly.


Then there's the scalability of the world of the cloud. Having the ability to scale up as needed to accommodate what's coming into your data lake is really key. The other piece that's really nice is that the cloud providers normally have a good handle on how that data is stored: you have lower-cost storage in the cloud, and in a lot of cases, if something doesn't work, you just get rid of it. You don't have the real risk and effort of going out, having a piece of hardware configured for you, and having all of that additional work done. So we're really seeing the cloud providers coming to the forefront in getting involved with cloud-based data lakes.

One other piece here is the cost of moving from a warehouse to a lake, and again I can compare it to baggage handling. If you have the data organized, the baggage is handled really well, each item tracked to its destination. With a lake, on the other hand, we get to the point where we're just kind of mixing all the luggage together. Now imagine that at Christmas time they dump the luggage out into the terminal building and you have to go through and find your own bags: that's not organized or easily accessible, and that's exactly what we're looking to avoid.

One of the big pieces of getting value out of your data lake is having your users trust it. There are a couple of reasons users may feel like they can't trust the data, so you want ways to make it available to a lot of your users but also highly trusted. This comes back to the security piece, and it comes back to making certain you are capturing all of the change data and looking at all of that information, so that you and your users have the sense that the data is really, truly the current and most up-to-date information.

Is your data lake going to become a swamp? Probably not, and that's certainly the preferred outcome, but you need to watch for it. You might need to put some new structures in place, and for a while it's gradual; a lake doesn't turn into a swamp overnight. If it does start to become swamp land, obviously that's the time to look around and get some help to get out of it, so that you can avoid the bad situation before it has already happened and you're left with something that is really not in a great state. That takes me to the end of my talk; thanks for the chance to talk about some of these things out there.

Thank you very much, Rick. At this point we're going to move on to questions from our viewers today, and the first question is one that I think all three of you can weigh in on. Why don't we start with Mark: will data lakes be replacing Hadoop in the future?

Well, that's a great question, Stephen. Hadoop is often chosen as the technology that hosts the data lake, but the data lake is a way of managing data, whereas Hadoop is a technology, and a very efficient one, for doing so. So is the data lake going to replace Hadoop? I'd say no: one is a concept, the other is a technology, and they are different things.

Understood. Dale, would you like to weigh in?

In fact, you know, over the years we will see other technologies jumping in for that data lake infrastructure: stores that tie it all together and create what we think of as the data lake.

Understood. And Rick, your thoughts?

Hadoop is one way of accessing data through the data lake, but I don't think it's going to be a full replacement.

Understood. Our next question is for Mark. Mark, do you support data masking and tokenization?

Great question. Of course, when we talked about the data lake, and you think back to what the defining criteria of the data lake are, we say we store data in its raw form, and masking or tokenizing changes that raw form. So it's not something we do on a routine basis, but we have done it at times in the context of a data lake, where there is the need for security and for ensuring there is no access to sensitive data.

Understood, thanks, Mark.

And if you have something like Hadoop, or you're using object stores for that data lake environment, you'll have large files stored in formats like Parquet or ORC, which are really good for analytics; they're columnar, and so Parquet and ORC are ideal ways of storing your data in the lake for analytics.
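The analytics advantage of columnar formats such as Parquet and ORC can be illustrated with a toy column-oriented layout: an aggregate over one column touches only that column's values instead of every field of every row (and similar values stored together also compress better). The data below is purely illustrative:

```python
# Row-oriented: each record stored together; a sum over "amount" still
# walks every field of every row.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 5.0},
]

# Column-oriented (the idea behind Parquet/ORC): one contiguous array per
# column, so the same aggregate reads just one array.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "amount": [10.0, 20.0, 5.0],
}

row_total = sum(r["amount"] for r in rows)
col_total = sum(columns["amount"])        # touches only the "amount" column
print(row_total == col_total == 35.0)     # True
```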

Understood. Rick, our next question is for you: is the cloud the best place for your data lake?

I think the cloud is an excellent spot for a data lake, because there is that ability to expand as needed, and a lot of the cloud providers are really working to make data lakes much more accessible and available.

Understood, thanks, Rick. Mark, we're going back to you: how would you maintain, in practice, transactional consistency in a data lake?

Yeah, that's a great question, Stephen, and quite frankly it goes back to what the data looks like and how the data behaves in our world: we want to make sure we replicate that kind of behavior into the lake. When the capture technology can see the transactional boundaries on the source, and that's where I go back to change data capture on, say, supply chain management or CRM kinds of sources, then by utilizing those boundaries and orchestrating the publication into the lake, for instance in micro-batches every few minutes or every few seconds, we can still maintain the transactional consistency. That's how we can help maintain it.
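Honoring transactional boundaries when publishing micro-batches can be sketched as grouping captured change records by transaction ID and shipping only committed, complete transactions, so readers of the lake never see half a bank transfer. All names and data here are illustrative:

```python
from collections import defaultdict

def build_batch(changes, committed_txns):
    """Group change records by transaction and keep only committed,
    whole transactions, so a micro-batch never splits a transaction."""
    by_txn = defaultdict(list)
    for change in changes:
        by_txn[change["txn"]].append(change)
    return [c for txn, recs in sorted(by_txn.items())
            if txn in committed_txns for c in recs]

# A bank transfer: debit + credit share txn 7 and must land together.
changes = [
    {"txn": 7, "op": "debit",  "account": "A", "amount": 100},
    {"txn": 8, "op": "insert", "account": "C", "amount": 1},   # still open
    {"txn": 7, "op": "credit", "account": "B", "amount": 100},
]

batch = build_batch(changes, committed_txns={7})
print([c["op"] for c in batch])   # ['debit', 'credit'], txn 8 held back
```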

Understood, thanks, Mark. Our next question is for Dale. Dale, should there be a separate data lake for business intelligence and analytics versus a machine-learning data lake?

Not really. I mean, different types of analytics, your BI tools and your machine-learning workloads, serve different user groups, but they can run against the same data lake.

Understood. Thanks, Dale. Rick, our next question is for you: are there practical size limitations that you should consider for your data lake?

That is another one of the challenges. You don't want to overly restrict what goes into the lake, but an unrestrained free-for-all is definitely something you're going to want to avoid, because that can lead to problems; again, going back to my luggage analogy, an organized system is different from an unrestrained pile. So it's not size alone; it's keeping the indexing and structure of the data so that it is always accessible, even at massive volumes of data.

Understood, thanks, Rick. Back to you, Mark. With the technology choice of a file system like HDFS, how would you compare data in the data lake?

That is a great question, and we get asked that question often. The answer is that we know how the data is represented by a technology like Hive, and how it is laid out on the file systems commonly used for data lakes, which includes HDFS. Essentially we flag the data differences between source and destination. In the data lake, where, as I mentioned, we don't have the same transactional capabilities as on the sources, the comparison requires more compute power, but because it runs distributed in the cluster it scales. So yes, we can still run a comparison: even though the destination is a file system, and even though we keep changes on the file system, we can still do that comparison against the current state of the source system. Great question.

Understood, thanks, Mark. Dale, this one is for you: what's the learning curve for adding a new BI tool to the stack?

If it works like your existing tools, people can get going right away and get a lot of value without expecting much of a ramp-up; but overall, plan for some learning curve with any new tool.

Understood. Rick, this one is for you: what are some of the common data lake use cases that you see among clients?

One area is bringing in data from multiple sources and looking to compare data from different sources together. A data lake allows for the flexibility of accepting the data from different sources but still bringing it all together into one place where I can really analyze it, look at it, and compare one source against the other.

Is the data lake kind of a replacement for big data?

I think big data is just another component of the data lake. There's certainly going to be a need for the concept of big data, and the data lake allows us to start accepting more of it in one common place.

Understood, Rick. We are about out of time, but as I stated earlier, all remaining questions will be answered via email. I'd like to thank our speakers today: Mark Van de Wiel, Chief Technology Officer at HVR; Dale Kim, Senior Director of Products and Solutions at Arcadia Data; and Rick Golba, Product Marketing Manager at Percona. If you would like to review this presentation or send it to a colleague, please use the same URL that you used for today's live event; it will be archived, and you'll receive an email tomorrow when the archive is posted. If you would like a PDF of the presentation, you can click on the resource icon on the console now. As we stated earlier, just for participating in today's event, someone will win a $100 American Express gift card; the winner will be notified via email so you can claim your prize. Thank you, everyone, for joining us today; we hope to see you again soon. This concludes today's webcast.