Webinar Panel

Data Management for Big Data, Hadoop, and Data Lakes

Big data is evolving rapidly. As data balloons in sheer size and diversifies in sources and structures, we all need to adjust how we manage it.

In this TDWI webinar, four data experts weigh in on the evolution of data and the implications for its management and business use.

Gain valuable insight on:

  • How data itself is evolving in size, source, structure and speed
  • How data management is changing with revised practices, new tools and platforms
  • How all of this impacts business use in terms of analytics, monitoring and transformation


Hello, this is Philip Russom of TDWI, and welcome to a TDWI webinar panel. Today we're going to be talking to a panel of experts about data management for big data, Hadoop, and data lakes. First of all, I want to thank you for finding time in your busy schedule to join us. We also want to hear your questions, and we're going to answer as many as we can at the end of the webinar. So as we proceed through the webinar, please look in your browser-based user interface and find a place for entering a question. That way your questions will be queued up and ready for us to answer later. Now, for those of you on Twitter, please use our hashtag [inaudible].

This webinar will be archived and available for playback. For those of you who registered for the webinar today, we will send you an email that includes a link to the archive, so feel free to replay the webinar at a later date.

As I said, this is a panel, and on our panel today we have experts from four software firms, each representing an advancement in data management, including uses involving big data, Hadoop, and data lakes, and that's what we're going to talk about today. To get ready for that, I'm going to lay out some basics, introduce the panel members, and then we're going to walk through three groups of questions. First of all, how is data itself evolving in terms of size, sources, structure, and so forth? Number two, we'll talk about how all of us are revising our practices and adopting new tools and platforms. Number three, we'll look at how business use changes as we diversify into more analytics, business monitoring, and business transformation. And then finally we'll get to all of the questions that you submitted through your browser-based user interface.

Well, let me just set the stage real quick here. As I mentioned, there's a lot of evolution. Data now ranges from unstructured to semi-structured to fully structured, even relational sources, and [inaudible] from vehicles, you know, like trucks or rail cars in freight companies, so you're getting a lot more sources. We all need to make some adjustments to the way we manage data, and data management practices and tools are evolving accordingly. [inaudible] as we access data for the first time, and that's very different from what we've done in data warehousing for years [inaudible] new platforms. People themselves are getting more and more savvy about how to use data for some type of organizational advantage or other forms of business value. A lot of you are using the new data to extend views of customers, and we'll get to other use cases as we go.

A quick word about Hadoop. Part of my job is running surveys, and there's a question I've run three times, in 2012, 2014, and 2016. In 2016 we found that 20% of data warehouse programs surveyed had Hadoop clusters in production. Hadoop is regularly used primarily to extend data warehouses and to provide storage and computing power for analytic processing, data lakes, data archiving, and marketing data, especially multi-channel marketing operations. It can be physically located on premises, you know, behind the firewall in your enterprise, or it could be cloud-based, or a combination of both, so we're seeing Hadoop in clouds.

Let me say a few words about the data lake. It's not a platform you would buy from a vendor or get through open source or something like that. Instead, the data lake is a method for organizing large volumes of data, much as you designed databases in the past, right? You can't just start a database without [inaudible] and all kinds of things like that. For the data lake, it helps to create structures for [inaudible] combining and collecting data in a very large way.

[inaudible] this can mean different file systems, or a relational database, or both. I think a lot of people are going to put a data lake on a relational database or a combination. [inaudible] include customer channels and touch points [inaudible] the lake in warehousing and multi-channel marketing. Again, I think a lot of people [inaudible] data lake [inaudible] from a wide range of sources of modern and traditional data.

You may be thinking, okay, that sounds good, but why are people adopting a data lake? Part of it is to have a way to capture new data sources, with the prospect of having new insights. There are also a lot of you extending 360-degree views of customers and other entities as well [inaudible]. You can do correlations across this more diversified data [inaudible] customer segmentation accuracy [inaudible] analytics, especially visualization. So as you think about how you're going to [inaudible] some kind of self-service access, you may need very fresh data, maybe even real-time data. If you want to have management dashboards with metrics that some managers need refreshed regularly throughout the day, [inaudible] that's one way to capture that data so that you can do more frequent [inaudible] delivery.

Let's get to the real meat of the matter here. We're going to have some conversations with our panelists today, so I'm going to go right into the questions. Who we'll hear from today will be Steve Wooledge, who is vice president of marketing at Arcadia Data. We also have Mark Van de Wiel, chief technology officer from HVR, [inaudible] from Panoply, and Nenshad Bardoliwalla, who is chief product officer at Paxata. Okay, today I'm going to ask people to feel free to pick a spot and dive in. Steve, take it away.

Hi everyone. Just a quick background: I've been in this market for about 15 years at different BI vendors, like Business Objects and other vendors, and currently at Arcadia Data. What I've really seen in the past 15 years is that data is really becoming more dynamic in terms of how we define it, and that's causing organizations to struggle with these rigid or static data pipelines that we've had in place for a long time, where a change to the schema for a report or something within the data warehouse could literally take six months or a million dollars of cost. This is a real figure from a customer I had on a customer advisory board [inaudible].

At the same time, we have these new formats like JSON, which allow self-description to be present in the data, and they're really designed for machine readability at large scale. You cannot make these fit rigid structures, and at the same time the business needs to be able to derive insight from that data, from streaming analytics and set-top boxes and advertising to cybersecurity and machine logs that are out there [inaudible] internet services. And at the same time, we live in an on-demand world where people want access to the data in real time.

I really see that the application of the future for business intelligence is that we need to be able to read that stuff in real time and allow people to explore it. So, an example application: we have customers in cybersecurity, and what they're trying to do is have real-time incident response alerts go off, but then give their security analysts the ability to drill down across all the endpoints, the networks, and the users to see who's connected to that incident, triage it, and take action. Or take fleet managers, let's say, who want to track drivers across a truck fleet. They want to be alerted to an incident and then look at the aggression patterns of those types of drivers over time, or what the weather is like on different road surfaces.

Thanks, Philip, and thank you everyone for listening in to today's webinar. A little bit about my background: I've been two decades in the industry by now, and I've basically been in the database management arena as well as in the real-time data movement space.

I see some of the same trends that Steve is seeing. We absolutely see the demand for real-time data; that's of course where we get involved, from a replication perspective. If I just quickly go through the list of questions: what changes do I see in the data? We absolutely see a significant increase in volume. Some of that is driven by IoT, but some of that is also driven by unlocking data in legacy systems. We see a tendency in our customer base to simply add more and more data sources that have been around and have been running for many years. We see volume both from a transaction processing perspective as well as from a [inaudible] data volume perspective.

I think what is driving the change is simply that by now the technologies have evolved to the point that customers start to see how they can get the value out of the data. There are no longer technological limitations that would artificially limit the volume of data that needs to be collected, or that would artificially limit the real-time aspect of the data. We now have technologies available so that, even though some of those sources are extremely busy, we can actually capture and manage the data at scale, at volume, and we can start combining that to drive business value.

So what can we expect from data in the future? From that angle, I think we're only scratching the surface. We've got several customers at HVR who are building out their data lakes, and the one thing that we absolutely see for all of those customers is that they continue to make changes to the environment. Of course they add sources, and they add different data types that we haven't seen before, but they're also changing technologies. As Philip already said, we see that happening on premises, we see that happening in the cloud, and we see our clients shifting from one deployment model to the other. It's really all over the map, and the one constant is really change here. So I expect to see a lot more of this going forward. Thank you.


Yeah, thank you, Philip. So I absolutely agree with what Steve and Mark said about [inaudible] in the collection and aggregation of multiple data sources. [inaudible] are kind of the same; I mean, they haven't changed in the course of time. In the end, of course, there are the new-world concepts, and I'm talking about techniques like [inaudible] being applied in data science, but [inaudible] specific methodologies have been developed to answer questions. At Panoply we basically deal with analytical data, so that kind of limits us in terms of the data trends, and my answer will only be in regards to analytical data.

What we're seeing is that the sources of data are getting much more diverse, kind of like what Mark mentioned before. More and more companies, even smaller ones, are utilizing over a dozen different sources together, which makes the ability to gather data into one place, organized, [inaudible]. The other thing that we're seeing is that smaller and smaller companies are gathering more and more data. Storage costs are constantly going down, which revolutionized the space. But the thing is that most of the data is still kind of garbage. [inaudible] they're collecting it for extracting value, and most of it will probably never be used, but they're still hoarding it away, which is creating [inaudible] accessing the data that actually matters.

So, one, it's easier every day [inaudible]; two, it's cheaper to store data every day; and especially [inaudible] everybody wants to be data-driven. So there are a growing number of data-driven organizations, and we believe that will continue to grow, [inaudible] the move from on premises, which I think Mark mentioned earlier, [inaudible] at least multi-vendor [inaudible].

Yeah, those are good examples as well. Thank you.

Thanks very much, Philip. I'll just briefly introduce myself. I am the chief product officer of Paxata, and like the other distinguished panelists, I've been in the BI and analytics space for too long.

I think what we're seeing is not being data-driven but being data-disillusioned. I think that the big glut of data that is now available is forcing us into becoming hoarders. There's no question that what the folks said before is absolutely true, which is that we're getting data from all new types of sources. We started off in the relational world, pulling lots of data from transactional applications, and then, you know, social media and machine data are coming from different places. But the fact is that what we're seeing with our customers is that they have lots of data. The problem has shifted: because of the economies of scale that the data lake architecture has provided, where they're really struggling is actually turning that data into something that's valuable. So a lot of our customers are telling us that the problem is not getting the data in the first place; we now have solutions to that. It's actually making that data valuable and useful [inaudible] the challenges that we're going through.

I think what's driving that change is that ultimately being able to collect and store data is a commodity, right? If you look at the prices Microsoft and the cloud vendors are driving from a storage perspective, you're seeing that the limit is not in how much data you can store. But the fact is that for the rate that storage is going up, we are certainly not seeing a commensurate rate in business value being driven from these lakes. We have customers, like some of the largest banks in the world, that have gone all in on the data lake architecture, harvesting data from literally hundreds or thousands of sources. But all of that harvesting hasn't led to being able to exploit that information, because the data isn't clean, it's not contextualized, and it's not really usable from their perspective.

So what we believe very strongly is that what we're going to expect from data in the future is that we're going to have to use smart automation to make data useful and exploitable instead of just focusing on collecting it. I think that's the next order of capability that will start to drive value for our customers, and it's certainly something that we at Paxata are pioneering in the market.

Second group of questions. [inaudible] are so much better than they were in the past, and so we can let go of a lot of the prep that we did in the past and do prep on the fly. What changes do we need to make as users in our best practices, and also in our portfolios of data platforms, to leverage not just the new data but the new possibilities in the hardware and software?

Yeah, great question, Philip. I actually wanted to go back to Nenshad's comment about organizations simply starting to hoard data more than driving business value out of it. I do think, and that ties into my earlier comment, that in the data lake architectures we're seeing, the only constant is change, and that's really what's going on here. There is a lot of this data being collected, and in an effort to drive the business value, there's an ongoing realization that changes need to be made in order to drive that business value.

From that perspective we see technology changes. In some environments we see limitations of the more traditional relational technologies, and we see a lot more organizations, a lot more of our customers, shifting to file systems from a data management perspective. Again, to Nenshad's earlier point, some of that is simply driven by a cost argument, but it's also, I think, from a volume and scale perspective. That said, that's the change we're seeing in this environment.

At the same time, as customers adopt these new data platforms, we see that there are, at least in the early phases, [inaudible] the traditional transaction processing applications, because the data in those systems is unlike IoT data, which kind of comes in as a stream of data, as a set of measurements. In many cases, in contrast to that kind of data, traditional transactional applications process inserts, updates, and deletes as well, and organizations are initially really struggling to make sense out of the data.

Also, to the earlier points that were made about data quality, there's just the challenge to validate that data. Data quality is of course one aspect, that is, what is the quality of the data in the source, and [inaudible] data warehousing and cleansing data address that. But it's not just the point of cleansing the data and the master data management aspect of it. [inaudible] the challenge is dealing with: well, is the data on the destination the same as in the source?

So what types of tools and platforms are they turning to? HDFS, of course, with Hadoop, is one of them, Philip, as you said in your overview of the survey results, but also S3 and some of the other cloud [inaudible].

Thank you, Mark. And gentlemen, I need to ask you to be more concise in your answers.

Yes. As for the changes that users are making to their data, there's definitely more awareness of being data-driven. Therefore the way data is stored in production systems today is much more oriented towards the data architecture than it was, say, a decade ago. But, as I mentioned in the earlier question, between having that awareness and turning it into [inaudible] there's still a great gap. [inaudible] their proprietary data infrastructure. It's really cool, because the technologies are awesome and becoming faster, and more problems are being solved in a cooler way. [inaudible] What they don't realize is that [inaudible] is a profession in itself.

We're definitely seeing more and more companies, especially the smaller and [inaudible] ones, turning to self-managed, self-optimizing platforms. In fact, in a survey that we published on our site of [inaudible] professionals, and I've seen similar results in Gartner's latest report, [inaudible] was by far the highest challenge companies are facing today. So just as enterprises are now realizing the power of the cloud, and there is a huge wave of companies migrating to cloud infrastructure or at least some sort of hybrid infrastructure, the companies that are more at the forefront of data technologies are now beginning to offload these challenges to self-optimizing, self-managed platforms, and this is just for the sake of focus and cost optimization.

Okay, thank you very much.

Happy to answer that. So the first thing that we have to ask is: who is the user who is part of the data management practices? I think we've seen over the last decade or so that there is this inexorable push toward democratizing information across, you know, not just the data scientist community, not just the developers and engineers, but moving beyond, toward the average business user inside of an organization. Frankly, most of those folks have been using tools like Excel and Access, and I'm sure many of you are familiar with that, writing VLOOKUPs and macros and pivot tables to be able to manage their data, but not really having tools or capabilities to do this kind of work.

So the big change that we see happening is that, on the one hand, the end users have really adopted self-service tools like Tableau and Qlik and others like them. On the other hand, their IT colleagues are working with data management infrastructure that's allowing for the persistence and management of poly-structured data, JSON [inaudible]. There haven't been tools and capabilities that allow you to bridge the gap between those two, especially for the business people. And so what we're seeing, and certainly the thing that we're driving in the market, is the opportunity to provide a self-service mechanism where groups of people can actually leverage the data collected, and give them the capability [inaudible].

Yeah, I have to agree with you; those are all excellent points. And I mentioned it earlier to the audience: a wider range of users is going to want access. We see a lot of marketers; that's the hot spot, I think, right now. [inaudible] a portfolio?

Definitely. I echo a lot of what Nenshad was also saying just now, which is the democratization of access to the data, and the challenge that organizations face, if you will, with the new types of structures. [inaudible] tools, as I was pointing out, don't handle them really well, and there are a lot of questions people don't even know to ask, so how can we allow them to discover and explore?

I think the fact is that [inaudible] data lakes, it just doesn't scale [inaudible], or you have to aggregate the data to pull it out, so you lose all the granularity and fidelity of that data. So there's a whole new generation of technologies, which I like to call native or data-native BI tools or visual analytics, that allow you to run processing [inaudible] directly within the data platform itself. That gives people incredibly fast access to the data, and they can access it in its native format, like I was talking about earlier with JSON and those types of things, but also other things like Solr indexes or real-time sources like Kafka and others [inaudible], which is available now because of [inaudible] open standards [inaudible].

Excellent. And I'm glad you brought up data exploration and discovery, just to get a grip on what the new data is, and also what's the technical condition of it from a schema or a data quality viewpoint [inaudible].

I know we touched on business value, but here's where we really dig into business value, and this is where the rubber hits the road. It's time-consuming and expensive to [inaudible] the new data, but I think it's worth it, right? So I'm waiting on the answer to the question.

This is something everybody can read about online: you know, how Ford turned a [inaudible] loss into a profit in less than three years while using data science. And Kimberly-Clark, you know, they handle a massive amount of data. They have over a billion clients in many different countries, with dozens of brands, and they're selling across multiple channels: mobile, social, web. [inaudible] utilize big data for everything from customers at the retail level to inventory, to improve stock forecasts. And Target, the retail company, is one of the largest employers of data scientists in the United States. And that's just the tip of the iceberg.

It's a known fact today that 80% of data collected is not being utilized for extracting value. Now, that doesn't mean that there's actually 80% more [inaudible] not being [inaudible]. Still, there's a big chunk of data that you can still utilize for extracting more value. [inaudible] 80% of their time on prepping the data and infrastructure. [inaudible] a platform to offload all of those, [inaudible] will be able to be five times more efficient in extracting value.

Yeah, very good examples. Thank you for some industry-specific examples. [inaudible]

The first thing is, you know, how do we quantify the value, right? I'm from New Jersey, but I like to keep things really simple: you're making them money, saving them money, or keeping them out of jail, so you tie it to those things, right? A lot of folks, and [inaudible] right before me, referred to this: 80% of the time that people spend in analytics projects is actually on the data preparation process, right? And so if you can find new ways to actually allow people to interactively transform data, you basically flip the equation. What if we could spend 80% of our time on the value generation instead of scrubbing and cleaning and all the other things we enjoy doing?

So what we have seen [inaudible] is some very dramatic returns on the ability to work with big data. One of our largest customers, in financial services, had over a billion dollars in fines over the last five years due to regulatory compliance challenges, and every additional regulatory report, because of the data preparation process, took them 22 days to implement. But going with a self-service approach, where the actual business users, the ones who had the domain knowledge, were able to prep the data themselves, we were able to take that 22-day process and turn it into a one-day process. So that's an example on the compliance side.

On the customer side, we're seeing some of our customers looking at initiatives like a single view of the customer. Again, they're pulling data from multiple different data sources, including web logs, the classic call center information, and the like. The ability to take that data and very rapidly consolidate it, deduplicate it, and aggregate it to make it useful for analytics allows them to make very quick decisions about what new offers they want to provide those customers, additional ways that they can increase customer satisfaction, et cetera.

So in summary, I think that whether it's reducing costs or increasing revenue or keeping people in compliance, we have seen the power of self-service information management to really drive that value across all three of those dimensions.

[inaudible] which is that just because data is coming from a new source doesn't exempt it from data governance. Think about that from the beginning as you're adopting new data from new sources. Likewise, I talked to Steve earlier about data exploration and discovery, and data profiling [inaudible] and, for a lot of people, enterprise standards. Your turn.

Sure.

I'm just going to say there are hundreds of use cases across all kinds of industries. I think a good example is Procter & Gamble, who's been a customer of Arcadia Data for a while. They had a top-down initiative to really look at how to leverage this new data that's out there, from social media, weather, over 25 different data sets. So they created a data lake, and they wanted to provide the product managers at P&G ways to analyze over 600 brands globally, to measure the impact of marketing campaigns, weather, and other things that drive sales and velocity of a product through the retailers, and use that as a service back to the retailers to help them better plan inventory, supply chain, and manufacturing of products, to improve that process overall.

So they used the data lake to combine all that information, and the problem they had was that there was no [inaudible] tool, which we were talking about earlier, that could really give the product managers a way to drill down to detail and go from country to state to city to individual stores and look at velocity of products and those types of things, until they were able to run the analytics directly inside the data lake, where they can access all that scale at once. So it's that native approach that we're seeing really empower the business. That's one example.

A quick one on avoiding fines and regulations: [inaudible] combining communication channels like chat logs, text messages, and actual trades to assess whether they're in compliance with regulations, quite literally trying to reconstruct the view of the world at the time that a certain trade is executed by a trader. This allows them to assess whether there was any kind of a violation, catching and rectifying it before the regulatory authorities get involved.

A couple of things: those are great examples. You know, with Procter & Gamble, they were one of the first companies to really define modern marketing, and I'm really glad to see Procter & Gamble still being very innovative with marketing. [inaudible] data warehouse environments, but the number two environment was what I'm now calling the marketing data lake. And again, big companies with lots of multi-channel data are standing up data lakes, and I'm glad to hear people stand up and talk about their cloud implementations. [inaudible] get to work with some of that, right?

Yes, I did. I think the example that I can bring to the table here, and there have indeed been a lot of examples already: we've been working with a global manufacturer, and I remember some of the early conversations four or five years ago, when the discussion was more about, okay, we've got all this IoT data from the equipment we manufacture; how can we store this data? And this was four or five years ago. That problem has long been solved, and since then a lot of the operational data has been added to the system.

This global manufacturer has started a new offering, not unlike the use case we heard from Steve about Procter & Gamble, where some of their customers are now purchasing new services from the organization to make sense of the analytics of the equipment they purchased from this organization. Now, that is predominantly [inaudible] with lots of analysis happening, but the next phase of this data lake and this data collection and analytics is to feed all that information back into operations. Because if you are in global manufacturing and you've got all this equipment out there, you can start helping your customers be more efficient, make sure you prevent outages, and do preventative maintenance. You can combine some of that sensor-generated data with the traditional ERP data to optimize your manufacturing processes, to be able to increase the uptime of your equipment, thereby generating more value for your customers.

So I think really what we're seeing is big data [inaudible]. How is that going to transform the business? On the one hand it's going to improve customer satisfaction; it's going to create new business opportunities as well. There are going to be lines of offerings in some organizations that are driven by the data collection and the services derived from that. So that, I think, is where it's going.

Yeah, excellent, excellent. And we're talking about the Internet of Things; I think we all know it's come out of hype pretty early, in fact, mainly on the industrial side, not too much on the consumer side. So a lot of our members, those who have different kinds of logistics, are in places where there are more sensors and machine data on vehicles, rail cars, shipping pallets, and so forth, and mobile devices. All right, thank you, Mark, for that.

Well, folks in the audience, thank you for sending in questions, and let's get going here. Nenshad, I think I've got a good question for you here, so let's start with you. The question says: I would like to know what type of tools do you use or recommend to keep track of metadata, data quality, and business rules?

No, I'm happy to answer that,

but I think the

first thing you have to understand in that process is

what types of data and

what types of people want to be

able to work with that metadata,

that information. There

are, I think, a number of very good tools

on the market from

the IT side of the

house which handle metadata by

pulling information from

various different sources, stitching them together,

being able to work with the ER diagrams and

whatnot. But I think

the broader

trend is that those capabilities

need to be

brought out not only to IT but

to the business. And so I

think not only some of the other folks

at the other vendors as well, but

in Paxata's case we

actually have metadata

management facilities as

part of a broader system

that integrates data quality,

data profiling, data integration,

et cetera. It really depends:

is it a dedicated capability

that I want, as kind of a hub, or is it

something that I want to embed as part of a

broader suite of information?

Yeah, let me get your take on something

then. You know, what

I'm hearing from a lot of our members, I

think there are two things people really need for self

service besides the


and I think they need just the right ease of use,

the tools, visual tools that are high ease of

use. I

think it's absolutely critical, Philip. You

know, the way that we look at the

market, there are capabilities that

you need for understanding what data

you have, right, and knowing

that these are the data sources,

these are the different

data structures that are inside. And

then there's the ability to actually manipulate

that data. And I think that

the only way

that we can get value from data is to

provide as much context as

possible when you're actually working with

that data. And so we

have started working with a number of the catalog

vendors to actually embed

the business glossary

type information in the

data transformation, so that the end users

actually have the context of: what does

this field mean, what are the rules around it,

what source system did it come from? And

they can leverage that knowledge to

then sculpt information into something useful

for an analytic use case, for example.

that was a question

for Carlos and carpet is asking what

is the Hadoop or Spark

trend for companies to use Python?

What's the difference, Carlos?

Yeah, I think what's interesting about Hadoop is

there are a lot of headlines

out there about how it's

lost it, or

whatever, but if you look at the data surveys,

I saw the one from September

of last year from Gartner, there's actually 72%

of organizations from the survey

that had either already or

were planning to implement within the next 6 months, or

something like that. So absolutely

the most popular diary

places. In many ways the MapReduce

jobs that a lot of people are running

on Hadoop will still be in place

for a lot of batch-type workloads,

but different

types of data pipeline not just

I think still by far the most popular

storage platform if you can't park


particular drizzle and library

for ANSI standards

Yeah, no, good question. So I think

there are lots of SQL

variants and Hadoop projects out there.

I think the most popular, in order, if I remember

correctly, are Hive

and Impala, Spark

SQL's right in there, Apache Drill's really

close together, still,

and there are always others out there. And I think,

if you want big jobs that are

very reliable, Hive of

course is a great engine for that. It's not

just querying but also doing transformation

in some cases, that

ETL type of thing. But

then there's Impala, which is similar

to Drill in that it leverages memory and it's

faster for more interactive analytics.

And I'd say Spark SQL

meets some of that, but from my recent

memory I don't think Spark SQL is quite as mature

in terms of all the ANSI SQL variants

and, you know, specific functions

that are supported. Somebody

said once it takes you 7 years to really develop

a strong database, and all the work

around the SQL optimizers is

no small thing. So there are lots of things

out there. I think it depends on

Yeah, those are good details, thank

you for that. Mark, Mark

Van de Wiel from HVR Software, I've

got a question for you here. Here we

go. Trina, Dimitrina

asks: where do you fit

data scientists into, let

me try that again, into

what you're talking about?

Yeah, that's

a great question, and I think there are just two

aspects

to address that question. On

the one hand, I think that the data

science really comes after the data collection,

so once the data is

available in

a central data store where there

is the ability to start combining that data

with other data from other sources,

whether it's IoT, whether it's traditional

applications, whether it's a legacy

system that was brought into

the data lake, that's absolutely where

there's room for the data scientist, where there is

that data volume available

at scale. But then at the same time

we see a demand

for data science

on streaming data as well,

where we're really driving towards

an environment

where the

latency between the data

originating and then having

access to it from a data science perspective,

we're driving down that latency

to zero as much as we

can. So that's where there's

a bit of a trade-off, and I think it really depends

on the use case, it really depends to

a certain degree on the industry, but

at the end of the day it comes down to what's

the business value and where

the data science fits in from

that perspective, as to where

the demand is.

OK, excellent, great question. We

have time for one more question here.


So Aaron, on data

preparation tools, and given

the nature of your company: what

should you look for in a new

data exploration

tool, and also what should we

look for in a good data preparation tool?

Yeah, so it's

a good question, because in

a play White stupid question because the

core philosophy around

data preparation

tools, in the

philosophy that, you know, apparently we

generally don't, generally

do not

believe in it, even

though I think that, I know,

others here can answer

that question probably better, and

you know clap for is a great tool and so on.

Many, many great tools. But

I'll tell you a bit about

our data preparation and

the way we see it.


It begins with

the extraction,

the extraction process.


the production servers are great guys do whatever

to your data

infrastructure today

that you got to take care of you

know build your make

sure your data is stored in the

can you until

the actual extractor the date of where

the optimization query-optimization Manor

We think that the main

challenge there is that the

actual extractor of the value, being

the data scientist, is


not the stakeholder in

the data preparation. In the data

preparation phase, the actual

stakeholder is the data

engineer or DBA or whatever

you have.

When these two parts are differentiated,

that makes,

the gap between them makes

a big problem in terms of your scale, building

up your infrastructure and still believe you're positive

that's where we ready

fractal analysis so

we solve

this challenge with receiver

and that's why inside Panoply there's

no data preparation. Basically we

understand the data, understand

the logic, and then do all the data

preparation for you. That

being said, there are a lot of great tools

for building

that process. We

mention stones in pounds is a great one work

with our partners at Stitch returns which are

great data preparation tools.

Newmar another great

tool. There are many, many great tools inside

the space for data preparation, especially

around the ETL and ELT-type processes,

the process, if

your company really wants to scale. And

in 3, 4, 5 years from now I

don't think anybody will be doing it the same way they're doing it

today. I hope that answers the question.

Yeah, that was a pretty good rundown. And


thank you for attending today. You've been listening

to us talk about the evolution

of data and its management, and we heard

from a lot of people here. I want to thank the panelists,

including Steve Rutledge from

Arcadia Data, Mark Van de Wiel

from HVR Software, and

By Golly

Wow data

management for Big Data, Hadoop, and

data lakes. Goodbye.