Webinar Panel

Data Management for Big Data, Hadoop, and Data Lakes

Big data is evolving rapidly. From ballooning in sheer size to the diversification of sources and structures, there are many adjustments we all need to make in how we manage data.

In this TDWI webinar, four data experts weigh in on the evolution of data and the implications for its management and business use.

Gain valuable insight on:

  • How data itself is evolving in size, source, structure and speed
  • How data management is changing with revised practices, new tools and platforms
  • How all of this impacts business use in terms of analytics, monitoring and transformation

Watch now:

Data Management for Big Data, Hadoop, and Data Lakes

Transcript

[00:00:00.000]
PHILIP RUSSOM: Hello, this is Philip Russom at TDWI, and welcome to a TDWI webinar panel. Today we're going to be talking with a panel of experts about data management for big data, Hadoop, and data lakes.

[00:00:21.399]
First of all, I want to thank you for finding time in your busy schedule to join us. Also, we do want to hear your questions, and we're going to answer as many as we can at the end of the webinar. So as we proceed through the webinar, please look in your browser-based user interface and find the place for entering a question; that way your questions will be queued up and ready for us to answer later. And for those of you tweeting, please use our hashtag; on Twitter that would be hashtag TDWI.

[00:00:59.899]
The webinar will be archived and available for playback. For those of you who registered for the webinar today, we will send you an email that includes a link to the archive, so feel free to replay the webinar.

[00:01:22.099]
So let me lay out our agenda for today. As I said, this is a panel, and on our panel today we have experts from four software firms, each representing advancements in data management, including uses involving big data, Hadoop, and data lakes, and that's what we're going to talk about today. To get ready for that, I'm going to lay out some basic concepts and introduce the panel members, and then we're going to walk through three groups of questions. First of all, how is data itself evolving, with the diversification of sources and structures and so forth? Number two, we'll talk about how many of us are revising our practices and adopting new tools and platforms as we diversify into more analytics, business monitoring, and business transformation. And then finally we'll get to all of the questions that you submitted through your browser-based user interface.

[00:02:31.900]
Well, let me set the stage real quick here. As I mentioned, there's a lot of evolution in the data itself, ranging from unstructured to semi-structured to fully structured, even relational, sources, and more of it is coming from devices, from vehicles, you know, like trucks and rail cars in freight companies, so you're getting a lot more machine data. We all need to make some adjustments to the way we manage data, and our data management practices and tools have to evolve accordingly. For example, we now have structure applied as we access data for the first time, and that's very different from what we've done in data warehousing for years, where structure is imposed up front.

[00:03:31.599]
People are also acquiring new platforms, and business people themselves are getting more and more savvy about how to use data for some type of organizational advantage or other forms of business value. A lot of you are using the new data to extend views of customers, for example.

[00:04:07.699]
Before we go on, a quick word about Hadoop. You know, part of my job at TDWI involves running surveys, and there's a survey question we've run three times, in 2012, 2014, and 2016. In 2016 we found that 20% of data warehouse programs surveyed had Hadoop clusters in production, and people told us quite a bit about how they're regularly used: primarily to extend data warehouses and to provide storage and analytic computing power for analytic processing. Other uses include data lakes, data archiving, and marketing data, especially for multi-channel marketing operations. Hadoop can be physically located on premises, behind the firewall in your enterprise, or it could be cloud-based, or a combination of both, and a lot of people are indeed running Hadoop in clouds.

[00:05:19.100]
Now let me say a few words about the data lake. For the most part, the data lake is not a platform you would buy from a vendor or get through open source or something like that. Instead, the data lake is a method for organizing large volumes of data, much the way you designed databases in the past, right? You can't just start a database without standards, guidelines, and all kinds of things like that, and for the data lake you likewise have to create structures for combining and collecting data in a very large way.

[00:06:06.199]
A data lake can sit atop different platforms. That might mean a distributed file system, or a relational database, or both; I think a lot of people are deploying the data lake on Hadoop, on a relational database, or a combination, and of course any of those could be on premises or in the cloud. There are a number of use cases; prominent among those are customer channels and touch points, complementing, not replacing, the data warehouse in multi-channel marketing.

[00:07:06.000]
Again, I think a lot of people like the way a data lake can ingest new data from a wide range of sources, modern and traditional. And maybe you're thinking, okay, that sounds good, but why are people actually adopting the data lake? Part of it is to have a way to capture new data sources, and part of it is the prospect of getting new insights from new data.

[00:08:09.100]
There's also a lot of interest in 360-degree views of customers, and of other entities as well: facilities, departments, et cetera. You can do correlations across this more diversified data, calculations that improve, say, customer segmentation accuracy.

[00:09:07.899]
Analytics, especially visualization, also benefits. So as you think about how you're going to use the lake, you'll want some kind of self-service access. You may have very fresh data, maybe even real-time data, and if you want to have management dashboards, those dashboards have metrics that some managers need refreshed regularly throughout the day; the lake is one way to capture that data so that you can deliver updates to management more frequently.

[00:10:04.799]
Let's get to the real meat of the matter here, namely the conversations we're going to have with the panelists who are with us today. So I'm going to go right into the questions. Who we'll hear from today: first will be Steve Wooledge, he's vice president of marketing at Arcadia Data; we also have Mark Van de Wiel, chief technology officer from HVR; a panelist from Panoply; and Nenshad Bardoliwalla, who is chief product officer at Paxata.

[00:10:38.700]
Okay, today's first group of questions: how is data itself evolving, and what can we expect from data in the future? I'm going to let people feel free to pick a spot and dive in. Steve, take it away.

[00:10:59.799]
STEVE WOOLEDGE: Hi everyone. Just a quick background: I've been in this market for about 15 years at different BI vendors like Business Objects and others, and currently I'm at Arcadia Data. What I've really seen in the past 15 years is that data is becoming more dynamic in terms of how we define it, and that's causing organizations to struggle with the rigid or static data pipelines that we've had in place for a long time, where if you've got to change the schema for a report or something, it could literally take six months or a million dollars of cost, and that's a real figure from a company on our customer advisory board.

[00:11:48.500]
At the same time we have these new formats like JSON, which allow self-description to be present in the data, and they're really designed for machine readability at large scale. You cannot force these into rigid structures, and at the same time you need to be able to derive insight from them: from streaming analytics and set-top boxes, from advertising, from security, and from the machine logs generated by load-balanced internet services. And we live in this on-demand world where people want access to the data in a real-time format.

[00:12:35.899]
So I really see that the application of the future is business intelligence where we're able to read that stuff in real time and allow people to explore it. An example application: we have customers in cybersecurity, and what they're trying to do is have real-time incident-response alerts go off, but then give their security analysts the ability to drill down across all the endpoints, the networks, and the users to see who's connected to that incident, triage it, and take action.

[00:13:12.100]
Or fleet managers, let's say, that want to track drivers, say in a trucking company; they want to be alerted to an incident and then look at the aggression patterns of those drivers over time, or what the weather was like on a different road surface.
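[Editor's note: a minimal sketch in Python of the schema-on-read idea behind Steve's point about self-describing JSON; the file name, field names, and alert threshold are hypothetical.]

    import json

    # Each line is a self-describing JSON event; no schema is declared up front.
    records = []
    with open("events.json") as f:
        for line in f:
            records.append(json.loads(line))

    # New fields simply show up as new keys, so the structure can evolve freely.
    fields = set()
    for rec in records:
        fields.update(rec.keys())
    print("fields observed:", sorted(fields))

    # A real-time-style filter: surface the records that look like incidents.
    incidents = [r for r in records if r.get("alert_level", 0) >= 3]
    print(len(incidents), "records at alert level 3 or higher")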

[00:13:48.100]
MARK VAN DE WIEL: Thanks, Philip, and thank you everyone for listening in to today's webinar. A little bit about my background: I've been two decades in the industry by now, basically in the database management arena as well as in the real-time data movement space.

[00:14:11.600]
Absolutely, I see some of the same trends that Steve does; we certainly see the demand for real-time data, and that's of course where we get involved, making real-time data replication cost-effective. If I just quickly go through the list of questions: what changes do I see in the data? We absolutely see a significant increase in volume. Some of that is driven by IoT, but some of that is also driven by unlocking data in legacy systems as well. We see a trend in our customer base that they want to simply add more legacy data sources that have been around and have been running for many years. We see volume growth both from a transaction processing perspective as well as from a metadata perspective. I think what's driving the change is simply that by now the technologies have evolved to the point that customers start to see how they can get value out of the data. There are no longer technological limitations that would artificially limit the volume of data that can be collected, or that would artificially limit the real-time aspect of it. With the technologies we now have available, even though some of those sources are extremely busy, we can actually capture and manage the data at scale, at volume, and we can start combining it to drive business value.

[00:15:50.899]
So what can we expect from data in the future? From that angle I think we're only scratching the surface. We've got several customers at HVR who are building out their data lakes, and the one thing that we absolutely see for all of those customers is that they continue to make changes to the environment. Of course they add sources, they add different data types that we haven't seen before, but they're also changing technologies. We see that happening on premises, we see that happening in the cloud, and we see our clients shifting from one deployment model to the other; it's really all over the map, and the one constant here is really change, so I expect to see a lot more of this going forward. Thank you.

[00:16:47.200]
PHILIP RUSSOM: Thank you, Mark. Those are good examples.

[00:16:56.500]
PANOPLY PANELIST: Yeah, thank you, Philip. So I absolutely agree with what Steve and Mark said about the collection and aggregation of multiple data sources. The fundamentals are in a way the same, they haven't changed over the course of time, even as, of course, new concepts and techniques are being applied in data science, and specific methodologies have been developed to answer questions. At Panoply we basically deal with analytical data, so that limits us a bit in terms of the data trends we see; my answer will only be in regard to analytical workloads.

[00:18:07.599]
So what are we seeing? The sources of data are getting much more diverse, kind of like what Mark mentioned before. More and more companies, even smaller ones, are utilizing over a dozen different data sources, which makes the ability to gather data into one place, organized, much harder. The other thing we're seeing is that smaller and smaller companies are gathering more and more data. Storage costs are constantly going down, which has revolutionized the space, but the thing is that most of that data is still kind of garbage: they're collecting it hoping to extract value, and most of it will probably never be used, but they're still hoarding it away, which creates challenges in accessing the data that actually matters.

[00:19:14.799]
It's becoming easier every day, and cheaper to store data every day, and especially at new companies everybody wants to be data-driven, so there is a growing number of data-driven organizations, and we believe that number will continue to grow, along with the movement from on premises to the cloud that Mark mentioned earlier, and multi-vendor environments.

[00:19:57.599]
PHILIP RUSSOM: Yeah, those are good examples as well, thank you.

[00:20:10.099]
NENSHAD BARDOLIWALLA: Thanks very much, Philip. You know, I'll just briefly introduce myself: I am the chief product officer of Paxata, and like the other distinguished panelists I've been in the BI and analytics space far too long.

[00:20:26.099]
I actually think what we're seeing is organizations not being data-driven but being data-disillusioned. The big lots of data that are now available are forcing us into becoming data hoarders. There's no question that what the folks said before is absolutely true, which is that we're getting data from a whole new variety of sources: we started off in the relational world pulling lots of data from the transactional applications, and then, you know, social media and machine data started coming from different places. But the fact is that what we're seeing with our customers is that they have lots of data; the problem has shifted, because of the economies of scale that the data lake architectures have provided, and where they're really struggling is actually turning that data into something that's actually valuable. A lot of our customers are telling us that the problem is not getting the data in the first place, we now have solutions to that; it's actually making that data valuable and useful once it's collected. Those are the challenges that we're going through.

[00:21:46.500]
And I think what's driving that change is that ultimately being able to collect and store data is a commodity, right? If you look at the prices that Microsoft and the other cloud vendors are driving from a storage perspective, you're seeing that the limit is not in how much data you can store. But the fact is that for the rate at which storage is going up, we are certainly not seeing a commensurate rate of business value being driven for these folks. We have customers like some of the largest banks in the world that have gone all-in on the data lake architecture; they're harvesting data from literally hundreds or thousands of sources, but all of that harvesting hasn't led to being able to exploit that information, because the data isn't clean, it lacks context, and it's not really usable from their perspective.

[00:22:51.000]
So what we believe very strongly is that what we should expect from data in the future is that we're going to have to use smart automation to make data useful and exploitable, instead of just focusing on collecting it. I think that's the next order of capability that will start to drive value for our customers, and it's certainly something that we at Paxata are pioneering in the market.

[00:23:21.900]
PHILIP RUSSOM: Thank you. That takes us to our second group of questions. Today's platforms are so much better than they were in the past, so we can let go of a lot of the prep work we did in the past, or do that prep on the fly. So what changes do we need to make as users, in our best practices and also in our portfolios of data platforms, to leverage not just the new data but the new possibilities in the hardware and software?

[00:24:01.700]
MARK VAN DE WIEL: Yeah, great question, Philip. I actually wanted to go back to Nenshad's comment about organizations simply tending to hoard data more than driving business value out of it. I do think, tied to my earlier comment, that the data lake architectures we're seeing are constantly changing, and that's really what's going on here: there is a lot of this data being collected, and in the effort to drive business value there's an ongoing realization that changes need to be made in order to drive that business benefit. On that front we see technology changes. In some environments we see limitations in the most traditional relational technologies, and we see a lot more organizations, a lot more of our customers, shifting to file systems from a management perspective. Again, to Nenshad's earlier point, some of that is simply driven by a cost argument, but some of it is also, I think, driven from a volume and scale perspective. As they said, the one constant we're seeing in this environment is change.

[00:25:23.599]
At the same time, as customers adopt these new data platforms, we see that they are, at least in the early phases, ironically focused on the traditional transaction processing applications, because the data in those systems is unlike IoT data, which comes in as a stream, as a set of measurements. In contrast to that kind of data, the traditional transactional applications process inserts, updates, and deletes as well, and organizations initially really struggle to make sense of that data. Also, to the earlier point that was made about data quality, there's the challenge of validating that data. The quality of the data in the source is of course one aspect, and to the earlier point, data warehousing-style cleansing and the master data management side of it can address that. But the challenge has shifted to dealing with the question: is the data on the destination the same as in the source?

[00:26:36.400]
As for what types of tools and platforms they are turning to: HDFS is of course one of them, and Hadoop, as you saw in your survey results, but also S3 and some of the other cloud object stores.
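[Editor's note: a minimal sketch of the source-versus-destination check Mark describes, comparing row counts plus a cheap aggregate fingerprint; the database files and table name are hypothetical, and dedicated replication tools offer far more robust compare features.]

    import sqlite3

    def table_fingerprint(conn, table):
        # Row count plus an order-independent checksum of row contents
        # (hash() is stable within a single process run).
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        checksum = 0
        for row in conn.execute(f"SELECT * FROM {table}"):
            checksum = (checksum + hash(row)) & 0xFFFFFFFF
        return count, checksum

    src = sqlite3.connect("source.db")     # stands in for the OLTP source
    dst = sqlite3.connect("lake_copy.db")  # stands in for the lake destination

    if table_fingerprint(src, "orders") == table_fingerprint(dst, "orders"):
        print("orders: source and destination agree")
    else:
        print("orders: mismatch, check in-flight inserts/updates/deletes")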

[00:27:01.599]
PHILIP RUSSOM: Thank you, Mark. And gentlemen, I need to ask you to be more concise in your answers.

[00:27:13.000]
PANOPLY PANELIST: Yes. On the changes that users are making to their data: there's definitely more awareness of being data-driven, and therefore the way data is stored in production servers today is much more oriented toward data architecture than it was, say, a decade ago. But as I mentioned in the earlier question, between that awareness and turning it into a data-driven practice there is still a great gap. Companies build their proprietary data infrastructure, and it's really cool, because the technologies are awesome and keep getting faster, and more problems are being solved in cooler ways, but what they often don't realize is the maintenance burden of the infrastructure itself. So we're definitely seeing more and more companies, especially the smaller and scrappier ones, turning to self-managed, self-optimizing platforms.

[00:28:21.599]
In fact, in a survey of data professionals that we published on our site, and I've seen similar findings in Gartner's latest report, infrastructure maintenance was by far the highest challenge companies are facing today. So just like enterprises are just now realizing the power of the cloud, with a huge wave of companies migrating to cloud infrastructure or at least some sort of hybrid infrastructure, the companies that are more at the forefront of data technologies are now beginning to offload these challenges to self-optimizing, self-managed platforms, and this is for the sake of focus and cost optimization.

[00:29:05.799]
PHILIP RUSSOM: Okay, thank you very much. Nenshad?

[00:29:20.000]
NENSHAD BARDOLIWALLA: Happy to answer that. So the first thing that we have to ask is: who is the user, who is part of the data management practices? I think we've seen over the last decade or so that there is an inexorable push toward democratizing information, across not just the data scientist community, not just the developer and the engineer, but moving beyond toward the average business user inside of an organization. Frankly, most of those folks have been using tools like Excel and Access, I'm sure many of you are familiar with that, writing VLOOKUPs and macros and pivot tables to be able to manage their data, but not really having tools or capabilities designed for this kind of work. And so the big change that we see happening is that on the one hand the end users have really adopted self-service tools like Tableau and Qlik and others, while on the other hand their IT colleagues are working with data management infrastructure that allows for the persistence and management of poly-structured data, JSON and the like. But there haven't been tools and capabilities that allow you to bridge the gap between those two, especially for the business people. So what we're seeing, and certainly what we're driving in the market, is the opportunity to provide a self-service mechanism where groups of people can actually leverage the data that's been collected, and to give them that capability.

[00:31:07.299]
PHILIP RUSSOM: I have to agree with you, those are all excellent points, and as I mentioned earlier to the audience, a far wider range of users is going to want access to this data. We see a lot of marketers; that's the hotspot, I think, right now. Steve, how does this map to your portfolio?

[00:31:45.599]
STEVE WOOLEDGE: Definitely. I agree with a lot of what Nenshad was saying just now; it ties to my earlier point about access to the data and the challenge end users face: with the new types of structures, the tools, as I was pointing out, don't handle them really well, and in a lot of cases people don't even know what questions to ask. How can we allow them to discover and explore? And I think the fact is that the classic approach just doesn't scale for data lakes: it's pointed at a warehouse, or you have to aggregate the data to pull it out, and you lose all the granularity and fidelity of that data. So there's a whole new generation of technologies, I like to call them native, or data-native, BI tools for visual analytics, that allow you to run the processing directly where the data itself lives. That gives people incredibly fast access to the data, as long as they can access it in its native format, like I was talking about earlier with JSON and all those types of things, but also other things like search indexes, or real-time sources like Kafka, which is available now because we have open standards for things like this.
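[Editor's note: a minimal sketch of the data-native pattern Steve describes, querying raw JSON in place with PySpark rather than extracting and pre-aggregating it; the path and field names are hypothetical, and this stands in for, rather than shows, any particular vendor's engine.]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("native-lake-query").getOrCreate()

    # Read full-granularity JSON straight from the lake; the schema is
    # inferred from the self-describing records.
    events = spark.read.json("hdfs:///lake/raw/telemetry/")
    events.createOrReplaceTempView("telemetry")

    # Drill-down style query over every record, with no pre-aggregation.
    spark.sql("""
        SELECT driver_id, COUNT(*) AS hard_brakes
        FROM telemetry
        WHERE event_type = 'hard_brake'
        GROUP BY driver_id
        ORDER BY hard_brakes DESC
        LIMIT 10
    """).show()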

[00:33:17.900]
PHILIP RUSSOM: Excellent. I'm glad you brought up data exploration and discovery, just to get a grip on what the new data is, and also what its technical condition is, from a schema or data quality viewpoint, before you head into requirements.

[00:33:54.099]
Moving on: I know we touched on business value, but here's where we really dig into it; this is where the rubber hits the road. It's time-consuming and expensive to expand a data management environment to grab the new data, but I think it's worth it. Can you folks quantify the business value?

[00:34:16.599]
PANOPLY PANELIST: Right, I'm happy to take that question. This is something everybody can read about online, you know, how companies have turned heavy losses into profit in less than three years using data science. And Kimberly-Clark, you know, they're massive: they have like over a billion customers in many different countries and dozens of brands, and they're selling across multiple channels, mobile, social, web. They now utilize big data for everything from customer analysis at the retail level to inventory to improved stock forecasting. And Target, the retail company, is one of the largest employers of data scientists in the United States. That's just the tip of the iceberg.

[00:35:10.800]
It's a known fact today that 80% of data collected is not being utilized for analytics or extracting value. Now, that doesn't mean there's actually 80% more value just sitting there, but there's still a big chunk of data that you can utilize to extract more value. On top of that, data scientists spend roughly 80% of their time on prepping the data and the infrastructure rather than on analysis, so with more sophisticated platforms that offload all of that from them, teams will be able to be five times more efficient in extracting value.

[00:36:09.800]
PHILIP RUSSOM: Yeah, really good examples, thank you. Does anyone have some industry-specific examples?

[00:36:21.199]
NENSHAD BARDOLIWALLA: Sure. The first thing is, you know, how do we quantify the value, right? I'm from New Jersey, but I like to keep things really simple for people: we're either making folks money, saving them money, or keeping them out of jail, so it's one of those things, right. On the first of those: our colleague from Panoply right before me referred to this 80% of the time that people spend in analytics projects, which actually goes to the data preparation process, right? So if you can find new ways to allow people to interactively transform data, you can basically flip the equation: what if we could spend 80% of our time on the value generation instead of on scrubbing and cleaning and all the other things we enjoy so much?

[00:37:10.000]
What we have seen with our customers is some very dramatic returns on the ability to work with big data. One of our largest customers in financial services had over a billion dollars in fines over the last five years due to regulatory compliance challenges, and every additional regulatory report, because of the data preparation process, took them 22 days to implement. By going with a self-service approach, where the business users, the ones who had the domain knowledge, were able to prep the data themselves, we were able to take that 22-day process and turn it into a one-day process. So that's an example on the compliance side.

[00:37:53.500]
On the customer side, some of our customers are looking at initiatives like a 360-degree view of the customer, and again they're pulling data from multiple different data sources, including web logs, including the classic call center information and the like. The ability to take that data and very rapidly consolidate it, de-duplicate it, and aggregate it to make it useful for analytics allows them to make very quick decisions about things like next-best offers they want to provide those customers, additional ways they can increase customer satisfaction, et cetera.

[00:38:28.400]
So in summary, whether it's reducing costs, increasing value, or keeping people in compliance, we have seen the power of self-service information management really drive value across all three of those dimensions in our customer base.

[00:38:57.500]
PHILIP RUSSOM: One thing I'd add is that just because data is coming into the lake from new sources doesn't mean it's exempt from data governance; you need to think about that from the beginning as new data arrives from new sources. And likewise, as I discussed with Steve earlier, data exploration and discovery matter: exploring and profiling the data, because when you're profiling you also learn how the data measures up, for a lot of people, against enterprise standards. Steve, your turn, sure.

[00:40:02.000]
STEVE WOOLEDGE: I could point you to hundreds of use cases across all kinds of industries, but I think a good example is Procter & Gamble, who's been a customer of Arcadia Data for a while. They had a top-down initiative to really look at how to leverage new data, whether from social media, weather, or other feeds, over 25 different data sets, so they created a data lake. They wanted to give the product managers at P&G ways to analyze over 600 brands globally to measure the impact of marketing campaigns, weather, and other things that drive sales and the velocity of a product through the retailers, and to offer that as a service back to the retailers to help them plan inventory, supply chain, and manufacturing, to improve that process overall.

[00:40:55.800]
So they used the data lake to combine all that information, and the problem they had was that there was no traditional tool, which we were talking about earlier, that could allow them to really give the product manager a way to drill down to detail, to go from country to state to city to individual stores and look at the velocity of products and those types of things, until they were able to run the analytics directly inside the data lake, where they can access all that data at scale at once. So it's that native approach that we're seeing really empower the business to do something new. That's one example, a quick one.

[00:41:36.900]
NENSHAD BARDOLIWALLA: A quick one on avoiding fines under regulations: another Paxata use case is banks combining communication channels like chat logs and text messages with actual trades to assess whether they're in compliance with regulations. They're quite literally trying to reconstruct the view of the world at the time that a certain trade was executed by a trader, and this allows them to assess whether there was any kind of a violation, catch it, and rectify it before the regulatory authorities get involved.

[00:42:13.000]
PHILIP RUSSOM: A couple of things: those are great examples. You know, with Procter & Gamble, they were one of the first companies to really define modern marketing, and I'm really glad to see that Procter & Gamble is still being very innovative with marketing data. Marketers have long been collecting data using data warehouse environments, but the number-two environment is what I'm now calling the marketing data lake, and again, big companies with lots of multi-channel data are standing up data lakes. And I'm glad to hear people stand up and talk about their cloud implementations as well. Mark, did you have an example?
[00:43:24.599]
yes I did I

[00:43:26.599]
think that from your example that

[00:43:28.699]
I can get to that I can

[00:43:30.699]
bring to the table here to

[00:43:33.900]
have indeed been a lot of examples already

[00:43:35.900]
but we've been working with us Global manufacturer

[00:43:38.000]
and I remember

[00:43:40.099]
some of the early conversations for a 5

[00:43:42.099]
years ago when the discussion was more about okay

[00:43:44.300]
we got all this i o t data

[00:43:46.699]
from the equipment manufacturing

[00:43:49.300]
how can we start this day and

[00:43:51.500]
this was for 5 years ago that

[00:43:53.800]
problem as long been sold and

[00:43:55.900]
sends them a lot of the operational

[00:43:58.099]
data has been added to the system

[00:44:00.400]
Global manufacturer

[00:44:02.900]
has started in you offering

[00:44:04.900]
nothing like the UK's

[00:44:07.300]
we heard from Steve about Procter & Gamble

[00:44:09.400]
where some of their customers are

[00:44:11.599]
not purchasing

[00:44:13.699]
youth services from the organization to

[00:44:16.500]
make sense of the an Olympics has the

[00:44:18.500]
equipment today purchase from

[00:44:21.000]
this organization now that

[00:44:23.099]
is predominantly

[00:44:24.599]
let's go places with lots of analysis

[00:44:26.599]
happening but the next phase

[00:44:28.900]
of this day to Lake in this day that collection

[00:44:31.199]
analytics Anastasia's to feed

[00:44:33.500]
all that information back into operation

[00:44:35.500]
because if you are in global manufacturing

[00:44:38.000]
and you've got all these this equipment out there you

[00:44:40.400]
can start helping your customers be more

[00:44:42.599]
efficient make sure you prevent

[00:44:44.699]
outages in you you do preventative

[00:44:46.699]
maintenance if I can buy some

[00:44:48.800]
of that Temple generated ate at with

[00:44:51.000]
the traditional Erp data

[00:44:53.300]
to optimize your man manufacturing

[00:44:55.699]
properties to be able to

[00:44:57.800]
increase the app times as your equipment

[00:44:59.800]
soap without generating

[00:45:01.800]
more value for your customers so

[00:45:04.800]
I think really what was seeing

[00:45:07.000]
is Big Daddy. How

[00:45:09.099]
is that going to transform the business on

[00:45:11.400]
the one hand is going to improve customer

[00:45:13.400]
satisfaction is going to create

[00:45:23.199]
Can U versus opportunities as well just

[00:45:25.300]
going to be lines of offering some organizations

[00:45:27.800]
that are driven by the data

[00:45:29.800]
collection in the services driven

[00:45:32.099]
from that so that I think whether to guy

[00:45:34.099]
PHILIP RUSSOM: Yeah, excellent, excellent. And now we're talking about the internet of things. I think we all know IoT has come out of the hype cycle pretty early, in fact mainly on the industrial side, not so much the consumer side. Some of our members have different kinds of logistics use cases, where there are more and more sensors and machine data on vehicles, rail cars, shipping pallets, and all sorts of mobile devices. All right, thank you, Mark, for that.

[00:46:12.500]
Well, folks, those of you in the audience, thank you for sending in your questions, and let's get going here. Nenshad, I think I've got a good question for you here. Carla says: I would like to know what type of tools you would recommend to keep track of metadata, data quality, and business rules.

[00:46:38.500]
NENSHAD BARDOLIWALLA: I'm happy to answer that, but I think the first thing you have to understand in that process is what types of data, and what types of people, want to be able to work with that metadata, that information. There are, I think, a number of very good tools on the market from the IT side of the house which handle metadata, pulling information from various different sources, stitching it together, being able to work with the ER diagrams and whatnot. But I think the broader trend is that those capabilities need to be brought out not only to the IT folks but to the business. And so not only we but some of the other vendors as well, and in Paxata's case specifically, actually have metadata management facilities as part of a broader system that integrates data quality, data profiling, data integration, et cetera. So it really depends on whether you want a dedicated capability that acts as kind of a hub, or something that you want to embed as part of a broader suite of information management capabilities.
[00:47:52.500]
yeah let me get your take on something

[00:47:55.000]
then you know what

[00:47:57.099]
I'm hearing from a lot of our members I

[00:48:05.500]
think there's two things people really need for Self

[00:48:07.599]
Service besides the

[00:48:09.900]
glossary

[00:48:17.400]
and I think they need to just the right in for use

[00:48:19.400]
the tools in usual tools that are highest

[00:48:21.599]
views I

[00:48:29.800]
NENSHAD BARDOLIWALLA: I think that's absolutely critical, Philip. You know, the way that we look at the market, there are capabilities that you need for understanding what data you have, right? Knowing the data sources, and the different data structures that are inside them. And then there's the ability to actually manipulate that data. I think that the only way we can get value from data is to provide as much context as possible when you're actually working with that data, and so we have started working with a number of the catalog vendors to actually embed the business glossary type of information into the data transformation experience, so that the end users actually have the context of: what does this field mean, what are the rules around it, what source system did it come from? And they can leverage that knowledge to then sculpt the information into something useful for an analytic use case, for example.
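[Editor's note: a minimal sketch of the kind of lightweight profiling that helps answer "what data do I have and what condition is it in?"; the file and column names are hypothetical, and catalog products layer a business glossary on top of checks like these.]

    import pandas as pd

    df = pd.read_csv("customers.csv")

    # One row per column: type, missing values, and cardinality.
    profile = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(1),
        "distinct": df.nunique(),
    })
    print(profile)

    # Spot-check a business rule, e.g. "email must be present and unique".
    print(df["email"].duplicated().sum(), "duplicate email values")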

[00:49:39.900]
PHILIP RUSSOM: So that was a question from Carla, and now Carlos is asking: is Hadoop, or Spark, the trend for companies to use, and what's the difference?

[00:49:54.699]
MARK VAN DE WIEL: Yeah, I think what's interesting about Hadoop is that there are a lot of headlines out there about Hadoop having lost it, or whatever, but if you look at the actual surveys, like the one from September of last year from Gartner, actually 72% of organizations in the survey had either already implemented it or were planning to implement it within the next six months, or something like that. So it's absolutely still being adopted.

[00:50:31.599]
Spark has in many ways replaced the MapReduce jobs that a lot of people were running on Hadoop; MapReduce will still be in place for a number of batch workloads, but Spark is being used for different types of data pipelines, not just batch. And HDFS is, I think, still by far the most popular storage platform, and you can run Spark against it.
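[Editor's note: for readers following the MapReduce-to-Spark point, here is word count, the canonical MapReduce example, expressed as a small PySpark job; the input path is hypothetical.]

    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount")

    # flatMap/map/reduceByKey replaces the hand-written MapReduce job:
    # the mapper emits (word, 1) pairs, the reducer sums counts per word.
    counts = (sc.textFile("hdfs:///lake/raw/logs/")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    print(counts.take(10))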

[00:51:09.900]
PHILIP RUSSOM: We also have a question about the SQL-on-Hadoop engines, in particular Drill, and their support for ANSI standards.

[00:51:36.500]
MARK VAN DE WIEL: Yeah, good question. So I think there are lots of SQL variants and projects out there; the most popular, in order, if I remember correctly, are Hive and Impala, with Spark SQL right in there and Apache Drill really close behind, and there are always others out there. At a high level, if you've got big jobs and want something very reliable, Hive is a great engine for that, not just for queries but also to do transformations, in some cases ETL-style feeding of data. Then there's Impala, which is similar to Drill in that it leverages memory and it's faster, so it's more for interactive analytics, and I'd say Spark SQL is aimed at some of that as well, but from recent memory I don't think Spark SQL is quite as mature in terms of all the ANSI SQL variants and, you know, the specific functions that are supported. Somebody once said it takes you seven years to really develop a strong database, all the way around to the SQL optimizers, and there's no shortcut to such a thing. So there are lots of options out there, and I think it depends on the workload.
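[Editor's note: a minimal sketch of the portability point above; an ANSI-style aggregate query like this runs, with minor dialect differences, on Hive, Impala, Drill, or Spark SQL. It is shown here through PySpark, and the table path and columns are hypothetical.]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-on-hadoop").getOrCreate()
    spark.read.parquet("hdfs:///lake/sales/").createOrReplaceTempView("sales")

    spark.sql("""
        SELECT region,
               SUM(amount) AS revenue,
               COUNT(*)    AS orders
        FROM sales
        WHERE sale_date >= '2017-01-01'
        GROUP BY region
    """).show()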

[00:53:01.400]
PHILIP RUSSOM: Yeah, those are good details, thank you for that, Mark. Mark Van de Wiel from HVR Software, I've got another question for you here. Here we go: Dimitrina asks, where do you fit data scientists into what you're talking about?

[00:53:21.599]
MARK VAN DE WIEL: Yeah, that's a great question, and I think there are two aspects to address in that question. On the one hand, I think the data science really comes after the data collection: once the data is available in a central data store, where there is the ability to start combining that data with data from other sources, whether it's IoT, whether it's traditional applications, whether it's a legacy system that was brought into the data lake, that absolutely lays the groundwork for the data scientists, with that data volume available at scale.

[00:54:00.699]
But then at the same time we see a demand for data science on streaming data as well, where we're really driving toward an environment where the latency between the data originating and the data scientist having access to it is driven down to zero, or as close to it as we can get. So there's a bit of a trade-off, and I think it really depends on the use case, and to a certain degree on the industry, but at the end of the day it comes down to what the business value is and where the data science fits in from that perspective.

[00:54:50.599]
PHILIP RUSSOM: Okay, excellent, great question. We have time for one more question here. Okay, so Aaron has a question about data preparation tools, and he names your company, Panoply: what should you look for in a tool for exploring new data, and also, what do we look for in a good data preparation tool?

[00:55:43.800]
PANOPLY PANELIST: Yeah, so it's a good question, and for us it's a slightly tricky question, because of our whole philosophy around data preparation tools. At Panoply we generally do not do manual data preparation, and I believe Nenshad here could answer that question probably better; Paxata is a great tool, and there are many, many great tools. But I'll tell you a bit about our take on data preparation and the way we see it.

[00:56:33.099]
It begins with the extraction, the process of pulling data from the production servers or whatever your data infrastructure is today. You've got to take care of building your pipeline and making sure your data is stored in the right way, through to the actual extraction of the data, where query optimization matters. We think the main challenge there is that the person actually extracting the value, being the data scientist, is not the stakeholder in the data preparation phase; the actual stakeholder there is the data engineer, or the DBA, or whatever you have. Because these two roles are separated, the gap between them creates a big problem in terms of scaling, building up your infrastructure, and still delivering value. That's where we see the opportunity, and we solve this challenge with automation: inside Panoply there's basically no manual data preparation; we understand the data, understand the logic, and then do all the data preparation for you.

[00:58:26.199]
That being said, there are a lot of great tools for building these processes. We work with our partners at Stitch, for example, who make a great data pipeline tool, and there are many, many great tools in the space for data preparation, especially around the ETL and ELT processes. But if your company really wants to scale, then three, four, five years from now I don't think anybody will be doing it the same way they're doing it today. I hope that answers the question.
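[Editor's note: a minimal sketch of the manual preparation steps the panel describes, here with pandas; the file, columns, and rules are hypothetical stand-ins, and automated pipelines like the one just described aim to remove exactly this hand-written work.]

    import pandas as pd

    raw = pd.read_csv("raw_orders.csv")

    prepared = (
        raw.drop_duplicates(subset="order_id")        # de-duplicate
           .assign(order_date=lambda d: pd.to_datetime(d["order_date"],
                                                       errors="coerce"))
           .dropna(subset=["order_id", "order_date"]) # drop unusable rows
    )
    prepared["amount"] = prepared["amount"].fillna(0.0)
    prepared.to_csv("prepared_orders.csv", index=False)  # load-ready output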

[00:59:07.199]
PHILIP RUSSOM: Yeah, that was a pretty good rundown. Thank you, everybody, for attending today. You've been listening to us talk about the evolution of data, its management, and what that means for business use. On behalf of a lot of people here, I want to thank the panelists, including Steve Wooledge from Arcadia Data, Mark Van de Wiel from HVR Software, and our panelists from Panoply and Paxata. This has been Data Management for Big Data, Hadoop, and Data Lakes. Goodbye.