Category: Big Data Ecosystem

July 26, 2018 - John Thuma

FNU: For Non-Unicorns: What is Apache Spark?

This blog was first published on Medium. WARNING: This is not for the high-tech unicorns, you mythical beasts who sparkle SQL and Java and make code bloom wherever you go. This is for the regular person who wants to understand Apache Spark at a pedestrian level. There are many resources online that help you take a…

July 24, 2018 - John Thuma

Three Ways Apache Kudu Supports BI on Apache Hadoop

This article was first published on Medium. Apache Kudu is a columnar storage system developed for the Apache Hadoop ecosystem. Kudu runs on commodity hardware, is horizontally scalable, and supports highly available operation. Apache Kudu has a tight integration with Apache Impala, providing an alternative to using HDFS with Apache Parquet. Before Kudu, existing formats…

July 20, 2018 - Shant Hovsepian

Five Things Soccer Analytics Teaches Us About Data Lakes

This blog was first published on Forbes. With the World Cup upon us, it’s an apt time to draw inspiration from soccer. In 1950, Charles Reep, an accountant, attended every game of the Swindon Town soccer team’s season, tracking events and recording statistics. He analyzed his data and concluded that long passes were the most effective way…

July 19, 2018 - John Thuma

The Data Science Iron Triangle – Modern BI and Machine Learning

Originally posted here. The New Iron Triangle: It is cliché to discuss IT/business solutions as people, process, and technology. Some call it the “golden triangle,” but in this blog, we refer to it as the iron triangle. Since the 1960s, technology has disrupted business through the advent of computing and information management. These systems replaced…

July 17, 2018 - Steve Wooledge

Three Surprises about Data Lakes, Hadoop, and the Cloud

If you’ve been paying attention to trends around Apache Hadoop, data lakes, big data analytics, and the cloud, you’ve probably noticed the see-saw hype around each of these. In 2012, there was no end in sight to what Hadoop could do, and organizations were beginning to build data lakes to augment or replace data warehouses…

July 10, 2018 - Dale Kim

What’s the Difference between Hadoop and a Data Lake

I recently participated in a webinar hosted by DBTA titled Unlocking the Power of the Data Lake, where one of the audience members asked, “Will data lakes be replacing Hadoop in the future?” I think the three speakers sufficiently answered the question on the webinar, but considering that many others might have similar questions, I…

June 12, 2018 - Paul Lashmet

Alternative Data Strategy: How and Why

An alternative data strategy is a collaborative, iterative, and exploratory process that is driven by domain subject matter expertise. That is our take from the research survey that we commissioned from Greenwich Associates to understand how buy-side portfolio managers, chief investment officers, and fund managers “Put Alternative Data to Use.” Our primary focus was to…

June 5, 2018 - Dale Kim

Democratizing Big Data: The Power of a Unified View of Data as a Competitive Tool

Industry experts and data-driven corporations around the world know there are far too few data scientists to meet the current demand, and those that are available are expensive. As a result, we’re seeing the emergence of power users referred to as “citizen data scientists” who are able to leverage powerful big data analytics tools. This…

May 29, 2018 - Dale Kim

Simplifying Big Data Analytics Acceleration

In the blog post titled Beyond the Cube: Embrace Analytical Views, we discussed how analytical views represent a new way to accelerate queries in a production environment. The next blog in the series, A Closer Look at Query Acceleration with Analytical Views, discussed analytical views in more detail, including how to set them up. In…

May 24, 2018 - Dan Woods - Evolved Media

The Best Ways to Get Value from Data Lakes

This post was guest-written by Dan Woods, CEO of Evolved Media & Chief Analyst of Early Adopter Research. Dan creates ideas about technology products, based on a broad technical understanding. By writing as an analyst in Forbes and working with Evolved Media’s clients, he sees the magic in technology and why it matters to IT buyers. We…

May 22, 2018 - Dale Kim

The New World of the Citizen Data Scientist

You might be familiar with the concept of citizen journalism, which refers to “common citizens who play an active role in the process of collecting, reporting, analyzing, and disseminating news and information.” The term became commonplace in the late 1980s, due to open publishing, collaborative editing, and online content. This use of the term “citizen”…