Learn about the latest big data analytics and BI trends in Apache Hadoop, the cloud, data lakes, IoT analytics, data visualization, and more as you browse through these insightful posts.

September 4, 2018 - Dale Kim

Big Data Analytics the Way It Was Intended with Search-Based BI

When was the last time you had to explain to someone how to use an Internet search engine? Perhaps never? That’s not surprising considering that search engines are amazing in their simplicity from a user perspective while also delivering the outcomes we desire. Can we aspire to have a similar system for the data we…

August 30, 2018 - John Thuma

What about Excel? Can I connect Excel to Hadoop?

A few days ago someone asked me why people aren’t using Microsoft Excel more with Apache Hadoop. I started to think about it, and on the surface it made sense. After all, the biggest databases and most important applications in the world run on Microsoft Excel, right? Some will laugh at this statement but there is truth to it….

August 23, 2018 - John Thuma

How to Leverage Real-Time Streaming Analytics for Surveillance

Co-written by Paul Lashmet. Banking surveillance is often fragmented: distinct surveillance teams monitor customer interactions, electronic communications, market activity, voice recordings, building sensors, video feeds, and social media within silos. This limits the ability to spot sophisticated risk activities that cross multiple locations, lines of business, and functional areas. Our earlier posts describe…

August 14, 2018 - John Thuma

How Traditional BI Nearly Killed Hadoop

“Traditional Business Intelligence tools are killing Apache Hadoop!” There… I said it! (Keep reading… there is a solution.) Traditional business intelligence (BI) tools such as Tableau, Qlik, and others evolved with traditional relational database management systems (RDBMSs). Both use ANSI SQL for querying (i.e., retrieving) data for analysis. Traditional SQL databases include products like…

August 9, 2018 - John Thuma

What is Apache Solr?

This blog was first published on Medium. A LITTLE HISTORY: Developed by Yonik Seeley in 2004, Solr was an in-house project at CNET Networks to provide search capabilities for its company website. CNET Networks then donated it to the Apache Software Foundation in 2006. In 2009 Yonik Seeley joined Lucidworks, which provided commercial support, training, and consulting…

August 7, 2018 - Richard Tomlinson

Typical Cloud BI Deployment Patterns with Arcadia Data

An increasing number of our customers and prospects are asking us what options we provide for cloud BI with Arcadia Enterprise. Some customers are just experimenting, others are moving their test and dev environments off premises, and a few brave souls are all in on cloud for their enterprise production applications. The key driver for…

August 1, 2018 - Dale Kim

Is the Big Data Bully Impairing Your Analytics?

Do you know what big data is? Do you excel in big data analytics? What if I told you that you don’t actually know what big data is about? You might think you do, but you don’t. Or maybe you DO know. Interestingly, I’ve found that many people have a misconception about big data. Think…

July 31, 2018 - Paul Lashmet

Keys to a Successful Alternative Data Strategy

Asset management firms increasingly leverage alternative data to enhance their investment strategies and gain an informational edge over the rest of the market. It is becoming the normal course of business to use new types of data in alternative ways, and soon “alternative data” will simply be referred to as “data.” The ability for one firm…

July 26, 2018 - Paul Lashmet

Consolidated Audit Trail: Outside Looking In

The primary purpose of the Consolidated Audit Trail (CAT), a rule under the Securities Exchange Act, is to arm regulators with the data they need to effectively conduct market surveillance and investigations into suspicious trading activities across all national exchanges. The difference between this and current trade reporting regimes is that it covers more…

July 26, 2018 - John Thuma

FNU: For Non-Unicorns: What is Apache Spark?

This blog was first published on Medium. WARNING: This is not for the high-tech unicorns, you mythical beasts who sparkle SQL and Java and make code bloom wherever you go. This is for the regular person who wants to understand Apache Spark at a pedestrian level. There are many resources online that help you take a…

July 24, 2018 - John Thuma

Three Ways Apache Kudu Supports BI on Apache Hadoop

This article was first published on Medium. Apache Kudu is a columnar storage system developed for the Apache Hadoop ecosystem. Kudu runs on commodity hardware, is horizontally scalable, and supports highly available operation. Apache Kudu has a tight integration with Apache Impala, providing an alternative to using HDFS with Apache Parquet. Before Kudu, existing formats…

July 20, 2018 - Shant Hovsepian

Five Things Soccer Analytics Teaches Us About Data Lakes

This blog was first published on Forbes. With the World Cup upon us, it’s an apt time to draw inspiration from soccer. In 1950, Charles Reep, an accountant, attended every game of the Swindon Town soccer team’s season, tracking events and recording statistics. He analyzed his data and concluded that long passes were the most effective way…