June 26, 2018 - Randy Lea | Big Data Ecosystem

When is Architecture Important for BI on Big Data? When It Saves You Experts, Money and Time

Would you try to ski down a mountain on water skis? Would you try to ski on a lake with snow skis? After all, they are both skis, right? Whether you’re a snow skier, a water skier, or both, I’m sure your answer is “no”. Why? Although similar, they are architected differently. Snow skis are narrow and have metal edges that dig into the snow so you can carve a turn. On water, however, the narrow surface of a snow ski would make it hard to plane, and the sharp metal edges would cut through the water, making it even harder to keep from sinking. Conversely, a water ski is wider, which makes it easier to plane on the water, and its edges are rounded, letting the ski glide through the water for an easier turn. In snow, however, those smooth edges would make it very difficult to catch an edge, and you would more than likely just slide down the hill. Could either be done? I’m sure you could find an expert, give them enough money and time, and they could do one, the other, or both. However, they’d look fairly foolish compared to the other skiers using their “built for a purpose” skis.

I was fortunate enough to have a long and successful career at Teradata, where architecture still plays a role in the company’s market leadership. Much like a ski is not just a ski, we educated the market that a database is not just a database. When relational databases were first architected, they were designed for online transaction processing (OLTP) with fast single-record updates. Examples would be updating your bank account after an ATM withdrawal or updating accounts payable with a payment. So, the architecture was to heavily index individual records for quick retrieval and update. Teradata, on the other hand, was designed for decision support, later called data warehousing, and focused not on a single record but on the ability to scan billions of records to extract insight. So, the Teradata parallel relational database was architected from inception to spread data across hundreds of Intel 8085 chips (look that chip up in the Intel history books), each connected to a single hard drive. A SQL query was executed in parallel on all of the chips at once, and the results from each chip were assembled and presented in seconds, compared to minutes or hours in an OLTP database. The most powerful mainframe or OLTP database didn’t stand a chance. Could you make an OLTP database scan billions of records? With enough experts, money and time you could, but again many companies looked foolish and failed, giving way to “two database standards” in companies: one database for OLTP and one for the data warehouse.
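To make the contrast concrete, here is a minimal Python sketch of the two access patterns. It is a toy illustration, not Teradata’s implementation: the table, its columns, and every function name are hypothetical. An index (a dict here) answers the OLTP-style “fetch one record” question, while the decision-support query hashes rows across independent units, has each unit scan only its own slice, and merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

REGIONS = ("east", "west", "north")

def make_rows(n):
    """Build a hypothetical toy table of (account_id, region, amount) rows."""
    return [(i, REGIONS[i % 3], i % 100) for i in range(n)]

def hash_partition(rows, num_units):
    """Spread rows across processing units by hashing the key, so each
    unit owns its own slice of the data (shared-nothing style)."""
    parts = [[] for _ in range(num_units)]
    for row in rows:
        parts[hash(row[0]) % num_units].append(row)
    return parts

def local_aggregate(part):
    """Scan one slice and return partial sums of amount per region."""
    totals = {}
    for _, region, amount in part:
        totals[region] = totals.get(region, 0) + amount
    return totals

def parallel_sum_by_region(parts):
    """Send the same query to every partition at once, then merge the
    partial results into one answer (scatter/gather)."""
    # Threads stand in for independent processors; in CPython this shows
    # the pattern rather than a real CPU speedup.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(local_aggregate, parts))
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0) + subtotal
    return merged

rows = make_rows(30_000)

# OLTP-style access: an index jumps straight to one record.
index = {row[0]: row for row in rows}
print(index[12345])  # → (12345, 'east', 45)

# Decision-support-style access: scan every row, partition by partition.
parts = hash_partition(rows, num_units=8)
print(parallel_sum_by_region(parts))
```

The merged answer matches a single serial scan of the whole table; the point of the design is that each unit only ever touches its own slice, so adding units adds scan capacity.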

So, when is a business intelligence (BI) tool not a BI tool? When a data warehouse BI tool is applied to data lake technology such as Apache Hadoop, or to cloud object stores such as AWS S3 and Azure ADLS (Azure Data Lake Store). BI tools were architected for data warehouses and for integration with a relational database. Because parallel relational databases do not allow other software to run on the database nodes themselves, BI tools adopted a two-tier architecture, with a separate BI server handling cube or aggregation data management and visual rendering. This architecture is highly optimized for a data warehouse, but it requires loading data in two places, securing data in two places, and optimizing data for processing performance in two places, which is complex and costly. Can you get data warehouse BI tools to work with Hadoop? Yes, with enough experts, money and time, since they treat Hadoop just like any other database, but you’re still required to load, secure, and optimize the data in two places.

Arcadia Enterprise is BI on Hadoop done right. It was architected from inception to be a hyper-parallel BI and visual analytics engine that runs natively on all of your Hadoop data nodes, where data is loaded once, secured once, and optimized continuously in a single tier, never moving data out of the Hadoop cluster. Arcadia Enterprise is the most scalable BI engine out there, bar none, because we run natively within your data lake rather than externally like data warehouse BI tools.

BI Architecture


Our capabilities leverage your data lake’s scale-out resources and grow in performance with your data. Business users work with Arcadia Enterprise’s visual front end to connect directly to your Hadoop data, examine it by blending data sets and running ad-hoc analysis, and build and share a common semantic layer. They can build customized data applications with email integration, alerting, and job scheduling, protected by role-based access controls and native integration with Cloudera and Hortonworks to secure and monitor access to the system.

Imagine the power, performance, and reduction in data movement and complexity if Teradata or Netezza had an integrated BI server and visual front end that ran on every database node, leveraging the indexing and advanced optimization functions in the database instead of moving data into an external BI server for aggregations and cubes. That’s what you get with Arcadia Enterprise and your Hadoop or object store data lake. Because the Arcadia Enterprise visual front end is integrated with our parallel BI and visual analytics engine, we monitor usage to model and optimize the data in your data lake, allowing for ELDT, where the D stands for Discovery. Discover insights first; then Arcadia’s Smart Acceleration recommends and implements physical models of your data in Hadoop based on usage, for quicker time to business value. Arcadia Enterprise even recommends visuals to best represent and analyze your data based on cardinality and other attributes of your data.
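The post does not describe how Smart Acceleration or the visual recommendations work internally, so the following is only a generic Python sketch of the two ideas, under stated assumptions: usage is approximated by a hypothetical log of the columns each query grouped by, the most frequently used column sets become candidates for pre-built aggregates, and a toy cardinality threshold picks a default chart. The log, function names, and thresholds are all illustrative, not Arcadia’s.

```python
from collections import Counter

# Hypothetical usage log: the columns each dashboard query grouped by.
query_log = [
    ("region", "month"),
    ("month", "region"),
    ("product",),
    ("region", "month"),
    ("customer_id",),
    ("product",),
]

def recommend_aggregates(log, top_k=2, min_count=2):
    """Rank group-by column sets by how often they are queried and return
    the most frequent ones as candidates for pre-built aggregates."""
    # Sort columns so ("month", "region") and ("region", "month") count together.
    usage = Counter(tuple(sorted(cols)) for cols in log)
    return [cols for cols, n in usage.most_common(top_k) if n >= min_count]

def suggest_chart(distinct_values):
    """Toy cardinality rule for picking a default visual (assumed thresholds)."""
    if distinct_values <= 1:
        return "single-value KPI"
    if distinct_values <= 20:
        return "bar chart"
    return "table"

print(recommend_aggregates(query_log))  # → [('month', 'region'), ('product',)]
print(suggest_chart(4))                 # → bar chart
print(suggest_chart(50_000))            # → table
```

The design choice worth noting is that queries are analyzed first and only the access patterns people actually use get materialized, which is the “discover, then optimize” ordering the ELDT idea describes.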

So, Arcadia Enterprise was architected for scale-out systems such as Hadoop to help you save on experts, money and time while realizing the promise of big data analytics and Hadoop in delivering business value. No other analytics tool on the market does this today. With other solutions, companies are forced to cobble together a duct-taped version of their data warehouse BI tool on the data lake, one that seldom survives the ever-increasing scale of data and the hunger for its consumption.

This is why leading enterprises are now choosing “two BI standards”: one for their data warehouse and one for their data lake. In fact, a major US financial institution recently chose Arcadia Data for BI on Hadoop across its data lakes, and it was the institution’s only new BI tool decision in over 10 years. The Tableau BI “skis” just did not work on the data lakes. So, think of Arcadia Data as “Tableau for your data lake”, but with the linear scalability to support high-performance analytics on ALL of the data in your data lake without moving it. You can load Arcadia Enterprise in the morning and start discovering new insights in the afternoon.

See for yourself by downloading Arcadia Instant, or learn how Smart Acceleration makes BI on Hadoop as fast and smooth as carving down a mountain on well-tuned skis designed for high performance!
