When was the last time you had to explain to someone how to use an Internet search engine? Perhaps never? That’s not surprising considering that search engines are amazing in their simplicity from a user perspective while also delivering the outcomes we desire. Can we aspire to have a similar system for the data we use for our jobs?
Remember that when Internet search engines emerged in the 1990s, they were the first public displays of big data in action. Back then, the software systems were custom-built to handle the new type of data workload, but not in a “big data way.” The task of indexing and searching all World Wide Web pages used a traditional architecture, where you had a couple of massive computer servers doing all the processing. It wasn’t until the 2000s that distributed computing took off, in which you had a large cluster of small, inexpensive machines that enabled parallelized data crunching. As the web grew, search engines could keep up by simply adding more computers to the cluster. This was a key benefit of the system because older architectures scaled poorly and required hardware upgrades to handle larger loads. Given how fast the web grew, a distributed cluster of commodity hardware was the only practical way to handle scale.
Most people didn’t really care about how the backend processing worked, though, or how all web pages could be indexed into a single location. They were more enamored with the user-facing output, that is, a simple interface or “search bar” that anyone could use to find specific information on the rapidly growing web.
Over the years, the high-tech industry sought to build a similar environment for corporate users to get scale and processing power for big data analytics in an economical way. As many of us now know, most of the tools required to replicate that type of big data environment are available today. We have Apache Hadoop and a variety of processing engines like MapReduce and Apache Spark to do interesting work on huge volumes of data. But these systems are more about the backend processing – the part that mostly benefits the technical teams.
What about everyone else? Do we have the easy-to-use interfaces that let us ask questions in a natural way? Many business intelligence (BI) analysts might say that they’ve long had access to a variety of self-service BI tools on the market that make analytics relatively easy. Those tools, however, still have a significant learning curve, and are typically used only by power users or what some now call “citizen data scientists.” Not only that, organizations find that those tools aren’t a good match for data stored in distributed platforms, so a significant burden is put on the IT teams to give analytics access to business users. Things like continual data movement, duplication of effort, and compounded security and data governance issues make the overall effort more troublesome than it should be. Ultimately this means that business users are limited in their ability to access live data at scale. It also means that true self-service cannot be achieved, since some tasks still require IT expertise.
The pursuit of making big data analytics easier for all users is the key mission at Arcadia Data. Our next major chapter in that journey is the “search-based BI and analytics” capability that we announced today. This lets any user ask questions as if they were using an Internet search engine, and they can get answers in a visual format. To be clear, this is not “search” as in “full-text search” (à la Apache Solr or Elasticsearch), where answers are lists of documents that are deemed relevant to the question based on a term-matching algorithm. Rather, this is similar to an Internet search where you type almost any question using natural language and get an answer. For example, you can type “what is the population of San Francisco” in either Google or Bing and you’ll get a clear answer. This is what we’re enabling with our search-based BI, which is fully integrated into our existing self-service BI platform.
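For readers who like to see the distinction in code, here is a toy sketch of the two kinds of “search.” It is only an illustration under stated assumptions, not how Arcadia Data, Solr, or Elasticsearch work internally: the documents, the table, the made-up population figures, and the function names `full_text_search` and `answer_question` are all hypothetical, and the “natural language” handling is nothing more than keyword spotting.

```python
# Toy contrast between full-text search and a search-style BI question.
# Everything here (data, numbers, function names) is hypothetical.

DOCUMENTS = {
    "doc1": "san francisco population grew last year",
    "doc2": "population trends in large cities",
    "doc3": "best restaurants in san francisco",
}

# Hypothetical table that a BI tool might sit on top of (made-up figures).
CITY_ROWS = [
    {"city": "san francisco", "district": "north", "population": 300},
    {"city": "san francisco", "district": "south", "population": 500},
    {"city": "oakland", "district": "east", "population": 400},
]


def full_text_search(query):
    """Return document ids ranked by how many query terms each contains."""
    terms = set(query.lower().split())
    scores = {}
    for doc_id, text in DOCUMENTS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            scores[doc_id] = overlap
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def answer_question(question):
    """Map a question onto an aggregate over the table and return one value."""
    q = question.lower()
    if "population" not in q:
        return None  # this toy only understands one measure
    for city in sorted({row["city"] for row in CITY_ROWS}):
        if city in q:
            return sum(r["population"] for r in CITY_ROWS if r["city"] == city)
    return None


question = "what is the population of san francisco"
print(full_text_search(question))  # a ranked list of documents
print(answer_question(question))   # a single computed answer
```

The first function hands back a list of documents that the user still has to read, while the second turns the same question into a computation over the data and returns the answer itself, which is the behavior described above.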
Take a look for yourself on this page, where we have a video that shows search-based BI in action. Do you think this would make self-service BI truly self-service and get you out of the game of building reports for end users? If this is something that your organization can use, please reach out to us.