March 5, 2019 - Dale Kim | Big Data Ecosystem

Natural Language Query on Your Data Lake

Today’s world of big and diverse data is forcing the BI market to go through some significant upgrades. Enough change has occurred over the years that newer labels like “visual analytics,” “analytics and BI,” and “modern BI” have emerged to designate a new wave of innovation. Interestingly, we’ve already seen some of the recent analytic innovations in other contexts. The notion of natural language query (NLQ) is a great example. Many of us have seen NLQ in action, especially in internet search engines, and we know it works well for any type of user without requiring special training. You simply ask your questions in a casual way, and the search engine figures out what you mean and gives you answers. It makes sense that this query model should be applied to enterprise data.

Another capability we’re familiar with is geographical maps in a web-based application. So much data today is geo-encoded that being able to view it in a geospatial context reveals insights you would not see in other visuals. Whether you want to see hot spots of phenomena like earthquakes or economic data per county, seeing that data superimposed on a map gives you a much better idea of how the data points relate to each other.

Last fall, we announced two separate capabilities. First was the NLQ capability, which we call “search-based BI.” Second was our dynamic, high-scale mapping capability, which leverages an integration with Mapbox GL. These give you the power and simplicity of internet search in our flagship product, Arcadia Enterprise. Other vendors had these capabilities, and more have recently announced them, so clearly these are interesting components for today’s analytics. But not all NLQ and mapping products are alike. The challenge is scaling to give users unfettered access to ALL available data. You may recall that Google wasn’t the first search engine, but it won the race with the right architecture.

We believe we are fulfilling an “internet search in the enterprise” objective via our architecture. For search-based BI to work well, you need to meet three main objectives. First, you need easy scale. For our customers, that starts with a data lake built on a distributed architecture that can cost-effectively scale by adding new nodes to the cluster. With many enterprise data sets available in your data lake, a larger community of end users can ask more questions. Second, you need the ability to immediately run live queries on all your data, without first having to spend time putting data into an optimized format. You search all your data first and let the system optimize performance for you. Third, you need a modern self-service BI model that eliminates much of the IT effort that is otherwise required in RDBMS-based analytical environments. By loading raw data into a data lake and adding structure as you explore the data (a paradigm known as “schema-on-read”), you remove much of the time-consuming modeling and remodeling negotiation between the business team and the IT team that is common in a data warehouse environment.
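To make the schema-on-read idea concrete, here is a minimal Python sketch. It is only an illustration of the general paradigm, not how Arcadia Enterprise or any particular data lake implements it. The raw records, the field names, and the read_with_schema helper are all made up for this example.

```python
import json

# Made-up raw records, stored exactly as they arrived (no upfront modeling).
RAW_LINES = [
    '{"county": "A", "ppl_count": "1200", "note": "estimate"}',
    '{"county": "B", "ppl_count": "800"}',
]


def read_with_schema(lines, schema):
    """Parse raw records and apply field names and types only at read time.

    `schema` maps a field name to a function that casts the raw value.
    Fields not named in the schema are ignored for this read.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name])
            for name, cast in schema.items()
            if name in record
        }


# The analyst decides on structure while exploring, not before loading.
rows = list(read_with_schema(RAW_LINES, {"county": str, "ppl_count": int}))
total = sum(row["ppl_count"] for row in rows)
```

A different question can use a different schema over the same raw lines, which is why no remodeling step is needed before the next query.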

Of course, scale and self-service are only a start. Our search-based BI capability can be useful at a variety of stages in an analytic workflow. For example, you might want to provide a “search bar” to your casual users to ask any question they want, but what if that search bar were also included as part of a dashboard of commonly needed KPIs and visuals? The dashboard visuals give users the answers they need on a regular basis, and the search bar lets them ask new ad hoc questions as they arise. Search-based BI through NLQ can also be a powerful tool for data scientists and analysts who aren’t necessarily familiar with all the data in their data lake. Asking NLQ questions gives them a way to explore the data in the data lake with more freedom than traditional BI interfaces allow. And finally, analysts who need to quickly build dashboards for production analytic applications can use search-based BI as a quick and easy way to develop visuals in the dashboard. Again, the advantage is letting almost anyone get started without a steep learning curve.

Our search-based BI capability has a number of features that make the NLQ processing very powerful. It recognizes comparison statements (e.g., includes, between, >, <), aggregation terms (e.g., median, stddev), order and ranking terminology, time, and stemming, among others. You can set up synonyms so that if your database tables use an obscure name like “ppl_count,” you can attach the more meaningful “population” to it. Or if your users simply use different terms for the same concept, you can make sure there’s a match. Another powerful feature is the word expansion capability, in which you can assign abstract concepts to specific data points. For example, you might assign “need repair” to “equipment status > 120” so users can ask questions as simple as, “which locations need repair.”
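As a rough illustration of how synonyms and word expansion can work in principle, the Python sketch below maps a question onto columns and filters. It is a toy under stated assumptions, not Arcadia's NLQ engine. The lookup tables, the rewrite_question function, and the column name equipment_status are hypothetical, chosen to mirror the examples in the paragraph above.

```python
import re

# Hypothetical synonym table: friendly term -> actual column name.
SYNONYMS = {"population": "ppl_count"}

# Hypothetical word expansions: abstract phrase -> concrete filter.
EXPANSIONS = {"need repair": "equipment_status > 120"}


def rewrite_question(question: str) -> dict:
    """Return the columns and filters that a plain-language question implies."""
    text = question.lower().rstrip("?")

    # Replace each known phrase with the filter it stands for.
    filters = []
    for phrase, predicate in EXPANSIONS.items():
        if phrase in text:
            filters.append(predicate)
            text = text.replace(phrase, " ")

    # Map any remaining friendly terms to their column names.
    words = re.findall(r"[a-z_]+", text)
    columns = [SYNONYMS[word] for word in words if word in SYNONYMS]
    return {"columns": columns, "filters": filters}


repair = rewrite_question("Which locations need repair?")
people = rewrite_question("Total population by county")
```

A real NLQ layer would also handle stemming, comparison and aggregation terms, and ranking, which this sketch leaves out.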

When it comes to geospatial analytics, we’ve integrated with Mapbox GL to deliver rich map-based visualizations within Arcadia Enterprise. What sets our implementation apart from others is that we built our system to handle more data points, so you can get the granularity of billions of records and hundreds of thousands of geographic boundaries. That means you get finer micro-segmentation of your data. For example, you aren’t limited to data that is aggregated at a county level; you can go down to a neighborhood level for much greater detail. Mapping in Arcadia Enterprise is highly dynamic, so you can change dimensions on the fly, unlike other mapping solutions that only cover static data. For example, you can view average income levels in each of your geographic boundaries in one instant, then switch to education level in the next. This rendering speed also lets you view real-time geospatial data sources. And with Mapbox Studio, you can define your own custom geographic boundaries, so you can analyze data in your own regions rather than being limited to someone else’s predefined ones.

If this sounds interesting, you can see all of this firsthand. Start by downloading Arcadia Instant so you can experiment with both search-based BI and mapping. In this desktop version, you won’t get the full power of Arcadia Enterprise, but you will get a good sense of how these capabilities might fit into your analytics environment. Since Arcadia Instant is available for free, unlimited use, you don’t have to worry about time-limited trials. Check out our articles Getting Started with Search-Based BI and Getting Started with Dynamic Mapping on Mapbox GL and Arcadia Data to help you explore our software. And if you want to learn more, please don’t hesitate to contact us.
