February 7, 2019 - John Thuma | Big Data Ecosystem

Five Things That Make a Great Universal Semantic Layer

The latest buzzword or phrase in big data and business intelligence (BI) today is the “universal semantic layer.” So what exactly is a universal semantic layer, or USL, and what problems does it solve? To borrow another vendor’s perspective shared in an announcement about its universal semantic layer technology, Matt Baird put it simply: “Historically, data stored in data lakes goes unused because the organization has not figured out a way to match the performance, security, or business tool integration they created with their legacy system.” Though true, what he refers to is not a semantic layer, nor is it a universal semantic layer! And he might be giving traditional tools too much credit regarding performance. That’s because traditional BI tools cannot perform well with large data sets even if they are on a traditional database platform, massively parallel system, or a modern data lake environment like Hadoop or cloud storage. This is true regardless of semantic layer. What Matt really means is he thinks a semantic layer should be secure, fast, and usable by any BI client tool that uses it. These BI tools include Tableau Desktop, Qlik, Business Objects, and Power BI. This is a universal access layer and not necessarily a universal semantic layer.

So what is a semantic layer anyway? Wikipedia defines a semantic layer as “a business representation of corporate data that helps end users access data autonomously using common business terms.” In other words, it translates technical metadata into friendlier business metadata. Business metadata is often highly esoteric and hence not universal, even within a department, let alone a large enterprise. Think of a semantic layer as a lens into your physical data. One lens cannot possibly fit all scenarios. As things like organizational structure, business goals, vernacular, topology, and nomenclature change, so too must the semantic layer that represents them. It is never a one-size-fits-all approach, and the semantic layer is in a constant state of flux. This means you have to be very clear about your expectations of a universal semantic layer.

The remainder of this blog provides five features required for a universal semantic layer. These include connect anything, virtual modeling, version control and governance, defense in depth security by default, and performance and concurrency.

Connect Anything: Being able to connect any BI client tool is a must-have for any universal semantic layer. A proper USL must also expose an application programming interface (API) for both programmatic and data access. A USL cannot have semantic layer lock-in, meaning that data about the semantic layer cannot be stored in a format that only the manufacturer understands; it must be open. The metadata must be stored in JSON, Parquet, or another format that is exposed to all users and interfaces of varying technical complexity. Of course ODBC and JDBC drivers are a must-have!
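To make the "open metadata" point concrete, here is a minimal Python sketch of a semantic-layer entry stored as plain JSON. The field names, the table name, and the expression are all hypothetical, chosen only for illustration; no particular product stores its model this way.

```python
import json

# Hypothetical semantic-layer entry: maps one business term to physical data.
# Every field name and value below is an assumption made for illustration.
model = {
    "term": "Net Profit",
    "source_table": "finance.gl_summary",
    "expression": "revenue - expenses",
    "synonyms": ["Net Income", "Bottom Line"],
}

# Because the format is open, any tool with a JSON parser can read it back.
serialized = json.dumps(model, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Nothing here depends on a vendor library, which is the point: the definition survives a round trip through a format every client can parse.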

Virtual Modeling: Do I define business rules in a universal semantic layer? Maybe! The reason we have so many different semantic models today is their rigidity. What does that mean? Well, let’s take a very simple example. Every for-profit business has multiple definitions for net profit. Each department or geographical region can have a different lens on what net profit truly represents. They have different business rules that define it. They have different vernacular and lexicons that describe it. My point is that the base data objects behind every permutation of net profit must be included in the semantic layer. Not only the fields but also the terms and language we use to describe net profit can be one-to-many. So my universal semantic layer must have a virtual layer on top of itself that maps to the elements that make up each business rule and to the terms and vernacular that describe them. Universal semantic layers cannot contain static, monolithic business rules.
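The net profit example can be sketched in a few lines of Python. This is only an illustration under assumed names: the department lenses, the base fields, and the formulas are hypothetical, not how any real business or product defines them.

```python
from typing import Callable, Dict

Row = Dict[str, float]

# One set of base fields, several virtual definitions layered on top of it.
# The lenses and formulas are illustrative assumptions.
DEFINITIONS: Dict[str, Callable[[Row], float]] = {
    "finance": lambda r: r["revenue"] - r["cogs"] - r["opex"] - r["taxes"],
    "sales": lambda r: r["revenue"] - r["cogs"],
}

# One-to-many vocabulary: several business terms resolve to the same metric.
ALIASES = {"net income": "net profit", "bottom line": "net profit"}


def resolve_term(term: str) -> str:
    """Map a department's vernacular onto the canonical metric name."""
    key = term.strip().lower()
    return ALIASES.get(key, key)


def net_profit(row: Row, lens: str) -> float:
    """Evaluate net profit through the chosen department's lens."""
    return DEFINITIONS[lens](row)


row = {"revenue": 1000.0, "cogs": 400.0, "opex": 300.0, "taxes": 100.0}
```

The same row yields a different net profit depending on the lens, and "Bottom Line" resolves to the same metric as "net profit". Neither the rule nor the vocabulary is baked into the base data.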

Version Control and Governance: We must be able to track the lineage and history of the things that make up a universal semantic layer. A proper USL will automatically track changes the way a general ledger tracks changes to an account. This is vital because, without that history, we cannot compare reports from today to reports from five years ago. This is true even when the same vernacular describes our business rules. So let’s take our example of net profit once again. Today net profit might mean something completely different from what it meant five years ago. So even though the two share the same name, comparing them would create a false comparison. The only way to make the comparison is to understand the difference historically. I would need to look at the version of net profit from five years ago and change the business rules from that time to show me the same calculation created today. Version control at our universal semantic layer must enable the business to compare ORANGES to ORANGES!
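The ledger analogy can be sketched as an append-only history of definitions with an "as of" lookup. The class names, dates, and expressions below are hypothetical; this is one simple way to model the idea, not a description of any product's versioning.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class DefinitionVersion:
    effective: date   # first day this definition applies
    expression: str   # business rule in force from that day


class DefinitionLedger:
    """Append-only history of one metric, like ledger entries on an account."""

    def __init__(self) -> None:
        self._versions: List[DefinitionVersion] = []

    def record(self, effective: date, expression: str) -> None:
        # Entries are only ever appended, in date order; nothing is rewritten.
        if self._versions and effective <= self._versions[-1].effective:
            raise ValueError("versions must be appended in date order")
        self._versions.append(DefinitionVersion(effective, expression))

    def as_of(self, when: date) -> DefinitionVersion:
        """Return the definition that was in force on the given day."""
        current = None
        for version in self._versions:
            if version.effective <= when:
                current = version
        if current is None:
            raise LookupError("no definition in force on that date")
        return current


ledger = DefinitionLedger()
ledger.record(date(2014, 1, 1), "revenue - cogs")
ledger.record(date(2018, 1, 1), "revenue - cogs - opex - taxes")
```

With the history in hand, a report from 2015 can be recomputed under the rule returned by `ledger.as_of(date(2019, 2, 7))`, so both sides of the comparison use the same definition.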

Defense in Depth Security by Default: The universal semantic layer contains data about the data in an organization, so access to it must be controlled. Data is one of the most valuable assets any organization has, and so it must be guarded. Check out rule number one, in the 10 Rules for Data of Any Size, to understand why. The universal semantic layer can provide the keys to the kingdom, so LDAP integration, authentication and authorization, access control lists (ACLs), protection for data at rest and in motion, and row-level security are all must-haves. This semantic layer must ensure that every access by a BI tool, program, or API is controlled and logged.
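As a rough illustration of row-level security combined with access logging, consider the toy Python sketch below. The users, regions, and in-memory ACL are hypothetical; a real deployment would rely on LDAP groups, database policies, and a durable audit log rather than a dictionary and a list.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ACL: which regions each user may see.
ACL: Dict[str, Set[str]] = {"alice": {"EMEA"}, "bob": {"EMEA", "APAC"}}

# Every request is logged as (user, rows returned), including empty results.
access_log: List[Tuple[str, int]] = []


def query(user: str, rows: List[dict]) -> List[dict]:
    """Return only the rows the user is entitled to see, and log the access."""
    allowed = ACL.get(user, set())  # unknown users get nothing by default
    visible = [r for r in rows if r["region"] in allowed]
    access_log.append((user, len(visible)))
    return visible


rows = [
    {"region": "EMEA", "net_profit": 200.0},
    {"region": "APAC", "net_profit": 350.0},
]
```

The default for an unknown user is an empty result rather than an error path that skips logging, which is the "secure by default" stance the section argues for.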

Performance and Concurrency: Ultimately the universal semantic layer is going to be used to render cubes, materialized views, or other objects necessary to make big data fast! A valuable universal access layer will provide features that enable organizations to use them regardless of data size. Our data landscapes are enormous, but today’s BI tools cannot keep pace with non-functional requirements such as raw reporting speed and concurrency. This optimization layer must be automatic yet also open to human control, because it is impossible to speed up everything! It must take into account the human behavioral skew in data appetite that comes with changing business demands. To that end, I must be able to control what gets optimized, as use cases do not always fit nicely into a bell curve. There is always behavior skew when it comes to query activities! Check out this piece on materialized views to understand what this means.
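One simple way to combine automatic and human-controlled optimization is to rank query patterns by how often they appear in a log, while letting an analyst pin patterns that must be accelerated regardless. The function name, the pattern labels, and the fixed "budget" of views below are assumptions for illustration, not a description of how any product chooses what to materialize.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def pick_candidates(
    query_log: Iterable[str], budget: int, pinned: Sequence[str] = ()
) -> List[str]:
    """Choose up to `budget` query patterns to accelerate.

    Human-pinned patterns come first; remaining slots go to the most
    frequent patterns in the log, which captures the skew in query activity.
    """
    chosen = list(dict.fromkeys(pinned))[:budget]  # de-duplicate, keep order
    for pattern, _count in Counter(query_log).most_common():
        if len(chosen) >= budget:
            break
        if pattern not in chosen:
            chosen.append(pattern)
    return chosen


# Skewed workload: one pattern dominates, as is typical of dashboard traffic.
log = ["sales_by_region"] * 5 + ["inventory_by_sku"] * 2 + ["returns_by_day"]
```

With a budget of two, the automatic choice is the two hottest patterns. Pinning `returns_by_day` displaces the second of them, which is the human override the section calls for.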

Conclusion: Keep Looking for Bigfoot
So marketing fluff aside, you are more likely to find Bigfoot than a true universal semantic layer that fits the five features described above! Why? Organizations are dynamic and under constant change. A USL that fits all use cases, hierarchies, changing data relationships, and organizational metrics is a lofty goal! The universal semantic layer is hardly reachable today because so many disparate tools do some of these things but none does them all! Basically, a true universal semantic layer will provide traceable flexibility that is secure and fast. We will evolve! What we really need is a multi-tool that enables the business to generate its own semantic layer without complexity. This is where Arcadia Data fits in! We give you a semantic layer UI that business analysts can use to share expertise on data sets with other analysts. In addition, our semantic layer is the basis for automated dashboard acceleration that lets you deploy your big data dashboards into a production environment with hundreds of business users.

There’s so much more to discuss, so contact us to see if we can help with your big data BI issues.

