BIG DATA VS. RISK
Real-Time Trade Surveillance in Financial Markets
Original Air Date: Oct 6, 2016
Who’s winning the deep forensic analysis ‘arms race’ for compliance?
Real-time trade surveillance in global financial markets has created a data tsunami.
With greater volumes of data comes greater compliance risk. CNBC reports U.S. Banks have been fined over $200B since the financial crisis. How are compliance teams fighting back to make more of the data and stay out of regulatory hot water?
Rapid response to suspect trades means compliance teams need to access and visualize trade patterns, real time and historic data, to navigate the data in depth and flag possible violations.
Join Hortonworks and Arcadia for this live webinar: we’ll cover the use case at a top 50 Global Bank who now has deep forensic analysis of trade activity. The result: interactive, ad hoc data visualization and access across multiple platforms – without limits on historic data – to detect irregularities as they happen.
In-depth expert presentations by:
- Shailesh Ambike, Executive Co-Chair of Compliance & Legal Section (CLS) Education Sub-Committee of the Investment Industry Regulatory Organization of Canada (IIROC)
- Vamsi K Chemitiganti, GM - Financial Services at Hortonworks
- Shant Hovsepian, Co-Founder and CTO at Arcadia Data
Q&A session follows the presentation.
David Fishman: Good morning, good afternoon, and good evening, everyone, and welcome to today's live webcast from Arcadia Data and Hortonworks on Real-Time Trade Surveillance in Financial Markets: Winning the Deep Forensic Analysis Arms Race for Compliance.
My name is David Fishman. I'm the marketing guy here at Arcadia Data, and with me today is a distinguished panel of guests: our very own Shant Hovsepian from the home team here at Arcadia. He is my colleague, [00:00:30] co-founder, and our CTO. Vamsi Chemitiganti. Vamsi is the general manager for financial services at Hortonworks. And our special guest is Shailesh Ambike. Shailesh is executive Co-Chair of the Compliance and Legal section of the education subcommittee at IIROC in Canada, the Investment Industry Regulatory Organization. I've gotten in trouble for saying it's like the American SEC, so I won't. [00:01:00] But Shailesh, in addition to playing a role as an industry thought leader, has a day job at the Royal Bank of Canada Capital Markets, and so in addition to helping direct the industry to contend with changes in markets and technologies, he actually has to make it work on the job every day.
So, a couple of quick comments on housekeeping. For those of you who've never attended a webcast, the mechanics are simple. [00:01:30] There's a question window. We'll leave some time at the end to get to as many questions as we can. The additional questions we do not reach, we don't have time to talk about, we will answer them offline and post them on our respective blogs, so rest assured that we will get to your questions whether we get to them live or not. And with that, let's take a quick look at the agenda. We'll start with some background from Mr. Ambike about [00:02:00] what is going on in the markets, how technology has created kind of a feedback loop of changes in behavior and risk and how that affects the reality.
And then from there, we'll turn to our technologists, Shant, and Vamsi to talk about the role of big data and the things we've learned from working with our customers at Arcadia Data and at Hortonworks about [00:02:30] how to solve for the kinds of problems that this new reality engenders. And then finally, we'll talk about a real-world customer use case about how to apply the technology and bring some real-world lessons right before we conclude with Q&A. So with that, it's my pleasure to introduce Shailesh Ambike from the CLS IIROC and the Royal Bank of Canada Capital Markets. Shailesh, over to you.
Shailesh Ambike: Thank you [00:03:00] very much, David. Appreciate the introduction, and good morning, good afternoon, good evening to everyone. My name is Shailesh Ambike. I'm Co-Chair of the CLS Sub-Committee on Education here in Canada. We're a large group, a large organization that represents compliance from Canadian broker dealers, foreign broker dealers that operate in Canada. And as David mentioned, it is sort [00:03:30] of like ... I represent IIROC, which is our investment regulatory organization, that is, to a certain extent, the Canadian version of the SEC. My day job, as David mentioned, I work in compliance. I'm in surveillance. I have a team that monitors market conduct in equity trading here at RBC Capital Markets. We have oversight over all flow of electronic, and cash, and equity trading, [00:04:00] as well as any of our discount brokerage flow in Canada and into the United States.
So, what I want to do today is just really present kind of a backdrop, kind of give you sort of a perspective on the landscape from the compliance community, what we're seeing in market surveillance and market conduct, the trends and changes, and how that's evolved over the years [00:04:30] to where we are now and the need for big data, big data analytics. And it's becoming increasingly not just a nice to have, it is almost a requirement based on the scrutiny that we're under with our regulatory community and the expectations that we're to uphold with regulatory standards, rules and laws.
So, can you flip to the first slide. [00:05:00] Perfect. Thanks. Just to give an idea, so in Canada, we have what's known as the Universal Market Integrity Rules, or UMIR. This has been in place since 2001, and a lot of Canadian rules are no different than US rules, market access rules and market conduct rules, so anything that I say here does have definite parallels into the United States, into the [00:05:30] FSA in the U.K., in other jurisdictions. RBC, as well as our foreign broker dealers, we all operate globally. So, our regulatory standards over the last 10 years have really been elevated. It's ramped up big time.
The requirements for broker dealers that are processing and trading on behalf of clients [00:06:00] or on a proprietary basis is to monitor everything: to monitor the orders, the executions, the trades. And our regulators make it a requirement that we review not only the insider trading, sort of the [inaudible 00:06:19], making sure that that type of fraudulent activity isn't happening, but also the granular equity executions that happen on multiple marketplaces. And the type [00:06:30] of requests and type of cases that regulators are taking broker dealers to panels and courts deal with spoofing, which is fictitious order volume or order entry, artificial pricing, which is essentially a high closing of stock, an issuer near the close to affect the valuation of the stock. Quote stuffing, [00:07:00] where you enter in algorithmically, hundreds of thousands of orders into various marketplaces in effect to jam up all of that traffic to those marketplaces to seek advantage, some type of arb type of effect between multiple marketplaces. So, it's essentially gaming the system, and any other type of non-bonafide fake [00:07:30] orders that are being sent to the marketplace.
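The quote-stuffing pattern described above, very large order entry with almost nothing executing, can be sketched as a simple order-to-fill check. This is a minimal illustration and not a method described by the speakers: the function name, the event format, and both thresholds are assumptions chosen for the example.

```python
from collections import Counter


def flag_quote_stuffing(events, min_orders=1000, max_fill_ratio=0.01):
    """Return sorted trader IDs whose order flow resembles quote stuffing.

    events: iterable of (trader_id, action), where action is one of
    "new", "cancel" or "fill" (a hypothetical, simplified event format).
    A trader is flagged when they entered at least `min_orders` new orders
    and the share of those orders that filled is at or below `max_fill_ratio`.
    """
    new_orders = Counter()
    fills = Counter()
    for trader_id, action in events:
        if action == "new":
            new_orders[trader_id] += 1
        elif action == "fill":
            fills[trader_id] += 1

    flagged = []
    for trader_id, count in new_orders.items():
        # High order volume combined with a very low fill rate is the signal.
        if count >= min_orders and fills[trader_id] / count <= max_fill_ratio:
            flagged.append(trader_id)
    return sorted(flagged)
```

A flag here is only a candidate for review by a surveillance analyst; real thresholds would depend on the marketplace, the time window, and the trader's normal activity.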
One of the big ones that we're working on right now is multi-jurisdictional "manipulation," where you would have one security that's inter-listed in one jurisdiction and trades concurrently in another jurisdiction. Is there any type of way to manipulate the quote, arbing the FX differential to your advantage? And regulators are acutely aware of that. [00:08:00] They do speak to other global regulators. There is a broader body, a more prudential body under IOSCO, that has oversight of global standards to ensure that reviews and market conduct rules are prescribed consistently across the world in various financial hubs. If you ever get a chance to Google any of these names: ITG, Knight [00:08:30] Capital, E-Trade, and very recently, Merrill, with regulatory fines, you'll see a host of cases that speak to these types of market manipulations and market activities. So, I'll leave that, but definitely do read on that to get a little bit more context to what I'm speaking about. So, go to the next slide.
Some of the ideas [00:09:00] we're really just trying to convey here are that regulators are concerned about high velocity, low touch electronic trading, trade flow which could interfere with market integrity. There are a large number of orders that are sent by either retail clients, so you know, the "little guy", or institutions that are very sophisticated, that have multi-million dollar systems and teams that support their trading algorithmically.
[00:09:30] The other item that's a very big issue within our community is electronic communications or E-com, which adds to this challenge where we are not only reviewing the trading, the orders, the trades, the executions, of our clients as well as our own internal trade flow, but we also need to monitor electronic communications: emails, chats, Bloomberg, instant messages, [00:10:00] what have you. If there's any type of communication or any type of breach in information barriers that could degrade the integrity of market transparency. And the big challenge for surveillance here is to differentiate the abusive nature of the market conduct by means through which the activity is conducted. So, the idea here is you have to be able to separate the [00:10:30] trading and the abusive activity and really sort of find that needle in the haystack, and with our staff, it's a difficult thing. Most broker dealers in general have their challenges with that, trying to find that needle in the haystack. So, this is sort of leading us to needing more technology and more platforms, enhanced sophisticated platforms [00:11:00] to do our day jobs.
The next slide. Just to underscore sort of the backdrop here is that, direct electronic access and foreign routing arrangements are the way of the future. Gone are the days of someone in an office with little pink and blue tickets, order tickets, that'll run through a trade desk and send those [00:11:30] to the trade floor, "Buy 100,000, sell 100,000," with paper tickets. Those days are pretty much done. Everything now, by and large, within broker dealers is conducted electronically, which means there's more accuracy, there's more chance for things to go wrong, and our scope in terms of our client base has expanded. It's no longer just regional, local clients, we're dealing with [00:12:00] global clients that have access into our local marketplaces from around the world. And by and large, those foreign entrants, investors, or those that have access to our marketplaces aren't very aware of certain local regulatory rules. So, it adds to that challenge that we have to educate and we have to have the proper surveillance tools to review all of that [00:12:30] trading activity and data.
The result of an increased need for market conduct or conduct surveillance with this order activity, here, what we're seeing is that we need to have a link between the executions, trades, news, insider trading, e-communications, as I mentioned before, the MNPI. So, this is the insider information within the broker dealer as well as with our clients [00:13:00] and various types of pump and dump schemes. So, these are some of the challenges that we see on a daily basis because of the trade flow that's coming into our central hub or our central trade floor.
So, the next slide is essentially the new reality. The new reality for compliance and our teams is that we do have a requirement for [00:13:30] big data analytics, meaning broker dealers need to have the right platforms and the right tools to do their day job in order to have oversight of this trading, not only in our local jurisdiction, but also in global jurisdictions. The interrelationship between asset classes is another area, so equities and options that are trading concurrently to manipulate the market. [00:14:00] Or across regions, so you're long in North America and you're short that same name in the EU market. You definitely need to have a more scalable, and reliable, and efficient data platform to harness all of this and ingest all of this data so that we can at least then have better oversight in pinpointing trouble spots, trader IDs that are attempting [00:14:30] to manipulate, or clients of our clients, so to speak, that are trading somewhere locally far away.
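The cross-jurisdiction case mentioned above, long a name in one market and short the same name in another, can be sketched as a netting check over position records. This is a minimal sketch under assumed inputs: the record layout, the market codes, and the function name are hypothetical, and an opposing position is often legitimate (a hedge, for example), so the output is a review list rather than a finding.

```python
from collections import defaultdict


def opposing_positions(positions):
    """Return sorted (trader_id, symbol) pairs held long in one market
    and short in another.

    positions: iterable of (trader_id, symbol, market, signed_qty), where a
    positive quantity is long and a negative quantity is short.
    """
    # Net quantity per (trader, symbol), broken out by market.
    net = defaultdict(lambda: defaultdict(int))
    for trader_id, symbol, market, qty in positions:
        net[(trader_id, symbol)][market] += qty

    flagged = []
    for key, by_market in net.items():
        has_long = any(q > 0 for q in by_market.values())
        has_short = any(q < 0 for q in by_market.values())
        if has_long and has_short:
            flagged.append(key)
    return sorted(flagged)
```

The check only works if positions from every jurisdiction land in one place keyed by the same trader and security identifiers, which is the data-platform requirement being described.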
This all adds to the requirement for larger databases, larger data infrastructure. Our regulators, by the way, are asking us for reviews from one, two, and three years prior. So, it does [00:15:00] require some type of platform that has an archiving ability and the ability to retrieve that data very quickly. Regulators usually give us probably a week to 10 days to respond to requests for activity that occurred maybe three years ago. So, it adds to that level of complexity because they are implicitly testing us on our audit trail as well as our ability to store that level [00:15:30] of detail in our data infrastructure environment. And, if you can just go to the next slide.
That's it. Okay, that's pretty much it for me. I hope I gave kind of an idea of our landscape. If I were to impart anything here, it's that the environment is expanding. We have global trading that's occurring, a high [00:16:00] volume of multi-jurisdictional multi-products and hence the need for big data from our side at the broker dealer level.
Vamsi: Great. Thank you Shailesh, and good morning everybody. This is Vamsi. I'd like to just speak about the increasing role of big data as we see it playing out across a spectrum of banking and financial [inaudible 00:16:28] and then maybe spend a couple [00:16:30] of minutes talking about the impact we're seeing in capital markets. So, if you go to the next slide, David.
So, Shailesh spoke eloquently about the role of big data in the area of surveillance in ensuring that the trades, the orders, the life cycles are all based on appropriate business and personal reasons and that there's no under-the-radar rogue trading activity [00:17:00] going on that could impact the markets. But taking a step back and looking at not just capital markets, but the spectrum of banking, if you will, as is depicted on the slide on the left. IDC just did a study, which is their big data and analytics report for 2016, where they basically forecast the market in big data to be around 200 billion dollars by 2020, and all of this increase is [00:17:30] being driven in big part by the need for business to get better analytics. And the banking industry specifically was called out as being the big driver across different industries and industry verticals.
From my standpoint, when I look at banking, obviously there's a whole spectrum of institutions that make up global banks, so obviously, capital markets with investment operations, trading operations where multiple trading desks are all present and they're doing trading in equities, fixed income, [00:18:00] commodities, Forex, etc. They have been a big driver historically in looking at big data. But in the more day to day type of banking in retail and consumer lines of business, we're seeing big data being used in business impacting ways across areas like customer segmentation, marketing analytics, real-time customer interactions, etc. And again, back on the capital market side where there's [00:18:30] fast data, it's big data, and it's quick data, you're looking at stock exchanges and hedge funds doing a variety of things with it.
And last but not least, when we look at corporate banking and corporate lending and trade finance, a big impact there is around ensuring that anti money laundering activity is kept to as much of a minimum as possible and that any violations of AML are reported duly to the industry authority or over to FinCEN, or whoever have you, in your area of jurisdiction.
[00:19:00] And finally, in wealth and asset management where banks are typically more on the buy side, where you're working with high net worth investors or you're creating mutual funds or financial products that are being sold to investors with specific investment horizons and financial goals. The goal again, is to bring in a whole range of data that spans multiple types of data, structured, unstructured, semi-structured, and to run analytics on the data and to be able to do batch analytics, real-time analytics, streaming analytics.
[00:19:30] So, as we talk about the impact of big data in banking, when we drill it down across all of these segments, there emerge certain core areas which are shown on the right. So, any kind of surveillance application as we're discussing today, risk management, anti money laundering on the so-called defensive side as well as cyber security, a lot of those workloads are great candidates for a modern data architecture and a big data architecture, but also on the side of things where banks [00:20:00] are looking to generate more revenues per customer, doing things around cross-selling, up-selling in capital markets or on the retail side or in credit cards, banks are looking to put together a single view of a trade, a single view of a customer, whether that's an individual customer on the retail side, or an institutional customer across multiple desks in the capital market space. So, we're seeing more and more impact every year in the sense that big data's slowly moving from the traditional IT zone to [00:20:30] being more of a business impacting framework, a business decision and business value driver framework, so to speak, in terms of enterprise architecture.
So, let's take maybe five minutes to look into some of the specific challenges in the capital markets here in terms of the data volume, velocity, and the variety, and then also talk about some specific use cases as well in those areas. So David, if you go to the next slide.
[00:21:00] So, we all have heard about the 3Vs, right? We all talk about the volume, the velocity, and the variety of data. And to break it down into what it means in the capital market space, which is our focus for today, number one, we're obviously dealing with larger data sets. So, Shailesh spoke well about the fact that the types of data that we're looking at, whether to drive better trading strategies, or to do portfolio back-testing, or to [00:21:30] do compliance and surveillance reporting, span a whole gamut of data. So, we're looking at various omni-channel types of data. We're looking at a lot of position data, any news stories, any social media data, etc., across different time horizons, across multiple desks. And not only is this data large, but it's also of a variety of different types of structure.
So, we're not just talking about the traditional ticket data or the core banking data, we're talking about [00:22:00] data that's over the counter contracts. We're talking about social stories, we're talking about sentiment analysis, but the fact is this data isn't suited to be typically handled by your relational database or your enterprise data warehouse. And that kind of is what's led to the movement to look at a data lake as a natural repository of all these different types of data. So, you have larger data and you have different kinds of data, newer kinds of data, so that speaks to the volume challenge as well [00:22:30] as the variety challenge, but you also are challenged in terms of velocity. So, this data is being pumped into the enterprise architecture, into your real world applications at a rapid clip.
So, what that presents you with is different types of problems from an architectural and from a technology standpoint, the need to provide analytics on the data at a really low, an extremely low latency of analysis, we're typically talking about microseconds in some cases, ranging to analysis that needs to be performed in [00:23:00] the day a few times, or at the end of the month reporting. So, you have a challenge of real-time streaming batch and interactive type of analysis that needs to be done on the data depending on the use case. But while you're doing all of this, we're all under cost pressure on budgetary realities, so the need is now to look at more of an open source approach where you're looking at an x86 architectural way of deploying things, but also looking at low cost storage.
As Shailesh mentioned, regulators are asking for data in the surveillance space [00:23:30] that is a year or two old. In anti money laundering or AML, there's a need to look at data that's seven years, six years, five years old, and a need to look at that data in near real-time in terms of taking an actual wire transfer and merging that with historical data patterns of that single customer and deciding, "Hey, do I need to file a suspicious activity report or not?" So, all of this being said, when we look [00:24:00] at all these technology challenges that you have, the final one is to basically be able to provide advanced visualization capabilities. Obviously, a picture's worth a thousand words. You want to be able to ... and Shant will talk about some of this, but the point is to provide the analyst, whether that's a fraud analyst or surveillance analyst, with all the information that she or he needs at their fingertips and information that's correlated in terms of providing them with advanced support when they do their analysis of that [00:24:30] single trade, or that order book, or that set of core banking transactions or AML transactions.
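The AML step described above, comparing an incoming wire transfer against that customer's own history, can be sketched as a simple outlier test. This is a minimal sketch and not the approach of any product discussed here: the function name, the z-score rule, and the threshold of 3.0 are assumptions for illustration.

```python
from statistics import mean, stdev


def is_unusual_wire(amount, history, z_threshold=3.0):
    """Return True if `amount` is far above the customer's usual wire size.

    history: list of the customer's past wire amounts.
    The rule is hypothetical: flag amounts more than `z_threshold` sample
    standard deviations above the historical mean.
    """
    if len(history) < 2:
        # Too little history to build a baseline, so send to manual review.
        return True
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past wire was identical, so any larger amount stands out.
        return amount > mu
    return (amount - mu) / sigma > z_threshold
```

For example, `is_unusual_wire(50000, [1000, 1200, 900, 1100, 1000])` flags the transfer, while an amount of 1150 against the same history does not. The longer the retained history, the more stable the baseline, which is why multi-year retention matters for this kind of check.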
So, moving on to the next slide, David. So, when we look at capital markets as a whole, the use cases that we're seeing out there can be decomposed, in my mind, to four broad areas. The first area is the focus for today, which is, basically, the need to solve the challenge of [00:25:00] trade and market surveillance in terms of collecting and processing all these disparate types of data, but in a way that we can do it in a timely manner so that we can monitor trading activity across markets in one institution. So, we're talking about an internal trade repo, or we're talking about a repo that is run by a regulator or by a stock exchange, which encompasses typically millions of market participants. So, that's typically the first bucket of use cases we run into.
The second one is typically around trade life cycle. So, [00:25:30] if you have new trade strategies that are being developed by your quant teams, the need to back-test these across years' worth of historical data and to be able to do that in a short amount of time where, unlike the days of old where you had a new algo that you tweaked or you developed, to test that on three years of historical data, it took you two days just given the nature of the technology available, now, banks are doing that in hours, if not 30 minutes or a few minutes because of the ability to use [00:26:00] Hadoop to parallelize execution on the data lake.
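The idea of parallelizing a back-test can be sketched as splitting history into independent partitions, running the same strategy on each, and collecting the results. This is a minimal sketch only: the toy momentum rule, the function names, and the per-year partitioning are assumptions, and a local thread pool stands in for cluster tasks, so it shows the shape of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def momentum_pnl(prices):
    """Toy rule: after an up day, hold one unit for the following day."""
    pnl = 0.0
    for i in range(1, len(prices) - 1):
        if prices[i] > prices[i - 1]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def backtest(partitions, max_workers=4):
    """Run the strategy on each partition independently.

    partitions: dict mapping a label (for example a year) to a list of
    daily closing prices. Returns a dict mapping label to P&L.
    """
    labels = list(partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with labels.
        results = list(pool.map(momentum_pnl, (partitions[k] for k in labels)))
    return dict(zip(labels, results))
```

Evaluating each partition on its own ignores trades that would span a partition boundary, which is one of the trade-offs of splitting the work this way.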
But while we're doing all of this, there are also use cases which help banks drive revenue in the capital market space. The first one is obviously around areas like commodities or precious metals, the need to be able to do sentiment driven trading based on market data, social media data, and other data that you can onboard and drive up and down trading decisions and then to right size client portfolios based on those decisions is a big use [00:26:30] case. But the ability to do all of this depends on being able to get that single view of the customer. So, after Dodd-Frank and the Volcker Rule, most banks in North America have had to kind of jettison their proprietary trading desks, so they're moving to more of a flow-based trading model where the need is to see that single view in terms of how much is this institutional client bringing us in terms of revenue, but how much risk are we carrying for them, [00:27:00] right? To be able to do your sales activity as well as your compliance activity based on that use case.
And the final area is to create new products, right? So, in retail banking, we all talk about data products, and the need to look at all the data assets that you've accumulated over the years, and the need to be able to drive better product decisions based on that. Much like a Zillow, or a Trulia in real estate, or like Uber in the transportation space. But even in capital markets, we're starting to do [00:27:30] more things around client benchmarking and being able to do more TCA and things of that nature by re-imagining the cross-asset, cross-geographical, cross-jurisdictional data that you have that spans multiple types and to be able to drive more predictive, richer types of machine learning and data science analytics on the data. So, move to the next slide.
So, what we want to do here is to talk about the value that's being generated by the partnership that Arcadia [00:28:00] Data and Hortonworks have. And Shant is the best person to do this as the CTO at Arcadia, so let me turn it over to him. Shant.
Shant Hovsepian: Hi everybody. Thank you so much Shailesh and Vamsi. I'm Shant Hovsepian, I'm the CTO and co-founder of Arcadia Data. I'm going to spend a little bit of time talking about Arcadia Data and Hortonworks' joint solution for this space. It's very exciting, Shailesh I think gave a great overview of some of the business and regulatory problems that he and his colleagues are faced with every day. And [00:28:30] Vamsi did a great job in talking about how big data plays an important role in this both from an architecture standpoint as well as the sheer variety of different data sources. But it's very exciting for me to present our joint visualization and application solution for this problem. Really bridging those two gaps and bringing it together.
So first, a little bit of history about Arcadia Data. We set out to found this company with the one goal of helping people create business value from their big data. My co-founders and I, we came from lots of traditional big data [00:29:00] enterprise companies that have worked very closely with financial services institutions over the years and we really wanted to take things to the next level and enable cutting edge visualization and applications. We're a venture funded company that focuses on the Global 2000, and high concurrency use cases with strong SLAs focused on analyzing data sets on the order of hundreds of billions. Arcadia built a high performance and scalable visualization tool for big data with absolutely zero data [00:29:30] movement.
So, just real quick, I think we understood what some of the business problems and the frustrations are with trade surveillance in a big data world, but I want to touch a little bit on architecture and what this looks like from a technology perspective. So, if you look at the current slide and read it from the bottom up, you can see a lot of the operational data sources that we discussed, things like order book data, market data, e-comm, trader data, even regulatory data sources like OATS and other regulatory data pools. A lot of this data gets [00:30:00] fed into a single system. In these cases, as Vamsi sort of described, the open source model makes it very cost effective to use Hadoop for storing and leveraging a lot of this data.
From there, we do a lot of staging transformation, but usually multiple phases of extraction happen. There's a data warehouse, then a data mart, the BI server, BI tool, eventually, you have your end users. This type of data movement situation is not only very complicated but it creates various data traps as you move up and down [00:30:30] the stack. Things like data summarization, the fidelity loss of big data, the inability to collaborate, the security risks, since even though logically these datasets represent the same sort of problem, the same situation, there are multiple physical copies of it which need to be secured independently, as well as the general management and operational complexity associated with this. So, it's a very difficult world both in the front office and the back office to manage this type of a compliance architecture [00:31:00] given the huge velocity and variety of data sources.
So, how does Arcadia Data change this? Arcadia provides a converged analytics platform. This is an architecture for data analysis as well as a sophisticated visualization tool purpose-built with big data in mind from the very beginning. We allow you to visualize historical and real-time data in a single platform with a single tool than having to go back and forth between your different tool sets. [00:31:30] This type of closed loop navigation allows you to go all the way down to granular data rather than just visualizing summaries. And when you're trying to rebuild a trade sequence or the general context around the large block transaction, the ability to look at the finest granularity of order and settlement data is extremely critical.
We can also combine this with a lot of fast and ad-hoc iterative analysis. I like to think of compliance officers as their own private investigators, I think we did a great needle in [00:32:00] a haystack analogy, but these teams are using creative methods to kind of figure out what quick iterative calculations, what kind of analysis they can do to detect illegal or irregular behavior. Because on the other end of the spectrum, we have traders, we have broker dealers who are just trying to get their trades across. So, you need a flexible tool that allows your compliance officers to be very iterative in their analysis. And of course, at the end of the day, leveraging the power that is the Hortonworks data platform, we can [00:32:30] run a distributed BI engine in a cost effective and scalable manner across commodity hardware.
So, just real quick, what makes Arcadia different? We are the only Hadoop native analytics platform on the market, so it's a powerful and simplified architecture. Everything runs on cluster. You don't have to move data. You get your business users to have direct access to that data while still leveraging the scale, the security, and the infrastructure that you have in place. [00:33:00] You can explore your data quickly and directly. You don't have to build a model, or a cube, or an extract first, you can do all sorts of exploratory and iterative analysis and then build your model along the way. This is great for both concurrent access as well as single individuals doing their own ad-hoc analysis. And then of course, we make self service advanced analytical insight possible.
Excel has made visualization sort of a commodity, right? The types of data sets that the compliance officers and trade [00:33:30] surveillance employees are dealing with are vast and they are large, and the types of questions, the types of visualizations you have to build aren't your standard types of visualization. So, the ability to do more advanced analysis, being able to combine free text search, for example, for e-comm data as well as with real time streams, or the ability to do very quick segmentation or behavior based cohort analysis of the data, point-in-time event analysis, complex event processing, as well as call on some sophisticated machine [00:34:00] learning to do things like fit models, regression, correlation, and potentially even forecasting in a very point and click manner without requiring an engineer to write some code.
To talk a little bit about the solution that we have together with Hortonworks, this is the modern data architecture. You can see here where Arcadia runs at the top layer, providing a lot of visual applications and accessibility, as well as directly on the data system, where the visualization tier and [00:34:30] the analytics engine run directly next to the Hortonworks data management and data access components.
This is especially exciting from a governance and security perspective because fundamentally, data never has to leave the system. Any kind of audit or governance that takes place across the Hortonworks platform is automatically inherited and enforced when using visual analysis. So, you have one place where you can log into it and look at all of the touch points with the end users and the data. [00:35:00] This is great because it organically combines with the rest of your data platform. For multi-tenant systems, many of our financial services customers run their risk, they run their trade surveillance, they run AML all within the same cluster. So, being able to leverage multiple workloads as well as the data sets from a single environment is really revolutionary in the financial services space that's long been relegated to IBM zSeries mainframes. So, you can connect Arcadia directly to the Hadoop clusters, which makes it very easy to share the data, and you get really high [00:35:30] performance direct access to HDP.
The integrated management and security with Hortonworks makes it so that your IT team is relaxed and happy, and even as we go forward, you can deploy on premise, in cloud, and in hybrid environments so that if you needed to do a thorough investigation over the last three years of data, you can spin up some elastic nodes, do that analysis, build your visualizations, put together a report, and send it over.
With [00:36:00] that, I want to hand it back over to Vamsi to talk a little bit about the use case that we have deployed together.
Vamsi: Great. Thanks, Shant. So, what I am going to do in this section is talk about a customer case study with a large financial services and banking major in North America, and talk about some of the typical problems that our customers in financial services are facing around the [00:36:30] existing data practices and the practices that essentially talk about the way they deployed their analytics among them. So, moving on to the next slide, David.
So, if you look at the data architecture of a large bank at this point, and this is generic across a whole range of customers as seen across the world, there's definitely a bit of a siloed mentality in how data assets have been deployed in pursuit of business goals. One of the first things [00:37:00] we see typically is a high degree of data duplication, and this is from book of record transaction system to book of record transaction system. And what that essentially does is it leads to multiple inconsistencies in how the data is basically laid out right from a summary level down to the transaction level itself, and also the fact that multiple groups own the data based on the business function they're performing, and because of that, the feeds of the data, the ingestion frameworks around [00:37:30] the data as well as the analytics and the calculators that are applied on the data, end up being duplicated as well.
So essentially, you have a challenge of multiple development languages, multiple frameworks, as well as the fact that there are multiple ingestion technologies that have been created and stood up, and all of these have the cumulative problem of inhibiting data agility. So, when we talk about areas around trade surveillance, or risk data aggregation, or AML, one of the big problems that banks have is the fact that [00:38:00] when algorithms are developed and models are developed across all of these areas, when you have more data that's being thrown in from a business standpoint, as we've discussed at the start of the webinar, you have the problem of having to go and modify your analytics to reflect the changes in the incoming data.
But because of the fact that the ingestion itself and the processing of the data itself is fairly siloed, what you end up with is that every time you've got to go back and make a change to a data feed, [00:38:30] it's a process that goes through IT trouble tickets, and multiple groups and handoffs, and you're basically spending a lot of time dealing with the modeling of the data up front even before the analytics can be applied to the data. And when you look at the different groups that apply their calcs on the data, there's also the problem of ... I've seen shops where a certain group has their risk data calculators written in C++, while another group that's doing [00:39:00] credit risk may be using Java, another group that's doing liquidity risk may be using Scala, etc., right?
So, you have this problem of heterogeneity in the development of these architectures themselves, and what we then see is, because of a lot of these challenges, you also end up with data auditing and data cleanliness issues, where when you apply the analytics on the data, you're basically looking at directors' reports that are off by a few hundred million dollars, or the raw data itself has landed [00:39:30] on a central EDW or what have you. There are issues around different teams vouching for the cleanliness of the data and the auditability of the data.
So, the challenge that we're left with is that the entire process that deals with the data architecture, the data analytics, the data ingestion, and the data transformation, and data reporting even, reporting in some cases, is all fraught with silos and that leads to a lack of confidence in [00:40:00] the original data itself. So, when you kind of look back at the CCAR in North America, a lot of the large banks were basically non-compliant with CCAR when they first did the stress tests because the data itself, the quality of the data itself was so questionable. Move on to the next slide, David.
So, when we talk about having a reference architecture for a market surveillance system, you will see a few commonalities start to emerge in terms [00:40:30] of what this architecture needs to look like, and where a solution like the Hortonworks Data Lake adds a lot of value, with Arcadia's analytics solution working in conjunction. So, the first thing we see out there when we talk about a reference architecture for market surveillance typically is the need to support end-to-end monitoring, as it's shown on the left side. And this end-to-end monitoring has to be supported through a variety of financial instruments that operate across multiple venues of trading.
So, when you bring [00:41:00] this data in, there is a need to provide a platform that can ingest from tens of millions, and at times the high hundred millions, of market events, and this market event ingestion should span a variety of different financial instruments. So, we're talking equities, bonds, Forex, commodities, derivatives, etc., from thousands of institutional participants, but we're also talking about the need to store different kinds of data, not just the typical row column data that goes into a relational database [00:41:30] or an EDW. But once you have the data in one place, the ability to add new business rules to the analysis of the data, this is using either a model based system that supports advanced analytics like machine learning and deep learning, that's the key requirement.
As we can see from the talk that Shailesh gave, market manipulation as an activity itself constantly pushes on the boundaries in terms of business pattern in new and unforeseen ways. So, we have the data in there so there's a need [00:42:00] to provide a platform that can ingest from a variety of different sources, but the need is to keep the data in one logical platform because you have all the data in one place, you have all the position data, you have all the trade data, you have the entire order life cycle, the need is to then go in and apply advanced analytics on the data to help surveillance officers to be able to perform deep cross-market analysis.
So, to be able to view the data, to be able to correlate the data, and [00:42:30] to be able to look for any illegal behavior in the data in terms of market manipulation, insider trading, wash or circular trading, or unusual pricing in the data. But again, as you're looking at the data, the goal is then to be able to provide information that's rapidly consumable by a business audience or by your regulators, and also to be able to help your data scientists and your quants with development interfaces using either existing analytic tools or some of the new age tools that are being [00:43:00] driven out there, Python or what have you. So finally, we're also talking about a visualization challenge with being able to integrate the analysis in such a way that you're providing a real intuitive dashboard to the compliance officer and also providing them with a visualization of the data.
So, to kind of decompose what the reference architecture requirements are into four or five different areas, the first one is to provide insights [00:43:30] on all this market data, insight that is real-time driven. And the definition of real-time can be from seconds, to milliseconds, to even when you're doing batch analysis on a range of data to be able to look for any patterns that could be repetitive or that could raise a flag in the [inaudible 00:43:53]. But while you're doing all this as well for the high net-worth customers, or high value customers, or customers that have a high propensity to [00:44:00] commit some kind of irregular activity, you need to have the single view of a customer, of a trade, or of a transaction. And whatever architecture you come up with using your data platform, the goal is also to be able to keep it loosely coupled in such a way that it permits micro-services based development, but also something that could be deployed on a cloud or on-prem. And then finally, to be able to provide something that is scalable yet cost effective, as we see in the pattern.
[00:44:30] So, for this reason, when we worked with this client, the combination of the architectural paradigms Arcadia and Hortonworks bring was deemed to be a great pattern for them to develop and deploy on. So essentially, what was done at this client is that a shared data repository that we all like to affectionately call a data lake was created, and this data lake can capture every order across its lifecycle and [00:45:00] capture the lifecycle of the order, so the creation, the modification, the cancellation, and the ultimate exchange, the execution of the order, the exchange, but also to be able to provide visibility of all the data as it relates to intraday trading activities.
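As a rough illustration of what capturing the lifecycle of an order can mean in practice, the following is a minimal Python sketch. It is not the client's implementation: the `OrderEvent` fields, the event kind names, and the sample data are all hypothetical, chosen only to mirror the creation, modification, cancellation, and execution stages mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    # All field names are hypothetical, for illustration only.
    order_id: str
    ts: float   # seconds since the start of the trading day
    kind: str   # "create", "modify", "cancel", or "execute"
    qty: int


def build_lifecycles(events):
    """Group events by order id and sort each group by timestamp."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev.order_id].append(ev)
    return {oid: sorted(evs, key=lambda e: e.ts) for oid, evs in grouped.items()}


def final_state(lifecycle):
    """Return the kind of the last event seen for an order."""
    return lifecycle[-1].kind


# Made-up events arriving out of order, as they might from several feeds.
events = [
    OrderEvent("A1", 10.0, "create", 500),
    OrderEvent("B7", 11.0, "create", 200),
    OrderEvent("A1", 15.0, "execute", 300),
    OrderEvent("B7", 11.4, "cancel", 200),
    OrderEvent("A1", 12.5, "modify", 300),
]
lifecycles = build_lifecycles(events)
```

With every event keyed by order, a surveillance query such as "which orders were cancelled shortly after creation" becomes a simple scan over `lifecycles`.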
So, once this architecture is stood up, the compliance group then basically accesses this architecture or this data lake using Arcadia's tools and analytics frameworks to be able to process every position, every execution, and balance data [00:45:30] as it's on the lake, and then also to be able to work on fresh data as it's generated intraday, but also to be able to work on historical data that's stored for at least two years or more. Again, that speaks to the value of using a Hadoop driven architecture like the Hortonworks Data Platform, because the underlying technology is built on an abstract global file system, the Hadoop distributed file system, though you could also deploy it on any given cloud file system as well with [00:46:00] the range of integrations that we support and provide, and you can augment the data lake with incremental feeds from intraday trading systems using technologies like Sqoop, or Kafka, or Storm, and then to be able to perform analytics on the data using Arcadia's platform.
So again, if I could sum the architectural design of the system in a couple of different ways; what you want to look for is a technology platform that serves [00:46:30] as a common point of ingestion of data from multiple venues, multiple sources, across multiple types of data, but also lets you provide agility in terms of adding new types of data into the lake and to be able to work on the data. So, in Hadoop, we speak about schema on read, which is a big win in surveillance architectures because the data schema does not need to be modeled up front before you bring the data in. And once you have the data in the lake, the biggest thing to [00:47:00] do is to be able to push compute to the data instead of taking data to the compute by using the technology as present in the stack and in Arcadia's platform.
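To make schema on read concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not how Hadoop or Arcadia implement it: the raw records, field names, and both schemas are assumptions. The point is that the raw lines are stored untouched and a schema is applied only when they are read, so a new attribute can be queried without reloading or remodeling the data.

```python
import json

# Raw records stored exactly as they arrived; the second carries an extra field.
# All values and field names here are made up for illustration.
RAW_LINES = [
    '{"venue": "TSX", "symbol": "ABC", "px": "10.25", "qty": "100"}',
    '{"venue": "LSE", "symbol": "ABC", "px": "10.30", "qty": "50", "trader": "T42"}',
]

# A schema is just a mapping of field name to a cast applied at read time.
TRADE_SCHEMA = {"venue": str, "symbol": str, "px": float, "qty": int}

# A later schema adds a field without touching the stored data.
TRADER_SCHEMA = {**TRADE_SCHEMA, "trader": str}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema; missing fields become None."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


trades = list(read_with_schema(RAW_LINES, TRADE_SCHEMA))
with_trader = list(read_with_schema(RAW_LINES, TRADER_SCHEMA))
```

The same stored lines serve both `TRADE_SCHEMA` and `TRADER_SCHEMA`, which is the agility described above: adding an attribute is a change to the reader, not to the ingestion path.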
So, with that in mind, and in the interest of time, let me turn to Shant to take a deeper look into the architecture from Arcadia's standpoint. Shant.
Shant Hovsepian: Thanks, Vamsi. So everyone, I just want to give you a sense of what this really looks like at the end of the day. We call it the compliance or the surveillance officer toolbox. It's a web-based front end that the users can log in to. It allows you to do both [00:47:30] sort of canned reporting of very well known attributes as well as the ability to do completely ad-hoc data exploration at high scale and performance. And if it wasn't obvious enough from what Vamsi was saying, I want to stress the importance of having a system that's dynamic enough to handle schema on read. So, for example here, the ability to do attribute filtering at high performance and scale across hierarchies, asset classes, even doing free text [00:48:00] search for e-comm data is very critical, and you can't spend all your time building indexes for every single possibility.
So, having an analytics tool and data engine that can leverage schema on read on a storage platform that provides you an efficient way of doing it, it's very critical, especially in the surveillance use case because a lot of times you don't actually know what you're looking for, right? Apprehending bad behavior really has to lead to asking some very interesting and dynamic questions. Also emphasize [00:48:30] the importance of the ability to drill down to the raw data, so always being able to look at individual transactions, being able to look at the settlement history and how things happen and kind of piece this together. Even the ability to be able to export this data to other tools like Excel or make it easy to run predictive analytical models on a subset of the data is critical to the exploration process as well.
At the end of the day, what we're trying to do is we're trying to build a complete picture for that point in time of the trade history that happened [00:49:00] across all the markets, across all the exchanges, factoring in any additional communications that may have happened at the time. And one of the other big use cases here from a visualization standpoint is kind of new and cutting edge visualizations to help do things like reconstruct order flow or to look at serious amounts of time series data and how that changes over time. So, I just want to get into some of the properties of this type of a solution and what we've seen as critical in the field. So, as we touched [00:49:30] upon before, the ability to inspect the data, issue ad-hoc queries, very fast filtering across the attributes. That's an important property because going into these situations in some cases the surveillance officers or surveillance team may not exactly know what they're looking for, they may have a hunch and it's up to them to prove that hunch out.
Incorporating the unstructured data; critical, especially because of the regulatory requirements associated with it. So, when a trader is at their desk or a broker picks up the call to place [00:50:00] an order, it's not just what they typed into the computer, but it's what was on the news at that time. Were they reading some other article at the same time? Did they get an instant message on their Bloomberg terminal, or did their BlackBerry vibrate? What else was happening, to sort of understand what was going on in that trader's mind, and having a visualization tool that can work with the free form and unstructured data sets is critical.
The ability that we discussed; combine the real time and the historical data so that you can kind of correlate these events, especially as [00:50:30] new bits of information come in, especially for longer types of compliance issues where you've got a trickle out effect. And an important use case, one that's very critical for our customers and that most people always forget about, is the case management application. At the end of the day, with a lot of this stuff, you have to report the results that you've found, and the ability to sort of embed final reports or actual data traces of what may have happened is very critical both from like a static reporting standpoint, so sticking [00:51:00] it into a PDF or an email, but also even in some newer case management software that's web based, the ability to embed an interactive visualization so that your reporting team can later on go in there and drill down.
The ability to do email alerting on metrics changes, the ability to do what we call derived data analysis, kind of multi-path iterative analysis on subsets of data, pivot the data, try different transformations over and over again without having to extract everything to a spreadsheet, [00:51:30] have multiple copies in multiple sheets. The ability to kind of audit that data trail and that history trail in a single solution is very important. And really, all of this is just about being able to quickly retrace the activity around the large trade transaction block in a very easy point and click manner.
And with that, I'm going to hand it back to David to give you some of the resources and talk a little bit about Q&A.
David Fishman: Terrific. Shant, and Shailesh, and Vamsi, I want to thank you very much for a really comprehensive view [00:52:00] of the application of this technology in kind of a cutting edge domain. Much of what we hear about these technologies has been put in practice in compliance in capital markets and exposing the history to the point where banks can manage their risk and the regulators can ensure the ongoing integrity of the markets.
I'm going to touch briefly on the resources here; the Arcadia [00:52:30] and Hortonworks blogs, of course, will provide you a breadth of material, and for me as a generalist, they really provide great insights into the impact of this technology in the financial services industry. Reading is fun, but talking about it with real life people is even more fun, and we'll be traveling together with Hortonworks on a roadshow that they've got going called Future of Data. And here are [00:53:00] the three dates: Toronto on October 20th, Atlanta on November 17th, and New York City on the 8th of December. And finally, you can find several resources to get you started on the Hortonworks website, both the software as well as an overview, a high level overview about our joint solution.
I'll open up to questions now. We've had a number of them come in during the course of the webcast and so I'm going to touch on those, [00:53:30] all three of them. But I want to start with ... this one occurred to me before someone else submitted it, and Shailesh, I think it goes to you is; can you tell us a war story? Is there a specific situation where having access to this kind of rich dynamic platform helped identify some manipulation and catch the misbehavior?
Shailesh Ambike: Yeah, absolutely. There's several war stories that [00:54:00] are on the go or over the course of my experience in the market conduct space and it's not particularly geared to one product or the other. But I could share a couple really briefly. We have trading that's conducted in a global environment these days, so issuers, they want their companies issued and marketed globally and traded globally. [00:54:30] So, we have instances where we have, and I think I alluded to this before, we've had clients or underlying ultimate clients of our affiliates, broker dealers, down the chain, that are effectively day-trading a security here in Canada while manipulating the quote while trading that same security on a foreign [00:55:00] exchange. And what happens effectively is that the foreign regulator will never see the Canadian misconduct, what they'll only see is that this individual is trading in a foreign jurisdiction, buying low and selling high. But what they don't see is that this individual is trading or at least manipulating the market or the quote in Canada, which is [00:55:30] a jurisdiction outside of their local jurisdiction.
And so, what that required us to do, and it's pretty complex, is that we have to overlay our local data with our affiliates data from that local jurisdiction and to sort of normalize it so that we can sort of paint a picture that this individual here in the foreign jurisdiction was trading a stock to their advantage while [00:56:00] entering and deleting orders in Canada. And to do that isn't an easy thing, you have to jump obviously through several hoops to have that, to get access to that data. I mean, we're talking about multiple jurisdictions, and privacy, and going through different foreign broker dealers.
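The overlay described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bank's actual process: the record layout, client ids, UTC offsets, dates, and the 60-second window are all hypothetical. It normalizes both data sets to UTC and flags a foreign trade when the same client both entered and deleted an order in the same security in Canada close to the time of that trade.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_iso, utc_offset_hours):
    """Attach a fixed UTC offset to a naive local timestamp and convert it to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromisoformat(local_iso).replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical Canadian order activity (local time assumed to be UTC-4).
canadian_orders = [
    {"client": "C1", "symbol": "ABC", "action": "enter", "time": to_utc("2016-10-06T09:30:00", -4)},
    {"client": "C1", "symbol": "ABC", "action": "delete", "time": to_utc("2016-10-06T09:30:20", -4)},
    {"client": "C2", "symbol": "ABC", "action": "enter", "time": to_utc("2016-10-06T11:00:00", -4)},
]

# Hypothetical trades from a foreign affiliate (local time assumed to be UTC+1).
foreign_trades = [
    {"client": "C1", "symbol": "ABC", "side": "sell", "time": to_utc("2016-10-06T14:30:10", 1)},
    {"client": "C2", "symbol": "ABC", "side": "buy", "time": to_utc("2016-10-06T18:00:00", 1)},
]


def flag_overlaps(orders, trades, window=timedelta(seconds=60)):
    """Flag trades with both an order entry and a deletion by the same client nearby in time."""
    flagged = []
    for trade in trades:
        nearby_actions = {
            o["action"]
            for o in orders
            if o["client"] == trade["client"]
            and o["symbol"] == trade["symbol"]
            and abs(o["time"] - trade["time"]) <= window
        }
        if {"enter", "delete"} <= nearby_actions:
            flagged.append(trade)
    return flagged


suspicious = flag_overlaps(canadian_orders, foreign_trades)
```

A flag from a check like this is only a starting point for review; the hard parts described in the discussion, getting access to the affiliate's data and reconciling client identities across jurisdictions, happen before any code like this can run.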
But when you have regulators holding hands in both [00:56:30] Canada and in Europe together and saying, "We want to know who this person is, and we want to see the trading rationale, and we want to see if there was any misconduct," that requirement requires us to pull access to that type of data. And we see that quite often, not only in the equity space, but in the options space as well, where [00:57:00] someone's trading a stock or they're manipulating the quote, they're trying to pin down the stock or an option, especially during expiration with an option, and now we're talking about different products or even convertible bonds.
So, one of the big war stories for us really is, yeah, just foreign broker dealers and having to deal with foreign jurisdictions and multiple products, and also regulators [00:57:30] that aren't quite versed in local requirements. So, it requires long hours when you're dealing with that type of stuff.
David Fishman: I can just imagine. We have time for one more question and this one just came in. So, let me see if I can make sense of it. Vamsi, in your observation are there banking functions that are kind of looking to what compliance has achieved with big data technologies [00:58:00] and saying, "Hey, we want to do what they're doing. How can we adopt some of these technologies?" And, I don't know, trading strategies or know your customer or other kinds of dimensions of the banking industry where they're kind of borrowing directly from what compliance has already done with this surveillance tool kit?
Vamsi: Great question, David. And the answer is an unequivocal, unambiguous yes. Right. So, we're speaking [00:58:30] about surveillance and compliance with surveillance reporting, but the other major form of compliance, I think, is AML, more on the retail side but it definitely has a lot of impact in capital markets as well. But one of the drivers around looking at a big data platform and a platform like Arcadia's is the need to basically standardize the data in one place, and then to be able to perform multiple intelligence functions on the data. [00:59:00] And compliance leads the effort typically in terms of the scale of the lake and the scale of the technologies they bring in, while we see this is a common story where once compliance gets its start, there's some kind of KYC, or single view of a customer or a client requirement that gets added on and then that moves the adoption of the technology to different areas more on the front office of the bank, right?
So, the answer is a yes, and one of the things I can provide in terms of [00:59:30] my trusted advisor role to banks is to not look at using big data in one use case or in isolation; you have to obviously take more of an organizational overarching approach to the data, and to ensuring that whatever data you put on the lake is data that you can reuse in a multi-tenant fashion.
So typically, most of the banks have the role of the CDO that they've created and the CDO's team, when we go work with them, the advice always [01:00:00] is to think of data as being a reusable asset that could be used across multiple functions or multiple lines of business and then to be able to drive the value drivers in terms of the adoption and not to basically have to reinvent the wheel every time there's a new project that needs to leverage a Hadoop-based or an advanced analytics platform like Arcadia's. To be able to take that view, which is typically long-term. And as you start using the data, you'll see that you'll start with compliance and you get effective outcomes, as Shailesh mentioned, [01:00:30] but then that use of this platform starts to percolate to other business areas and that leads to an overall better utilization of the platform and more return on investment.
David Fishman: Good to know and a useful question. I appreciate the expansion. We are at the top of the hour and we've timed out on the schedule that we've allocated for today's webcast. There are another dozen or so questions that we'll circulate among the expert panelists that we have here [01:01:00] and put together some more information for you out there. So, thank you all for those questions. I'm sorry we didn't have time to get to them.
With that, I want to bring today's webcast to a close and thank Shant Hovsepian, Vamsi Chemitiganti, Shailesh Ambike, for the time they've invested to prepare and present. On behalf of Hortonworks and Arcadia Data, I want to thank you all for the time you've spent with us today and wish you a very good morning, afternoon, [01:01:30] or evening, wherever you are. Thanks again everyone.