Cybersecurity is one of the great and constantly evolving technology challenges of our time. Details remain murky, but the recent Equifax breach is the latest major story to place cybersecurity in the news headlines.
Though companies have poured hundreds of billions of dollars into cybersecurity solutions, new threats are always on the horizon. In fact, the overall cybersecurity market is expected to grow from $137.85 billion this year to $231.94 billion by 2022, according to B2B research firm MarketsandMarkets.
This ongoing and growing investment reflects the continuing challenges associated with cybersecurity. New technologies deployed into the enterprise may have as-yet undiscovered vulnerabilities, and this is particularly true of the Internet of Things. At the same time, attackers and their attacks continuously evolve as they look for ways to hide their tracks, so systems and applications that appeared secure yesterday are at risk of exposure today.
Whether attackers are individuals or part of well-coordinated networks, they may use sophisticated techniques to cover their tracks and make it difficult to identify them. Or in some cases, they may use very basic means of intrusion via social engineering, and have access to data and systems while appearing to be legitimate users. In any case, their attacks can be damaging. They may not only steal valuable information, but they may also do incredible damage to your organization’s reputation and your company’s stock value.
In light of these concerns and challenges, your security operations center has two primary goals in addressing cyber attacks:
- Reduce the time it takes to identify a breach to minimize any potential damage and related fallout, and
- Detect potential threats before they turn into attacks.
These goals are difficult to achieve. Relevant data, especially network packets, may be dispersed across many locations. There is also so much of that data that some of it may never have been collected in the first place. Point solutions, meanwhile, are typically designed for known attack patterns and often fall short against new or evolving threats.
A New Approach
Fortunately, there is a new approach—essentially a combination of best practices and solutions—to cybersecurity threat detection and response that overcomes these challenges.
The first component is the “big data approach”: putting all data that may be relevant for cybersecurity investigations into a single repository to get a more holistic view of your network. This approach has seen a significant increase in adoption in recent years.
The second component is a visual analytics layer. By applying one on top of your big data cybersecurity platform, you can greatly simplify the work analysts do to identify threats. Network and flow diagrams make it much easier to visually identify threats and attacks than simpler types of graphs, and they also make it easy to share information about these threats with other internal organizations and stakeholders.
The third essential component of this new approach is encouraging a collaborative effort as a community. Cybersecurity is a common good that we should work on together, drawing on expertise from across industries. One framework for this collaboration is Apache Spot (Incubating) (formerly the Open Network Insights project), a community-driven cybersecurity project built from the ground up to bring advanced analytics to all IT telemetry data on an open, scalable platform. Spot contributors include experts from Intel, Cloudera, StreamSets and other leading cybersecurity and technology companies.
There are several advantages to combining these three core components (a modern big data platform, a visual analytics layer, and Apache Spot as a framework for collaboration) compared with traditional approaches to cybersecurity threat detection and response.
By leveraging Spot on a big data platform such as Cloudera, you can collect and normalize data from thousands of security data sources and make it accessible to security analysts from a single location. Spot enables you to perform analytics in batch or stream. This is important because immediate and historical analysis should always be done in tandem. As you see suspicious patterns in recent data, you will want to further investigate historical data to see if other potential breaches have occurred that you previously missed. And if your historical analysis patterns show various types of suspicious behavior, you can use that as a model for patterns to look for in recent data, especially in a real-time alerting framework.
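The interplay between historical and real-time analysis described above can be sketched in a few lines of Python. This is only an illustration and not Spot's implementation: the event fields (`dst_ip`, `dst_port`), the function names, and the choice of a destination and port pair as the "pattern" are all assumptions made for the example.

```python
def pattern_of(event):
    """Hypothetical pattern key: the destination IP and port of an event."""
    return (event["dst_ip"], event["dst_port"])


def build_watchlist(historical_events, is_suspicious):
    """Batch step: collect the patterns of historical events judged suspicious."""
    return {pattern_of(e) for e in historical_events if is_suspicious(e)}


def alert_on_stream(stream, watchlist):
    """Stream step: yield each incoming event whose pattern is on the watchlist."""
    for event in stream:
        if pattern_of(event) in watchlist:
            yield event


def search_history(historical_events, recent_event):
    """Reverse direction: find past events that share a recent event's pattern."""
    key = pattern_of(recent_event)
    return [e for e in historical_events if pattern_of(e) == key]
```

The first two functions carry findings from batch analysis into a real-time alerting loop, while `search_history` goes the other way and uses a suspicious recent event as the starting point for a historical investigation.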
Spot’s machine learning component also helps separate the signal from the noise. It uses anomaly detection through topic modeling to find suspicious or uncommon patterns within your network, so analysts can respond quickly to high-priority events. How does this work? Analysts receive the output of Spot’s suspicious connects analysis, which assigns each network event an estimated probability given the usual behavior of the IP address involved. Events with the lowest scores, meaning the least expected ones, are flagged as “suspicious” for further analysis. Over time, the underlying model improves as analysts review and score more events.
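To make the "lowest scores are flagged" idea concrete, here is a small Python sketch. It deliberately does not use topic modeling: as a simpler stand-in, it scores each event by the smoothed relative frequency of that kind of event for the same IP address. The function names, the `(ip, word)` event representation, and the `alpha` smoothing parameter are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def score_events(events, alpha=1.0):
    """Return (ip, word, probability) for each event.

    events: list of (ip, word) pairs, where `word` is a discretized summary
    of the event (for example "ssh_night"). The probability is the
    Laplace-smoothed share of that word among the IP's own events, which
    stands in here for a topic-model estimate.
    """
    per_ip = defaultdict(Counter)
    vocabulary = set()
    for ip, word in events:
        per_ip[ip][word] += 1
        vocabulary.add(word)

    scored = []
    for ip, word in events:
        counts = per_ip[ip]
        total = sum(counts.values())
        probability = (counts[word] + alpha) / (total + alpha * len(vocabulary))
        scored.append((ip, word, probability))
    return scored


def most_suspicious(scored, n):
    """Return the n lowest-probability events, least expected first."""
    return sorted(scored, key=lambda row: row[2])[:n]
```

With this stand-in, an IP address that makes fifty routine web connections and a single late-night SSH connection would see the SSH event ranked first, because it is the least expected event for that address.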
The Spot machine learning model should be flexible enough to recognize new potential threats as they develop. This means Spot can continue to evolve and be a long-term platform for combating cyber threats. So as you face new threats, you can easily develop new analytics to understand and resolve these threats or attacks using a combination of machine learning, noise filtering, whitelisting, and heuristics.
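As a rough sketch of how model scores might be combined with whitelisting and noise filtering, the Python function below drops events above a score threshold and events whose destination falls in a trusted network. The event fields (`dst_ip`, `score`), the threshold, and the function name are hypothetical and not part of Spot.

```python
import ipaddress


def triage(scored_events, whitelist_cidrs, threshold):
    """Keep low-score events whose destination is not whitelisted.

    scored_events: list of dicts with "dst_ip" and "score" keys (assumed shape).
    whitelist_cidrs: list of CIDR strings for trusted networks.
    threshold: events scoring at or above this value are treated as noise.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in whitelist_cidrs]
    flagged = []
    for event in scored_events:
        if event["score"] >= threshold:
            continue  # noise filter: the event is not unusual enough
        destination = ipaddress.ip_address(event["dst_ip"])
        if any(destination in network for network in networks):
            continue  # whitelist: the destination is trusted
        flagged.append(event)
    # Least expected events first, so analysts see them at the top.
    return sorted(flagged, key=lambda e: e["score"])
```

Keeping the whitelist and threshold as plain arguments means analysts can tune them as new threats appear without touching the scoring model itself.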
All of these benefits help the security operations center (SOC) team meet their two primary objectives of reducing time to resolution for breaches and preventing future attacks.
We at Arcadia Data will continue to contribute to Spot to make it a powerful cybersecurity solution. Our recent contributions will give you a good start for deploying Spot in your own environment, and hopefully they will inspire you to make some contributions yourself. We need to keep building up our cybersecurity defenses as a community, and Spot is a great framework around which we can rally.
To briefly share an overview of our recent contributions:
- The ODM setup script provides a straightforward process to build the directories and tables necessary for housing data from specific sources to fit the Apache Spot Open Data Model (ODM) schema. When they run the script, users can also choose to build the tables in either Parquet or Avro, depending on which storage format fits their needs best. See https://github.com/apache/incubator-spot/pull/119
- Three new dashboards in the Apache Spot community focus on tracking and exploring security events related to users, endpoints, and vulnerabilities. These dashboards run in our visualization platform (you can try them yourself in Arcadia Instant, a free visualization tool available for download here, in which you can create more dashboards to share). The user, endpoint, and vulnerability data behind these dashboards comes from StreamSets ingestion pipelines, which are configured to bring in source data from Centrify (user events), Windows Event Logs (user events), Qualys scanning (endpoint events), and KnowledgeBase data (vulnerability context). These pipelines have also been contributed back to the Apache Spot community.
StreamSets ingestion pipelines
If you are investigating cybersecurity solutions, I strongly recommend considering the combined solution outlined above: a modern big data platform with Apache Spot and a visual analytics layer. If you want to see such a combination in action, here’s a video that shows how you can visually analyze user, network and endpoint data in Cloudera and Apache Spot with Arcadia Enterprise. It’s important to note that the visualizations also make it much easier for anyone outside your security operations center to understand the nature of such threats and attacks.
To learn even more, be sure to check out the accompanying blog by Cloudera. Also check out their press release here. As a final comment, I’d love to learn what you’re doing with respect to cybersecurity and big data. Feel free to email me at firstname.lastname@example.org if you have interesting information to share, or if you have any questions about our approach to cybersecurity. Or if you’d like to learn more about how Arcadia Data can address your analytics requirements in other use cases, please be sure to contact my colleagues who can help you out.