Big data has been one of the hottest trends in computing in recent years, driven by the continued digitization of businesses, which opens up new channels of communication and commerce. But with big data comes big responsibility. The potential for data security challenges should be at the forefront of every CTO and IT manager’s mind when their organization works with massive amounts of data and needs big data analytics. Their teams often need to quickly investigate breaches or do greenfield threat hunting. Data breaches are becoming more frequent with the explosion of digital touch points and could be even more catastrophic, creating unprecedented legal and financial repercussions.
Platforms for consuming and processing large amounts of data, such as MongoDB, Elasticsearch, Apache Hadoop, and Apache Spark, present their own set of IT security questions for customers and users. With that in mind, what is the current state of cybersecurity, and how is it affecting big data?
The state of internet security
Like other forms of security, cybersecurity tends to be noticeable only when things go wrong. As long as everything appears to work correctly and keep out intruders or malicious actors, few people outside of IT pay attention to the organization’s cybersecurity efforts.
However, plenty has gone wrong lately across the Internet, with news of cyber attacks reaching the front pages of international news outlets. In 2016 alone, major tech companies such as LinkedIn, Oracle, Dropbox, Yahoo!, Verizon, and Cisco announced that they had experienced security vulnerabilities and breaches, exposing the personal information of millions of customers and employees. Even venerable U.S. institutions such as the Department of Justice, the Internal Revenue Service, and the Democratic National Committee fell victim to major data breaches, which are believed to be the work of foreign actors.
Along with the exposure of sensitive information, malicious agents have been increasing the frequency and intensity of distributed denial of service (DDoS) attacks, which are intended to prevent legitimate use of a website or service by flooding it with traffic and forcing it to stop or slow down. According to Akamai’s State of the Internet Security Report released in November 2016, the total number of DDoS attacks increased by 71 percent compared to the previous year. In addition, October 2016 saw the largest DDoS attack on record, bringing down major websites several times for hours at a time. The attack affected news organizations such as the New York Times, Fox News, and CNN; tech firms like Twitter, PayPal, and Amazon; and telecom and media companies such as Verizon, Comcast, and HBO.
All of this adds up to an extremely shaky picture of cybersecurity at the beginning of 2017, one that should concern the C-suite of any organization, large or small, no matter how well protected it believes itself to be.
Cybersecurity for big data
Hackers and threat actors have not confined themselves to attacking high-profile targets, however. Many of them have successfully infiltrated improperly secured databases, probing for vulnerabilities until they find a way in.
In January 2017, news broke of several groups of attackers who together had deleted more than 10,000 MongoDB databases that were improperly installed and therefore exposed to any Internet user savvy enough to know how to access them. Affected users were asked to pay a ransom for the return of their files, although it is unclear whether the hackers had actually backed up the data before deleting it. Soon afterward, a similar story emerged of so-called “ransomware” hacker groups who had wiped data from more than 600 Elasticsearch clusters; by then, the number of affected MongoDB databases had climbed to over 34,000.
Many, if not most, security breaches are preventable. Databases are often not properly secured, and victims too easily fall for standard social engineering techniques such as phishing. Anyone concerned with big data security needs to understand this point, especially those using similar services (such as Apache Hadoop and CouchDB).
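To make "not properly secured" concrete, here is a minimal Python sketch of the kind of check that could flag an exposed MongoDB deployment. It assumes the mongod configuration file has already been parsed into a dictionary; the function name find_exposure_risks and the sample values are hypothetical, and the only settings inspected are the standard net.bindIp, net.bindIpAll, and security.authorization options. It is an illustration, not a complete audit.

```python
def find_exposure_risks(config):
    """Return warnings for a parsed mongod config dict (hypothetical helper)."""
    risks = []
    net = config.get("net", {})
    security = config.get("security", {})

    # net.bindIp may hold a comma-separated list of addresses.
    bind_ip = str(net.get("bindIp", ""))
    addresses = [a.strip() for a in bind_ip.split(",") if a.strip()]
    if net.get("bindIpAll") or "0.0.0.0" in addresses or "::" in addresses:
        risks.append("listens on all network interfaces")

    # Without security.authorization set to "enabled", clients are not
    # required to authenticate.
    if security.get("authorization") != "enabled":
        risks.append("access control (authorization) is not enabled")

    return risks


if __name__ == "__main__":
    # Illustrative config resembling an openly reachable instance.
    exposed = {"net": {"bindIp": "0.0.0.0", "port": 27017}}
    for risk in find_exposure_risks(exposed):
        print("WARNING:", risk)
```

A deployment that triggers both warnings is reachable from any network interface without credentials, which matches the kind of exposure described in the attacks above. Restricting the bind address and enabling authorization addresses both.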
Indeed, soon after the news of the MongoDB and Elasticsearch attacks, another story hit: more than 100 openly accessible Hadoop instances were subjected to a deletion attack. While this latest attack has not yet resulted in ransom demands, the number of breaches is only likely to rise in the future; it is estimated that thousands of Hadoop deployments are accessible through the Internet.
The benefits of in-cluster security
Although improperly secured big data platforms such as MongoDB, Elasticsearch, and Hadoop can be points of vulnerability and potential attack vectors, they can also be used to combat hacks and breaches. Their scale and data agility make it possible to quickly combine all relevant network, endpoint, and user data in one security data warehouse, hub, or lake. Big data solutions such as Cloudera can be used in combination with big data analytic techniques to detect malicious actors, scan for malware, and identify insider threats.
Hadoop-based security data lakes or data hubs are the perfect place for security analysts and security operations centers (SOCs) to start leveraging big data for their cybersecurity needs. Within this hub, IT and security teams have a single, unified, secure location for collecting and analyzing vast amounts of data in order to detect and mitigate threats.
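As a toy illustration of why combining sources in one place helps, the following Python sketch merges events from two feeds and flags source addresses with repeated failed logins. The event fields (src_ip, action, outcome), the function name flag_suspicious_sources, and the threshold are all assumptions made for this example. A real security data hub would run comparable logic at scale with tools such as Spark or Impala rather than in-memory Python.

```python
from collections import Counter


def flag_suspicious_sources(events, threshold=3):
    """Return sorted source IPs with at least `threshold` failed logins."""
    failures = Counter(
        event["src_ip"]
        for event in events
        if event.get("action") == "login" and event.get("outcome") == "failure"
    )
    return sorted(ip for ip, count in failures.items() if count >= threshold)


if __name__ == "__main__":
    # Hypothetical events from two separate feeds (documentation IP ranges).
    network_events = [
        {"src_ip": "203.0.113.7", "action": "login", "outcome": "failure"},
        {"src_ip": "203.0.113.7", "action": "login", "outcome": "failure"},
    ]
    endpoint_events = [
        {"src_ip": "203.0.113.7", "action": "login", "outcome": "failure"},
        {"src_ip": "198.51.100.2", "action": "login", "outcome": "success"},
    ]
    # Neither feed alone reaches the threshold; the combined view does.
    print(flag_suspicious_sources(network_events + endpoint_events))
```

The point of the sketch is the last line: the pattern only becomes visible once the feeds are analyzed together, which is the argument for a unified hub.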
Arcadia Data takes a “data native” approach to analyzing data in-cluster and in-cloud. It integrates natively with scale-out data platforms such as Hadoop to provide a comprehensive Hadoop-based analytics platform that businesses can use when building cybersecurity solutions. Arcadia’s platform provides users with real-time insights and visualizations using both streaming and historical data sets, all the way down to the lowest level of detail available, because data does not have to be moved, summarized, or sampled. Users can also leverage next-generation data management tools from Apache, including Apache Spot, Spark, Impala, and Kudu, to query, process, and store the massive quantities of information that modern cybersecurity solutions require. Apache Spot gives organizations a head start on hackers by providing open data models (ODMs), based on community best practices, for organizing and analyzing the data needed for forensic analysis of cyber threats. Apache Spot has seen a lot of adoption in a short amount of time, given the need to take a community approach to combating cyber attacks.
Read on for more details about Arcadia Data’s in-cluster cybersecurity solution that uses data-native visual analytics. A product demo of Arcadia for cybersecurity shows what’s possible when you unify insights across users, endpoints, and networks in a “single pane of glass” and drill down instantly to the finest level of detail.