Apache Spark Input Validation Vulnerability – CVE-2022-33891

Apache Spark is an open-source framework for distributed computing that has gained significant popularity in the big data processing industry. It offers a fast, general-purpose computing engine for large-scale data processing.

Spark provides the ability to develop applications in several languages, including Scala, Java, Python, and R, making it very useful for a variety of tasks and applications such as machine learning, graph processing, and data streaming.

It is designed to work with distributed data stored in the Hadoop Distributed File System (HDFS), allowing it to leverage the benefits of Hadoop's fault-tolerance features.

The Vulnerability

Recently, a vulnerability was discovered in Apache Spark's ACL feature that could allow unauthorized access to sensitive data. An ACL (Access Control List) is a security mechanism that defines who is authorized to access a particular resource or perform a particular action in Apache Spark. A flaw in the ACL implementation allows an attacker to construct a specially crafted request that bypasses the ACL checks.

The vulnerability, which has a CVSS base score of 8.8, is caused by improper handling of the user name provided during authentication. When ACLs are enabled, the HttpSecurityFilter component can be exploited to perform impersonation, because it lets an attacker supply an arbitrary user name.

In other words, this vulnerability could allow an attacker to impersonate any user, regardless of their actual identity or privileges, potentially granting them unauthorized access to sensitive data and functionality within the cluster. It is important to properly configure and secure the ACL system to prevent such attacks.
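The general defence against this class of bug is to validate a caller-supplied user name before it is used anywhere, and to never let a shell interpret it. The snippet below is a minimal sketch of that idea and not Spark's actual fix: the allowlist pattern and the function names is_plausible_username and group_lookup_argv are hypothetical, chosen only for illustration.

```python
import re
from typing import List

# Hypothetical allowlist: conservative POSIX-style login names only.
_USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def is_plausible_username(name: str) -> bool:
    """Return True only if the whole string looks like a plain login name."""
    return _USERNAME_RE.fullmatch(name) is not None


def group_lookup_argv(name: str) -> List[str]:
    """Build an argument vector for a group lookup without involving a shell."""
    if not is_plausible_username(name):
        raise ValueError(f"rejected user name: {name!r}")
    # A list of arguments (rather than one command string) means that
    # no shell ever parses the user name.
    return ["id", "-Gn", name]
```

With this approach, a name such as alice passes, while a value containing backticks, semicolons, or spaces is rejected before it can reach any external command.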

ACLs are enabled when the Spark configuration contains the following setting:

spark.acls.enable true
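To find out whether a given deployment has this setting turned on, you can scan its configuration text for the key. The following is a rough sketch, assuming the configuration is available as properties-style text (key and value separated by whitespace or an equals sign, as in spark-defaults.conf); the function name acls_enabled is hypothetical, and settings passed on the command line or set in application code are not covered.

```python
import re


def acls_enabled(conf_text: str) -> bool:
    """Report whether spark.acls.enable is set to true in properties-style text."""
    enabled = False  # the option is off unless explicitly enabled
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split "key value" or "key=value" into at most two parts.
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.acls.enable":
            enabled = parts[1].strip().lower() == "true"  # last entry wins
    return enabled
```

A result of True on an unpatched release means the vulnerable code path is reachable and the installation deserves immediate attention.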

Our scanners have picked up over one thousand affected devices across the internet. Make sure yours is not one of them.

The Risk

The vulnerability can be remotely exploited without any user interaction, meaning that an attacker could gain unauthorized access to the system without the need for a user to unwittingly download malware or perform other risky actions.

If exploited, the vulnerability could potentially allow an attacker to steal sensitive data, modify or delete files, execute shell commands, and even abuse the compromised computing resources to perform malicious and illegal activities, such as botnet creation, DDoS attacks, password cracking, and many other activities that could damage your organization’s reputation and result in serious legal issues.

Affected Versions

CVE-2022-33891 affects Apache Spark versions 3.0.3 and earlier, 3.1.1 up to and including 3.1.2, and 3.2.0 up to and including 3.2.1.
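The ranges above are easy to misread, so here is a small sketch that encodes them literally. It assumes a three-part version string such as 3.1.2 (an optional suffix after a hyphen is ignored); the function name is_affected is hypothetical.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, int, int]:
    """Turn '3.1.2' or '3.1.2-something' into (3, 1, 2)."""
    core = version.split("-", 1)[0]  # drop any suffix after a hyphen
    major, minor, patch = (int(p) for p in core.split(".")[:3])
    return major, minor, patch


def is_affected(version: str) -> bool:
    """Check a version against the affected ranges exactly as stated above."""
    v = _parse(version)
    return (
        v <= (3, 0, 3)                      # 3.0.3 and earlier
        or (3, 1, 1) <= v <= (3, 1, 2)      # 3.1.1 through 3.1.2
        or (3, 2, 0) <= v <= (3, 2, 1)      # 3.2.0 through 3.2.1
    )
```

For example, is_affected("3.2.1") is True, while is_affected("3.2.2") is False. Keep in mind that a vulnerable version is only exploitable when ACLs are enabled.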

The version numbers above might lead us to believe that this is a relatively new vulnerability, but according to Red Hat, the vulnerable code was already present in version 2.4. Thankfully, versions with the vulnerable code haven’t been shipped in any Red Hat release.

Mitigation

To mitigate the risk of CVE-2022-33891, you can take the following steps:

Upgrade to a patched version: Apache Spark has released patched versions that address this vulnerability (3.1.3, 3.2.2, and 3.3.0 or later). Upgrade to the latest patched version to ensure that the vulnerability is mitigated.

Disable ACLs: If you do not need ACLs, you can disable them to eliminate the risk of this vulnerability. However, this may not be a feasible solution in all cases.

Restrict access: Ensure that only authorized users have access to Apache Spark clusters. Restricting access to trusted users can help prevent unauthorized access to sensitive data and functionality.

Monitor network traffic: Monitor network traffic for any suspicious activity, such as unexpected network connections or data transfers. This can help detect and prevent attacks before they cause damage.

And last but not least, you can subscribe to the DataGrid Surface blog. Right now, we are actively scanning the web for vulnerable Apache Spark machines, so if you suspect that you might be targeted, don’t hesitate to get in touch with us!
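As a concrete starting point for the monitoring step, the sketch below flags HTTP request targets whose impersonation parameter contains unexpected characters. It rests on an assumption that is not stated in this article: public write-ups of this CVE describe the impersonated user name as being supplied through a doAs query parameter. The allowlist pattern and the function name suspicious_doas are hypothetical, and this is a heuristic, not a complete detection rule.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical allowlist of characters expected in a legitimate user name.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.@-]+")


def suspicious_doas(request_target: str) -> bool:
    """Flag a request path+query whose doAs value has unexpected characters."""
    query = urlsplit(request_target).query
    # parse_qs also decodes percent-encoded characters such as %60 (backtick).
    for value in parse_qs(query, keep_blank_values=True).get("doAs", []):
        if _SAFE_VALUE.fullmatch(value) is None:
            return True
    return False
```

You could run a function like this over the request lines of a reverse-proxy access log in front of the Spark UI and review any hits manually.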
