
AI Developer Platform Hugging Face Exposed to Malware and Backdoors

The AI developer platform Hugging Face has drawn attention following a report by security firm JFrog. The report found that code submitted to the platform covertly installed backdoors and other types of malware on end-user machines. The discovery has raised concerns about the security of machine-learning models available on AI sites. JFrog researchers analyzed around 100 submissions and found that most of the flagged machine-learning models appeared to be benign proofs of concept uploaded by researchers or curious users. However, 10 were found to be “truly malicious” in that they compromised users' security when loaded.

One model drew particular attention because it opened a reverse shell that gave a remote device on the Internet complete control of the end user's device. When the researchers loaded the submission on a lab machine, it opened the reverse shell but took no further action. The remote device's IP address, along with identical shells connecting elsewhere, raised the possibility that the submission was also the work of researchers. Even so, an exploit that opens a device to such tampering is a significant breach of researcher ethics. It also shows that models available on AI sites can pose serious risks if not carefully vetted first, much like code submitted to other developer platforms.

The model that spawned the reverse shell was submitted by a party with the username baller423. It evaded Hugging Face's scanner by using Pickle's “__reduce__” method to execute arbitrary code when the model file was loaded. Pickle is commonly used in Python to convert objects and classes in human-readable code into a byte stream so that it can be saved to disk or shared over a network. This process is known as serialization; the reverse step, deserialization, is where attackers can sneak malicious code into the flow.
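To make the mechanism concrete, here is a minimal and deliberately harmless sketch of how Pickle's reduce hook (spelled `__reduce__` in Python) lets a serialized object name a function to call at load time. The class name `Demo` is hypothetical, and `os.getcwd` merely stands in for whatever an attacker would actually run; this is not the payload JFrog found.

```python
import os
import pickle


class Demo:
    """Hypothetical class used only to illustrate the reduce hook."""

    def __reduce__(self):
        # Instead of describing the object's state, tell pickle:
        # "to rebuild me, call os.getcwd with no arguments".
        # A malicious file would name a far more dangerous callable here.
        return (os.getcwd, ())


# Serializing stores a reference to the callable and its arguments.
payload = pickle.dumps(Demo())

# Deserializing calls the stored function; no Demo object comes back.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The value that comes back is whatever the named function returned, not a `Demo` instance, which is why simply loading an untrusted pickle file is enough to run code.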

JFrog Senior Researcher David Cohen explained the process in more technical detail. When loading PyTorch models with Transformers, he said, a common approach involves the torch.load() function, which deserializes the model from a file. For PyTorch models trained with Hugging Face's Transformers library, this method is often used to load the model along with its architecture, weights, and any associated configurations. Transformers provides a comprehensive framework for natural language processing tasks, facilitating the creation and deployment of sophisticated models. In the repository “baller423/goober2,” the malicious payload was injected into the PyTorch model file using the __reduce__ method of the pickle module. This method enables attackers to insert arbitrary code into the deserialization process, potentially leading to malicious behavior when the model is loaded.
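One way to narrow this exposure, sketched here with only Python's standard library, is to deserialize through an unpickler that refuses any global not on an allow-list. The names `ALLOWED_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` are hypothetical, the allow-list is illustrative, and this is not how Hugging Face or PyTorch implement their own checks. PyTorch's `torch.load()` also accepts a `weights_only` argument intended for a similar purpose.

```python
import collections
import io
import os
import pickle

# Hypothetical allow-list: the only (module, name) pairs a file may import.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize bytes, rejecting any global outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no globals, so they load normally.
plain = restricted_loads(pickle.dumps({"weights": [0.1, 0.2]}))

# An allow-listed class also loads.
ordered = restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))

# A stream that references an OS-level function is refused before it runs.
try:
    restricted_loads(pickle.dumps(os.getcwd))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allow-list is only as safe as the callables placed on it, so this reduces the attack surface rather than eliminating it.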

Hugging Face has since removed the model and the others flagged by JFrog. Still, malicious user submissions to open-source code repositories have been a fact of life for almost a decade. In October 2018, a package that sneaked into a public package repository received 171 downloads before researchers discovered it contained hidden code designed to steal cryptocurrency from developer machines. A month later, someone snuck similar code into event-stream, a code library with 2 million downloads. The developers of the bitcoin wallet CoPay went on to incorporate the trojanized version of event-stream into an update, which caused CoPay to steal bitcoin wallets from end users by transferring their balances to a server in Kuala Lumpur.

JFrog's discovery therefore suggests that malicious submissions to Hugging Face and other AI developer platforms are to be expected. It underscores the importance of thoroughly vetting models available on AI sites and of paying close attention to how they are serialized and loaded.
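As a rough illustration of what vetting can look like, the sketch below lists the globals a pickle stream would import, without ever loading it, using the standard library's `pickletools`. The function name `referenced_globals` is hypothetical, and the string-tracking heuristic is an assumption that can miss names fetched from the pickle memo, so it should be read as a starting point rather than a complete scanner.

```python
import os
import pickle
import pickletools


def referenced_globals(data: bytes):
    """List 'module.name' globals a pickle would import, without loading it."""
    found = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: the argument is "module name" in one string.
            found.append(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as the two most
            # recent strings (heuristic; memoized names are not tracked).
            if len(strings) >= 2:
                found.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Plain data imports nothing.
clean = referenced_globals(pickle.dumps({"weights": [1.0, 2.0]}))

# A stream that points at an OS-level function is visible before loading.
suspicious = referenced_globals(pickle.dumps(os.getcwd))
print(clean, suspicious)
```

Any entry in the returned list that is not something a model file should need, such as a function from an operating-system module, is a reason to stop and inspect the file before loading it.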

Deitasoft © 2024. All Rights Reserved.