Skip to content Skip to footer

Malicious AI Models Discovered in Hugging Face Platform: Security Threats and Risks of Prompt Injection

A security firm, JFrog, has discovered around 100 malicious artificial intelligence ()/machine learning (ML) models on the Hugging Face platform. These models contain code execution issues, where loading a pickle file can lead to the attacker gaining complete control over the victim's machine via a . The payload of the rogue model allows the attacker to initiate a reverse shell connection to IP address 210.117.212[.]93, which belongs to the Korea Research Environment Open Network (KREONET), enabling access to critical internal systems and potentially paving the way for large-scale data breaches and corporate espionage. The authors of one of the models even warned users against downloading it, raising the possibility of researchers or practitioners publishing the model. However, posting real working exploits or malicious code is against the principle of security research, which was breached when the malicious code attempted to connect back to a genuine IP address.

This discovery highlights the threat posed by repositories that are at risk of being poisoned for nefarious activities. Researchers have also developed efficient ways to generate prompts that can elicit harmful responses from large-language models (LLMs) using beam search-based adversarial attacks (BEAST). Additionally, security researchers have developed a generative worm called Morris II, capable of stealing data and spreading through multiple systems. Morris II uses adversarial self-replicating prompts encoded into inputs, such as and text, to trigger models to replicate the input as output and engage in malicious activities.

The attack technique, ComPromptMized, is similar to traditional approaches like buffer overflows and SQL injections, embedding code inside a query and data into regions known to hold executable code. It affects applications that rely on the output of a generative service or those that use retrieval augmented generation (RAG) to enrich query responses. The study is not the first to explore prompt injection to attack LLMs, as previous attacks using and audio recordings have injected invisible “adversarial perturbations” into multi-modal LLMs, causing the model to output attacker-chosen text or instructions. The victim can be lured by the attacker to a webpage with an exciting image or an email with an audio clip. When the victim inputs the image or clip into an isolated LLM and asks questions about it, the model will be steered by attacker-injected prompts.

Leave a comment

Newsletter Signup

The Grid —
The Matrix Has Me
Big Bear Lake, CA 92315

01010011 01111001 01110011 01110100 01100101 01101101 00100000
01000110 01100001 01101001 01101100 01110101 01110010 01100101

Remember, hacking is more than just a crime. It's a survival trait.Razor

Deitasoft © 2024. All Rights Reserved.