
Google Open Sources Magika: AI-Powered File Identification Tool

Google recently announced the open-sourcing of Magika, an AI-powered tool that accurately detects binary and textual file types. The software is designed to outperform conventional file identification methods, providing an overall 30% accuracy boost and up to 95% higher precision on content that is traditionally hard to identify but potentially problematic, such as VBA, JavaScript, and PowerShell. To achieve this, Magika uses a custom, highly optimized deep-learning model that identifies file types precisely within milliseconds. The software implements its inference functions using the Open Neural Network Exchange (ONNX).
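To illustrate why the textual formats mentioned above are hard for conventional methods, here is a minimal sketch of a signature ("magic number") check written for this article. It is not Magika's model or API; the function name `identify_by_magic`, the small signature table, and the sample contents are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a conventional signature ("magic number") check.
# This is NOT Magika's model or API; the table is a small hand-picked sample.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"\x7fELF": "elf",
}


def identify_by_magic(data: bytes) -> str:
    """Return a type label if the leading bytes match a known signature."""
    for signature, label in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return label
    # Plain-text formats have no fixed leading bytes, so they fall through.
    return "unknown"


# Hypothetical sample contents, made up for this example.
samples = {
    "report.pdf": b"%PDF-1.7\n...",
    "script.js": b"function greet() { return 'hi'; }",
    "task.ps1": b"Get-ChildItem -Path C:\\ | Where-Object { $_.Length -gt 1MB }",
}

for name, content in samples.items():
    print(f"{name}: {identify_by_magic(content)}")
```

In this sketch the PDF is recognized by its leading bytes, while the JavaScript and PowerShell samples both come back as "unknown" because scripts carry no fixed signature. That gap is the kind of case where a content-based model is reported to help.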

Google uses Magika at scale internally to improve users' safety by routing Gmail, Drive, and Safe Browsing files to the proper security and content policy scanners. In November 2023, the company unveiled RETVec, a multilingual text processing model designed to detect potentially harmful content in Gmail, such as spam and malicious emails.

The company believes deploying AI at scale can strengthen digital security and “tilt the cybersecurity balance from attackers to defenders.” Google also emphasizes the need for a balanced regulatory approach to AI usage and adoption to avoid a future where attackers can innovate but defenders are restrained by governance choices.

To this end, Google's Phil Venables and Royal Hansen noted that “AI allows security professionals and defenders to scale their work in threat detection, malware analysis, vulnerability detection, vulnerability fixing, and incident response. AI affords the best opportunity to upend the Defender's Dilemma and tilt the scales of cyberspace to give defenders a decisive advantage over attackers.”

However, concerns have been raised about using web-scraped data for training generative models, which may also include personal data. The UK Information Commissioner's Office (ICO) pointed out, “if you don't know what your model is going to be used for, how can you ensure its downstream use will respect data protection and people's rights and freedoms?”

Additionally, new research has shown that large language models can function as “sleeper agents”: models that appear innocuous but can be programmed to engage in deceptive or malicious behavior when specific criteria are met or special instructions are provided. Researchers at AI startup Anthropic warned that such backdoor behavior can be persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it).

Want to read more? Check out the original article available at The Hacker News!
