
Best recognition rates when using multimodal AI: Instead of analyzing individual events, multimodal AI simultaneously examines entire data streams, evaluates images and text, and thus recognizes complex relationships more quickly.
Given today's threat landscape, artificial intelligence (AI) is not an option in cyber defense; it's a requirement. But even here, development must continue to advance to stay one step ahead of cybercriminals in their game of cat and mouse. In this context, Younghoo Lee, Principal Data Scientist at Sophos X-Ops, has taken a closer look at the effectiveness of multimodal AI for even better detection and classification of spam, phishing, and unsafe web content.
Monitoring multiple data streams
Multimodal AI is a system that integrates different data types into a unified analysis framework. It represents a significant shift in the development and use of AI in cyber defense: instead of traditional single-mode analysis, the system processes multiple data streams simultaneously and synthesizes data from multiple inputs. This makes it possible to process text and image content at the same time and to capture the complex relationships between them.
For example, in phishing detection, multimodal AI examines the linguistic patterns and writing style of text, as well as the visual fidelity of logos and brand elements. At the same time, it analyzes the semantic consistency between text and image components. This holistic approach enables the system to detect complex attacks that might appear legitimate to traditional systems. Furthermore, multimodal AI learns from the relationships between different data types and adapts automatically.
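The idea of checking consistency between text and image components can be illustrated with a deliberately simple sketch. This is not the SophosAI method, which relies on a multimodal model; it is a rule-based stand-in that assumes two hypothetical upstream steps have already extracted a brand name from the message text and from an embedded logo. The class `MessageSignals`, its fields, and the flag names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MessageSignals:
    """Signals assumed to come from hypothetical text and image analyzers."""
    text_brand: str  # brand the message text claims to represent ("" if none)
    logo_brand: str  # brand recognized in an embedded logo ("" if none)


def consistency_flags(signals: MessageSignals) -> list[str]:
    """Return illustrative warning flags for text/image inconsistencies."""
    flags = []
    text_brand = signals.text_brand.strip().lower()
    logo_brand = signals.logo_brand.strip().lower()

    # Both modalities name a brand, but they disagree.
    if text_brand and logo_brand and text_brand != logo_brand:
        flags.append("text_logo_mismatch")

    # A logo is present although the text never names the brand.
    if logo_brand and not text_brand:
        flags.append("logo_without_text_brand")

    return flags


# Hypothetical example: the text claims one brand, the logo shows another.
print(consistency_flags(MessageSignals(text_brand="ExampleBank", logo_brand="OtherPay")))
```

A real multimodal model learns such relationships from data rather than from hand-written rules; the sketch only shows what kind of cross-modal signal is being compared.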
Highly effective in detection
The effectiveness of multimodal AI is significantly higher than that of traditional machine learning models. SophosAI compared the two approaches in a series of empirical experiments. The results: traditional models performed well in detecting known threats, but struggled with new, unknown phishing emails. Their F1 scores (the harmonic mean of precision and recall, ranging from 0 to 1) were as low as 0.53 for unknown samples, reaching a peak of 0.66. Multimodal AI (using GPT-4o) performed much better in the tests for detecting new phishing attempts, achieving F1 scores as high as 0.97 even for unknown brands.
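For readers unfamiliar with the metric, the following minimal sketch shows how an F1 score is computed from detection counts. The counts used here are hypothetical and are not taken from the SophosAI experiments; the function name `f1_score` is chosen for illustration only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts.

    tp: phishing emails correctly flagged (true positives)
    fp: legitimate emails wrongly flagged (false positives)
    fn: phishing emails that were missed (false negatives)
    """
    if tp == 0:
        # No correct detections: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector that catches most phishing with few false alarms.
print(round(f1_score(tp=90, fp=5, fn=10), 2))   # 0.92

# Hypothetical detector that misses a third and raises many false alarms.
print(round(f1_score(tp=60, fp=30, fn=30), 2))  # 0.67
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is a common yardstick when both missed threats and false alarms matter.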
"AI is a key component in cyber defense and, combined with purely technical defense at the endpoint and the still necessary human detection, provides excellent protection," says Michael Veit, security expert at Sophos. "In conjunction with the Sophos cybersecurity ecosystem, multimodal AI represents another milestone and will elevate cyber defense to a significantly higher level of detection."
More at Sophos.com
About Sophos
More than 100 million users in 150 countries trust Sophos. We offer the best protection against complex IT threats and data loss. Our comprehensive security solutions are easy to deploy, use and manage. They offer the lowest total cost of ownership in the industry. Sophos offers award-winning encryption solutions and security solutions for endpoints, networks, mobile devices, email and the web. In addition, there is support from SophosLabs, our worldwide network of analysis centers. The Sophos headquarters are in Boston, USA and Oxford, UK.