Reality of AI in cybersecurity


There is a lot of hype surrounding the use of artificial intelligence (AI) in cybersecurity. The truth is that the role and potential of AI in security are still evolving, and much remains to be researched and evaluated. A comment by Chester Wisniewski, Principal Research Scientist, Sophos.

To advance AI as quickly as possible and to use it more efficiently in security, a broad exchange between researchers and AI experts is particularly important. For this reason, Sophos AI has committed to openly sharing its research results with the security community, both to make the use of AI more transparent and to contribute actively to the discussion about the role of AI in cybersecurity. One of the most important topics in the further development of AI in cybersecurity is the choice of models and methods by which AI learns from old and new data.

"Catastrophic forgetting" as an AI recognition model

Malware detection is the cornerstone of IT security, and AI is the only approach that can learn patterns from millions of new malware samples in a matter of days. When using AI for malware detection, however, two questions arise: Should the model keep all malware samples forever in order to enable optimal detection, but at the expense of learning and updating speed? Or should it rely on selective fine-tuning that allows the model to better keep pace with the rate of change of malware, albeit with the risk of forgetting older patterns? This risk is known as "catastrophic forgetting". Today, fully retraining a model takes about a week, whereas updating a model with good fine-tuning takes about an hour.

The Sophos AI team wanted to see if it was possible to design a fine-tuning model that could keep pace with the rapid evolution of the threat landscape, learn new patterns, but still remember older samples with minimal impact on performance. The researcher Hillary Sanders took on this task and evaluated a number of update options, which she described in detail in the Sophos AI Blog.

The detection dilemma

Keeping malware detection up to date is no easy task. Every step that defenders take to ward off an attack is answered by the attackers with new ideas on how to circumvent it. They develop updates using different code or techniques. The result: hundreds of thousands of new malware samples appear every day.

Making detection even more difficult is the fact that the latest and most effective malware is seldom entirely “new”. It is often a combination of new, old, shared, borrowed, or stolen code, as well as adopted and adapted behaviors. In addition, old malware reappears years later and is integrated into new attack methods in order to take defenses by surprise. Ergo, detection models must ensure that they also detect older malware samples, and not just the newest ones.

Update of AI detection models

When it comes to updating AI detection models with new malware samples, vendors have two options.

Option 1: Every individual sample is stored, and the model is retrained regularly on ever larger amounts of data. This leads to better overall performance, but also to slower updates and fewer releases.

Option 2: The detection model is only updated with new samples. This is called fine-tuning. At each step of the fine-tuning process, the model is updated with the newly added knowledge, and that update affects all the patterns it has already learned. As a result, the model may "forget" the old patterns it previously learned ("catastrophic forgetting"). The benefit: Training a model with less data means it can be updated and deployed more quickly to better keep up with rapidly changing malware.

Continuous training of the AI detection models

Regardless of which of the two options is chosen, continuous training of the AI detection models with new samples is crucial. The patterns that an AI learns from malware samples enable detection beyond the samples it was directly trained on: the AI also recognizes previously unknown samples that are at least somewhat similar to the training data. Over time, however, new samples diverge so much that the effectiveness of an old model deteriorates and it has to be updated.


The diagram shows how the detection performance decreases over time if the models are not updated (Image: Sophos).

The left side of the diagram (left of the dashed line) covers the period of the older samples on which the model was trained. The detection rate is consistently high here. On the right-hand side, new samples appear on which the model has not yet been trained, which results in a lower detection rate.
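A curve like the one described can be produced by grouping known-malicious samples by the period in which they appeared and computing the share that the model flags in each period. The following is only a minimal sketch of that bookkeeping: the function name `detection_rate_by_period`, the period labels, and the sample values are hypothetical and are not Sophos data or Sophos code.

```python
from collections import defaultdict


def detection_rate_by_period(results):
    """Compute the share of detected samples per time period.

    `results` is an iterable of (period, was_detected) pairs, one pair per
    known-malicious sample. Periods are sorted so the output reads as a
    time axis.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for period, was_detected in results:
        totals[period] += 1
        if was_detected:
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in sorted(totals)}


# Made-up illustration values: the first period lies before the training
# cutoff, the second one after it.
example_results = [
    ("period-1", True),
    ("period-1", True),
    ("period-2", True),
    ("period-2", False),
]
rates = detection_rate_by_period(example_results)
```

With these invented inputs, the rate for the later period is lower than for the earlier one, which is the shape the diagram describes.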

Three options for updating malware detection

The three options for updating malware detection that Hillary Sanders evaluated are:

1. Learning with a selection of old and new samples

This is called "data rehearsal" and involves mixing a small selection of old samples with the new, never-before-seen training data. In this way, the model is "reminded" of the old information it needs to recognize older patterns, while at the same time learning to recognize the newer ones.
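As a rough illustration, building such a mixed training set could look like the sketch below. The function name `build_rehearsal_set`, the 10 percent default for `rehearsal_fraction`, and the fixed seed are assumptions made for this example; the article only says that a "small" selection of old samples is used.

```python
import random


def build_rehearsal_set(old_samples, new_samples, rehearsal_fraction=0.1, seed=0):
    """Return all new samples plus a small random subset of the old ones.

    `rehearsal_fraction` is an assumed tuning knob: the share of the old
    samples that is replayed alongside the new data.
    """
    rng = random.Random(seed)
    n_old = min(len(old_samples), round(len(old_samples) * rehearsal_fraction))
    rehearsed = rng.sample(list(old_samples), n_old)
    mixed = list(new_samples) + rehearsed
    rng.shuffle(mixed)  # avoid feeding the model all old or all new in a row
    return mixed


# Hypothetical sample identifiers standing in for real malware samples.
old = [f"old_{i}" for i in range(100)]
new = [f"new_{i}" for i in range(20)]
training_set = build_rehearsal_set(old, new)
```

The resulting set stays close to the size of the new data alone, which is why an update on it can remain much faster than retraining on everything.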

2. Adjusting the learning rate

With this approach, the learning rate of the model is adjusted. The learning rate determines how much the model can change after seeing a particular sample. If the learning rate is too high (the model can change a lot with each sample added), it will only remember the most recent samples. If the learning rate is too low (the model can change only slightly with each sample added), it will take too long to learn. The difficulty is finding the right compromise between learning speed, retaining old information and adding new information.
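The effect can be seen in a deliberately tiny toy model. The sketch below is not a malware classifier: it assumes a model with a single weight, a squared-error loss, and the values 0.0 and 1.0 as stand-ins for "old" and "new" patterns. All names and numbers are illustrative assumptions.

```python
def fine_tune(weight, samples, learning_rate):
    """Apply one gradient step per sample to a one-parameter model.

    The per-sample loss is 0.5 * (weight - sample) ** 2, so its gradient
    with respect to the weight is (weight - sample).
    """
    for sample in samples:
        gradient = weight - sample
        weight -= learning_rate * gradient
    return weight


old_pattern, new_pattern = 0.0, 1.0  # toy stand-ins for old and new malware
trained_weight = old_pattern         # the model currently fits the old data
new_samples = [new_pattern] * 5      # fine-tune on new samples only

high = fine_tune(trained_weight, new_samples, learning_rate=0.9)
middle = fine_tune(trained_weight, new_samples, learning_rate=0.13)
low = fine_tune(trained_weight, new_samples, learning_rate=0.01)
```

After five new samples, the high learning rate leaves the weight almost exactly at the new pattern (the old one is forgotten), the low rate leaves it almost unchanged (the new one is barely learned), and the middle value ends up roughly halfway between the two.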

3. Elastic Weight Consolidation (EWC)


In the diagram, all three approaches perform better with older malware samples (left of the dashed line) than with newer samples (right of the dashed line) (Image: Sophos).

This approach was inspired by the work of Google's DeepMind in 2017. Like an elastic spring, it pulls a new model back toward an older one should it begin to "forget". Hillary Sanders has published a more detailed description of this principle on the Sophos AI blog.
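The "spring" can be written as a penalty that is added to the loss on the new data. The sketch below follows the general form of the quadratic EWC penalty from the 2017 DeepMind work, in which each parameter is pulled back toward its old value in proportion to an importance weight (there, an estimate of the Fisher information). The function names, the plain-list representation of parameters, and the numbers are assumptions for illustration; how Sophos implemented or tuned EWC is not described in this article.

```python
def ewc_penalty(params, anchor_params, importance, strength):
    """Quadratic penalty that pulls `params` back toward `anchor_params`.

    `importance` holds one non-negative weight per parameter: the more a
    parameter mattered for the old data, the stiffer its "spring".
    `strength` scales the whole penalty (often written as lambda).
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, importance)
    )


def ewc_loss(new_data_loss, params, anchor_params, importance, strength):
    """Total training loss: loss on the new samples plus the EWC penalty."""
    return new_data_loss + ewc_penalty(params, anchor_params, importance, strength)


# Illustrative numbers only: the first parameter has drifted away from its
# old value, the second has not moved.
penalty = ewc_penalty(
    params=[1.0, 2.0],
    anchor_params=[0.0, 2.0],
    importance=[2.0, 5.0],
    strength=4.0,
)
```

Only the parameter that drifted contributes to the penalty here, so a model is free to change weights that were unimportant for old samples while being held back on the ones that were important.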

As shown in the diagram, all three approaches performed better with older malware samples (left of the dashed line) than with newer samples (right of the dashed line).

Conclusion

Learning with a selection of old and new samples (data rehearsal) is the best compromise

In malware detection, the ability to remember the past is almost as important as the ability to predict the future. However, this must be weighed against the cost and speed of updating the model with new information. Data rehearsal is a simple and effective way to keep detecting old malware while significantly increasing the speed with which new models can be updated and released.

