We attribute the effectiveness of our products to the HuMachine Intelligence concept, which is a basis of what we call True Cybersecurity. The essence of HuMachine Intelligence is a fusion of three fundamental things: big data, machine learning, and our analysts’ expertise. But what is behind these words? Let us try to explain this without getting deep into technical details.
Big data and threat intelligence
The term big data should not be taken literally, as if it were a large array of information stored somewhere. It is not just a database; it is a combination of technologies that allows for instant processing of large volumes of data in order to extract threat intelligence. In this case, that data relates to all objects, whether they are clear, purely malicious, or potentially usable for malicious means. For our purposes, big data means, first of all, a vast collection of malicious objects. Secondly, it includes the distributed Kaspersky Security Network, which constantly delivers new malicious objects and multiaspect data on various cyberthreats from all over the world. Thirdly, big data refers to the miscellaneous categorization tools that process the data.
A collection of malicious objects
We have devoted ourselves to computer security for more than 20 years, and over that span of time we have analyzed a large number of objects. Information about them is securely stored in our databases. And when we speak about “objects,” we refer not only to files or chunks of code but to Web addresses, certificates, and execution log files for clean and malicious applications as well. All of these data are stored not only with labels such as “dangerous” or “safe,” but also with information about the relationships between objects: what website a specific file was downloaded from, what other files were downloaded from this website, and so forth.
Kaspersky Security Network (KSN)
KSN is our cloud security service. One of its functions is to quickly block the newest threats on the client side. At the same time, it allows each and every client to participate in increasing global security by sending depersonalized metadata on detected threats into the cloud. We study each detected threat from various angles and add its traits to our threat databases. With that done, our expert systems can precisely detect not only that threat, but also similar ones. Thus, our collection receives fresh data in real time.
Categorization tools are internal technologies that enable us to process information we collect, register the abovementioned relationships between malicious objects.
The technology of machine learning
Outlining what machine learning is and how it is used in Kaspersky Lab is no small task. We have to begin by explaining that we use a multilayered approach. Therefore, the algorithms of machine learning are used in different subsystems at different layers.
Every day our systems receive hundreds of thousands of objects that need prompt analysis and categorization (as dangerous or not). Even more than 10 years ago, we knew we could not manage without automation. The first task was understanding if a suspicious file resembled a malicious one we already had. That solution used machine learning: We wrote an application that analyzed the entire collection and, when a new file was fed into its input, informed our analysts what the closest object in our collection was.
It soon became clear that knowing whether an object was similar to other malicious objects was not enough. We needed a technology that would allow the system to give a verdict independently. So we built a technology based on decision trees. Trained on our vast collection of malicious objects, the technology detected an array of criteria, specific combinations of which may serve as indicators that unambiguously define a new file as dangerous. While analyzing a file, a mathematical model “asks” the antivirus engine questions such as these:
- Is the file larger than 100 kilobytes?
- If so, then is this file compressed?
- If it is not, are its section names something a human might choose or nonsensical?
- If it is the former, then…
And the list of questions goes on.
After answering all of these questions, the antivirus engine receives a verdict from the mathematical model. The verdict can be either “the file is clean” or “the file is dangerous.”
Behavioral mathematical model
Following our principle of multilayered security, our mathematical models are also used for dynamic detection. As a matter of fact, a mathematical model can analyze the behavior of an executable file right when it is executed. It is possible to build and train the model according to the same principles as the ones applied to mathematical models of static detection, but using execution log files as “training material” instead. However, there is one big difference. In field conditions, we cannot afford to wait until the code finishes executing. The decision should be made after analyzing a minimum of actions. Currently, the pilot of this technology, based on deep learning, is showing excellent results.
Experts in machine learning agree that no matter how clever a mathematical model is, a human will always be able to get past it, especially if the person is creative and can get a look at how the technology works, or if there is much time for running numerous experiments and tests. This is why first, each part of the model must be updateable; second, the infrastructure must work perfectly; and third, a human must supervise the robot. Those are all the case here.
About 20 years ago, our Anti-Malware Research (AMR) team operated without the assistance of automatic systems. Today, most threats are detected by expert systems that were trained by our researchers. In some cases, the system cannot give an unequivocal verdict or thinks that the object is malicious but cannot relate it to any known family. Then, the system sends a warning to an AMR analyst on duty, providing a full set of indicators so that the analyst can make the final decision.
Detection Methods Analysis Group
Inside our AMR is a dedicated research team called the Detection Methods Analysis Group, which was created in 2007 specifically to work on our machine-learning systems. Currently, only the head of the department is an experienced virus analyst. The other employees are pure data scientists.
Global Research and Analysis Team (GReAT)
Last but not least, let us speak about our Global Research and Analysis Team (GReAT). The researchers on this team investigate the most complicated threats: APT, cyberespionage campaigns, major malware outbreaks, ransomware, and underground cybercriminal trends around the world. Their unique expertise on the techniques, tools, and schemes of cyberattacks enables us to develop new methods of protection that can stop even the most complex attacks.
We have not yet discussed even half of our Next Gen technologies and departments involved in the development of our solutions. Many other experts and different methods of machine learning work together to protect you optimally, but we wanted to illustrate the HuMachine Intelligence principle specifically.