{"id":43886,"date":"2022-03-11T05:51:59","date_gmt":"2022-03-11T10:51:59","guid":{"rendered":"https:\/\/www.kaspersky.com\/blog\/?post_type=emagazine&#038;p=43886"},"modified":"2022-07-27T06:48:32","modified_gmt":"2022-07-27T10:48:32","slug":"machine-learning-training-security","status":"publish","type":"emagazine","link":"https:\/\/www.kaspersky.com\/blog\/secure-futures-magazine\/machine-learning-training-security\/43886\/","title":{"rendered":"How to keep a business safe from attacks through AI&#8217;s machine-learning"},"content":{"rendered":"<p>We hear every day of new uses companies find for artificial intelligence (AI), but the security of the machine-learning models that underpin AI doesn\u2019t get as much attention. When machine learning goes wrong, it\u2019s never good for business \u2013 like when a <a href=\"https:\/\/arxiv.org\/abs\/1908.07125\" target=\"_blank\" rel=\"noopener nofollow\">text-generation model spews racist slurs<\/a> or <a href=\"https:\/\/arxiv.org\/abs\/1702.08138\" target=\"_blank\" rel=\"noopener nofollow\">social media comment filters are tricked into displaying toxic comments<\/a>.<\/p>\n<p>But some businesses are taking action \u2013 especially those at the top of their game. 
The cybersecurity community is producing guidelines like <a href=\"https:\/\/atlas.mitre.org\/\" target=\"_blank\" rel=\"noopener nofollow\">MITRE ATLAS to help businesses secure their machine-learning models<\/a> and tech giants like <a href=\"https:\/\/www.microsoft.com\/security\/blog\/2020\/10\/22\/cyberattacks-against-machine-learning-systems-are-more-common-than-you-think\/\" target=\"_blank\" rel=\"noopener nofollow\">Microsoft<\/a> and <a href=\"https:\/\/www.wired.com\/story\/facebooks-red-team-hacks-ai-programs\/\" target=\"_blank\" rel=\"noopener nofollow\">Meta<\/a> are assembling expert teams to safeguard the machine learning beneath their business-critical AI.<\/p>\n<p>One way Kaspersky protects the machine-learning technologies powering everything from our malware detection to our antispam is by attacking our own models.<\/p>\n\t\t\t<div class=\"c-promo-product\">\n\t\t\t\t\t\t<article class=\"c-card c-card--link c-card--medium@sm c-card--aside-hor@lg\">\n\t\t\t\t<div class=\"c-card__body  \">\n\t\t\t\t\t<header class=\"c-card__header\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<p class=\"c-card__headline\">Ethical AI fights unethical AI<\/p>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<h3 class=\"c-card__title \"><span>When AI goes low, go high<\/span><\/h3>\n\t\t\t\t\t\t\t\t\t\t\t<\/header>\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"c-card__desc \">\n\t\t\t\t\t\t\t<p>AI is often maligned for its nefarious uses. 
But where AI is a problem, it may also be a solution.<\/p>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<div class=\"c-card__aside\">\n\t\t\t\t\t<a href=\"https:\/\/www.kaspersky.com\/blog\/secure-futures-magazine\/ethical-ai-artificial-intelligence\/42213\/\" class=\"c-button c-card__link\" target=\"_blank\" rel=\"noopener nofollow\">Read more<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<\/article>\n\t\t<\/div>\n\t\n<h2>Machine learning is vulnerable because it learns like us<\/h2>\n<p>Known threats to machine-learning models, called adversarial examples, don\u2019t stem from error or misconfiguration \u2013 they\u2019re <a href=\"https:\/\/arxiv.org\/abs\/1905.02175\" target=\"_blank\" rel=\"noopener nofollow\">part of how the machine-learning model is made<\/a>.<\/p>\n<p>Alexey Antonov, Lead Data Scientist at Kaspersky and an expert in machine learning-based malware detection, has a great way to describe these threats.<\/p>\n<blockquote><p>We sometimes see optical illusions because our brains interpret images using past experience. An artist who understands that can exploit it. The same is true of machine-learning threats.<\/p>\n<cite><p>Alexey Antonov, Lead Data Scientist, Kaspersky<\/p><\/cite><\/blockquote>\n<p>Antonov\u2019s team set out to <a href=\"https:\/\/securelist.com\/how-to-confuse-antimalware-neural-networks-adversarial-attacks-and-protection\/102949\/\" target=\"_blank\" rel=\"noopener\">attack their own malware-detection model<\/a>. Rather than being programmed with rules, this model learns to tell malicious from legitimate software by training on troves of malware examples collected over the years. It\u2019s like how our minds learn \u2013 and like our minds, it can be fooled.<\/p>\n<p>\u201cWe craft specific data to confuse the algorithm,\u201d says Antonov. 
\u201cFor example, you can glue pieces of data to a malicious file until the model stops recognizing it as malicious.\u201d<\/p>\n<p>Unlike bugs in traditional software, adversarial examples are hard to fix. There\u2019s not yet a universal way to protect against them, but you can improve a machine-learning model\u2019s security by adding adversarial examples to its training data.<\/p>\n<h2>Diluting the impact of poisoned data<\/h2>\n<p>Machine-learning models are so called because they aim to model what happens in the real world. But what they really do is mathematically describe the data used to train them. A model can produce biased results if its training data is biased: for example, a <a href=\"http:\/\/proceedings.mlr.press\/v81\/buolamwini18a\/buolamwini18a.pdf\" target=\"_blank\" rel=\"noopener nofollow\">face-recognition model trained predominantly on white faces will struggle to recognize People of Color<\/a>. If an adversary can modify your training data (for example, if you use openly available datasets), they can change your models \u2013 an attack known as \u2018data poisoning.\u2019<\/p>\n<p>Nikita Benkovich, Head of Technology Research at Kaspersky, says his team realized a <a href=\"https:\/\/securelist.com\/attack-on-anti-spam-machine-learning-model-deepquarantine\/105358\/\" target=\"_blank\" rel=\"noopener\">model protecting enterprises from spam could be attacked with data poisoning<\/a>. Models are frequently retrained because spam is always evolving, so an adversary could send spam emails using a legitimate company\u2019s <a href=\"https:\/\/sendpulse.com\/support\/glossary\/email-header\" target=\"_blank\" rel=\"noopener nofollow\">technical email headers<\/a>, perhaps causing the model to stop all customers from receiving that company\u2019s real emails.<\/p>\n<p>\u201cWe had many questions,\u201d says Benkovich. \u201cCan you actually do it? How many emails would we need? 
And can we fix it?\u201d<\/p>\n<p>After verifying such an attack was possible, they looked at ways to protect the system, coming up with a statistical test that would flag anything suspect.<\/p>\n<h2>How businesses can prevent adversarial attacks on AI<\/h2>\n<p>Adversarial attacks can affect fields ranging from <a href=\"https:\/\/arxiv.org\/abs\/1906.11897\" target=\"_blank\" rel=\"noopener nofollow\">object detection<\/a> to <a href=\"https:\/\/arxiv.org\/abs\/2004.15015\" target=\"_blank\" rel=\"noopener nofollow\">machine translation<\/a>, but Alexey says real-world adversarial attacks aren\u2019t yet happening on a large scale. \u201cThese attacks need highly skilled data scientists and considerable effort, but they can be part of <a href=\"https:\/\/apt.securelist.com\/\" target=\"_blank\" rel=\"noopener\">Advanced Persistent Threats (APTs)<\/a> for targeted attacks. Also, if a security solution relies solely on machine learning, an adversarial attack can be highly profitable \u2013 having fooled the algorithm once, an adversary can use the same method to create new strains of malware the algorithm can\u2019t detect. That\u2019s why we use a multi-layered approach.\u201d<\/p>\n<p>Benkovich says to pay close attention to where your machine-learning training data comes from \u2013 its \u2018data provenance.\u2019 \u201cKnow where your training sample comes from.\u201d<\/p>\n<blockquote><p>Use diverse data because it makes poisoning harder. If an attacker poisons an open dataset, your hand-picked one might be harder to meddle with. Monitor the machine-learning training process and test models before deployment.<\/p>\n<cite><p>Nikita Benkovich, Head of Technology Research, Kaspersky<\/p><\/cite><\/blockquote>\n<p>Both experts agree the best way to keep your models protected is to test them by attacking them yourself before others do. 
Alexey quotes Chinese military strategist Sun Tzu for the best advice: \u201cIf you know the enemy and know yourself, you need not fear the result of a hundred battles.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When using AI, training machine-learning models with quality data matters. But are businesses vulnerable to attack through their AI training data?<\/p>\n","protected":false},"author":2544,"featured_media":43890,"template":"","coauthors":[3585],"class_list":{"0":"post-43886","1":"emagazine","2":"type-emagazine","3":"status-publish","4":"has-post-thumbnail","6":"emagazine-category-artificial-intelligence","7":"emagazine-category-safer-business","8":"emagazine-tag-machine-learning"},"hreflang":[{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/secure-futures-magazine\/machine-learning-training-security\/43886\/"},{"hreflang":"en-us","url":"https:\/\/usa.kaspersky.com\/blog\/secure-futures-magazine\/machine-learning-training-security\/26247\/"}],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/emagazine\/43886","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/emagazine"}],"about":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/emagazine"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2544"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/43890"}],"wp:attachment":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=43886"}],"wp:term":[{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/coauthors?post=43886"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}