{"id":53327,"date":"2025-04-23T14:57:10","date_gmt":"2025-04-23T18:57:10","guid":{"rendered":"https:\/\/www.kaspersky.com\/blog\/?p=53327"},"modified":"2025-04-23T14:57:10","modified_gmt":"2025-04-23T18:57:10","slug":"ai-slopsquatting-supply-chain-risk","status":"publish","type":"post","link":"https:\/\/www.kaspersky.com\/blog\/ai-slopsquatting-supply-chain-risk\/53327\/","title":{"rendered":"How AI creates &#8220;slopsquatting&#8221; supply-chain risks"},"content":{"rendered":"<p>AI-generated code is already widespread \u2014 by some estimates <a href=\"https:\/\/itsconchur.substack.com\/p\/41-of-code-is-now-ai-generated-should\" target=\"_blank\" rel=\"nofollow noopener\">around 40% of new code<\/a> this past year was written by AI. Microsoft CTO Kevin Scott predicts that in five years this figure will <a href=\"https:\/\/www.businessinsider.com\/microsoft-cto-ai-generated-code-software-developer-job-change-2025-4\" target=\"_blank\" rel=\"nofollow noopener\">hit 95%<\/a>. How to properly maintain and protect such code is a burning issue.<\/p>\n<p>Experts still <a href=\"https:\/\/www.researchgate.net\/publication\/362859580_Security_Implications_of_Large_Language_Model_Code_Assistants_A_User_Study\" target=\"_blank\" rel=\"nofollow noopener\">rate the security of AI code as low<\/a>, as it\u2019s teeming with <a href=\"https:\/\/www.researchgate.net\/publication\/364458131_An_empirical_evaluation_of_GitHub_copilot%27s_code_suggestions\" target=\"_blank\" rel=\"nofollow noopener\">all the classic coding flaws<\/a>: vulnerabilities (SQL injections, embedded tokens and secrets, insecure deserialization, XSS), logical defects, outdated APIs, insecure encryption and hashing algorithms, no handling of errors and incorrect user input, and much more. 
But using an AI assistant in software development adds another unexpected problem: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hallucination_(artificial_intelligence)\" target=\"_blank\" rel=\"nofollow noopener\">hallucinations<\/a>. A new study examines in detail <a href=\"https:\/\/arxiv.org\/pdf\/2406.10279\" target=\"_blank\" rel=\"nofollow noopener\">how large language models (LLMs) create hallucinations that pop up in AI code<\/a>. It turns out that some third-party libraries called by AI code simply don\u2019t exist.<\/p>\n<h2>Fictitious dependencies in open-source and commercial LLMs<\/h2>\n<p>To study the phenomenon of phantom libraries, the researchers prompted 16 popular LLMs to generate 576,000 Python and JavaScript code samples. The models showed varying degrees of imagination: GPT-4 and GPT-4 Turbo hallucinated the least (fabricated libraries were seen in less than 5% of the code samples); next came the DeepSeek models (more than 15%); while CodeLlama 7B was the most fantasy-prone (more than 25%). What\u2019s more, even the parameters used in LLMs to control randomness (temperature, top-p, top-k) are unable to reduce the hallucination rate to insignificant values.<\/p>\n<p>Python code contained fewer fictitious dependencies (16%) than JavaScript (21%). Novelty is also a contributing factor: generating code using packages, technologies and algorithms that started trending only this past year results in 10% more non-existent packages.<\/p>\n<p>But the most dangerous aspect of phantom packages is that their names aren\u2019t random, and neural networks reference the same libraries over and over again. That was demonstrated by stage two of the experiment, in which the researchers selected 500 prompts that had provoked hallucinations, and re-ran each of them 10 times. 
This revealed that <strong>43% of hallucinated packages crop up during each code generation run<\/strong>.<\/p>\n<p>Also of interest is the naming of hallucinated packages: 13% were typical \u201ctypos\u201d that differed from the real package name by only one character; 9% were names borrowed from another development language (for example, npm package names appearing in Python code); and a further 38% were logically named but differed more significantly from the real package names.<\/p>\n<h2>Meet slopsquatting<\/h2>\n<p>All of this can provoke a new generation of attacks on open-source repositories, which has already been dubbed \u201cslopsquatting\u201d by analogy with <a href=\"https:\/\/www.kaspersky.com\/blog\/devops-security-hybrid\/36021\/\" target=\"_blank\" rel=\"noopener nofollow\">typosquatting<\/a>. In this case, squatting is made possible not by names with typos, but by names from AI slop (low-quality AI output). Because AI-generated code repeats package names, attackers can run popular models, find recurring hallucinated package names in the generated code, and publish real\u00a0\u2014 and malicious \u2014 libraries under these same names. If someone mindlessly installs all packages referenced in the AI-generated code, or the AI assistant installs the packages by itself, a malicious dependency gets injected into the compiled application, exposing the supply chain to a full-blown attack (<a href=\"https:\/\/attack.mitre.org\/techniques\/T1195\/001\/\" target=\"_blank\" rel=\"nofollow noopener\">ATT&amp;CK T1195.001<\/a>). 
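One simple guard against installing a slopsquatted dependency is to vet every name an assistant suggests against a pre-approved allowlist before anything reaches pip. A minimal sketch (the allowlist contents and the "huggingface-toolz" package name are invented examples, not real recommendations):

```python
# Sketch: refuse to install any dependency that is not on an
# organization-maintained allowlist. All package names below are
# hypothetical examples, not real or vetted packages.
import re

APPROVED = {"requests", "numpy", "flask"}  # example internal allowlist

def extract_requirements(text: str) -> list[str]:
    """Pull bare package names from a requirements-style snippet."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        m = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if m:
            names.append(m.group(0).lower())
    return names

def vet(requirements: str) -> tuple[list[str], list[str]]:
    """Split requested packages into approved and blocked lists."""
    names = extract_requirements(requirements)
    approved = [n for n in names if n in APPROVED]
    blocked = [n for n in names if n not in APPROVED]
    return approved, blocked

ok, bad = vet("requests>=2.31\nflask\nhuggingface-toolz  # hallucinated?\n")
```

A check like this catches a slopsquatted name even when the malicious package really exists in the public repository, because existence is no longer the criterion: only prior approval is.
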
This risk is set to rise significantly with the advance of vibe coding\u00a0\u2014 where the programmer writes code by giving instructions to AI with barely a glance at the actual code produced.<\/p>\n<p>Given that all major open-source repositories have been hit by dozens of malicious packages this past year (<a href=\"https:\/\/thehackernews.com\/2024\/02\/new-malicious-pypi-packages-caught.html\" target=\"_blank\" rel=\"nofollow noopener\">1<\/a>, <a href=\"https:\/\/thehackernews.com\/2025\/04\/rogue-npm-packages-mimic-telegram-bot.html\" target=\"_blank\" rel=\"nofollow noopener\">2<\/a>), and <a href=\"https:\/\/www.infoworld.com\/article\/3953841\/sonatype-warns-of-18000-open-source-malware-packages.html\" target=\"_blank\" rel=\"nofollow noopener\">close to 20,000 malicious libraries<\/a> have been discovered in the same time period, we can be sure that someone out there will try to industrialize this new type of attack. This scenario is especially dangerous for amateur programmers, as well as for corporate IT departments that handle some automation tasks in-house.<\/p>\n<h2>How to stop slopsquatting and use AI safely<\/h2>\n<p>Guidelines on the safe implementation of AI in development already exist (for example, <a href=\"https:\/\/genai.owasp.org\/initiatives\/\" target=\"_blank\" rel=\"nofollow noopener\">OWASP<\/a>, <a href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/NIST.AI.100-1.pdf\" target=\"_blank\" rel=\"nofollow noopener\">NIST<\/a> and <a href=\"https:\/\/www.kaspersky.com\/blog\/ai-safe-deployment-guidelines\/52789\/\" target=\"_blank\" rel=\"noopener nofollow\">our own<\/a>), but these tend to describe a very broad range of measures, many of which are long and complicated to implement. Therefore, we\u2019ve compiled a small subset of easy-to-implement measures to address the specific problem of hallucinated packages:<\/p>\n<ul>\n<li>Make source-code scanning and static security testing part of the development pipeline. 
All code, including AI-generated, must meet clear criteria: no embedded tokens or other secrets; use of correct versions of libraries and other dependencies; and so forth. These checks are easily integrated into the CI\/CD cycle \u2014 for example, with the help of our <a href=\"https:\/\/www.kaspersky.com\/enterprise-security\/container-security?icid=gl_kdailyplacehold_acq_ona_smm__onl_b2b_kasperskydaily_wpplaceholder____\" target=\"_blank\" rel=\"noopener nofollow\">Kaspersky Container Security<\/a>.\n<\/li><li>Introduce additional AI validation cycles in which the LLM checks its own code for errors, to reduce the number of hallucinations. In addition, the model can be prompted to analyze the popularity and usability of each package referenced in a project. Fine-tuning the model on a prebuilt database of popular libraries and enabling retrieval-augmented generation (RAG) also reduces the number of errors. By combining all these methods, the authors of the study were able to cut the share of hallucinated packages to 2.4% for DeepSeek and 9.3% for CodeLlama. Unfortunately, both figures are too far off zero for these measures to suffice.<\/li>\n<li>Ban the use of AI assistants for coding critical and trusted components. For non-critical tasks where AI-assisted coding is allowed, assign a developer to set up a code review process for the component. The review should follow a checklist tailored to AI-generated code.<\/li>\n<li>Draw up a fixed list of trusted dependencies. AI assistants and their flesh-and-blood users must have limited scope to add libraries and dependencies to the code\u00a0\u2014 ideally, only libraries from the organization\u2019s internal repository, tested and approved in advance, should be available.<\/li>\n<li>Train developers. 
They must be well versed in AI security in general, as well as in the context of AI use in code development.<\/li>\n<\/ul>\n<input type=\"hidden\" class=\"category_for_banner\" value=\"mdr\"><input type=\"hidden\" class=\"placeholder_for_banner\" data-cat_id=\"mdr\" value=\"49324\">\n","protected":false},"excerpt":{"rendered":"<p>Popular AI code assistants try to call non-existent libraries. But what happens if attackers actually create them?<\/p>\n","protected":false},"author":2722,"featured_media":53328,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1999,3051,3052],"tags":[1140,1289,4638,4642,1876,97,2718],"class_list":{"0":"post-53327","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-business","8":"category-enterprise","9":"category-smb","10":"tag-ai","11":"tag-development","12":"tag-devsecops","13":"tag-llm","14":"tag-machine-learning","15":"tag-security-2","16":"tag-transparency"},"hreflang":[{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/ai-slopsquatting-supply-chain-risk\/53327\/"},{"hreflang":"en-in","url":"https:\/\/www.kaspersky.co.in\/blog\/ai-slopsquatting-supply-chain-risk\/28774\/"},{"hreflang":"en-ae","url":"https:\/\/me-en.kaspersky.com\/blog\/ai-slopsquatting-supply-chain-risk\/24010\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/ai-slopsquatting-supply-chain-risk\/28889\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/ai-slopsquatting-supply-chain-risk\/39414\/"},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/ai-slopsquatting-supply-chain-risk\/29053\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/ai-slopsquatting-supply-chain-risk\/34833\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/ai-slopsquatting-supply-chain-risk\/34465\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\
/www.kaspersky.com\/blog\/tag\/ai\/","name":"AI"},"_links":{"self":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/53327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2722"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=53327"}],"version-history":[{"count":2,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/53327\/revisions"}],"predecessor-version":[{"id":53330,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/53327\/revisions\/53330"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/53328"}],"wp:attachment":[{"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=53327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=53327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=53327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}