Chainguard 2024 Predictions: Securing the AI/ML Supply Chain Will Take Center Stage, Following a Wave of New Attacks


Industry executives and experts share their predictions for 2024. Read them in this 16th annual VMblog.com series exclusive.

Securing the AI/ML Supply Chain Will Take Center Stage, Following a Wave of New Attacks

By Dustin Kirkland, VP of Engineering at Chainguard

It might seem like we've just hit our stride in progressing toward a more secure software supply chain. But with the rapid adoption of machine learning and generative AI workflows, there's now an entirely new, burgeoning set of infrastructure and tooling to secure. In 2024, we'll likely learn some hard lessons about exploitable weaknesses in LLMs and data pipelines. Just as today's ML workflows apply techniques and processes from the software lifecycle to the data lifecycle, the practice of "AI/ML Supply Chain Security" will need to apply software security techniques and processes to secure an AI model's lifecycle.
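
To make that concrete, here's a minimal sketch of carrying one familiar software supply chain practice, pinning and verifying artifact digests, over to a model's lifecycle. The file names and the manifest format are illustrative assumptions, not a prescribed workflow:

```python
import hashlib
import json
from pathlib import Path

def sha256_digest(path: Path) -> str:
    """Compute a SHA-256 digest of a file, streaming so large artifacts fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(artifacts: list[Path], manifest: Path) -> None:
    """Record pinned digests for model weights and training data, like a lockfile."""
    entries = {str(p): sha256_digest(p) for p in artifacts}
    manifest.write_text(json.dumps(entries, indent=2))

def verify_manifest(manifest: Path) -> bool:
    """Before deploying (or retraining), confirm no artifact has changed underneath you."""
    entries = json.loads(manifest.read_text())
    return all(sha256_digest(Path(p)) == digest for p, digest in entries.items())

# Hypothetical artifacts: a trained model and the data set it was trained on.
# write_manifest([Path("model.safetensors"), Path("train.parquet")], Path("manifest.json"))
# assert verify_manifest(Path("manifest.json"))
```

A digest manifest is only the first rung; the same lifecycle also wants signatures and provenance attestations, just as software builds do.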

The AI reckoning nears

It seems like every company, from startups to enterprises and beyond, is betting big on AI/ML. To that end, innumerable Large Language Models (LLMs) are being trained on massive data sets. The big question is: how are these organizations securing those models and their training data sets? The likely answer is that they aren't. Just as in the rise of DevOps and its "move fast and break things" motto, the adoption of AI/ML tooling and workflows is progressing quickly so organizations can remain competitive in their industries.

In 2024, we'll see the first large-scale data breach of model training data, on the order of Equifax, Yahoo!, or T-Mobile, where that training data ends up in the hands of malicious attackers. In the worst case, it will be a leak that includes personally identifiable information (PII); at the very least, it will give attackers access to some corporate crown jewels: proprietary training data, company secrets, and strategic intentions.

Cambridge Analytica 2.0?

Still on the topic of LLM training, let's not forget that 2024 is an election year in the US. Conditions will be ripe for bad actors and nation-states to interfere with elections by leveraging the rise of AI chatbots and tooling, despite significant progress at the federal level to prevent that influence.

In 2024, we could see attackers attempt to maliciously influence models by injecting their own data into the training process. It might be a new flavor of Cambridge Analytica, but at AI scale rather than on social media platforms. Instead of exploiting human behavior and interactions with news stories and headlines, we could see attackers influence training models through deliberately biased or maliciously curated data.
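
To illustrate the mechanics, here's a toy sketch of label poisoning using scikit-learn. The data set, the 20% poisoning rate, and the targeting are contrived assumptions for illustration; poisoning a real LLM corpus is far subtler, but the principle, that a modest slice of attacker-controlled training data can shift a model's behavior, is the same (exact magnitudes will vary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Two contrived classes: Gaussian blobs centered at -1 and +1.
X = np.vstack([rng.normal(-1.0, 1.0, (500, 2)), rng.normal(1.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on clean labels.
clean = LogisticRegression().fit(X_train, y_train)

# The "attacker" controls a slice of the corpus: relabel 20% of class-0
# training examples as class 1, biasing the learned decision boundary.
y_poisoned = y_train.copy()
zeros = np.flatnonzero(y_train == 0)
flipped = rng.choice(zeros, size=len(zeros) // 5, replace=False)
y_poisoned[flipped] = 1
poisoned = LogisticRegression().fit(X_train, y_poisoned)

# The damage concentrates where the attacker aimed it: class-0 inputs.
mask = y_test == 0
print(f"overall accuracy, clean vs poisoned: "
      f"{clean.score(X_test, y_test):.3f} vs {poisoned.score(X_test, y_test):.3f}")
print(f"class-0 accuracy, clean vs poisoned: "
      f"{clean.score(X_test[mask], y_test[mask]):.3f} vs "
      f"{poisoned.score(X_test[mask], y_test[mask]):.3f}")
```

The uncomfortable part is that nothing in the poisoned pipeline looks broken: training completes, overall metrics look plausible, and the bias only shows up where the attacker wanted it.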

AI will 10X the software "transitive" trust problem

Today's software distribution methods have developers downloading open source packages and modules outside of distros. Docker, Helm, and immense language ecosystems such as Golang, Python, Java, Node, Ruby, et al. present layers upon layers of "transitive" trust challenges. Not only is it tough to know that the software you are installing has not been tampered with; you are also implicitly trusting all of the transitive dependencies that software was built on top of. AI's vast landscape of frameworks, language primitives, and LLMs multiplies the number of moving parts in the transitive trust equation, and obscures even more of the trust visibility.

A typical software supply chain feeds upstream libraries and first-party code into a compiler to produce a binary, and the build process itself is often inexpensive enough to allow for independent reproduction. For AI systems, gigabytes or terabytes of training data enter the conversation, and the expensive training step (the analog of a build step) may consume hundreds of thousands of dollars worth of computing resources. Software dependencies often need to be on the cutting edge for the best results, so developers update with abandon. AI takes all of the inputs that make software supply chain security a challenge in the first place and turns them up to 11. We will see bad actors leverage this and move toward transitive dependencies as a lower-hanging-fruit attack vector, for example by installing malware in a binary JAR that you, as its consumer, cannot easily inspect.
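
As a small illustration of how quickly transitive trust fans out, here's a sketch that recursively walks the declared requirements of packages installed in a Python environment using the standard library's importlib.metadata. It only surfaces declared dependencies, not tampering, and the requirement-string parsing is deliberately rough, but it makes the size of the implicit trust surface visible:

```python
import re
from importlib import metadata

def req_name(req: str) -> str:
    """Extract the bare package name from a requirement string like 'numpy>=1.21'."""
    return re.split(r"[\s;<>=!~\[(]", req, maxsplit=1)[0].lower()

def transitive_deps(dist_name: str, seen: set[str] | None = None) -> set[str]:
    """Walk declared requirements recursively; every hop is another party you trust."""
    seen = set() if seen is None else seen
    for req in metadata.requires(dist_name) or []:
        if "extra ==" in req:          # skip optional extras for the core picture
            continue
        name = req_name(req)
        if name in seen:
            continue
        seen.add(name)
        try:
            transitive_deps(name, seen)
        except metadata.PackageNotFoundError:
            pass                        # declared but not installed in this environment
    return seen

# Example: how many upstream projects does one install implicitly trust?
pkg = "requests"                        # any installed package name works here
deps = transitive_deps(pkg)
print(f"{pkg}: {len(deps)} transitive dependencies -> {sorted(deps)}")
```

Run it against a typical ML stack rather than a small utility library and the count balloons, and that's before adding base images, model weights, and training data to the equation.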

AI might help the good guys, but attackers are racing to leverage flaws

2024 will bring a race between defenders and attackers to leverage or exploit AI workflows and tooling, and it's a vast space to watch in the year ahead. For example, will AI-generated code actually have fewer vulnerabilities, presenting fewer weaknesses for attackers to exploit? Or will AI code generators introduce more critical, complicated vulnerabilities that are harder for humans to spot? Only time will tell, but the race is on, and we can expect to see headlines on both sides of the spectrum. Hopefully there will be more to celebrate than chaos to chase.

##

ABOUT THE AUTHOR

Dustin Kirkland 

Dustin is currently the VP of Engineering at Chainguard. Over 25 years as a software engineer, product manager, and executive leader in both CTO and CPO roles, Dustin has spent more than a decade building hardware, software, and services products at some of the world's largest companies (IBM, Google, and Goldman Sachs), plus another decade leading growth-mode startups (Canonical/Ubuntu, Gazzang, Apex Fintech, and Chainguard). Open source software, cloud security, IoT devices, and financial services technology are among his passions and areas of expertise, as he has launched successful products in each of those markets. Dustin enjoys advising startups and corporations on strategy, and especially helping great people "systematize success" with well-tuned product and engineering methodologies.

Published Monday, January 22, 2024 7:37 AM by David Marshall