Industry executives and experts share their predictions for 2024. Read them in this 16th annual VMblog.com series exclusive.
Securing the AI/ML Supply Chain Will Take Center Stage, Following a Wave of New Attacks
By Dustin Kirkland, VP of Engineering at Chainguard
It might seem like we've just hit our stride in progressing toward a more secure software supply chain. But the rapid adoption of machine learning and generative AI workflows has introduced an entirely new, burgeoning set of infrastructure and tooling to secure. In 2024, we'll likely learn some hard lessons about the exploitable weaknesses in LLMs and data pipelines. Just as today's ML workflows apply techniques and processes from the software lifecycle to the data lifecycle, the practice of "AI/ML Supply Chain Security" will need to apply software security techniques and processes to secure an AI model's lifecycle.
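To make that idea concrete, here is a minimal sketch in Python of what borrowing one software-lifecycle practice might look like: recording a provenance document for a trained model, analogous to an SBOM or build attestation for a compiled binary. The file names (model.safetensors, train_manifest.json) and the placeholder commit value are assumptions for illustration, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical artifacts: the trained weights plus a manifest of the training inputs.
model_path = Path("model.safetensors")
data_manifest_path = Path("train_manifest.json")

provenance = {
    "artifact": model_path.name,
    "artifact_sha256": sha256_of(model_path),
    "training_data_manifest": data_manifest_path.name,
    "training_data_sha256": sha256_of(data_manifest_path),
    "training_code_commit": "0000000",  # placeholder: the git commit of the training code
    "created_at": datetime.now(timezone.utc).isoformat(),
}

# Write the record alongside the model so downstream consumers can see exactly
# which data and code produced the weights they are about to load.
Path("model.provenance.json").write_text(json.dumps(provenance, indent=2))
```

The point is not the specific fields, but that the model artifact, like a binary, ships with a verifiable record of what went into it.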
The AI reckoning nears
It seems like every company, from startups to enterprises and beyond, is betting big on AI/ML. To that end, innumerable Large Language Models (LLMs) are being trained on massive data sets. The big question is: how are these organizations securing those models and their training data sets? The likely answer is that they aren't. Just as in the rise of the DevOps "build fast and break things" motto, the adoption of AI/ML tooling and workflows is progressing quickly so organizations can remain competitive in their industries.
In 2024, we'll see the first large-scale data breach (on the order of Equifax, Yahoo!, or T-Mobile) of model training data, where that training data ends up in the hands of malicious attackers. In the worst case, it will be a leak that includes personally identifiable information (PII), but at the very least, it will give attackers access to some corporate crown jewels: proprietary training information, company secrets, and strategic intentions.
Cambridge Analytica 2.0?
Still on the topic of LLM training, let's not forget that 2024 is an election year in the US. Conditions will be ripe for bad actors and nation-states to interfere with elections by leveraging the rise of AI chatbots and tooling, despite significant progress at the federal government level to prevent this influence.
In 2024, we could see attackers attempt to maliciously influence models by injecting their own data into the training process. It might be a new flavor of 'Cambridge Analytica', but at AI scale rather than on social media platforms. Instead of modifying and exploiting human behavior through interactions with news stories and headlines, we could see attackers try to influence models through deliberately biased or maliciously curated training data.
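One plausible line of defense is to gate the data pipeline itself, rejecting training records whose origin hasn't been vetted before they ever reach the training step. Below is a minimal Python sketch of that idea; the allowlist, the incoming_records.json manifest, and the source_url field are all hypothetical stand-ins, not any particular pipeline's format.

```python
import json
from urllib.parse import urlparse

# Hypothetical allowlist of data sources the organization has actually vetted.
TRUSTED_SOURCES = {"data.example.com", "archive.example.org"}

def filter_trusted(records: list[dict]) -> list[dict]:
    """Keep only records whose 'source_url' host is on the allowlist."""
    accepted = []
    for record in records:
        host = urlparse(record.get("source_url", "")).hostname
        if host in TRUSTED_SOURCES:
            accepted.append(record)
        else:
            print(f"rejected record from untrusted source: {host!r}")
    return accepted

# Hypothetical manifest: a JSON list with one object per candidate training record.
with open("incoming_records.json") as f:
    incoming = json.load(f)

training_ready = filter_trusted(incoming)
```

A source allowlist won't stop a determined poisoner on its own, but it shrinks the surface area through which injected data can reach a model.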
AI will 10X the software "transitive" trust problem
Today's software distribution methods have developers downloading open source packages and modules outside of distros. Docker, Helm, and the immense package ecosystems of languages such as Golang, Python, Java, Node, Ruby, et al., present layers and layers of "transitive" trust challenges. Not only is it tougher to know that the software you are installing has not been tampered with; you are also implicitly trusting the other transitive dependencies that software was built on top of. AI's vast landscape of frameworks, language primitives, and LLMs multiplies the number of moving parts in the transitive trust equation, and obscures even more of the trust visibility.
A typical software supply chain has upstream libraries and first-party code fed into a compiler to produce a binary, and the build process itself is often inexpensive enough to allow for independent reproduction. For AI systems, there are often gigabytes or terabytes of training data entering the conversation, and the expensive training step (the analog of a build step) may consume hundreds of thousands of dollars' worth of computing resources. Software dependencies often need to be on the cutting edge for the best results, so developers update with abandon. AI takes all of the inputs that make software supply chain security a challenge in the first place and turns them up to 11. We will see bad actors leverage this and move toward transitive dependencies as a lower-hanging-fruit attack vector, for example, installing malware in a binary JAR that you, as its consumer, cannot easily inspect.
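By way of illustration, the same discipline that lockfiles and pip's --require-hashes option bring to package installs can be applied to any artifact an ML pipeline pulls in. Here is a minimal Python sketch that refuses to use a downloaded artifact unless its SHA-256 digest matches a value recorded when the artifact was vetted; the file names and the truncated digest values are placeholders for illustration.

```python
import hashlib
from pathlib import Path

# Hypothetical pins: artifact filename -> SHA-256 digest recorded when it was vetted.
# Real pins would be full 64-character hex digests, ideally kept under version control.
PINNED_DIGESTS = {
    "tokenizer.json": "9f2c...",
    "base-weights.bin": "51a7...",
}

def verify_artifact(path: Path) -> None:
    """Raise if the file's SHA-256 digest does not match its pinned value (fail closed)."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    expected = PINNED_DIGESTS.get(path.name)
    if expected is None or digest != expected:
        raise RuntimeError(f"{path.name}: digest {digest} does not match pin {expected}")

# Verify every pinned artifact before anything downstream is allowed to load it.
for name in PINNED_DIGESTS:
    verify_artifact(Path(name))
```

Pinning doesn't eliminate transitive trust, but it at least makes a silent swap of an upstream artifact detectable instead of invisible.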
AI might help the good guys, but attackers are racing to leverage flaws
2024 will bring a race between defenders and attackers to leverage or exploit AI workflows and tooling. This is a vast space to keep an eye on in the coming year. For example, will we see a reality where AI-generated code actually has fewer vulnerabilities, thus presenting fewer weaknesses for attackers to exploit? Or will AI code generators introduce more critical, complicated vulnerabilities that are harder for humans to spot? Only time will tell, but the race is on, and we can expect to see headlines on both sides of the spectrum. Hopefully, there will be more to celebrate than chaos to chase.
##
ABOUT THE AUTHOR
Dustin is currently the VP of Engineering at Chainguard. Over a 25-year career as a software engineer, product manager, and executive leader in both CTO and CPO roles, Dustin has spent more than a decade building hardware, software, and services products at some of the world's largest companies (IBM, Google, and Goldman Sachs), plus another decade leading growth-mode startups (Canonical/Ubuntu, Gazzang, Apex Fintech, and Chainguard). Open source software, cloud security, IoT devices, and financial services technology are among his passions and areas of expertise, as he has launched successful products in each of those markets. Dustin enjoys advising startups and corporations on strategy, and especially helping great people "systematize success" with well-tuned product and engineering methodologies.