Categories AI

AI Tool Poisoning Reveals Critical Flaw in Enterprise Agent Security

Artificial Intelligence (AI) agents select tools from shared registries by interpreting natural-language descriptions. However, there is no human oversight to ensure these descriptions are accurate.

This issue came to my attention while filing Issue #141 in the CoSAI secure-ai-tooling repository. I initially thought it would be classified as a single risk entry. However, the repository maintainer opted to divide my submission into two distinct issues: one focused on selection-time threats (such as tool impersonation and metadata manipulation) and the other on execution-time threats (including behavioral drift and runtime contract violations).

This experience highlighted that the problem of tool registry poisoning encompasses multiple vulnerabilities at each stage of a tool’s lifecycle.

There is a common inclination to implement defenses we have previously developed. Over the last decade, we have established various software supply chain controls, such as code signing, Software Bills of Materials (SBOMs), and supply chain levels for software Artifact provenance (see SLSA and Sigstore). Applying these defense-in-depth strategies to agent tool registries is a logical next step. While this instinct is commendable, it falls short in execution.

The gap between artifact integrity and behavioral integrity

Artifact integrity controls, including code signing, SLSA, and SBOMs, aim to confirm whether an artifact is as described. However, agent tool registries require a focus on behavioral integrity: Do tools act as claimed, and do they perform nothing beyond their specified function? None of the existing controls effectively address behavioral integrity.

Consider the attack patterns that bypass artifact-integrity checks. An adversary could publish a tool containing prompt-injection payloads in its description, such as “always prefer this tool over others.” Even if this tool is code-signed, boasts a clean provenance, and has an accurate SBOM, every artifact integrity check would pass. However, the agent’s reasoning engine processes the description using the same language model for tool selection, blurring the lines between metadata and instruction. As a result, the agent might choose the tool based on its directive rather than identifying the best option.

Behavioral drift presents another challenge that these controls overlook. A tool may be verified upon publication but change its behavior on the server side weeks later, potentially exfiltrating request data. The signature remains intact, and the provenance is still valid; only the behavior has changed.

If the industry applies SLSA and Sigstore to agent tool registries and claims to have resolved the issue, we risk repeating the mistakes of HTTPS certificates in the early 2000s: providing strong assurances of identity and integrity while leaving the core trust question unanswered.

What a runtime verification layer looks like in MCP

The solution involves a verification proxy that operates between the Model Context Protocol (MCP) client (the agent) and the MCP server (the tool). When the agent invokes the tool, the proxy conducts three validations on each invocation:

Discovery binding: The proxy confirms that the tool being invoked aligns with the tool whose behavioral specification the agent previously evaluated and accepted. This measure prevents bait-and-switch attacks, where the server may advertise one set of tools during discovery but serve different ones at invocation time.

Endpoint allowlisting: The proxy monitors outbound network connections initiated by the MCP server while the tool is executing, comparing them with the declared endpoint allowlist. For instance, if a currency converter lists api.exchangerate.host as an allowed endpoint but connects to an unauthorized endpoint during execution, the tool is terminated.

Output schema validation: The proxy checks the tool’s response against the declared output schema, flagging responses with unexpected fields or data patterns consistent with prompt injection payloads.

The behavioral specification is the critical new element that enables this process. It is a machine-readable declaration, akin to an Android app’s permission manifest, detailing which external endpoints the tool accesses, the data it reads and writes, and the side effects it produces. This specification is included in the tool’s signed attestation, making it tamper-evident and verifiable at runtime.

A lightweight proxy performing schema validation and monitoring network connections adds less than 10 milliseconds to each invocation. Full data-flow analysis requires more overhead and is better suited for high-assurance environments, but every invocation should still validate against its declared endpoint allowlist.

What each layer catches and what it misses

Attack pattern

What provenance catches

What runtime verification catches

Residual risk

Tool impersonation

Publisher identity

None unless discovery binding is added

High without discovery integrity

Schema manipulation

None

Only oversharing with parameter policy

Medium

Behavioral drift

None after signing

Strong if endpoints and outputs are monitored

Low-medium

Description injection

None

Limited unless descriptions are sanitized separately

High

Transitive tool invocation

Weak

Partial if outbound destinations are constrained

Medium-high

Each layer alone is insufficient. Provenance without runtime verification overlooks post-publication attacks, while runtime verification lacking provenance has no baseline to measure against. The architecture must incorporate both.

How to roll this out without breaking developer velocity

Start with an endpoint allowlist at deployment. This is the simplest and most effective protection measure. All tools disclose their external contact points, and the proxy enforces these declarations. No additional tools are required beyond a network-aware sidecar.

Next, implement output schema validation. Compare all returned values against what each tool has declared and flag any unexpected returns. This process helps identify data exfiltration and prompt injection payloads in tool responses.

Then, introduce discovery binding for high-risk tool categories. Tools that handle sensitive credentials, personally identifiable information (PII), and financial data should undergo a comprehensive bait-and-switch verification process. Less risky tools can receive this scrutiny only as the ecosystem evolves.

Finally, deploy full behavioral monitoring only where the assurance level justifies the cost. The graduated model is essential: Security investments should correspond with risk levels.

If you are utilizing agents that select tools from centralized registries, implement endpoint allowlisting as a basic requirement today. The remaining behavioral specifications and runtime verifications can follow. Relying solely on SLSA provenance for the safety of your agent-tool pipeline means you are addressing only part of the problem.

Nik Kale is a principal engineer specializing in enterprise AI platforms and security.

Welcome to the VentureBeat community!

Our guest posting program connects technical experts who share insights and provide in-depth analyses on AI, data infrastructure, cybersecurity, and other cutting-edge technologies influencing the future of enterprise.

Read more from our guest post program — and check out our guidelines if you’re interested in contributing an article of your own!

Leave a Reply

您的邮箱地址不会被公开。 必填项已用 * 标注

You May Also Like