Key Takeaways
- The speed of AI adoption within federal agencies is outpacing the development of governance frameworks and approved security tools. Risks are now present across models, infrastructure, and the software supply chain.
- AI threats frequently resemble normal use cases, complicating detection through conventional static methods.
- Compliance with federal standards necessitates ongoing, evidence-based validation instead of one-time assessments.
- Effective AI security requires comprehensive visibility, prioritized risk management, and a closed-loop remediation process.
- Qualys TotalAI integrates these components, streamlining testing, risk assessment, and remediation into a single workflow compliant with the FedRAMP Moderate Authorization framework.
Why Federal AI Security Requires More Than Standard Scanning
Security for AI systems must adopt a distinct approach compared to traditional IT assets. Protecting these sophisticated tools requires full visibility from end to end and audit-ready evidence sourced from platforms designed for the entire AI architecture. Robust federal AI security goes beyond simple server scanning; it necessitates ongoing behavioral assessment of large language models (LLMs) to detect prompt manipulation, data leaks, and misuse during inference. By leveraging AI-specific capabilities, agencies can not only fulfill federal AI modernization mandates but also generate rigorous compliance documentation and measurable risk mitigations crucial for mission security.
Recent reports indicate that federal AI adoption is accelerating dramatically. An early study by GAO revealed that the number of AI use cases across 11 major agencies surged from 571 in 2023 to 1,110 in 2024. More recent findings from ExecutiveGov bring that total to 3,611 use cases across 56 agencies, highlighting that AI should no longer be viewed as experimental but rather as a critical operational necessity.
Executive Order 14179 and OMB M-25-21 emphasize AI modernization, advocating for swift deployment and the removal of regulatory obstacles. However, the drive for widespread adoption is currently hindered by a shortage of FedRAMP-certified tools designed to address the unique risks associated with LLMs and their infrastructures. Federal directives such as CISA Binding Operational Directive (BOD) 23-01 and BOD 26-02 have elevated these guidelines to mandatory requirements, compelling agencies to maintain comprehensive visibility over every asset, including AI models, endpoints, and the edge-based Model Context Protocol (MCP) servers that connect them.
Many agencies find it challenging to meet these standards due to existing security tools lacking the necessary FedRAMP authorization and AI-targeted oversight. The objective is to achieve complete end-to-end visibility, mitigate risks effectively, and produce audit-ready evidence from a cohesive platform specifically architected for the entire AI stack.
How to Secure Federal AI Systems: Understanding the 3-Layer Attack Surface
When a federal agency implements an AI system, it establishes an interconnected architecture composed of models, training data, APIs, supporting infrastructure, and pipelines, each presenting unique vulnerabilities. Current federal mandates require comprehensive and defensible security across this entire framework.
The AI Layer
This layer, which most agencies prioritize, encompasses models, LLMs, prompts, datasets, APIs, and agentic AI tool connectors, particularly MCP servers and inference services. The predominant attack vector here is not malicious code but rather natural language processing.
A simple text command can coax an LLM into disclosing sensitive data, sidestepping safety protocols, or misusing its functions in ways indistinguishable from legitimate user requests. Conventional scanners and firewalls are often ineffective against these semantic threats, as the exploits appear to be mere conversations. Securing this layer requires ongoing behavioral testing to detect prompt manipulation, jailbreaks, data leaks, multimodal exploits, hallucination problems, and model denial-of-service attacks.
For agencies utilizing AI in critical workflows—such as fraud detection or cyber defense—a compromised model can represent not just another vulnerability, but a catastrophic failure of core system functionality.
The Infrastructure Layer
This layer contains the environments where AI workloads operate, including GPU hosts, vector databases, Python and CUDA libraries, MCP servers connecting models to agency data, containers, cloud services, data pipelines, APIs, and edge devices. It is typically extensive, dynamic, and frequently shared.
If the computing systems supporting AI are compromised, attackers may not need to directly target the model; they can infiltrate it through the underlying infrastructure. Securing this layer involves evaluating and prioritizing vulnerabilities across each component while expanding risk management to include APIs, MCP connectors, and data services that supply to the models.
The Exposure Layer
Unaddressed dependencies and misconfigurations within the AI software supply chain create unnoticed vulnerabilities. Modern AI systems often rely on open-source frameworks and packages, many of which may have known vulnerabilities that are overlooked and defined as low-severity on their own. However, a single misconfigured endpoint or an outdated library can expose an AI system in dangerous ways, an aspect that risk model-level tools may overlook entirely.
The cumulative risks across these layers amplify the potential for attack. An adversary who understands the entire stack can exploit vulnerabilities across all three layers, while no singular view effectively captures that extensive risk.
Qualys Insights
Legacy security tools were designed for a different threat landscape, one in which assets are relatively stable and risks are calculated based on known common vulnerabilities and exposures (CVEs) against patched software versions. AI workloads do not conform to this traditional model, revealing systematic deficiencies.
Standard Vulnerability Scanning Misses AI-Specific Threats
Conventional scanners typically evaluate the LLM environment, host operating systems, packages, and container images against a CVE database. If everything is up to date, the result appears satisfactory. However, threats such as prompt manipulation, data breaches, multimodal exploits, and model misuse are not documented within that database. These behavioral threats require ongoing testing rather than static assessment of the hosting environment.
Infrastructure Scanners Stop at the Model Boundary
Many agencies maintain solid vulnerability management programs for hosts, networks, containers, and cloud infrastructures. However, these tools do not encompass the need to identify and inventory AI-specific assets such as LLM endpoints, model services, datasets, and MCP servers. Consequently, AI workloads running on this infrastructure may remain unaccounted for.
Shadow AI Exists Beyond the Cloud
It can also be found in local GPU clusters, unmanaged containers, developer workstations hosting self-managed models, MCP servers tethered to localhost, and unauthorized SaaS applications outside of sanctioned oversight. Eliminating these blind spots necessitates visibility across all operational layers:
- On the host: Self-hosted models, alongside the local Python and CUDA libraries powering them.
- In the cloud: Unsupervised AI services and vector databases within AWS, Azure, and GCP.
- On the network: Unauthorized SaaS AI services and LLM APIs outside the approved stack.
- In the supply chain: MCP SDK usage in application code, hinting at AI integrations earlier in the development cycle.
No combination of manual processes and piecemeal tools can provide a complete, continuously updated inventory of AI resources at the speed and scale required by federal environments.
Fragmented Tools Yield Fragmented Risk
In many federal contexts, AI teams manage models, infrastructure teams oversee hardware, and security teams deal with vulnerabilities—each working independently and producing unique outputs without a cohesive risk framework. Risk assessment often defaults to addressing the most visible alerts instead of assessing genuine mission impact. Consequently, Chief AI Officers (CAIOs) and Chief Information Security Officers (CISOs) lack a unified source of truth when reporting on AI risks.
Point-in-Time Assessments Fall Short of Continuous Monitoring Requirements
Mandates from OMB M-25-21, CISA BODs 26-02 and 23-01, as well as the NIST AI Risk Management Framework necessitate continuous, real-time security assessments. AI workloads are inherently dynamic—models are updated, new APIs become available, and dependencies shift. A periodic scan merely provides a snapshot of an ever-evolving target; by the time a report is ready, the environment has likely undergone changes.
The Case for Unified AI Risk Management Under FedRAMP
Integrating AI security, infrastructure protection, and compliance reporting into a singular risk framework is vital to countering threats and meeting federal requirements. Cyber adversaries do not operate within isolated sections; a risk extending from the AI layer through insecure dependencies to the underlying infrastructure represents a single vulnerability. Disconnected tools may approach this threat inappropriately or overlook it altogether. A cohesive risk model connecting all elements is the solution.
OMB M-25-21 mandates continuous, enterprise-wide management of AI risk along with evidence for governance bodies. This necessitates ongoing discovery, risk-based prioritization, and the provision of audit-ready proof from a single, authoritative source—not manually compiled reports. The Zero Trust model emphasizes this approach: AI models, endpoints, MCP servers, APIs, and datasets should all be treated as assets. Without rigorous assessment and fortification, they become unmanaged vulnerabilities in what should be a fortified framework.
Qualys TotalAI with FedRAMP Moderate Authorization: Platform Capabilities
Qualys TotalAI is now authorized under FedRAMP Moderate and is accessible via the Qualys Cloud Platform. This integration consolidates TotalAI for infrastructure risk management and TruRisk Eliminate™ for operational resilience and attack surface minimization.
TotalAI’s FedRAMP Moderate Authorization eliminates the primary obstacles to deployment for civilian agencies, DoD components, and defense contractors. It is compatible with existing Qualys agents, requiring no additional tools or alterations to your current systems.
AI Asset Discovery and Inventory
Continuously identify and keep an inventory of all AI assets, such as models, APIs, prompts, datasets, MCP servers, and supporting infrastructure. The three specialized discovery engines—Cloud Agents for hosts, Cloud Connectors for cloud environments, and Network Detection and Response (NDR) for network environments—eradicate blind spots associated with shadow AI and provide an up-to-date, audit-ready inventory without manual intervention. A distinction between “confirmed” versus “potential” assets allows governance teams to differentiate scanned and assessed models from those present in cloud accounts, but not yet authorized.
AI/LLM Security Testing
Engage in ongoing assessments to scrutinize LLM behavior for prompt manipulation, data breaches, and misuse. The platform evaluates 38–40 distinct attack scenarios, ranging from prompt injections and jailbreak techniques to multilingual exploits and model denial-of-service attacks. With over 650 AI-specific detections aligned with the OWASP Top 10 for LLM Applications, MITRE ATLAS, and the EU AI Act, continuous security testing is paramount.
Multimodal threat detection identifies prompts and perturbations concealed within images, audio, and video, capitalizing on cross-modal features that text-only scanners may easily overlook. For sensitive models, an on-premises scanner can execute the same tests within your own firewalls. The system is compatible with any OpenAI-compatible API, along with AWS Bedrock, Azure OpenAI, Google Vertex, and more. Additionally, security findings can be integrated into MLOps pipelines to ensure only vetted models are put into production.
Securing the Systems That Power AI (VMDR + ETM)
Evaluate and prioritize vulnerabilities across GPU hosts, containers, and cloud services. TruRisk™ connects asset significance to operational impact, directing remediation efforts towards the most critical vulnerabilities.
One Score from API to LLM
Receive a singular TruRisk™ score encompassing all infrastructures, applications, and AI systems. This allows for executive-level summaries for leadership and intricate technical insights for engineering teams.
Enforcing Remediation and Hardening (TruRisk Eliminate™)
Automate baseline enforcement and eliminate attack vectors. Apply patchless mitigation strategies for vulnerabilities where standard patches are unavailable, or rebooting is impractical. Seamlessly integrate with ITSM workflows like ServiceNow to accelerate response times.
Continuous Monitoring and Evidence-Backed Findings for Audit Readiness
Each detection includes a complete evidence trail, featuring prompts, responses, analysis based on OWASP LLM, MITRE ATLAS, and EU AI Act categorizations. Reports document severity, unsuccessful checks, jailbreak occurrences, scanning modes, and profiles for increased repeatability: ensuring consistency over time.
This evidence integrates with RMF, ATO, and ConMon workflows due to continuous monitoring and automated assessments against NIST AI RMF, NIST SP 800-53, and CMMC 2.0 whenever posture changes occur.

Securing AI Means Securing Everything It Touches
Federal agencies must quickly adopt AI technology while ensuring a robust security posture. This necessitates continuous visibility and risk-based remediation rather than products operating in isolation. Qualys TotalAI facilitates this process on a platform already familiar to agencies.
With FedRAMP Moderate Authorization, it is primed for deployment without delay.
Understand your agency’s AI attack surface before it comes under scrutiny.
Frequently Asked Questions (FAQs)
Which federal mandates govern AI security for U.S. agencies?
OMB M-25-21, CISA BOD 23-01, and CISA BOD 26-02 establish the essential framework. They call for continuous visibility, risk-based controls, and audit-ready evidence—far surpassing traditional annual reviews. TotalAI aligns with NIST AI RMF, NIST SP 800-53, FISMA, and CMMC 2.0 through automated, ongoing assessments, freeing agencies from the burden of compiling compliance evidence manually before each reporting cycle.
How should federal agencies secure AI infrastructure?
Conduct continuous evaluations across GPU hosts, containers, cloud services (AWS, Azure, GCP), serverless functions, and data pipelines, prioritizing based on operational impact instead of merely the CVSS score. TotalAI maps asset importance to operational risks, targeting the attack pathways that hold significant relevance rather than just those with the highest scores.
Why is LLM security testing different from traditional application security?
Traditional application security focuses on identifying code vulnerabilities. LLMs introduce threat vectors during inference, such as prompt injection, data leakage, and model misuse, which resemble regular interactions and stump static analysis tools. TotalAI’s 650+ AI-specific detections continuously assess LLM behavior in real time, mapping to the OWASP Top 10 for LLM Applications.
How do federal agencies manage AI supply chain risk?
Most federal AI systems rely on open-source components that contain known and unpatched vulnerabilities. Risks can arise not just from the model itself, but from the underlying dependencies. TotalAI continuously evaluates the entire AI software supply chain, revealing misconfigurations and unnoticed entry points that tools analyzing only at the model level may completely miss.
How is AI security deployed in federal environments?
TotalAI is implemented using federally approved Cloud Agents, necessitating no code alterations or disruptions to existing systems. Immediate identification of AI assets across hosts, cloud, network, and SaaS environments can commence from day one.
What does FedRAMP authorization mean for AI security platforms?
FedRAMP Moderate Authorization removes substantial deployment barriers. For federal agencies, this facilitates immediate eligibility for deployment without additional Authority to Operate (ATO) efforts, providing a continuously authorized compliance baseline across NIST AI RMF, CMMC 2.0, FISMA, and NIST SP 800-53 while also generating ongoing audit-ready documentation.
How does Qualys TotalAI unify AI risk management for federal agencies?
Most agencies utilize disparate tools to handle AI risk management—some for infrastructure, others for application security, and yet another set for compliance. TotalAI integrates AI models, infrastructure, and their dependencies into a single risk management framework, ensuring consistent prioritization and always current compliance visibility.




