Prompt Injection Threat Model: A Practical Guide Using MITRE ATLAS
Prompt Injection Threat Model
Introduction
Prompt injection has quickly become one of the defining security challenges for anyone building with large language models. MITRE ATLAS treats it as a first-class adversary technique — LLM Prompt Injection (AML.T0051) — with three sub-techniques: Direct (AML.T0051.000), Indirect (AML.T0051.001), and Triggered (AML.T0051.002). These reflect the different ways attackers can slip malicious instructions into a system — through the user interface, through external content the system ingests, or through delayed payloads that fire later when conditions are right.
What makes prompt injection especially dangerous in modern deployments is that it’s rarely the end goal. Think of it as an enabling primitive — a stepping stone that attackers chain into tool invocation, credential harvesting, data collection from RAG and enterprise sources, system prompt extraction, and exfiltration. MITRE’s OpenClaw investigation illustrates this pattern clearly: the recurring chains involve prompt injection, tool invocation abuse, and configuration modification working together to achieve persistent, wide-ranging impact.
A rigorous threat model therefore needs to treat the LLM and its orchestration layer as an untrusted interpreter of mixed code-and-data. The literature on indirect prompt injection highlights that LLM-integrated applications blur the line between “instructions” and “retrieved data,” and defensive work confirms that LLMs struggle to reliably distinguish where instructions are coming from within a single token stream.
Defence-in-depth is required because there is no universally foolproof prevention mechanism today, and defences must be evaluated against adaptive attackers and measured for utility degradation. The highest-leverage controls, mapped to ATLAS mitigations and modern agent architectures, are:
- Constrain agency and tool impact — least privilege, delegated permissions, human approval for high-consequence actions, and restricting tool use when untrusted content is in context.
- Instrument and monitor — AI telemetry logging of prompts, outputs, and tool decisions, with alerting on anomalous tool calls and exfiltration patterns.
- Harden prompt and context handling — guardrails, validation, provenance marking and segmentation of untrusted retrieved content, and memory hardening.
- Eliminate secrets-as-prompts — assume system prompts can be extracted, treat them as non-secret configuration, and enforce security outside the model.
Scope and Definitions
What this report covers
This threat model applies to two broad categories of system:
LLM-based systems — applications that incorporate an LLM into a workflow (chat, summarisation, coding, retrieval, classification, triage), potentially with retrieval augmentation (RAG), browsing, plugins, and/or function or tool calling.
Autonomous or semi-autonomous agents — LLM-orchestrated systems that can select actions and invoke tools (execute code, call APIs, access enterprise data sources) and may operate with reduced continuous human oversight. ATLAS explicitly models agent tool invocation and agent configuration discovery and modification as techniques.
Prompt injection defined
ATLAS defines LLM Prompt Injection (AML.T0051) as adversary-crafted prompts that cause the LLM to ignore or override original instructions and act in unintended ways, potentially serving as an initial access vector to achieve a foothold that persists within an interactive session. OWASP’s definition is closely aligned: prompts that alter model behaviour in unintended ways, including non-human-visible injections, with a note that RAG and fine-tuning do not fully mitigate the vulnerability class.
The three sub-techniques
Direct injection (AML.T0051.000) — the attacker supplies malicious instructions as a direct user of the system.
Indirect injection (AML.T0051.001) — malicious instructions are embedded in external content the system ingests: web pages, documents, emails, database records, or tool outputs.
Triggered injection (AML.T0051.002) — malicious instructions that activate after a user action or system event, often paired with delayed-execution or activation-trigger discovery in agent workflows.
Related terms and threat-adjacent behaviours
A few related ATLAS techniques show up repeatedly throughout this report:
- Prompt infiltration and smuggling via public-facing apps (AML.T0093) — placing prompt payloads into channels the system ingests (shared docs, web search results, messages, third-party skills), bridging untrusted content into the LLM context.
- RAG poisoning (AML.T0070) — injecting malicious content into indexed retrieval sources so it surfaces for future queries and contaminates downstream context.
- Context and memory poisoning (AML.T0080) — persistence via poisoned agent memory or conversation thread state, potentially across sessions.
- System prompt extraction (AML.T0056) — extracting system prompts via injection or configuration access. OWASP warns that system prompts should not be treated as secrets or security controls and should not contain credentials.
- Tool invocation abuse (AML.T0053) — using access to an AI agent to invoke tools and thereby reach data or actions not directly accessible to the attacker.
- Chain-of-thought leakage and hijacking — reasoning traces can leak sensitive information (PII) even when final answers are sanitised, and long reasoning sequences can be exploited to weaken safety and refusal and enable jailbreak-style outcomes.
Assumptions
No deployment-specific details are assumed beyond the fact that an LLM is used, the system has at least one input channel, and the system may incorporate retrieval, plugins or tools, and memory. This is a risk assessment of realistic enterprise patterns, not a product vulnerability disclosure.
Mapping ATLAS Tactics and Techniques to Prompt Injection
The ATLAS dataset (v5.4.0) encodes a matrix with 16 tactics spanning reconnaissance through impact, with technique definitions and mitigations used for mapping and control selection. MITRE’s SAFE-AI report characterises ATLAS as a framework based on real-world attack observations and AI red team demonstrations, organised similarly to ATT&CK but focused on AI-enabled systems.
Tactic-to-scenario mapping
The table below highlights ATLAS techniques that most often appear in prompt-injection-driven chains — especially in RAG plus tool-using agent stacks — and how they map to practical scenarios.
| ATLAS Tactic | Technique (ID) | Why It Matters | Representative Scenarios |
|---|---|---|---|
| Reconnaissance / Discovery | Discover LLM System Information (AML.T0069) | Finds system prompts, special tokens/delimiters, and instruction keywords to craft higher-success payloads | Token and control-scheme discovery for agent jailbreak or prompt smuggling |
| Reconnaissance | Gather RAG-Indexed Targets (AML.T0064) | Identifies which sources are indexed, enabling targeted RAG poisoning | Targeting an enterprise index via email, docs, or repo content |
| Resource Development | LLM Prompt Crafting (AML.T0065) | Develops working malicious prompts tailored to system behaviour | Payload design for exfil, tool calls, or memory poisoning |
| Resource Development | LLM Prompt Obfuscation (AML.T0068) | Evades humans and filters via hidden text, encoding, or UI tricks | Hidden Unicode, tiny font, base64 or rot13 patterns |
| Initial Access | Prompt Infiltration via Public-Facing App (AML.T0093) | Gets malicious instructions into the context via shared docs, web, or messages | Shared Google Docs, web content ingestion, agent skills |
| Initial Access | Drive-by Compromise (AML.T0078) | Triggers ingestion of malicious content via browsing or previewing workflows | Auto-fetch summarisation, plugin retrieval |
| Execution | LLM Prompt Injection (AML.T0051 + subs) | The core mechanism: override intended behaviour, bypass guardrails, invoke privileged actions | Direct, indirect, and triggered injections across channels |
| Execution | AI Agent Tool Invocation (AML.T0053) | Turns an LLM compromise into a system compromise by invoking tools, APIs, or code execution | “Living off AI”: email, file I/O, terminals, enterprise APIs |
| Persistence | AI Agent Context Poisoning (AML.T0080) | Persists malicious instructions in memory or threads | Cross-session manipulation via poisoned memory |
| Persistence / Defence Evasion | Modify AI Agent Configuration (AML.T0081) | Alters agent policies, confirmations, or tool settings for durability | Disabling confirmations, changing rules files or configs |
| Credential Access / Collection | RAG Credential Harvesting (AML.T0082) | Uses RAG-access to retrieve secrets (API keys, tokens, creds) from indexed content | Prompting enterprise assistant to surface keys from private channels |
| Exfiltration | Extract LLM System Prompt (AML.T0056) | Reveals system prompt for targeted follow-on attacks | Meta-prompt extraction via direct prompting or config access |
| Exfiltration | Exfiltration via AI Agent Tool Invocation (AML.T0086) | Encodes sensitive data into tool parameters or outputs and exfiltrates via legitimate channels | Emailing results, posting to tickets, curl via shell tools |
| Impact | Data Destruction via AI Agent Tool Invocation (AML.T0101) | Converts agent tool abuse into destructive outcomes | Deleting files or resources, sabotaging workflows |
Why these mappings are stable. The OpenClaw investigation explicitly frames the agentic exploit class as abuses of trust, configuration, and autonomy — not purely low-level bugs — highlighting prompt injection plus tool invocation plus agent config modification as recurring, high-risk chains. Academic work on indirect prompt injection also emphasises that when retrieved prompts are processed as instructions, attackers can achieve outcomes analogous to code execution and can influence whether downstream APIs are called.
Threat Model Components
System model and trust boundaries
At a high level, prompt injection becomes feasible when the system constructs a single model context that mixes higher-priority instructions (system or developer prompts), user intent, and external content (retrieved documents, web pages, tool outputs, memory) — without robust provenance controls and deterministic enforcement outside the LLM.
The following diagram shows a conceptual entity-relationship view of a typical tool-using LLM application and where trust boundaries sit.
Figure 1 — Entity-relationship view of a tool-using LLM application, showing trust boundaries and adversary injection paths.
Adversary goals
In ATLAS terms, prompt injection is often a gateway to goals across Credential Access, Collection, Exfiltration, and Impact. Typical goals include:
- Data theft — user conversations, proprietary docs, credentials, API keys, system prompts, or sensitive enterprise records (often via RAG or tool access).
- Privilege amplification through tools — using the agent to call tools or APIs not normally accessible to the attacker, effectively “borrowing” the agent’s permissions.
- Persistence and long-lived manipulation — poisoning memory or threads, or modifying agent configuration so malicious behaviour triggers later.
- Integrity and availability damage — manipulating outputs, eroding model integrity, or destructive tool invocation.
Adversary capabilities and knowledge
Capabilities typically required, in increasing sophistication:
- Prompt engineering and iterative testing — prompt crafting, obfuscation, and delayed execution.
- Control-scheme discovery — special tokens, keywords, and system prompt structure to bypass guardrails or spoof control channels in agent frameworks.
- Placement capability for indirect injections — ability to modify or introduce content where the system retrieves it (web pages, shared docs, indexed channels, repositories).
- Supply chain manipulation — for agent plugins, skills, rules, or config packages.
Access vectors and prerequisites
Common access vectors include direct user input (chat UI or API), indirect channels (web pages, documents, emails, Slack or Teams channels, ticketing systems, knowledge bases), tool outputs and plugin responses treated as trusted context, and agent marketplace or shared configuration artifacts (skills, rules files).
For the chain to matter, three prerequisites typically hold: the application passes untrusted content into model context with insufficient provenance or segmentation; the system has excessive agency relative to safety controls and approval gates; and secrets or privileged details are stored in accessible channels (system prompts, RAG index, config files, memory).
Assets at risk
Key asset classes, aligned to ATLAS technique families:
- System prompts and internal configuration — often prerequisites for targeted injections.
- Credentials and tokens — in RAG sources, agent configuration files, or tool configs.
- Enterprise data accessible via tools or RAG — docs, CRM records, tickets.
- User privacy and conversations — conversation exfiltration and PII extraction.
- Availability and integrity — of systems affected by tools (file deletion, sabotage, fraudulent actions).
Concrete Attack Patterns and Kill Chains
Scenario overview
The following scenarios are selected to cover web apps, APIs, plugins, tool-using agents, chain-of-thought leakage, and data exfiltration, each with explicit mapping to ATLAS IDs.
| Scenario | Environment | Primary Goal | Key ATLAS Techniques |
|---|---|---|---|
| Web app injection to code execution | LLM web app that executes LLM-generated code | Execute commands / steal API keys | AML.T0051.000, AML.T0053, AML.T0093 |
| API-driven system prompt extraction | LLM inference API or internal service | Extract system prompt for follow-on attacks | AML.T0056, AML.T0069, AML.T0051 |
| Plugin/browsing indirect injection | Chat assistant with web retrieval or plugin execution | Exfiltrate conversation or history | AML.T0051.001, AML.T0078, AML.T0077, AML.T0053 |
| RAG poisoning + credential harvesting | Enterprise assistant indexing messages | Steal API key from private content | AML.T0070, AML.T0051.001, AML.T0082, AML.T0077 |
| Memory poisoning persistence | Assistant with cross-session memory | Persistently bias or steer future sessions | AML.T0080.000, AML.T0051.001, AML.T0068 |
| Supply chain injection via skills/rules | Agent marketplace / shared config | Execute tools or insert backdoors | AML.T0104, AML.T0051.000, AML.T0081 |
| Agent becomes command-and-control | Autonomous agent with browsing + shell tool | Persistent remote control | AML.T0051.001, AML.T0053, AML.T0081, AML.T0080.001 |
| Chain-of-thought leakage and hijacking | Reasoning model producing explicit traces | Leak PII or weaken refusal | Injection + exfil/impact outcomes |
Kill chain: Indirect injection through RAG leading to credential exfiltration
This pattern is explicitly demonstrated in ATLAS case studies where a malicious message is indexed into a RAG database and later retrieved and executed — leading to API key harvesting and exfiltration through rendered output.
Figure 2 — Kill chain for indirect prompt injection through RAG leading to credential exfiltration.
Kill chain: Agentic injection leading to tool invocation, persistence, and remote control
MITRE’s OpenClaw investigation and the corresponding ATLAS case study describe this as a key emerging pattern: indirect prompt injection plus tool invocation plus configuration modification produces a persistent “agent implant” capable of ongoing malicious activity.
Figure 3 — Kill chain for agentic prompt injection leading to persistent remote control.
Step-by-step scenario breakdowns
Web app prompt injection enabling code execution and secret theft. A documented exercise against an LLM-backed web app (MathGPT) shows how prompt injection can induce an app that executes LLM-generated code to run attacker-influenced code paths, read environment variables, and obtain API keys (with follow-on denial-of-service potential). The kill chain: the attacker iteratively probes prompt behaviour and crafts a payload that causes the model to output code aligned to attacker objectives, the application executes the generated code (this is the critical design hazard — the LLM output becomes an execution substrate), and the attacker coerces access to sensitive runtime data or triggers resource exhaustion.
API-driven system prompt extraction for targeted follow-on attacks. ATLAS notes adversaries may attempt to extract system prompts via prompt injection or by obtaining configuration files. Research on prompt extraction reinforces that simple text-based attacks can reveal prompts with high probability across many models, and prompt extraction from real systems suggests that defences are often insufficient. The attacker uses an inference API and performs model behaviour discovery, then attempts system prompt extraction with obfuscated reformulations and iterative refinement. Extracted content is used to design higher-success indirect injections using compatible delimiters, tool-call formats, and refusal bypass patterns.
Plugin and browsing indirect injection to exfiltrate private conversation. An ATLAS case study demonstrates that if a chat system retrieves a malicious webpage via a plugin, the injected content can cause the assistant to output a crafted element that exfiltrates the private conversation when the client fetches it — and can also induce additional plugin actions. The attacker publishes malicious content in a location likely to be fetched, the user triggers retrieval, indirect injection executes inside the user’s session context (the attacker’s instructions run with the user’s credentials), and exfiltration occurs through output rendering mechanics.
RAG poisoning and credential harvesting in an enterprise assistant. A case study shows how an attacker can seed a malicious payload into an indexed channel so that later user queries retrieve it. The payload drives the assistant to retrieve an API key from a private channel and present it as a clickable lure that transmits the secret to the attacker.
Memory poisoning for persistence across sessions. An ATLAS case study demonstrates that hidden prompt injection in shared content can lead to memory store poisoning, so future chats incorporate attacker-inserted “facts” or instructions — even after the initial session ends. The adversary hides injection content via obfuscation (invisible text) inside a shared resource, the system ingests it, indirect injection executes and writes to memory store, and future sessions inherit the poisoned memory unless hardening and remediation processes exist.
Supply chain prompt injection via skills, rules, and config packages. ATLAS documents scalable supply chain patterns where rules files or agent skills contain hidden or obfuscated prompt payloads that manipulate coding assistants or agents to insert backdoors or execute commands. Where the artifact is treated as trusted configuration, the payload executes as a system-level instruction, and the agent invokes tools or generates compromised outputs with downstream impact.
Agent becomes command-and-control via prompt injection. An ATLAS case study describes a chain where a malicious webpage’s indirect prompt injection abuses control tokens to invoke an unrestricted execution tool. A downloaded script then plants persistent malicious instructions into future system prompts, enabling remote command issuance. The result is an “agent implant” that continues communicating with attacker infrastructure.
Chain-of-thought leakage and hijacking in reasoning models. Two distinct but related risks: intermediate reasoning can leak PII even when final answers are sanitised, and long reasoning sequences can be exploited to weaken safety signals and achieve high attack success rates across multiple reasoning models. The key takeaway for threat modelling is that any system that logs, displays, or stores reasoning traces increases the sensitive-information attack surface and may create new safety bypass avenues.
Detection and Monitoring Strategies
What to log and why
ATLAS mitigation AI Telemetry Logging (AML.M0024) recommends logging inputs and outputs of deployed models and, for agents, the intermediate steps of agentic actions, tool use, and decisions. This is particularly critical because prompt injection can be invisible or obfuscated (tiny font, hidden Unicode, encoding).
A monitoring baseline for prompt injection threat detection should include:
- Prompt provenance metadata — user input versus retrieved content versus tool output versus memory (source IDs, trust levels, timestamps). Provenance signalling is a core idea behind spotlighting-style defences to help separate instructions from untrusted data.
- Tool invocation logs — tool name, parameters (with sensitive redaction), caller identity, permission context, and whether untrusted data was present in the context at invocation time.
- Output safety signals — DLP findings (PII and keys), suspicious formatting (URLs with long query strings), embedded payload patterns, and prompt leakage patterns.
Detection analytics and alert content
High-yield detections tend to be behavioural rather than purely signature-based, because attackers can obfuscate payload strings. Recommended detection rules (generic, not product-specific):
- Context boundary violations — untrusted retrieved content contains instruction-like patterns and coincides with changes in tool invocation frequency or privilege.
- Suspicious tool-call sequences — browsing → parsing → shell/email/ticket-posting within the same session, or tool calls that include unusually encoded or long parameters (possible exfil payload).
- Prompt leakage anomalies — outputs with large verbatim blocks matching system prompt templates, references to hidden policies, or repeated internal configuration content.
- RAG targeted-retrieval patterns — repeated queries that look like credential hunts, or sudden retrieval of secrets from private channels.
Incident response considerations
Because prompt injection can persist through memory and context poisoning, detection must include retroactive analysis: which sources entered context, what was written to memory or config, and which downstream actions were taken.
Mitigations, Hardening Controls, and ATLAS Mapping
Control families
The mitigation landscape can be modelled as layered controls, with ATLAS mitigations providing a structured anchor.
Input validation and output validation. ATLAS mitigation Input and Output Validation for AI Agent Components (AML.M0033) calls for validation of inputs and outputs for tools and data sources used by agents — enforcing schemas and preventing unsafe agentic workflows.
Sandboxing and segmentation. ATLAS mitigation Segmentation of AI Agent Components (AML.M0032) recommends defining security boundaries around tools and data sources using API access controls, container isolation, code execution sandboxing, and rate limiting of tool invocation.
Instruction-level controls (guardrails, guidelines, alignment). ATLAS mitigation Generative AI Guardrails (AML.M0020) describes guardrails as validators, filters, rules, and classifiers that evaluate prompt and response safety, including domains such as jailbreaks, code exploits, and leakage. Generative AI Model Alignment (AML.M0022) highlights that fine-tuning can remove safety mechanisms, and that supervised fine-tuning, RLHF/RLAIF, and targeted safety distillation can improve alignment against unsafe prompts and responses.
Provenance and supply chain controls. ATLAS mitigation AI Bill of Materials (AML.M0023) supports supply chain risk mitigation by listing artifacts and resources used to build the AI, enabling rapid response to vulnerabilities.
Rate limits and access control. ATLAS includes Restrict Number of AI Model Queries (AML.M0004) for limiting total and rate of queries to hinder attack iteration, and access-control mitigations for production model access.
System prompt protections (realistic posture). OWASP explicitly states system prompts should not be treated as secrets or security controls and should not contain credentials. ATLAS documents prompt extraction as a dedicated exfiltration technique.
Tool access controls. ATLAS includes several agent permission configuration mitigations: Privileged AI Agent Permissions Configuration (AML.M0026) (RBAC/ABAC and least privilege for privileged agents), Single-User AI Agent Permissions Configuration (AML.M0027) (agents inherit user permissions and lifecycle management), and AI Agent Tools Permissions Configuration (AML.M0028) (delegated access; tool permissions inherited from invoking agent or user; applicable to MCP-style tool servers).
Technique-to-mitigation mapping
| ATLAS Technique | Key ATLAS Mitigations | Engineering Interpretation |
|---|---|---|
| LLM Prompt Injection (AML.T0051) | Guardrails (M0020); Guidelines (M0021); Alignment (M0022); Telemetry (M0024); I/O Validation (M0033) | Multi-layer prompt+output filters; strict schema validation; robust logging; improve model resistance via alignment where appropriate |
| Extract LLM System Prompt (AML.T0056) | Guardrails (M0020); Guidelines (M0021); Alignment (M0022) | Assume prompt leakage; prevent secrets in prompts; add output-side leakage detection; consider architectures that don’t expose system text directly |
| AI Agent Tool Invocation (AML.T0053) | Privileged Permissions (M0026); Single-User Permissions (M0027); Tools Permissions (M0028); Restrict on Untrusted Data (M0030); Segmentation (M0032); Human-in-the-loop (M0029); Telemetry (M0024) | Least privilege + delegated auth; approval gates for high-risk actions; deny or require confirmation if untrusted content in context; sandbox tools; log every tool call |
| Exfiltration via Tool Invocation (AML.T0086) | Restrict on Untrusted Data (M0030); Segmentation (M0032); Telemetry (M0024); Permissions (M0026/27/28) | Block auto-write tools when untrusted context present; confine write-path tools; monitor for exfil patterns in tool params; enforce delegated permissions |
| AI Agent Context Poisoning (AML.T0080) | Memory Hardening (M0031); Segmentation (M0032); Telemetry (M0024) | Treat memory as a controlled datastore with trust levels, validation, remediation, and provenance; prevent untrusted sources from writing durable memory without review |
| RAG Poisoning / Credential Harvesting (AML.T0070 / AML.T0082) | Permissions (M0026/27); Segmentation (M0032); Telemetry (M0024) | Restrict retrieval scopes; keep secrets out of indexed corpora; monitor retrieval queries and sensitive outputs; isolate high-sensitivity document stores |
| LLM Prompt Obfuscation (AML.T0068) | Guardrails (M0020); Telemetry (M0024) | Detect hidden/encoded instruction patterns; normalise inputs (strip invisible chars); log raw+normalised representations for forensics |
| Data Destruction via Tool Invocation (AML.T0101) | Restrict on Untrusted Data (M0030); Permissions (M0026/27/28); Human-in-the-loop (M0029) | Always gate destructive actions; require human confirmation; enforce tool-level allow/deny lists; isolate destructive tools in hardened sandboxes |
Design hardening patterns
The following patterns are consistently supported by the above mappings and by the literature on indirect prompt injection and provenance marking:
- Provenance-aware context assembly (“separate code from data”) — implement explicit boundaries and provenance signals for external content (delimiting, encoding, or marking retrieved data) to reduce the chance of it being interpreted as instruction. Spotlighting-style techniques report large reductions in attack success rate with minimal task impact in experiments.
- Tool firewall and policy engine — treat tool calls as untrusted requests requiring deterministic authorisation, not as model outputs to execute. OWASP stresses that security controls (authZ, privilege bounds) must not be delegated to the LLM.
- “Untrusted-context mode” — when any untrusted retrieved content enters context, automatically degrade agent capabilities: block high-impact tools, require confirmation, and shorten memory writes. This aligns directly with AML.M0030.
- Secrets governance for RAG and prompts — prevent credentials from appearing in RAG-indexed sources, prevent secrets in system prompts, and assume prompt extraction is feasible.
Evaluation, Testing, Residual Risks, and Roadmap
Evaluation and testing methods
Threat-model-based red teaming. Use ATLAS technique IDs as a test-plan backbone: for each high-risk workflow, design adversary simulations that attempt AML.T0051 (direct/indirect/triggered), tool invocation abuse, RAG poisoning, memory poisoning, and exfiltration via tools and output rendering. MITRE’s SAFE-AI framing of security control assessment for AI-enabled systems supports this approach: select controls that address AI-specific threats, not only traditional IT threats.
Unit tests and regression tests. Create deterministic unit tests around context construction (ensuring untrusted segments are marked and segmented correctly), tool policy enforcement (ensuring tool calls cannot execute without policy checks and required approvals), and memory writes and retrieval validation (preventing persistent poisoning).
Metrics and benchmarks. Use dual-axis evaluation: effectiveness against adaptive prompt injection attacks and diverse prompts, and general-purpose utility to quantify trade-offs (latency, refusal rates, task success). Where applicable, adopt metrics from the research ecosystem — attack success rate (ASR) is commonly used in prompt-injection defence evaluation, SPE-LLM proposes evaluation metrics and benchmark-driven validation for system prompt extraction, and Imprompter-style evaluations focus on end-to-end confidentiality and integrity compromise via improper tool use.
Operational validation. NIST AI RMF emphasises applying risk management functions (govern/map/measure/manage) in a lifecycle manner. For prompt injection, this encourages continuous measurement and iterative control improvement rather than one-time “prompt patching.”
Residual risk and trade-offs
Residual risk remains even after strong controls, for well-documented reasons:
- No foolproof prevention — OWASP explicitly notes the stochastic nature of generative AI and the lack of guaranteed prevention; systems should instead mitigate impact.
- Defence efficacy versus utility — strong guardrails and heavy filtering can degrade helpfulness and increase false positives, and academic evaluation stresses measuring both dimensions.
- Agentic power concentrates risk — as workflows add browsing, plugins, and tools, indirect prompt injection becomes more severe because attacker instructions can run in the user’s credential context and drive real actions.
- Persistence surfaces (memory, shared threads, config) mean that containment must include remediation processes, not only input filters.
- Reasoning trace exposure can create additional leakage and safety-bypass risk in reasoning models.
Prioritised implementation roadmap
The roadmap below is deployment-agnostic and prioritised by expected risk reduction against documented ATLAS chains, feasibility, and blast-radius containment.
Figure 4 — Three-phase implementation roadmap prioritised by risk reduction, feasibility, and blast-radius containment.
Conclusion
Prompt injection is not a single vulnerability — it is an attack primitive that enables entire kill chains. MITRE ATLAS provides the structured, adversary-centric vocabulary we need to reason about these threats systematically, from the reconnaissance and resource development phases through execution, persistence, and impact.
The most important takeaway from this threat model is that no single defence layer is sufficient. The recurring patterns documented in ATLAS case studies — prompt injection chained with tool invocation, configuration modification, and memory poisoning — demonstrate that attackers will combine traditional and AI-specific techniques in the same campaign. Effective defence requires constraining agency and tool impact, instrumenting everything, hardening prompt and context handling, and eliminating secrets from the model’s reach.
For teams getting started, the immediate hardening actions (telemetry logging, tool permission controls, untrusted-context mode, and removing secrets from prompts) deliver the highest risk reduction per unit of effort. For teams already down that path, investing in provenance-aware context assembly, continuous red teaming mapped to ATLAS technique IDs, and dual-axis evaluation metrics (effectiveness plus utility) will build durable, measurable resilience.
See also: MITRE ATLAS Deep Dive: Threat Intelligence for AI Systems in 2026 for a comprehensive walkthrough of the full ATLAS framework, and OWASP Top 10 for LLM Applications: An In-Depth Guide for 2026 for a complementary vulnerability-focused perspective.