The growing deployment of LLM-based agents that interact with external environments has created new attack surfaces for adversarial manipulation. While prior work on indirect prompt injection has primarily focused on plain-text payloads, we identify a significant yet underexplored vulnerability: instruction-tuned LLMs' dependence on structured chat templates and their susceptibility to persuasive multi-turn dialogues. We introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates — exploiting the model's learned role hierarchy — and a Multi-turn variant that embeds a simulated persuasive conversation inside a single tool output. Across frontier LLMs, ChatInject raises average ASR from 5.18% → 32.05% on AgentDojo and 15.13% → 45.90% on InjecAgent (up to 52.33% for the multi-turn variant); template-based payloads transfer strongly across models — including closed-source targets — and remain robust to existing prompt-based defenses and template-stripping perturbations.
An example of indirect prompt injection in an LLM agent.
A malicious instruction embedded in a tool response hijacks the agent and triggers an unintended update_password call.
While Default InjecPrompt (plain-text) often fails, a payload wrapped in the model's native chat template (ChatInject) is reinterpreted as a higher-priority instruction.
Existing indirect prompt injection attacks — both hand-crafted and automated — operate almost exclusively at the plain-text level. This overlooks two structural vulnerabilities introduced by how modern instruction-tuned LLMs are actually trained and deployed. Addressing them motivates ChatInject and its Multi-turn variant.
To resist prompt injection, LLMs are increasingly trained to enforce an instruction hierarchy (system > user > assistant > tool), implemented with special role tokens such as <|system|> or <|user|>.
However, this token-based segmentation is itself a new attack surface: an attacker who embeds these same role tokens inside a low-priority tool output can forge a higher-priority role.
The model, having learned to trust these delimiters, promotes the attacker's payload to an authoritative instruction — effectively bypassing the intended hierarchy.
Jailbreak research has shown LLMs are highly vulnerable to gradual persuasion across multiple turns. But indirect prompt injection is a strictly one-shot channel — the attacker writes a single tool response and cannot interactively argue with the agent.
Because LLMs rely on role tokens to reconstruct conversational state, an attacker can abuse the same mechanism to embed a simulated multi-turn dialogue in a single payload: forged <|user|> / <|assistant|> turns first establish a benign rationale, then naturally introduce the malicious instruction as a "next step".
This turns one-shot injection into virtual persuasive dialogue.
Given an attacker instruction Ia or a crafted persuasive dialogue Ca that embeds it, we apply a template function Ttype to form four payload variants combining plain vs. template formatting with single-instruction vs. multi-turn content.
The four payload variants generated by combining (plain / model-template) × (single-instruction / multi-turn).
Evaluated on AgentDojo and InjecAgent across 9 frontier LLMs (6 open-source, 3 closed-source). Metrics: Attack Success Rate (ASR, ↑) and Utility under Attack (↑).
Colored deltas are relative to Default InjecPrompt. "+ think" / "+ tool" denote the reasoning and tool-calling hooks built on InjecPrompt + ChatInject. Best ASR per row is bold. Shaded columns are the default (plain-text) baselines.
| Metric | Model | InjecPrompt | Multi-turn | ||||
|---|---|---|---|---|---|---|---|
| default | ChatInject | + think | + tool | default | ChatInject | ||
| InjecAgent | |||||||
| ASR ↑ | Qwen-3 | 8.5 | 39.4 (+30.9) | 40.1 (+31.6) | 42.1 (+33.6) | 10.7 | 65.9 (+55.2) |
| GPT-oss | 0.0 | 14.2 (+14.2) | 16.7 (+16.7) | 19.1 (+19.1) | 0.1 | 16.9 (+16.8) | |
| Llama-4 | 50.1 | 79.4 (+29.3) | — | 88.3 (+38.2) | 16.6 | 88.3 (+71.7) | |
| GLM-4.5 | 0.0 | 57.3 (+57.3) | 69.3 (+69.3) | 72.2 (+72.2) | 0.1 | 71.5 (+71.4) | |
| Kimi-K2 | 15.7 | 67.4 (+51.7) | — | 72.2 (+56.5) | 17.2 | 61.0 (+43.8) | |
| Grok-2 | 16.5 | 17.7 (+1.2) | — | — | 1.6 | 10.4 (+8.8) | |
| AgentDojo | |||||||
| ASR ↑ | Qwen-3 | 17.5 | 54.8 (+37.3) | 66.1 (+48.6) | 69.4 (+51.9) | 60.9 | 80.5 (+19.6) |
| GPT-oss | 0.3 | 51.4 (+51.1) | 48.6 (+48.3) | 47.4 (+47.1) | 3.6 | 55.5 (+51.9) | |
| Llama-4 | 1.0 | 17.2 (+16.2) | — | 19.8 (+18.8) | 1.8 | 11.1 (+9.3) | |
| GLM-4.5 | 0.3 | 20.3 (+20.0) | 24.8 (+24.5) | 36.0 (+35.7) | 17.5 | 48.1 (+30.6) | |
| Kimi-K2 | 5.9 | 29.3 (+23.4) | — | 44.2 (+38.3) | 12.3 | 13.9 (+1.6) | |
| Grok-2 | 6.1 | 19.3 (+13.2) | — | — | 23.7 | 24.7 (+1.0) | |
| Utility ↑ | Qwen-3 | 50.9 | 28.3 (-22.6) | 24.4 (-26.5) | 22.9 (-28.0) | 52.4 | 27.5 (-24.9) |
| GPT-oss | 19.6 | 18.8 (-0.8) | 11.1 (-8.5) | 9.0 (-10.6) | 38.3 | 8.0 (-30.3) | |
| Llama-4 | 16.5 | 15.9 (-0.6) | — | 14.7 (-1.8) | 18.5 | 16.2 (-2.3) | |
| GLM-4.5 | 78.4 | 67.9 (-10.5) | 65.7 (-12.7) | 68.1 (-10.3) | 75.8 | 67.9 (-7.9) | |
| Kimi-K2 | 71.5 | 35.0 (-36.5) | — | 35.2 (-36.3) | 72.0 | 69.9 (-2.1) | |
| Grok-2 | 41.7 | 29.8 (-11.9) | — | — | 33.9 | 31.9 (-2.0) | |
Key takeaways:
Rows = target model; columns = injected template. † denotes closed-source LLMs. Yellow marks self-template (or family-aligned) cases; bold is the best ASR per row; pink shows row/column averages.
| Target Model | Injected Template | Avg. | |||||||
|---|---|---|---|---|---|---|---|---|---|
| default | Qwen-3 | GPT-oss | Llama-4 | GLM-4.5 | Kimi-K2 | Grok-2 | Gemma-3 | ||
| InjecAgent — Open-source Targets | |||||||||
| Qwen-3 | 8.6 | 39.4 | 3.0 | 4.1 | 3.2 | 35.8 | 3.1 | 11.3 | 13.6 |
| GPT-oss | 0.2 | 0.1 | 14.1 | 0.2 | 0.0 | 0.4 | 0.1 | 0.5 | 2.0 |
| Llama-4 | 50.1 | 22.2 | 23.8 | 79.3 | 14.0 | 31.7 | 17.1 | 40.5 | 34.8 |
| GLM-4.5 | 0.0 | 0.2 | 0.3 | 0.1 | 57.2 | 0.0 | 0.1 | 0.1 | 7.3 |
| Kimi-K2 | 15.6 | 53.7 | 13.9 | 40.4 | 9.7 | 67.3 | 14.7 | 24.2 | 29.9 |
| Grok-2 | 16.4 | 12.8 | 7.8 | 3.6 | 1.1 | 6.1 | 16.6 | — | 9.2 |
| Avg. | 15.2 | 21.4 | 10.5 | 21.3 | 14.2 | 23.5 | 8.6 | 15.3 | — |
| InjecAgent — Closed-source Targets | |||||||||
| GPT-4o† | 9.6 | 31.7 | 23.6 | 3.2 | 2.3 | 22.9 | 0.7 | 3.9 | 12.2 |
| Grok-3† | 2.3 | 29.8 | 7.5 | 8.8 | 2.4 | 21.7 | 19.7 | 50.9 | 17.9 |
| Gemini-pro† | 1.4 | 27.4 | 14.3 | 6.8 | 7.8 | 14.5 | 9.9 | 20.2 | 12.8 |
| Avg. | 4.4 | 29.6 | 15.1 | 6.3 | 4.2 | 19.7 | 10.1 | 25.0 | — |
| AgentDojo — Open-source Targets | |||||||||
| Qwen-3 | 17.5 | 54.8 | 36.0 | 27.3 | 15.4 | 47.0 | 19.2 | 21.3 | 29.8 |
| GPT-oss | 0.3 | 10.8 | 51.4 | 0.5 | 0.0 | 6.7 | 0.0 | 6.4 | 9.5 |
| Llama-4 | 1.0 | 11.6 | 9.5 | 19.0 | 3.9 | 7.7 | 4.1 | 7.5 | 8.0 |
| GLM-4.5 | 0.3 | 1.3 | 1.3 | 3.3 | 20.3 | 1.5 | 0.5 | — | 4.1 |
| Kimi-K2 | 5.9 | 15.5 | 8.7 | 10.0 | 3.9 | 29.3 | 3.1 | 6.2 | 10.3 |
| Grok-2 | 6.2 | 6.7 | 1.0 | 1.5 | 0.5 | 2.6 | 19.3 | — | 5.4 |
| Avg. | 5.2 | 16.8 | 18.0 | 10.3 | 7.3 | 15.8 | 7.7 | 10.4 | 11.4 |
| AgentDojo — Closed-source Targets | |||||||||
| GPT-4o† | 6.4 | 27.3 | 40.1 | 9.8 | 5.4 | 31.4 | 2.6 | 7.2 | 16.3 |
| Grok-3† | 8.2 | 33.2 | 10.8 | 19.5 | 19.0 | 22.6 | 37.0 | 30.3 | 22.6 |
| Gemini-pro† | 8.2 | 10.1 | 2.6 | 1.3 | 2.1 | 7.3 | 1.5 | 10.3 | 5.4 |
| Avg. | 7.6 | 23.5 | 17.8 | 10.2 | 8.8 | 20.4 | 13.7 | 15.9 | 14.8 |
Key takeaways:
Higher template similarity → higher ASR; Utility mirrors the opposite trend.
MoT matches the best single-template result with far lower variance across seeds.
In practice the attacker may not know which LLM powers the target agent. We propose Mixture-of-Templates (MoT), which concatenates a random permutation of all candidate templates so that the target inevitably encounters its native wrapper. Across Qwen-3, GPT-oss, and Llama-4, MoT consistently exceeds Default InjecPrompt and, unlike picking one foreign template at random, exhibits substantially lower variance — making it a reliable attack in the unknown-backbone setting.
ASR and Utility under five standard defenses for Qwen-3 and Grok-3.
A natural countermeasure is format stripping — parsing out known role tags to degrade the payload to plain text. We show this is easily defeated by light character-level perturbations (10% remove / replace / insert) of the template wrapper before injection. All perturbed variants still outperform Default InjecPrompt and Default Multi-turn, indicating that deterministic rule-based filters are insufficient.
Perturbed ChatInject / MoT / Multi-turn + ChatInject consistently beat plain-text baselines.
@inproceedings{
chang2026chatinject,
title={ChatInject: Abusing Chat Templates for Prompt Injection in {LLM} Agents},
author={Hwan Chang and Yonghyun Jun and Hwanhee Lee},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=WVhgFSKniL}
}