ChatInject: Abusing Chat Templates
for Prompt Injection in LLM Agents

Chung-Ang University, Seoul, Korea
*Equal contribution   Corresponding author

Abstract

The growing deployment of LLM-based agents that interact with external environments has created new attack surfaces for adversarial manipulation. While prior work on indirect prompt injection has primarily focused on plain-text payloads, we identify a significant yet underexplored vulnerability: instruction-tuned LLMs' dependence on structured chat templates and their susceptibility to persuasive multi-turn dialogues. We introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates — exploiting the model's learned role hierarchy — and a Multi-turn variant that embeds a simulated persuasive conversation inside a single tool output. Across frontier LLMs, ChatInject raises average ASR from 5.18% → 32.05% on AgentDojo and 15.13% → 45.90% on InjecAgent (up to 52.33% for the multi-turn variant); template-based payloads transfer strongly across models — including closed-source targets — and remain robust to existing prompt-based defenses and template-stripping perturbations.

Motivation

ChatInject Overview

An example of indirect prompt injection in an LLM agent. A malicious instruction embedded in a tool response hijacks the agent and triggers an unintended update_password call. While Default InjecPrompt (plain-text) often fails, a payload wrapped in the model's native chat template (ChatInject) is reinterpreted as a higher-priority instruction.

Existing indirect prompt injection attacks — both hand-crafted and automated — operate almost exclusively at the plain-text level. This overlooks two structural vulnerabilities introduced by how modern instruction-tuned LLMs are actually trained and deployed. Addressing them motivates ChatInject and its Multi-turn variant.

1 Role-Based Chat Template Hierarchies Can Be Forged

To resist prompt injection, LLMs are increasingly trained to enforce an instruction hierarchy (system > user > assistant > tool), implemented with special role tokens such as <|system|> or <|user|>. However, this token-based segmentation is itself a new attack surface: an attacker who embeds these same role tokens inside a low-priority tool output can forge a higher-priority role. The model, having learned to trust these delimiters, promotes the attacker's payload to an authoritative instruction — effectively bypassing the intended hierarchy.

2 Indirect Injection Inherits Multi-Turn Persuasion

Jailbreak research has shown LLMs are highly vulnerable to gradual persuasion across multiple turns. But indirect prompt injection is a strictly one-shot channel — the attacker writes a single tool response and cannot interactively argue with the agent. Because LLMs rely on role tokens to reconstruct conversational state, an attacker can abuse the same mechanism to embed a simulated multi-turn dialogue in a single payload: forged <|user|> / <|assistant|> turns first establish a benign rationale, then naturally introduce the malicious instruction as a "next step". This turns one-shot injection into virtual persuasive dialogue.

ChatInject: Payload Construction

Given an attacker instruction Ia or a crafted persuasive dialogue Ca that embeds it, we apply a template function Ttype to form four payload variants combining plain vs. template formatting with single-instruction vs. multi-turn content.

Four ChatInject payload variants

The four payload variants generated by combining (plain / model-template) × (single-instruction / multi-turn).

A
Default InjecPrompt  Tplain(Ia)
Attention-grabbing prefix + malicious instruction as plain text. The standard baseline.
B
InjecPrompt + ChatInject  Tmodel(Ia)
Same content, wrapped in the target model's native role tags (system/user), forging a higher-priority role.
C
Default Multi-turn  Tplain(Ca)
A 7-turn persuasive dialogue (synthesized by GPT-4.1 and manually reviewed) embedded as plain text.
D
Multi-turn + ChatInject  Tmodel(Ca)
Every turn of the persuasive dialogue is wrapped in the model's role tags — the most sophisticated variant, combining role-hierarchy abuse with contextual priming.

Experiments

Evaluated on AgentDojo and InjecAgent across 9 frontier LLMs (6 open-source, 3 closed-source). Metrics: Attack Success Rate (ASR, ↑) and Utility under Attack (↑).

Table 1. Main Results on InjecAgent and AgentDojo

Colored deltas are relative to Default InjecPrompt. "+ think" / "+ tool" denote the reasoning and tool-calling hooks built on InjecPrompt + ChatInject. Best ASR per row is bold. Shaded columns are the default (plain-text) baselines.

Metric Model InjecPrompt Multi-turn
default ChatInject + think + tool default ChatInject
InjecAgent
ASR ↑ Qwen-3 8.5 39.4 (+30.9) 40.1 (+31.6) 42.1 (+33.6) 10.7 65.9 (+55.2)
GPT-oss 0.0 14.2 (+14.2) 16.7 (+16.7) 19.1 (+19.1) 0.1 16.9 (+16.8)
Llama-4 50.1 79.4 (+29.3) 88.3 (+38.2) 16.6 88.3 (+71.7)
GLM-4.5 0.0 57.3 (+57.3) 69.3 (+69.3) 72.2 (+72.2) 0.1 71.5 (+71.4)
Kimi-K2 15.7 67.4 (+51.7) 72.2 (+56.5) 17.2 61.0 (+43.8)
Grok-2 16.5 17.7 (+1.2) 1.6 10.4 (+8.8)
AgentDojo
ASR ↑ Qwen-3 17.5 54.8 (+37.3) 66.1 (+48.6) 69.4 (+51.9) 60.9 80.5 (+19.6)
GPT-oss 0.3 51.4 (+51.1) 48.6 (+48.3) 47.4 (+47.1) 3.6 55.5 (+51.9)
Llama-4 1.0 17.2 (+16.2) 19.8 (+18.8) 1.8 11.1 (+9.3)
GLM-4.5 0.3 20.3 (+20.0) 24.8 (+24.5) 36.0 (+35.7) 17.5 48.1 (+30.6)
Kimi-K2 5.9 29.3 (+23.4) 44.2 (+38.3) 12.3 13.9 (+1.6)
Grok-2 6.1 19.3 (+13.2) 23.7 24.7 (+1.0)
Utility ↑ Qwen-3 50.9 28.3 (-22.6) 24.4 (-26.5) 22.9 (-28.0) 52.4 27.5 (-24.9)
GPT-oss 19.6 18.8 (-0.8) 11.1 (-8.5) 9.0 (-10.6) 38.3 8.0 (-30.3)
Llama-4 16.5 15.9 (-0.6) 14.7 (-1.8) 18.5 16.2 (-2.3)
GLM-4.5 78.4 67.9 (-10.5) 65.7 (-12.7) 68.1 (-10.3) 75.8 67.9 (-7.9)
Kimi-K2 71.5 35.0 (-36.5) 35.2 (-36.3) 72.0 69.9 (-2.1)
Grok-2 41.7 29.8 (-11.9) 33.9 31.9 (-2.0)

Key takeaways:

  • ChatInject strengthens every payload. On both benchmarks and across all models, wrapping the payload in the model's native template consistently raises ASR over Default InjecPrompt and Default Multi-turn — confirming that LLMs re-interpret template-wrapped payloads as higher-priority instructions.
  • Multi-turn + ChatInject shows strong synergy. Plain-text multi-turn alone yields only a modest gain, but combined with ChatInject the ASR rises sharply (e.g., GLM-4.5 0.1→71.5 on InjecAgent; Qwen-3 60.9→80.5 on AgentDojo).
  • Reasoning / tool-calling hooks amplify the attack further. The agentic variants (+ think, + tool) increase ASR and reduce Utility relative to InjecPrompt + ChatInject, with the tool-calling hook producing the largest swings.
  • Effectiveness tracks template structure. Models with explicit role delimiters (Qwen-3, GLM-4.5) show the largest ASR increases; Grok-2, whose template lacks strong delimiters, is the most robust.
  • Utility systematically drops as ASR rises on AgentDojo, confirming that the attacker payload diverts the agent away from the original user instruction.

Table 2. Cross-Model Template Transferability (ASR)

Rows = target model; columns = injected template. denotes closed-source LLMs. Yellow marks self-template (or family-aligned) cases; bold is the best ASR per row; pink shows row/column averages.

Target Model Injected Template Avg.
default Qwen-3 GPT-oss Llama-4 GLM-4.5 Kimi-K2 Grok-2 Gemma-3
InjecAgent — Open-source Targets
Qwen-3 8.6 39.4 3.04.13.235.83.111.3 13.6
GPT-oss 0.2 0.1 14.1 0.20.00.40.10.5 2.0
Llama-4 50.1 22.223.8 79.3 14.031.717.140.5 34.8
GLM-4.5 0.0 0.20.30.1 57.2 0.00.10.1 7.3
Kimi-K2 15.6 53.713.940.49.7 67.3 14.724.2 29.9
Grok-2 16.4 12.87.83.61.16.1 16.6 9.2
Avg. 15.221.410.521.314.223.58.615.3
InjecAgent — Closed-source Targets
GPT-4o 9.6 31.7 23.6 3.22.322.90.73.9 12.2
Grok-3 2.3 29.87.58.82.421.7 19.7 50.9 17.9
Gemini-pro 1.4 27.414.36.87.814.59.9 20.2 12.8
Avg. 4.429.615.16.34.219.710.125.0
AgentDojo — Open-source Targets
Qwen-3 17.5 54.8 36.027.315.447.019.221.3 29.8
GPT-oss 0.3 10.8 51.4 0.50.06.70.06.4 9.5
Llama-4 1.0 11.69.5 19.0 3.97.74.17.5 8.0
GLM-4.5 0.3 1.31.33.3 20.3 1.50.5 4.1
Kimi-K2 5.9 15.58.710.03.9 29.3 3.16.2 10.3
Grok-2 6.2 6.71.01.50.52.6 19.3 5.4
Avg. 5.216.818.010.37.315.87.710.4 11.4
AgentDojo — Closed-source Targets
GPT-4o 6.4 27.3 40.1 9.85.431.42.67.2 16.3
Grok-3 8.2 33.210.819.519.022.6 37.0 30.3 22.6
Gemini-pro 8.2 10.12.61.32.17.31.5 10.3 5.4
Avg. 7.623.517.810.28.820.413.715.9 14.8

Key takeaways:

  • Open → Open. On InjecAgent, foreign templates often fall below the plain-text baseline, but on the more realistic AgentDojo pipeline they frequently exceed it — foreign templates remain a credible threat.
  • Qwen-3 / Kimi-K2 mutual transfer. Their templates are the most similar in embedding space and indeed transfer most strongly to each other; Grok-2 is the opposite — dissimilar and robust in both directions.
  • Open → Closed transfer is alarming. Even without access to their proprietary templates, wrapping payloads in OS templates pushes ASR well above the default on GPT-4o, Grok-3, and Gemini-pro. Family-aligned transfers are especially potent: GPT-oss → GPT-4o (40.1 on AgentDojo), Grok-2 → Grok-3 (37.0), Gemma-3 → Gemini-pro (10.3 — the best for that target).
  • Qwen-3 template is a broadly strong attacker: 29.6% avg ASR on InjecAgent-CS and 23.5% on AgentDojo-CS.

Cross-Model Transferability

  • Similarity predicts transfer: the closer the injected template is to the target's native template (by embedding cosine similarity), the higher the resulting ASR.
  • Closed-source models are not immune: even without knowing their true templates, injecting open-source templates raises ASR above the plain-text baseline on GPT-4o, Grok-3, and Gemini-2.5-Pro. Family-aligned transfers (GPT-oss → GPT-4o, Grok-2 → Grok-3, Gemma-3 → Gemini-pro) are especially effective.
  • Qwen-3 template transfers broadly: ~29.6% avg ASR on InjecAgent and ~23.5% on AgentDojo against closed-source targets.
Template similarity vs ASR

Higher template similarity → higher ASR; Utility mirrors the opposite trend.

Mixture-of-Templates: Attacking Unknown Agents

Mixture-of-Templates ASR

MoT matches the best single-template result with far lower variance across seeds.

In practice the attacker may not know which LLM powers the target agent. We propose Mixture-of-Templates (MoT), which concatenates a random permutation of all candidate templates so that the target inevitably encounters its native wrapper. Across Qwen-3, GPT-oss, and Llama-4, MoT consistently exceeds Default InjecPrompt and, unlike picking one foreign template at random, exhibits substantially lower variance — making it a reliable attack in the unknown-backbone setting.

Existing Defenses Largely Fail

Defense evaluation

ASR and Utility under five standard defenses for Qwen-3 and Grok-3.

  • Prompt-based defenses (Instructional Prevention, Data Delimiters, User-Instruction Repetition) often increase ASR against ChatInject variants — the agent itself cannot tell structural/contextual manipulation apart from user intent.
  • Detector-based defenses (PI detector, Lakera Guard) reduce ASR but react mostly to special tokens, so Default Multi-turn (no tokens, just persuasion) slips through; they also suffer high false-positive rates that stall the agent and collapse Utility.

Robust to Template Stripping

A natural countermeasure is format stripping — parsing out known role tags to degrade the payload to plain text. We show this is easily defeated by light character-level perturbations (10% remove / replace / insert) of the template wrapper before injection. All perturbed variants still outperform Default InjecPrompt and Default Multi-turn, indicating that deterministic rule-based filters are insufficient.

Template perturbation

Perturbed ChatInject / MoT / Multi-turn + ChatInject consistently beat plain-text baselines.

BibTeX

@inproceedings{
chang2026chatinject,
title={ChatInject: Abusing Chat Templates for Prompt Injection in {LLM} Agents},
author={Hwan Chang and Yonghyun Jun and Hwanhee Lee},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=WVhgFSKniL}
}