ChatInject | ICLR 2026

Table 1. Main Results on InjecAgent and AgentDojo

Colored deltas are relative to Default InjecPrompt. "+ think" / "+ tool" denote the reasoning and tool-calling hooks built on InjecPrompt + ChatInject. Best ASR per row is bold. Shaded columns are the default (plain-text) baselines.

Metric	Model	InjecPrompt				Multi-turn
Metric	Model	default	ChatInject	+ think	+ tool	default	ChatInject
InjecAgent
ASR ↑	Qwen-3	8.5	39.4 (+30.9)	40.1 (+31.6)	42.1 (+33.6)	10.7	65.9 (+55.2)
	GPT-oss	0.0	14.2 (+14.2)	16.7 (+16.7)	19.1 (+19.1)	0.1	16.9 (+16.8)
	Llama-4	50.1	79.4 (+29.3)	—	88.3 (+38.2)	16.6	88.3 (+71.7)
	GLM-4.5	0.0	57.3 (+57.3)	69.3 (+69.3)	72.2 (+72.2)	0.1	71.5 (+71.4)
	Kimi-K2	15.7	67.4 (+51.7)	—	72.2 (+56.5)	17.2	61.0 (+43.8)
	Grok-2	16.5	17.7 (+1.2)	—	—	1.6	10.4 (+8.8)
AgentDojo
ASR ↑	Qwen-3	17.5	54.8 (+37.3)	66.1 (+48.6)	69.4 (+51.9)	60.9	80.5 (+19.6)
	GPT-oss	0.3	51.4 (+51.1)	48.6 (+48.3)	47.4 (+47.1)	3.6	55.5 (+51.9)
	Llama-4	1.0	17.2 (+16.2)	—	19.8 (+18.8)	1.8	11.1 (+9.3)
	GLM-4.5	0.3	20.3 (+20.0)	24.8 (+24.5)	36.0 (+35.7)	17.5	48.1 (+30.6)
	Kimi-K2	5.9	29.3 (+23.4)	—	44.2 (+38.3)	12.3	13.9 (+1.6)
	Grok-2	6.1	19.3 (+13.2)	—	—	23.7	24.7 (+1.0)
Utility ↑	Qwen-3	50.9	28.3 (-22.6)	24.4 (-26.5)	22.9 (-28.0)	52.4	27.5 (-24.9)
	GPT-oss	19.6	18.8 (-0.8)	11.1 (-8.5)	9.0 (-10.6)	38.3	8.0 (-30.3)
	Llama-4	16.5	15.9 (-0.6)	—	14.7 (-1.8)	18.5	16.2 (-2.3)
	GLM-4.5	78.4	67.9 (-10.5)	65.7 (-12.7)	68.1 (-10.3)	75.8	67.9 (-7.9)
	Kimi-K2	71.5	35.0 (-36.5)	—	35.2 (-36.3)	72.0	69.9 (-2.1)
	Grok-2	41.7	29.8 (-11.9)	—	—	33.9	31.9 (-2.0)

Key takeaways:

ChatInject strengthens every payload. On both benchmarks and across all models, wrapping the payload in the model's native template consistently raises ASR over Default InjecPrompt and Default Multi-turn — confirming that LLMs re-interpret template-wrapped payloads as higher-priority instructions.
Multi-turn + ChatInject shows strong synergy. Plain-text multi-turn alone yields only a modest gain, but combined with ChatInject the ASR rises sharply (e.g., GLM-4.5 0.1→71.5 on InjecAgent; Qwen-3 60.9→80.5 on AgentDojo).
Reasoning / tool-calling hooks amplify the attack further. The agentic variants (+ think, + tool) increase ASR and reduce Utility relative to InjecPrompt + ChatInject, with the tool-calling hook producing the largest swings.
Effectiveness tracks template structure. Models with explicit role delimiters (Qwen-3, GLM-4.5) show the largest ASR increases; Grok-2, whose template lacks strong delimiters, is the most robust.
Utility systematically drops as ASR rises on AgentDojo, confirming that the attacker payload diverts the agent away from the original user instruction.

Table 2. Cross-Model Template Transferability (ASR)

Rows = target model; columns = injected template. ^† denotes closed-source LLMs. Yellow marks self-template (or family-aligned) cases; bold is the best ASR per row; pink shows row/column averages.

Target Model	Injected Template								Avg.
Target Model	default	Qwen-3	GPT-oss	Llama-4	GLM-4.5	Kimi-K2	Grok-2	Gemma-3	Avg.
InjecAgent — Open-source Targets
Qwen-3	8.6	39.4	3.0	4.1	3.2	35.8	3.1	11.3	13.6
GPT-oss	0.2	0.1	14.1	0.2	0.0	0.4	0.1	0.5	2.0
Llama-4	50.1	22.2	23.8	79.3	14.0	31.7	17.1	40.5	34.8
GLM-4.5	0.0	0.2	0.3	0.1	57.2	0.0	0.1	0.1	7.3
Kimi-K2	15.6	53.7	13.9	40.4	9.7	67.3	14.7	24.2	29.9
Grok-2	16.4	12.8	7.8	3.6	1.1	6.1	16.6	—	9.2
Avg.	15.2	21.4	10.5	21.3	14.2	23.5	8.6	15.3	—
InjecAgent — Closed-source Targets
GPT-4o^†	9.6	31.7	23.6	3.2	2.3	22.9	0.7	3.9	12.2
Grok-3^†	2.3	29.8	7.5	8.8	2.4	21.7	19.7	50.9	17.9
Gemini-pro^†	1.4	27.4	14.3	6.8	7.8	14.5	9.9	20.2	12.8
Avg.	4.4	29.6	15.1	6.3	4.2	19.7	10.1	25.0	—
AgentDojo — Open-source Targets
Qwen-3	17.5	54.8	36.0	27.3	15.4	47.0	19.2	21.3	29.8
GPT-oss	0.3	10.8	51.4	0.5	0.0	6.7	0.0	6.4	9.5
Llama-4	1.0	11.6	9.5	19.0	3.9	7.7	4.1	7.5	8.0
GLM-4.5	0.3	1.3	1.3	3.3	20.3	1.5	0.5	—	4.1
Kimi-K2	5.9	15.5	8.7	10.0	3.9	29.3	3.1	6.2	10.3
Grok-2	6.2	6.7	1.0	1.5	0.5	2.6	19.3	—	5.4
Avg.	5.2	16.8	18.0	10.3	7.3	15.8	7.7	10.4	11.4
AgentDojo — Closed-source Targets
GPT-4o^†	6.4	27.3	40.1	9.8	5.4	31.4	2.6	7.2	16.3
Grok-3^†	8.2	33.2	10.8	19.5	19.0	22.6	37.0	30.3	22.6
Gemini-pro^†	8.2	10.1	2.6	1.3	2.1	7.3	1.5	10.3	5.4
Avg.	7.6	23.5	17.8	10.2	8.8	20.4	13.7	15.9	14.8

Key takeaways:

Open → Open. On InjecAgent, foreign templates often fall below the plain-text baseline, but on the more realistic AgentDojo pipeline they frequently exceed it — foreign templates remain a credible threat.
Qwen-3 / Kimi-K2 mutual transfer. Their templates are the most similar in embedding space and indeed transfer most strongly to each other; Grok-2 is the opposite — dissimilar and robust in both directions.
Open → Closed transfer is alarming. Even without access to their proprietary templates, wrapping payloads in OS templates pushes ASR well above the default on GPT-4o, Grok-3, and Gemini-pro. Family-aligned transfers are especially potent: GPT-oss → GPT-4o (40.1 on AgentDojo), Grok-2 → Grok-3 (37.0), Gemma-3 → Gemini-pro (10.3 — the best for that target).
Qwen-3 template is a broadly strong attacker: 29.6% avg ASR on InjecAgent-CS and 23.5% on AgentDojo-CS.

Cross-Model Transferability

Similarity predicts transfer: the closer the injected template is to the target's native template (by embedding cosine similarity), the higher the resulting ASR.
Closed-source models are not immune: even without knowing their true templates, injecting open-source templates raises ASR above the plain-text baseline on GPT-4o, Grok-3, and Gemini-2.5-Pro. Family-aligned transfers (GPT-oss → GPT-4o, Grok-2 → Grok-3, Gemma-3 → Gemini-pro) are especially effective.
Qwen-3 template transfers broadly: ~29.6% avg ASR on InjecAgent and ~23.5% on AgentDojo against closed-source targets.

Higher template similarity → higher ASR; Utility mirrors the opposite trend.

Mixture-of-Templates: Attacking Unknown Agents

MoT matches the best single-template result with far lower variance across seeds.

In practice the attacker may not know which LLM powers the target agent. We propose Mixture-of-Templates (MoT), which concatenates a random permutation of all candidate templates so that the target inevitably encounters its native wrapper. Across Qwen-3, GPT-oss, and Llama-4, MoT consistently exceeds Default InjecPrompt and, unlike picking one foreign template at random, exhibits substantially lower variance — making it a reliable attack in the unknown-backbone setting.

Existing Defenses Largely Fail

ASR and Utility under five standard defenses for Qwen-3 and Grok-3.

Prompt-based defenses (Instructional Prevention, Data Delimiters, User-Instruction Repetition) often increase ASR against ChatInject variants — the agent itself cannot tell structural/contextual manipulation apart from user intent.
Detector-based defenses (PI detector, Lakera Guard) reduce ASR but react mostly to special tokens, so Default Multi-turn (no tokens, just persuasion) slips through; they also suffer high false-positive rates that stall the agent and collapse Utility.

Robust to Template Stripping

A natural countermeasure is format stripping — parsing out known role tags to degrade the payload to plain text. We show this is easily defeated by light character-level perturbations (10% remove / replace / insert) of the template wrapper before injection. All perturbed variants still outperform Default InjecPrompt and Default Multi-turn, indicating that deterministic rule-based filters are insufficient.

Perturbed ChatInject / MoT / Multi-turn + ChatInject consistently beat plain-text baselines.

ChatInject: Abusing Chat Templates
for Prompt Injection in LLM Agents

Abstract

Motivation

1 Role-Based Chat Template Hierarchies Can Be Forged

2 Indirect Injection Inherits Multi-Turn Persuasion

ChatInject: Payload Construction

Experiments

Table 1. Main Results on InjecAgent and AgentDojo