ASI01 - Agent Goal Hijack

ASI01

📰 In The Wild

EchoLeak (May 2025) — A crafted email silently triggered Microsoft 365 Copilot to exfiltrate confidential emails, files, and chat logs. Zero clicks required.

Source: Aim Security / Microsoft, 2025

BONUS TECH DECODER

Indirect Prompt Injection:Hidden commands planted in content an agent reads — emails, docs, web pages — that silently override its real goals.

Goal Hijack:An agent's objectives are secretly redirected by malicious inputs — it still looks helpful, but it's working for the attacker.

Exfiltrate:Secretly stealing and transmitting data to an unauthorized destination — like a spy smuggling photocopied files out of a building.

🔗 LLM Top 10 Connections

LLM01LLM06

Prompt Injection · Excessive Agency

🧠 WHAT IS IT?

Agent Goal Hijack happens when an attacker manipulates an AI agent's objectives so it pursues unauthorized goals instead of the ones it was designed for. Because agents process natural language from many sources, they cannot tell legitimate instructions apart from attacker-planted ones. It's like reprogramming a delivery robot mid-route — it still looks like it's doing its job, but it's working for the wrong person.

🔍 HOW IT HAPPENS

Attacker embeds hidden instructions in a document, email, or web page the agent processes during a normal task
The agent treats malicious content as legitimate, silently overriding its original goals
The hijacked agent uses its own authorized tools to carry out the attacker's objectives while appearing normal
In multi-agent systems, forged agent messages can redirect entire automated pipelines

🚨 WHY IT MATTERS

A hijacked agent can exfiltrate files, send fraudulent messages under a trusted identity, or approve unauthorized transfers — all within normal permissions, invisible to every security system watching it.

🛡️ HOW TO PREVENT IT

Treat all natural-language inputs as untrusted — validate every document, email, and web page before the agent acts
Require human approval before high-impact or goal-changing actions; pause on any unexpected goal shift
Lock system prompts under version control — goal priorities must be formally approved to change
Monitor agent behavior continuously; alert immediately when tool-use patterns deviate from the baseline