ASI01
Agent Goal Hijack
๐Ÿ“ฐ In The Wild

EchoLeak (May 2025) โ€” A crafted email silently triggered Microsoft 365 Copilot to exfiltrate confidential emails, files, and chat logs. Zero clicks required.

Source: Aim Security / Microsoft, 2025

BONUS TECH DECODER

Indirect Prompt Injection:Hidden commands planted in content an agent reads โ€” emails, docs, web pages โ€” that silently override its real goals.
Goal Hijack:An agent's objectives are secretly redirected by malicious inputs โ€” it still looks helpful, but it's working for the attacker.
Exfiltrate:Secretly stealing and transmitting data to an unauthorized destination โ€” like a spy smuggling photocopied files out of a building.
๐Ÿ”— LLM Top 10 Connections
LLM01LLM06

Prompt Injection ยท Excessive Agency

๐Ÿง  WHAT IS IT?

Agent Goal Hijack happens when an attacker manipulates an AI agent's objectives so it pursues unauthorized goals instead of the ones it was designed for. Because agents process natural language from many sources, they cannot tell legitimate instructions apart from attacker-planted ones. It's like reprogramming a delivery robot mid-route โ€” it still looks like it's doing its job, but it's working for the wrong person.

๐Ÿ” HOW IT HAPPENS

  • Attacker embeds hidden instructions in a document, email, or web page the agent processes during a normal task
  • The agent treats malicious content as legitimate, silently overriding its original goals
  • The hijacked agent uses its own authorized tools to carry out the attacker's objectives while appearing normal
  • In multi-agent systems, forged agent messages can redirect entire automated pipelines

๐Ÿšจ WHY IT MATTERS

CC
II
A hijacked agent can exfiltrate files, send fraudulent messages under a trusted identity, or approve unauthorized transfers โ€” all within normal permissions, invisible to every security system watching it.

๐Ÿ›ก๏ธ HOW TO PREVENT IT

  • Treat all natural-language inputs as untrusted โ€” validate every document, email, and web page before the agent acts
  • Require human approval before high-impact or goal-changing actions; pause on any unexpected goal shift
  • Lock system prompts under version control โ€” goal priorities must be formally approved to change
  • Monitor agent behavior continuously; alert immediately when tool-use patterns deviate from the baseline