ASI09 - Human-Agent Trust Exploitation


📰 In The Wild

M365 Copilot Wire Transfer (2025) — A poisoned invoice caused Copilot to confidently recommend urgent payment to attacker-controlled bank details. The finance manager trusted the AI's reasoning and approved without independent verification.

Source: OWASP ASI Incidents Tracker, 2025

BONUS TECH DECODER

Automation Bias: The human tendency to over-trust AI recommendations and under-apply critical thinking; the core cognitive weakness that trust exploitation attacks target.

Anthropomorphism: Attributing human-like trustworthiness and expertise to AI. It makes agents feel credible, but makes their manipulation harder to recognize.

Consent Laundering: A "preview" or "review" step that appears harmless but triggers real system changes, disguising an action as a routine approval.

🔗 LLM Top 10 Connections

LLM01 Prompt Injection · LLM05 Output Handling · LLM06 Excessive Agency · LLM09 Misinformation

🧠 WHAT IS IT?

AI agents build strong trust through fluency, apparent expertise, and confident explanations. Adversaries exploit that trust to manipulate users into approving harmful actions, using the agent as a hard-to-trace intermediary. Because the human performs the final, audited action, the agent's influence can go unnoticed in a forensic investigation.

🔍 HOW IT HAPPENS

An agent fabricates a plausible explanation for a risky action; the human trusts the reasoning without independent verification
A prompt-injected support agent impersonates a helpdesk, cites real tickets to appear credible, then harvests credentials
An agent poisoned via a manipulated document urgently recommends a high-value action that benefits the attacker
Opaque AI reasoning forces users to approve outputs they cannot understand, hiding malicious logic behind an appearance of legitimacy

🚨 WHY IT MATTERS

Trust exploitation targets the human in the loop, often the last line of defense. Once over-reliance on AI neutralizes human oversight, a single approved action can bypass the other security controls, and the audit trail points only at the human who clicked approve.

🛡️ HOW TO PREVENT IT

Require multi-step human approval for sensitive or irreversible actions — one click is never sufficient
Provide plain-language risk summaries alongside AI recommendations; give users a clear way to flag suspicious behavior
Visually differentiate high-risk recommendations with banners or red borders, and make sure no review or preview step triggers side effects
Train personnel to recognize AI manipulation patterns; maintain scepticism toward unusually urgent recommendations
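
The first three controls above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `ProposedAction`, `preview`, `approve`, `flag`, and `execute`, and the policy of two distinct approvers for high-risk actions, are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Assumed policy: number of distinct human approvers required per risk level.
REQUIRED_APPROVALS = {Risk.LOW: 1, Risk.HIGH: 2}


@dataclass
class ProposedAction:
    """An action recommended by an agent. Nothing runs until execute()."""
    description: str
    risk: Risk
    risk_summary: str  # plain-language summary shown beside the recommendation
    approvers: set = field(default_factory=set)
    flagged: bool = False
    executed: bool = False


def preview(action: ProposedAction) -> str:
    """Render the review screen. Reads the action and changes nothing."""
    banner = "[HIGH RISK] " if action.risk is Risk.HIGH else ""
    return f"{banner}{action.description}\nRisk summary: {action.risk_summary}"


def approve(action: ProposedAction, approver: str) -> None:
    """Record an approval. A set means one person clicking twice counts once."""
    action.approvers.add(approver)


def flag(action: ProposedAction) -> None:
    """Let any reviewer mark the recommendation as suspicious."""
    action.flagged = True


def execute(action: ProposedAction, perform: Callable[[], None]) -> bool:
    """Run `perform` only if the action is unflagged and fully approved."""
    needed = REQUIRED_APPROVALS[action.risk]
    if action.flagged or action.executed or len(action.approvers) < needed:
        return False
    perform()  # the only place a side effect can happen
    action.executed = True
    return True


if __name__ == "__main__":
    payment = ProposedAction(
        description="Pay invoice to updated bank details",
        risk=Risk.HIGH,
        risk_summary="Bank details changed and were not verified independently.",
    )
    print(preview(payment))
    approve(payment, "alice")
    print(execute(payment, lambda: print("payment sent")))  # one approver only
    approve(payment, "bob")
    print(execute(payment, lambda: print("payment sent")))  # two approvers
```

Keeping `preview` free of side effects and routing every real change through `execute` is what closes off consent laundering in this sketch: a review step can only display information, and a flag from any reviewer blocks the action regardless of how many approvals it has.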