ASI09
Human-Agent Trust Exploitation
📰 In The Wild

M365 Copilot Wire Transfer (2025): A poisoned invoice caused Copilot to confidently recommend urgent payment to attacker-controlled bank details. The finance manager trusted the AI's reasoning and approved without independent verification.

Source: OWASP ASI Incidents Tracker, 2025

BONUS TECH DECODER

Automation Bias: The human tendency to over-trust AI recommendations and under-apply critical thinking. This is the core cognitive weakness that trust exploitation attacks target.
Anthropomorphism: Attributing human-like trustworthiness and expertise to AI. It makes agents feel credible, and it makes manipulation carried out through them harder to recognize.
Consent Laundering: A "preview" or "review" step that appears harmless but triggers real system changes, disguising an action as a routine approval.
🔗 LLM Top 10 Connections
LLM01 Prompt Injection · LLM05 Output Handling · LLM06 Excessive Agency · LLM09 Misinformation

🧠 WHAT IS IT?

AI agents build strong trust through fluency, apparent expertise, and confident explanations. Adversaries exploit this trust to manipulate users into approving harmful actions, using the agent as an invisible intermediary. Because the human performs the final, audited action, the agent's influence is hard to trace and may not appear in the forensic record at all.

🔍 HOW IT HAPPENS

  • An agent fabricates a plausible explanation for a risky action; the human trusts the reasoning without independent verification
  • A prompt-injected support agent impersonates a helpdesk, cites real tickets to appear credible, then harvests credentials
  • An agent poisoned via a manipulated document urgently recommends a high-value action that benefits the attacker
  • Opaque AI reasoning forces users to approve outputs they cannot understand, hiding malicious logic behind an appearance of legitimacy
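
The patterns above share a few observable signals: urgency, content derived from an untrusted document, and payment details that have not been seen before. The sketch below shows one way such signals might be surfaced to the reviewer before approval. It is a minimal illustration, not a detection method from the source; the `Recommendation` type, its fields, and the `URGENCY_TERMS` list are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword list; a real deployment would tune this.
URGENCY_TERMS = ("urgent", "immediately", "asap", "right away", "overdue")


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical shape of an agent recommendation shown to a reviewer."""
    summary: str                 # the agent's explanation, as shown to the user
    source_trusted: bool         # False if derived from external documents
    payee_account: str = ""      # empty when the action is not a payment
    known_accounts: frozenset = frozenset()


def risk_signals(rec: Recommendation) -> list[str]:
    """Return plain-language warnings to display next to the recommendation."""
    signals = []
    text = rec.summary.lower()
    if any(term in text for term in URGENCY_TERMS):
        signals.append("urgency language")
    if not rec.source_trusted:
        signals.append("derived from untrusted content")
    if rec.payee_account and rec.payee_account not in rec.known_accounts:
        signals.append("payee account not previously verified")
    return signals
```

A non-empty result does not prove an attack. It is a prompt for the reviewer to verify the request through an independent channel before approving.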

🚨 WHY IT MATTERS

Trust exploitation targets the human in the loop, the last line of defense. Once human oversight is neutralized by over-reliance on AI, every other security control can be bypassed through a single approved action. The audit trail points only at the human who clicked approve.

🛡️ HOW TO PREVENT IT

  • Require multi-step human approval for sensitive or irreversible actions; one click is never sufficient
  • Provide plain-language risk summaries alongside AI recommendations; give users a clear way to flag suspicious behavior
  • Visually differentiate high-risk recommendations (banners, red borders), and ensure that review or preview steps never trigger side effects
  • Train personnel to recognize AI manipulation patterns; maintain scepticism toward unusually urgent recommendations
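
Two of the controls above, multi-step approval and side-effect-free review, can be sketched in code. The example below is a minimal illustration under stated assumptions: `PendingAction`, `preview`, `approve`, `execute`, and the two-approver threshold are hypothetical names and values chosen for this sketch, not part of any OWASP specification or product API.

```python
from dataclasses import dataclass, field
from typing import Callable

REQUIRED_APPROVERS_HIGH_RISK = 2  # assumed policy value


class ApprovalError(Exception):
    """Raised when an action is executed without sufficient approval."""


@dataclass
class PendingAction:
    description: str
    high_risk: bool
    approvers: set = field(default_factory=set)
    executed: bool = False


def preview(action: PendingAction) -> str:
    """Build a plain-language summary. Read-only: it never mutates state."""
    banner = "[HIGH RISK] " if action.high_risk else ""
    return f"{banner}{action.description} (approvals: {len(action.approvers)})"


def approve(action: PendingAction, user: str) -> None:
    """Record an approval. Repeat approvals by the same user count once."""
    action.approvers.add(user)


def execute(action: PendingAction, perform: Callable[[PendingAction], None]) -> None:
    """Run the side effect only after enough distinct people have approved."""
    needed = REQUIRED_APPROVERS_HIGH_RISK if action.high_risk else 1
    if action.executed:
        raise ApprovalError("action already executed")
    if len(action.approvers) < needed:
        raise ApprovalError(
            f"{needed} distinct approvers required, got {len(action.approvers)}"
        )
    perform(action)
    action.executed = True
```

Keeping `preview` free of side effects is what prevents the consent-laundering pattern: the only path to a real change is `execute`, and that path checks for distinct approvers rather than counting clicks.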