Definition
Prompt injection attacks manipulate AI systems by embedding malicious instructions in input data.
- **Types:**
- Direct: User provides malicious prompt
- Indirect: Malicious content in retrieved data
- Hidden instructions in documents/websites
Attack Examples: - "Ignore previous instructions and..." - Hidden text in documents - Invisible characters - Data exfiltration attempts
Risks: - Data leakage - Unauthorized actions - Bypassing restrictions - System manipulation
Defenses: - Input sanitization - Output filtering - Privilege separation - Instruction hierarchy - Monitoring and detection
Analogy: - Similar to SQL injection - New attack surface for AI apps - Critical for production systems
Examples
A malicious PDF instructing an AI assistant to email sensitive data to attackers.
Related Terms
Want more AI knowledge?
Get bite-sized AI concepts delivered to your inbox.
Free daily digest. No spam, unsubscribe anytime.