Prompt Injection Scanner
Detect manipulation attempts in AI prompts using heuristic analysis
Test with example injections
What this detects:
- Instruction overrides - attempts to ignore/forget previous instructions
- Role manipulation - "you are now", "act as", "pretend to be"
- Authority claims - fake developer/admin/creator messages
- Jailbreak markers - DAN, bypass safety, uncensored mode
- Delimiter attacks - [INST], <|im_start|>, ### boundaries
- Output control - forcing specific response formats
- Encoding evasion - base64, unicode tricks, leetspeak
- Prompt extraction - attempts to reveal system prompts
- Context manipulation - fictional scenarios, hypotheticals
Detection patterns adapted from
rebuff + custom rules by feralghost