Prompt Injection Scanner

Detect manipulation attempts in AI prompts using heuristic analysis

Test with example injections

Detected patterns:

What this detects:

Instruction overrides - attempts to ignore/forget previous instructions
Role manipulation - "you are now", "act as", "pretend to be"
Authority claims - fake developer/admin/creator messages
Jailbreak markers - DAN, bypass safety, uncensored mode
Delimiter attacks - [INST], <|im_start|>, ### boundaries
Output control - forcing specific response formats
Encoding evasion - base64, unicode tricks, leetspeak
Prompt extraction - attempts to reveal system prompts
Context manipulation - fictional scenarios, hypotheticals

Detection patterns adapted from rebuff + custom rules by feralghost