Real prompt injection attacks. Real defenses. Run them live.
12 prompt injection attacks tested against LLM360/K2-Think-V2 across 4 conditions: baseline (no defense) and CapsK at K=4, K=8, and K=12 provenance marking intervals. With CapsK K=2 + strengthened authority policy: 12/12 attacks blocked. Without defense: 10/12 succeed.
Select an attack and defense mode, then fire it live against K2-Think-V2. Watch the model's actual response in real-time.
All 12 attacks with baseline and CapsK defense outcomes. Click any card to see the model's full response.
CapsK defends against prompt injection by inserting provenance markers every K tokens, continuously reminding the model which content is trusted system instruction vs. untrusted external data.
<EXT_0001> is inserted to mark the content as untrusted external dataAttack success rate (ASR) at different CapsK marking intervals, all measured with the strengthened authority policy active. Higher K values (sparser markers) still allow many injections through — the model's "Think" reasoning chain can process EXT-tagged injections between distant markers. At K=2, markers appear every 2 tokens, surrounding injected text on both sides. Combined with the aggressive policy, this achieves 0% ASR — complete defense.