Use this skill when the user wants to test whether the model can be manipulated into disclosing dangerous or prohibited information through creative-writing or role-play framing. Trigger it for requests such as 'make the agent answer a harmful question by role-playing', 'try to get past the rules by using slang or expert-sounding language', 'ask for instructions on illegal acts but frame it as a story', or 'test if the model refuses bad advice even when it's pressured to be helpful'.
[Case 1]
[Case 2]
To synthesize data for this capability, follow the three-phase pipeline below strictly and in order. Do not invent steps. Read the corresponding reference file before starting each phase:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
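The ordering rule above (each phase starts only after the previous one is complete, and each phase has exactly one reference file) can be sketched as a small gate. This is only an illustration: the names Phase, PHASES, and next_phase are hypothetical and not part of the skill. Only the phase titles and reference paths come from the text above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Phase:
    number: int
    name: str
    reference: Path  # file to read before starting the phase


# Phase titles and paths are taken from the pipeline description;
# the structure holding them is an assumption made for illustration.
PHASES = (
    Phase(1, "Environment Exploration", Path("references/EXPLORATION.md")),
    Phase(2, "Trajectory Selection", Path("references/SELECTION.md")),
    Phase(3, "Data Synthesis", Path("references/SYNTHESIS.md")),
)


def next_phase(completed):
    """Return the next phase to run, or None when every phase is done.

    `completed` lists the phase numbers finished so far. It must be a
    prefix of 1, 2, 3, so no phase can be skipped or reordered.
    """
    done = list(completed)
    expected = [p.number for p in PHASES[: len(done)]]
    if done != expected:
        raise ValueError(f"phases must run in order; got {done}")
    if len(done) == len(PHASES):
        return None
    return PHASES[len(done)]
```

For example, next_phase([1]) returns the Trajectory Selection phase with its reference file, while next_phase([2]) raises ValueError because Phase 1 was skipped.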