Implement defense-in-depth HTML sanitization for rich text while preserving safe formatting. Use when a CMS/editor must block XSS vectors (script tags, event handlers, dangerous URLs, obfuscation) and pass security/performance tests.
Block dangerous vectors (<script>, on*, javascript:/data:/vbscript:, SVG/embed vectors), and preserve safe formatting (p, b, i, links, lists, headings, blockquote, code/pre).
Read full test expectations before coding.
Fetch tests in small chunks if terminal output truncates (observed repeatedly across runs).
Example: nl -ba test_xss.py | sed -n '100,160p'.
Extract explicit security and preservation contracts. Build a checklist from assertions:
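Payload content that must not survive in the output (e.g., alert in tests).
Implement defense-in-depth sanitizer (not regex-only).
Remove dangerous elements together with their content (script, svg, iframe, object, embed, etc.).
Strip on* attributes and style.
Reject javascript:, data:, vbscript: URLs (including obfuscated whitespace/control-char forms).
Preserve only required formatting surface.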
Allow minimal tags and attributes needed by tests/product requirements (e.g., a[href,title], headings, lists, blockquote, code/pre).
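The sanitizer steps above can be sketched with the standard-library html.parser. This is a minimal illustration under stated assumptions, not a definitive implementation: the allowlists, the SAFE_SCHEMES set, and the helper names (is_safe_url, _Sanitizer) are hypothetical and should be adjusted to what the tests actually assert. Only sanitize_html is a name taken from this document.

```python
import html
import re
from html.parser import HTMLParser

# Assumed allowlists; tune them to the tests/product requirements.
ALLOWED_TAGS = {"p", "b", "i", "strong", "em", "a", "ul", "ol", "li", "br",
                "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
URL_ATTRS = {"href"}
SAFE_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
# Container elements removed together with everything inside them.
# Void elements such as embed or meta are simply not allowlisted.
DROP_WITH_CONTENT = {"script", "style", "svg", "iframe", "object",
                     "noscript", "template"}


def is_safe_url(value):
    """Check the scheme after removing whitespace and control characters."""
    # "java\nscript:" and " JaVaScRiPt:" both normalize to "javascript:".
    cleaned = re.sub(r"[\x00-\x20\x7f]+", "", value).lower()
    match = re.match(r"([a-z][a-z0-9+.\-]*):", cleaned)
    # No scheme means a relative URL, which is treated as safe here.
    return match is None or match.group(1) in SAFE_SCHEMES


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.drop_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag in DROP_WITH_CONTENT:
            self.drop_depth += 1  # an unclosed block drops the rest (fail closed)
            return
        if self.drop_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags lose their wrapper; their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # removes on*, style, and anything unlisted
            value = value or ""
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.drop_depth:
                self.drop_depth -= 1
            return
        if not self.drop_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            # Text is re-escaped so decoded entities cannot become markup.
            self.out.append(html.escape(data, quote=False))


def sanitize_html(text):
    parser = _Sanitizer()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitize_html('<b onclick="alert(1)">x</b><SCRIPT>alert(1)</SCRIPT>'))
# prints: <b>x</b>
```

The design is allowlist-first: anything not explicitly permitted is removed, so new tags or attributes never need their own deny rule. Comments and declarations are dropped because the parser's default handlers for them emit nothing.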
Run full suite and iterate fast.
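Run the local tests directly (e.g., python3 test_xss.py or python3 -m unittest -q test_xss.py).
Run the verifier-style suite when one is provided (e.g., pytest /tests/test_outputs.py -rA).
Narrow pattern matching (e.g., <script> lowercase only, single onclick rule) is bypass-prone and fails obfuscation cases.
Tests may assert that payload text such as alert is absent; strip dangerous blocks/content, not only tag wrappers.
Normalize URL values before checking the scheme to catch java\nscript:-style bypasses.
Import and interface sanity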
The module imports cleanly and exposes sanitize_html.
Security regression checks (from observed assertions)
Event handlers are stripped (onclick, onerror, onload, onmouseover, multiple handlers).
Dangerous URL schemes are rejected (javascript:, data:, vbscript: in href/src/action).
Dangerous elements are removed (svg, iframe, object, embed, meta, form-related injection).
Formatting preservation checks
Safe formatting survives: b/i, links with safe https, lists, headings, blockquote, code/pre.
Performance check
Library documentation (Cleaner, tags/attributes/protocols) if using library-based sanitization
html.parser docs if implementing custom parser-based sanitizer
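The security, formatting, and performance checks listed earlier can be sketched as a small harness that accepts any sanitizer callable. Everything here is an assumption for illustration: the payload lists, the check_sanitizer and naive_sanitize names, and the max_seconds budget are hypothetical and do not reproduce the real test file. The naive regex sanitizer is included only to show the kind of bypass the harness is meant to surface.

```python
import re
import time

# Hypothetical attack payloads, one per check category: (label, payload).
ATTACKS = [
    ("script tag", "<script>alert(1)</script>"),
    ("mixed-case script", "<ScRiPt>alert(1)</sCrIpT>"),
    ("event handler", '<b onclick="alert(1)">x</b>'),
    ("javascript url", '<a href="javascript:alert(1)">x</a>'),
    ("obfuscated url", '<a href="java\nscript:alert(1)">x</a>'),
    ("svg vector", "<svg onload=alert(1)></svg>"),
]

# Hypothetical preservation cases: (label, source, substring expected in output).
PRESERVED = [
    ("bold", "<b>bold</b>", "<b>bold</b>"),
    ("safe link", '<a href="https://example.com">x</a>', 'href="https://example.com"'),
    ("list", "<ul><li>one</li></ul>", "<li>one</li>"),
    ("code", "<pre><code>x = 1</code></pre>", "<code>x = 1</code>"),
]


def check_sanitizer(sanitize, max_seconds=1.0):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    for label, payload in ATTACKS:
        out = sanitize(payload).lower()
        # Assumes the tests reject leftover payload text such as "alert".
        if "alert" in out or "<script" in out or re.search(r"\son\w+\s*=", out):
            failures.append(f"attack survived: {label}")
    for label, source, expected in PRESERVED:
        if expected not in sanitize(source):
            failures.append(f"formatting lost: {label}")
    # Rough timing check on a large but benign document.
    big = "<p>hello <b>world</b></p>" * 5000
    start = time.perf_counter()
    sanitize(big)
    if time.perf_counter() - start > max_seconds:
        failures.append("too slow on large input")
    return failures


def naive_sanitize(text):
    # Deliberately weak: only removes lowercase <script> blocks.
    return re.sub(r"<script>.*?</script>", "", text)


for message in check_sanitizer(naive_sanitize):
    print(message)
```

Running the harness against the real sanitize_html before the full suite gives a quick signal on which category regressed.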