Context

This skill captures the technique of using an LLM agent (Nelson) to play-test a text adventure game interactively. Unlike traditional automated testing (scripted assertions, unit tests), the LLM brings human-like creativity: it improvises natural-language phrases, tries things a real player would try, and recognizes when responses "feel wrong" even without explicit assertions.

This is a zero-cost alternative to human play testing that catches a different class of bugs than unit tests: parser coverage gaps, unnatural error messages, missing synonyms, interaction chain regressions, and UX friction.

⚠️ CRITICAL: Always Use `--headless` for Automated Testing

The game has a TUI (split-screen terminal UI) that uses ANSI escape codes for cursor positioning, screen clearing, and scroll regions. When an LLM agent reads game output through an interactive terminal session, the TUI rendering overwrites existing content, so the game appears to have hung even though the engine responded correctly. This caused 5 false-positive hang reports (BUG-105/106/116/117/118) and a wasted engineering sprint.
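As a rough guard against this failure mode, a test harness could check captured output for ANSI control sequences before trusting a "hang" report. The sketch below is an assumption, not part of the game or its tooling: the function name `contains_ansi` is hypothetical, and the regular expression only covers CSI-style sequences (ESC followed by `[`), which is the family used for cursor movement and screen clearing.

```python
import re

# Matches CSI sequences such as ESC[2J (clear screen) or ESC[1;1H (move cursor).
# Hypothetical helper: other escape families (OSC, single-character escapes) are not covered.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def contains_ansi(output: str) -> bool:
    """Return True if the captured output contains TUI-style control sequences."""
    return ANSI_CSI.search(output) is not None


if __name__ == "__main__":
    tui_output = "\x1b[2J\x1b[1;1HYou are in a dark bedroom."
    plain_output = "You are in a dark bedroom."
    print(contains_ansi(tui_output))    # control sequences present
    print(contains_ansi(plain_output))  # plain text only
```

If such a check fires, the likelier explanation is that the session was started without `--headless`, not that the engine stopped responding.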

Solution: Always launch the game with `--headless` for automated testing.
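A minimal sketch of a headless, pipe-based run might look like the following. Only the `--headless` flag comes from this document; the launch command itself is not given here, so `game_cmd` is a placeholder, and the sketch assumes the game reads one command per line from stdin and exits at end of input. The demo at the bottom uses a stand-in echo process, not the real game.

```python
import subprocess
import sys


def run_headless(game_cmd, commands, timeout=30.0):
    """Pipe commands to the game on stdin and return the captured stdout.

    game_cmd is a placeholder for however the game is launched.
    Assumption: the game reads one command per line and exits on EOF.
    """
    script = "\n".join(commands) + "\n"
    result = subprocess.run(
        [*game_cmd, "--headless"],
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the process never exits
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in process that echoes each input line; replace with the real launch command.
    stand_in = [
        sys.executable,
        "-c",
        "import sys\nfor line in sys.stdin:\n    print('> ' + line.strip())",
    ]
    print(run_headless(stand_in, ["look around", "open the drawer"]))
```

Because the whole transcript is captured as plain text, the agent reads the engine's responses directly, and a genuine hang surfaces as a timeout instead of an overwritten screen.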

Category	Examples	What It Catches
Polite phrasing	"please open the drawer", "could you look around?"	Politeness stripping gaps
Questions	"what's in here?", "is the door locked?"	Question transform coverage
Adverbs	"carefully examine", "quickly take"	Adverb poisoning the parser
Synonyms	"check", "inspect", "hunt for", "rummage"	Missing verb synonyms
Articles	"find the matchbox", "take a match"	Article stripping in targets
Compound	"find a match and light it", "search for matches then light the candle"	Compound command splitting
Natural speech	"I want to look around", "let me open that"	Preamble stripping
Confused player	"help", "what do I do", "where am I"	Error message quality
Mischievous player	"eat the door", "throw the room", nonsense input	Graceful failure handling
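The categories above can also seed a small set of mechanical variants for a base command, which the agent can then extend with its own improvisation. The sketch below is illustrative only: `phrase_variants` is a hypothetical helper, and the templates are taken from the example phrases in the table, not from any parser specification.

```python
def phrase_variants(verb: str, target: str) -> dict[str, list[str]]:
    """Build a few phrasing variants of '<verb> <target>' grouped by category.

    Hypothetical helper: the templates mirror the table's examples and are
    a starting point, not an exhaustive list.
    """
    base = f"{verb} {target}"
    return {
        "polite": [f"please {base}", f"could you {base}?"],
        "adverbs": [f"carefully {base}", f"quickly {base}"],
        "natural_speech": [f"I want to {base}", f"let me {base}"],
    }


if __name__ == "__main__":
    for category, phrases in phrase_variants("open", "the drawer").items():
        for phrase in phrases:
            print(f"{category}: {phrase}")
```

Each variant should produce the same game response as the plain command; a variant that fails where the plain command succeeds points at the gap named in the table's last column.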

Severity	Criteria
CRITICAL	Game hangs (requires force-quit), data loss, blocks progression
HIGH	Feature doesn't work, common player phrase fails
MEDIUM	Minor feature gap, uncommon phrase fails, cosmetic issue
LOW	Grammar, polish, extremely rare edge case
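One way to apply these criteria consistently is to encode them as a small classifier that checks the most severe conditions first. This is a sketch under assumptions: the `Finding` fields and the `classify` function are hypothetical names that paraphrase the criteria in the table, and a real report would likely carry more detail.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class Finding:
    """Hypothetical flags paraphrasing the severity criteria."""
    hangs: bool = False
    data_loss: bool = False
    blocks_progression: bool = False
    feature_broken: bool = False
    common_phrase_fails: bool = False
    minor_gap: bool = False
    uncommon_phrase_fails: bool = False
    cosmetic: bool = False


def classify(finding: Finding) -> Severity:
    """Return the highest severity whose criteria the finding meets."""
    if finding.hangs or finding.data_loss or finding.blocks_progression:
        return Severity.CRITICAL
    if finding.feature_broken or finding.common_phrase_fails:
        return Severity.HIGH
    if finding.minor_gap or finding.uncommon_phrase_fails or finding.cosmetic:
        return Severity.MEDIUM
    # Grammar, polish, and extremely rare edge cases fall through to LOW.
    return Severity.LOW


if __name__ == "__main__":
    print(classify(Finding(common_phrase_fails=True, cosmetic=True)).value)
```

Checking from most to least severe means a finding that meets several criteria is reported at its highest level.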

LLM Play Testing

Patterns

Pattern 1: Headless Pipe-Based Testing (Recommended)

Pattern 1b: Interactive Game Session (Legacy — Not Recommended)

Pattern 2: Creative Phrase Generation

Pattern 3: Structured Test Pass Report

Pattern 4: Streaming Output

Pattern 5: Bug Classification

Pattern 6: Bug-to-Unit-Test Pipeline

Anti-Patterns

When to Use

Metrics
