Use when a quest is ready for a concrete implementation pass or a main experiment run tied to a selected idea and an accepted baseline.
Use this skill for the main evidence-producing runs of the quest. The goal is to turn one selected route into one trustworthy measured result with the smallest valid amount of execution.
Follow the shared interaction contract injected by the system prompt. Keep run updates brief unless the measured result, blocker state, or next route changed materially. For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
Use quest/workspace planning files only when they help control a non-trivial run; otherwise keep the run contract small and move to the first decisive execution step.
Tool usage in this skill:
- Use bash_exec(...) instead of raw shell_command / command_execution.
- Prefer artifact.git(...) before raw shell git commands.
- Route execution through bash_exec(...), not native shell tools.

The experiment stage should turn a selected idea into auditable evidence. It should preserve the strongest established experiment-planning and execution discipline:
The experiment stage is not just "run code".
It is the stage that converts an idea contract into evidence that other stages can trust.
It is also the stage that should decide the next route once the measured result exists.
Within the user's explicit constraints, maximize valid evidence per unit time and compute.
Prefer equivalence-preserving efficiency upgrades first: larger safe batch size, mixed precision, gradient accumulation, dataloader workers, cache reuse, checkpoint resume, precomputed features, and smaller pilots.
If a proposed efficiency change alters optimization dynamics, effective budget, or baseline comparability, treat it as a real experiment change and record it as such.
For a comparison_ready baseline, verify-local-existing, attach, or import should usually beat full reproduction.
Use references/evidence-ladder.md when deciding whether the current package is merely executable, solid enough to carry the main claim, or already in the stage where broader polish is justified.
Completing one main run is not quest completion. After reporting the run, keep moving to iterate, analyze, write, or finalize unless a genuine blocking decision remains.
When the quest is algorithm-first, treat experiment as the execution surface of optimize, not as the terminal goal of the workflow.
After a measured result, the default next move is frontier review and optimize-side route selection rather than paper packaging.
At stage start:
- Summarize the selected idea in 1-2 sentences and confirm the baseline comparison contract.
- Create PLAN.md and CHECKLIST.md: PLAN.md to lock the concrete run path, CHECKLIST.md as the living execution surface.
- End each pass with a 1-2 sentence outcome summary.

Keep the run loop simple: define the smallest valid run contract, use at most a bounded smoke or pilot when needed, run the real experiment, record the result, and route explicitly from the measured evidence.
Execute the main run on a dedicated run/* branch. After artifact.record_main_experiment(...), route from the measured result:
prefer optimize or decision for frontier review before launching another large run.

Before a main run starts, confirm:
- a run/* target branch or isolated worktree for this exact main experiment

If any of these are materially unknown, stop and resolve them through decision.
Two execution tiers are acceptable:
- lightweight run
- durable main run
PLAN.md / CHECKLIST.md contract and dedicated run/* surface:

Before substantial implementation work or a real main run, create a quest-visible PLAN.md and CHECKLIST.md.
- Use references/main-experiment-plan-template.md as the canonical structure for PLAN.md.
- Use references/main-experiment-checklist-template.md as the canonical structure for CHECKLIST.md.
- Treat PLAN.md and CHECKLIST.md as the canonical planning-and-control surface before and during execution.
- PLAN.md should lead with the selected idea summarized in 1-2 sentences and include the baseline and comparability rules, safe efficiency levers, minimal code-change map, smoke or pilot path, full-run path, fallback options, monitoring and sleep rules, expected outputs, and a revision log.
- When the plan changes materially, update PLAN.md before spending more code or compute.
- Only modify the active quest workspace for this experiment line.
Respect explicit resource limits and record real environment or dependency constraints, but do not stop the run early just to over-document them.
Use:
- bash_exec session ids, progress markers, and exported logs from the actual run

Do not claim run success without durable outputs.
A meaningful experiment pass should leave behind:
- artifacts/experiment/<run_id>/ or the quest-equivalent canonical location
- artifact_manifest.json, run_manifest.json, metrics.json, and summary.md
- metrics.md and runlog.summary.md for durable main runs
- the exported bash.log

Recommended additional files:
- claim_validation.md

run_manifest.json should capture at least:
- run_id

If a command needed for environment capture is unavailable, record that gap in the manifest and summary.
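A minimal sketch of writing such a manifest. Only run_id is mandated above; any additional fields (seed, commit, and so on) are caller-supplied illustrations, not required names:

```python
import json
from pathlib import Path

def write_run_manifest(run_dir: str, run_id: str, **extra_fields) -> dict:
    """Write run_manifest.json with the required run_id plus any
    optional caller-supplied fields (e.g. seed, git commit)."""
    manifest = {"run_id": run_id, **extra_fields}
    path = Path(run_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "run_manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

If an environment-capture field cannot be filled, pass an explicit gap marker rather than omitting it silently, in line with the rule above.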
Before implementation or execution, state:
- run_id
- whether the run is auxiliary/dev or main/test

Prefer to write this contract first in PLAN.md using references/main-experiment-plan-template.md, then keep the current execution state visible in CHECKLIST.md using references/main-experiment-checklist-template.md.
For substantial runs, also record the following seven experiment fields early and keep them updated during execution:
If the run contract changes materially later, record the change durably.
Treat the run contract as a research question contract, not only an execution checklist. Before coding, be able to explain:
If multiple candidate experiment packages exist, prefer the one with the best balance of technical feasibility, research importance, and methodological rigor. Do not choose a package only because it sounds ambitious.
For paper-facing lines, default to this evidence ladder:
- auxiliary/dev
- main/test
- minimum -> solid -> maximum
Before editing or executing:
- If active_baseline_metric_contract_json is present, read that JSON file before planning commands or comparisons.
- Treat active_baseline_metric_contract_json as the default authoritative baseline comparison contract unless you record a concrete reason to override it.

If a repeated failure pattern already exists, apply the mitigation first and record that choice.
Also confirm before comparison work:
- Read active_baseline_metric_contract_json when that file is available.
- Report every metric required by active_baseline_metric_contract_json; extra metrics are allowed, but missing required metrics are not.
- If the run is main/test and superiority is likely to be claimed, define the significance-testing plan before execution rather than after seeing the numbers.
- If Result/metric.md was used during the run, treat it as optional scratch memory only and reconcile it against the final submitted metrics before artifact.record_main_experiment(...).

Before you begin a substantial run, send a concise threaded artifact.interact(kind='progress', ...) update naming:
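The required-metrics rule can be sketched as a pre-submission check. The required_metrics key is a hypothetical field name for illustration; the document does not specify the contract's JSON schema:

```python
import json

def missing_required_metrics(contract_path: str, submitted: dict) -> list:
    """Return required metric names absent from the submitted metrics.
    Extra submitted metrics are fine; missing required ones are not."""
    with open(contract_path) as f:
        contract = json.load(f)
    required = contract.get("required_metrics", [])
    return [name for name in required if name not in submitted]
```

Run this before artifact.record_main_experiment(...); a non-empty result means the run should be recorded as partial or blocked rather than complete.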
Switch from ordinary execution mode into diagnosis mode when any of the following becomes true:
In diagnosis mode:
The normal experiment workspace is the current active idea worktree returned by artifact.submit_idea(...).
- When the direction changes, call artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...) instead of silently mutating the old node.
- Record the run through artifact.record_main_experiment(...) before moving to analysis or writing.
- Choose explicitly between a continue_line child branch and a branch_alternative sibling-like branch.
- After artifact.record_main_experiment(...), if QQ milestone media is enabled and the metrics are stable enough to summarize honestly, prefer one concise summary PNG over multiple attachments.

Implementation rules:
- Follow PLAN.md instead of repeatedly improvising a new method after each small observation.

Prefer to complete one experiment cleanly before expanding to the next, unless parallel execution is explicitly justified and isolated. For substantial experiment packages, the default is one experiment at a time, with each one reaching a recoverable recorded state before the next begins.
Retry-delta discipline:
- Do not retry without a concrete delta; if no new delta exists, route to decision.
- If the same failure repeats after mitigation, escalate to decision.

Run with auditable commands and durable outputs.
Execution rules:
- Use managed bash_exec instead of ephemeral shell invocations.
- Do not ignore active_baseline_metric_contract_json silently when that file exists.

You may do a quick sanity run first, but if the stage goal is a real experiment you must continue to the real evaluation unless the run is blocked and recorded.
Pilot-before-scale rule:
Incremental-recording rule:
- Update CHECKLIST.md alongside those durable notes so the current execution frontier is obvious without replaying the whole log.
- Record the outcome as success, partial, or failure.
- Record the idea_id, branch, and run_id.

Last-known-good rule:
For commands that may run longer than a few minutes:
- Keep smoke or pilot runs to 0-2 for the current experiment pass, and treat them as part of that 0-2 budget rather than as a mandatory separate phase.
- Launch the long run with bash_exec(mode='detach', ...) and normally leave timeout_seconds unset for that long run.
- bash_exec(mode='read', id=...) returns the full rendered log when it is 2000 lines or fewer; for longer logs it returns the first 500 lines plus the last 1500 lines and a hint to inspect omitted sections with start and tail.
- Inspect specific sections with bash_exec(mode='read', id=..., start=..., tail=...).
- Use bash_exec(mode='list') and bash_exec(mode='read', id=..., tail_limit=..., order='desc') to monitor or revisit managed commands while focusing on the newest evidence first.
- Use bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc') so later checks only fetch new evidence.
- Review earlier commands with bash_exec(mode='history').
- Attach a comment such as {stage, goal, action, expected_signal, next_check} to each managed command.
- Treat silent_seconds, progress_age_seconds, signal_age_seconds, and watchdog_overdue from bash_exec(mode='list'|'read', ...) as your default watchdog signals.
- Back off between checks: wait 60s, then inspect logs; 120s, then inspect logs; 300s, then inspect logs; 600s, then inspect logs; 1800s, then inspect logs; cap at 1800s while the run is still active.
- Wait with bash_exec(command='sleep 60', mode='await', timeout_seconds=70) or bash_exec(mode='await', id=..., timeout_seconds=...) between checks.
- When sleeping for N seconds, call bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...); never set timeout_seconds exactly equal to N.
- To keep waiting on an already-running command, use bash_exec(mode='await', id=..., timeout_seconds=...) instead of starting a new sleep command.
- Send artifact.interact(kind='progress', ...) when the user-visible state, frontier, blocker status, or ETA materially changed.
- To stop a run, use bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...); if it must die immediately, add force=true, record the reason, fix the issue, and relaunch cleanly.
- Prefer a tqdm progress reporter and concise structured progress markers when feasible.

Always preserve the managed bash_exec log and export it into the experiment artifact directory when the run artifact is written.
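The check-interval backoff can be sketched as a small generator. The 60/120/300/600/1800 steps and the 1800s cap come from the schedule stated above; the generator shape itself is just one illustrative way to drive the wait loop:

```python
def watchdog_delays(cap: int = 1800):
    """Yield sleep intervals between log inspections: back off through
    the 60/120/300/600/1800 schedule, then hold at the cap while the
    run is still active."""
    for delay in (60, 120, 300, 600, 1800):
        yield min(delay, cap)
        if delay >= cap:
            break
    while True:
        yield cap
```

Each yielded value would become the N in a sleep-and-await step, with the await timeout set to N plus a small buffer, never exactly N.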
If the run emits progress markers, keep them concise and machine-readable instead of narrating every low-level update in chat.
When a real checkpoint is reached, include the estimated next reply time as next_reply_at when that is honestly knowable.
After the run, verify:
Create a durable claim-validation record that maps:
- supported
- refuted
- inconclusive

Also verify baseline comparability before claiming deltas:
Every meaningful main run must be recorded through artifact.record_main_experiment(...).
That call is responsible for writing:
- experiments/main/<run_id>/RUN.md
- experiments/main/<run_id>/RESULT.json
- the run artifact payload

artifact.record_main_experiment(...) should include at least:
- run_id
- metrics_summary
- metric_rows when available
- evaluation_summary with exactly these six fields:
- takeaway
- claim_update
- baseline_relation
- comparability
- failure_mode
- next_action

Use evaluation_summary as the short structured judgment layer on top of the longer narrative fields:
- takeaway: one sentence the next reader can reuse directly
- claim_update: strengthens, weakens, narrows, or neutral
- baseline_relation: better, worse, mixed, or not_comparable
- comparability: high, medium, or low
- failure_mode: none, implementation, evaluation, environment, or direction
- next_action: the immediate route such as continue, revise_idea, analysis_campaign, write, or stop

After artifact.record_main_experiment(...) succeeds, do not assume the same branch should absorb the next round by default.
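The six-field contract above can be checked mechanically before submission. The field names and allowed values are taken from this document; the validator itself is an illustrative sketch, not part of the artifact API:

```python
# Allowed enum values for the constrained evaluation_summary fields;
# takeaway and next_action are free-form and only checked for presence.
ALLOWED = {
    "claim_update": {"strengthens", "weakens", "narrows", "neutral"},
    "baseline_relation": {"better", "worse", "mixed", "not_comparable"},
    "comparability": {"high", "medium", "low"},
    "failure_mode": {"none", "implementation", "evaluation", "environment", "direction"},
}
REQUIRED_FIELDS = ("takeaway", "claim_update", "baseline_relation",
                   "comparability", "failure_mode", "next_action")

def validate_evaluation_summary(summary: dict) -> list:
    """Return a list of problems; an empty list means well-formed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in summary]
    for field, allowed in ALLOWED.items():
        if field in summary and summary[field] not in allowed:
            problems.append(f"bad value for {field}: {summary[field]!r}")
    return problems
```

Running this just before artifact.record_main_experiment(...) keeps the structured judgment layer honest without relying on downstream stages to catch malformed values.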
Interpret the measured result first, then either:
Use artifact.create_analysis_campaign(...) only when the extra slices have clear academic or claim-level value relative to their resource cost.
If the main need is simply to continue optimization from a measured result, prefer a new durable child idea branch instead of an expensive analysis package by reflex.
If the extra work should happen on an older durable branch rather than the current head, first switch the runtime back there with artifact.activate_branch(...), then launch the analysis campaign from that activated workspace.
When artifact.record_main_experiment(...) succeeds, send a richer threaded artifact.interact(kind='milestone', ...) update rather than a generic one-line progress ping.
Lead that milestone with a concise 1-2 sentence outcome summary before expanding into more detail.
That milestone should state:
Do not treat a main run as durably complete until artifact.record_main_experiment(...) succeeds.
Recommended per-run documentation fields:
For durable main runs, these seven fields should be progressively filled as the run advances, not only at final packaging time. For lightweight runs, a shorter summary is acceptable if the route remains obvious and the result is still durably recorded.
RUN.md should make it easy for later stages to answer:
Recording rules:
The experiment stage should normally end with one of:
Do not let the stage end without an explicit next direction. If analysis is selected, record why the expected information gain is strong enough to justify the added compute, time, or annotation budget.
A credible main run should satisfy:
If the result is confounded, say so directly.
Before marking the run complete, verify all of the following:
- the submitted metrics satisfy active_baseline_metric_contract_json when that file exists

If these checks fail, record the run as partial or blocked rather than pretending it is complete.
Stage-start requirement:
- Call memory.list_recent(scope='quest', limit=5).
- Use memory.search(...) before reopening a previously tested command path or retrying an old run.

Stage-end requirement:
- Call memory.write(...) before leaving the stage.

Use memory only to avoid repeating known failures or to preserve reusable experiment lessons; the canonical run record belongs in artifact.
Interaction kinds to use:
- progress for long-running execution updates
- artifact.record_main_experiment(...) for each meaningful completed main experiment
- report for suspicious-result investigations or analysis-rich summaries when they materially help the next route
- decision for continue / branch / analysis / write / reset / stop
- approval when an explicit user approval is captured for an expensive or risky run change
- artifact.checkpoint(...) when code evolution is meaningful and should be preserved in Git
- artifact.interact(kind='progress' | 'milestone', ...) so the user sees the concrete result and next step

A failed main run is still useful if it is explained well.
Record what was attempted, where the failure occurred, whether it was methodological or infrastructural, what retry/branch/reset is justified, and the single best next action.
Prefer a primary failure type such as data_contract_mismatch, resource_exhausted, numeric_instability, implementation_bug, evaluation_pipeline_failure, external_dependency_blocked, or direction_underperforming.
Also classify the broader failure layer when possible: implementation, evaluation, environment, or direction.
Blocked experiment states commonly include missing baseline reference, unknown metric contract, environment failure, run failure before metrics, or metrics that are not comparable.
When results are suspicious, fix the subset and seeds, isolate preprocessing/model/training/evaluation one by one, compare intermediate outputs on the same inputs, and run the cheapest discriminative check before another full retry.
Exit the experiment stage once one of the following is durably true:
- the next route is explicit: analysis-campaign, write, another experiment, or reset

A good experiment pass leaves one interpretable result or one explicit blocker, not another vague promise to rerun later.