VLM evaluation suite for video anomaly detection in smart home camera footage
A benchmark suite that evaluates VLMs on video anomaly detection across 7 smart home categories. It is based on the SmartHome-Bench dataset (1,203 YouTube smart home camera clips).
Key differentiator from HomeSec-Bench: This benchmark requires video understanding (multi-frame sampling), not single-frame analysis.
Requires npm install plus system dependencies. The deployment agent should:
- Ensure yt-dlp and ffmpeg are in PATH
- Run npm install in the skill directory

Entry script: scripts/run-benchmark.cjs
node scripts/run-benchmark.cjs --help
When spawned by Aegis, configuration is injected via environment variables. The benchmark downloads video clips, samples frames from each clip, evaluates them with the VLM, and generates an HTML report.
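The frame-sampling step can be illustrated with a small sketch. It assumes uniform sampling at the midpoint of equal-length segments; the strategy and frame count that run-benchmark.cjs actually uses are not stated here, and `sampleTimestamps` is a hypothetical helper name.

```javascript
// Hypothetical helper: pick `count` timestamps (in seconds) spread evenly
// across a clip, using the midpoint of each equal-length segment.
// The sampling strategy used by the real benchmark is an assumption here.
function sampleTimestamps(durationSec, count) {
  if (!(durationSec > 0) || !Number.isInteger(count) || count < 1) {
    throw new RangeError("durationSec must be > 0 and count a positive integer");
  }
  const segment = durationSec / count;
  const timestamps = [];
  for (let i = 0; i < count; i++) {
    // Midpoints avoid sampling exactly at the clip boundaries.
    timestamps.push(Number((segment * (i + 0.5)).toFixed(3)));
  }
  return timestamps;
}

console.log(sampleTimestamps(20, 4)); // [ 2.5, 7.5, 12.5, 17.5 ]
```

Each timestamp could then be handed to a frame extractor such as ffmpeg; how the extracted frames are batched into a VLM request is left out of this sketch.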
# Run with local VLM (subset mode, 50 videos)
node scripts/run-benchmark.cjs --vlm http://localhost:5405
# Quick test with 10 videos
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --max-videos 10
# Full benchmark (all curated clips)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode full
# Filter by category
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --categories "Wildlife,Security"
# Skip download (re-evaluate cached videos)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --skip-download
# Skip report auto-open
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
| Variable | Default | Description |
|---|---|---|
| AEGIS_VLM_URL | (required) | VLM server base URL |
| AEGIS_VLM_MODEL | — | Loaded VLM model ID |
| AEGIS_SKILL_ID | — | Skill identifier (enables skill mode) |
| AEGIS_SKILL_PARAMS | {} | JSON params from skill config |
Note: This is a VLM-only benchmark. An LLM gateway is not required.
This skill includes a config.yaml that defines user-configurable parameters. Aegis parses this at install time and renders a config panel in the UI. Values are delivered via AEGIS_SKILL_PARAMS.
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | select | subset | Which clips to evaluate: subset (~50 clips) or full (all ~105 curated clips) |
| maxVideos | number | 50 | Maximum number of videos to evaluate |
| categories | text | all | Comma-separated category filter (e.g. Wildlife,Security) |
| noOpen | boolean | false | Skip auto-opening the HTML report in browser |
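As a rough illustration of how a script might consume AEGIS_SKILL_PARAMS, the sketch below merges the JSON string over the defaults from the parameter table. `loadSkillParams` is a hypothetical helper, and both the fallback on malformed JSON and the treatment of "all" as "no filter" are assumptions rather than documented behavior.

```javascript
// Defaults mirror the parameter table above.
const DEFAULTS = { mode: "subset", maxVideos: 50, categories: "all", noOpen: false };

// Hypothetical helper: merge AEGIS_SKILL_PARAMS (a JSON string) over the defaults.
function loadSkillParams(env) {
  let parsed = {};
  try {
    parsed = JSON.parse(env.AEGIS_SKILL_PARAMS || "{}");
  } catch (err) {
    // Assumed behavior: report malformed JSON on stderr and keep the defaults.
    console.error(`Ignoring invalid AEGIS_SKILL_PARAMS: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    parsed = {};
  }
  const params = { ...DEFAULTS, ...parsed };
  // "Wildlife, Security" becomes ["Wildlife", "Security"]; "all" means no filter.
  const categories = params.categories === "all"
    ? null
    : String(params.categories).split(",").map((s) => s.trim()).filter(Boolean);
  return { ...params, maxVideos: Number(params.maxVideos), categories };
}

console.log(loadSkillParams({ AEGIS_SKILL_PARAMS: '{"mode":"full","categories":"Wildlife, Security"}' }));
```

In a real run the argument would be `process.env`; passing a plain object keeps the helper easy to exercise in isolation.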
| Argument | Default | Description |
|---|---|---|
| --vlm URL | (required) | VLM server base URL |
| --out DIR | ~/.aegis-ai/smarthome-bench | Results directory |
| --max-videos N | 50 | Max videos to evaluate |
| --mode MODE | subset | subset or full |
| --categories LIST | all | Comma-separated category filter |
| --skip-download | — | Skip video download, use cached |
| --no-open | — | Don't auto-open report in browser |
| --report | (auto in skill mode) | Force report generation |
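The flags in the table map naturally onto Node's built-in `util.parseArgs`. The following is a minimal sketch under that assumption; the parsing code in run-benchmark.cjs may look quite different, and `parseCli` is a hypothetical name. It needs a Node.js release that ships `util.parseArgs` with `default` support.

```javascript
const { parseArgs } = require("node:util");

// Flag names and defaults come from the argument table; the parsing
// approach itself is an assumption, not the script's real implementation.
function parseCli(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      vlm: { type: "string" },
      // Note: "~" is not expanded by Node; a real script would resolve it.
      out: { type: "string", default: "~/.aegis-ai/smarthome-bench" },
      "max-videos": { type: "string", default: "50" },
      mode: { type: "string", default: "subset" },
      categories: { type: "string", default: "all" },
      "skip-download": { type: "boolean", default: false },
      "no-open": { type: "boolean", default: false },
      report: { type: "boolean", default: false },
      help: { type: "boolean", default: false },
    },
  });
  if (!values.help && !values.vlm) throw new Error("--vlm URL is required");
  if (!["subset", "full"].includes(values.mode)) {
    throw new Error(`--mode must be "subset" or "full", got "${values.mode}"`);
  }
  return { ...values, "max-videos": Number(values["max-videos"]) };
}

console.log(parseCli(["--vlm", "http://localhost:5405", "--max-videos", "10", "--no-open"]));
```

Because `parseArgs` is strict by default, an unknown flag raises an error instead of being silently ignored, which is usually the friendlier behavior for a benchmark CLI.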
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=smarthome-bench
AEGIS_SKILL_PARAMS={}
{"event": "ready", "model": "SmolVLM2-2.2B", "system": "Apple M3"}
{"event": "suite_start", "suite": "Wildlife"}
{"event": "test_result", "suite": "Wildlife", "test": "smartbench_0003", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "Wildlife", "passed": 12, "failed": 3}
{"event": "complete", "passed": 78, "total": 105, "timeMs": 480000, "reportPath": "/path/to/report.html"}
Human-readable output goes to stderr (visible in Aegis console tab).
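A consumer of the JSON event lines shown above could fold them into per-suite counts. This sketch assumes one JSON object per line and treats any `status` other than "pass" as a failure, since only "pass" appears in the sample; `summarizeEvents` is a hypothetical name.

```javascript
// Hypothetical consumer: fold JSON-lines events into per-suite pass/fail counts.
function summarizeEvents(lines) {
  const suites = {};
  let complete = null;
  for (const line of lines) {
    const text = line.trim();
    if (!text) continue;
    let event;
    try {
      event = JSON.parse(text);
    } catch {
      continue; // skip anything that is not valid JSON
    }
    if (event.event === "test_result") {
      const counts = (suites[event.suite] ??= { passed: 0, failed: 0 });
      // Assumption: every status other than "pass" counts as a failure.
      if (event.status === "pass") counts.passed += 1;
      else counts.failed += 1;
    } else if (event.event === "complete") {
      complete = event;
    }
  }
  return { suites, complete };
}

const demoLines = [
  '{"event": "suite_start", "suite": "Wildlife"}',
  '{"event": "test_result", "suite": "Wildlife", "test": "smartbench_0003", "status": "pass", "timeMs": 4500}',
  '{"event": "complete", "passed": 78, "total": 105, "timeMs": 480000, "reportPath": "/path/to/report.html"}',
];
console.log(summarizeEvents(demoLines));
```

Skipping unparseable lines keeps the consumer tolerant of stray output, although the human-readable log is documented as going to stderr rather than the event stream.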
| Suite | Description | Anomaly Examples |
|---|---|---|
| 🦊 Wildlife | Wild animals near home cameras | Bear on porch, deer in garden, coyote at night |
| 👴 Senior Care | Elderly activity monitoring | Falls, wandering, unusual inactivity |
| 👶 Baby Monitoring | Infant/child safety | Stroller rolling, child climbing, unsupervised |
| 🐾 Pet Monitoring | Pet behavior detection | Pet illness, escaped pets, unusual behavior |
| 🔒 Home Security | Intrusion & suspicious activity | Break-ins, trespassing, porch pirates |
| 📦 Package Delivery | Package arrival & theft | Stolen packages, misdelivered, weather damage |
| 🏠 General Activity | General smart home events | Unusual hours activity, appliance issues |
Each clip is evaluated as a binary anomaly detection task: the VLM predicts normal (0) or abnormal (1), and the prediction is compared against expert annotations.
Per-category and overall:
Results are saved to ~/.aegis-ai/smarthome-bench/ as JSON. An HTML report with per-category breakdown, confusion matrix, and model comparison is auto-generated.
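To make the scoring concrete, here is a minimal sketch of a confusion matrix for the 0/1 labels described above. The specific metrics computed (accuracy, precision, recall, F1) are an assumption, since this text does not list which ones the report contains, and `binaryMetrics` is a hypothetical name.

```javascript
// Labels follow the text: 0 = normal, 1 = abnormal.
// Which metrics the real report includes is an assumption.
function binaryMetrics(predictions, labels) {
  if (predictions.length !== labels.length) {
    throw new Error("predictions and labels must have the same length");
  }
  let tp = 0, fp = 0, tn = 0, fn = 0;
  predictions.forEach((pred, i) => {
    const truth = labels[i];
    if (pred === 1 && truth === 1) tp++;
    else if (pred === 1 && truth === 0) fp++;
    else if (pred === 0 && truth === 0) tn++;
    else if (pred === 0 && truth === 1) fn++;
    else throw new Error(`values must be 0 or 1 (index ${i})`);
  });
  // Guard against division by zero when a class never appears.
  const ratio = (num, den) => (den === 0 ? 0 : num / den);
  const precision = ratio(tp, tp + fp);
  const recall = ratio(tp, tp + fn);
  return {
    confusion: { tp, fp, tn, fn },
    accuracy: ratio(tp + tn, predictions.length),
    precision,
    recall,
    f1: ratio(2 * precision * recall, precision + recall),
  };
}

console.log(binaryMetrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]));
```

Running the same function once per category and once over all clips would give the per-category and overall breakdown.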
- npm install (for openai SDK dependency)
- yt-dlp (video download from YouTube)
- ffmpeg (frame extraction from video clips)