Expert knowledge for developing and operating the local-photo-search service — a self-hosted photo library with CLIP semantic search, face recognition, LLaVA scene descriptions, aesthetic quality scoring, Google Photos upload, and shoot review/culling, deployed via Docker on a UGREEN NAS. Use this skill whenever the user asks about: adding features to photo-search, running indexing jobs, troubleshooting the NAS deployment, understanding the codebase architecture, modifying the CLI or web API, updating the status page, managing face references, finding duplicate photos, working with collections, stacking burst photos, uploading to Google Photos, or any other task related to this project. Trigger even for vague requests like "how do I index 2024?" or "why is search slow?" or "I want to add a new filter" or "upload these to Google" — all of these relate to this service.
This service is a fully local, Docker-deployed photo search system. Photos are never sent to any external API. It runs on a UGREEN NAS (Intel N100 CPU, no GPU) and is developed on a Mac.
| Layer | Technology |
|---|---|
| Language | Python 3.11 |
| Web framework | FastAPI + Uvicorn |
| Database | SQLite (WAL mode) with sqlite-vec for vector search |
| Semantic search | CLIP (ViT-B/16, 512-dim) via open-clip-torch — CPU only |
| Face recognition | InsightFace (buffalo_l) — ArcFace embeddings |
| Scene descriptions | LLaVA via Ollama sidecar container |
| Aesthetic quality | CLIP ViT-L/14 + linear scorer (sac+logos+ava1) |
| Frontend | Plain React (UMD, no build step) in frontend/dist/ |
| Containerization | Docker Compose (multi-stage build) |
| Google Photos | OAuth2 with photoslibrary.appendonly scope |
local-photo-search/
├── cli.py # All CLI commands (Click)
├── Dockerfile # Multi-stage: builder + runtime
├── docker-compose.yml # Local dev
├── docker-compose.nas.yml # NAS production
├── run-workers.sh # Docker worker fleet launcher (Mac)
├── requirements.txt # Python deps (CPU torch pinned)
├── references.yml # Face reference config (person → photos)
├── GOOGLE_PHOTOS_SETUP.md # Step-by-step Google Photos OAuth setup
├── photosearch/
│ ├── db.py # PhotoDB class, schema v12, all queries
│ ├── index.py # index_directory() + _index_collection() — indexing pipeline
│ ├── search.py # search_combined() + all search types
│ ├── clip_embed.py # CLIP text/image embedding (streaming)
│ ├── faces.py # InsightFace detection, encoding, matching
│ ├── quality.py # Aesthetic scoring + concept analysis (streaming)
│ ├── describe.py # LLaVA scene descriptions via Ollama
│ ├── stacking.py # Burst/bracket stack detection (union-find)
│ ├── verify.py # Hallucination detection (CLIP + cross-model LLM)
│ ├── google_photos.py # Google Photos OAuth2, upload, album management
│ ├── web.py # FastAPI app, 50+ /api/* endpoints
│ ├── worker.py # Remote worker client — claims batches from NAS, processes locally
│ ├── worker_api.py # Worker API endpoints (/api/worker/*) for distributed indexing
│ ├── exif.py # EXIF extraction
│ ├── colors.py # Dominant color extraction
│ ├── geocode.py # Offline reverse geocoding
│ ├── date_parse.py # Natural language date parsing from queries
│ └── cull.py # Shoot review / culling logic
└── frontend/dist/ # Static HTML/JS served by FastAPI
├── index.html # Main search UI
├── faces.html # Face browser
├── collections.html # Collections UI + Google Photos upload modal
├── review.html # Shoot review / culling UI
├── status.html # Indexing status + run commands
└── shared.js # Shared components: PS.SharedHeader, PS.PhotoModal,
# PS.GooglePhotosButton, PS.formatFocalLength, etc.
The database file is photo_index.db (not photos.db). Key tables:
| Table | Purpose |
|---|---|
| photos | Main photo records — path, date_taken, EXIF, description, tags, scores |
| faces | Detected faces per photo (bbox, encoding, quality) |
| persons | Named persons for face matching |
| face_references | Reference photos/encodings for each person |
| collections | Named photo collections/albums |
| collection_photos | Junction table with sort_order for manual ordering |
| photo_stacks | Burst/bracket groups detected by time + visual similarity |
| stack_members | Photos belonging to each stack |
| review_selections | Culling selections per folder |
| google_photos_uploads | Upload ledger (album_id, filepath, media_item_id) |
| ignored_clusters | Face clusters marked to ignore |
| schema_info | Schema version + photo_root path |
Important columns on photos: date_taken (TEXT, "YYYY-MM-DD HH:MM:SS", indexed),
aesthetic_score (REAL, 1-10), description (TEXT, LLaVA-generated), tags (JSON array),
place_name (TEXT, reverse-geocoded), dominant_colors (JSON array of hex values).
Schema migrations run automatically on DB open via _init_schema() with version checks.
Bump SCHEMA_VERSION when adding tables or columns, and add migration SQL in the
appropriate position (after any table it depends on).
These patterns were established through iteration — always follow them.
Any function that processes thousands of photos must yield results incrementally. Progress is saved to the DB per-batch (survives cancellation) and printed to the log.
# The established pattern in quality.py / clip_embed.py:
def score_photos_stream(image_paths, batch_size=8):
    total = len(image_paths)
    for batch_start in range(0, total, batch_size):
        batch_results = ...  # run the model on image_paths[batch_start:batch_start + batch_size]
        for idx, result in enumerate(batch_results):
            yield batch_start + idx, result
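A consumer of such a stream commits per batch so partial progress persists across cancellation. This is a sketch: `run_quality_pass` and the placeholder scorer are illustrative names, not the real pipeline code.

```python
import sqlite3

def score_photos_stream(image_paths, batch_size=8):
    # Stand-in scorer following the streaming pattern above.
    for batch_start in range(0, len(image_paths), batch_size):
        batch = image_paths[batch_start:batch_start + batch_size]
        for idx, _path in enumerate(batch):
            yield batch_start + idx, 5.0  # placeholder score

def run_quality_pass(conn, image_paths, batch_size=8):
    """Write results per batch so progress survives cancellation (sketch)."""
    pending = []
    for i, score in score_photos_stream(image_paths, batch_size):
        pending.append((score, image_paths[i]))
        if len(pending) >= batch_size:
            conn.executemany(
                "UPDATE photos SET aesthetic_score = ? WHERE path = ?", pending
            )
            conn.commit()  # durable checkpoint per batch
            pending.clear()
    if pending:  # flush the final partial batch
        conn.executemany(
            "UPDATE photos SET aesthetic_score = ? WHERE path = ?", pending
        )
        conn.commit()
```

If the process is killed mid-run, every fully committed batch is already in the DB and a re-run only needs to cover the remainder.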
Every CLI command's --db option must include envvar="PHOTOSEARCH_DB".
Queries that look like camera filenames (e.g., DSC06241) are detected by
_looks_like_filename() in search.py and routed to SQL LIKE instead of CLIP.
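For intuition, a hypothetical re-creation of that heuristic — the actual `_looks_like_filename()` rules in search.py may differ:

```python
import re

# Assumed heuristic: camera prefixes like DSC06241 / IMG_1234, or anything
# with an image extension. Not the real implementation.
_FILENAME_RE = re.compile(
    r"^(?:[A-Z]{2,5}[_-]?\d{3,6}|\w+\.(?:jpe?g|png|heic|raw|arw|nef))$",
    re.IGNORECASE,
)

def looks_like_filename(query: str) -> bool:
    """True if the query should hit SQL LIKE instead of CLIP search."""
    return bool(_FILENAME_RE.match(query.strip()))
```

Routing such queries to `LIKE` avoids wasting a CLIP embedding call on a string that has no semantic content.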
index.py:find_photos() skips macOS AppleDouble sidecars (._*) and numbered
copies (DSC_1.JPG when DSC.JPG exists).
frontend/dist/ contains plain HTML with React loaded from CDN (UMD build). Edit HTML
files directly. Use vanilla JS (React.createElement) not JSX. Changes require a Docker
image rebuild since files are baked in via COPY frontend/ frontend/.
requirements.txt changes invalidate the pip install layer (~5 min on N100). Python file
changes only invalidate COPY (~10s). CPU-only PyTorch is pinned via
--extra-index-url https://download.pytorch.org/whl/cpu at the top of requirements.txt.
Google Photos upload uses Server-Sent Events (SSE) via StreamingResponse with
text/event-stream. Cross-thread communication uses asyncio.Queue + threading.Event
for cancellation. Key gotcha: use asyncio.get_running_loop() (not get_event_loop())
in async endpoints, and always include terminal events (done, fatal, cancelled)
in the generate() function to close the stream.
Operations that can be cancelled mid-way (like uploads) write per-file results to the DB immediately rather than batching at the end. This ensures partial progress survives cancellation.
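A condensed sketch of the SSE pattern described above. The names are illustrative; in web.py a generator like this is wrapped in `StreamingResponse(..., media_type="text/event-stream")`:

```python
import asyncio
import json
import threading

async def sse_events(n_files: int = 3):
    """Cross-thread SSE generator sketch (illustrative, not the real endpoint)."""
    loop = asyncio.get_running_loop()  # NOT get_event_loop() inside async code
    queue: asyncio.Queue = asyncio.Queue()
    cancel = threading.Event()

    def work():
        try:
            for i in range(n_files):
                if cancel.is_set():
                    loop.call_soon_threadsafe(queue.put_nowait, {"type": "cancelled"})
                    return
                # ... upload one file, write its result to the DB immediately ...
                loop.call_soon_threadsafe(queue.put_nowait, {"type": "progress", "n": i})
            loop.call_soon_threadsafe(queue.put_nowait, {"type": "done"})  # terminal event
        except Exception as exc:
            loop.call_soon_threadsafe(queue.put_nowait, {"type": "fatal", "error": str(exc)})

    threading.Thread(target=work, daemon=True).start()
    while True:
        event = await queue.get()
        yield f"data: {json.dumps(event)}\n\n"
        if event["type"] in ("done", "fatal", "cancelled"):
            return  # always end the stream on a terminal event
```

Because the worker thread has no event loop of its own, every queue write goes through `loop.call_soon_threadsafe`; the terminal-event check is what guarantees the browser's EventSource actually closes.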
GET /api/search — Combined search (CLIP semantic, color, face, place, date, filename)
GET /api/photos/{id} — Photo detail
GET /api/photos/{id}/thumbnail — Cached thumbnail
GET /api/photos/{id}/full — Full resolution
GET /api/photos/{id}/preview — Preview size
GET /api/faces/groups — All face groupings
GET /api/faces/group/{type}/{id}/photos — Photos for a person or cluster
GET /api/faces/crop/{face_id} — Face crop image
POST /api/faces/{face_id}/assign — Assign face to person
POST /api/faces/{face_id}/clear — Clear assignment
POST /api/faces/bulk-collect — Bulk assign unassigned faces
POST /api/faces/ignore / POST /api/faces/unignore — Ignore/restore clusters
GET /api/persons — List persons
GET /api/collections — List all
POST /api/collections — Create
GET /api/collections/{id} — Detail (includes photos with sort_order)
PUT /api/collections/{id} — Rename
DELETE /api/collections/{id} — Delete
POST /api/collections/{id}/photos — Add photos
POST /api/collections/{id}/photos/remove — Remove photos
GET /api/stacks — List all
GET /api/stacks/{id} — Detail with members
PUT /api/stacks/{id}/top — Set top photo
DELETE /api/stacks/{id} — Delete stack
POST /api/photos/{id}/unstack — Remove from stack
POST /api/stacks/{id}/add — Add to stack
GET /api/photos/{id}/nearby-stacks — Find nearby stacks
GET /api/review/folders — Available folders (returns {path, max_date} objects, sorted by most recent photo first)
GET /api/review/run — Run culling algorithm
GET /api/review/load — Load saved selections
POST /api/review/toggle/{id} — Toggle photo selection
GET /api/google/status — OAuth status (configured + authenticated)
GET /api/google/authorize — Start OAuth flow
POST /api/google/exchange-code — Manual code exchange
GET /api/google/callback — OAuth callback
DELETE /api/google/disconnect — Revoke + clear tokens
POST /api/google/albums — Create album
POST /api/google/upload-status — Check which photos are already uploaded to an album
POST /api/google/upload — Upload with SSE streaming progress
GET /api/stats — Database statistics for status page
Google Photos API notes:
Only the photoslibrary.appendonly scope is available (the read/sharing scopes were deprecated in March 2025).
batchAddMediaItems returns 400 "invalid media item id" for photos removed from albums via the Google Photos UI — the media_item_id becomes invalid for album operations.
OAuth tokens are stored in google_photos_token.json (not in the database); client_secret.json lives alongside the DB.
Uploads use batchCreate with uploadTokens (batch size 50), which creates mediaItems with album assignment.
SSE event sequence per upload: start → begin → bytes_sent → progress → done.
The force_reupload_ids param targets specific photos for re-upload.
The API album ID (from albums.create) differs from the ID visible in Google Photos URLs.
Always use the API ID stored in the database.
Burst/bracket detection uses union-find over time-sorted photos.
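A minimal sketch of that union-find grouping, using the time parameters documented later. Simplified: the real stacking.py also requires CLIP cosine distance ≤ clip_threshold before merging, which is omitted here.

```python
def detect_stacks(timestamps, time_window_sec=5.0, max_stack_span_sec=10.0):
    """Group time-adjacent shots into stacks via union-find (sketch)."""
    parent = list(range(len(timestamps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    stack_min = {i: timestamps[i] for i in range(len(timestamps))}  # earliest shot per root
    for a, b in zip(order, order[1:]):
        gap = timestamps[b] - timestamps[a]
        ra, rb = find(a), find(b)
        # Merge consecutive shots within the window, capped by total stack span.
        # (The real code additionally checks CLIP distance <= clip_threshold.)
        if gap <= time_window_sec and timestamps[b] - stack_min[ra] <= max_stack_span_sec:
            parent[rb] = ra
            stack_min[ra] = min(stack_min[ra], stack_min[rb])
    groups = {}
    for i in range(len(timestamps)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]  # singletons aren't stacks
```

The `max_stack_span_sec` cap is what stops a long, evenly spaced sequence from chaining into one giant stack.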
Shoot review uses adaptive clustering of CLIP embeddings to select representative photos.
Selections are saved in the review_selections table.
Shared frontend components in shared.js:
PS.SharedHeader — Consistent nav header across all pages (logo, nav links, active state)
PS.PhotoModal — Unified photo detail modal with configurable features:
  showFaces, showCollections, showLocation, showSearchScore, showAesthetics.
  Includes face editing, collection management, stacking UI, keyboard navigation
  (arrow keys), and mobile swipe navigation (swipe left/right on touch devices).
  Slots: fetchDetail, headerChildren, footerChildren.
PS.GooglePhotosButton — Upload single photo to Google Photos from modal sidebar
PS.formatFocalLength() / PS.formatFNumber() — EXIF display helpers
All commands use: docker compose -f docker-compose.nas.yml
# Rebuild after code changes
docker compose -f docker-compose.nas.yml build photosearch
# Restart web server
docker compose -f docker-compose.nas.yml up -d photosearch
# Run a CLI command
docker compose -f docker-compose.nas.yml run --rm photosearch <command>
# Background indexing job
nohup docker compose -f docker-compose.nas.yml run --rm \
-e PYTHONUNBUFFERED=1 photosearch index /photos/YEAR --clip --no-colors \
> /tmp/clip_YEAR.log 2>&1 &
# Git pull (Alpine workaround for UGOS ownership issue)
docker run --rm -v /volume1/docker/photosearch:/repo alpine sh -c \
"apk add -q git && git config --global --add safe.directory /repo && git -C /repo pull"
/data/ # Persistent Docker volume
photo_index.db # SQLite database
thumbnails/ # Generated JPEG thumbnails
google_photos_token.json # Google OAuth token
.insightface/ # InsightFace model cache (~300MB)
.cache/photosearch/ # CLIP + aesthetic model cache
/photos/ # Photo library (read-only mount)
2026/2026-01-15/DSC*.JPG # Year/date-named folders
/references/ # Face reference photos
references.yml # Person → photo mapping
DC="docker compose -f docker-compose.nas.yml run --rm"
NOHUP="nohup docker compose -f docker-compose.nas.yml run --rm -e PYTHONUNBUFFERED=1"
$NOHUP photosearch index /photos/YEAR --clip --no-colors > /tmp/clip_YEAR.log 2>&1 &
$NOHUP photosearch index /photos/YEAR --faces --no-colors > /tmp/faces_YEAR.log 2>&1 &
$NOHUP photosearch index /photos/YEAR --quality --no-colors > /tmp/quality_YEAR.log 2>&1 &
$DC -v /home/cantimatt/docker/photosearch/references:/references:ro \
photosearch add-persons --config /references/references.yml
$DC photosearch match-faces --temporal
$DC photosearch recluster-faces # group remaining unknowns via DBSCAN
$DC photosearch stack --directory /photos/YEAR
New faces land with cluster_id = NULL and are invisible on /faces until
recluster-faces runs — it's the only thing that forms "Unknown #N" groups.
Run it after each face-indexing pass (or batch thereof). Warning: every run
renumbers every unknown cluster_id and clears ignored_clusters, so any
"ignore" decisions on unknown clusters need to be reapplied afterward.
The index command has two modes: directory mode (scan a folder for new photos and
index them) and collection mode (re-index existing photos in a collection). Collection
mode skips EXIF extraction/insertion since photos are already in the DB, making it ideal
for re-running specific passes (e.g. testing a different LLM model for descriptions).
# Directory mode — scan and index new photos
photosearch index <photo_dir> [OPTIONS]
# Collection mode — re-index existing photos by collection ID
photosearch index --collection <ID> [OPTIONS]
Common options:
--clip / --force-clip CLIP semantic embeddings (ViT-B/16, 512-dim)
--faces / --force-faces Face detection + ArcFace encoding (InsightFace buffalo_l)
--quality / --force-quality Aesthetic scoring (ViT-L/14 + MLP) + concept analysis
--describe / --force-describe LLaVA scene descriptions (requires Ollama)
--describe-model MODEL Ollama model name (default: llava; alt: moondream)
--tags / --force-tags Semantic tags from fixed ~60-tag vocabulary (requires Ollama)
--no-colors Disable dominant color extraction (on by default)
--full Enable all: --clip --faces --describe --quality --tags
--verify Run hallucination verification after other passes
--batch-size N Batch size for CLIP/quality (default: 8)
--db PATH Database path (default: PHOTOSEARCH_DB env var)
Collection-only options:
--collection ID Re-index photos in this collection (replaces PHOTO_DIR)
--expand-stacks Also include other photos from the same stacks
| Pass | Flag | Model | Speed on N100 | Dependencies |
|---|---|---|---|---|
| EXIF + hash | (always) | — | Fast, ~1000/min | None |
| CLIP embeddings | --clip | ViT-B/16 (512-dim, ~330 MB) | ~1000 photos/hr | None |
| Dominant colors | (default on) | ColorThief | Fast, ~1000/min | None |
| Face detection | --faces | InsightFace buffalo_l (~300 MB) | ~0.5-2s/photo | None |
| Quality scoring | --quality | ViT-L/14 (768-dim) + MLP | ~1000 photos/hr | None |
| Concept analysis | (auto with quality) | Same ViT-L/14 | Runs after scoring | Quality pass |
| Descriptions | --describe | LLaVA 7B via Ollama | 30-200s/photo | Ollama running |
| Tags | --tags | Same as describe model | 30-200s/photo | Ollama running |
| Critique | (auto) | Same as describe model | 30-200s/photo | Quality + describe |
| Stacking | (auto after CLIP) | — | Fast (DB only) | CLIP embeddings |
| Geocoding | (auto) | GeoNames (offline) | Fast | GPS data in EXIF |
| Verification | --verify | minicpm-v + llava | Slow (2 LLM passes) | Descriptions exist |
These can run as separate nohup processes simultaneously (different models, no conflicts):
--clip, --faces, --quality
These must run sequentially (they share Ollama's single model slot):
--describe, --tags, critiques
All indexing commands support --collection <ID> to scope work to a specific collection
instead of an entire directory. This is useful for A/B testing models, re-running a pass
on a curated set, or iterating on a small batch without re-processing thousands of photos.
# Re-describe collection 2 with a different model
photosearch index --collection 2 --describe --force-describe --describe-model moondream
# Re-score quality for collection 2, including burst stack members
photosearch index --collection 2 --quality --force-quality --expand-stacks
# Match faces only within a collection
photosearch match-faces --collection 2 --temporal
# Detect stacks only among a collection's photos
photosearch stack --collection 2
Collection mode in index skips EXIF extraction and photo insertion (photos are already
in the DB) and runs all selected passes over the collection's photos. The --expand-stacks
flag expands the photo set to include other members of the same stacks.
Implementation: index_directory() dispatches to _index_collection() when
collection_id is set. The collection path uses PhotoDB.get_collection_photo_pairs()
to get (photo_id, abs_path) tuples. match_faces_to_persons(), match_faces_temporal(),
detect_stacks(), and run_stacking() all accept an optional photo_ids parameter for
scoping. PhotoDB.expand_to_stacks() expands photo IDs to include stack members.
Use --force-clip when switching CLIP models to avoid stale embeddings.
InsightFace models are cached under INSIGHTFACE_HOME (~300 MB). Images over 3500px on the
long edge are downsampled before detection. Face embeddings are 512-dim ArcFace vectors.
Stacking parameters:
| Parameter | Default | Purpose |
|---|---|---|
| time_window_sec | 5.0 | Max seconds between consecutive burst shots |
| clip_threshold | 0.05 | Max CLIP cosine distance (very tight = same scene) |
| max_stack_span_sec | 10.0 | Hard cap on total stack duration |
The stack CLI command also supports --collection ID and --expand-stacks to scope
detection to a specific collection's photos.
add-persons --config <yaml> # Batch register from YAML
match-faces [--tolerance 1.15] [--temporal] # Match faces to persons
--collection ID # Scope to collection photos only
--expand-stacks # Include stack members with --collection
--temporal-tolerance 1.45 # Looser threshold for temporal
--temporal-window 30 # Session context window (minutes)
recluster-faces [--eps 0.55] [--min-samples 3] [--dry-run]
# Global DBSCAN over all person_id IS NULL
# encodings. Renumbers every unknown cluster
# and clears ignored_clusters atomically
list-persons # Show persons and counts
face-clusters # Show unidentified clusters
correct-face <filename> <face_num> <name> # Manual correction
clear-matches <dir> [--person] [--all-faces]
export-face-assignments / import-face-assignments
Frontend pages are served with Cache-Control: no-cache to prevent stale JS after deploys.
The worker system lets you offload heavy indexing passes (CLIP, faces, quality, describe,
tags, verify) to a fast laptop/desktop while the NAS remains the single source of truth
(DB + photos).
Server side: /api/worker/* endpoints (worker_api.py).
Client side: python cli.py worker — claims batches via HTTP, downloads photo bytes,
processes locally, and POSTs results back.
Running workers bare-metal on macOS causes PyTorch's MPS (Metal) allocator to leak memory
over time, eventually crashing the machine. The Docker fleet avoids this by using CPU-only
PyTorch with hard memory limits per container.
# Launch 4 workers for CLIP embeddings:
./run-workers.sh -s http://<NAS-IP>:8000 -p clip -d /photos/2026 -n 4
# Multiple passes:
./run-workers.sh -s http://<NAS-IP>:8000 -p clip,quality,faces -d /photos/2026
# Fewer workers for heavier passes:
./run-workers.sh -s http://<NAS-IP>:8000 -p describe --collection 3 -n 2
# More memory per worker if needed:
./run-workers.sh -s http://<NAS-IP>:8000 -p quality -d /photos/2026 -m 4g -n 3
# Monitor:
./run-workers.sh --status # containers + memory usage + recent progress
./run-workers.sh --logs # tail all worker logs live (Ctrl-C to stop)
./run-workers.sh --stop # stop all workers
Important: Use the NAS IP address (not hostname) — Docker containers run in an
isolated network and cannot resolve local DNS names like nas.local or mDNS hostnames.
Key options: -n (number of workers, default 4), -m (memory limit, default 3g),
--batch-size, --model-batch-size, --ttl, --force, --describe-model, --verify-model.
CPU-only inference is ~2-3x slower per worker than MPS, but 4 containers running concurrently with stable memory is faster than 1-2 MPS workers that eventually crash.
If --passes includes describe, tags, or verify, the script checks
localhost:11434 and either reuses an existing Ollama or starts a managed
container. It then pre-pulls the required models into the Ollama volume:
describe / tags → pulls ${DESCRIBE_MODEL:-llava}
verify → pulls both ${VERIFY_MODEL:-minicpm-v} (verifier) and ${DESCRIBE_MODEL:-llava}
(the regeneration model used when a description fails verification)
Ollama does not auto-pull on request, so both must be present before the pass runs.
Prefer native Ollama. Running ollama serve directly on the Mac host avoids
Docker Desktop VM memory oversubscription. When native Ollama is reachable at
localhost:11434, run-workers.sh detects and reuses it (and only warns
about missing models rather than pulling them, since pulling multi-GB models
into someone else's Ollama unannounced is rude).
If you must use the managed Ollama container: raise Docker Desktop memory (Settings → Resources → Memory) to at least ~24 GiB for the default fleet (4 workers × 3 GiB = 12 GiB + Ollama ~4-5 GiB + daemon overhead). ~16 GiB has been observed to still OOM-kill the llama runner. Restart Docker Desktop fully after changing the limit.
Diagnostic hint: photosearch/describe.py detects the
"llama runner process has terminated" error pattern and prints a one-time
hint to stderr explaining the likely cause and fixes (see
_maybe_print_runner_oom_hint()).
# Run CLIP embeddings for a directory:
python cli.py worker -s http://nas.local:8000 -p clip -d /photos/2026/2026-04-09
# Run full pipeline on a directory:
python cli.py worker -s http://nas.local:8000 -p clip,faces,quality,describe,tags,verify -d /photos/2026
# Run CLIP + quality for a collection:
python cli.py worker -s http://nas.local:8000 -p clip,quality --collection 3
# Run descriptions with moondream model:
python cli.py worker -s http://nas.local:8000 -p describe --describe-model moondream
# Quick test — one batch only:
python cli.py worker -s http://localhost:8000 -p clip -d /photos/2026/2026-04-09 --one-shot
# Force re-process (clears existing data first):
python cli.py worker -s http://nas.local:8000 -p clip --collection 3 --force
Key options: --batch-size (photos per claim, default 16), --model-batch-size
(inference batch, default 8), --ttl (claim TTL minutes, default 30), --one-shot
(single batch then exit), --force (clear + reprocess, requires --collection or --directory).
| Endpoint | Purpose |
|---|---|
| POST /api/worker/claim-batch | Claim N unprocessed photos for a pass type |
| POST /api/worker/submit-results | Submit processing results for a claimed batch |
| POST /api/worker/clear-pass | Clear processing state to allow re-processing |
| GET /api/worker/photo-detail/{id} | Get photo metadata + CLIP embedding (for verify) |
| GET /api/worker/status | Queue depth and active claims |
Network retries are built in (exponential backoff on transient errors like sleep/wake disconnections). If submit fails after retries, photos are released after TTL expires and will be reclaimed by the next worker.
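The retry behavior can be pictured as a standard backoff-with-jitter loop. This is a sketch — `with_retries` and its defaults are illustrative, not the actual worker.py names:

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Exponential backoff with jitter for transient network errors (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError, OSError):
            if attempt == max_attempts - 1:
                # Out of retries: the batch claim simply expires via its TTL
                # and another worker reclaims the photos.
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * (0.5 + random.random() / 2))  # jittered sleep
```

The important design point is that giving up is safe: nothing is lost on failure because TTL expiry returns the claimed photos to the queue.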
run-workers.sh — Docker fleet launcher script (start/status/logs/stop)
photosearch/worker.py — Client-side worker loop + per-pass processing functions
photosearch/worker_api.py — Server-side FastAPI router (/api/worker/*)
cli.py — worker command with all CLI options
| Table | Purpose |
|---|---|
| worker_claims | Active batch claims (batch_id, worker_id, photo_ids, expires_at) |
| worker_processed | Per-photo processing ledger (photo_id, pass_type) for faces/describe/tags |
"database is locked" — Another process holds a write lock. Check docker ps for
concurrent jobs. WAL mode + PRAGMA busy_timeout=60000 resolves short locks automatically.
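The connection setup behind that presumably resembles this sketch (`open_db` is an illustrative name, not the PhotoDB method):

```python
import sqlite3

def open_db(path="photo_index.db"):
    """Connection settings that let concurrent jobs coexist (sketch)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA busy_timeout=60000")  # wait up to 60s on a locked DB
    return conn
```

With WAL plus a 60-second busy timeout, a short-lived lock from a concurrent indexing job resolves itself; only a long-running writer produces a visible "database is locked" error.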
"Database not found: photo_index.db" — PHOTOSEARCH_DB env var not set. Confirm
Dockerfile has ENV PHOTOSEARCH_DB=/data/photo_index.db.
InsightFace downloading every run — /data/.insightface/ not persisted. Confirm Docker
volume mount and ENV INSIGHTFACE_HOME=/data/.insightface.
No SSE progress reaching browser — Three common causes: (1) using asyncio.get_event_loop()
instead of asyncio.get_running_loop() in async endpoints, (2) missing terminal event
type in the generate() stream check, (3) InterruptedError swallowed by generic
except Exception instead of being re-raised.
Worker containers exit immediately with "Connection refused" — Docker containers
can't resolve local hostnames (mDNS, .local, NAS hostnames). Use the NAS IP address