Expert knowledge for developing and operating the local-photo-search service — a self-hosted photo library with CLIP semantic search, face recognition, LLaVA scene descriptions, aesthetic quality scoring, Google Photos upload, and shoot review/culling, deployed via Docker on a UGREEN NAS. Use this skill whenever the user asks about: adding features to photo-search, running indexing jobs, troubleshooting the NAS deployment, understanding the codebase architecture, modifying the CLI or web API, updating the status page, managing face references, finding duplicate photos, working with collections, stacking burst photos, uploading to Google Photos, or any other task related to this project. Trigger even for vague requests like "how do I index 2024?" or "why is search slow?" or "I want to add a new filter" or "upload these to Google" — all of these relate to this service.
This service is a fully local, Docker-deployed photo search system. Photos are never sent to any external API. It runs on a UGREEN NAS (Intel N100 CPU, no GPU) and is developed on a Mac.
| Layer | Technology |
|---|---|
| Language | Python 3.11 |
| Web framework | FastAPI + Uvicorn |
| Database | SQLite (WAL mode) with sqlite-vec for vector search |
| Semantic search | CLIP (ViT-B/16, 512-dim) via open-clip-torch — CPU only |
| Face recognition | InsightFace (buffalo_l) — ArcFace embeddings |
| Scene descriptions | LLaVA via Ollama sidecar container |
| Aesthetic quality | CLIP ViT-L/14 + linear scorer (sac+logos+ava1) |
| Frontend | Plain React (UMD, no build step) in frontend/dist/ |
| Containerization | Docker Compose (multi-stage build) |
| Google Photos | OAuth2 with photoslibrary.appendonly scope |
local-photo-search/
├── cli.py # All CLI commands (Click)
├── Dockerfile # Multi-stage: builder + runtime
├── docker-compose.yml # Local dev
├── docker-compose.nas.yml # NAS production
├── run-workers.sh # Docker worker fleet launcher (Mac)
├── requirements.txt # Python deps (CPU torch pinned)
├── references.yml # Face reference config (person → photos)
├── GOOGLE_PHOTOS_SETUP.md # Step-by-step Google Photos OAuth setup
├── photosearch/
│ ├── db.py # PhotoDB class, schema v12, all queries
│ ├── index.py # index_directory() + _index_collection() — indexing pipeline
│ ├── search.py # search_combined() + all search types
│ ├── clip_embed.py # CLIP text/image embedding (streaming)
│ ├── faces.py # InsightFace detection, encoding, matching
│ ├── quality.py # Aesthetic scoring + concept analysis (streaming)
│ ├── describe.py # LLaVA scene descriptions via Ollama
│ ├── stacking.py # Burst/bracket stack detection (union-find)
│ ├── verify.py # Hallucination detection (CLIP + cross-model LLM)
│ ├── google_photos.py # Google Photos OAuth2, upload, album management
│ ├── web.py # FastAPI app, 50+ /api/* endpoints
│ ├── worker.py # Remote worker client — claims batches from NAS, processes locally
│ ├── worker_api.py # Worker API endpoints (/api/worker/*) for distributed indexing
│ ├── exif.py # EXIF extraction
│ ├── colors.py # Dominant color extraction
│ ├── geocode.py # Offline reverse geocoding
│ ├── date_parse.py # Natural language date parsing from queries
│ └── cull.py # Shoot review / culling logic
└── frontend/dist/ # Static HTML/JS served by FastAPI
├── index.html # Main search UI
├── faces.html # Face browser
├── collections.html # Collections UI + Google Photos upload modal
├── review.html # Shoot review / culling UI
├── status.html # Indexing status + run commands
└── shared.js # Shared components: PS.SharedHeader, PS.PhotoModal,
# PS.GooglePhotosButton, PS.formatFocalLength, etc.
The database file is photo_index.db (not photos.db). Key tables:
| Table | Purpose |
|---|---|
| photos | Main photo records — path, date_taken, EXIF, description, tags, scores |
| faces | Detected faces per photo (bbox, encoding, quality) |
| persons | Named persons for face matching |
| face_references | Reference photos/encodings for each person |
| collections | Named photo collections/albums |
| collection_photos | Junction table with sort_order for manual ordering |
| photo_stacks | Burst/bracket groups detected by time + visual similarity |
| stack_members | Photos belonging to each stack |
| review_selections | Culling selections per folder |
| google_photos_uploads | Upload ledger (album_id, filepath, media_item_id) |
| ignored_clusters | Face clusters marked to ignore |
| schema_info | Schema version + photo_root path |
Important columns on photos: date_taken (TEXT, "YYYY-MM-DD HH:MM:SS", indexed),
aesthetic_score (REAL, 1-10), description (TEXT, LLaVA-generated), tags (JSON array),
place_name (TEXT, reverse-geocoded), dominant_colors (JSON array of hex values).
Schema migrations run automatically on DB open via _init_schema() with version checks.
Bump SCHEMA_VERSION when adding tables or columns, and add migration SQL in the
appropriate position (after any table it depends on).
These patterns were established through iteration — always follow them.
Any function that processes thousands of photos must yield results incrementally. Progress is saved to the DB per-batch (survives cancellation) and printed to the log.
# The established pattern in quality.py / clip_embed.py:
def score_photos_stream(image_paths, batch_size=8):
    total = len(image_paths)
    for batch_start in range(0, total, batch_size):
        batch_results = ...  # run the model on image_paths[batch_start:batch_start + batch_size]
        for idx, result in enumerate(batch_results):
            yield batch_start + idx, result
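A consumer of such a stream commits per batch so partial progress persists across cancellation. This is a sketch: `run_quality_pass` and the placeholder scorer are illustrative names, not the real pipeline code.

```python
import sqlite3

def score_photos_stream(image_paths, batch_size=8):
    # Stand-in scorer following the streaming pattern above.
    for batch_start in range(0, len(image_paths), batch_size):
        batch = image_paths[batch_start:batch_start + batch_size]
        for idx, _path in enumerate(batch):
            yield batch_start + idx, 5.0  # placeholder score

def run_quality_pass(conn, image_paths, batch_size=8):
    """Write results per batch so progress survives cancellation (sketch)."""
    pending = []
    for i, score in score_photos_stream(image_paths, batch_size):
        pending.append((score, image_paths[i]))
        if len(pending) >= batch_size:
            conn.executemany(
                "UPDATE photos SET aesthetic_score = ? WHERE path = ?", pending
            )
            conn.commit()  # durable checkpoint per batch
            pending.clear()
    if pending:  # flush the final partial batch
        conn.executemany(
            "UPDATE photos SET aesthetic_score = ? WHERE path = ?", pending
        )
        conn.commit()
```

If the process is killed mid-run, every fully committed batch is already in the DB and a re-run only needs to cover the remainder.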
Every CLI command's --db option must include envvar="PHOTOSEARCH_DB".
Queries that look like camera filenames (e.g., DSC06241) are detected by
_looks_like_filename() in search.py and routed to SQL LIKE instead of CLIP.
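For intuition, a hypothetical re-creation of that heuristic — the actual `_looks_like_filename()` rules in search.py may differ:

```python
import re

# Assumed heuristic: camera prefixes like DSC06241 / IMG_1234, or anything
# with an image extension. Not the real implementation.
_FILENAME_RE = re.compile(
    r"^(?:[A-Z]{2,5}[_-]?\d{3,6}|\w+\.(?:jpe?g|png|heic|raw|arw|nef))$",
    re.IGNORECASE,
)

def looks_like_filename(query: str) -> bool:
    """True if the query should hit SQL LIKE instead of CLIP search."""
    return bool(_FILENAME_RE.match(query.strip()))
```

Routing such queries to `LIKE` avoids wasting a CLIP embedding call on a string that has no semantic content.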
index.py:find_photos() skips macOS AppleDouble sidecars (._*) and numbered
copies (DSC_1.JPG when DSC.JPG exists).
frontend/dist/ contains plain HTML with React loaded from CDN (UMD build). Edit HTML
files directly. Use vanilla JS (React.createElement) not JSX. Changes require a Docker
image rebuild since files are baked in via COPY frontend/ frontend/.
requirements.txt changes invalidate the pip install layer (~5 min on N100). Python file
changes only invalidate COPY (~10s). CPU-only PyTorch is pinned via
--extra-index-url https://download.pytorch.org/whl/cpu at the top of requirements.txt.
Google Photos upload uses Server-Sent Events (SSE) via StreamingResponse with
text/event-stream. Cross-thread communication uses asyncio.Queue + threading.Event
for cancellation. Key gotcha: use asyncio.get_running_loop() (not get_event_loop())
in async endpoints, and always include terminal events (done, fatal, cancelled)
in the generate() function to close the stream.
Operations that can be cancelled mid-way (like uploads) write per-file results to the DB immediately rather than batching at the end. This ensures partial progress survives cancellation.
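A condensed sketch of the SSE pattern described above. The names are illustrative; in web.py a generator like this is wrapped in `StreamingResponse(..., media_type="text/event-stream")`:

```python
import asyncio
import json
import threading

async def sse_events(n_files: int = 3):
    """Cross-thread SSE generator sketch (illustrative, not the real endpoint)."""
    loop = asyncio.get_running_loop()  # NOT get_event_loop() inside async code
    queue: asyncio.Queue = asyncio.Queue()
    cancel = threading.Event()

    def work():
        try:
            for i in range(n_files):
                if cancel.is_set():
                    loop.call_soon_threadsafe(queue.put_nowait, {"type": "cancelled"})
                    return
                # ... upload one file, write its result to the DB immediately ...
                loop.call_soon_threadsafe(queue.put_nowait, {"type": "progress", "n": i})
            loop.call_soon_threadsafe(queue.put_nowait, {"type": "done"})  # terminal event
        except Exception as exc:
            loop.call_soon_threadsafe(queue.put_nowait, {"type": "fatal", "error": str(exc)})

    threading.Thread(target=work, daemon=True).start()
    while True:
        event = await queue.get()
        yield f"data: {json.dumps(event)}\n\n"
        if event["type"] in ("done", "fatal", "cancelled"):
            return  # always end the stream on a terminal event
```

Because the worker thread has no event loop of its own, every queue write goes through `loop.call_soon_threadsafe`; the terminal-event check is what guarantees the browser's EventSource actually closes.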
GET /api/search — Combined search (CLIP semantic, color, face, place, date, filename)
GET /api/photos/{id} — Photo detail
GET /api/photos/{id}/thumbnail — Cached thumbnail
GET /api/photos/{id}/full — Full resolution
GET /api/photos/{id}/preview — Preview size
GET /api/faces/groups — All face groupings
GET /api/faces/group/{type}/{id}/photos — Photos for a person or cluster
GET /api/faces/crop/{face_id} — Face crop image
POST /api/faces/{face_id}/assign — Assign face to person
POST /api/faces/{face_id}/clear — Clear assignment
POST /api/faces/bulk-collect — Bulk assign unassigned faces
POST /api/faces/ignore / POST /api/faces/unignore — Ignore/restore clusters
GET /api/persons — List persons
GET /api/collections — List all
POST /api/collections — Create
GET /api/collections/{id} — Detail (includes photos with sort_order)
PUT /api/collections/{id} — Rename
DELETE /api/collections/{id} — Delete
POST /api/collections/{id}/photos — Add photos
POST /api/collections/{id}/photos/remove — Remove photos
GET /api/stacks — List all
GET /api/stacks/{id} — Detail with members
PUT /api/stacks/{id}/top — Set top photo
DELETE /api/stacks/{id} — Delete stack
POST /api/photos/{id}/unstack — Remove from stack
POST /api/stacks/{id}/add — Add to stack
GET /api/photos/{id}/nearby-stacks — Find nearby stacks
GET /api/review/folders — Available folders (returns {path, max_date} objects, sorted by most recent photo first)
GET /api/review/run — Run culling algorithm
GET /api/review/load — Load saved selections
POST /api/review/toggle/{id} — Toggle photo selection
GET /api/google/status — OAuth status (configured + authenticated)
GET /api/google/authorize — Start OAuth flow
POST /api/google/exchange-code — Manual code exchange
GET /api/google/callback — OAuth callback
DELETE /api/google/disconnect — Revoke + clear tokens
POST /api/google/albums — Create album
POST /api/google/upload-status — Check which photos are already uploaded to an album
POST /api/google/upload — Upload with SSE streaming progress
GET /api/stats — Database statistics for status page
Google Photos API notes:
Only the photoslibrary.appendonly scope is available (the read/sharing scopes were deprecated in March 2025).
batchAddMediaItems returns 400 "invalid media item id" for photos removed from albums via the Google Photos UI — the media_item_id becomes invalid for album operations.
OAuth tokens are stored in google_photos_token.json (not in the database); client_secret.json lives alongside the DB.
Uploads use batchCreate with uploadTokens (batch size 50), which creates mediaItems with album assignment.
SSE event sequence per upload: start → begin → bytes_sent → progress → done.
The force_reupload_ids param targets specific photos for re-upload.
The API album ID (from albums.create) differs from the ID visible in Google Photos URLs.
Always use the API ID stored in the database.
Burst/bracket detection uses union-find over time-sorted photos.
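A minimal sketch of that union-find grouping, using the time parameters documented later. Simplified: the real stacking.py also requires CLIP cosine distance ≤ clip_threshold before merging, which is omitted here.

```python
def detect_stacks(timestamps, time_window_sec=5.0, max_stack_span_sec=10.0):
    """Group time-adjacent shots into stacks via union-find (sketch)."""
    parent = list(range(len(timestamps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    stack_min = {i: timestamps[i] for i in range(len(timestamps))}  # earliest shot per root
    for a, b in zip(order, order[1:]):
        gap = timestamps[b] - timestamps[a]
        ra, rb = find(a), find(b)
        # Merge consecutive shots within the window, capped by total stack span.
        # (The real code additionally checks CLIP distance <= clip_threshold.)
        if gap <= time_window_sec and timestamps[b] - stack_min[ra] <= max_stack_span_sec:
            parent[rb] = ra
            stack_min[ra] = min(stack_min[ra], stack_min[rb])
    groups = {}
    for i in range(len(timestamps)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]  # singletons aren't stacks
```

The `max_stack_span_sec` cap is what stops a long, evenly spaced sequence from chaining into one giant stack.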
Shoot review uses adaptive clustering of CLIP embeddings to select representative photos.
Selections are saved in the review_selections table.
Shared frontend components in shared.js:
PS.SharedHeader — Consistent nav header across all pages (logo, nav links, active state)
PS.PhotoModal — Unified photo detail modal with configurable features:
  showFaces, showCollections, showLocation, showSearchScore, showAesthetics.
  Includes face editing, collection management, stacking UI, keyboard navigation
  (arrow keys), and mobile swipe navigation (swipe left/right on touch devices).
  Slots: fetchDetail, headerChildren, footerChildren.
PS.GooglePhotosButton — Upload single photo to Google Photos from modal sidebar
PS.formatFocalLength() / PS.formatFNumber() — EXIF display helpers
All commands use: docker compose -f docker-compose.nas.yml
# Rebuild after code changes
docker compose -f docker-compose.nas.yml build photosearch
# Restart web server
docker compose -f docker-compose.nas.yml up -d photosearch
# Run a CLI command
docker compose -f docker-compose.nas.yml run --rm photosearch <command>
# Background indexing job
nohup docker compose -f docker-compose.nas.yml run --rm \
-e PYTHONUNBUFFERED=1 photosearch index /photos/YEAR --clip --no-colors \
> /tmp/clip_YEAR.log 2>&1 &
# Git pull (Alpine workaround for UGOS ownership issue)
docker run --rm -v /volume1/docker/photosearch:/repo alpine sh -c \
"apk add -q git && git config --global --add safe.directory /repo && git -C /repo pull"
/data/ # Persistent Docker volume
photo_index.db # SQLite database
thumbnails/ # Generated JPEG thumbnails
google_photos_token.json # Google OAuth token
.insightface/ # InsightFace model cache (~300MB)
.cache/photosearch/ # CLIP + aesthetic model cache
/photos/ # Photo library (read-only mount)
2026/2026-01-15/DSC*.JPG # Year/date-named folders
/references/ # Face reference photos
references.yml # Person → photo mapping
DC="docker compose -f docker-compose.nas.yml run --rm"
NOHUP="nohup docker compose -f docker-compose.nas.yml run --rm -e PYTHONUNBUFFERED=1"
$NOHUP photosearch index /photos/YEAR --clip --no-colors > /tmp/clip_YEAR.log 2>&1 &
$NOHUP photosearch index /photos/YEAR --faces --no-colors > /tmp/faces_YEAR.log 2>&1 &
$NOHUP photosearch index /photos/YEAR --quality --no-colors > /tmp/quality_YEAR.log 2>&1 &
$DC -v /home/cantimatt/docker/photosearch/references:/references:ro \
photosearch add-persons --config /references/references.yml
$DC photosearch match-faces --temporal
$DC photosearch recluster-faces # group remaining unknowns via DBSCAN
$DC photosearch stack --directory /photos/YEAR
New faces land with cluster_id = NULL and are invisible on /faces until
recluster-faces runs — it's the only thing that forms "Unknown #N" groups.
Run it after each face-indexing pass (or batch thereof). Warning: every run
renumbers every unknown cluster_id and clears ignored_clusters, so any
"ignore" decisions on unknown clusters need to be reapplied afterward.
The index command has two modes: directory mode (scan a folder for new photos and
index them) and collection mode (re-index existing photos in a collection). Collection
mode skips EXIF extraction/insertion since photos are already in the DB, making it ideal
for re-running specific passes (e.g. testing a different LLM model for descriptions).
# Directory mode — scan and index new photos
photosearch index <photo_dir> [OPTIONS]
# Collection mode — re-index existing photos by collection ID
photosearch index --collection <ID> [OPTIONS]
Common options:
--clip / --force-clip CLIP semantic embeddings (ViT-B/16, 512-dim)
--faces / --force-faces Face detection + ArcFace encoding (InsightFace buffalo_l)
--quality / --force-quality Aesthetic scoring (ViT-L/14 + MLP) + concept analysis
--describe / --force-describe LLaVA scene descriptions (requires Ollama)
--describe-model MODEL Ollama model name (default: llava; alt: moondream)
--tags / --force-tags Semantic tags from fixed ~60-tag vocabulary (requires Ollama)
--no-colors Disable dominant color extraction (on by default)
--full Enable all: --clip --faces --describe --quality --tags
--verify Run hallucination verification after other passes
--batch-size N Batch size for CLIP/quality (default: 8)
--db PATH Database path (default: PHOTOSEARCH_DB env var)
Collection-only options:
--collection ID Re-index photos in this collection (replaces PHOTO_DIR)
--expand-stacks Also include other photos from the same stacks
| Pass | Flag | Model | Speed on N100 | Dependencies |
|---|---|---|---|---|
| EXIF + hash | (always) | — | Fast, ~1000/min | None |
| CLIP embeddings | --clip | ViT-B/16 (512-dim, ~330 MB) | ~1000 photos/hr | None |
| Dominant colors | (default on) | ColorThief | Fast, ~1000/min | None |
| Face detection | --faces | InsightFace buffalo_l (~300 MB) | ~0.5-2s/photo | None |
| Quality scoring | --quality | ViT-L/14 (768-dim) + MLP | ~1000 photos/hr | None |
| Concept analysis | (auto with quality) | Same ViT-L/14 | Runs after scoring | Quality pass |
| Descriptions | --describe | LLaVA 7B via Ollama | 30-200s/photo | Ollama running |
| Tags | --tags | Same as describe model | 30-200s/photo | Ollama running |
| Critique | (auto) | Same as describe model | 30-200s/photo | Quality + describe |
| Stacking | (auto after CLIP) | — | Fast (DB only) | CLIP embeddings |
| Geocoding | (auto) | GeoNames (offline) | Fast | GPS data in EXIF |
| Verification | --verify | minicpm-v + llava | Slow (2 LLM passes) | Descriptions exist |
These can run as separate nohup processes simultaneously (different models, no conflicts):
--clip, --faces, --quality
These must run sequentially (they share Ollama's single model slot):
--describe, --tags, critiques
All indexing commands support --collection <ID> to scope work to a specific collection
instead of an entire directory. This is useful for A/B testing models, re-running a pass
on a curated set, or iterating on a small batch without re-processing thousands of photos.
# Re-describe collection 2 with a different model
photosearch index --collection 2 --describe --force-describe --describe-model moondream
# Re-score quality for collection 2, including burst stack members
photosearch index --collection 2 --quality --force-quality --expand-stacks
# Match faces only within a collection
photosearch match-faces --collection 2 --temporal
# Detect stacks only among a collection's photos
photosearch stack --collection 2
Collection mode in index skips EXIF extraction and photo insertion (photos are already
in the DB) and runs all selected passes over the collection's photos. The --expand-stacks
flag expands the photo set to include other members of the same stacks.
Implementation: index_directory() dispatches to _index_collection() when
collection_id is set. The collection path uses PhotoDB.get_collection_photo_pairs()
to get (photo_id, abs_path) tuples. match_faces_to_persons(), match_faces_temporal(),
detect_stacks(), and run_stacking() all accept an optional photo_ids parameter for
scoping. PhotoDB.expand_to_stacks() expands photo IDs to include stack members.
Use --force-clip when switching CLIP models to avoid stale embeddings.
InsightFace models are cached under INSIGHTFACE_HOME (~300 MB). Images over 3500px on the
long edge are downsampled before detection. Face embeddings are 512-dim ArcFace vectors.
Stacking parameters:
| Parameter | Default | Purpose |
|---|---|---|
| time_window_sec | 5.0 | Max seconds between consecutive burst shots |
| clip_threshold | 0.05 | Max CLIP cosine distance (very tight = same scene) |
| max_stack_span_sec | 10.0 | Hard cap on total stack duration |
The stack CLI command also supports --collection ID and --expand-stacks to scope
detection to a specific collection's photos.
add-persons --config <yaml> # Batch register from YAML
match-faces [--tolerance 1.15] [--temporal] # Match faces to persons
--collection ID # Scope to collection photos only
--expand-stacks # Include stack members with --collection
--temporal-tolerance 1.45 # Looser threshold for temporal
--temporal-window 30 # Session context window (minutes)
recluster-faces [--eps 0.55] [--min-samples 3] [--dry-run]
# Global DBSCAN over all person_id IS NULL
# encodings. Renumbers every unknown cluster
# and clears ignored_clusters atomically
list-persons # Show persons and counts
face-clusters # Show unidentified clusters
correct-face <filename> <face_num> <name> # Manual correction
clear-matches <dir> [--person] [--all-faces]
export-face-assignments / import-face-assignments
Frontend pages are served with Cache-Control: no-cache to prevent stale JS after deploys.
The worker system lets you offload heavy indexing passes (CLIP, faces, quality, describe,
tags, verify) to a fast laptop/desktop while the NAS remains the single source of truth
(DB + photos).
Server side: /api/worker/* endpoints (worker_api.py).
Client side: python cli.py worker — claims batches via HTTP, downloads photo bytes,
processes locally, and POSTs results back.
Running workers bare-metal on macOS causes PyTorch's MPS (Metal) allocator to leak memory
over time, eventually crashing the machine. The Docker fleet avoids this by using CPU-only
PyTorch with hard memory limits per container.
# Launch 4 workers for CLIP embeddings:
./run-workers.sh -s http://<NAS-IP>:8000 -p clip -d /photos/2026 -n 4
# Multiple passes:
./run-workers.sh -s http://<NAS-IP>:8000 -p clip,quality,faces -d /photos/2026
# Fewer workers for heavier passes:
./run-workers.sh -s http://<NAS-IP>:8000 -p describe --collection 3 -n 2
# More memory per worker if needed:
./run-workers.sh -s http://<NAS-IP>:8000 -p quality -d /photos/2026 -m 4g -n 3
# Monitor:
./run-workers.sh --status # containers + memory usage + recent progress
./run-workers.sh --logs # tail all worker logs live (Ctrl-C to stop)
./run-workers.sh --stop # stop all workers
Important: Use the NAS IP address (not hostname) — Docker containers run in an
isolated network and cannot resolve local DNS names like nas.local or mDNS hostnames.
Key options: -n (number of workers, default 4), -m (memory limit, default 3g),
--batch-size, --model-batch-size, --ttl, --force, --describe-model, --verify-model.
CPU-only inference is ~2-3x slower per worker than MPS, but 4 containers running concurrently with stable memory is faster than 1-2 MPS workers that eventually crash.
If --passes includes describe, tags, or verify, the script checks
localhost:11434 and either reuses an existing Ollama or starts a managed
container. It then pre-pulls the required models into the Ollama volume:
describe / tags → pulls ${DESCRIBE_MODEL:-llava}
verify → pulls both ${VERIFY_MODEL:-minicpm-v} (verifier) and ${DESCRIBE_MODEL:-llava}
(the regeneration model used when a description fails verification)
Ollama does not auto-pull on request, so both must be present before the pass runs.
Prefer native Ollama. Running ollama serve directly on the Mac host avoids
Docker Desktop VM memory oversubscription. When native Ollama is reachable at
localhost:11434, run-workers.sh detects and reuses it (and only warns
about missing models rather than pulling them, since pulling multi-GB models
into someone else's Ollama unannounced is rude).
If you must use the managed Ollama container: raise Docker Desktop memory (Settings → Resources → Memory) to at least ~24 GiB for the default fleet (4 workers × 3 GiB = 12 GiB + Ollama ~4-5 GiB + daemon overhead). ~16 GiB has been observed to still OOM-kill the llama runner. Restart Docker Desktop fully after changing the limit.
Diagnostic hint: photosearch/describe.py detects the
"llama runner process has terminated" error pattern and prints a one-time
hint to stderr explaining the likely cause and fixes (see
_maybe_print_runner_oom_hint()).
# Run CLIP embeddings for a directory:
python cli.py worker -s http://nas.local:8000 -p clip -d /photos/2026/2026-04-09
# Run full pipeline on a directory:
python cli.py worker -s http://nas.local:8000 -p clip,faces,quality,describe,tags,verify -d /photos/2026
# Run CLIP + quality for a collection:
python cli.py worker -s http://nas.local:8000 -p clip,quality --collection 3
# Run descriptions with moondream model:
python cli.py worker -s http://nas.local:8000 -p describe --describe-model moondream
# Quick test — one batch only:
python cli.py worker -s http://localhost:8000 -p clip -d /photos/2026/2026-04-09 --one-shot
# Force re-process (clears existing data first):
python cli.py worker -s http://nas.local:8000 -p clip --collection 3 --force
Key options: --batch-size (photos per claim, default 16), --model-batch-size
(inference batch, default 8), --ttl (claim TTL minutes, default 30), --one-shot
(single batch then exit), --force (clear + reprocess, requires --collection or --directory).
| Endpoint | Purpose |
|---|---|
| POST /api/worker/claim-batch | Claim N unprocessed photos for a pass type |
| POST /api/worker/submit-results | Submit processing results for a claimed batch |
| POST /api/worker/clear-pass | Clear processing state to allow re-processing |
| GET /api/worker/photo-detail/{id} | Get photo metadata + CLIP embedding (for verify) |
| GET /api/worker/status | Queue depth and active claims |
Network retries are built in (exponential backoff on transient errors like sleep/wake disconnections). If submit fails after retries, photos are released after TTL expires and will be reclaimed by the next worker.
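The retry behavior can be pictured as a standard backoff-with-jitter loop. This is a sketch — `with_retries` and its defaults are illustrative, not the actual worker.py names:

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Exponential backoff with jitter for transient network errors (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError, OSError):
            if attempt == max_attempts - 1:
                # Out of retries: the batch claim simply expires via its TTL
                # and another worker reclaims the photos.
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * (0.5 + random.random() / 2))  # jittered sleep
```

The important design point is that giving up is safe: nothing is lost on failure because TTL expiry returns the claimed photos to the queue.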
run-workers.sh — Docker fleet launcher script (start/status/logs/stop)
photosearch/worker.py — Client-side worker loop + per-pass processing functions
photosearch/worker_api.py — Server-side FastAPI router (/api/worker/*)
cli.py — worker command with all CLI options
| Table | Purpose |
|---|---|
| worker_claims | Active batch claims (batch_id, worker_id, photo_ids, expires_at) |
| worker_processed | Per-photo processing ledger (photo_id, pass_type) for faces/describe/tags |
"database is locked" — Another process holds a write lock. Check docker ps for
concurrent jobs. WAL mode + PRAGMA busy_timeout=60000 resolves short locks automatically.
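The connection setup behind that presumably resembles this sketch (`open_db` is an illustrative name, not the PhotoDB method):

```python
import sqlite3

def open_db(path="photo_index.db"):
    """Connection settings that let concurrent jobs coexist (sketch)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA busy_timeout=60000")  # wait up to 60s on a locked DB
    return conn
```

With WAL plus a 60-second busy timeout, a short-lived lock from a concurrent indexing job resolves itself; only a long-running writer produces a visible "database is locked" error.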
"Database not found: photo_index.db" — PHOTOSEARCH_DB env var not set. Confirm
Dockerfile has ENV PHOTOSEARCH_DB=/data/photo_index.db.
InsightFace downloading every run — /data/.insightface/ not persisted. Confirm Docker
volume mount and ENV INSIGHTFACE_HOME=/data/.insightface.
No SSE progress reaching browser — Three common causes: (1) using asyncio.get_event_loop()
instead of asyncio.get_running_loop() in async endpoints, (2) missing terminal event
type in the generate() stream check, (3) InterruptedError swallowed by generic
except Exception instead of being re-raised.
Worker containers exit immediately with "Connection refused" — Docker containers
can't resolve local hostnames (mDNS, .local, NAS hostnames). Use the NAS IP address