Parent/chunk document architecture and hybrid search implementation (BM25 + kNN + RRF) in aithena's Solr schema
Apply this skill when modifying Solr queries, adding schema fields, changing search modes, or reviewing PRs that touch search_service.py, managed-schema.xml, or document-indexer chunking logic. The parent/chunk split is the most common source of correctness bugs in this project.
Parent documents (books):
- id = SHA-256 of the file path (unique per book)
- title_s/t, author_s/t, year_i, category_s, series_s, language_detected_s, file_path_s, folder_path_s, page_count_i, file_size_l
- book_embedding (512D) for book-level similarity
- No parent_id_s field — its absence is how you identify a parent

Chunk documents (text fragments):
- id = {parent_id}_chunk_{index} (index is zero-padded, e.g. {parent_id}_chunk_0000)
- parent_id_s = parent book's id (foreign key)
- chunk_text_t = extracted text (400 words, 50-word overlap, page-aware)
- embedding_v = 512D dense vector (HNSW cosine) — primary kNN search field
- chunk_index_i, page_start_i, page_end_i for positioning

Keyword (BM25):
- EXCLUDE_CHUNKS_FQ = "-parent_id_s:[* TO *]" to return only parent documents

Semantic (kNN):
- Searches chunk documents via their dense vectors (embedding_v)

Hybrid (RRF):
| Field purpose | Add to parent? | Add to chunk? | Why |
|---|---|---|---|
| Book metadata (author, year) | Yes | Copy from parent | Chunks need it for display after kNN |
| Full-text search field | Yes (via Tika) | No (use chunk_text_t) | Tika extracts to parent; chunks have own text |
| Dense vector embedding | Optional (book_embedding) | Yes (embedding_v) | kNN searches chunks, not parents |
| Facet field | Yes | Not needed | Facets come from BM25 leg (parents only) |
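To make the table concrete, here are hypothetical parent and chunk documents; all field values are invented for illustration:

```python
parent_sha = "ab" * 32  # stand-in for sha256(file_path).hexdigest()

parent = {
    "id": parent_sha,
    "title_s": "Example Book",
    "author_s": "Jane Doe",   # book metadata lives on the parent...
    "year_i": 1999,
    # no parent_id_s: its absence is what marks this as a parent
}

chunk = {
    "id": f"{parent_sha}_chunk_0000",
    "parent_id_s": parent_sha,  # foreign key back to the book
    "chunk_text_t": "First 400-word fragment of the book text",
    "author_s": "Jane Doe",     # ...and is copied onto chunks for display after kNN
    "chunk_index_i": 0,
    "page_start_i": 1,
    "page_end_i": 3,
}
```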
- Parent id: hashlib.sha256(file_path.encode()).hexdigest()
- Chunk id: f"{parent_id}_chunk_{chunk_index:04d}" (zero-padded, matching the scheme above)
- Join key: parent_id_s (present on chunks only)
- Never add EXCLUDE_CHUNKS_FQ to kNN queries — this silently returns zero results, since chunks are the only documents with embeddings. (Source: PR #701 incident)
- book_embedding is optional; embedding_v on chunks is the primary vector field.

```python
params = {
    "q": "{!knn f=embedding_v topK=10}[0.5, -0.2, ...]",
    # NO fq excluding chunks — chunks ARE the target
}
```
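Serializing the query vector into that local-parameter string is easy to get wrong; here is a minimal sketch (build_knn_query is a hypothetical helper, not a function from search_service.py):

```python
def build_knn_query(vector, field="embedding_v", top_k=10):
    """Format a dense vector as a Solr {!knn} local-parameter query string."""
    serialized = "[" + ", ".join(f"{v:.6f}" for v in vector) + "]"
    return f"{{!knn f={field} topK={top_k}}}{serialized}"
```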
```python
params = {
    "q": "search terms",
    "defType": "edismax",
    "fq": ["-parent_id_s:[* TO *]"],  # parents only
}
```
```python
solr.delete(q=f'id:"{book_id}" OR parent_id_s:"{book_id}"')
solr.commit()
```
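The id scheme can be sketched as follows; the :04d padding width is an assumption based on the {parent_id}_chunk_0000 example:

```python
import hashlib


def make_parent_id(file_path: str) -> str:
    """SHA-256 of the file path — unique per book."""
    return hashlib.sha256(file_path.encode()).hexdigest()


def make_chunk_id(parent_id: str, chunk_index: int) -> str:
    """Zero-padded chunk suffix, e.g. {parent_id}_chunk_0000."""
    return f"{parent_id}_chunk_{chunk_index:04d}"
```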
Keyword (BM25):
- Queries _text_ (default), phrase boost: title_t^2
- Empty query falls back to *:* (returns everything)
- Excludes chunks (-parent_id_s:[* TO *])

Semantic (kNN):
- Uses {!knn} local-parameter syntax on the embedding_v field
- Query text is embedded via POST /v1/embeddings/

Hybrid (RRF):
- score = sum(1/(k + rank)), k=60

```python
def reciprocal_rank_fusion(keyword_results, semantic_results, k=60):
    scores = {}
    result_map = {}
    for rank, doc in enumerate(keyword_results, start=1):
        scores[doc["id"]] = 1.0 / (k + rank)
        result_map[doc["id"]] = doc
    for rank, doc in enumerate(semantic_results, start=1):
        scores[doc["id"]] = scores.get(doc["id"], 0.0) + 1.0 / (k + rank)
        if doc["id"] not in result_map:
            result_map[doc["id"]] = doc
    # Sort by fused score descending; the RRF score replaces the original score
    fused = []
    for doc_id in sorted(scores, key=scores.get, reverse=True):
        doc = dict(result_map[doc_id])
        doc["score"] = scores[doc_id]
        fused.append(doc)
    return fused
```
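A tiny worked example of the fusion arithmetic (doc ids invented; computed directly from the formula rather than through the service code):

```python
k = 60
keyword_ranks = {"book_a": 1, "book_b": 2}   # BM25 leg
semantic_ranks = {"book_b": 1, "book_c": 2}  # kNN leg

scores = {}
for ranks in (keyword_ranks, semantic_ranks):
    for doc_id, rank in ranks.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

order = sorted(scores, key=scores.get, reverse=True)
# book_b appears in both legs (1/61 + 1/62), so it outranks either leg's top-1
# order == ["book_b", "book_a", "book_c"]
```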
Key properties:
- k is configurable via the RRF_K env var
- Each leg fetches max(page_size * 2, 20) results for adequate fusion

Call pattern:
```python
import httpx

response = httpx.post(EMBEDDINGS_URL, json={"input": query_text}, timeout=EMBEDDINGS_TIMEOUT)
vector = response.json()["data"][0]["embedding"]  # 512-dim float list
```
Fallback chain:
- If the embeddings call fails, hybrid/semantic degrade to keyword
- Final fallback: *:* results (keyword)

Timeout alignment (critical):
- Embeddings calls use a configurable timeout (EMBEDDINGS_TIMEOUT)
- proxy_read_timeout must be >= 1.5x the embeddings timeout (180s)

kNN vectors are 512 floats serialized as JSON arrays — easily >4KB. Combined with filter queries, this exceeds GET URI limits. Always use POST request body for Solr queries. (Source: #706)
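The fallback behavior can be sketched with the search legs stubbed out (search_with_fallback is a hypothetical wrapper; the real flow lives in search_service.py):

```python
def search_with_fallback(query, embed_fn, semantic_fn, keyword_fn):
    """Try the semantic leg; degrade to keyword if embeddings are unavailable."""
    try:
        vector = embed_fn(query)
    except Exception:  # embeddings-server down or timed out
        vector = None
    if vector is not None:
        return semantic_fn(vector)
    # keyword leg; an empty query becomes match-all
    return keyword_fn(query or "*:*")
```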
| Source | keyword | semantic | hybrid |
|---|---|---|---|
| Facets | Solr facet_counts | None | From BM25 leg |
| Highlights | Solr highlighting | None | From BM25 leg |
| Sort | Solr-native | By cosine score | By RRF score |
Facet fields are defined in FACET_FIELDS dict mapping logical names to Solr field tuples. Multi-field facets (e.g., language uses both language_detected_s and language_s) fall back to the first non-empty field.
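A sketch of the multi-field fallback idea; the FACET_FIELDS shape here is illustrative and the real mapping lives in search_service.py:

```python
FACET_FIELDS = {
    "language": ("language_detected_s", "language_s"),  # first non-empty wins
    "author": ("author_s",),
}


def first_nonempty_facet(counts_by_field, logical_name):
    """Pick facet counts from the first candidate field that returned anything."""
    for field in FACET_FIELDS[logical_name]:
        counts = counts_by_field.get(field)
        if counts:
            return counts
    return {}
```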
All facet filter values must be Lucene-escaped before inclusion in fq parameters to prevent Solr query injection. Use the solr_escape() utility function (typically via build_filter_queries, which applies it for you).
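For reference, a minimal version of what Lucene escaping must cover — the real solr_escape() lives in search_service.py and may differ:

```python
# Lucene query syntax characters that must be backslash-escaped in values
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def solr_escape(value: str) -> str:
    """Backslash-escape Lucene query syntax characters in a filter value."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)
```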
- Model: distiluse-base-multilingual-cased-v2 (512 dimensions)
- Served by embeddings-server at POST /v1/embeddings/
- Book-level (book_embedding): computed during indexing, used for similar-books
- Chunk-level (embedding_v): computed per chunk during indexing, used for search
- Query-time: solr-search calling embeddings-server

All search modes support the same filter query parameters:
- fq_author, fq_category, fq_language, fq_year

References:
- docs/architecture/solr-data-model.md — full architecture reference
- src/solr-search/search_service.py — RRF implementation, EXCLUDE_CHUNKS_FQ constant, query builders
- src/solr/books/managed-schema.xml — field definitions
- src/solr-search/README.md — data model summary