Chunking & Embeddings
Text splitting strategies, embedding generation with FastEmbed, RAG pipeline integration
Location: crates/kreuzberg/src/chunking/, crates/kreuzberg/src/embeddings.rs
Extracted Text
|
[1. Normalization] -> Clean whitespace, remove control chars
|
[2. Chunk Strategy Selection] -> Fixed-size, semantic, syntax-aware, recursive
|
[3. Overlap Management] -> Control context window overlap
|
[4. Optional Embedding] -> Generate vectors with FastEmbed
|
Output: Vec<Chunk> with text, vectors, metadata
Location: crates/kreuzberg/src/chunking/mod.rs
| Strategy | Pattern | Best For |
|---|---|---|
| Fixed-Size | Sliding window with configurable overlap | Uniform chunks for embedding models with fixed token limits |
| Semantic | Split by sentences, merge/split by similarity threshold | Smart context preservation for LLM consumption and semantic search |
| Syntax-Aware | Split by paragraph/section/heading/code-block structure | Preserving document structure (sections, code blocks) in RAG |
| Recursive (LangChain pattern) | Try separators in order: \n\n, \n, " ", "" | Best general-purpose chunking; auto-finds optimal split points |
Key config fields per strategy (see struct definitions in chunking/mod.rs):

- Fixed-Size: chunk_size, overlap, trim_whitespace
- Semantic: target_chunk_size, min/max_chunk_size, semantic_threshold, use_sentence_boundaries
- Syntax-Aware: chunk_by (Paragraph/Section/Heading/Sentence/CodeBlock), max_chunk_size, respect_code_blocks
- Recursive: separators[], chunk_size, overlap

Location: crates/kreuzberg/src/chunking/mod.rs
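The Fixed-Size strategy is the easiest to picture. The following is a minimal, std-only sketch of a sliding window with overlap, operating on characters for simplicity; it is an illustration of the technique, not the crate's actual (token-aware) implementation:

```rust
/// Illustrative fixed-size chunker: a sliding window that advances by
/// chunk_size - overlap, so adjacent chunks share `overlap` characters.
/// Sketch only -- the real implementation in chunking/mod.rs is token-aware.
fn fixed_size_chunks(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With chunk_size=4 and overlap=2, "abcdefghij" yields overlapping windows "abcd", "cdef", "efgh", "ghij".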
| Preset | Chunk Size | Overlap | Strategy | Use Case |
|---|---|---|---|---|
| Balanced | 512 tokens | 50 | Semantic | RAG sweet spot |
| Compact | 256 tokens | 32 | Fixed-Size | Dense vectors |
| Extended | 1024 tokens | 100 | Recursive | Full context |
| Minimal | 128 tokens | 16 | (default) | Lightweight embeddings |
Usage: set config.chunking.preset = Some("balanced") in ExtractionConfig.
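As a sketch of preset selection (the exact type of the preset field, e.g. Option<String> versus an enum, is assumed here, not verified against the crate):

```rust
// Sketch only: field types are assumed, not verified against ExtractionConfig.
let mut config = ExtractionConfig::default();
config.chunking.preset = Some("balanced".to_string());
```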
Location: crates/kreuzberg/src/embeddings.rs
| Model | Dimensions | Notes |
|---|---|---|
| BAAI/bge-small-en-v1.5 (default) | 384 | Fast, excellent for RAG |
| BAAI/bge-small-zh-v1.5 | 384 | Chinese-optimized |
| BAAI/bge-base-en-v1.5 | 768 | Better quality, slower |
| jinaai/jina-embeddings-v2-base-en | 768 | Long context (up to 8192 tokens) |
| Custom(path) | varies | Custom ONNX model path |
TextEmbeddingManager provides singleton-cached models per config. Pattern:
- get_or_init_model() -- lazy-loads the ONNX model (downloading it if needed) and caches it in an Arc<RwLock<HashMap>>
- embed_chunks() -- collects chunk texts, calls model.embed(texts, batch_size), and zips the results back into ChunkWithEmbedding

Default config: batch_size=256, device=CPU, parallel_requests=4.
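The embed-and-zip step above can be sketched in plain Rust. The stub embed function below stands in for FastEmbed's batched model call and returns dummy vectors; the struct shapes are simplified, not the crate's real types:

```rust
/// Simplified stand-ins for the crate's chunk types (sketch only).
struct Chunk { text: String }
struct ChunkWithEmbedding { text: String, embedding: Vec<f32> }

/// Placeholder for model.embed(texts, batch_size): maps each text to a
/// dummy 3-dimensional vector so the zipping pattern can be shown.
fn embed(texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32, 0.0, 1.0]).collect()
}

/// The embed_chunks() pattern: one batched model call over all chunk texts,
/// then zip each resulting vector back onto its source chunk in order.
fn embed_chunks(chunks: Vec<Chunk>) -> Vec<ChunkWithEmbedding> {
    let texts: Vec<&str> = chunks.iter().map(|c| c.text.as_str()).collect();
    let vectors = embed(&texts); // single call for the whole batch
    chunks
        .into_iter()
        .zip(vectors)
        .map(|(c, embedding)| ChunkWithEmbedding { text: c.text, embedding })
        .collect()
}
```

Batching all texts into one model call amortizes ONNX inference overhead; the zip relies on the embedder returning one vector per input, in input order.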
Embeddings require ONNX Runtime. Feature-gated via:
[features]
embeddings = ["dep:fastembed", "dep:ort"]
Install: brew install onnxruntime (macOS) / apt install libonnxruntime libonnxruntime-dev (Linux). Verify: echo $ORT_DYLIB_PATH.
The full extraction-to-RAG pipeline:
1. extract_file(path, config) -> ExtractionResult
2. result.content -> Vec<Chunk>
3. TextEmbeddingManager::embed_chunks() -> Vec<ChunkWithEmbedding>
4. RagDocument { file_path, metadata, chunks } -- ready for vector DB ingestion

See the ChunkWithEmbedding struct in types.rs: it contains text, embedding: Vec<f32>, dimensions, norm, and metadata.
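Since ChunkWithEmbedding carries a precomputed norm, cosine similarity at query time reduces to a dot product divided by the two stored norms. A std-only sketch (illustrative, not the crate's code):

```rust
/// L2 norm of an embedding vector, as stored in ChunkWithEmbedding's norm field.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine similarity using precomputed norms, so repeated queries against the
/// same chunks skip recomputing each vector's magnitude.
fn cosine(a: &[f32], norm_a: f32, b: &[f32], norm_b: f32) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    dot / (norm_a * norm_b)
}
```

This is the typical reason the struct stores norm alongside the raw vector: vector DBs and in-memory rerankers both benefit from the cached magnitude.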