Redis Stack vector search. FT.CREATE with VECTOR field, HNSW vs FLAT, hybrid with other field types, RedisVL Python library, Redis Enterprise scaling, pipelines for bulk ingest, TTL for ephemeral vectors (cache use case), Redis JSON + vectors pattern. USE WHEN: user mentions "Redis vector", "Redis Stack", "FT.CREATE", "RedisVL", "Redis HNSW", "Redis JSON vector", "semantic cache Redis" DO NOT USE FOR: other vector stores - use `vector-stores/*` siblings; embedding caching theory - use `rag/caching-retrieval`
Redis is the correct default vector store when:

- the app already runs Redis (cache, sessions, queues) and you want one operational surface
- you need low-latency in-memory KNN at small-to-mid scale (up to low tens of millions of vectors)
- vectors are ephemeral — semantic caching with TTL is the flagship use case
Redis is a poor fit for:

- corpora that don't fit in RAM — vectors and the HNSW graph are memory-resident
- billion-scale collections better served by disk-based ANN stores (see `vector-stores/*` siblings)
Vector search requires the search module (RediSearch 2.4+). It ships with the `redis/redis-stack:latest` Docker image. Plain `redis:latest` (OSS) has no search module — run `FT.CREATE` against it and you get an `unknown command` error.
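For local development, the Stack image is the quickest way to get the module (this assumes Docker is installed; the port mapping is the default):

```shell
# Redis Stack = Redis OSS + RediSearch (and RedisJSON) in one image
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
```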
Create the index with `FT.CREATE`:

```python
# pip install "redis>=5"
import redis
from redis.commands.search.field import VectorField, TagField, NumericField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379, decode_responses=False)

schema = (
    VectorField(
        "vector",
        "HNSW",
        {
            "TYPE": "FLOAT32",
            "DIM": 1024,
            "DISTANCE_METRIC": "COSINE",
            "M": 16,
            "EF_CONSTRUCTION": 200,
            "EF_RUNTIME": 10,
        },
    ),
    TagField("tenant_id"),
    TagField("source"),
    NumericField("created_at"),
    TextField("text"),
)

r.ft("idx:docs").create_index(
    fields=schema,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)
```
The index watches any key starting with `doc:` — insert data under that prefix and it gets indexed automatically.
| Index | Use case | Build time | Memory | Recall |
|---|---|---|---|---|
| FLAT | < 10k vectors, exact search | Instant | 1x | 100% |
| HNSW | 10k - 10M vectors, approximate | Slow insert | ~1.2x | 95-99% |
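To turn the memory column into numbers: an HNSW index over float32 vectors costs roughly `n * dim * 4` bytes plus graph overhead (the ~1.2x above). A back-of-envelope helper — the 1.2 overhead factor is an assumption taken from the table, not a Redis-reported figure:

```python
def index_ram_gb(n_vectors: int, dim: int, overhead: float = 1.2) -> float:
    # float32 = 4 bytes per dimension; HNSW graph links add roughly 20%
    return n_vectors * dim * 4 * overhead / 1e9

print(round(index_ram_gb(1_000_000, 1024), 2))  # 1M x 1024-dim -> ~4.92 GB
```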
Parameters:

- `M` (default 16): connections per node. Higher = better recall, more RAM.
- `EF_CONSTRUCTION` (default 200): build-time search width. Higher = better index quality, slower ingest.
- `EF_RUNTIME` (default 10): query-time search width. Higher = better recall, slower queries.

Tune `EF_RUNTIME` per query if different clients have different recall/latency needs — pass it via `$EF_RUNTIME` in the query.
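A sketch of that per-query override — RediSearch accepts KNN query attributes such as `EF_RUNTIME` inside the clause, bound through `query_params` the same way as the vector blob (the helper name here is mine, not a library API):

```python
def knn_clause(k: int) -> str:
    # $BLOB and $EF are bound at search time, e.g.:
    #   r.ft("idx:docs").search(Query(knn_clause(10)),
    #                           query_params={"BLOB": q_bytes, "EF": 100})
    return f"*=>[KNN {k} @vector $BLOB EF_RUNTIME $EF AS score]"

print(knn_clause(10))
```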
```python
import time
import numpy as np

def insert(doc_id: str, vector: np.ndarray, tenant_id: str, source: str, text: str):
    key = f"doc:{doc_id}"
    r.hset(key, mapping={
        "vector": vector.astype("float32").tobytes(),  # raw float32 bytes, not JSON
        "tenant_id": tenant_id,
        "source": source,
        "created_at": int(time.time()),
        "text": text,
    })

insert("d1", dense_vec, "acme", "kb", "OAuth uses refresh tokens.")
```
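For bulk ingest, wrap the same HSETs in a pipeline so each batch is one round trip instead of thousands. A hedged sketch — `bulk_ingest`, `batches`, and the tuple layout are my naming for illustration, not a library API:

```python
import time

def batches(items, size=500):
    # Yield fixed-size chunks; one pipeline execute() per chunk.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_ingest(r, docs, batch_size=500):
    # docs: iterable of (doc_id, float32_bytes, tenant_id, source, text)
    for batch in batches(list(docs), batch_size):
        pipe = r.pipeline(transaction=False)  # no MULTI/EXEC needed for ingest
        for doc_id, vec_bytes, tenant_id, source, text in batch:
            pipe.hset(f"doc:{doc_id}", mapping={
                "vector": vec_bytes,
                "tenant_id": tenant_id,
                "source": source,
                "created_at": int(time.time()),
                "text": text,
            })
        pipe.execute()  # single round trip for the whole batch
```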
Vectors are stored as raw FLOAT32 bytes — not JSON, not base64.
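The same packing with only the stdlib, to make the byte layout concrete — contiguous little-endian float32, 4 bytes per dimension, which is what `.astype("float32").tobytes()` produces:

```python
import struct

def to_float32_bytes(vec):
    # Pack floats as contiguous little-endian float32,
    # ready for HSET values or the KNN $BLOB query param.
    return struct.pack(f"<{len(vec)}f", *vec)

blob = to_float32_bytes([0.25, -1.5, 3.0])
print(len(blob))  # 3 dims * 4 bytes = 12
```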
```python
from redis.commands.search.query import Query
import numpy as np

q_bytes = q_vec.astype("float32").tobytes()

q = (
    Query("(@tenant_id:{acme} @source:{kb|faq})=>[KNN 10 @vector $BLOB AS score]")
    .sort_by("score")
    .return_fields("score", "text", "source")
    .paging(0, 10)
    .dialect(2)
)

results = r.ft("idx:docs").search(q, query_params={"BLOB": q_bytes})
for doc in results.docs:
    print(doc.score, doc.text)
```
The parenthesized filter runs first over the tag index, then KNN runs over the filtered subset — true pre-filtering, not post-filtering.
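With `DISTANCE_METRIC: COSINE`, the returned `score` is a distance, `1 - cosine_similarity`, so lower means closer — which is why the query sorts ascending on it. A minimal reference implementation of what that number means:

```python
import math

def cosine_distance(a, b):
    # What Redis reports for COSINE: 1 - (a.b / (|a||b|)); 0.0 = same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```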
```python
q = (
    Query("@text:(oauth refresh token) @tenant_id:{acme}=>[KNN 10 @vector $BLOB AS score]")
    .dialect(2)
)
```
@text:(...) uses Redis's built-in tokenizer + stemmer. Combine with KNN for a keyword-filtered semantic search.
For full hybrid retrieval (BM25 + vector fusion) use RedisVL's hybrid query below.
```python
# pip install redisvl
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, Num

schema = IndexSchema.from_dict({
    "index": {"name": "docs", "prefix": "doc:", "storage_type": "hash"},
    "fields": [
        {"name": "text", "type": "text"},
        {"name": "tenant_id", "type": "tag"},
        {"name": "created_at", "type": "numeric"},
        {"name": "vector", "type": "vector",
         "attrs": {"dims": 1024, "algorithm": "hnsw", "distance_metric": "cosine"}},
    ],
})

index = SearchIndex(schema, r)
index.create(overwrite=False)

filt = (Tag("tenant_id") == "acme") & (Num("created_at") > 1700000000)
q = VectorQuery(vector=q_vec.tolist(), vector_field_name="vector",
                return_fields=["text", "source"], num_results=10,
                filter_expression=filt)
results = index.query(q)
```
RedisVL adds typed schemas, Pydantic-style filter expressions, and semantic cache / LLM session primitives.
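For intuition on what hybrid fusion does, reciprocal-rank fusion (RRF) is the common recipe: each ranked list contributes `1/(k + rank)` per document, and the fused order sorts by the summed score. A stdlib sketch of the idea — not RedisVL's exact implementation:

```python
def rrf_fuse(rankings, k=60):
    # rankings: ranked doc-id lists, e.g. one from BM25, one from KNN.
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score = better; docs in both lists float to the top.
    return sorted(scores, key=scores.get, reverse=True)

print(rrf_fuse([["a", "b", "c"], ["b", "a", "d"]]))
```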
Semantic caching is Redis's most natural fit: store LLM responses keyed by the query embedding, then run a vector-similarity check on each new prompt — close enough means reuse the cached answer.
```python
from redisvl.extensions.llmcache import SemanticCache

cache = SemanticCache(
    name="llmcache",
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,  # max embedding distance that counts as a hit
    ttl=3600,                # entries expire in 1h
)

hit = cache.check(prompt="What is OAuth PKCE?")
if hit:
    answer = hit[0]["response"]
else:
    answer = llm("What is OAuth PKCE?")  # your model call
    cache.store(prompt="What is OAuth PKCE?", response=answer)
```