Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, LABEL_LIST, NGRAM, INVERTED, BLOOMFILTER, RTREE, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
Use this skill to answer questions about:
Do not use this skill for:
Python:
pip install pylance
Verify:
python -c "import lance; print(lance.__version__)"
Rust:
cargo add lance
Or add it to Cargo.toml (choose an appropriate version for your project):
[dependencies]
lance = "x.y"
From source (this repository):
maturin develop -m python/Cargo.toml
Collect the minimum information required to avoid wrong guidance:
If the user does not specify a language, default to Python examples and provide a short mapping to Rust concepts.
references/index-selection.md and confirm constraints.
Prefer lance.write_dataset for most user workflows.
import lance
import pyarrow as pa
vectors = pa.array(
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
type=pa.list_(pa.float32(), 3),
)
table = pa.table({"id": [1, 2], "vector": vectors, "category": ["a", "b"]})
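# Create a new dataset (errors if one already exists at this path)
ds = lance.write_dataset(table, "my-data.lance", mode="create")
# Append rows to the existing dataset
ds = lance.write_dataset(table, "my-data.lance", mode="append")
# Overwrite the existing data, producing a new dataset version
ds = lance.write_dataset(table, "my-data.lance", mode="overwrite")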
Validation checklist:
lance.dataset(uri).count_rows()
lance.dataset(uri).schema
Notes:
storage_options={...} when writing to an object store URI.
commit_lock and point them to the user guide.
Use lance.dataset + scanner(...) for pushdowns (projection, filter, limit, nearest).
import lance
ds = lance.dataset("my-data.lance")
tbl = ds.scanner(
columns=["id", "category"],
filter="category = 'a' and id >= 10",
limit=100,
).to_table()
Validation checklist:
scanner(...) call that reproduces it.
filter string and whether prefilter is enabled (when using nearest).
Run vector search with scanner(nearest=...) or to_table(nearest=...).
import lance
import numpy as np
ds = lance.dataset("my-data.lance")
q = np.array([1.0, 2.0, 3.0], dtype=np.float32)
tbl = ds.to_table(nearest={"column": "vector", "q": q, "k": 10})
If combining a filter with vector search, decide whether the filter must run before the vector query:
Use prefilter=True when the filter is highly selective and correctness (top-k among filtered rows) matters.
Use prefilter=False when the filter is not very selective and speed matters, and accept that results can be fewer than k.
tbl = ds.scanner(
nearest={"column": "vector", "q": q, "k": 10},
filter="category = 'a'",
prefilter=True,
).to_table()
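To make the prefilter trade-off concrete, the following is a minimal pure-Python sketch of the two orderings. It does not call Lance at all: the in-memory `rows`, the helper names (`l2`, `top_k`, `prefilter_search`, `postfilter_search`), and the brute-force distance scan are all hypothetical and exist only to illustrate why filtering after the nearest-neighbor step can return fewer than k rows.

```python
import math

# Hypothetical toy data; a real dataset would be read through Lance.
rows = [
    {"id": 1, "vector": [1.0, 2.0, 3.0], "category": "a"},
    {"id": 2, "vector": [1.1, 2.0, 3.0], "category": "b"},
    {"id": 3, "vector": [1.2, 2.0, 3.0], "category": "b"},
    {"id": 4, "vector": [9.0, 9.0, 9.0], "category": "a"},
]
query = [1.0, 2.0, 3.0]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(candidates, k):
    """Brute-force k nearest rows to `query` (stand-in for an index lookup)."""
    return sorted(candidates, key=lambda r: l2(r["vector"], query))[:k]


def prefilter_search(k, predicate):
    """Filter first, then take the k nearest among the surviving rows."""
    return top_k([r for r in rows if predicate(r)], k)


def postfilter_search(k, predicate):
    """Take the k nearest overall, then drop rows that fail the filter."""
    return [r for r in top_k(rows, k) if predicate(r)]


def is_category_a(row):
    return row["category"] == "a"


pre_ids = [r["id"] for r in prefilter_search(2, is_category_a)]
post_ids = [r["id"] for r in postfilter_search(2, is_category_a)]
print("prefilter :", pre_ids)
print("postfilter:", post_ids)
```

With k=2, filtering first still yields two rows from category 'a', while filtering afterwards keeps only the matching row that happened to be among the two nearest overall.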
Create a vector index with LanceDataset.create_index(...).
Start with a minimal working configuration:
ds = lance.dataset("my-data.lance")
ds = ds.create_index(
"vector",
index_type="IVF_PQ",
target_partition_size=8192,
num_sub_vectors=16,
)
Then verify:
ds.describe_indices() (preferred) or ds.list_indices() (can be expensive)
nearest query that uses the index
For parameter selection and tuning, consult references/index-selection.md.
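When choosing num_sub_vectors for IVF_PQ, keep in mind that product quantization splits each vector into equal-sized sub-vectors, so the vector dimension needs to divide evenly by num_sub_vectors. The sketch below is a hypothetical helper (`candidate_num_sub_vectors` is not part of Lance) that lists values satisfying that constraint. The preference for a sub-vector width that is a multiple of 8 is an assumption about SIMD-friendly layouts; confirm it against references/index-selection.md before relying on it.

```python
from typing import List


def candidate_num_sub_vectors(dimension: int) -> List[int]:
    """List num_sub_vectors values that evenly divide `dimension`.

    Assumption (not taken from the Lance source): widths that are a
    multiple of 8 are preferred. If no divisor gives such a width,
    fall back to every divisor.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    divisors = [m for m in range(1, dimension + 1) if dimension % m == 0]
    # Sub-vector width is dimension // m; keep those that are multiples of 8.
    preferred = [m for m in divisors if (dimension // m) % 8 == 0]
    return preferred if preferred else divisors


print(candidate_num_sub_vectors(128))
print(candidate_num_sub_vectors(3))
```

For a 128-dimensional column the list includes 16, matching the starting configuration shown earlier; for the 3-dimensional toy vectors no width is a multiple of 8, so the helper simply returns the plain divisors.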
Scalar indices speed up scans with filters. Use create_scalar_index for a stable entry point.
ds = lance.dataset("my-data.lance")
ds.create_scalar_index("category", "BTREE", replace=True)
Then verify:
ds.describe_indices()
scanner(filter=...) query
To choose a scalar index type (BTREE vs BITMAP vs LABEL_LIST vs NGRAM vs INVERTED, etc.), consult references/index-selection.md.
prefilter=True if the user expects top-k among filtered rows.
num_sub_vectors.
dimension / num_sub_vectors (see references/index-selection.md).
use_scalar_index=False).
When answering API questions, confirm the exact signature and docstrings locally:
python/python/lance/dataset.py (write_dataset, LanceDataset.scanner)
python/python/lance/dataset.py (create_index)
python/python/lance/dataset.py (create_scalar_index)
Use targeted search:
rg -n "def write_dataset\\b|def create_index\\b|def create_scalar_index\\b|def scanner\\b" python/python/lance/dataset.py
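If ripgrep is not installed, roughly the same lookup can be sketched in Python with the standard `re` module. The helper name `grep_defs` and the `sample` text are hypothetical; the sample signatures are placeholders, not the real Lance signatures. In practice you would pass in the contents of python/python/lance/dataset.py instead.

```python
import re

# Same alternatives as the rg pattern; \b avoids matching longer names.
PATTERN = re.compile(
    r"def (write_dataset|create_index|create_scalar_index|scanner)\b"
)


def grep_defs(text):
    """Return (line_number, stripped_line) for each matching definition."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


# Placeholder source text; signatures here are made up for illustration.
sample = """\
def write_dataset(*args, **kwargs):
    pass

class LanceDataset:
    def scanner(self, *args, **kwargs):
        pass

    def create_index_helper(self):
        pass
"""

hits = grep_defs(sample)
for number, line in hits:
    print(f"{number}: {line}")
```

To run it against the real file, read the source with `pathlib.Path(...).read_text()` and pass the result to `grep_defs`.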
references/index-selection.md
references/io-cheatsheet.md
scripts/python_end_to_end.py