スキル内容

LangExtract extracts structured information from unstructured text using LLMs. The main entry point is lx.extract(). Every extraction needs at least one example and input text. prompt_description is optional but recommended.

Install

pip install langextract

# For OpenAI model support:
pip install langextract[openai]

API keys

# Gemini (checks GEMINI_API_KEY then LANGEXTRACT_API_KEY)
export GEMINI_API_KEY="your_key"

# OpenAI (checks OPENAI_API_KEY then LANGEXTRACT_API_KEY)
export OPENAI_API_KEY="your_key"

# Ollama: no API key needed, just a running Ollama server

Auto-routing + env-default resolution is tuned for GPT-style model IDs (gpt-4*, gpt-5*). For OpenAI-compatible endpoints or non-GPT IDs, pass an explicit ModelConfig — see .

references/providers.md

import langextract as lx

examples = [
    lx.data.ExampleData(
        text="Patient takes lisinopril 10mg daily for hypertension.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="lisinopril",
                attributes={"dose": "10mg", "frequency": "daily"},
            ),
            lx.data.Extraction(
                extraction_class="condition",
                extraction_text="hypertension",
                attributes={"status": "active"},
            ),
        ],
    )
]

result = lx.extract(
    text_or_documents="Patient is prescribed metformin 500mg twice daily.",
    prompt_description="Extract medications and conditions with attributes.",
    examples=examples,
    model_id="gemini-2.5-flash",
)

for e in result.extractions:
    print(e.extraction_class, e.extraction_text)
    print(f"  char_interval: {e.char_interval}")
    print(f"  attributes: {e.attributes}")

from langextract.prompt_validation import PromptValidationLevel

result = lx.extract(
    ...,
    prompt_validation_level=PromptValidationLevel.ERROR,  # default: WARNING
    prompt_validation_strict=True,                          # default: False
)

result = lx.extract(
    text_or_documents=text,          # str, URL, or list of Documents
    prompt_description=prompt,        # what to extract (optional)
    examples=examples,                # few-shot examples (required)
    model_id="gemini-2.5-flash",     # model to use
    extraction_passes=1,              # >1 for higher recall (costs more)
    max_char_buffer=1000,             # chunk size
    batch_length=10,                  # chunks per batch
    max_workers=10,                   # parallel workers (provider-dependent)
    context_window_chars=None,        # cross-chunk context for coreference
)

# Filter to grounded extractions only (have source positions)
grounded = [e for e in result.extractions if e.char_interval]

# Access source position
for e in grounded:
    start = e.char_interval.start_pos
    end = e.char_interval.end_pos
    matched_text = result.text[start:end]

# Save to JSONL
lx.io.save_annotated_documents(
    [result], output_name="results.jsonl", output_dir="."
)

# Visualize from JSONL (shows first document)
html = lx.visualize("results.jsonl")
with open("visualization.html", "w") as f:
    if hasattr(html, "data"):
        f.write(html.data)  # Jupyter/Colab
    else:
        f.write(html)

# Or visualize an AnnotatedDocument directly
html = lx.visualize(result)

Langextract Usage | Skills Pool

Langextract Usage

Langextract Usage

Install

API keys

Basic extraction

Writing good examples

Key parameters

Working with results

Provider selection

Common issues

Further reading

Feishu Doc

Summarize

Nano Pdf

Diffs

Customs Trade Compliance

Nutrient Document Processing