Use this skill when a task involves Geneformer workflows, especially TranscriptomeTokenizer input preparation, tokenized `.dataset` generation, cell or gene classification with `Classifier`, embedding extraction with `EmbExtractor`, and in silico perturbation analysis with `InSilicoPerturber`.
This skill is for Geneformer-specific workflows, not generic single-cell model use.
`ensembl_id` and `n_counts` are available.

`.dataset`.

Use `TranscriptomeTokenizer` first for almost every Geneformer workflow. This step converts raw-count `.loom` or `.h5ad` data into tokenized datasets used by the downstream APIs.
Geneformer expects:

- `ensembl_id`
- `n_counts`

Optional metadata can be passed through during tokenization.
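
The input requirements above can be checked before tokenizing. The following is a minimal sketch, not an official Geneformer utility. `missing_required_fields` and `tokenize_directory` are hypothetical helper names, and the sketch assumes that `ensembl_id` is stored per gene (for example in `adata.var`) and `n_counts` per cell (for example in `adata.obs`). The wrapper assumes the `geneformer` package is installed and passes the optional metadata mapping as the tokenizer's first argument.

```python
from typing import Dict, Iterable, List, Optional

REQUIRED_GENE_FIELD = "ensembl_id"  # assumed to be a per-gene attribute
REQUIRED_CELL_FIELD = "n_counts"    # assumed to be a per-cell attribute


def missing_required_fields(
    gene_fields: Iterable[str], cell_fields: Iterable[str]
) -> List[str]:
    """Return the required field names that are absent (empty list means OK)."""
    missing = []
    if REQUIRED_GENE_FIELD not in set(gene_fields):
        missing.append(REQUIRED_GENE_FIELD)
    if REQUIRED_CELL_FIELD not in set(cell_fields):
        missing.append(REQUIRED_CELL_FIELD)
    return missing


def tokenize_directory(
    data_dir: str,
    output_dir: str,
    prefix: str,
    metadata_map: Optional[Dict[str, str]] = None,
    file_format: str = "h5ad",
    nproc: int = 1,
) -> None:
    """Hypothetical wrapper: tokenize every file in data_dir into a .dataset."""
    # Imported lazily so the field check above works without geneformer installed.
    from geneformer import TranscriptomeTokenizer

    # metadata_map maps input attribute names to output column names
    # (the optional metadata that is passed through during tokenization).
    tk = TranscriptomeTokenizer(metadata_map, nproc=nproc)
    tk.tokenize_data(data_dir, output_dir, prefix, file_format=file_format)


# Example: the per-cell field is missing, so tokenization should not start yet.
print(missing_required_fields(["ensembl_id", "gene_name"], ["cell_type"]))
# prints ['n_counts']
```

Running the field check first gives a clear error message before the slower tokenization step begins.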
Use Classifier for:
The input is a tokenized Geneformer `.dataset` object, not raw AnnData.
Use EmbExtractor when the task is to:
Use InSilicoPerturber for zero-shot or model-based perturbation analyses such as:
This is one of Geneformer's defining workflows and should be treated as more than ordinary classifier inference.
`ensembl_id`.

`.dataset` files with AnnData objects.

| Component | Use |
|---|---|
| `TranscriptomeTokenizer` | create tokenized datasets |
| `Classifier` | fine-tune cell or gene classifiers |
| `MTLClassifier` | multitask cell classification |
| `EmbExtractor` | extract and summarize embeddings |
| `InSilicoPerturber` | simulate perturbations / treatment directions |
`references/workflows.md`.

`references/sources-and-notes.md`.