Comprehensive assistance with scArches (single-cell architectural surgery) development, generated from the official documentation. scArches integrates newly produced single-cell datasets into existing reference atlases through decentralized training and model surgery.
When to Use This Skill
This skill should be triggered when:
Building reference atlases using scVI, trVAE, scANVI, totalVI, or expiMap models
Mapping query datasets to existing reference atlases for cell type annotation
Performing cell type label transfer from reference to query datasets
Integrating multi-modal data (CITE-seq, scRNA-seq + ATAC, TCR + transcriptome)
Analyzing spatial transcriptomics data with SageNet
Working with gene programs and pathway analysis using expiMap
Training deep generative models for single-cell data integration
Debugging scArches models or optimization issues
Learning best practices for single-cell reference mapping
Quick Reference
Essential Code Patterns
Import and Setup
import warnings
warnings.simplefilter(action='ignore')
import scanpy as sc
import torch
import scarches as sca
import numpy as np
import gdown
Reference Model Training (expiMap)
# Prepare data with gene annotations
sca.utils.add_annotations(adata, 'reactome.gmt', min_genes=12, clean=True)
adata._inplace_subset_var(adata.varm['I'].sum(1)>0)
# Initialize and train model
intr_cvae = sca.models.EXPIMAP(
    adata=adata,
    condition_key='study',
    hidden_layer_sizes=[256, 256, 256],
    recon_loss='nb'
)
# Train with early stopping
early_stopping_kwargs = {
    "early_stopping_metric": "val_unweighted_loss",
    "threshold": 0,
    "patience": 50,
    "reduce_lr": True,
    "lr_patience": 13,
    "lr_factor": 0.1,
}
intr_cvae.train(
    n_epochs=400,
    alpha_epoch_anneal=100,
    alpha=0.7,
    alpha_kl=0.5,
    early_stopping_kwargs=early_stopping_kwargs,
    use_early_stopping=True
)
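The interaction between `patience`, `lr_patience`, `lr_factor`, and `threshold` in the dictionary above can be sketched in plain Python. This is a minimal illustration, not scArches's internal implementation: the function name `run_early_stopping` is hypothetical, and the assumption that an improvement resets both counters is ours.

```python
def run_early_stopping(val_losses, threshold=0.0, patience=50,
                       lr=1e-3, lr_patience=13, lr_factor=0.1):
    """Simulate patience-based early stopping with learning-rate reduction.

    Returns (stop_epoch, final_lr). stop_epoch is the index of the epoch at
    which training would stop, or None if patience was never exhausted.
    """
    best = float("inf")
    epochs_since_best = 0     # drives the stop decision
    epochs_since_lr_cut = 0   # drives the learning-rate decision
    for epoch, loss in enumerate(val_losses):
        if best - loss > threshold:
            # Improvement larger than the threshold: reset both counters.
            best = loss
            epochs_since_best = 0
            epochs_since_lr_cut = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_cut += 1
        if epochs_since_best >= patience:
            return epoch, lr
        if epochs_since_lr_cut >= lr_patience:
            # Plateau: shrink the learning rate and start counting again.
            lr *= lr_factor
            epochs_since_lr_cut = 0
    return None, lr


# Validation loss improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.7] * 10
stop_epoch, final_lr = run_early_stopping(losses, patience=6, lr_patience=3)
```

With `lr_patience` smaller than `patience`, the learning rate is cut at least once before training stops, which is the relationship the values 13 and 50 above also have.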
Query Dataset Mapping
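# Attach the query data to the pretrained reference model (architecture surgery)
model = sca.models.SCANVI.load_query_data(adata_query, reference_model)
# Fine-tune on query data. scvi-tools-based models such as SCANVI take
# max_epochs, and the learning rate is passed through plan_kwargs.
model.train(
    max_epochs=100,
    train_size=1.0,
    plan_kwargs={"lr": 1e-4},
)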
# Get latent representation
latent = model.get_latent_representation()
Data Requirements
Normalized data: acceptable for trVAE (set recon_loss='mse')
Highly variable genes: minimum 2000; increase to 5000 for complex datasets
Cell type labels: required for the scANVI reference, optional for the query
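The normalized-data note above implies a quick check before choosing a reconstruction loss: count-based likelihoods such as `'nb'` expect non-negative integer counts, while `'mse'` suits normalized values. A minimal sketch of that check follows; the helper name `suggest_recon_loss` and the tolerance are assumptions for illustration, not part of scArches.

```python
import numpy as np


def suggest_recon_loss(X, tol=1e-8):
    """Return 'nb' for non-negative, integer-valued data, else 'mse'."""
    X = np.asarray(X, dtype=float)
    non_negative = np.all(X >= 0)
    integer_valued = np.all(np.abs(X - np.round(X)) < tol)
    return "nb" if (non_negative and integer_valued) else "mse"


# Toy raw counts (2 cells x 3 genes) and a log-normalized version of them.
raw = np.array([[0, 3, 1], [5, 0, 2]])
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

loss_for_raw = suggest_recon_loss(raw)
loss_for_normalized = suggest_recon_loss(normalized)
```

For a sparse `adata.X`, passing the stored non-zero values (`adata.X.data`) instead of the matrix itself keeps the check cheap.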
Resources
references/
Comprehensive documentation extracted from official sources containing:
Detailed API documentation with parameter descriptions
Step-by-step tutorials with real datasets
Code examples with proper syntax highlighting
Links to original documentation for further reading
scripts/
Add helper scripts for:
Data preprocessing pipelines
Model training automation
Batch effect evaluation
Visualization utilities
assets/
Store:
Example datasets and preprocessing results
Trained model checkpoints
Configuration templates
Visualization templates
Notes
This skill was generated from official scArches documentation (http://127.0.0.1:9180)
Reference files preserve original structure and examples
All code examples extracted from actual tutorials and API docs
Training recommendations based on empirical best practices
Updating
To refresh this skill with updated documentation:
Re-run the documentation scraper with current scArches version
Update reference files with latest API changes and tutorials
Verify code examples against newest scArches release
Test training workflows with updated hyperparameters
Common Use Cases
Cell Type Annotation
# Map query to reference and transfer labels
query_adata = sc.read('query_data.h5ad')
model = sca.models.SCANVI.load_query_data(query_adata, ref_model)
model.train(max_epochs=400)
predictions = model.predict(query_adata)
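When the query dataset also carries held-out ground-truth labels, the transferred labels can be scored against them. The sketch below assumes both are available as plain sequences of strings; the function name `label_transfer_report` is hypothetical and the toy labels are made up for illustration.

```python
from collections import Counter


def label_transfer_report(predicted, truth):
    """Return overall accuracy and a {(true, predicted): count} table."""
    predicted, truth = list(predicted), list(truth)
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if len(truth) == 0:
        raise ValueError("need at least one cell")
    confusion = Counter(zip(truth, predicted))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(truth), dict(confusion)


# Toy example: one T cell is mislabeled as a B cell.
truth = ["T cell", "T cell", "B cell", "NK cell"]
predicted = ["T cell", "B cell", "B cell", "NK cell"]
accuracy, confusion = label_transfer_report(predicted, truth)
```

The off-diagonal entries of the table show which cell types are confused with each other, which a single accuracy figure hides.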
Multi-modal Integration
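# CITE-seq data integration (adata must first be registered with
# sca.models.TOTALVI.setup_anndata, including the protein expression matrix)
model = sca.models.TOTALVI(adata)
model.train()
# totalVI learns a single joint latent space for RNA and protein
latent = model.get_latent_representation()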
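Gene Program Analysis
# Analyze query in context of known pathways: map the query onto a trained
# expiMap reference (intr_cvae) and read out one latent score per gene program
q_intr_cvae = sca.models.EXPIMAP.load_query_data(query_data, intr_cvae)
q_intr_cvae.train(n_epochs=400, alpha_epoch_anneal=100, alpha_kl=0.1, use_early_stopping=True)
gp_activities = q_intr_cvae.get_latent(query_data.X, query_data.obs['study'], only_active=True)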
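expiMap ties its latent dimensions to gene programs through a binary gene-by-program mask, which the reference-training pattern earlier stores in `adata.varm['I']` and then uses to drop genes that belong to no program. The NumPy sketch below shows the idea only. `build_gene_program_mask` is a hypothetical helper, and treating `min_genes` as "minimum number of a program's genes present in the dataset" is an assumption about `sca.utils.add_annotations`, not a statement of its exact behavior.

```python
import numpy as np


def build_gene_program_mask(genes, gene_sets, min_genes=2):
    """Build a binary genes x programs mask, dropping programs that are too small."""
    gene_index = {g: i for i, g in enumerate(genes)}
    kept_names, columns = [], []
    for name, members in gene_sets.items():
        col = np.zeros(len(genes), dtype=int)
        for g in members:
            if g in gene_index:          # ignore genes missing from the dataset
                col[gene_index[g]] = 1
        if col.sum() >= min_genes:
            kept_names.append(name)
            columns.append(col)
    if columns:
        mask = np.stack(columns, axis=1)
    else:
        mask = np.zeros((len(genes), 0), dtype=int)
    return mask, kept_names


genes = ["CD3E", "CD19", "MS4A1", "GAPDH", "NKG7"]
gene_sets = {
    "T_CELL": ["CD3E", "NKG7", "CD8A"],  # CD8A is absent from `genes`
    "B_CELL": ["CD19", "MS4A1"],
    "TINY": ["CD3E"],                    # dropped: fewer than min_genes
}
mask, names = build_gene_program_mask(genes, gene_sets, min_genes=2)

# Same filtering idea as adata.varm['I'].sum(1) > 0: keep annotated genes only.
keep = mask.sum(axis=1) > 0
kept_genes = [g for g, k in zip(genes, keep) if k]
```

Here `GAPDH` is removed because it appears in no retained program, mirroring the `_inplace_subset_var` step in the training pattern.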