CellFateGenie: Adaptive Threshold Regression for pseudotime-associated gene discovery, Mellon density, lineage scoring via ov.single.Fate.
Use this skill when the user wants to identify genes that drive cell fate decisions along a developmental trajectory. CellFateGenie discovers pseudotime-associated genes using adaptive ridge regression and then scores lineage-specific fate-driving genes via manifold density estimation.
This skill is used after trajectory inference — the user must already have pseudotime values computed (e.g., from Palantir, VIA, or diffusion pseudotime).
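CellFateGenie answers: "Which genes change most significantly along pseudotime, and which are specifically driving a particular lineage?" It works in two phases: first, Adaptive Threshold Regression (ATR) selects pseudotime-associated genes; second, Mellon density estimation and lineage scoring rank the genes that drive a specific lineage.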
Pseudotime values must be stored in adata.obs. Compute them first using Palantir, VIA, DPT, or any trajectory method.
Run pip install mellon for density estimation. Without it, low_density() will fail.
Expression values are read from the .X matrix. Log-normalized data is fine (unlike SCENIC, which needs raw counts).
import omicverse as ov
# pseudotime: column name in adata.obs containing pseudotime values
fate = ov.single.Fate(adata, pseudotime='dpt_pseudotime')
# Automatically uses GPU (PyTorchRidge) if CUDA available, else sklearn Ridge on CPU
coef_df = fate.model_init(
test_size=0.3, # Train/test split ratio
alpha=0.1, # Ridge regularization strength
use_data_augmentation=False, # Enable for noisy pseudotime
)
# Returns: DataFrame of gene coefficients
# Stores: fate.coef (all coefficients), fate.raw_r2, fate.raw_mse
Adaptive Threshold Regression (ATR) is the core innovation. It iteratively removes the genes with the smallest coefficients and watches for the point where R² starts dropping significantly:
threshold_df = fate.ATR(
test_size=0.4,
alpha=0.1,
stop=500, # Maximum iterations. Increase for more genes (default 100).
flux=0.01, # R² drop tolerance. When R² drops by more than flux from max, stop.
)
# Sets fate.coef_threshold internally
# Visualize the filtering curve:
fate.plot_filtering() # Shows R² vs iteration, marks optimal threshold
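To make the ATR idea concrete, here is a minimal pure-Python sketch of the loop described above: fit a ridge model, drop the gene with the smallest absolute coefficient, refit, and stop once R² falls more than flux below the best value seen. This is an illustration under simplifying assumptions, not omicverse's implementation: the helper names (solve_linear, ridge_fit, r2_score, atr_sketch) are hypothetical, one gene is removed per iteration, and R² is measured on the training data rather than on a held-out split.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, alpha):
    """Ridge coefficients from (X^T X + alpha I) w = X^T y (centered data)."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve_linear(A, b)

def r2_score(X, y, w):
    """R² of the linear prediction X w, assuming y is already centered."""
    pred = [sum(wi * xi for wi, xi in zip(w, r)) for r in X]
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum(t ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def atr_sketch(X, y, alpha=0.1, flux=0.01, stop=100):
    """Return indices of genes kept before R² drops more than flux."""
    cols = [center([row[g] for row in X]) for g in range(len(X[0]))]
    y = center(y)
    genes = list(range(len(cols)))
    kept, best_r2 = genes[:], None
    for _ in range(stop):
        sub = [[cols[g][i] for g in genes] for i in range(len(y))]
        w = ridge_fit(sub, y, alpha)
        r2 = r2_score(sub, y, w)
        if best_r2 is not None and best_r2 - r2 > flux:
            break  # the last removal cost too much R²; keep the previous set
        best_r2 = r2 if best_r2 is None else max(best_r2, r2)
        kept = genes[:]
        if len(genes) == 1:
            break
        weakest = min(range(len(genes)), key=lambda k: abs(w[k]))
        genes.pop(weakest)
    return kept

# Toy data: 200 cells, 6 genes; only genes 0 and 1 determine pseudotime.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [2 * r[0] - 3 * r[1] + 0.1 * rng.gauss(0, 1) for r in X]
selected = atr_sketch(X, y)
print(selected)
```

On this toy data the four uninformative genes are removed at almost no cost in R², and the loop stops when removing an informative gene would cause a large drop.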
filter_coef_df = fate.model_fit(
test_size=0.3,
alpha=0.1,
)
# Returns: DataFrame of coefficients for genes above threshold only
# Stores: fate.filter_coef
# Compare: fate.get_r2('raw') vs fate.get_r2('filter') — filter R² should be close to raw
kendall_df = fate.kendalltau_filter()
# Computes Kendall's tau rank correlation for each filtered gene vs pseudotime
# Returns: DataFrame with kendalltau_sta and pvalue per gene
# Confirms monotonic relationship — genes with high |tau| are truly pseudotime-associated
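For intuition about what the tau statistic measures, the sketch below computes Kendall's tau-a by counting concordant and discordant pairs. It is an illustrative stand-in, not the code behind kendalltau_filter(); the function name kendall_tau_a is hypothetical, and tau-a ignores ties, whereas library routines such as scipy.stats.kendalltau default to the tie-corrected tau-b.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1   # pair ordered the same way in x and y
            elif s < 0:
                discordant += 1   # pair ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

pseudotime = [0.1, 0.2, 0.3, 0.4]
rising_gene = [1.0, 3.0, 2.0, 4.0]    # mostly increasing, one swapped pair
falling_gene = [4.0, 3.0, 2.0, 1.0]   # strictly decreasing

tau_rising = kendall_tau_a(pseudotime, rising_gene)
tau_falling = kendall_tau_a(pseudotime, falling_gene)
print(tau_rising, tau_falling)
```

A value near +1 or -1 indicates a nearly monotonic trend along pseudotime, which is why a high |tau| is used as confirmation.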
fate.low_density(
n_components=10, # Diffusion map components for manifold representation
knn=30, # k-nearest neighbors for density estimation
alpha=0.0, # Mellon regularization
seed=0,
pca_key='X_pca', # PCA embedding to use
)
# Stores: adata.obs['mellon_log_density_lowd']
# Low-density regions = developmental transition points (branching, commitment)
fate.lineage_score(
cluster_key='leiden', # Clustering column in adata.obs
lineage=['20', '17'], # Cluster labels defining the lineage of interest
cell_mask='specification', # How to select cells: 'specification' uses lineage list
density_key='mellon_log_density_lowd',
)
# Stores: adata.var['change_scores_lineage']
# High scores = genes with high expression variability specifically in that lineage
# Intersect ATR-selected genes with lineage-specific scores
fate_genes = adata.var.loc[fate.filter_coef.index, 'change_scores_lineage']
top_fate_genes = fate_genes.sort_values(ascending=False).head(20)
print(top_fate_genes)
For noisy pseudotime estimates, enable augmentation to improve robustness:
fate.model_init(
use_data_augmentation=True,
augmentation_strategy='jitter_pseudotime_noise', # or 'gene_expression_noise', 'both'
augmentation_intensity=0.05, # Noise magnitude (fraction of range)
)
# Same parameters available in ATR() and model_fit()
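As a rough illustration of what jittering pseudotime could look like, the sketch below perturbs each value by noise bounded by augmentation_intensity times the pseudotime range. The function jitter_pseudotime, the uniform noise distribution, and the seed argument are assumptions made for this example; the source does not state how omicverse generates the noise.

```python
import random

def jitter_pseudotime(pseudotime, intensity=0.05, seed=0):
    """Add uniform noise of at most intensity * (max - min) to each value."""
    rng = random.Random(seed)
    span = max(pseudotime) - min(pseudotime)
    bound = intensity * span
    return [t + rng.uniform(-bound, bound) for t in pseudotime]

pt = [0.0, 0.2, 0.5, 1.0]
noisy = jitter_pseudotime(pt, intensity=0.05, seed=0)
print(noisy)
```

Training on both the original and the perturbed values is one way a regression can become less sensitive to small errors in the pseudotime estimate.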
CellFateGenie also works with scATAC-seq data:
fate.atac_init(...) # Initialize for ATAC peak data
fate.get_related_peak(...) # Find peaks associated with fate genes
# ATR filtering curve — shows R² vs iteration
fate.plot_filtering(figsize=(3, 3))
# Model fit quality
fate.plot_fitting(type='raw') # All genes
fate.plot_fitting(type='filter') # ATR-selected genes only
# Color-coded by cluster
fate.plot_color_fitting(type='filter', cluster_key='leiden')
# CORRECT
fate = ov.single.Fate(adata, pseudotime='dpt_pseudotime')
# WRONG — column doesn't exist
# fate = ov.single.Fate(adata, pseudotime='pseudotime') # KeyError if not in adata.obs
The flux parameter (default 0.01) determines when ATR stops removing genes. Lower flux = more genes retained (stricter R² preservation). Higher flux = fewer genes (more aggressive filtering).
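The effect of flux can be seen on a made-up R² curve. The sketch below applies the stopping rule described above (stop when R² falls more than flux below the running maximum) and returns the last acceptable iteration. The helper atr_stop_index and the curve values are hypothetical and only illustrate the rule.

```python
def atr_stop_index(r2_curve, flux=0.01):
    """Index of the last iteration before R² drops more than flux below its max."""
    best = r2_curve[0]
    for i, r2 in enumerate(r2_curve):
        if best - r2 > flux:
            return i - 1
        best = max(best, r2)
    return len(r2_curve) - 1

# Hypothetical R² after each successive round of gene removal.
curve = [0.90, 0.90, 0.895, 0.885, 0.86, 0.80]

strict = atr_stop_index(curve, flux=0.01)   # stops early, so more genes remain
loose = atr_stop_index(curve, flux=0.05)    # tolerates a larger drop, so fewer genes remain
print(strict, loose)
```

With the smaller flux the rule halts after fewer removal rounds, which matches the guidance that lower flux retains more genes.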
# WRONG — mellon not installed
# fate.low_density() # ImportError: No module named 'mellon'
# FIX
# pip install mellon
# Verify pseudotime column exists
pseudotime_col = 'dpt_pseudotime'  # the same column name passed to ov.single.Fate
assert pseudotime_col in adata.obs.columns, \
    f"Pseudotime column '{pseudotime_col}' not in adata.obs. Compute trajectory first."
# Verify pseudotime has valid values (no NaN)
import numpy as np
assert not adata.obs[pseudotime_col].isna().any(), \
    f"Pseudotime column '{pseudotime_col}' contains NaN. Filter cells or impute missing values."
# Verify mellon is installed (before low_density)
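One way to perform this check without importing the package is to ask Python's import machinery whether the module can be found. This is a minimal sketch using only the standard library; the helper name has_module is hypothetical.

```python
import importlib.util

def has_module(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None

if not has_module("mellon"):
    print("mellon is not installed; run `pip install mellon` before calling fate.low_density()")
```

Running the check up front gives a clear message before any long computation starts, instead of an ImportError in the middle of the workflow.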