Separates T and non-T cells, or B and non-B cells, in a mixed cell population. Selection is based on clonotype percentage from VDJ data, indicator gene expression (CD3 markers for T cells, CD19/CD20 for B cells), a custom selector expression, or k-means clustering for automatic selection.
Requires clustering from SeuratClusteringOfAllCells to identify which clusters are T/B cells.
[TOrBCellSelection]
cache = true # Enable caching for this process
[TOrBCellSelection.in]
# Seurat object file (RDS/qs2 format) from SeuratClusteringOfAllCells
srtobj = ["SeuratClusteringOfAllCells"]
# Immune repertoire data file (RDS/qs2 format) from ScRepLoading
# Required unless ignore_vdj is set to true
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Whether to ignore VDJ information and use only marker gene expression
ignore_vdj = false
# Custom R expression to identify T/B cells
# Example: "Clonotype_Pct > 0.25" selects clusters with a clonotype percentage above 25%
# Can also reference indicator genes: "Clonotype_Pct > 0.25 & CD3E > 0"
# Unset by default (TOML has no null value); when omitted, k-means clustering is used
# selector = "Clonotype_Pct > 0.25"
# List of indicator genes for T/B cell identification
# For T cells: ["CD3E", "CD3D", "CD3G"] (positive markers)
# or include negative markers: ["CD3E", "CD19", "CD14"]
# For B cells: ["CD19", "MS4A1", "CD79A", "CD79B"]
indicator_genes = ["CD3E"]
# Parameters for k-means clustering (used when selector is not provided)
# Reference: https://rdrr.io/r/stats/kmeans.html
# Note: replace dots in argument names with hyphens (e.g., iter.max becomes iter-max)
kmeans = {nstart = 25}
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
What this does: Uses the default CD3E marker together with the VDJ data and k-means clustering to select T cell clusters automatically.
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Use all three CD3 markers for robust T cell identification
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells using CD19 and CD20 (MS4A1) markers
indicator_genes = ["CD19", "MS4A1"]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select cells/clusters with >25% clonotype percentage as T/B cells
selector = "Clonotype_Pct > 0.25"
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select cells with high clonotype percentage AND CD3E expression
indicator_genes = ["CD3E"]
selector = "Clonotype_Pct > 0.25 & CD3E > 0"
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Ignore VDJ data, use only marker gene expression
ignore_vdj = true
# Need at least 2 markers for k-means when VDJ is ignored
indicator_genes = ["CD3E", "CD3D", "CD3G"]
# First gene must be a positive marker for selection
# (CD3E is positive for T cells)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Select B cells using markers only (no VDJ data)
ignore_vdj = true
indicator_genes = ["CD19", "MS4A1", "CD79A"]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
# Custom k-means parameters
# nstart: number of random starts for stability (default in this process: 25)
# iter.max: maximum iterations (default: 10 in R)
# Note: use hyphens instead of dots in key names
kmeans = {nstart = 50, iter-max = 20}
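The hyphenated key in the example above (iter-max) stands for the dotted R argument iter.max. How the process converts the keys internally is not shown on this page, so the following is only a minimal sketch of that mapping; the helper name to_r_argument_names is hypothetical.

```python
def to_r_argument_names(kmeans_options: dict) -> dict:
    """Map hyphenated config keys to dotted R argument names (assumed behaviour)."""
    return {key.replace("-", "."): value for key, value in kmeans_options.items()}


# Same options as in the config example above, written as a Python dict
options = {"nstart": 50, "iter-max": 20}
r_arguments = to_r_argument_names(options)
print(r_arguments)
```

Keys without a hyphen, such as nstart, pass through unchanged.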
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Robust T cell selection using all three CD3 markers
indicator_genes = ["CD3E", "CD3D", "CD3G"]
When to use: Typical TCR-seq analysis where T cells need to be separated from other cell types.
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# B cell selection using CD19 and CD20 markers
indicator_genes = ["CD19", "MS4A1"]
When to use: BCR-seq analysis where B cells need to be separated from other cell types.
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Lower threshold to capture more T cells
selector = "Clonotype_Pct > 0.10 & CD3E > 0"
When to use: When you suspect low-quality VDJ data or want to capture borderline T cells.
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Higher threshold for clean T cell population
selector = "Clonotype_Pct > 0.50 & CD3E > 1"
When to use: When you want only the highest-confidence T cells (e.g., for clonal expansion analysis).
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Let k-means determine T cell clusters automatically
# No selector = automatic selection
indicator_genes = ["CD3E", "CD3D", "CD3G"]
kmeans = {nstart = 50}
When to use: When you don't have a specific threshold in mind and want automatic unsupervised selection.
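- SeuratClusteringOfAllCells: provides the seurat_clusters metadata
- ScRepLoading: provides the VDJ data (not needed when ignore_vdj = true)

SeuratPreparing → SeuratClusteringOfAllCells → TOrBCellSelection → SeuratClustering → (downstream TCR/BCR analysis)
                                                       ↑
                                                 ScRepLoading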
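When selector is not provided, TOrBCellSelection chooses the clusters automatically by k-means clustering on the per-cluster indicator gene expression and clonotype percentage.
Pros: Automatic, unsupervised, adapts to the data
Cons: May select unexpected clusters if the data is noisy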
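To make the automatic mode more concrete, here is a rough sketch in Python. It is not the process's implementation, which relies on R's stats::kmeans with several random starts. The per-cluster values are made up, the deterministic two-group Lloyd iteration is a simplification, and keeping the group with the higher clonotype percentage is an assumption about how the target group is chosen.

```python
from statistics import mean, stdev


def standardize(column):
    """Scale a list of numbers to mean 0 and sample SD 1."""
    mu, sd = mean(column), stdev(column)
    return [(x - mu) / sd for x in column]


def two_means(rows, max_iter=10):
    """Tiny Lloyd's algorithm with k=2; returns a 0/1 label per row."""
    # Deterministic start: the rows with the lowest and highest first feature
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])
    centers = [rows[order[0]], rows[order[-1]]]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centers[k])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical per-cluster summary: (Clonotype_Pct, mean CD3E expression)
clusters = {
    "c1": (0.62, 1.8),
    "c2": (0.55, 1.5),
    "c3": (0.03, 0.1),
    "c4": (0.01, 0.0),
    "c5": (0.48, 1.2),
}
names = list(clusters)
pct = standardize([clusters[n][0] for n in names])
cd3e = standardize([clusters[n][1] for n in names])
rows = list(zip(pct, cd3e))
labels = two_means(rows)

# Assumption: keep the group whose mean clonotype percentage is higher
group_mean = {k: mean(p for p, lab in zip(pct, labels) if lab == k) for k in set(labels)}
target = max(group_mean, key=group_mean.get)
selected = [n for n, lab in zip(names, labels) if lab == target]
print(selected)
```

In this toy table the clusters with high clonotype percentage and high CD3E (c1, c2, c5) fall into the same group and are the ones kept.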
Provide a custom R expression via selector:
- Clonotype_Pct > 0.25
- Clonotype_Pct > 0.25 & CD3E > 0
- (Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1
Pros: Full control, transparent selection criteria
Cons: Requires domain knowledge; thresholds need testing
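As an illustration of how such an expression partitions clusters, the sketch below renders the third selector in Python and applies it to a made-up per-cluster table. The real selector is an R expression evaluated by the process; the table values and the function name selector are illustrative assumptions only.

```python
# Hypothetical per-cluster summary: clonotype fraction and mean marker expression
clusters = {
    "c1": {"Clonotype_Pct": 0.62, "CD3E": 1.8, "CD19": 0.02},
    "c2": {"Clonotype_Pct": 0.10, "CD3E": 1.4, "CD19": 0.00},
    "c3": {"Clonotype_Pct": 0.30, "CD3E": 0.1, "CD19": 0.90},
    "c4": {"Clonotype_Pct": 0.02, "CD3E": 0.0, "CD19": 0.05},
}


def selector(row):
    """Python rendering of (Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1."""
    return (row["Clonotype_Pct"] > 0.25 or row["CD3E"] > 1) and row["CD19"] < 0.1


selected = [name for name, row in clusters.items() if selector(row)]
print(selected)
```

Here c3 passes the clonotype threshold but is rejected by the CD19 condition, which is the kind of effect worth checking when tuning thresholds.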
Set ignore_vdj = true to use only marker genes.
Pros: Works without VDJ data, robust marker-based selection
Cons: Requires good marker genes, may include non-clonal cells
Positive markers (expressed in T cells):
- CD3E: Core CD3 epsilon chain (most reliable)
- CD3D: Core CD3 delta chain
- CD3G: Core CD3 gamma chain
Negative markers (excluded from T cells):
- CD19: B cell marker
- MS4A1 (CD20): B cell marker
- CD14: Monocyte marker
- CD68: Macrophage marker
Recommended for T cells:
indicator_genes = ["CD3E", "CD3D", "CD3G"]
Positive markers (expressed in B cells):
- CD19: Pan-B cell marker (most reliable)
- MS4A1 (CD20): Mature B cell marker
- CD79A: B cell receptor component
- CD79B: B cell receptor component
Recommended for B cells:
indicator_genes = ["CD19", "MS4A1"]
For selecting specific T/B cell subtypes:
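- CD4+ T cells: CD4
- CD8+ T cells: CD8A, CD8B
- Regulatory T cells: FOXP3, IL2RA
- Memory B cells: CD27
- Plasma cells: CD38, SDC1 (CD138)

- srtobj must be specified (from SeuratClusteringOfAllCells)
- immdata is required unless ignore_vdj = true
- With ignore_vdj = true, at least 2 indicator genes must be provided
- The first gene in indicator_genes must be a positive marker when using k-means without VDJ data
- selector must be a valid R expression
  - It can reference the clonotype percentage (Clonotype_Pct) and indicator genes (e.g., CD3E)
  - Operators: & (and), | (or), ! (not)
- kmeans must be a valid key/value table of stats::kmeans arguments
  - Arguments: nstart, iter-max, algorithm, etc. (see stats::kmeans documentation)
  - Dots in argument names become hyphens (iter.max → iter-max)

Cause: Barcode mismatch between scRNA-seq and VDJ data
Solution: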
- Check that ScRepLoading processed the VDJ data correctly
- Set ignore_vdj = true to use marker genes only

Cause: Using ignore_vdj = true with only 1 indicator gene
Solution: Add more indicator genes or use a custom selector

Cause: K-means selected the wrong cluster
Solution:
- Check details/kmeans.png
- Adjust indicator_genes to include more robust markers
- Use a custom selector instead of automatic selection
- Increase kmeans.nstart for more stable clustering (e.g., {nstart = 50})

Cause: Threshold too high or too low
Solution:
- Adjust the selector threshold (e.g., Clonotype_Pct > 0.20 vs 0.30)
- Check details/data.txt
- Review the plots in the details/ directory for gene vs clonotype relationships

Cause: Poor VDJ data or incorrect marker genes
Solution:
- Check the ScRepLoading output
- Verify that CD3E is actually expressed in your data
- Try ignore_vdj = true with robust marker genes

outfile: Seurat object (qs2 format) containing only selected T/B cells
- Default: {{in.srtobj | stem}}.qs
Details directory (details/):
- data.txt: Table of indicator gene expression and clonotype percentage per cluster
- kmeans.png: K-means clustering visualization (if k-means used)
- selected_cells_per_sample.png: Bar plot of selected cells per sample
- selected_cells_pie.png: Pie chart of selected vs other cells
- selected-cells.png: Dimension plots showing VDJ data and selected cells
- feature-plots.png: Feature plots of indicator genes
Interactive HTML report with visualization of selection results and cell composition.
# Standard TCR-seq workflow
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]
# Clustering of selected T cells
[CDR3Clustering]
[TESSA]
[ClonalStats]
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells from TILs
indicator_genes = ["CD19", "MS4A1", "CD79A"]
selector = "CD19 > 0.5"
[SeuratClustering]
[CDR3Clustering]
[CellCellCommunication]
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# No VDJ data, use markers only
ignore_vdj = true
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]
[ScFGSEA]
[CellCellCommunication]
Not for Pure T/B Cell Populations: If all cells are already T or B cells, skip this process and use SeuratClustering directly.
Cluster-Level Selection: Selection happens at the cluster level, not single-cell level. All cells in selected clusters are kept.
Normalization: Gene expression values are normalized (mean=0, SD=1) before k-means clustering.
Marker First: When using k-means without VDJ data, the first indicator gene must be a positive marker for your target cell type.
Report Review: Always review the HTML report and plots in details/ to verify selection quality.
Threshold Tuning: Start with the default k-means selection, then switch to a custom selector if the automatic selection is not satisfactory.
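The cluster-level note above means that a cell is kept or dropped according to its cluster, not its own expression values. A minimal sketch of that idea follows; the barcodes, cluster labels and the helper name cells_in_selected_clusters are all hypothetical, and the process itself does this inside the Seurat object.

```python
def cells_in_selected_clusters(cell_cluster: dict, selected_clusters: set) -> list:
    """Keep every cell whose cluster was selected, regardless of the cell's own values."""
    return [cell for cell, cluster in cell_cluster.items() if cluster in selected_clusters]


# Hypothetical cell barcodes mapped to their seurat_clusters label
cell_cluster = {
    "AAAC-1": "0",
    "AAAG-1": "0",
    "AACT-1": "1",
    "AAGT-1": "2",
    "ACGT-1": "2",
}
kept_cells = cells_in_selected_clusters(cell_cluster, {"0", "2"})
print(kept_cells)
```

A cell in cluster 0 that happens to lack the marker is still kept, while a marker-positive cell in cluster 1 is dropped; this is why reviewing the plots in details/ matters.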