Cluster TCR/BCR clones by CDR3 sequences using GIANA or ClusTCR (both Faiss-based). Adds `CDR3_Cluster` column to metadata for clonotype analysis.
Important: This process only runs when VDJ input is present (TCRData/BCRData columns in SampleInfo).
[CDR3Clustering]
cache = true
[CDR3Clustering.in]
screpfile = "path/to/combined_object.qs"
[CDR3Clustering.envs]
type = "auto" # TCR, BCR, or auto
tool = "GIANA" # GIANA or ClusTCR
python = "python" # Path to python
within_sample = true # Cluster per sample
args = {} # Tool-specific arguments
chain = "both" # TRA, TRB, IGH, IGL, IGK, both, heavy, light
# Arguments when tool = "GIANA"
[CDR3Clustering.envs.args]
method = "hierarchical" # hierarchical, kmeans
dist = "hamming" # hamming, levenshtein
threshold = 0.15 # Distance threshold
# Arguments when tool = "ClusTCR"
[CDR3Clustering.envs.args]
method = "two-step" # mcl, faiss, two-step
n_cpus = 4 # CPUs for MCL
faiss_cluster_size = 5000 # Supercluster size
mcl_params = [1.2, 2] # [inflation, expansion]
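The options above can be sanity-checked before a run. The following is a minimal sketch, assuming the allowed values are exactly those listed in the comments above and that `tool` falls back to "GIANA" when unset; `validate_envs` is a hypothetical helper, not part of the pipeline.

```python
# Hypothetical helper: checks an envs mapping against the values listed
# in the reference configuration above. Not part of the pipeline itself.
ALLOWED = {
    "type": {"TCR", "BCR", "auto"},
    "tool": {"GIANA", "ClusTCR"},
    "chain": {"TRA", "TRB", "IGH", "IGL", "IGK", "both", "heavy", "light"},
}
# Tool-specific `method` values, taken from the comments in the two args tables.
ALLOWED_METHODS = {
    "GIANA": {"hierarchical", "kmeans"},
    "ClusTCR": {"mcl", "faiss", "two-step"},
}


def validate_envs(envs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in envs and envs[key] not in allowed:
            problems.append(f"{key}={envs[key]!r} not in {sorted(allowed)}")
    tool = envs.get("tool", "GIANA")  # assumed default, as shown in the reference
    method = envs.get("args", {}).get("method")
    if tool in ALLOWED_METHODS and method is not None:
        if method not in ALLOWED_METHODS[tool]:
            problems.append(f"method={method!r} is not listed for tool={tool!r}")
    if not isinstance(envs.get("within_sample", True), bool):
        problems.append("within_sample must be true or false")
    return problems


# A ClusTCR method combined with GIANA is flagged.
print(validate_envs({"tool": "GIANA", "args": {"method": "mcl"}}))
```

The check only covers values documented in this section; any further tool-specific arguments pass through untouched.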
# Minimal: only the input file is set
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
# GIANA with hierarchical clustering on Hamming distance
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
tool = "GIANA"
[CDR3Clustering.envs.args]
method = "hierarchical"
dist = "hamming"
threshold = 0.15
# ClusTCR, two-step method
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
tool = "ClusTCR"
[CDR3Clustering.envs.args]
method = "two-step"
faiss_cluster_size = 5000
n_cpus = 8
# ClusTCR, MCL method
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
tool = "ClusTCR"
[CDR3Clustering.envs.args]
method = "mcl"
n_cpus = 4
# Restrict clustering to the TRB chain
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
chain = "TRB"
# Cluster across all samples instead of per sample
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
within_sample = false
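The effect of `within_sample` can be pictured as a choice of how sequences are batched before clustering. This is a sketch under the assumption that "cluster per sample" means each sample's CDR3 sequences are clustered independently; `split_batches` and the record layout are hypothetical, not the pipeline's code.

```python
from collections import defaultdict


def split_batches(records, within_sample):
    """Group (sample, cdr3) pairs into batches that would be clustered separately."""
    if not within_sample:
        # One pooled batch: similar sequences from different samples can co-cluster.
        return {"all": [cdr3 for _, cdr3 in records]}
    batches = defaultdict(list)
    for sample, cdr3 in records:
        batches[sample].append(cdr3)
    return dict(batches)


# Made-up records for illustration.
records = [("s1", "CASSLGQGAYEQYF"), ("s2", "CASSLGQGSYEQYF"), ("s1", "CASSPGQGSYEQYF")]
print(split_batches(records, within_sample=True))
print(split_batches(records, within_sample=False))
```

Per-sample batches are smaller than the pooled one, which is why `within_sample = true` is also the suggested setting when memory is tight.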
# TCR data, GIANA, TRB chain only
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
type = "TCR"
tool = "GIANA"
chain = "TRB"
# ClusTCR, two-step method with 8 CPUs
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
tool = "ClusTCR"
[CDR3Clustering.envs.args]
method = "two-step"
faiss_cluster_size = 5000
n_cpus = 8
# GIANA with an explicit distance threshold
[CDR3Clustering]
[CDR3Clustering.in]
screpfile = "intermediate/screpcombiningexpression/combined.qs"
[CDR3Clustering.envs]
tool = "GIANA"
[CDR3Clustering.envs.args]
threshold = 0.15 # Higher=fewer clusters, Lower=more clusters
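To see why a higher threshold yields fewer clusters, here is a toy sketch using a normalized Hamming distance and single-linkage grouping. It is an illustration only: it does not reproduce GIANA's Faiss-based algorithm, the function names are hypothetical, and the CDR3 sequences are made up.

```python
def hamming_fraction(a, b):
    """Fraction of mismatching positions; unequal lengths count as maximally distant."""
    if len(a) != len(b) or not a:
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold):
    """Single-linkage grouping: link two sequences when their distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming_fraction(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


cdr3s = ["CASSLGQGAYEQYF", "CASSLGQGSYEQYF", "CASSPGQGSYEQYF", "CASSIRSSYEQYF"]
print(len(cluster(cdr3s, 0.05)))  # 4: no pair is within 5% mismatches
print(len(cluster(cdr3s, 0.15)))  # 2: the three 14-mers merge, the 13-mer stays alone
```

Raising the threshold lets more pairs link, so groups merge and the cluster count drops.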
- Output: `CDR3_Cluster` metadata column
- Tool: "GIANA" or "ClusTCR"
- ClusTCR `method = "mcl"`: highest quality
- ClusTCR `method = "two-step"`: balanced
- 500K sequences: GIANA, or ClusTCR with `method = "two-step"` (fastest)
Cause: No VDJ data available
Solution: Verify ScRepCombiningExpression output contains TCR/BCR data
Cause: Missing dependencies
Solution:
pip install biopython faiss-cpu scikit-learn
conda install -c conda-forge clustcr

Cause: Threshold inappropriate
Solution: Adjust threshold (higher = fewer clusters, lower = more clusters)
Cause: Dataset too large for RAM
Solution: Use within_sample = true, reduce n_cpus, or use GIANA
Cause: Suboptimal method for dataset size
Solution:
50K: ClusTCR `method = "two-step"` with increased `n_cpus`
Metadata column: CDR3_Cluster
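Cluster IDs in this column carry an `S_` or `M_` prefix, explained under Cluster naming below. A minimal sketch of summarizing them, assuming the column has been exported to a plain Python list and that cells without VDJ data hold `None`; `summarize_clusters` and the sample values are hypothetical.

```python
from collections import Counter


def summarize_clusters(cluster_ids):
    """Count cells per cluster and split the clusters by their S_/M_ prefix."""
    sizes = Counter(c for c in cluster_ids if c)  # skip missing values
    single = {c: n for c, n in sizes.items() if c.startswith("S_")}
    multi = {c: n for c, n in sizes.items() if c.startswith("M_")}
    return single, multi


# Made-up CDR3_Cluster values, one per cell.
ids = ["S_1", "S_1", "M_1", "M_1", "M_1", "S_2", None]
single, multi = summarize_clusters(ids)
print(single)  # {'S_1': 2, 'S_2': 1}
print(multi)   # {'M_1': 3}
```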
Cluster naming:
- S_1, S_2: Single unique CDR3 sequence (may have multiple cells)
- M_1, M_2: Multiple unique CDR3 sequences (similar but different)

Interpretation:
- `S_` prefix: Cells share an identical CDR3 sequence
- `M_` prefix: Cells have similar but different CDR3 sequences
- Use `CDR3_Cluster` as a grouping factor in Seurat plots

Performance Tips: