Help researchers select and characterize cancer cell lines for experiments. Given a cancer type, gene of interest, or cell line name, profiles molecular features (mutations, expression, CNV), gene dependencies (CRISPR screens), drug sensitivities (IC50/AUC), and genetic backgrounds using DepMap, Cellosaurus, PharmacoDB, COSMIC, CellMarker, CLUE, and SYNERGxDB. Generates a decision-support report for cell line selection. Use when researchers ask about which cell line to use, cell line characterization, DepMap dependencies, drug sensitivity profiles, or cancer model selection.
Comprehensive profiling of cancer cell lines for experimental model selection. Transforms a query (cancer type, gene, or cell line name) into an actionable report covering identity verification, molecular features, gene dependencies, drug sensitivities, and druggable targets.
KEY PRINCIPLES:
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply for: cell line selection by cancer type/gene, cell line profiling, gene dependencies, drug sensitivity queries, cell line comparisons, mutation checks.
BEFORE calling ANY tool, verify parameters against this table.
| Tool | Key Parameters | Notes |
|---|---|---|
DepMap_search_cell_lines | query (required) | Search by name, e.g., "A549", "MCF" |
DepMap_get_cell_line | model_name OR model_id | Name: "A549"; ID: "SIDM00001" |
DepMap_get_cell_lines | tissue, cancer_type, page_size | Filter by tissue (e.g., "Lung") |
DepMap_get_gene_dependencies | gene_symbol (required), model_id | Gene effect scores; negative = essential |
DepMap_search_genes | query (required) | Validate gene symbol in DepMap first |
cellosaurus_search_cell_lines | q (required), size | Solr syntax: id:HeLa, ox:9606 AND char:cancer |
cellosaurus_get_cell_line_info | accession (required, CVCL_ format) | Full cell line record |
cellosaurus_query_converter | query (required) | Natural language to Solr syntax |
COSMIC_search_mutations | terms OR query, max_results | Search "BRAF V600E" or gene name |
COSMIC_get_mutations_by_gene | gene OR gene_name, max_results | All mutations for a gene |
PharmacoDB_get_cell_line | operation="get_cell_line", cell_name | Cell line metadata + datasets |
PharmacoDB_get_experiments | operation="get_experiments", compound_name, cell_line_name, dataset_name, per_page | Drug response data (IC50, AAC, EC50) |
PharmacoDB_get_biomarker_assoc | operation="get_biomarker_associations", , , , |
Input: Cancer type AND/OR Gene of interest AND/OR Cell line name(s)
Phase 1: Cell Line Identification
- Search and verify cell line identity (Cellosaurus)
- Get metadata: species, disease, STR profile, cross-references
- If cancer type given without cell line: find candidate lines (DepMap)
Phase 2: Molecular Profiling
- Mutation landscape (COSMIC, cBioPortal CCLE)
- Gene expression (HPA, DepMap)
- Cancer markers (CellMarker)
Phase 3: Gene Dependencies (CRISPR Screens)
- Gene essentiality scores from DepMap
- Identify selectively essential genes
- Compare across cell lines if multiple candidates
Phase 4: Drug Sensitivity
- IC50/AAC from PharmacoDB (GDSC, CCLE, CTRPv2, PRISM)
- Biomarker associations for drug response
- Drug combination synergy (SYNERGxDB)
Phase 5: Target Druggability & Recommendations
- Druggable targets (DGIdb, OpenTargets)
- Final ranked recommendation with rationale
Goal: Verify cell line identity and find candidates.
If specific cell line given: (1) cellosaurus_search_cell_lines(q="id:<NAME>") → get CVCL accession, species, disease, contamination flags. (2) cellosaurus_get_cell_line_info(accession="CVCL_XXXX") for STR profile. (3) DepMap_get_cell_line(model_name="...") for tissue, cancer_type, MSI, ploidy. (4) PharmacoDB_get_cell_line(operation="get_cell_line", cell_name="...") for datasets.
If cancer type only: (1) DepMap_get_cell_lines(tissue="Lung", page_size=20). (2) Narrow by gene mutations/dependencies in Phases 2-3. (3) CellMarker_search_cancer_markers(operation="search_cancer_markers", cancer_type="Lung").
OUTPUT: Table of candidate cell lines with: name, tissue, cancer type, key identifiers.
Goal: Characterize mutational and expression landscape.
2A Mutations: COSMIC_get_mutations_by_gene(gene="EGFR") + cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="EGFR,KRAS,TP53"). Note: gene_list is a comma-separated STRING. CCLE study ID: ccle_broad_2019.
2B Expression: HPA_get_comparative_expression_by_gene_and_cellline(gene_name="EGFR", cell_line="a549"). Only 10 lines supported: hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa.
2C Cancer markers: CellMarker_search_by_gene(operation="search_by_gene", gene_symbol="EGFR", species="Human")
OUTPUT: Mutation table (gene, AA change, type) + expression summary per cell line.
Goal: Determine which genes are essential in candidate cell lines.
LIMITATION: DepMap_get_gene_dependencies returns gene metadata (HGNC ID, Ensembl ID) but NOT per-cell-line CRISPR scores. Full Chronos scores require depmap.org download.
Available tools: (1) DepMap_search_genes(query="EGFR") — validate gene exists. (2) DepMap_get_gene_dependencies(gene_symbol="EGFR") — metadata only. (3) Alternatives: cBioPortal CCLE for mutation data, PubMed for published screens, or direct user to depmap.org/portal.
Interpreting Chronos scores (from DepMap portal): <-0.5 = essential; ~0 = not essential; ~-1.0 = strongly essential. Selective dependency (essential in some lineages only) indicates therapeutic window.
OUTPUT: Gene validation + mutation status per cell line.
Offline DepMap analysis (when API lacks CRISPR scores): Download CRISPRGeneEffect.csv + Model.csv from https://depmap.org/portal/download/all/. Load with pandas, find gene column (format: "KRAS (3845)"), merge with metadata, filter by lineage, sort by score. Most negative Chronos score = most dependent.
If DepMap data is unavailable: Use cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") for mutation data, and the Quick Reference table below for common recommendations.
Goal: Profile drug response data.
4A PharmacoDB: PharmacoDB_get_experiments(operation="get_experiments", compound_name="Erlotinib", cell_line_name="A549", per_page=20) for dose-response (IC50, AAC, EC50). Omit compound_name to get all drugs for a cell line. Use PharmacoDB_get_biomarker_assoc(compound_name="...", tissue_name="...", mdata_type="mutation") for sensitivity biomarkers.
4B SYNERGxDB: SYNERGxDB_search_combos(drug_name_1="gemcitabine", drug_name_2="erlotinib", sample="lung"). Positive ZIP = synergy. Covers cytotoxic agents only (not targeted therapies/biologics).
4C CLUE: CLUE_get_cell_lines(operation="get_cell_lines", cell_id="MCF7") — requires CLUE_API_KEY.
OUTPUT: Drug sensitivity table (drug, IC50, AAC, dataset) + synergy data if available.
5A Druggability: DGIdb_get_drug_gene_interactions(genes=["EGFR", "KRAS"]) + MyGene_query_genes(query="EGFR") → OpenTargets_get_associated_drugs_by_target_ensemblID(ensemblId="...", size=10) + STRING_get_network(protein_ids=["EGFR"], species=9606).
5B Final Recommendation: Synthesize all phases. Explain WHY one line is better for this specific use case.
| Criterion | Weight | Score 3 (Best) | Score 2 (Acceptable) | Score 1 (Poor) |
|---|---|---|---|---|
| Mutation match | x3 | Exact mutation (e.g., KRAS G12D) | Same gene, different mutation | No mutation in gene of interest |
| Co-mutation simplicity | x2 | Few co-mutations (cleaner background) | Moderate co-mutations | Complex background (3+ driver mutations) |
| Gene dependency | x2 | DepMap score < -0.5 (essential) | Score -0.5 to -0.2 (moderately essential) | Score > -0.2 (not essential) |
| Drug sensitivity data | x1 | In GDSC + CCLE + PRISM (3+ datasets) | In 1-2 datasets | No drug response data |
| Practical factors | x1 | Adherent, well-characterized, widely used | Suspension or less common | Hard to culture, contamination-prone |
Total score = sum of (criterion score × weight). Max = 27. Rank cell lines by total score.
The best cell line depends on what you're doing with it:
| Use Case | Key Requirements | Extra Considerations |
|---|---|---|
| CRISPR knockout screen | Adherent growth, good lentiviral transduction, pre-existing Cas9 clones (check Cellosaurus for "-Cas9" derivatives) | Doubling time matters for library coverage; <72h ideal |
| Drug sensitivity testing | In PharmacoDB/GDSC, known IC50 for reference compounds | Check SYNERGxDB for combo data |
| Xenograft model | Known tumorigenicity in mice, available PDX data | Check if line forms tumors in nude/NSG mice (Cellosaurus often notes this) |
| Mechanism of action | Clean genetic background, gene dependency confirmed | Fewer co-mutations = easier to attribute phenotypes |
| Biomarker discovery | Isogenic pairs available, well-characterized omics | Check if isogenic knockouts exist (Cellosaurus) |
| Drug combination | In SYNERGxDB with combo data, known single-agent responses | ZIP score available for synergy assessment |
Check for pre-made derivatives — this can save months of lab work:
cellosaurus_search_cell_lines(q="ca:<PARENT_LINE>", size=20) — finds all derivativesIf DepMap_get_gene_dependencies fails (common for some genes):
cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="<GENE>") as an alternative source for cell line mutation data.OUTPUT: Ranked cell line table with total scores, per-criterion breakdown, and a text recommendation explaining the top pick and runner-up with biological reasoning.
| Pattern | Question Type | Key Tools (in order) |
|---|---|---|
| 1 | "Which cell line for [cancer] + [gene]?" | DepMap_get_cell_lines → DepMap_get_gene_dependencies → COSMIC_get_mutations_by_gene → cBioPortal_get_mutations (ccle_broad_2019) → PharmacoDB_get_experiments → rank by mutation + dependency + drug sensitivity |
| 2 | "Profile cell line X" | cellosaurus_search → DepMap_get_cell_line → PharmacoDB_get_cell_line → cBioPortal_get_mutations → HPA expression (if supported) → PharmacoDB_get_experiments |
| 3 | "Which lines are sensitive to [drug]?" | DepMap_get_cell_lines (tissue filter) → PharmacoDB_get_experiments (compound) → PharmacoDB_get_biomarker_assoc → rank by AAC (higher=sensitive) or IC50 (lower=sensitive) |
| 4 | "Compare A vs B" | Run Pattern 2 for both in parallel → side-by-side comparison table |
| 5 | "Drug combos for [cell line]?" | SYNERGxDB_search_combos → PharmacoDB_get_experiments (single-agent baseline) → report synergistic pairs with ZIP scores |
| Cancer Type | Key Cell Lines | Common Mutations |
|---|---|---|
| NSCLC | A549 (KRAS G12S), H1975 (EGFR L858R/T790M), PC-9 (EGFR del19), HCC827 (EGFR del19/amp), H460 (KRAS Q61H), H1299 (NRAS Q61K, TP53-null) | KRAS, EGFR, TP53, STK11 |
| Breast | MCF7 (ER+/PR+), MDA-MB-231 (TNBC, KRAS G13D), T-47D (ER+), BT-474 (HER2+), SK-BR-3 (HER2+) | PIK3CA, TP53, BRCA1/2 |
| Colorectal | HCT116 (KRAS G13D, MSI-H), SW480 (KRAS G12V), HT-29 (BRAF V600E), Caco-2 (APC), LoVo (KRAS G13D, MSI-H) | APC, KRAS, TP53, BRAF |
| Melanoma | A375 (BRAF V600E), SK-MEL-28 (BRAF V600E), WM266-4 (BRAF V600D), MeWo (WT BRAF) | BRAF, NRAS, TP53 |
| Pancreatic | PANC-1 (KRAS G12D), MIA PaCa-2 (KRAS G12C), AsPC-1 (KRAS G12D), Capan-1 (BRCA2 mut) | KRAS, TP53, CDKN2A, SMAD4 |
| Prostate | PC-3 (AR-negative), LNCaP (AR+, PTEN-null), DU145 (AR-negative), VCaP (AR amp, TMPRSS2-ERG) | AR, PTEN, TP53, RB1 |
| Ovarian | SKOV3 (HER2+, TP53 mut), OVCAR3 (TP53 mut), A2780 (sensitive), A2780cis (cisplatin-resistant) | TP53, BRCA1/2 |
| Leukemia | K562 (CML, BCR-ABL), Jurkat (T-ALL), HL-60 (AML), THP-1 (AML, monocytic) | BCR-ABL, FLT3, NPM1 |
| Glioblastoma | U251 (TP53 mut), U87MG (PTEN-null), T98G (TP53/PTEN mut), LN229 (TP53 mut, PTEN WT) | TP53, PTEN, EGFR, IDH1 |
| Liver | HepG2 (hepatoblastoma, WT TP53), Hep3B (HBV+, TP53-null), Huh7 (HCC, TP53 Y220C) | TP53, CTNNB1, AXIN1 |
Use cell line NAME as common key across databases. IDs: DepMap=SIDM, Cellosaurus=CVCL, cBioPortal=sample (e.g. A549_LUNG), PharmacoDB/SYNERGxDB=name string. When names differ ("HCT 116" vs "HCT116"), check Cellosaurus synonyms first.
Mutation-based filtering: cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") → filter by amino_acid_change → extract cell line names → query other databases.
| Issue | Resolution |
|---|---|
| DepMap returns no results for cell line name | Try alternative names: check Cellosaurus synonyms first |
| cBioPortal CCLE study ID unknown | Use ccle_broad_2019 as default CCLE study |
| PharmacoDB cell line name mismatch | Use PharmacoDB_search(operation="search", query="<name>") to find the canonical name |
| HPA cell line not supported | Only 10 lines supported (hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa). Skip HPA for other lines |
| CLUE requires API key | Skip CLUE tools if CLUE_API_KEY not set; note in report |
| Gene symbol not found in DepMap | Use DepMap_search_genes(query="<symbol>") to check aliases |
| Cellosaurus accession pattern | Must be CVCL_XXXX format; search first if you only have a name |
| SYNERGxDB no results for drug combo | Drug may not be in database; SYNERGxDB covers cytotoxic agents, not most targeted therapies |
Before finalizing the report, verify:
compound_nametissue_namemdata_typeper_page| Gene-drug sensitivity correlations |
PharmacoDB_search | operation="search", query | Find PharmacoDB IDs |
CellMarker_search_cancer_markers | operation="search_cancer_markers", cancer_type, gene_symbol, cell_type | Cancer cell markers |
CellMarker_search_by_gene | operation="search_by_gene", gene_symbol (required), species | Cell types expressing a gene |
HPA_get_comparative_expression_by_gene_and_cellline | gene_name (required), cell_line (required) | Supported lines: ishikawa, hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251 |
CLUE_get_cell_lines | operation="get_cell_lines", cell_id | L1000 CMap cell line info (requires CLUE_API_KEY) |
SYNERGxDB_search_combos | drug_name_1, drug_name_2, sample (tissue or cell ID) | Drug combination synergy (ZIP, Bliss, Loewe) |
SYNERGxDB_list_cell_lines | - | All cell lines in SYNERGxDB |
DGIdb_get_drug_gene_interactions | genes: list[str] | Druggable gene interactions |
OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | Drugs targeting a gene |
STRING_get_network | protein_ids: list[str], species: int (9606) | PPI network for gene context |
MyGene_query_genes | query (NOT q) | Resolve gene symbol to Ensembl ID |
cBioPortal_get_mutations | study_id, gene_list (STRING, not array) | Cell line mutations from CCLE |