Access COSMIC to download mutation datasets, query Cancer Gene Census, and retrieve mutational signatures when your genomic analysis requires curated somatic mutation resources.
Use this skill when you need to access COSMIC to download mutation datasets, query the Cancer Gene Census, or retrieve mutational signatures as part of a reproducible workflow.
Use this skill when an evidence-insight task needs a packaged method instead of ad-hoc freeform output.
Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
Use this skill when scripts/download_cosmic.py is the most direct path to complete the request.
Use this skill when you need the cosmic-database package behavior rather than a generic answer.
Key Features
Scope-focused workflow for downloading COSMIC mutation datasets, querying the Cancer Gene Census, and retrieving mutational signatures as curated somatic mutation resources for genomic analysis.
The following example downloads a COSMIC file and loads it into a pandas DataFrame.
from scripts.download_cosmic import download_cosmic_file
import pandas as pd

# 1) Download a COSMIC dataset (example path; adjust to your target release/build)
download_cosmic_file(
    email="user@example.com",  # placeholder; use your COSMIC account email
    password="pwd",            # placeholder; avoid committing real credentials
    filepath="GRCh38/cosmic/latest/CosmicMutantExport.tsv.gz"
)

# 2) Load the downloaded GZIP-compressed TSV
#    (assumes the file was saved to the current working directory)
df = pd.read_csv(
    "CosmicMutantExport.tsv.gz",
    sep="\t",
    compression="gzip"
)

# 3) Example analysis: filter by gene symbol (column name depends on the dataset)
# df_gene = df[df["Gene name"] == "TP53"]
For dataset field definitions and COSMIC file specifics, see: references/cosmic_data_reference.md.
5. Implementation Details
Authentication: Downloads require COSMIC account credentials (email/password) and are performed via an authenticated HTTP session.
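Because downloads need account credentials, it can help to keep them out of source files. The following is a minimal sketch, not part of the packaged script: the helper name load_cosmic_credentials and the environment variable names COSMIC_EMAIL and COSMIC_PASSWORD are assumptions chosen for illustration.

```python
import os


def load_cosmic_credentials(env=None):
    """Return (email, password) read from environment variables.

    COSMIC_EMAIL and COSMIC_PASSWORD are hypothetical names used in this
    sketch; pick whatever names fit your environment.
    """
    env = os.environ if env is None else env
    email = env.get("COSMIC_EMAIL")
    password = env.get("COSMIC_PASSWORD")

    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("COSMIC_EMAIL", email), ("COSMIC_PASSWORD", password))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return email, password
```

The returned pair can then be passed as the email and password arguments of download_cosmic_file, so no credential appears as a literal in the calling code.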
File targeting: The filepath parameter specifies the COSMIC resource path (e.g., genome build such as GRCh38, release channel such as latest, and the target filename).
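The path layout described above (genome build, then cosmic, then release channel, then filename) can be assembled programmatically. This is a sketch under assumptions: build_cosmic_filepath is a hypothetical helper, the layout is inferred only from the example path in this document, and the release string is passed through without validation.

```python
def build_cosmic_filepath(filename, genome_build="GRCh38", release="latest"):
    """Compose a resource path shaped like 'GRCh38/cosmic/latest/<filename>'.

    Hypothetical helper: the layout mirrors the example path in this
    document and may not cover every COSMIC resource.
    """
    allowed_builds = {"GRCh37", "GRCh38"}
    if genome_build not in allowed_builds:
        raise ValueError(f"Unsupported genome build: {genome_build!r}")
    if not filename or "/" in filename:
        raise ValueError("filename must be a bare file name without slashes")
    return f"{genome_build}/cosmic/{release}/{filename}"


path = build_cosmic_filepath("CosmicMutantExport.tsv.gz")
```

Centralizing the path construction keeps the build and release choice in one place, which makes it easier to record exactly which export an analysis used.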
Data format: Many COSMIC exports are distributed as GZIP-compressed TSV (and sometimes VCF). Use pandas.read_csv(..., sep="\t", compression="gzip") for TSV .gz files.
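Since column names vary between exports, it can be useful to inspect the header before loading a large file into memory. The sketch below uses only the Python standard library; peek_tsv_gz is a hypothetical helper name, not part of the packaged script.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path, n_rows=3):
    """Return (column_names, first n_rows as dicts) from a gzip-compressed TSV.

    Only the header and the first n_rows records are read, so this stays
    cheap even for very large exports.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(islice(reader, n_rows))
        # fieldnames is None for an empty file; normalise to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Once the columns are known, the same names can be used to select a subset of columns when loading the file with pandas.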
Typical workflow:
Download the desired COSMIC export.
Load into a DataFrame (or parse VCF with an appropriate library if needed).
Filter/aggregate by gene, tumor type, sample, or signature depending on the analysis goal.
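The filter/aggregate step of the workflow might look like the sketch below, shown on a small made-up DataFrame rather than real COSMIC data. The helper count_mutations_by_site is hypothetical, and the column names "Gene name" and "Primary site" are assumptions that should be checked against the actual export.

```python
import pandas as pd


def count_mutations_by_site(df, gene, gene_col="Gene name", site_col="Primary site"):
    """Count rows for one gene, grouped by tumor site, most frequent first.

    Column names are assumptions; adjust them to the export being analyzed.
    """
    missing = [col for col in (gene_col, site_col) if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    subset = df[df[gene_col] == gene]
    return subset.groupby(site_col).size().sort_values(ascending=False)


# Toy data for illustration only; these rows are not taken from COSMIC.
toy = pd.DataFrame(
    {
        "Gene name": ["TP53", "TP53", "KRAS", "TP53"],
        "Primary site": ["lung", "breast", "lung", "lung"],
    }
)
counts = count_mutations_by_site(toy, "TP53")
```

Checking for the expected columns up front gives a clearer error than a failed lookup deep inside the analysis, which matters because different COSMIC exports use different headers.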