Access NCBI GEO (Gene Expression Omnibus) for gene expression and genomics data: search for and download microarray and RNA-seq records (series GSE, samples GSM, platforms GPL) and retrieve SOFT and Series Matrix files for transcriptomics and expression analysis.
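Before any tool is involved, it helps to know where a series' files live. The sketch below assumes the public GEO download layout, where series are grouped into directories that replace the last three digits of the accession with "nnn" (GSE2034 sits under GSE2nnn). It builds the Series Matrix and SOFT family URLs from a GSE accession. The helper names are hypothetical. Multi-platform series publish one matrix per platform with a -GPLxxx suffix, which this sketch does not cover.

```python
import re

# Assumed base of the GEO series download tree (HTTPS mirror of the FTP site).
GEO_BASE = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_dir(gse: str) -> str:
    """Return the directory URL that holds files for a GSE accession."""
    match = re.fullmatch(r"GSE(\d+)", gse)
    if match is None:
        raise ValueError(f"not a GSE accession: {gse!r}")
    digits = match.group(1)
    # Drop the last three digits and append "nnn": GSE2034 -> GSE2nnn, GSE123 -> GSEnnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_BASE}/{stub}/{gse}"


def series_matrix_url(gse: str) -> str:
    """Series Matrix file (single-platform series only; others add -GPLxxx)."""
    return f"{series_dir(gse)}/matrix/{gse}_series_matrix.txt.gz"


def soft_family_url(gse: str) -> str:
    """SOFT family file bundling the series, sample and platform records."""
    return f"{series_dir(gse)}/soft/{gse}_family.soft.gz"


if __name__ == "__main__":
    print(series_matrix_url("GSE2034"))
    print(soft_family_url("GSE2034"))
```

Libraries such as GEOquery resolve these locations themselves. Spelling the layout out is mainly useful when scripting bulk downloads or checking what a tool fetched.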
Instructions
- Formulate search queries to identify relevant transcriptomics or epigenomics datasets (GSE, GSM, GPL) based on study keywords.
- Use programmatic tools (e.g., GEOquery's getGEO in R or GEOparse's get_GEO in Python) to retrieve series metadata and sample information.
- Download processed expression matrices (Series Matrix files) or SOFT files to compare gene expression levels across experimental conditions.
- Extract experimental design details, including sample groups, treatments, and platform information from the retrieved metadata.
- Validate the integrity of downloaded files and cross-reference them with the publications cited in the GEO series record.
- Prepare the retrieved matrix for downstream differential expression analysis (DEA) by cleaning and formatting it.
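The download, metadata extraction, validation and cleaning steps above can be sketched with the standard library alone, working directly on a Series Matrix file. This sketch assumes the usual layout: tab-separated "!Sample_..." header lines, "key: value" entries in !Sample_characteristics_ch1, and an expression table between !series_matrix_table_begin and !series_matrix_table_end. The function names are hypothetical. The "max > 100 means unlogged" rule is a simple assumed heuristic, not GEO's own logic.

```python
import csv
import gzip
import io
import math
from typing import Dict, List, Tuple

Table = Dict[str, List[float]]  # probe ID -> one value per sample


def to_float(value: str) -> float:
    """Convert a matrix cell to float; empty or 'null' cells become NaN."""
    try:
        return float(value)
    except ValueError:
        return math.nan


def parse_series_matrix(text: str) -> Tuple[Dict[str, List[List[str]]], List[str], Table]:
    """Split Series Matrix text into sample metadata, column header and table."""
    meta: Dict[str, List[List[str]]] = {}
    header: List[str] = []
    rows: Table = {}
    in_table = False
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue
        key = fields[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table:
            if not header:
                header = fields[1:]  # GSM accessions following "ID_REF"
            else:
                rows[key] = [to_float(v) for v in fields[1:]]
        elif key.startswith("!Sample_"):
            # Keys such as characteristics_ch1 may repeat, so keep every line.
            meta.setdefault(key[len("!Sample_"):], []).append(fields[1:])
    return meta, header, rows


def sample_groups(meta: Dict[str, List[List[str]]], field: str) -> Dict[str, str]:
    """Map each GSM to the value of one 'key: value' characteristic."""
    gsms = meta["geo_accession"][0]
    groups: Dict[str, str] = {}
    for line in meta.get("characteristics_ch1", []):
        for gsm, cell in zip(gsms, line):
            name, _, value = cell.partition(":")
            if name.strip().lower() == field.lower():
                groups[gsm] = value.strip()
    return groups


def validate(meta: Dict[str, List[List[str]]], header: List[str], rows: Table) -> None:
    """Check that table columns match the sample metadata and rows are complete."""
    if header != meta["geo_accession"][0]:
        raise ValueError("matrix columns do not match !Sample_geo_accession")
    for probe, values in rows.items():
        if len(values) != len(header):
            raise ValueError(f"row {probe} has {len(values)} values, expected {len(header)}")


def clean_for_dea(rows: Table, log_threshold: float = 100.0) -> Table:
    """Drop probes with missing values; log2(x + 1) transform if data look unlogged."""
    complete = {p: v for p, v in rows.items() if not any(math.isnan(x) for x in v)}
    peak = max((x for v in complete.values() for x in v), default=0.0)
    if peak > log_threshold:
        # Assumption: negative values are clipped to 0 before the transform.
        complete = {p: [math.log2(max(x, 0.0) + 1.0) for x in v] for p, v in complete.items()}
    return complete


def load(path: str) -> str:
    """Read a downloaded *_series_matrix.txt.gz file as text."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()
```

A typical run is `meta, header, rows = parse_series_matrix(load(path))`, then `validate(...)`, then `sample_groups(meta, "treatment")` to label columns before handing `clean_for_dea(rows)` to a DEA tool. Real series often need more careful normalization than this sketch applies.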
When to Use
- When searching for public microarray or RNA-seq datasets related to specific diseases, tissues, or experimental conditions.
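Keyword searches of this kind can be run through NCBI E-utilities against the GEO DataSets database (db=gds). The sketch below builds an esearch URL from study keywords and parses the JSON reply. It assumes the GEO DataSets field tags "[Organism]" and "[Entry Type]" (with "gse" restricting hits to series) and assumes that series UIDs equal 200000000 plus the GSE number. Both are worth confirming against the GEO and E-utilities documentation. The helper names are hypothetical.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_query(keywords: List[str], organism: Optional[str] = None,
                    entry_type: Optional[str] = "gse") -> str:
    """Combine study keywords with optional organism and entry-type filters."""
    terms = [f"({kw})" for kw in keywords]
    if organism:
        terms.append(f'"{organism}"[Organism]')
    if entry_type:
        terms.append(f"{entry_type}[Entry Type]")  # assumed GDS field tag
    return " AND ".join(terms)


def esearch_url(term: str, retmax: int = 20) -> str:
    """esearch URL against GEO DataSets, asking for a JSON response."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(payload: str) -> Tuple[int, List[str]]:
    """Return (total hit count, UIDs on this page) from esearch JSON."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def uid_to_gse(uid: str) -> str:
    """Assumed convention: series UIDs are 200000000 + the GSE number."""
    return f"GSE{int(uid) - 200000000}"


def search(term: str, retmax: int = 20) -> Tuple[int, List[str]]:
    """Run the query over the network; NCBI rate-limits clients without an API key."""
    with urlopen(esearch_url(term, retmax)) as response:
        return parse_esearch(response.read().decode("utf-8"))
```

For example, `search(build_gds_query(["breast cancer", "tamoxifen"], organism="Homo sapiens"))` returns a hit count and UIDs. These can then be passed to `uid_to_gse` and on to GEOquery or GEOparse.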