Kegg Database | Skills Pool
Kegg Database Programmatic access to KEGG via BioServices for pathway analysis, gene functions, and metabolic cross-referencing.
ChatAndBuild 1 스타 2026. 3. 21. BioServices
Overview
BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.
When to Use This Skill
This skill should be used when:
Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
Analyzing metabolic pathways and gene functions via KEGG or Reactome
Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
Running sequence similarity searches (BLAST, MUSCLE alignment)
Querying gene ontology terms (QuickGO, GO annotations)
Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
Mining genomic data (BioMart, ArrayExpress, ENA)
npx skillvault add ChatAndBuild/chatandbuild-chatchat-skills-skills-research-kegg-database-skill-md
스타 1
업데이트 2026. 3. 21.
직업
Integrating data from multiple bioinformatics resources in a single workflow
Instruction
Utilize the BioServices Python client to query KEGG REST services for genes, pathways, and compounds.
Retrieve specific metabolic pathway maps and their associated interactions using gene symbols or identifiers.
Perform identifier mapping between KEGG and other biological databases like UniProt or ChEMBL.
Search for chemical compounds in ChEBI or PubChem and cross-reference them with KEGG metabolic pathways.
Extract network interaction data for pathways to enable downstream topological analysis with tools like NetworkX.
Execute batch ID conversion utilities to process large-scale genomic or metabolomic datasets efficiently.
Core Capabilities
1. Protein Analysis Retrieve protein information, sequences, and functional annotations:
2. Pathway Discovery and Analysis Access KEGG pathway information for genes and organisms:
lookfor_organism(), lookfor_pathway(): Search by name
get_pathway_by_gene(): Find pathways containing genes
parse_kgml_pathway(): Extract structured pathway data
pathway2sif(): Get protein interaction networks
Reference: references/workflow_patterns.md for complete pathway analysis workflows.
3. Compound Database Searches Search and cross-reference compounds across multiple databases:
Search compound by name in KEGG
Extract KEGG compound ID
Use UniChem for KEGG → ChEMBL mapping
ChEBI IDs are often provided in KEGG entries
4. Sequence Analysis Run BLAST searches and sequence alignments:
Note: BLAST jobs are asynchronous. Check status before retrieving results.
5. Identifier Mapping Convert identifiers between different biological databases:
6. Gene Ontology Queries Access GO terms and annotations:
7. Protein-Protein Interactions Available databases: MINT, IntAct, BioGRID, DIP, and 30+ others.
Multi-Service Integration Workflows BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:
Complete Protein Analysis Pipeline Execute a full protein characterization workflow:
This script demonstrates:
UniProt search for protein entry
FASTA sequence retrieval
BLAST similarity search
KEGG pathway discovery
PSICQUIC interaction mapping
Pathway Network Analysis Analyze all pathways for an organism:
All pathway IDs for organism
Protein-protein interactions per pathway
Interaction type distributions
Exports to CSV/SIF formats
Cross-Database Compound Search Map compound identifiers across databases:
KEGG compound ID
ChEBI identifier
ChEMBL identifier
Basic compound properties
Batch Identifier Conversion Convert multiple identifiers at once:
Best Practices
Different services return data in various formats:
XML : Parse using BeautifulSoup (most SOAP services)
Tab-separated (TSV) : Pandas DataFrames for tabular data
Dictionary/JSON : Direct Python manipulation
FASTA : BioPython integration for sequence analysis
Rate Limiting and Verbosity Control API request behavior:
Error Handling Wrap service calls in try-except blocks:
Organism Codes Use standard organism abbreviations:
hsa: Homo sapiens (human)
mmu: Mus musculus (mouse)
dme: Drosophila melanogaster
sce: Saccharomyces cerevisiae (yeast)
List all organisms: k.list("organism") or k.organismIds
BioServices works well with:
BioPython : Sequence analysis on retrieved FASTA data
Pandas : Tabular data manipulation
PyMOL : 3D structure visualization (retrieve PDB IDs)
NetworkX : Network analysis of pathway interactions
Galaxy : Custom tool wrappers for workflow platforms
Output
Formatted reports on gene functions, pathway memberships, and compound properties.
Network interaction files and metabolic maps for the queried biological entities.
Automated Python scripts for bulk data retrieval and identifier conversion.
02
Overview