Query and retrieve protein sequences, annotations, and functional data from UniProt. Supports text search, ID mapping between databases, batch downloads, and access to Swiss-Prot (reviewed) and TrEMBL (unreviewed) entries.
UniProt is a central resource for protein sequence data and functional annotation. This skill provides programmatic access for searching proteins by various criteria, retrieving FASTA sequences, translating identifiers between biological databases, and querying both manually curated (Swiss-Prot) and computationally annotated, unreviewed (TrEMBL) protein records.
No UniProt-specific package is required - UniProt provides a REST API accessed via plain HTTP requests (the examples use the Python requests library):
import requests
# Test connectivity
resp = requests.get("https://rest.uniprot.org/uniprotkb/P04637.json")
print(resp.json()["primaryAccession"])  # P04637 (human p53, entry name P53_HUMAN)
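Where installing a third-party HTTP library is not an option, the same single-entry lookup can be sketched with the Python standard library alone. The helper names `entry_url` and `fetch_entry` are hypothetical, introduced here only for illustration; the URL pattern is the one used in the connectivity test.

```python
import json
import urllib.request

BASE_URL = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession, fmt="json"):
    """Build the URL for one UniProtKB entry (hypothetical helper)."""
    return f"{BASE_URL}/{accession}.{fmt}"


def fetch_entry(accession, timeout=10):
    """Return the entry as a dict, or None if the request fails.

    URLError, HTTPError, and socket timeouts all derive from OSError,
    so a single except clause covers network and HTTP failures.
    """
    try:
        with urllib.request.urlopen(entry_url(accession), timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        return None
```

Calling `fetch_entry("P04637")` should return a dict whose `primaryAccession` is `P04637`, while an unreachable network or an unknown accession yields `None` instead of an exception.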
Find proteins by keywords, names, or descriptions:
import requests
endpoint = "https://rest.uniprot.org/uniprotkb/search"
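params = {
    "query": "hemoglobin AND organism_id:9606 AND reviewed:true",  # human (taxon 9606), Swiss-Prot only
    "format": "json",
    "size": 10,  # maximum number of entries returned
}
resp = requests.get(endpoint, params=params)
resp.raise_for_status()  # stop early on HTTP errors
results = resp.json()
for entry in results["results"]:
    acc = entry["primaryAccession"]
    # Reviewed entries carry a recommendedName
    name = entry["proteinDescription"]["recommendedName"]["fullName"]["value"]
    print(f"{acc}: {name}")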
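The name lookup above assumes every entry has a `recommendedName`, which holds for the `reviewed:true` filter used here. If that filter is dropped, a more defensive lookup may be needed. The sketch below assumes that unreviewed entries list their names under a `submissionNames` array; the function name `protein_name` and the sample dicts are illustrative, not real API output.

```python
def protein_name(entry):
    """Return the best available protein name for a UniProtKB JSON entry."""
    desc = entry.get("proteinDescription", {})
    recommended = desc.get("recommendedName")
    if recommended:
        return recommended["fullName"]["value"]
    # Assumption: unreviewed entries use a "submissionNames" list instead
    submissions = desc.get("submissionNames") or []
    if submissions:
        return submissions[0]["fullName"]["value"]
    return "(no name)"


# Hand-written sample entries, trimmed to the fields the helper reads
reviewed = {
    "primaryAccession": "P69905",
    "proteinDescription": {
        "recommendedName": {"fullName": {"value": "Hemoglobin subunit alpha"}}
    },
}
unreviewed = {
    "primaryAccession": "EXAMPLE1",  # placeholder, not a real accession
    "proteinDescription": {
        "submissionNames": [{"fullName": {"value": "Example submitted name"}}]
    },
}

print(protein_name(reviewed))    # Hemoglobin subunit alpha
print(protein_name(unreviewed))  # Example submitted name
```

Swapping `protein_name(entry)` in for the chained dictionary lookup in the search loop keeps the loop from raising `KeyError` on entries that lack a recommended name.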
UniProt's query language combines field prefixes (for example organism_id: and reviewed:) with the boolean operators AND, OR, and NOT, and supports parentheses for grouping:
# Boolean combinations
hemoglobin AND organism_id:9606
(kinase OR phosphatase) AND reviewed:true
receptor NOT bacteria
# Field-specific queries