Analyze non-coding RNAs (miRNAs, lncRNAs, circRNAs) using miRBase, LNCipedia, RNAcentral, Rfam, and target prediction databases. Covers ncRNA identification, target prediction, disease associations, expression profiling, and functional annotation. Use when asked about microRNAs, long non-coding RNAs, RNA interference, miRNA targets, lncRNA function, or ncRNA-disease associations.
Pipeline for identifying, annotating, and interpreting non-coding RNAs and their biological roles. Covers microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and other ncRNA classes.
Key principles:
Type-based reasoning — look up, don't guess: Non-coding RNA function depends on type: miRNA silences target mRNAs (look up targets in miRTarBase/TargetScan), lncRNA has diverse functions (scaffolding, guiding, decoying — check literature for the specific lncRNA), circRNA may sponge miRNAs.
For any ncRNA query: first identify the class from the name/sequence, then select the appropriate evidence source. Do not assume function based on name alone — a gene named "LINC" may have a characterized mechanism, or none at all. Always search PubMed for the specific ncRNA before interpreting. For miRNAs, validated targets (T1) from miRTarBase outweigh any computational prediction — a predicted target with no experimental support is a hypothesis, not a finding. For lncRNAs, mechanism is almost always determined by experimental studies; use PubMed_search_articles with the lncRNA name + "mechanism" or "function" to find relevant evidence. For circRNAs, miRNA sponging is the most common proposed mechanism but is frequently over-claimed — look for CLIP-seq or reporter assay evidence before asserting it.
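The "identify the class first, then pick the evidence source" step can be sketched as a small name-based router. This is a minimal illustration, not part of any tool: the function name `classify_ncrna`, the `let-` and `circ` patterns, and the RNAcentral fallback are assumptions added here. A name match is only a first guess at the class and says nothing about function.

```python
import re
from typing import Tuple

def classify_ncrna(name: str) -> Tuple[str, str]:
    """Return (ncRNA class, first database to query) for a gene or transcript name."""
    n = name.strip()
    # miRNA: mature/precursor names (hsa-miR-21-5p, hsa-mir-21, let-7a) or gene symbols (MIR21)
    if re.match(r"^(hsa-)?(mir|let)-", n, flags=re.IGNORECASE) or re.match(r"^MIR\d", n):
        return ("miRNA", "miRBase")
    # circRNA: assumed naming patterns such as hsa_circ_... or circ...
    if re.match(r"^(hsa_)?circ", n, flags=re.IGNORECASE):
        return ("circRNA", "RNAcentral")
    # lncRNA: prefixes and the antisense suffix named in the identification workflow
    if re.match(r"^(LINC|MALAT|HOTAIR|XIST)", n) or n.endswith("-AS1"):
        return ("lncRNA", "LNCipedia")
    # Anything else: fall back to the cross-database search (assumed default)
    return ("unknown", "RNAcentral")

for query in ["hsa-miR-21-5p", "MALAT1", "ZEB1-AS1", "hsa_circ_0001946", "SNORD3A"]:
    print(query, classify_ncrna(query))
```

When the class comes back as "unknown", searching RNAcentral by name or sequence is the safer route than guessing.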
Not this skill: For mRNA expression analysis, use tooluniverse-rnaseq-deseq2. For CRISPR screens, use tooluniverse-crispr-screen-analysis.
| Tool | Use For |
|---|---|
| miRBase_search_mirna | Search miRNAs by name, accession, or sequence |
| miRBase_get_mirna | Detailed miRNA info (sequence, genomic location, family) |
| miRBase_get_mature_mirna | Mature miRNA sequences and annotations |
| PubMed_search_articles | Search for validated miRNA targets in literature (e.g., "miR-21 target validation") |
| LNCipedia_search_lncrna | Search lncRNAs by name, gene symbol, or transcript ID |
| LNCipedia_get_lncrna | Detailed lncRNA transcript info (sequence, structure, conservation) |
| LNCipedia_get_lncrna_xrefs | lncRNA gene info with all transcript variants |
| LNCipedia_search_ncrna_by_type | List all transcripts for a lncRNA gene |
| LNCipedia_get_lncrna_publications | lncRNA sequence (FASTA format) |
| RNAcentral_search | Search all ncRNA types across databases |
| RNAcentral_get_rna | Detailed ncRNA annotations from 40+ databases |
| Rfam_get_family | RNA family details (structure, alignment, species distribution) |
| Rfam_search | Search RNA families by keyword |
| DisGeNET_search_gene | ncRNA-disease associations |
| PubMed_search_articles | ncRNA literature |
| GTEx_get_median_gene_expression | Tissue expression of ncRNA genes |
Phase 0: ncRNA Identity & Classification
Name/ID → miRBase/LNCipedia/RNAcentral → class, sequence, genomic location
|
Phase 1: Target & Interaction Analysis
miRNA → target mRNAs; lncRNA → interacting proteins/RNAs/chromatin
|
Phase 2: Expression & Tissue Specificity
GTEx/GEO → where is it expressed? Tissue-specific or ubiquitous?
|
Phase 3: Disease Associations
DisGeNET/PubMed/CTD → ncRNA-disease links with evidence
|
Phase 4: Functional Interpretation
Pathway enrichment of targets → biological role → clinical significance
ncRNA classes by size and database:
Identification workflow:
- miR- or hsa-mir- → search miRBase
- LINC, MALAT, HOTAIR, XIST, or ends in -AS1 → search LNCipedia

For miRNAs, the targets determine the biology:
NOTE: There is no dedicated miRNA target lookup tool in ToolUniverse. To find miRNA targets:
- PubMed_search_articles(query="miR-21 target validation luciferase")
- miRBase_get_mirna_xrefs(accession="MIMAT0000076") (may link to external target databases)

Well-studied miRNA targets (for common oncomiRs/tumor suppressors):
Target interpretation framework:
For lncRNAs — the mechanism varies:
| lncRNA Mechanism | Example | How to Investigate |
|---|---|---|
| Chromatin modifier | HOTAIR, XIST | Check interacting proteins (PRC2, LSD1) via PubMed |
| Transcription regulator | NEAT1, MEG3 | Check nearby genes (cis-regulation) via genomic location |
| miRNA sponge | MALAT1, circRNAs | Search for miRNA binding sites |
| Scaffold | NKILA, BCAR4 | Check protein interactions |
| Enhancer RNA | eRNAs | Check ENCODE enhancer annotations |
GTEx_get_median_gene_expression(gene_symbol="MIR21") # miRNA host gene expression
# Note: GTEx measures RNA-seq; miRNA expression may need miRNA-seq data from GEO
Interpretation: Tissue-restricted ncRNAs are often functionally important in that tissue. Ubiquitous ncRNAs (like MALAT1) tend to have housekeeping roles.
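One way to put a number on "tissue-restricted versus ubiquitous" is the tau specificity index, a commonly used metric that is not part of this pipeline's tools and is brought in here only as an illustration. The sketch below assumes you have already extracted one expression value per tissue (for example median TPM); the example values are hypothetical, not GTEx output.

```python
def tau_index(expression):
    """Tau tissue-specificity index: 0 = uniform across tissues, 1 = single tissue."""
    values = [max(float(v), 0.0) for v in expression]
    if len(values) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(values)
    if peak == 0:
        return 0.0  # not expressed anywhere; specificity is undefined, so report 0
    # Average shortfall of each tissue relative to the peak tissue
    return sum(1 - v / peak for v in values) / (len(values) - 1)

# Hypothetical per-tissue values, for illustration only
restricted = {"liver": 120.0, "brain": 1.0, "lung": 0.5, "heart": 0.0}
ubiquitous = {"liver": 50.0, "brain": 55.0, "lung": 48.0, "heart": 52.0}

print("restricted tau:", round(tau_index(restricted.values()), 3))
print("ubiquitous tau:", round(tau_index(ubiquitous.values()), 3))
```

Tau is often computed on log-transformed expression; whichever scale you choose, use the same one for every ncRNA you compare.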
DisGeNET_search_gene(query="MIR21") # miR-21 disease associations
PubMed_search_articles(query="miR-21 biomarker cancer")
Key ncRNA-disease associations (well-established T1 examples — always verify via DisGeNET or PubMed for the specific ncRNA):
After identifying miRNA targets (Phase 1), run pathway enrichment:
# Collect validated target gene symbols
targets = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"] # miR-21 targets
# Pathway enrichment
ReactomeAnalysis_pathway_enrichment(identifiers="PTEN PDCD4 TPM1 RECK SPRY1")
STRING_get_network(identifiers="PTEN\rPDCD4\rTPM1\rRECK\rSPRY1", species=9606)
Interpretation: If miR-21 targets are enriched in apoptosis and PI3K-AKT signaling → miR-21 is an oncomiR that promotes survival by simultaneously suppressing multiple tumor suppressors.
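To make "enriched" concrete, the sketch below shows the over-representation (hypergeometric) test that pathway enrichment tools typically report in some form. It is an illustration under stated assumptions, not the method used by ReactomeAnalysis_pathway_enrichment: the function name is hypothetical and the counts in the example are made up.

```python
from math import comb

def enrichment_p_value(background, pathway_size, n_targets, overlap):
    """P(X >= overlap) when n_targets genes are drawn without replacement
    from a background containing pathway_size pathway members."""
    upper = min(pathway_size, n_targets)
    total = comb(background, n_targets)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, a 300-gene pathway,
# 5 validated targets, 2 of which fall in the pathway
p = enrichment_p_value(background=20000, pathway_size=300, n_targets=5, overlap=2)
print(f"over-representation p-value: {p:.4g}")
```

With only a handful of targets, a single overlapping gene can shift the p-value a lot, so treat enrichment on short target lists as suggestive and correct for the number of pathways tested.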
Report structure:
TargetScan provides widely used computational miRNA target predictions but has no REST API. Download and process locally:
# Step 1: Download TargetScan predicted targets (one-time, ~10MB zipped)
# URL: https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip
import pandas as pd
import zipfile, io, requests
url = "https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # fail early on HTTP errors instead of unzipping an error page
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    fname = z.namelist()[0]
    df = pd.read_csv(z.open(fname), sep='\t')
# Step 2: Query for a specific miRNA family
# Column names are as written in this guide; confirm them against df.columns for your release
mirna = "miR-21-5p" # or "miR-21/590-5p" (TargetScan uses family names)
# (?!\d) stops "miR-21" from also matching miR-210, miR-214, etc.
targets = df[df['miRNA Family'].str.contains(r"miR-21(?!\d)", case=False, na=False, regex=True)]
# Step 3: Rank by cumulative weighted context++ score (most negative first)
targets_ranked = targets.sort_values('Cumulative weighted context++ score', ascending=True)
print(f"Top 20 predicted targets of {mirna}:")
for _, row in targets_ranked.head(20).iterrows():
    print(f" {row['Target Gene']:10s} score={row['Cumulative weighted context++ score']:.3f} "
          f"sites={row['Total num conserved sites']}")
Interpretation: A more negative context++ score means stronger predicted repression. Targets with more than one conserved site are higher confidence.
miRTarBase has Cloudflare protection blocking programmatic access. Use the R/Bioconductor data package or bulk download:
# Option 1: Download from miRTarBase bulk export (requires browser download first)
# Go to: https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/
# Download: hsa_MTI.xlsx (human miRNA-target interactions)
# Option 2: Use the GitHub data dump
# https://github.com/jorainer/mirtarbase — R package with cached data
# Once you have the file:
import pandas as pd
mti = pd.read_excel("hsa_MTI.xlsx") # or read_csv if TSV
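# Filter for your miRNA
# (?!\d) stops "hsa-miR-21" from also matching hsa-miR-210, hsa-miR-214, etc.
mir21_targets = mti[mti['miRNA'].str.contains(r'hsa-miR-21(?!\d)', case=False, na=False, regex=True)]
print(f"miR-21 validated targets: {len(mir21_targets)}")
# Filter by evidence strength
# Assay names are listed in the 'Experiments' column; 'Support Type' holds the
# Functional / Non-Functional MTI label. Check mti.columns for your release.
strong = mir21_targets[mir21_targets['Experiments'].str.contains(
    'Luciferase|Reporter|Western|CLIP', case=False, na=False
)]
print(f" Strong evidence (reporter/CLIP): {len(strong)}")
for _, row in strong.head(10).iterrows():
    print(f" {row['Target Gene']:10s} - {row['Experiments']} ({row['Support Type']})")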
When download is not available: Use the built-in reference table in Phase 1 for well-studied miRNAs, or search PubMed for validated targets.
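The rule that validated targets outweigh computational predictions can be applied mechanically once both lists are in hand. The sketch below is a minimal illustration: the function name `tier_targets` and the tier labels are assumptions, and `GENE_X` / `GENE_Y` are placeholders rather than real predictions.

```python
def tier_targets(validated, predicted):
    """Label each gene as experimentally validated (T1) or predicted only."""
    validated_set = {g.upper() for g in validated}
    predicted_set = {g.upper() for g in predicted}
    tiers = {gene: "T1 validated" for gene in sorted(validated_set)}
    # Genes with no experimental support stay hypotheses, whatever their score
    for gene in sorted(predicted_set - validated_set):
        tiers[gene] = "predicted only (hypothesis)"
    return tiers

validated = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"]  # miR-21 targets used in the enrichment step
predicted = ["PDCD4", "GENE_X", "GENE_Y"]  # placeholders standing in for TargetScan hits

for gene, tier in tier_targets(validated, predicted).items():
    print(f"{gene:8s} {tier}")
```

Reporting the tier next to each gene keeps predicted-only targets from being read as findings further along the pipeline.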