Name: Phylogenetics and Sequence Analysis
Author: FreedomIntelligence

Phylogenetics and Sequence Analysis

Production-ready phylogenetics and sequence analysis skill for alignment processing, tree analysis, and evolutionary metrics. Computes treeness, RCV, treeness/RCV, parsimony informative sites, evolutionary rate, DVMC, tree length, alignment gap statistics, GC content, and bootstrap support using PhyKIT, Biopython, and DendroPy. Performs NJ/UPGMA/parsimony tree construction, Robinson-Foulds distance, Mann-Whitney U tests, and batch analysis across gene families. Integrates with ToolUniverse for sequence retrieval (NCBI, UniProt, Ensembl) and tree annotation. Use when processing FASTA/PHYLIP/Nexus/Newick files, computing phylogenetic metrics, comparing taxa groups, or answering questions about alignments, trees, parsimony, or molecular evolution.

FreedomIntelligence · 2,097 stars · Mar 8, 2026

Occupation
Categories: Bioinformatics

Comprehensive phylogenetics and sequence analysis using PhyKIT, Biopython, and DendroPy. Designed for bioinformatics questions about multiple sequence alignments, phylogenetic trees, parsimony, molecular evolution, and comparative genomics.

IMPORTANT: This skill handles complex phylogenetic workflows. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.

When to Use This Skill

Apply when users:

Have FASTA alignment files and ask about parsimony informative sites, gaps, or alignment quality
Have Newick tree files and ask about treeness, tree length, evolutionary rate, or DVMC
Ask about treeness/RCV, RCV, or relative composition variability
Need to compare phylogenetic metrics between groups (fungi vs animals, etc.)
Ask about PhyKIT functions (treeness, rcv, dvmc, evo_rate, parsimony_informative, tree_length)
Have gene family data with paired alignments and trees
Need Mann-Whitney U tests or other statistical comparisons of phylogenetic metrics
Ask about bootstrap support, branch lengths, or tree topology

High-Level Workflow Decision Tree

START: User question about phylogenetic data
│
├─ Q1: What type of analysis is needed?
│  │
│  ├─ ALIGNMENT ANALYSIS (FASTA/PHYLIP files)
│  │  ├─ Parsimony informative sites → phykit_parsimony_informative()
│  │  ├─ RCV score → phykit_rcv()
│  │  ├─ Gap percentage → alignment_gap_percentage()
│  │  ├─ GC content → alignment_statistics()
│  │  └─ See: references/sequence_alignment.md
│  │
│  ├─ TREE ANALYSIS (Newick files)
│  │  ├─ Treeness → phykit_treeness()
│  │  ├─ Tree length → phykit_tree_length()
│  │  ├─ Evolutionary rate → phykit_evolutionary_rate()
│  │  ├─ DVMC → phykit_dvmc()
│  │  ├─ Bootstrap support → extract_bootstrap_support()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ COMBINED ANALYSIS (alignment + tree)
│  │  └─ Treeness/RCV → phykit_treeness_over_rcv()
│  │
│  ├─ TREE CONSTRUCTION (build from alignment)
│  │  ├─ Neighbor-Joining → build_nj_tree()
│  │  ├─ UPGMA → build_upgma_tree()
│  │  ├─ Parsimony → build_parsimony_tree()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ GROUP COMPARISON (fungi vs animals, etc.)
│  │  ├─ Batch compute metrics per group
│  │  ├─ Mann-Whitney U test
│  │  ├─ Summary statistics (median, mean, percentiles)
│  │  └─ See: references/parsimony_analysis.md
│  │
│  └─ TREE COMPARISON
│     ├─ Robinson-Foulds distance → robinson_foulds_distance()
│     └─ Bootstrap consensus → bootstrap_analysis()
│
├─ Q2: What data format is available?
│  ├─ FASTA (.fa, .fasta, .faa, .fna)
│  ├─ PHYLIP (.phy, .phylip) - Use phylip-relaxed for long names
│  ├─ Nexus (.nex, .nexus)
│  ├─ Newick (.nwk, .newick, .tre, .tree)
│  └─ Auto-detect with load_alignment() or load_tree()
│
└─ Q3: Is this a batch analysis?
   ├─ Single gene → Run metric function once
   ├─ Multiple genes → Use batch_compute_metric()
   └─ Group comparison → Use discover_gene_files() + compare_groups()
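The Q2 and Q3 branches of the decision tree can be sketched in a few lines. This is an illustrative sketch only, not the skill's own `load_alignment()` or `batch_compute_metric()` implementation: `guess_format` and `run_metric` are hypothetical helper names, and the returned format strings are assumed to be the ones Biopython's `AlignIO` and `Phylo` parsers accept.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Extension-to-format map copied from the Q2 branch of the decision tree.
# The format names on the right are an assumption (Biopython-style names).
EXTENSION_FORMATS = {
    ".fa": "fasta", ".fasta": "fasta", ".faa": "fasta", ".fna": "fasta",
    ".phy": "phylip-relaxed", ".phylip": "phylip-relaxed",
    ".nex": "nexus", ".nexus": "nexus",
    ".nwk": "newick", ".newick": "newick", ".tre": "newick", ".tree": "newick",
}


def guess_format(path: str) -> str:
    """Return a format name for `path` based on its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized extension {suffix!r} for {path}") from None


def run_metric(metric: Callable[[str], float], paths: Iterable[str]) -> Dict[str, float]:
    """Apply one metric function to every file, keyed by file stem (Q3: multiple genes)."""
    return {Path(p).stem: metric(p) for p in paths}
```

Keying the results by file stem makes a later lookup for a specific gene a plain dictionary access, provided the stems are unique across the input files.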

Quick Reference: Common Metrics

Metric	Function	Input	Description
Treeness	`phykit_treeness(tree_file)`	Newick	Internal branch length / Total branch length
RCV	`phykit_rcv(aln_file)`	FASTA/PHYLIP	Relative Composition Variability
Treeness/RCV	`phykit_treeness_over_rcv(tree, aln)`	Both	Treeness divided by RCV
Tree Length	`phykit_tree_length(tree_file)`	Newick	Sum of all branch lengths
Evolutionary Rate	`phykit_evolutionary_rate(tree_file)`	Newick	Total branch length / num terminals
DVMC	`phykit_dvmc(tree_file)`	Newick	Degree of Violation of Molecular Clock
Parsimony Informative Sites	`phykit_parsimony_informative(aln_file)`	FASTA/PHYLIP	Sites with at least 2 character states, each present in at least 2 sequences
Gap Percentage	`alignment_gap_percentage(aln_file)`	FASTA/PHYLIP	Percentage of gap characters

Choosing Methods: When to Use What

Alignment Methods

Method	Speed	Accuracy	Use Case
ClustalW	Slow	Medium	Small datasets (<100 sequences), educational
MUSCLE	Fast	High	Medium datasets (100-1000 sequences)
MAFFT	Very Fast	Very High	Recommended - Large datasets (>1000 sequences)

Tree Building Methods

Method	Speed	Accuracy	Use Case
Neighbor-Joining	Fast	Medium	Quick trees, large datasets, exploratory
UPGMA	Fast	Low	Assumes molecular clock, special cases only
Maximum Parsimony	Medium	Medium	Small datasets, discrete characters
Maximum Likelihood	Slow	High	Use external tools (IQ-TREE, RAxML) for production

Answer Extraction for BixBench

Question Pattern	Extraction Method
"What is the median X?"	`np.median(values)`
"What is the maximum X?"	`np.max(values)`
"What is the difference between median X for A vs B?"	`abs(np.median(a) - np.median(b))`
"What percentage of X have Y above Z?"	`sum(v > Z for v in values) / len(values) * 100`
"What is the Mann-Whitney U statistic?"	`stats.mannwhitneyu(a, b)[0]`
"What is the p-value?"	`stats.mannwhitneyu(a, b)[1]`
"What is the X value for gene Y?"	`results[gene_id]`
"What is the fold-change in median X?"	`np.median(a) / np.median(b)`
"multiplied by 1000"	`round(value * 1000)`

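The extraction patterns in the table can be tried without NumPy or SciPy. The sketch below mirrors them with the standard library; the helper names are hypothetical, and the hand-rolled U statistic is computed for the first sample with average ranks for ties, which is assumed to correspond to what recent SciPy versions return as the statistic. P-values are deliberately left to `stats.mannwhitneyu`.

```python
from statistics import median
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic for sample `a`, using average ranks for tied values."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a = len(a)
    return rank_sum_a - n_a * (n_a + 1) / 2


def percent_above(values: Sequence[float], threshold: float) -> float:
    """Percentage of values strictly greater than `threshold`."""
    return 100.0 * sum(v > threshold for v in values) / len(values)


def scaled_int(value: float, factor: int = 1000) -> int:
    """Answer format for questions phrased as 'multiplied by 1000'."""
    return round(value * factor)


# Made-up example values for two groups; not taken from any dataset.
group_a = [0.1, 0.2, 0.3]
group_b = [0.4, 0.5, 0.6]
median_difference = abs(median(group_a) - median(group_b))
fold_change = median(group_a) / median(group_b)
```

With these two made-up groups every value in `group_a` ranks below every value in `group_b`, so `mann_whitney_u(group_a, group_b)` is 0 and the reverse call gives 9, the product of the two sample sizes.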
BixBench Question Coverage

Project	Questions	Metrics
bix-4	7	DVMC analysis (fungi vs animals)
bix-11	6	Treeness analysis (median, percentages, Mann-Whitney U)
bix-12	5	Parsimony informative sites (counts, percentages, ratios)
bix-25	2	Treeness/RCV with gap filtering
bix-35	4	Evolutionary rate (specific genes, comparisons)
bix-38	5	Tree length (fold-change, variance, paired ratios)
bix-45	4	RCV (Mann-Whitney U, medians, paired differences)
bix-60	1	Average treeness across multiple trees
