Propose high-fitness and high-diversity mutants of the VP1 capsid protein of Adeno-Associated Virus (AAV) through multi-round iterative optimization.
This skill performs automated multi-round optimization of a 28-amino-acid segment of the VP1 capsid protein of Adeno-Associated Virus (AAV) to discover mutants with improved DNA packaging fitness and high sequence diversity.
Example prompts:
This skill can:
Download initial AAV sequences from https://cloud.tsinghua.edu.cn/f/992109032d8049689a6d/?dl=1 and use them as the starting pool.
Download the oracle AAV prediction model from https://cloud.tsinghua.edu.cn/f/80bbc575ec3f4e63a0af/?dl=1, and the configuration file from https://cloud.tsinghua.edu.cn/f/09ea0869b74b4d2ca53e/?dl=1.
Execute code for oracle loading and scoring:
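import torch
from omegaconf import OmegaConf

# BaseCNN (the oracle's predictor class) must be importable from the oracle's
# code base; its import path is not specified here.

# ===== ORACLE MODEL LOADING =====
def load_oracle_model(ckpt_path, cfg_path):
    # Build the predictor from the config, then restore the trained weights.
    cfg = OmegaConf.load(cfg_path)
    oracle = BaseCNN(**cfg.model.predictor)
    state_dict = torch.load(ckpt_path, map_location="cpu")
    oracle.load_state_dict(state_dict)
    oracle.eval()  # inference mode: disables dropout and batch-norm updates
    return oracle

# ===== ORACLE SCORING FUNCTION =====
def score_sequence(oracle, sequence: str) -> float:
    # Score one sequence without tracking gradients.
    with torch.no_grad():
        results = oracle(sequence)
    # Assumes the oracle returns a single value for a single sequence.
    return results.item()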
Compute ESM2 embeddings for all sequences to represent sequence features.
Proposal: in each round, propose 96 × 4 = 384 candidate mutants from the current population, using point mutations only and at most 4 mutations per sequence.
Evaluation: evaluate all candidate sequences using the oracle scoring function. Use oracle feedback from previous rounds to bias mutation proposals toward directions that increase predicted fitness (fitness gradient exploitation).
Selection: rank sequences by predicted fitness and select the top 96 mutants, while maintaining diversity measured by average pairwise Hamming distance.
Repeat proposal, evaluation, and selection until 10 rounds are completed or the best fitness has not improved for 3 consecutive rounds.
Collect the best 96 mutants discovered across all rounds, sort them by predicted DNA packaging fitness, and export the results as a CSV file in the specified output format.
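The proposal, evaluation, and selection rounds described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation, and it makes several assumptions:

- All names (`propose_mutant`, `mean_pairwise_hamming`, `optimize`, `score_fn`, `AMINO_ACIDS`) are hypothetical.
- `score_fn` stands in for the oracle scoring function.
- The limit of 4 mutations is applied relative to the parent sequence.
- Parents are sampled uniformly rather than biased by oracle feedback.
- Selection is plain top-k, so diversity is only measured, not enforced.
- ESM2 embeddings are not used.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(seqs):
    """Average Hamming distance over all unordered pairs (0.0 if fewer than 2)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def propose_mutant(parent, rng, max_mutations=4):
    """Substitute 1..max_mutations distinct positions with a different residue."""
    n = rng.randint(1, max_mutations)
    child = list(parent)
    for pos in rng.sample(range(len(parent)), n):
        child[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != parent[pos]])
    return "".join(child)


def optimize(pool, score_fn, rounds=10, patience=3, top_k=96, fanout=4, seed=0):
    """Run propose/evaluate/select rounds; return the best top_k (sequence, fitness) pairs."""
    rng = random.Random(seed)
    scores = {s: score_fn(s) for s in pool}  # every sequence scored so far
    population = sorted(scores, key=scores.get, reverse=True)[:top_k]
    best, stale = scores[population[0]], 0

    for _ in range(rounds):
        # Proposal: top_k * fanout point mutants (duplicates collapse in the set).
        candidates = {
            propose_mutant(rng.choice(population), rng)
            for _ in range(top_k * fanout)
        }
        # Evaluation: score only sequences not seen before.
        for seq in candidates:
            if seq not in scores:
                scores[seq] = score_fn(seq)
        # Selection: keep the top_k of parents plus candidates.
        merged = set(population) | candidates
        population = sorted(merged, key=scores.get, reverse=True)[:top_k]

        # Early stopping: no improvement for `patience` consecutive rounds.
        round_best = scores[population[0]]
        if round_best > best:
            best, stale = round_best, 0
        else:
            stale += 1
            if stale >= patience:
                break

    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [(s, scores[s]) for s in ranked]
```

Calling `optimize(initial_pool, score_fn)` returns the ranked library, and `mean_pairwise_hamming([s for s, _ in library])` reports its diversity. A diversity-aware selection step, for example one that skips candidates too close to sequences already selected, would replace the plain top-k sort.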
The final result must be a CSV file with two columns:
| sequence | fitness |
|---|---|
| AAV_mutant_sequence | predicted_fitness |
Requirements:
Example:
sequence,fitness
ADMEIIQVNPYSSEQYGDVATPLYHGTG,0.96
ADMEIRQVNPYSSEQYGDVATPLQHGTG,0.93
ADSELASTNPVSTELYGIVATNLMAQAS,0.92
...
This CSV represents the final optimized AAV mutant library predicted to exhibit higher DNA packaging fitness.
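The export step could look roughly like the sketch below, which uses Python's standard `csv` module. The function name `write_library` and its parameters are assumptions made for illustration. The output path is left to the caller, since no file name is specified here.

```python
import csv


def write_library(results, path, top_k=96):
    """Write the top_k (sequence, fitness) pairs, best first, as a two-column CSV."""
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)[:top_k]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sequence", "fitness"])  # header row from the output format
        writer.writerows(ranked)
    return ranked
```

Sorting inside the function keeps the file ordered by predicted fitness even if the caller passes unsorted results.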