Propose high-fluorescence and high-diversity mutants of Green Fluorescent Protein (GFP) through multi-round iterative optimization.
This skill performs automated multi-round optimization of Green Fluorescent Protein (GFP) to discover mutants with higher predicted fluorescence intensity and higher sequence diversity.
Example prompts:
This skill can:
Download initial GFP sequences from https://cloud.tsinghua.edu.cn/f/5e673c1db710466b828f/?dl=1 and use them as the starting pool.
Download the oracle GFP prediction model from https://cloud.tsinghua.edu.cn/f/f655f79d7bb04a98a0bb/?dl=1, and the configuration file from https://cloud.tsinghua.edu.cn/f/8a894bb4b41f4074b9b0/?dl=1.
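The two download steps above can be sketched with the standard library alone. The URLs are the ones listed above; the local file names and the `assets` directory are hypothetical, since the skill description does not name them.

```python
import shutil
import urllib.request
from pathlib import Path

# URLs come from the skill description; the local file names are hypothetical.
ASSETS = {
    "gfp_initial_sequences.data": "https://cloud.tsinghua.edu.cn/f/5e673c1db710466b828f/?dl=1",
    "oracle.ckpt": "https://cloud.tsinghua.edu.cn/f/f655f79d7bb04a98a0bb/?dl=1",
    "oracle_config.yaml": "https://cloud.tsinghua.edu.cn/f/8a894bb4b41f4074b9b0/?dl=1",
}


def fetch(url: str, dest: Path) -> Path:
    """Download url to dest unless dest already exists; return dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


def fetch_all(target_dir: str = "assets") -> dict:
    """Fetch every asset into target_dir and map local name -> local path."""
    return {name: fetch(url, Path(target_dir) / name) for name, url in ASSETS.items()}
```

Skipping files that already exist keeps repeated runs from downloading the checkpoint again.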
Execute code for oracle loading and scoring:
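import torch
from omegaconf import OmegaConf

# BaseCNN is the oracle's predictor class; import it from the oracle's code base (not shown here).

# ===== ORACLE MODEL LOADING =====
def load_oracle_model(ckpt_path, cfg_path):
    # Build the predictor from the config, then restore its trained weights.
    cfg = OmegaConf.load(cfg_path)
    oracle = BaseCNN(**cfg.model.predictor)
    state_dict = torch.load(ckpt_path, map_location="cpu")
    oracle.load_state_dict(state_dict)
    oracle.eval()  # inference mode: disables dropout and batch-norm updates
    return oracle

# ===== ORACLE SCORING FUNCTION =====
def score_sequence(oracle, sequence: str) -> float:
    # Assumes the oracle accepts a raw sequence and returns a single scalar.
    with torch.no_grad():
        results = oracle(sequence).detach()
    return float(results.cpu().numpy().squeeze())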
Compute ESM2 embeddings for all sequences to represent sequence features.
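One possible way to compute the embeddings mentioned above is through the Hugging Face `transformers` ESM classes. This is a sketch under assumptions: the skill description does not say which ESM2 checkpoint or pooling to use, so the small `facebook/esm2_t6_8M_UR50D` checkpoint, the mean pooling, and the batch size are illustrative choices only.

```python
from typing import Iterator, List


def batched(sequences: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size sequences."""
    for start in range(0, len(sequences), batch_size):
        yield sequences[start:start + batch_size]


def embed_sequences(sequences: List[str],
                    model_name: str = "facebook/esm2_t6_8M_UR50D",  # assumed checkpoint
                    batch_size: int = 16):
    """Return one mean-pooled ESM2 embedding per sequence as an (N, D) tensor."""
    # Imported lazily so that batched() stays usable without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name).eval()
    pooled = []
    with torch.no_grad():
        for batch in batched(sequences, batch_size):
            inputs = tokenizer(batch, return_tensors="pt", padding=True)
            hidden = model(**inputs).last_hidden_state                      # (B, T, D)
            mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)  # (B, T, 1)
            # Average over non-padding tokens (special tokens are included).
            pooled.append((hidden * mask).sum(dim=1) / mask.sum(dim=1))
    return torch.cat(pooled, dim=0)
```

Mean pooling yields a fixed-size vector regardless of sequence length, which is convenient for clustering or distance computations between mutants.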
Proposal: in each round, propose 96 × 4 = 384 candidate mutants from the current population, using only point mutations and at most 4 mutations per sequence.
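The proposal step can be sketched as uniform random point substitution. Two readings are assumed here and not confirmed by the description: "96 × 4" is taken as 4 children per parent, and the mutation limit is counted relative to the parent sequence. The names `mutate` and `propose` are hypothetical.

```python
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(parent: str, rng: random.Random, max_mutations: int = 4) -> str:
    """Return a copy of parent with 1..max_mutations point substitutions."""
    n_mut = rng.randint(1, min(max_mutations, len(parent)))
    residues = list(parent)
    # Distinct positions, each changed to a different residue, so the child
    # differs from the parent at exactly n_mut sites.
    for pos in rng.sample(range(len(parent)), n_mut):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def propose(population: List[str], per_parent: int = 4, seed: int = 0) -> List[str]:
    """Generate per_parent mutants for every parent (96 parents -> 384 candidates)."""
    rng = random.Random(seed)
    return [mutate(parent, rng) for parent in population for _ in range(per_parent)]
```

Because only substitutions are applied, every candidate keeps the parent's length, which keeps Hamming distances well defined in later steps.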
Evaluation: evaluate all candidate sequences using the oracle scoring function. Use oracle feedback from previous rounds to bias mutation proposals toward directions that increase predicted fluorescence (fitness gradient exploitation).
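The description does not specify how oracle feedback biases proposals. One simple heuristic, sketched here as an assumption, is to average the fitness change observed for each substitution in earlier rounds and sample new substitutions with softmax weights. This is an empirical estimate, not a true gradient; the record layout, function names, and `temperature` value are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Substitution = Tuple[int, str]  # (position, new residue)


def substitution_gains(records: List[Tuple[str, str, float, float]]) -> Dict[Substitution, float]:
    """Average fitness change per substitution.

    Each record is (parent, child, parent_fitness, child_fitness); the change
    is credited equally to every substitution separating child from parent.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for parent, child, f_parent, f_child in records:
        diffs = [(i, c) for i, (p, c) in enumerate(zip(parent, child)) if p != c]
        for sub in diffs:
            totals[sub] += (f_child - f_parent) / len(diffs)
            counts[sub] += 1
    return {sub: totals[sub] / counts[sub] for sub in totals}


def sample_substitution(candidates: List[Substitution],
                        gains: Dict[Substitution, float],
                        rng: random.Random,
                        temperature: float = 0.1) -> Substitution:
    """Draw one substitution with softmax weights; unseen ones count as gain 0."""
    scores = [gains.get(sub, 0.0) / temperature for sub in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A lower `temperature` concentrates sampling on substitutions that helped before, while a higher one keeps exploration closer to uniform.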
Selection: rank sequences by predicted fitness and select the top 96 mutants, while maintaining diversity measured by average pairwise Hamming distance.
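A minimal sketch of the selection step follows. The average pairwise Hamming distance is computed as described above; how fitness and diversity are traded off is not specified, so the greedy rule and the `diversity_weight` parameter are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(sequences: List[str]) -> float:
    """Average Hamming distance over all unordered pairs (0.0 for < 2 sequences)."""
    pairs = list(combinations(sequences, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def select(scored: Dict[str, float], k: int = 96, diversity_weight: float = 0.01) -> List[str]:
    """Greedily pick k sequences, trading fitness against distance to prior picks."""
    remaining = dict(scored)  # dict keys also de-duplicate identical sequences
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        def utility(seq: str) -> float:
            if not chosen:
                return remaining[seq]
            nearest = min(hamming(seq, c) for c in chosen)
            return remaining[seq] + diversity_weight * nearest
        best = max(remaining, key=utility)
        chosen.append(best)
        del remaining[best]
    return chosen
```

With `diversity_weight=0.0` this reduces to plain top-k ranking by predicted fitness; larger values favor sequences far from those already picked.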
Repeat proposal, evaluation, and selection until 10 rounds are completed or the best fitness has not improved for 3 consecutive rounds, whichever comes first.
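The round loop and both stopping rules can be sketched as below. The `propose`, `score`, and `select` callables are passed in as parameters, so this block makes no assumption about how they are implemented; the `optimize` name and the archive dictionary are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def optimize(initial: List[str],
             propose: Callable[[List[str]], List[str]],
             score: Callable[[str], float],
             select: Callable[[Dict[str, float]], List[str]],
             max_rounds: int = 10,
             patience: int = 3) -> Tuple[Dict[str, float], int]:
    """Run propose/score/select rounds; return all scored sequences and rounds run."""
    archive: Dict[str, float] = {seq: score(seq) for seq in initial}
    population = list(initial)
    best = max(archive.values())
    stale = 0        # consecutive rounds without improvement
    rounds_run = 0
    for _ in range(max_rounds):
        rounds_run += 1
        candidates = propose(population)
        # Reuse cached scores so the oracle is called once per unique sequence.
        scored = {seq: archive[seq] if seq in archive else score(seq) for seq in candidates}
        archive.update(scored)
        population = select(scored)
        round_best = max(scored.values())
        if round_best > best:
            best, stale = round_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return archive, rounds_run
```

Keeping every scored sequence in `archive` makes it straightforward to pick the best mutants across all rounds afterwards.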
Collect the best 96 mutants discovered across all rounds, sort them by predicted fluorescence in descending order, and export the results as a CSV file in the output format specified below.
The final result must be a CSV file with two columns:
| sequence | fitness |
|---|---|
| GFP_mutant_sequence | predicted_fluorescence |
Requirements:
Example:
sequence,fitness
SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT...,0.93
SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFIATT...,0.91
SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTIKFICTT...,0.89
...
This CSV represents the final optimized GFP mutant library predicted to exhibit higher fluorescence intensity.
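Writing the final file in the two-column format above could look like the following sketch. It assumes the scored mutants are held in a dictionary mapping sequence to predicted fitness; the function names and that dictionary layout are hypothetical.

```python
import csv
from typing import Dict, List, Tuple


def top_mutants(archive: Dict[str, float], k: int = 96) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (sequence, fitness) pairs, best first."""
    return sorted(archive.items(), key=lambda item: item[1], reverse=True)[:k]


def write_results(archive: Dict[str, float], path: str, k: int = 96) -> None:
    """Write the top-k mutants to a two-column CSV with header sequence,fitness."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sequence", "fitness"])
        writer.writerows(top_mutants(archive, k))
```

Opening the file with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.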