# ESM-3 Protein Embeddings: Encoder Integration and Pooling Strategies
| Model | Parameters | Embedding Dim | Recommended |
|---|---|---|---|
| esm3-sm-open-v1 | 1.4B | 1,536 | Default |
```
ESM-3 (frozen) → Per-residue embeddings [L, 1536]
        ↓
Attention Pooling → [32, 1536]
        ↓
MLP Projector → [32, LLM_dim]
        ↓
LLM (with LoRA)
```
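The attention-pooling step above compresses a variable-length sequence of per-residue embeddings into a fixed set of 32 tokens. A minimal NumPy sketch of learned-query attention pooling (the query count, scaling, and single-head form are assumptions; the real module's internals are not specified here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(residue_emb, queries):
    """Pool [L, D] per-residue embeddings into [K, D] using K learned queries."""
    d = residue_emb.shape[-1]
    scores = queries @ residue_emb.T / np.sqrt(d)  # [K, L] attention logits
    weights = softmax(scores, axis=-1)             # each query attends over all residues
    return weights @ residue_emb                   # [K, D] pooled tokens

rng = np.random.default_rng(0)
residue_emb = rng.standard_normal((120, 1536))  # one protein, L=120 residues
queries = rng.standard_normal((32, 1536))       # stand-in for learned query vectors
pooled = attention_pool(residue_emb, queries)
print(pooled.shape)  # (32, 1536)
```

Because the output size is fixed at `[32, 1536]` regardless of protein length, the downstream projector and LLM see a constant token budget per protein.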
```python
from src.models.protein_encoder import ESM3Encoder

encoder = ESM3Encoder(
    model_name="esm3-sm-open-v1",
    freeze=True,  # ALWAYS True
)

# Get per-residue embeddings
embeddings = encoder(sequences)  # [batch, seq_len, 1536]
```
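The MLP projector in the diagram maps the pooled `[32, 1536]` tokens into the LLM's hidden size. A NumPy sketch, assuming a two-layer MLP with a ReLU (the hidden width, activation, and `llm_dim` value are illustrative, not taken from the source):

```python
import numpy as np

def mlp_projector(x, w1, b1, w2, b2):
    """Project pooled encoder tokens [K, enc_dim] into the LLM hidden size [K, llm_dim]."""
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer (activation is an assumption)
    return h @ w2 + b2

rng = np.random.default_rng(0)
enc_dim, hidden, llm_dim = 1536, 1536, 4096  # llm_dim depends on the target LLM
x = rng.standard_normal((32, enc_dim))
w1 = rng.standard_normal((enc_dim, hidden)) * 0.02
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, llm_dim)) * 0.02
b2 = np.zeros(llm_dim)
tokens = mlp_projector(x, w1, b1, w2, b2)
print(tokens.shape)  # (32, 4096)
```

With the ESM-3 encoder frozen, the projector (and the LLM's LoRA adapters) are the only trainable pieces in the pipeline.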
| Method | Performance | Memory | Use When |
|---|---|---|---|
| Attention | Best | Higher | Default choice |
| Mean | Good | Lower | Memory constrained |
| CLS | Okay | Lowest | Quick experiments |
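For the memory-constrained option in the table, mean pooling must ignore padding positions or short sequences get diluted by zeros. A minimal masked-mean sketch (the mask convention of 1 for real residues, 0 for padding is an assumption):

```python
import numpy as np

def mean_pool(embeddings, mask):
    """Masked mean over the sequence axis.

    embeddings: [B, L, D]; mask: [B, L], 1 for real residues, 0 for padding.
    Returns [B, D].
    """
    m = mask[..., None].astype(embeddings.dtype)
    summed = (embeddings * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1.0, None)  # avoid divide-by-zero on empty rows
    return summed / counts

emb = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])  # last position is padding
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.]]
```

Unlike attention pooling, this produces a single vector per protein rather than 32 tokens, which is why it trades some performance for a smaller memory footprint.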