Build and analyze Markov State Models from molecular dynamics trajectories. Covers direct MSM construction, Observable Operator Model (OOM) bias correction, implied timescale analysis, and discretization strategies. Use for analyzing conformational dynamics, estimating transition rates, and correcting estimation bias in non-equilibrium MD data.
Markov State Models (MSMs) estimate equilibrium transition probabilities from MD trajectory data. Key challenge: short non-equilibrium trajectories produce biased MSMs. The Observable Operator Model (OOM) correction can recover unbiased estimates.
Based on: Nüske et al., J. Chem. Phys. 146, 094104 (2017).
from sklearn.cluster import KMeans
import numpy as np
# Features: cos/sin of backbone dihedrals
features = np.column_stack([np.cos(phi), np.sin(phi), np.cos(psi), np.sin(psi)])
kmeans = KMeans(n_clusters=40, random_state=42, n_init=10)
labels = kmeans.fit_predict(features).reshape(n_traj, n_frames)
State count guidelines:
from scipy import linalg
C = np.zeros((n_states, n_states))
for q in range(n_traj):
for t in range(n_frames - tau):
C[labels[q, t], labels[q, t + tau]] += 1
row_sums = C.sum(axis=1, keepdims=True)
row_sums[row_sums == 0] = 1
T = C / row_sums
evals = np.sort(np.abs(np.real(linalg.eigvals(T))))[::-1]
dt = tau * timestep_ps # lag time in ps
timescales = [-dt / np.log(ev) for ev in evals[1:] if 0 < ev < 1]
Scan τ from 1 to K/4. Implied timescales should plateau — that's your converged estimate.
The two-step tensor S^{2τ} has n_states³ bins. With 40 states = 64K bins.
| Discretization | Direct MSM t₂ | OOM t₂ | Reference |
|---|---|---|---|
| 2 states (αR/αL) | 136 ps (τ=1) | 1,036 ps | 1,400 ps |
| 40 states (k-means) | 1,086 ps (τ=1) | Diverges | 1,400 ps |
Lesson: OOM corrects in the right direction but overshoots with sparse data. Direct MSM with sufficient data and appropriate τ converges to the correct timescale.
# OpenMM setup
integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond, 2*unit.femtoseconds)
# Save every 25 steps (50 fs)
# 2000 saves = 100 ps per trajectory