Machine Learning Engineering Standards
This skill covers the end-to-end lifecycle of building and deploying ML models, from simple regressions to complex neural networks.
1. Problem Framing
- Supervised vs Unsupervised: Do we have labeled data (e.g., patient outcomes)?
- Regression vs Classification: Are we predicting a number (blood sugar level) or a category (Risk: High/Low)?
- Baseline: Always establish a dumb heuristic baseline (e.g., "Predict the average") before training a model. If your model doesn't beat the baseline on held-out data, it's useless.
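A minimal sketch of the baseline check for a regression task, using only the Python standard library. The function names (`mean_baseline`, `beats_baseline`), the choice of mean absolute error as the metric, and the sample blood sugar values are illustrative assumptions, not part of this standard.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_baseline(y_train):
    """Return a predictor that always outputs the training-set average."""
    avg = mean(y_train)
    return lambda rows: [avg] * len(rows)


def beats_baseline(model_mae, baseline_mae):
    """A model is only worth keeping if its error is strictly lower."""
    return model_mae < baseline_mae


# Hypothetical blood sugar readings (illustrative values only).
y_train = [90.0, 110.0, 100.0, 120.0]
y_test = [95.0, 105.0, 115.0]

predict = mean_baseline(y_train)
baseline_mae = mean_absolute_error(y_test, predict(y_test))
print(f"Baseline MAE: {baseline_mae:.2f}")
```

Any trained model would then be scored with the same metric on the same test set and compared via `beats_baseline`. For classification, the analogous baseline is usually "predict the most frequent class".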
2. Data Engineering (Feature Store)
- Garbage In, Garbage Out: As a rule of thumb, 80% of ML work is data cleaning. Model quality is capped by data quality.
- Normalization: Scale inputs (to 0-1 or -1 to 1). Neural networks often train slowly or fail to converge on unscaled data.
- Categorical Encoding: One-Hot Encoding (low-cardinality features) vs Embeddings (high-cardinality features).
- Data Splitting: STRICT separation of Train / Validation / Test sets to avoid data leakage.
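The data engineering points above can be sketched as follows, again with only the standard library. The helper names (`train_val_test_split`, `MinMaxScaler`, `one_hot`), the split fractions, and the toy data are assumptions made for illustration. The key idea is that scaling statistics are computed from the training set alone and then reused for validation and test data, so no information leaks across the split.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into three disjoint partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


class MinMaxScaler:
    """Scale values to 0-1 using statistics from the data passed to fit()."""

    def fit(self, values):
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        if span == 0:
            return [0.0 for _ in values]
        return [(v - self.lo) / span for v in values]


def one_hot(value, categories):
    """Encode a category as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


# Toy numeric feature (illustrative values only).
values = [float(v) for v in range(20)]
train, val, test = train_val_test_split(values, val_frac=0.2, test_frac=0.2, seed=42)

# Fit on the training set ONLY, then apply the same transform everywhere.
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)    # may fall slightly outside 0-1
test_scaled = scaler.transform(test)  # may fall slightly outside 0-1

risk_vector = one_hot("High", ["Low", "High"])
```

Validation and test values can land outside the 0-1 range because the scaler never saw them. That is expected, and it is the price of keeping the split leak-free.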