Name: R Tidymodels — Machine Learning
Author: alexvantwisk

R Tidymodels — Machine Learning

Use when building machine learning models, predictive modeling, or model tuning in R using tidymodels, recipes, workflows, tune, or yardstick. Provides expert guidance on the split-preprocess-model-tune-evaluate pipeline, feature engineering with recipes, hyperparameter tuning, cross-validation, and model performance assessment. Triggers: tidymodels, machine learning, predictive model, recipes, parsnip, workflows, tune, yardstick, cross-validation, hyperparameter, rsample, model tuning, feature engineering, classification, random forest, xgboost, model comparison, train test split, predict. Do NOT use for inferential statistics or hypothesis testing — use r-stats instead. Do NOT use for clinical trial endpoints — use r-clinical instead.

alexvantwisk · April 8, 2026

Professional
Category: Machine Learning

Predictive modeling with the tidymodels ecosystem: split, preprocess, model, tune, evaluate. All code uses the base pipe `|>`, `<-` for assignment, and tidyverse style.
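Here is a minimal sketch of the split, preprocess, model, and evaluate steps in that style. It assumes the built-in `mtcars` data, a plain `lm` engine, and no tuning step. The object names (`car_split`, `car_rec`, and so on) are illustrative and not part of any API.

```r
library(tidymodels)

# Reproducible split: set the seed before any random resampling
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Preprocess: normalize numeric predictors (statistics come from training data only)
car_rec <- recipe(mpg ~ ., data = car_train) |>
  step_normalize(all_numeric_predictors())

# Model: ordinary linear regression through the lm engine
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

# Workflow: bundles recipe and model so the recipe is prepped on car_train only
car_wf <- workflow() |>
  add_recipe(car_rec) |>
  add_model(lm_spec)

car_fit <- fit(car_wf, data = car_train)

# Evaluate on the held-out test rows
car_preds <- predict(car_fit, new_data = car_test) |>
  bind_cols(car_test |> select(mpg))

car_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
car_metrics
```

Putting the recipe inside the workflow means `fit()` estimates the normalization on `car_train` alone. That is what keeps the test rows out of preprocessing.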

Lazy references:

Read references/recipe-steps-catalog.md for the complete step_* function reference
Read references/parsnip-engines.md for the full engine comparison table

Agent dispatch: When the question is about which model family to use (methodology, assumptions, inference), hand off to the r-statistician agent. This skill covers how to implement models with tidymodels, not which model is statistically appropriate.

MCP integration (when an R session is available):

Before specifying a parsnip model engine: call `btw_tool_sessioninfo_is_package_installed` to verify that the engine package is installed
Before writing recipe steps: call `btw_tool_env_describe_data_frame` to inspect predictor types and choose appropriate preprocessing
When unsure of the arguments to a recipe step or tune function: call `btw_tool_docs_help_page` to read the installed documentation
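Without an MCP session, rough base-R stand-ins for the first two checks might look like the sketch below. The engine package name `"ranger"` and the `iris` data are placeholders chosen for illustration. `requireNamespace()` only reports whether the package can be loaded. The type-aware selectors `all_numeric_predictors()` and `all_nominal_predictors()` then send each predictor to a step that suits its type.

```r
library(tidymodels)

# Placeholder engine package; substitute the engine you intend to specify
engine_pkg <- "ranger"
engine_ok <- requireNamespace(engine_pkg, quietly = TRUE)

# Inspect column types (first class of each column) before choosing steps
predictor_types <- vapply(iris, function(x) class(x)[1], character(1))
predictor_types

# Normalize numerics first, then dummy-code factors,
# so the new 0/1 dummy columns are not rescaled
iris_rec <- recipe(Sepal.Length ~ ., data = iris) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors())

iris_prepped <- prep(iris_rec, training = iris)
iris_baked <- bake(iris_prepped, new_data = NULL)  # processed training data
```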

| Trap | Why it fails | Fix |
|---|---|---|
| Leaking test data into preprocessing | Fitting the recipe on the full dataset leaks test information into preprocessing and inflates metrics | Always `prep()` the recipe on training data only; `workflow()` handles this automatically |
| Forgetting `prep()` and `bake()` outside workflows | A recipe object is a blueprint, not transformed data; nothing is estimated or applied until `prep()` and `bake()` run, so the model silently sees untransformed data | Call `prep(rec, training = train)`, then `bake(prepped, new_data = test)`; use `new_data = NULL` to get the processed training set |
| Using `set_mode("regression")` on a classification task | With a numeric 0/1 outcome, the model trains as a regression and predicts numbers, not class labels; classification metrics then fail | Store the outcome as a factor and match `set_mode()` to it; check with `class(train$outcome)` |
| Forgetting `finalize_workflow()` after tuning | `select_best()` returns a tibble of parameter values, not a fitted model; a workflow that still contains `tune()` placeholders cannot be fitted | Always call `finalize_workflow(wf, best_params)` before `last_fit()` or `fit()` |
| Not setting a seed before `vfold_cv()` | Folds are random, so results differ on every run and comparisons become meaningless | Call `set.seed()` before `vfold_cv()`, `initial_split()`, and any other resampling |
| Confusing `fit()` with `fit_resamples()` | `fit()` trains one model on the data you pass it; `fit_resamples()` fits on each fold's analysis set only to estimate performance | Use `fit_resamples()` or `tune_grid()` for evaluation; use `fit()` (or `last_fit()`) only for the final model |
| Building a full ML pipeline when the user asked to tune one model | Scope creep introduces unnecessary recipes, stacks, or workflow sets | Deliver what was requested; suggest pipeline extensions as a follow-up |
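The sketch below applies several of the fixes above together. It assumes `mtcars`, an `rpart` decision tree (rpart ships with R as a recommended package), and an arbitrary three-value grid for `cost_complexity`. It preps a recipe on training data only for manual use, seeds before splitting and folding, and tunes on resamples. It then finalizes the workflow before the single `last_fit()`.

```r
library(tidymodels)

set.seed(42)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Manual use: prep() on training data only, then bake() new data
# with the statistics learned from the training set
car_rec <- recipe(mpg ~ ., data = car_train) |>
  step_normalize(all_numeric_predictors())
car_prepped <- prep(car_rec, training = car_train)
test_baked <- bake(car_prepped, new_data = car_test)

# Tunable model: cost_complexity is left as a tune() placeholder
tree_spec <- decision_tree(cost_complexity = tune(), min_n = 5) |>
  set_engine("rpart") |>
  set_mode("regression")

tree_wf <- workflow() |>
  add_recipe(car_rec) |>
  add_model(tree_spec)

# Seed before creating folds so the resamples are reproducible
set.seed(42)
car_folds <- vfold_cv(car_train, v = 5)

# tune_grid() scores each candidate on every fold; none of these fits is the final model
tree_res <- tune_grid(
  tree_wf,
  resamples = car_folds,
  grid = tibble(cost_complexity = c(0.001, 0.01, 0.1)),
  metrics = metric_set(rmse)
)

best_params <- select_best(tree_res, metric = "rmse")

# Replace the tune() placeholder with the selected value before fitting
final_wf <- finalize_workflow(tree_wf, best_params)

# last_fit(): fit once on the training set, evaluate once on the test set
final_res <- last_fit(final_wf, split = car_split)
collect_metrics(final_res)
```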

Contents:

Core Packages
Data Splitting — Conventions
Feature Engineering (recipes)
Model Specification (parsnip)
Workflows
Hyperparameter Tuning
Evaluation (yardstick)
Model Stacking
Integration with targets
Gotchas
Verification
Examples
Happy Path: Full Split-Recipe-Model-Tune-Evaluate Workflow
Edge Case: Data Leakage from Preprocessing Outside the Recipe