Merge multiple fine-tuned LLM checkpoints using mergekit with methods like linear interpolation, SLERP, TIES, DARE, task arithmetic, and frankenmerging. Use when combining specialized model capabilities without retraining — e.g., merging a code model with a chat model. Do not use for training, LoRA adapter composition, or inference serving.
Combine multiple fine-tuned model checkpoints into a single model using weight-space merging techniques via mergekit, selecting the right merge strategy and validating results with benchmark evaluation.
Use this skill when:
- Combining the capabilities of multiple fine-tunes of a shared base model without retraining
- Computing task vectors (`fine_tuned - base`) and adding or subtracting them

Do not use when the goal is training itself — route to fine-tuning or pretraining-pipeline — or when the question is about model internals, which belongs to model-architecture.

Merge methods:

- **Linear** (`merge_method: linear`): Weighted average of parameters. Simple, and works well for similar models. Config: `weight: 0.6` per model.
- **SLERP** (`merge_method: slerp`): Spherical interpolation between exactly two models. Smoother blending, better for dissimilar fine-tunes. Set `t: 0.5` for an equal blend.
- **TIES** (`merge_method: ties`): Trim small deltas, elect a sign by majority, then merge. Best when models have conflicting parameter updates. Set `density: 0.5` to keep the top 50% of delta magnitudes.
- **DARE** (`merge_method: dare_ties`): Randomly drop delta elements and rescale the survivors. Effective for merging many models. Set `density: 0.3` for aggressive sparsification.
- **Task arithmetic**: `task_vector = fine_tuned_weights - base_weights`, then `merged = base + α * task_vector_A + β * task_vector_B`.
- **Frankenmerging** (`merge_method: passthrough`): Interleave layers from different models to create a deeper model. E.g., take layers 0-15 from model A and layers 8-23 from model B to make a 32-layer hybrid.

Workflow: verify that the source models are compatible by comparing `model.config.to_dict()` across them, then write a YAML config with `merge_method`, `slices` (layer ranges), `models` with paths and weights, and `parameters` (`density`, `t`). Run `mergekit-yaml config.yaml ./output_dir --cuda --trust-remote-code`.

Outputs:

- **Merge Config** — complete mergekit YAML with method, models, weights, slice ranges, and parameters
- **Source Model Inventory** — list of source models with their base, fine-tune domain, and architecture hash
- **Evaluation Comparison** — table of benchmark scores: each source model vs. the merged model
- **Method Rationale** — why the chosen merge method suits these specific models

Related skills:

- model-architecture — to verify source models share a compatible architecture
- fine-tuning — the upstream process that produces models to be merged
- safety-alignment — merged models may lose alignment; re-evaluate safety post-merge
- serving-architecture — for deploying the merged checkpoint

Edge cases:

- **Architecture mismatch**: if source configs differ in critical fields (e.g., `hidden_size`, `num_layers`, or `vocab_size`), abort and report which fields differ.
- **Out of memory**: use the `--lazy-unpickle` and `--low-cpu-memory` flags, or merge on CPU by omitting `--cuda`.
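A minimal config for the workflow above might look like the following SLERP sketch; `org/code-model` and `org/chat-model` are placeholder Hugging Face model IDs, and the 32-layer range assumes both models share that depth:

```yaml
# Equal-weight SLERP blend of two fine-tunes of the same base.
merge_method: slerp
base_model: org/code-model        # placeholder model ID
slices:
  - sources:
      - model: org/code-model
        layer_range: [0, 32]
      - model: org/chat-model     # placeholder model ID
        layer_range: [0, 32]
parameters:
  t: 0.5                          # 0.0 = all code-model, 1.0 = all chat-model
dtype: bfloat16
```

Saved as `config.yaml`, this is what `mergekit-yaml config.yaml ./output_dir` consumes.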
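The task-arithmetic formula above can be sketched in a few lines. This toy version uses plain dicts of floats in place of tensor state dicts — the arithmetic is identical, and `task_arithmetic` is a hypothetical helper name, not a mergekit API:

```python
def task_arithmetic(base, fine_tuned_models, alphas):
    """Merge via task vectors: merged = base + sum(alpha_i * (ft_i - base)).

    base and each entry of fine_tuned_models map parameter names to
    values; alphas are the per-model scaling coefficients.
    """
    merged = {}
    for name, base_w in base.items():
        # Each fine-tune contributes its scaled delta from the base.
        delta = sum(alpha * (ft[name] - base_w)
                    for ft, alpha in zip(fine_tuned_models, alphas))
        merged[name] = base_w + delta
    return merged
```

With real checkpoints the same loop runs per-tensor over `state_dict()` entries instead of floats.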
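The architecture-mismatch check can be automated with a small helper over the `model.config.to_dict()` output. A sketch, assuming the Llama-style key names (exact field names vary by architecture); `incompatible_fields` is a hypothetical helper, not part of mergekit:

```python
def incompatible_fields(cfg_a, cfg_b,
                        keys=("hidden_size", "num_hidden_layers", "vocab_size")):
    """Return {field: (value_a, value_b)} for config fields that differ.

    cfg_a / cfg_b are plain dicts, e.g. from model.config.to_dict().
    An empty result means the checked fields match and the merge can proceed;
    otherwise abort and report the returned fields.
    """
    return {k: (cfg_a.get(k), cfg_b.get(k))
            for k in keys if cfg_a.get(k) != cfg_b.get(k)}
```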