Skill File

Coreml Diag

Name: Coreml Diag
Author: tuliopc23

CoreML diagnostics - model load failures, slow inference, memory issues, compression accuracy loss, compute unit problems, conversion errors.

tuliopc23 · 0 stars · February 4, 2026

Occupation
Category: Scientific Computing

Skill Content

CoreML Diagnostics

Quick Reference

Symptom	First Check	Pattern
Model won't load	Deployment target	1a-1c
Slow first load	Cache miss	2a
Slow inference	Compute units	2b-2c
High memory	Concurrent predictions	3a-3b
Bad accuracy after compression	Granularity	4a-4c
Conversion fails	Operation support	5a-5b

Decision Tree

CoreML issue
├─ Load failure?
│   ├─ "Unsupported model version" → 1a
│   ├─ "Failed to create compute plan" → 1b
│   └─ Other load error → 1c
├─ Performance issue?
│   ├─ First load slow, subsequent fast? → 2a
│   ├─ All predictions slow? → 2b
│   └─ Slow only on specific device? → 2c
├─ Memory issue?
│   ├─ Memory grows during predictions? → 3a
│   └─ Out of memory on load? → 3b
├─ Accuracy degraded?
│   ├─ After palettization? → 4a
│   ├─ After quantization? → 4b
│   └─ After pruning? → 4c
└─ Conversion issue?
    ├─ Operation not supported? → 5a
    └─ Wrong output? → 5b
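
The load-failure branch of the tree above can be sketched as a small lookup, which may be handy for routing logged errors to the right pattern. This is only an illustration: the helper name `triage_load_error` and the substring matching are assumptions, not part of Core ML or coremltools. Only the two quoted error strings and the pattern IDs come from the tree.

```python
# Hypothetical helper: route a load-error message to a pattern ID from the tree.
LOAD_ERROR_PATTERNS = [
    ("unsupported model version", "1a"),
    ("failed to create compute plan", "1b"),
]


def triage_load_error(message: str) -> str:
    """Return the pattern ID that matches a model-load error message."""
    lowered = message.lower()
    for needle, pattern in LOAD_ERROR_PATTERNS:
        if needle in lowered:
            return pattern
    return "1c"  # any other load error falls through to the general pattern


print(triage_load_error("Error: Unsupported model version 9"))
```

Matching on lowercase substrings keeps the sketch tolerant of prefixes and casing differences in real log lines.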

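Pattern 1a - "Unsupported model version"

# Check the model's specification version (it determines the minimum OS)
import coremltools as ct
model = ct.models.MLModel("Model.mlpackage")
print(model.get_spec().specificationVersion)

# Reconvert with a lower deployment target if the device OS is older
mlmodel = ct.convert(
    traced,
    minimum_deployment_target=ct.target.iOS16  # Lower target
)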

Pattern 1b - "Failed to create compute plan"

// Force CPU-only to bypass unsupported GPU/NE operations
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly
let model = try MLModel(contentsOf: url, configuration: config)

# Float16 is often better supported on the GPU and Neural Engine
mlmodel = ct.convert(traced, compute_precision=ct.precision.FLOAT16)

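Pattern 1c - General Load Failures

// Log the full error to see the underlying failure reason
do {
    _ = try MLModel(contentsOf: url, configuration: MLModelConfiguration())
} catch {
    print("Model load failed: \(error)")
}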

Pattern 2a - Slow First Load (Cache Miss)

// Warm cache in background at app launch
Task.detached(priority: .background) {
    _ = try? await MLModel.load(contentsOf: modelURL)
}

Pattern 2b - All Predictions Slow

Cause	Fix
Running on CPU when GPU/NE available	Check `computeUnits` config
Model too large for Neural Engine	Compress model
Frequent CPU↔GPU↔NE transfers	Adjust segmentation
Dynamic shapes recompiling	Use fixed/enumerated shapes

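// MLComputePlan requires iOS 17.4+; per-operation usage is available for ML Programs
let plan = try await MLComputePlan.load(contentsOf: modelURL, configuration: MLModelConfiguration())
if case .program(let program) = plan.modelStructure,
   let main = program.functions["main"] {
    for op in main.block.operations {
        let device = plan.deviceUsage(for: op)?.preferred
        print("\(op.operatorName): \(String(describing: device))")
    }
}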

Pattern 2c - Slow on Specific Device

// Check available compute devices (iOS 17+)
let devices = MLModel.availableComputeDevices
print(devices)  // The list differs per device

Scenario	Cause	Fix
Fast on M-series Mac, slow on iPhone	Model optimized for GPU	Use palettization (Neural Engine)
Fast on iPhone, slow on Intel Mac	No Neural Engine	Use quantization (GPU)
Slow on older devices	Less compute power	Use more aggressive compression

Pattern 3a - Memory Grows During Predictions

Instruments → Allocations + Core ML template
Look for: Many concurrent prediction intervals
Check: MLMultiArray allocations growing

actor PredictionLimiter {
    private let maxConcurrent = 2
    private var inFlight = 0

    func predict(_ model: MLModel, input: MLFeatureProvider) async throws -> MLFeatureProvider {
        while inFlight >= maxConcurrent {
            await Task.yield()
        }
        inFlight += 1
        defer { inFlight -= 1 }
        return try await model.prediction(from: input)
    }
}
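
The same bounded-concurrency idea can be sketched in Python for readers who test prediction pipelines off-device. This is an analog of the Swift actor above, not a Core ML API: `run_limited` and `fake_prediction` are hypothetical names, and the sleep stands in for a real prediction call. It uses a semaphore, so waiting callers suspend rather than spin in a yield loop.

```python
import asyncio

MAX_CONCURRENT = 2  # same limit as the Swift actor above


async def run_limited(jobs, limit=MAX_CONCURRENT):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:  # suspends until a slot is free
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_prediction():
    # Stand-in for a real prediction call; sleeps to simulate work.
    await asyncio.sleep(0.01)
    return "ok"


results, peak = asyncio.run(run_limited([fake_prediction] * 6))
print(len(results), peak)
```

Tracking `peak` makes it easy to confirm that the limit holds, which is the property that keeps intermediate buffers from piling up.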

Pattern 3b - Out of Memory on Load

# Check model size
ls -lh Model.mlpackage/Data/com.apple.CoreML/weights/

Pattern 4a - Bad Accuracy After Palettization

# Step 1: Try grouped channels (iOS 18+)
from coremltools.optimize.coreml import OpPalettizerConfig

config = OpPalettizerConfig(
    nbits=4,
    granularity="per_grouped_channel",
    group_size=16
)

# Step 2: If still bad, try more bits
config = OpPalettizerConfig(nbits=6, ...)

# Step 3: If 4-bit is still required, use training-time palettization (DKM)
from coremltools.optimize.torch.palettization import DKMPalettizer
# ... training-time compression

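Pattern 4b - Bad Accuracy After Quantization

# Step 1: Use per-block (iOS 18+)
from coremltools.optimize.coreml import OpLinearQuantizerConfig

config = OpLinearQuantizerConfig(
    dtype="int4",
    granularity="per_block",
    block_size=32
)

# Step 2: Use calibration data (layerwise compression on the PyTorch model)
from coremltools.optimize.torch.layerwise_compression import LayerwiseCompressor
# Note: `config` here must be a LayerwiseCompressorConfig, not the OpLinearQuantizerConfig above
compressor = LayerwiseCompressor(model, config)
quantized = compressor.compress(calibration_loader, device="cpu")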

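Pattern 4c - Bad Accuracy After Pruning

# Use calibration-based pruning (SparseGPT via layerwise compression)
# Class and parameter names below should be verified against your coremltools version
from coremltools.optimize.torch.layerwise_compression import (
    LayerwiseCompressor,
    LayerwiseCompressorConfig,
    ModuleSparseGPTConfig,
)

config = LayerwiseCompressorConfig(
    global_config=ModuleSparseGPTConfig(target_sparsity=0.4),
    calibration_nsamples=128,
)
compressor = LayerwiseCompressor(model, config)
sparse = compressor.compress(calibration_loader, device="cpu")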

Pattern 5a - Operation Not Supported

Error: "Op 'custom_op' is not supported for conversion"

pip install --upgrade coremltools

# Instead of custom_op(x)
# Use: supported_op1(supported_op2(x))

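from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def custom_op(context, node):
    # Map to MIL operations built with `mb`, then register the result via context.add(...)
    ...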

Pattern 5b - Conversion Succeeds but Wrong Output

# PyTorch often uses ImageNet normalization
# CoreML may need explicit preprocessing

# Check shapes in conversion
ct.convert(..., inputs=[ct.ImageType(shape=(1, 3, 224, 224))])

# Force Float32 to match PyTorch
ct.convert(..., compute_precision=ct.precision.FLOAT32)

# Ensure eval mode
model.eval()

# Compare final outputs (the "input"/"output" feature names must match your model's)
import numpy as np

torch_output = model(input).detach().numpy()
coreml_output = mlmodel.predict({"input": input.numpy()})["output"]

print(f"Max diff: {np.max(np.abs(torch_output - coreml_output))}")
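
A raw maximum difference is hard to judge on its own, so it can help to check it against an explicit tolerance. The sketch below is dependency-free and works on flat sequences (for arrays, something like `arr.ravel().tolist()` would be needed first). The name `compare_outputs` and the default tolerances are placeholder assumptions, not recommended thresholds, and the sample values are illustrative rather than real model outputs.

```python
def compare_outputs(reference, candidate, atol=1e-3, rtol=1e-3):
    """Return (max_abs_diff, all_close) for two equal-length flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    max_diff = 0.0
    all_close = True
    for ref, cand in zip(reference, candidate):
        diff = abs(ref - cand)
        max_diff = max(max_diff, diff)
        # Element passes if it is within absolute plus relative tolerance
        if diff > atol + rtol * abs(ref):
            all_close = False
    return max_diff, all_close


# Illustrative values only, not real model outputs.
max_diff, ok = compare_outputs([0.10, 0.70, 0.20], [0.1002, 0.6999, 0.2001])
print(f"Max diff: {max_diff:.4f}, within tolerance: {ok}")
```

With Float16 compute precision, small differences are expected, so the tolerances usually need loosening compared with a Float32 conversion.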

Spec Version	Minimum iOS
4	iOS 13
5	iOS 14
6	iOS 15
7	iOS 16
8	iOS 17
9	iOS 18
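
The table above can be turned into a quick compatibility check to run against the value printed by `specificationVersion`. This is a minimal sketch: the dictionary is copied from the table, while the function name `loads_on` is a hypothetical helper, not a coremltools API.

```python
# Values copied from the table above: spec version -> minimum iOS major version.
MIN_IOS_BY_SPEC_VERSION = {4: 13, 5: 14, 6: 15, 7: 16, 8: 17, 9: 18}


def loads_on(spec_version: int, device_ios_major: int) -> bool:
    """True if a model with this spec version should load on the given iOS major version."""
    required = MIN_IOS_BY_SPEC_VERSION.get(spec_version)
    if required is None:
        raise ValueError(f"spec version {spec_version} is not in the table")
    return device_ios_major >= required


print(loads_on(8, 16))  # spec 8 needs iOS 17, so this is False
```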

Approach	Compression (vs. Float16 weights)	Memory Impact
8-bit palettization	2x smaller	2x less memory
4-bit palettization	4x smaller	4x less memory
Pruning (50%)	~2x smaller	~2x less memory
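
The ratios in the table follow from bits per weight, which a back-of-the-envelope calculation makes concrete. The sketch below assumes a Float16 baseline and ignores lookup tables, scales, and sparse-index overhead, so real files will be somewhat larger. The function name and the 100M-parameter model are hypothetical.

```python
def estimated_weight_mb(num_params: int, bits_per_weight: float, sparsity: float = 0.0) -> float:
    """Rough weight size in MB (10**6 bytes), ignoring compression metadata overhead."""
    kept = num_params * (1.0 - sparsity)  # pruned weights are assumed not stored
    return kept * bits_per_weight / 8 / 1e6


params = 100_000_000  # hypothetical 100M-parameter model
baseline = estimated_weight_mb(params, 16)
scenarios = [
    ("Float16", 16, 0.0),
    ("8-bit palettization", 8, 0.0),
    ("4-bit palettization", 4, 0.0),
    ("50% pruning", 16, 0.5),
]
for label, bits, sparsity in scenarios:
    size = estimated_weight_mb(params, bits, sparsity)
    print(f"{label}: {size:.0f} MB ({baseline / size:.0f}x smaller)")
```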

Coreml Diag

CoreML Diagnostics

Quick Reference

Decision Tree

Pattern 1a - "Unsupported model version"

Pattern 1b - "Failed to create compute plan"

Pattern 1c - General Load Failures

Pattern 2a - Slow First Load (Cache Miss)

Pattern 2b - All Predictions Slow

Pattern 2c - Slow on Specific Device

Pattern 3a - Memory Grows During Predictions

Pattern 3b - Out of Memory on Load

Pattern 4a - Bad Accuracy After Palettization

Pattern 4b - Bad Accuracy After Quantization

Pattern 4c - Bad Accuracy After Pruning

Pattern 5a - Operation Not Supported

Pattern 5b - Conversion Succeeds but Wrong Output

Pressure Scenario - "Model works on simulator but not device"

Pressure Scenario - "Ship now, optimize later"

Diagnostic Checklist

Resources
