Upgrade CoreWeave deployments and migrate between GPU types. Use when migrating from A100 to H100, upgrading CUDA versions, or updating inference server versions. Trigger with phrases like "upgrade coreweave", "coreweave gpu migration", "coreweave cuda upgrade", "migrate coreweave".
CoreWeave is a GPU-specialized cloud provider running Kubernetes-native infrastructure. Migrations involve upgrading between GPU instance types (A100 to H100), updating CUDA driver versions, and handling Kubernetes API version changes across namespaces. Tracking API versions is critical because CoreWeave's instance type labels and resource quotas change between platform releases, and deploying to a deprecated instance class will cause scheduling failures.
import { KubeConfig, CoreV1Api } from "@kubernetes/client-node";
async function detectCoreWeaveVersion(): Promise<void> {
const kc = new KubeConfig();
kc.loadFromDefault();
const k8sApi = kc.makeApiClient(CoreV1Api);
// Check current namespace GPU allocations
const pods = await k8sApi.listNamespacedPod("my-namespace");
for (const pod of pods.body.items) {
const gpuClass = pod.spec?.nodeSelector?.["gpu.nvidia.com/class"];
const cudaVersion = pod.metadata?.labels?.["cuda-version"];
console.log(`Pod ${pod.metadata?.name}: GPU=${gpuClass}, CUDA=${cudaVersion}`);
}
// Detect deprecated instance types
const deprecated = ["A100_PCIE_40GB", "V100_PCIE_16GB", "RTX_A5000"];
const activeGpus = pods.body.items
.map((p) => p.spec?.nodeSelector?.["gpu.nvidia.com/class"])
.filter(Boolean);
const stale = activeGpus.filter((g) => deprecated.includes(g!));
if (stale.length > 0) console.warn(`Deprecated GPU types in use: ${stale.join(", ")}`);
}
gpu.nvidia.com/class node selectorsnvidia.com/gpu) for new instance memoryapps/v1 changes)// CoreWeave instance type labels changed in 2025 platform update
// Old: gpu.nvidia.com/class: "A100_PCIE_80GB"
// New: gpu.nvidia.com/class: "H100_SXM5_80GB"
interface DeploymentMigration {
oldSelector: Record<string, string>;
newSelector: Record<string, string>;
cudaMinVersion: string;
}
const GPU_MIGRATIONS: DeploymentMigration[] = [
{
oldSelector: { "gpu.nvidia.com/class": "A100_PCIE_80GB" },
newSelector: { "gpu.nvidia.com/class": "H100_SXM5_80GB" },
cudaMinVersion: "12.4",
},
{
oldSelector: { "gpu.nvidia.com/class": "A100_SXM4_80GB" },
newSelector: { "gpu.nvidia.com/class": "H100_SXM5_80GB" },
cudaMinVersion: "12.4",
},
];
function migrateNodeSelector(manifest: any, migration: DeploymentMigration): any {
const selector = manifest.spec?.template?.spec?.nodeSelector;
if (!selector) return manifest;
for (const [key, oldVal] of Object.entries(migration.oldSelector)) {
if (selector[key] === oldVal) {
selector[key] = migration.newSelector[key];
}
}
return manifest;
}
import { AppsV1Api, KubeConfig } from "@kubernetes/client-node";
async function rollbackDeployment(namespace: string, name: string): Promise<void> {
const kc = new KubeConfig();
kc.loadFromDefault();
const appsApi = kc.makeApiClient(AppsV1Api);
// Kubernetes rollout undo — reverts to previous revision
const deployment = await appsApi.readNamespacedDeployment(name, namespace);
const currentRevision = deployment.body.metadata?.annotations?.["deployment.kubernetes.io/revision"];
console.log(`Rolling back ${name} from revision ${currentRevision}`);
// Patch to trigger rollback via revision annotation
await appsApi.patchNamespacedDeployment(name, namespace, {
spec: { template: { metadata: { annotations: { "kubectl.kubernetes.io/restartedAt": new Date().toISOString() } } } },
}, undefined, undefined, undefined, undefined, undefined, { headers: { "Content-Type": "application/strategic-merge-patch+json" } });
console.log(`Rollback initiated for ${name} in ${namespace}`);
}
| Migration Issue | Symptom | Fix |
|---|---|---|
| GPU class not schedulable | Pod stuck in Pending with Insufficient nvidia.com/gpu | Verify instance type exists in target region; check quota |
| CUDA version mismatch | Container crashes with CUDA driver version is insufficient | Rebuild container with CUDA matching target GPU driver |
| Namespace quota exceeded | Forbidden: exceeded quota on deployment | Request quota increase for new instance type via CoreWeave dashboard |
| PVC migration failure | VolumeAttachment timeout on new node | Detach old PVC, recreate in target availability zone |
| API version deprecated | no matches for kind "Deployment" in version "extensions/v1beta1" | Update manifest to apps/v1 and adjust spec fields |
For CI/CD pipeline integration, see coreweave-ci-integration.