Ops Platform Onboarding

Name: Ops Platform Onboarding
Author: LerianStudio

Structured workflow for onboarding new services to the internal platform including infrastructure provisioning, observability setup, and documentation.

LerianStudio | 174 stars | 2026/03/08

Category: Monitoring

Skill Content

Platform Onboarding Workflow

This skill defines the structured process for onboarding services to the internal developer platform. Use it to ensure consistent, compliant service deployments.

Onboarding Phases

| Phase | Focus | Output |
|-------|-------|--------|
| 1. Requirements | Gather service requirements | Requirements doc |
| 2. Golden Path Selection | Choose deployment pattern | Selected template |
| 3. Infrastructure Provisioning | Create service resources | Infrastructure ready |
| 4. Observability Setup | Configure monitoring | Dashboards/alerts |
| 5. Security Configuration | Apply security controls | Security validated |
| 6. Documentation | | |

## Service Onboarding Request

**Service Name:** [name]
**Team:** [owning team]
**Requested By:** [name]
**Target Date:** YYYY-MM-DD

### Service Information

| Attribute | Value |
|-----------|-------|
| Service type | [API / Worker / Batch / Frontend] |
| Language/runtime | [Go / Node.js / Python / etc.] |
| Criticality | [Tier 1/2/3/4] |
| External traffic | [Yes / No] |
| Data sensitivity | [PII / Financial / Public] |

### Resource Requirements

| Resource | Requirement | Notes |
|----------|-------------|-------|
| CPU | [cores] | [peak/average] |
| Memory | [GB] | [peak/average] |
| Storage | [GB] | [type: SSD/HDD] |
| Database | [type] | [shared/dedicated] |
| Cache | [type] | [shared/dedicated] |

### Dependencies

| Dependency | Type | SLA Required |
|------------|------|--------------|
| [service] | Internal | [Yes/No] |
| [external] | External | [Yes/No] |

### Compliance Requirements

- [ ] SOC2
- [ ] PCI-DSS
- [ ] GDPR
- [ ] HIPAA
- [ ] Other: ____________

Available Golden Paths

| Golden Path | Use Case | Includes |
|-------------|----------|----------|
| api-service | REST/GraphQL APIs | ALB, EKS, RDS, ElastiCache |
| worker-service | Background processing | SQS, EKS, auto-scaling |
| batch-job | Scheduled jobs | EventBridge, Lambda/Fargate |
| frontend-app | Static sites, SPAs | CloudFront, S3, API Gateway |
| data-pipeline | ETL, streaming | Kinesis, Glue, S3 |

## Golden Path Selection

**Service:** [name]
**Selected Path:** [api-service / worker-service / etc.]

### Rationale

1. Service type [X] matches [golden path] pattern
2. Traffic requirements of [X] supported by [features]
3. Compliance requirements met by built-in [controls]

### Customizations Required

| Standard Component | Customization | Reason |
|--------------------|---------------|--------|
| [component] | [change] | [why] |

### Approval

- [ ] Platform team reviewed
- [ ] Security team reviewed (if customizations)
- [ ] Architecture team reviewed (if non-standard)

# Example service provisioning
module "service" {
  source = "platform/service-template"

  service_name    = var.service_name
  team            = var.team
  environment     = var.environment
  golden_path     = "api-service"

  # Compute
  cpu_request     = "500m"
  memory_request  = "512Mi"
  replicas_min    = 2
  replicas_max    = 10

  # Database
  database_enabled = true
  database_class   = "db.t3.medium"

  # Tags
  tags = {
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  }
}

# Verify namespace
kubectl get namespace [service-name]

# Verify compute
kubectl get deployment -n [service-name]

# Verify database
aws rds describe-db-instances --db-instance-identifier [service-db]

# Verify DNS
dig [service-name].internal.example.com
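
The verification commands above can also be scripted so that every onboarding runs the same checks. The following is a minimal sketch, not part of the original skill: the names `Check`, `verification_checks`, and `run_check` are hypothetical, and the `needs_output` flag is an assumption added because `kubectl get deployment` and `dig` can exit with status 0 even when they find nothing.

```python
import subprocess
from typing import List, NamedTuple


class Check(NamedTuple):
    """One provisioning check (hypothetical helper type, not from the skill)."""
    label: str
    argv: List[str]
    needs_output: bool  # True when exit status 0 alone does not prove success


def verification_checks(service_name: str, db_identifier: str,
                        domain: str = "internal.example.com") -> List[Check]:
    """Build the four checks from the verification step for one service."""
    return [
        # Exits non-zero when the namespace does not exist.
        Check("namespace", ["kubectl", "get", "namespace", service_name], False),
        # Exits 0 with empty stdout when the namespace has no deployments.
        Check("compute", ["kubectl", "get", "deployment", "-n", service_name], True),
        # Exits non-zero when the DB instance is not found.
        Check("database", ["aws", "rds", "describe-db-instances",
                           "--db-instance-identifier", db_identifier], False),
        # +short prints only the answer; empty output means no record resolved.
        Check("dns", ["dig", "+short", f"{service_name}.{domain}"], True),
    ]


def run_check(check: Check) -> bool:
    """Run one check and report whether it passed."""
    try:
        result = subprocess.run(check.argv, capture_output=True,
                                text=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # tool not installed, or the call hung
    if result.returncode != 0:
        return False
    return bool(result.stdout.strip()) if check.needs_output else True
```

To use it, call `run_check` on each item returned by `verification_checks("payments", "payments-db")`, where both names are placeholders for the real service and database identifiers.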

Alert Configuration

| Alert | Condition | Severity | Response |
|-------|-----------|----------|----------|
| High error rate | 5xx > 1% for 5m | Critical | Page on-call |
| High latency | p99 > 1s for 5m | Warning | Alert team |
| Low availability | uptime < 99.9% | Critical | Page on-call |
| Resource saturation | CPU > 85% for 10m | Warning | Alert team |
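
To make the "for 5m" and "for 10m" conditions concrete, here is a minimal sketch of how a sustained-threshold rule could be evaluated. It is illustrative only and is not the configuration format of any particular alerting system. `AlertRule`, `RULES`, and `is_firing` are hypothetical names, and the sketch assumes one metric sample per minute. The availability alert has no duration in the table, so it is treated here as a single-sample check.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AlertRule:
    name: str
    breaches: Callable[[float, float], bool]  # comparison against the threshold
    threshold: float
    for_minutes: int  # how long the condition must hold (assumed >= 1)
    severity: str
    response: str


# The four alerts from the table; units are percent, seconds, percent, percent.
RULES = [
    AlertRule("High error rate", operator.gt, 1.0, 5, "Critical", "Page on-call"),
    AlertRule("High latency", operator.gt, 1.0, 5, "Warning", "Alert team"),
    AlertRule("Low availability", operator.lt, 99.9, 1, "Critical", "Page on-call"),
    AlertRule("Resource saturation", operator.gt, 85.0, 10, "Warning", "Alert team"),
]


def is_firing(rule: AlertRule, samples: Sequence[float]) -> bool:
    """True if the most recent `for_minutes` samples all breach the threshold.

    `samples` holds one value per minute, oldest first (an assumption).
    """
    window = samples[-rule.for_minutes:]
    if len(window) < rule.for_minutes:
        return False  # not enough history to say the condition was sustained
    return all(rule.breaches(value, rule.threshold) for value in window)
```

A single spike does not fire the rule: `is_firing(RULES[0], [2, 2, 2, 2, 0.5])` is false because the latest sample is back under 1%.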

## Service Level Objectives

**Service:** [name]
**SLO Version:** 1.0

| SLI | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Successful requests / total requests |
| Latency | p99 < 500ms | Request duration percentile |
| Error rate | < 0.1% | 5xx responses / total responses |

### Error Budget

- Monthly budget: 43.2 minutes of downtime (99.9% availability over a 30-day month)
- Current consumption: [X]%
- Actions if budget exceeded: [escalation process]
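
The 43.2-minute figure follows from the 99.9% availability target applied to a 30-day month: 30 × 24 × 60 = 43,200 minutes, of which 0.1% may be lost. A small sketch of that arithmetic, with hypothetical function names and the 30-day window as an explicit assumption:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target over a window.

    slo_target is a fraction (0.999 for 99.9%); window_days defaults to a
    30-day month, which is the assumption behind the 43.2-minute figure.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_consumed_percent(downtime_minutes: float, slo_target: float,
                            window_days: int = 30) -> float:
    """Share of the error budget already used, as a percentage."""
    return 100.0 * downtime_minutes / error_budget_minutes(slo_target, window_days)


print(round(error_budget_minutes(0.999), 1))           # 43.2
print(round(budget_consumed_percent(10.8, 0.999), 1))  # 25.0
```

The second call gives the value for the "Current consumption" line: 10.8 minutes of downtime uses a quarter of the monthly budget.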

apiVersion: networking.k8s.io/v1

Dashboard Template

| Panel | Metrics |
|-------|---------|
| Request rate | requests/sec, by status code |
| Error rate | 5xx rate, 4xx rate |
| Latency | p50, p95, p99 |
| Saturation | CPU, memory utilization |
| Dependencies | Upstream/downstream health |

Ops Platform Onboarding

Platform Onboarding Workflow

Onboarding Phases

Phase 1: Requirements Gathering

Service Requirements Checklist

Phase 2: Golden Path Selection

Available Golden Paths

Golden Path Selection Matrix

Selection Template

Phase 3: Infrastructure Provisioning

Provisioning Checklist

Terraform/IaC Template

Provisioning Verification

Phase 4: Observability Setup

Observability Checklist

Dashboard Template

Alert Configuration

SLI/SLO Definition

Phase 5: Security Configuration

Security Checklist

Network Policy Template

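Golden Path Selection Matrix

| Requirement | api-service | worker-service | batch-job |
|-------------|-------------|----------------|-----------|
| HTTP traffic | Yes | No | No |
| Queue processing | Optional | Yes | Optional |
| Scheduled runs | No | No | Yes |
| Real-time | Yes | Near-real-time | No |
| Auto-scaling | Yes | Yes | N/A |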
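
The matrix can be read as a lookup: a golden path is a candidate when it supports every requirement the service has. The sketch below encodes the matrix as data and filters on it. The names `SELECTION_MATRIX`, `SUPPORTED`, and `candidate_paths` are hypothetical, and treating only "Yes" and "Optional" as support is a simplifying assumption, so "Near-real-time" does not satisfy a strict real-time need. The frontend-app and data-pipeline paths are left out because the matrix does not cover them.

```python
from typing import Dict, Iterable, List

# Rows of the selection matrix, keyed by golden path and then by requirement.
SELECTION_MATRIX: Dict[str, Dict[str, str]] = {
    "api-service": {
        "http_traffic": "Yes", "queue_processing": "Optional",
        "scheduled_runs": "No", "real_time": "Yes", "auto_scaling": "Yes",
    },
    "worker-service": {
        "http_traffic": "No", "queue_processing": "Yes",
        "scheduled_runs": "No", "real_time": "Near-real-time", "auto_scaling": "Yes",
    },
    "batch-job": {
        "http_traffic": "No", "queue_processing": "Optional",
        "scheduled_runs": "Yes", "real_time": "No", "auto_scaling": "N/A",
    },
}

# Assumption: only these cell values count as "the path supports this".
SUPPORTED = {"Yes", "Optional"}


def candidate_paths(needs: Iterable[str]) -> List[str]:
    """Return the golden paths that support every requirement in `needs`."""
    needs = list(needs)
    known = SELECTION_MATRIX["api-service"].keys()
    unknown = [need for need in needs if need not in known]
    if unknown:
        raise KeyError(f"requirements not in the matrix: {unknown}")
    return [
        path for path, row in SELECTION_MATRIX.items()
        if all(row[need] in SUPPORTED for need in needs)
    ]
```

For example, `candidate_paths(["http_traffic"])` leaves only `api-service`, while `candidate_paths(["http_traffic", "scheduled_runs"])` returns an empty list. An empty result suggests the service needs the customization and review steps from the selection template rather than a standard path.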