Structured workflow for onboarding new services to the internal platform, covering infrastructure provisioning, observability setup, security configuration, documentation, and handoff.
This skill defines the structured process for onboarding services to the internal developer platform. Use it to ensure consistent, compliant service deployments.
| Phase | Focus | Output |
|---|---|---|
| 1. Requirements | Gather service requirements | Requirements doc |
| 2. Golden Path Selection | Choose deployment pattern | Selected template |
| 3. Infrastructure Provisioning | Create service resources | Infrastructure ready |
| 4. Observability Setup | Configure monitoring | Dashboards/alerts |
| 5. Security Configuration | Apply security controls | Security validated |
| 6. Documentation | Complete service docs | Runbook ready |
| 7. Handoff | Transfer to service team | Ownership confirmed |
## Service Onboarding Request
**Service Name:** [name]
**Team:** [owning team]
**Requested By:** [name]
**Target Date:** YYYY-MM-DD
### Service Information
| Attribute | Value |
|-----------|-------|
| Service type | [API / Worker / Batch / Frontend] |
| Language/runtime | [Go / Node.js / Python / etc.] |
| Criticality | [Tier 1/2/3/4] |
| External traffic | [Yes / No] |
| Data sensitivity | [PII / Financial / Public] |
### Resource Requirements
| Resource | Requirement | Notes |
|----------|-------------|-------|
| CPU | [cores] | [peak/average] |
| Memory | [GB] | [peak/average] |
| Storage | [GB] | [type: SSD/HDD] |
| Database | [type] | [shared/dedicated] |
| Cache | [type] | [shared/dedicated] |
### Dependencies
| Dependency | Type | SLA Required |
|------------|------|--------------|
| [service] | Internal | [Yes/No] |
| [external] | External | [Yes/No] |
### Compliance Requirements
- [ ] SOC2
- [ ] PCI-DSS
- [ ] GDPR
- [ ] HIPAA
- [ ] Other: ____________
| Golden Path | Use Case | Includes |
|---|---|---|
| api-service | REST/GraphQL APIs | ALB, EKS, RDS, ElastiCache |
| worker-service | Background processing | SQS, EKS, auto-scaling |
| batch-job | Scheduled jobs | EventBridge, Lambda/Fargate |
| frontend-app | Static sites, SPAs | CloudFront, S3, API Gateway |
| data-pipeline | ETL, streaming | Kinesis, Glue, S3 |
| Requirement | api-service | worker-service | batch-job |
|---|---|---|---|
| HTTP traffic | Yes | No | No |
| Queue processing | Optional | Yes | Optional |
| Scheduled runs | No | No | Yes |
| Real-time | Yes | Near-real-time | No |
| Auto-scaling | Yes | Yes | N/A |
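The decision matrix above can be read mechanically. The following is a minimal sketch, not platform tooling: it encodes only the first three rows (HTTP traffic, queue processing, scheduled runs), and the names `Requirements`, `MATRIX`, and `candidate_paths` are hypothetical. A path is treated as viable when none of the needed capabilities is marked "No", and paths with more built-in ("Yes") matches are listed first.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Requirements:
    """Capabilities the new service needs (hypothetical field names)."""
    http_traffic: bool
    queue_processing: bool
    scheduled_runs: bool


# First three rows of the decision matrix, transcribed per golden path.
MATRIX = {
    "api-service": {
        "http_traffic": "yes", "queue_processing": "optional", "scheduled_runs": "no",
    },
    "worker-service": {
        "http_traffic": "no", "queue_processing": "yes", "scheduled_runs": "no",
    },
    "batch-job": {
        "http_traffic": "no", "queue_processing": "optional", "scheduled_runs": "yes",
    },
}


def candidate_paths(req):
    """Return viable golden paths, best built-in match first."""
    needed = [name for name, wanted in asdict(req).items() if wanted]
    viable = [
        path for path, caps in MATRIX.items()
        if all(caps[name] != "no" for name in needed)
    ]
    # Rank by how many needed capabilities are "yes" rather than "optional".
    return sorted(
        viable,
        key=lambda path: -sum(MATRIX[path][name] == "yes" for name in needed),
    )


if __name__ == "__main__":
    print(candidate_paths(Requirements(False, True, True)))
```

An empty result means no single golden path covers the requirements, which is the case where the customization and architecture review steps in the selection template apply.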
## Golden Path Selection
**Service:** [name]
**Selected Path:** [api-service / worker-service / etc.]
### Rationale
1. Service type [X] matches [golden path] pattern
2. Traffic requirements of [X] supported by [features]
3. Compliance requirements met by built-in [controls]
### Customizations Required
| Standard Component | Customization | Reason |
|--------------------|---------------|--------|
| [component] | [change] | [why] |
### Approval
- [ ] Platform team reviewed
- [ ] Security team reviewed (if customizations)
- [ ] Architecture team reviewed (if non-standard)
# Example service provisioning
module "service" {
  source       = "platform/service-template"
  service_name = var.service_name
  team         = var.team
  environment  = var.environment
  golden_path  = "api-service"
  # Compute
  cpu_request    = "500m"
  memory_request = "512Mi"
  replicas_min   = 2
  replicas_max   = 10
  # Database
  database_enabled = true
  database_class   = "db.t3.medium"
  # Tags
  tags = {
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  }
}
# Verify namespace
kubectl get namespace [service-name]
# Verify compute
kubectl get deployment -n [service-name]
# Verify database
aws rds describe-db-instances --db-instance-identifier [service-db]
# Verify DNS
dig [service-name].internal.example.com
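To run the four verification commands above as one step, a small wrapper can build them from the service name and report which ones succeeded. This is a sketch under assumptions: the function names are hypothetical, the database identifier is passed in explicitly because the source does not say how it is derived, and the DNS domain defaults to the `internal.example.com` placeholder used above.

```python
import subprocess


def verification_commands(service_name, db_identifier,
                          domain="internal.example.com"):
    """Build the verification commands for one service, keyed by check name."""
    return {
        "namespace": ["kubectl", "get", "namespace", service_name],
        "compute": ["kubectl", "get", "deployment", "-n", service_name],
        "database": ["aws", "rds", "describe-db-instances",
                     "--db-instance-identifier", db_identifier],
        "dns": ["dig", f"{service_name}.{domain}"],
    }


def run_checks(commands):
    """Run each command; map check name to True when it exits with status 0."""
    results = {}
    for name, cmd in commands.items():
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except FileNotFoundError:
            # The CLI itself (kubectl, aws, dig) is not installed.
            results[name] = False
    return results


if __name__ == "__main__":
    checks = verification_commands("orders", "orders-db")
    for name, ok in run_checks(checks).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

A zero exit status is a coarse signal. `dig` in particular can exit 0 when the name does not resolve, so inspect its output before treating DNS as verified.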
The standard service dashboard includes:
| Panel | Metrics |
|---|---|
| Request rate | requests/sec, by status code |
| Error rate | 5xx rate, 4xx rate |
| Latency | p50, p95, p99 |
| Saturation | CPU, memory utilization |
| Dependencies | Upstream/downstream health |
| Alert | Condition | Severity | Response |
|---|---|---|---|
| High error rate | 5xx > 1% for 5m | Critical | Page on-call |
| High latency | p99 > 1s for 5m | Warning | Alert team |
| Low availability | uptime < 99.9% | Critical | Page on-call |
| Resource saturation | CPU > 85% for 10m | Warning | Alert team |
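The "for 5m" and "for 10m" qualifiers above mean a condition must hold continuously before the alert fires. The sketch below illustrates that semantics only; it is not the platform's alerting engine. It assumes one sample per minute, and `AlertRule` and `is_firing` are hypothetical names. The low-availability rule is left out because the table gives it no duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float   # fire when the metric stays above this value
    for_minutes: int   # how long the condition must hold
    severity: str


# Three rules transcribed from the alert table above.
RULES = [
    AlertRule("High error rate", 1.0, 5, "critical"),       # 5xx > 1% for 5m
    AlertRule("High latency", 1.0, 5, "warning"),           # p99 > 1s for 5m
    AlertRule("Resource saturation", 85.0, 10, "warning"),  # CPU > 85% for 10m
]


def is_firing(rule, samples):
    """samples: one metric value per minute, oldest first.

    The rule fires only if every sample in the most recent
    `for_minutes` window exceeds the threshold.
    """
    window = samples[-rule.for_minutes:]
    if len(window) < rule.for_minutes:
        return False  # not enough history to satisfy the duration
    return all(value > rule.threshold for value in window)


if __name__ == "__main__":
    error_rate = [0.4, 1.8, 2.1, 2.5, 1.9, 2.2]  # percent, per minute
    print(is_firing(RULES[0], error_rate))
```

A single sample back under the threshold resets the window, which keeps short spikes from paging on-call.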
## Service Level Objectives
**Service:** [name]
**SLO Version:** 1.0
| SLI | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Successful requests / total requests |
| Latency | p99 < 500ms | Request duration percentile |
| Error rate | < 0.1% | 5xx responses / total responses |
### Error Budget
- Monthly budget: 43.2 minutes downtime
- Current consumption: [X]%
- Actions if budget exceeded: [escalation process]
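The 43.2-minute figure follows from the 99.9% availability target over a 30-day month: 30 × 24 × 60 = 43,200 minutes, of which 0.1% may be unavailable. A minimal sketch of that arithmetic, assuming a 30-day window and using hypothetical function names:

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed downtime in minutes for an availability target such as 0.999."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)


def budget_consumed_percent(downtime_minutes, slo_target, window_days=30):
    """Share of the error budget already used, as a percentage."""
    budget = error_budget_minutes(slo_target, window_days)
    return 100 * downtime_minutes / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))           # 43.2
    print(round(budget_consumed_percent(10.8, 0.999), 1))  # 25.0
```

Use the second function to fill in the "Current consumption" field. A different window, such as 28 or 31 days, changes the budget proportionally.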
apiVersion: networking.k8s.io/v1