Design and implement Azure cloud architectures using best practices for compute, storage, databases, AI services, networking, and governance. Use when building applications on Microsoft Azure or migrating workloads to the Azure cloud platform.
Design and implement Azure cloud architectures following Microsoft's Well-Architected Framework and best practices for service selection, cost optimization, and security.
Use this skill when:
Azure offers 200+ services. Choose based on:
| Pillar | Focus | Key Practices |
|---|---|---|
| Cost Optimization | Maximize value within budget | Reserved Instances, auto-scaling, lifecycle management |
| Operational Excellence | Run reliable systems | Azure Policy, automation, monitoring |
| Performance Efficiency | Scale to meet demand | Autoscaling, caching, CDN |
| Reliability | Recover from failures | Availability Zones, multi-region, backup |
| Security | Protect data and assets | Managed Identity, Private Endpoints, Key Vault |
Reference references/well-architected.md for detailed pillar implementation patterns.
- Container-based workload?
  - YES → Need Kubernetes control plane?
    - YES → Azure Kubernetes Service (AKS)
    - NO → Azure Container Apps (recommended)
  - NO → Event-driven function?
    - YES → Azure Functions
    - NO → Web application?
      - YES → Azure App Service
      - NO → Legacy/specialized → Virtual Machines
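The compute decision tree can be encoded as a small helper, for example to drive a design-review checklist. This is a minimal sketch: the function and parameter names are hypothetical, and it mirrors only the tree's questions, not other factors (cost, team skills, existing tooling) that go into a real choice.

```python
def choose_compute(
    containerized: bool,
    needs_kubernetes_api: bool = False,
    event_driven: bool = False,
    web_app: bool = False,
) -> str:
    """Walk the compute decision tree and return the suggested service."""
    if containerized:
        # Direct access to the Kubernetes control plane (CRDs, operators) points to AKS.
        if needs_kubernetes_api:
            return "Azure Kubernetes Service (AKS)"
        return "Azure Container Apps"
    if event_driven:
        return "Azure Functions"
    if web_app:
        return "Azure App Service"
    # Legacy or specialized software that needs full OS control.
    return "Virtual Machines"


print(choose_compute(containerized=True))                      # Azure Container Apps
print(choose_compute(containerized=False, event_driven=True))  # Azure Functions
```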
| Service | Best For | Pricing Model | Operational Overhead |
|---|---|---|---|
| Container Apps | Microservices, APIs, background jobs | Consumption or dedicated | Low |
| AKS | Complex K8s workloads, service mesh | Node-based | High |
| Functions | Event-driven, short tasks (<10 min on the Consumption plan) | Consumption or premium | Low |
| App Service | Web apps, simple APIs | Dedicated plans | Low |
| Virtual Machines | Legacy apps, specialized software | VM-based | High |
Recommendation: Start with Azure Container Apps for 80% of containerized workloads (simpler and cheaper than AKS).
Reference references/compute-services.md for detailed comparison with Bicep and Terraform examples.
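| Tier | Access Pattern | Approx. Cost/GB/Month (varies by region and redundancy) | Minimum Storage Duration |
|---|---|---|---|
| Hot | Frequent (daily) access | $0.018 | None |
| Cool | Infrequent (about once a month or less) | $0.010 | 30 days |
| Cold | Rare (about once a quarter or less), still online | $0.0045 | 90 days |
| Archive | Rare, offline (hours to rehydrate) | $0.00099 | 180 days |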
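The minimum storage duration matters as much as the per-GB price: data deleted or moved out of a tier early is billed as if it had stayed for the full minimum. The sketch below uses the approximate prices from the tier table and assumes a 30-day billing month; it ignores transaction and retrieval charges, and the names are hypothetical.

```python
# tier: (approx. USD per GB per month, minimum storage days), from the tier table
TIERS = {
    "hot": (0.018, 0),
    "cool": (0.010, 30),
    "cold": (0.0045, 90),
    "archive": (0.00099, 180),
}


def capacity_cost(tier: str, size_gb: float, days_stored: int) -> float:
    """Capacity charge for size_gb kept in `tier` for days_stored days.

    Blobs removed before the tier's minimum duration are billed for the
    full minimum (the early-deletion charge). Assumes a 30-day month.
    """
    price_per_gb_month, min_days = TIERS[tier]
    billed_days = max(days_stored, min_days)
    return size_gb * price_per_gb_month * billed_days / 30


# 1,000 GB of short-lived data deleted after 10 days:
print(round(capacity_cost("hot", 1000, 10), 2))   # 6.0
print(round(capacity_cost("cool", 1000, 10), 2))  # 10.0, billed for the full 30 days
```

For short-lived data, the cheaper tier ends up costing more, so lifecycle rules should only tier data that will stay put.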
Pattern: Use lifecycle management policies to automatically move data to lower-cost tiers.
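A lifecycle management policy is a JSON document of rules, each with filters and per-blob actions keyed on days since last modification. The sketch below builds such a document in Python, for example to write a file for `az storage account management-policy create --policy @policy.json`. The rule name and thresholds are hypothetical defaults. The guards encode the minimum durations from the tier table so the policy itself does not trigger early-deletion charges.

```python
import json


def lifecycle_policy(prefix: str, cool_after: int = 30, archive_after: int = 90,
                     delete_after: int = 365) -> dict:
    """Tier block blobs under `prefix` (which must start with the container
    name, e.g. "logs/") to Cool, then Archive, then delete them."""
    if not cool_after < archive_after < delete_after:
        raise ValueError("expected cool_after < archive_after < delete_after")
    if archive_after - cool_after < 30:
        raise ValueError("keep blobs in Cool for at least its 30-day minimum")
    if delete_after - archive_after < 180:
        raise ValueError("keep blobs in Archive for at least its 180-day minimum")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tierAndExpire",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after},
                            "delete": {"daysAfterModificationGreaterThan": delete_after},
                        }
                    },
                },
            }
        ]
    }


print(json.dumps(lifecycle_policy("logs/"), indent=2))
```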
- File system interface required?
  - YES → Protocol?
    - SMB → Azure Files (or NetApp Files for high performance)
    - NFS → Azure Files (NFS 4.1)
  - NO → Storage type?
    - Object storage → Blob Storage
    - Block storage → Managed Disks (Standard/Premium SSD/Ultra)
    - Analytics → Data Lake Storage Gen2
Reference references/storage-patterns.md for lifecycle policies, redundancy options, and performance tuning.
- Relational data?
  - YES → SQL Server compatible?
    - YES → Need instance-level features (SQL Agent, cross-database queries)?
      - YES → Azure SQL Managed Instance (or SQL Server on Azure VMs if you need OS-level access)
      - NO → Azure SQL Database
    - NO → Open-source engine?
      - PostgreSQL → PostgreSQL Flexible Server
      - MySQL → MySQL Flexible Server
  - NO → Data model?
    - Document/JSON → Cosmos DB (NoSQL API)
    - Graph → Cosmos DB (Gremlin API)
    - Wide-column → Cosmos DB (Cassandra API)
    - Key-value cache → Azure Cache for Redis
    - Time-series → Azure Data Explorer
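The database decision tree can be sketched as a lookup-driven helper. This sketch treats instance-scoped features (SQL Agent, cross-database queries) as the reason to pick Managed Instance over Azure SQL Database. The function name, parameters, and data-model labels are hypothetical.

```python
OPEN_SOURCE = {"postgresql": "PostgreSQL Flexible Server", "mysql": "MySQL Flexible Server"}
NON_RELATIONAL = {
    "document": "Cosmos DB (NoSQL API)",
    "graph": "Cosmos DB (Gremlin API)",
    "wide-column": "Cosmos DB (Cassandra API)",
    "key-value-cache": "Azure Cache for Redis",
    "time-series": "Azure Data Explorer",
}


def choose_database(relational: bool, sql_server: bool = False,
                    instance_features: bool = False, engine: str = "",
                    data_model: str = "") -> str:
    """Return the service suggested by the database decision tree."""
    if relational and sql_server:
        # Instance-scoped features (SQL Agent, cross-database queries) need Managed Instance.
        return "Azure SQL Managed Instance" if instance_features else "Azure SQL Database"
    options, key = (OPEN_SOURCE, engine) if relational else (NON_RELATIONAL, data_model)
    try:
        return options[key.lower()]
    except KeyError:
        raise ValueError(f"unsupported choice {key!r}; expected one of {sorted(options)}") from None


print(choose_database(relational=True, engine="PostgreSQL"))  # PostgreSQL Flexible Server
print(choose_database(relational=False, data_model="graph"))  # Cosmos DB (Gremlin API)
```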
| Level | Use Case | Latency | Throughput |
|---|---|---|---|
| Strong | Financial transactions, inventory | Highest | Lowest |
| Bounded Staleness | Real-time leaderboards with acceptable lag | High | Low |
| Session | Shopping carts, user sessions (default) | Medium | Medium |
| Consistent Prefix | Social feeds, IoT telemetry | Low | High |
| Eventual | Analytics, ML training data | Lowest | Highest |
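To make the Session row concrete: with session consistency, a client always reads its own writes, while other clients may still see older data. Cosmos DB SDKs achieve this by carrying a session token. The toy model below illustrates the idea using log sequence numbers. It is a simplified sketch, not how Cosmos DB replicates internally, and every class and function name is hypothetical.

```python
from dataclasses import dataclass, field


class StaleReplicaError(Exception):
    """Raised when a replica has not yet applied this session's writes."""


@dataclass
class Replica:
    lsn: int = 0  # log sequence number of the last applied write
    data: dict = field(default_factory=dict)


@dataclass
class Session:
    token: int = 0  # highest LSN this client has written


def write(primary: Replica, session: Session, key: str, value: str) -> None:
    primary.lsn += 1
    primary.data[key] = value
    session.token = primary.lsn  # remember our own write


def replicate(source: Replica, target: Replica) -> None:
    target.lsn, target.data = source.lsn, dict(source.data)


def session_read(replica: Replica, session: Session, key: str):
    # Read-your-writes: refuse replicas that are behind this session's token.
    if replica.lsn < session.token:
        raise StaleReplicaError(f"replica at LSN {replica.lsn}, session needs {session.token}")
    return replica.data.get(key)


primary, secondary, session = Replica(), Replica(), Session()
write(primary, session, "cart", "3 items")
try:
    session_read(secondary, session, "cart")
except StaleReplicaError as err:
    print("retry on another replica:", err)
replicate(primary, secondary)
print(session_read(secondary, session, "cart"))  # 3 items
```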
Reference references/database-selection.md for capacity planning, indexing strategies, and migration patterns.
Use Cases:
Key Advantages:
Integration Pattern:
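from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")  # Entra ID scope for Azure OpenAI
client = AzureOpenAI(
    azure_endpoint="https://myopenai.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-02-15-preview",
)
response = client.chat.completions.create(
    model="gpt-4-turbo",  # your Azure deployment name, not the base model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)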
| Service | Purpose | Common Use Cases |
|---|---|---|
| Azure AI services (formerly Cognitive Services) | Pre-built AI models | Vision, Speech, Language, Decision |
| Azure Machine Learning | Custom model training | MLOps, model deployment, feature engineering |
| Azure AI Search | Semantic search engine | RAG patterns, document search |
Reference references/ai-integration.md for RAG architecture, function calling, and fine-tuning patterns.
| Service | Pattern | Message Size | Ordering | Transactions | Best For |
|---|---|---|---|---|---|
| Service Bus | Queue/Topic | 256 KB (Standard), 100 MB (Premium) | Yes (sessions) | Yes | Enterprise messaging |
| Event Grid | Pub/Sub | 1 MB | No | No | Event-driven architectures |
| Event Hubs | Streaming | 1 MB | Yes (partitions) | No | Big data ingestion, telemetry |
| Storage Queues | Simple queue | 64 KB | No | No | Simple, low-cost async work (up to ~2,000 msgs/sec per queue) |
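The messaging table can be read as a short selection procedure: streaming goes to Event Hubs, ordering or transactions require Service Bus, discrete events go to Event Grid, and simple queues use Storage Queues until messages exceed 64 KB. The sketch below encodes that reading. The `pattern` labels are hypothetical, and the 100 MB Service Bus limit assumes the Premium tier (Standard is 256 KB).

```python
MB = 1024  # sizes in this sketch are in KB


def choose_messaging(pattern: str, message_kb: float, need_ordering: bool = False,
                     need_transactions: bool = False) -> str:
    """pattern is "queue", "pubsub", or "stream" (hypothetical labels)."""
    if pattern == "stream":
        service, limit_kb = "Event Hubs", 1 * MB  # ordered within a partition
    elif need_ordering or need_transactions:
        # Sessions give FIFO ordering; 100 MB assumes Premium (Standard: 256 KB).
        service, limit_kb = "Service Bus", 100 * MB
    elif pattern == "pubsub":
        service, limit_kb = "Event Grid", 1 * MB
    elif pattern == "queue":
        # Storage Queues are the simplest option but cap messages at 64 KB.
        if message_kb <= 64:
            return "Storage Queues"
        service, limit_kb = "Service Bus", 100 * MB
    else:
        raise ValueError(f"unknown pattern {pattern!r}")
    if message_kb > limit_kb:
        raise ValueError(f"{message_kb} KB exceeds the {service} message size limit")
    return service


print(choose_messaging("queue", 10))                          # Storage Queues
print(choose_messaging("queue", 10, need_transactions=True))  # Service Bus
```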
When to Use What:
Reference references/messaging-patterns.md for implementation examples, retry policies, and dead-letter handling.
| Aspect | Private Endpoint | Service Endpoint |
|---|---|---|
| Security Model | Private IP in VNet | Optimized route to public endpoint |
| Data Exfiltration Protection | Yes (network-isolated) | Limited (service firewall only) |
| Cost | ~$7.30/month per endpoint | Free |
| Recommendation | Production workloads | Dev/test environments |
Best Practice: Use Private Endpoints for all PaaS services in production (treat public endpoints as an anti-pattern).
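The ~$7.30/month figure corresponds to about $0.01 per endpoint-hour over Azure's 730-hour billing month. Data processed through the endpoint is billed separately per GB. The sketch below estimates the fixed part for a workload. The hourly rate is an assumption to check against current pricing, and the names are hypothetical.

```python
HOURS_PER_MONTH = 730        # Azure prices a month as 730 hours
ENDPOINT_HOURLY_RATE = 0.01  # USD per private endpoint per hour (assumption; verify)


def private_endpoint_monthly_cost(endpoint_count: int,
                                  hourly_rate: float = ENDPOINT_HOURLY_RATE) -> float:
    """Fixed monthly charge for private endpoints, excluding per-GB data processing."""
    return round(endpoint_count * hourly_rate * HOURS_PER_MONTH, 2)


# One endpoint each for Blob Storage, Key Vault, SQL Database, and Cosmos DB:
print(private_endpoint_monthly_cost(4))  # 29.2
```

Each private endpoint targets one sub-resource, so a storage account that exposes both blob and file shares needs two endpoints.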
Components:
Benefits:
Reference references/networking-architecture.md for hub-spoke Bicep templates, NSG patterns, and DNS configuration.
Always use Managed Identity instead of:
System-Assigned vs. User-Assigned:
| Type | Lifecycle | Use Case |
|---|---|---|
| System-Assigned | Tied to resource | Single resource needs access |
| User-Assigned | Independent | Multiple resources share identity |
Example Flow:
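from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
# Works automatically with Managed Identity in Azure; falls back to developer credentials (e.g. Azure CLI login) locally
credential = DefaultAzureCredential()
keyvault_client = SecretClient(vault_url="...", credential=credential)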
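On a VM or scale set, the "works automatically" part means the SDK requests a token from the Instance Metadata Service (IMDS) at 169.254.169.254. The sketch below shows that request with only the standard library. It is for understanding the flow, not a replacement for `DefaultAzureCredential`. App Service and Functions expose a different local identity endpoint, which the SDK also handles. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str, client_id: Optional[str] = None) -> urllib.request.Request:
    """Build an IMDS request for a Managed Identity access token for `resource`."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Selects a specific user-assigned identity when several are attached.
        params["client_id"] = client_id
    url = f"{IMDS_TOKEN_URL}?{urllib.parse.urlencode(params)}"
    # IMDS rejects requests without the Metadata: true header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_access_token(request: urllib.request.Request) -> str:
    """Only succeeds on an Azure VM or scale set that has an identity assigned."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["access_token"]


req = build_token_request("https://vault.azure.net")
print(req.full_url)
```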
Reference references/identity-access.md for Entra ID integration, Conditional Access policies, and B2C patterns.
Common Policy Patterns:
Policy Effects:
Optimization Strategies:
| Pattern | Savings | Use Case |
|---|---|---|
| Reserved Instances (1-year) | 40-50% | Steady-state workloads (databases, VMs) |
| Reserved Instances (3-year) | 60-70% | Long-term commitments |
| Spot VMs | Up to 90% | Fault-tolerant batch processing |
| Auto-shutdown | Variable | Dev/test resources (off-hours) |
| Storage lifecycle policies | 50-90% | Move to Cool/Archive tiers |
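The savings percentages are discounts off pay-as-you-go, but a reservation is billed for every hour of its term whether or not the resource runs. A reservation therefore pays off only above a break-even utilization of `1 - discount`. The sketch below compares the options for a hypothetical $0.20/hour VM. The rate and schedule are illustrative assumptions, not quoted prices.

```python
HOURS_PER_YEAR = 8760


def pay_as_you_go(hourly_rate: float, hours_running: float) -> float:
    return hourly_rate * hours_running


def reserved(hourly_rate: float, discount: float) -> float:
    # Billed for every hour of the 1-year term, used or not.
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR


def break_even_utilization(discount: float) -> float:
    """Fraction of the year a resource must run for the reservation to pay off."""
    return 1 - discount


rate = 0.20  # hypothetical USD/hour
print(round(reserved(rate, 0.40), 2))                  # 1-year reservation, 40% discount
print(round(pay_as_you_go(rate, HOURS_PER_YEAR), 2))   # running 24x7
print(round(pay_as_you_go(rate, 12 * 5 * 52), 2))      # auto-shutdown: 12 h/day, weekdays
print(break_even_utilization(0.40))
```

At a 40% discount, the reservation (about $1,051) beats running 24x7 (about $1,752). A dev VM on a 12-hour weekday schedule (about 36% utilization, about $624) is cheaper with auto-shutdown than reserved.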
Monitoring:
Reference references/governance-compliance.md for Azure Landing Zones, Policy definitions, and Blueprints.
| Tool | Best For | Azure Integration | Multi-Cloud |
|---|---|---|---|
| Bicep | Azure-native projects | Excellent (official) | No |
| Terraform | Multi-cloud environments | Good (azurerm provider) | Yes |
| Pulumi | Developer-first approach | Good (native SDK) | Yes |
| Azure CLI | Scripts and automation | Excellent | No |
Recommendation:
Reference the Bicep and Terraform examples in the examples/bicep/ and examples/terraform/ directories.
| Control | Implementation | Priority |
|---|---|---|
| Managed Identity | Enable on all compute resources | Critical |
| Private Endpoints | All PaaS services in production | Critical |
| Key Vault | Store secrets, keys, certificates | Critical |
| Network Segmentation | NSGs, application security groups | High |
| Microsoft Defender | Enable for all resource types | High |
| Azure Policy | Preventive controls | High |
| Just-In-Time Access | VMs and privileged access | Medium |
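The "Preventive controls" row is typically enforced with Azure Policy rules that use the `deny` effect. The rule below follows the policy definition `if`/`then` structure and targets the storage account `publicNetworkAccess` alias, pairing with the Private Endpoints control. The evaluator is a hypothetical, heavily simplified offline check: it supports only `allOf` with `equals`/`notEquals` and models resources as flat dicts keyed by alias. It is useful for unit-testing rule logic, not as a substitute for Azure's policy engine.

```python
from typing import Optional

# Deny storage accounts whose public network access is not disabled.
POLICY_RULE = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            {"field": "Microsoft.Storage/storageAccounts/publicNetworkAccess",
             "notEquals": "Disabled"},
        ]
    },
    "then": {"effect": "deny"},
}


def condition_matches(condition: dict, resource: dict) -> bool:
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == condition["equals"]
    if "notEquals" in condition:
        return value != condition["notEquals"]
    raise ValueError(f"unsupported condition: {condition}")


def evaluate(rule: dict, resource: dict) -> Optional[str]:
    """Return the rule's effect when every allOf condition matches, else None."""
    if all(condition_matches(c, resource) for c in rule["if"]["allOf"]):
        return rule["then"]["effect"]
    return None


public_account = {"type": "Microsoft.Storage/storageAccounts",
                  "Microsoft.Storage/storageAccounts/publicNetworkAccess": "Enabled"}
print(evaluate(POLICY_RULE, public_account))  # deny
```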
Reference references/security-architecture.md (see also security-hardening and auth-security skills).
Compute:
Storage:
Database:
Use Azure Pricing Calculator: https://azure.microsoft.com/pricing/calculator/
| If You Need... | Choose |
|---|---|
| Kubernetes features (CRDs, operators) | Azure Kubernetes Service |
| Microservices without K8s complexity | Azure Container Apps |
| Event-driven functions (<10 min on the Consumption plan) | Azure Functions |
| Traditional web app (Node, .NET, Python) | Azure App Service |
| Batch processing, HPC | Azure Batch or VM Scale Sets |
| Legacy application migration | Virtual Machines |
| If You Need... | Choose |
|---|---|
| SMB file shares | Azure Files |
| NFS file shares | Azure Files (NFS 4.1) |
| Object storage (images, backups) | Blob Storage |
| High-performance file storage | Azure NetApp Files |
| Block storage for VMs | Managed Disks |
| Big data analytics | Data Lake Storage Gen2 |
| If You Need... | Choose |
|---|---|
| SQL Server compatibility (T-SQL) | Azure SQL Database |
| Instance-level SQL Server features (SQL Agent, cross-database queries) | Azure SQL Managed Instance |
| PostgreSQL | PostgreSQL Flexible Server |
| MySQL | MySQL Flexible Server |
| Global distribution, multi-model | Cosmos DB |
| In-memory cache | Azure Cache for Redis |
| Graph database | Cosmos DB (Gremlin API) |
| Time-series data | Azure Data Explorer |
For detailed implementation guidance, see:
- references/compute-services.md - Container Apps, AKS, Functions, App Service with Bicep/Terraform
- references/storage-patterns.md - Blob Storage, Files, Disks, lifecycle management
- references/database-selection.md - SQL Database, Cosmos DB, PostgreSQL patterns
- references/ai-integration.md - Azure OpenAI, RAG architecture, function calling
- references/messaging-patterns.md - Service Bus, Event Grid, Event Hubs examples
- references/networking-architecture.md - Hub-spoke, Private Endpoints, DNS configuration
- references/identity-access.md - Entra ID, Managed Identity, RBAC
- references/governance-compliance.md - Azure Policy, Landing Zones, cost optimization
- references/well-architected.md - Five pillars implementation guide

Working examples available in:

- examples/bicep/ - Infrastructure templates (Container Apps, AKS, networking, databases)
- examples/terraform/ - Multi-cloud IaC examples
- examples/sdk/python/ - Python SDK integration (OpenAI, Managed Identity, messaging)
- examples/sdk/typescript/ - TypeScript SDK examples