Create data pipeline and analytics architecture diagrams using PlantUML syntax with database/analytics stencil icons. Best for ETL pipelines, data lakes, real-time streaming, data warehousing, and BI dashboard design.
Quick Start: Define data sources → Declare ingestion/ETL icons → Connect to storage/warehouse → Add BI/visualization → Wrap in ```plantuml fence.
⚠️ IMPORTANT: Always use
```plantumlor```pumlcode fence. NEVER use```text— it will NOT render as a diagram.
@startuml and ends with @endumlleft to right direction for data pipelines (Source → Ingest → Transform → Store → Visualize)mxgraph.aws4.* stencil syntax for analytics, database, and storage iconsfillColor or strokeColorrectangle "Zone" { ... } or for grouping pipeline stagespackage "Layer" { ... }-->, async/streaming flows use ..> (dashed)Full stencil reference: See stencils/README.md for 9500+ available icons.
mxgraph.aws4.<icon> "Label" as <alias>
| Category | Stencils | Purpose |
|---|---|---|
| Query Engine | athena, athena_data_source_connectors | Serverless SQL on S3 data |
| ETL | glue, glue_crawlers, glue_data_catalog, aws_glue_data_quality, aws_glue_for_ray | Data integration & cataloging |
| Streaming | kinesis, kinesis_data_streams, kinesis_data_firehose, kinesis_data_analytics, kinesis_video_streams | Real-time data streaming |
| MapReduce | emr, emr_engine, emr_engine_mapr_m3, emr_engine_mapr_m5 | Big data processing (Spark, Hive) |
| Data Warehouse | redshift, redshift_ra3, redshift_streaming_ingestion, redshift_ml | Columnar analytics warehouse |
| Search | opensearch_service_data_node, opensearch_ingestion, cloudsearch | Full-text search & log analytics |
| BI | quicksight | Dashboards & visualizations |
| Data Lake | lake_formation, s3, glacier, glacier_deep_archive | Governed data lake storage |
| Catalog | datazone_custom_asset_type, data_exchange | Data governance & sharing |
| Streaming Kafka | msk, msk_connect | Managed Kafka streaming |
| Category | Stencils | Purpose |
|---|---|---|
| Relational | aurora, aurora_instance, rds, rds_instance, rds_mysql_instance, rds_postgresql_instance | Transactional databases |
| NoSQL | dynamodb, dynamodb_table, dynamodb_global_secondary_index, dynamodb_stream | Key-value & document store |
| Graph | neptune | Graph database |
| In-Memory | elasticache, elasticache_for_redis, elasticache_for_memcached | Cache & session store |
| Document | documentdb, documentdb_with_mongodb_compatibility | Document database |
| Ledger | quantum_ledger_database | Immutable transaction log |
| Wide-Column | keyspaces | Cassandra-compatible |
| Syntax | Meaning | Use Case |
|---|---|---|
A --> B | Solid arrow | Batch data flow / API call |
A ..> B | Dashed arrow | Streaming / async / CDC |
A -- B | Solid line | Bidirectional sync |
A --> B : "label" | Labeled connection | Describe data format or volume |
@startuml
left to right direction
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs
s3 --> glue
glue --> rs
rs --> qs
@enduml
| Type | Purpose | Key Stencils | Example |
|---|---|---|---|
| Data Lake | Centralized raw data store | s3, lake_formation, glue, athena | data-lake.md |
| Real-time Streaming | Event stream processing | kinesis, msk, lambda_function, opensearch_service | real-time-streaming.md |
| Data Warehouse | Star-schema analytics | redshift, glue, quicksight | data-warehouse.md |
| ETL Pipeline | Extract-transform-load | glue, glue_crawlers, glue_data_catalog, s3 | etl-pipeline.md |
| Log Analytics | Centralized logging | kinesis_data_firehose, opensearch_service, lambda_function | log-analytics.md |
| ML Feature Store | Feature engineering pipeline | glue, s3, athena, emr | ml-feature-pipeline.md |
| CDC Pipeline | Database change capture | dynamodb_streams, kinesis, lambda_function, redshift | cdc-pipeline.md |
| Multi-source BI | Cross-database reporting | aurora, dynamodb, redshift, quicksight | multi-source-bi.md |