Provides data engineering patterns for building scalable pipelines and data infrastructure. Use when designing data architectures or pipelines. Use when optimizing ETL/ELT workflows. Use when implementing data governance and quality checks. Do not use for frontend application development. Do not use for basic database queries without scale considerations.
A senior-level data engineering skill for production-grade AI, ML, and data systems.
# Core Tool 1
python scripts/pipeline_orchestrator.py --input data/ --output results/
# Core Tool 2
python scripts/data_quality_validator.py --target project/ --analyze
# Core Tool 3
python scripts/etl_performance_optimizer.py --config config.yaml --deploy
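The internals of these scripts are not shown in this document. As a rough illustration of the kind of check a data quality validator might run, here is a minimal sketch, assuming rows arrive as plain dictionaries. The names `QualityReport`, `validate_rows`, and the rule format are hypothetical and are not the interface of `scripts/data_quality_validator.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical rule format: a column name plus a predicate its values must satisfy.
Rule = tuple[str, Callable[[Any], bool]]


@dataclass
class QualityReport:
    total_rows: int = 0
    failures: dict[str, int] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # The batch passes only if no rule was violated.
        return not self.failures


def validate_rows(rows: list[dict[str, Any]], rules: list[Rule]) -> QualityReport:
    """Count rule violations per column across all rows."""
    report = QualityReport(total_rows=len(rows))
    for row in rows:
        for column, predicate in rules:
            value = row.get(column)  # a missing column is treated as None
            if value is None or not predicate(value):
                report.failures[column] = report.failures.get(column, 0) + 1
    return report


rows = [
    {"user_id": 1, "amount": 19.99},
    {"user_id": 2, "amount": -5.00},    # negative amount violates its rule
    {"user_id": None, "amount": 3.50},  # null key violates its rule
]
rules = [
    ("user_id", lambda v: isinstance(v, int)),
    ("amount", lambda v: v >= 0),
]
report = validate_rows(rows, rules)
print(report.passed, report.failures)
```

Counting failures per column, rather than stopping at the first bad row, keeps the report useful for deciding whether a batch should be quarantined or only flagged.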
This skill covers the following technologies:
Languages: Python, SQL, R, Scala, Go
ML Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost
Data Tools: Spark, Airflow, dbt, Kafka, Databricks
LLM Frameworks: LangChain, LlamaIndex, DSPy
Deployment: Docker, Kubernetes, AWS/GCP/Azure
Monitoring: MLflow, Weights & Biases, Prometheus
Databases: PostgreSQL, BigQuery, Snowflake, Pinecone
A comprehensive guide is available in references/data_pipeline_architecture.md.
Complete workflow documentation is in references/data_modeling_patterns.md.
A technical reference guide is in references/dataops_best_practices.md.
Enterprise-scale data processing with distributed computing:
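The pattern behind most distributed processing is to partition the data, aggregate each partition independently, then merge the partial results. The sketch below is a single-machine stand-in for that pattern using only the standard library. The function names and the sample events are hypothetical, and the thread pool only illustrates the structure: a real deployment would hand the same steps to a framework such as Spark.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def partition(records: list[str], num_partitions: int) -> list[list[str]]:
    """Split records into roughly equal partitions by round-robin."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_partition(records: Iterable[str]) -> Counter:
    """Map step: aggregate one partition independently of the others."""
    return Counter(records)


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: combine the per-partition aggregates into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


events = ["click", "view", "click", "purchase", "view", "click"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each partition is counted in its own worker.
    partials = list(pool.map(count_partition, partition(events, 3)))
totals = merge_counts(partials)
print(totals)
```

Because counting is associative and commutative, the merged result does not depend on how the records were partitioned, which is the property that lets this kind of aggregation scale out.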
Production ML system with high availability:
High-throughput inference system:
Latency:
Throughput:
Availability:
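The target values for these three metrics are not given here. As an illustration of how they might be measured, the following sketch computes them from a window of request samples. It assumes nearest-rank percentiles for latency and treats availability as the fraction of successful requests, which is only one common definition. `RequestSample`, `percentile`, and `summarize` are hypothetical names, and the sample data is synthetic.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    ok: bool  # True if the request succeeded


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: list[RequestSample], window_seconds: float) -> dict[str, float]:
    """Summarize latency, throughput, and availability for one time window."""
    latencies = [s.latency_ms for s in samples]
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "throughput_rps": len(samples) / window_seconds,
        "availability": sum(s.ok for s in samples) / len(samples),
    }


# Synthetic window: ten requests over five seconds, one of which failed.
samples = [RequestSample(latency_ms=10.0 * i, ok=(i != 7)) for i in range(1, 11)]
summary = summarize(samples, window_seconds=5.0)
print(summary)
```

Reporting tail percentiles alongside the median matters because a healthy average can hide a slow tail that dominates user-facing latency.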
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/
# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth
# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/
# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
references/data_pipeline_architecture.md
references/data_modeling_patterns.md
references/dataops_best_practices.md
scripts/ directory
As a world-class senior professional:
Technical Leadership
Strategic Thinking
Collaboration
Innovation
Production Excellence