Carlos Forero
Data Engineer
Specialized in designing and implementing scalable data pipelines with GCP and Python
stack.py
# data_engineer_stack.py
from dataclasses import dataclass, field

@dataclass
class DataEngineerStack:
    # Mutable defaults must go through default_factory (a bare list default raises ValueError)
    cloud: list = field(default_factory=lambda: ["GCP", "BigQuery", "Dataflow"])
    processing: list = field(default_factory=lambda: ["Apache Beam", "Airflow", "dbt"])
    languages: list = field(default_factory=lambda: ["Python", "SQL", "Bash"])
    tools: list = field(default_factory=lambda: ["Docker", "Git", "Linux"])

    def pipeline_expertise(self):
        return {
            "etl": "Batch & Streaming",
            "scale": "TB-scale data",
            "optimization": "Cost & Performance",
        }
Core Competencies
Cloud & Data Warehousing
- Google Cloud Platform
- BigQuery optimization
- Cloud Storage & Pub/Sub
- IAM & Security
Data Processing
- Apache Beam / Dataflow
- Batch & Streaming ETL
- Airflow orchestration (sketch below)
- dbt transformations
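
As an illustration of the Airflow orchestration listed above, here is a minimal DAG sketch. It is a generic example rather than code from a specific project: the DAG id, schedule, and task callables are hypothetical, and it assumes Airflow 2.4+ for the schedule argument.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system
    print("extracting source data")


def load():
    # Placeholder: load the extracted data into the warehouse
    print("loading into BigQuery")


with DAG(
    dag_id="daily_etl",                # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load
    extract_task >> load_task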
Development & Tools
- Python (advanced)
- SQL optimization
- Docker & CI/CD
- Git version control
Featured Projects
Real-time Analytics Pipeline
Streaming data pipeline processing 2.5 TB/day using Pub/Sub, Dataflow, and BigQuery (sketch below)
GCP · Apache Beam · BigQuery
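
A minimal Apache Beam sketch of this Pub/Sub → Dataflow → BigQuery pattern is shown here. It is illustrative only: the topic, table, schema, and message fields are hypothetical, and running it on Dataflow would additionally require project and runner pipeline options.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # Pub/Sub is an unbounded source

with beam.Pipeline(options=options) as p:
    (
        p
        # Read raw messages from a (hypothetical) Pub/Sub topic
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/events"
        )
        # Decode and parse each message; assumes JSON payloads matching the schema below
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append rows to a (hypothetical) BigQuery table
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )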
ETL Optimization Framework
Cost optimization framework reducing BigQuery costs by 40% through query optimization and partitioning (example below)
Python · SQL · Optimization
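
A sketch of the partitioning side of this approach, using the google-cloud-bigquery client, follows. Table and column names are hypothetical, and the 40% figure above is the project's reported result, not something this snippet reproduces; the point is that queries filtering on the partition column only scan the partitions they need.

from google.cloud import bigquery

client = bigquery.Client()

# Define a date-partitioned, clustered table (hypothetical names)
table = bigquery.Table(
    "my-project.analytics.events_partitioned",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",                 # partition by the event date column
)
table.clustering_fields = ["user_id"]   # cluster within each partition
client.create_table(table)

# Queries that filter on the partition column prune all other partitions,
# which is where the bytes-scanned (and cost) savings come from.
query = """
    SELECT user_id, COUNT(*) AS events
    FROM `my-project.analytics.events_partitioned`
    WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07'
    GROUP BY user_id
"""
client.query(query).result()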
Impact Metrics
- Data Processed Daily: 2.5 TB (↑ 15% last month)
- Pipeline Uptime: 99.9% (stable)
- Cost Reduction: 40% (↑ vs previous quarter)