🚀 Data Engineer @ VNPT AI
🔹 Building scalable Lakehouse & Data Platforms
🔹 Experienced with Big Data, Streaming, and Cloud Infrastructure
🔹 Passionate about Data Infrastructure, APIs, and Workflow Orchestration
- Python (ETL, APIs, data pipelines, orchestration)
- Java (Big Data, Kafka, Flink, Spark ecosystem)
- Apache Iceberg, Delta Lake
- Apache Spark, Apache Flink
- Kafka, Kafka Connect, Debezium (CDC from Postgres/MySQL/MongoDB)
- Google BigQuery, Cloud Scheduler
- AWS S3, MinIO
- GCS (Google Cloud Storage)
- PostgreSQL, MySQL, MongoDB
- Qdrant (Vector Database)
- Apache Superset
- Apache Airflow, cron, Cloud Scheduler
- Docker, Docker Compose
- Kubernetes, Helm
- Terraform, GitHub Actions
- FastAPI
- Git, GitHub (version control & collaboration)
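As a sketch of the Debezium CDC work listed above: Debezium publishes each row change to Kafka as a JSON envelope carrying `before`/`after` row images and a single-letter `op` code. A minimal, stdlib-only parser for that envelope (field names follow the standard Debezium format; the Kafka consumer wiring and any downstream sink are out of scope here):

```python
import json

# Standard Debezium op codes: c=create, u=update, d=delete, r=snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}

def parse_change_event(raw: str) -> dict:
    """Extract the operation, row images, and source table from a Debezium envelope."""
    payload = json.loads(raw).get("payload", {})
    return {
        "op": OPS.get(payload.get("op"), "unknown"),
        "before": payload.get("before"),  # row state before the change (None for inserts)
        "after": payload.get("after"),    # row state after the change (None for deletes)
        "table": payload.get("source", {}).get("table"),
    }

# Example: an update event shaped like Debezium's output for a Postgres table.
event = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 1, "email": "old@example.com"},
        "after": {"id": 1, "email": "new@example.com"},
        "source": {"table": "users"},
    }
})
change = parse_change_event(event)
print(change["op"], change["table"])  # update users
```

In a real pipeline this parsing step sits between the Kafka consumer and the lakehouse writer, deciding whether each record becomes an upsert or a delete.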
- 🏗️ Lakehouse with Iceberg + Spark – End-to-end data lakehouse with schema evolution & time travel
- 🔄 CDC Data Pipelines with Debezium + Kafka – Real-time CDC ingestion from Postgres/MySQL/MongoDB into the lakehouse
- 📊 BigQuery ETL Framework – Managed ETL workflows using Airflow + BigQuery
- ☁️ Data Platform on GCP – Orchestration with Cloud Scheduler, storage in GCS, analytics in BigQuery
- 📈 BI Dashboard with Superset – Interactive dashboards on top of data warehouse
- 🐳 K8s Data Service Deployment – Deploying scalable data services with Helm & Kubernetes
- 🔎 Vector Search with Qdrant – Semantic search and embedding-powered retrieval pipeline
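At its core, the Qdrant retrieval project above is nearest-neighbour search over embeddings. A stdlib-only sketch of that core idea using brute-force cosine-similarity ranking (Qdrant's actual HNSW index and the embedding model are assumed and out of scope; the toy vectors stand in for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, collection, top_k=2):
    """Rank stored (id, vector) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in collection]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

# Toy "collection": pretend these are document embeddings.
collection = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]
results = search([1.0, 0.05, 0.0], collection)
print(results[0][0])  # doc-a
```

A vector database replaces the brute-force scan with an approximate index so the same ranking scales to millions of vectors.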
- Data mesh & federated query engines (Trino/Presto, Dremio)
- Advanced Iceberg optimizations (partitioning, compaction, metadata scaling)
- Hybrid pipelines (batch + streaming with Flink + Spark)
- AI/LLM integration with vector databases (Qdrant)
⭐️ From ducdn

