🚀 Data Engineer @ VNPT AI
🔹 Building scalable Lakehouse & Data Platforms
🔹 Experienced with Big Data, Streaming, and Cloud Infrastructure
🔹 Passionate about Data Infrastructure, APIs, and Workflow Orchestration
- Python (ETL, APIs, data pipelines, orchestration)
- Java (Big Data, Kafka, Flink, Spark ecosystem)
- Apache Iceberg, Delta Lake
- Apache Spark, Apache Flink
- Kafka, Kafka Connect, Debezium (CDC from Postgres/MySQL/MongoDB)
- Google BigQuery, Cloud Scheduler
- AWS S3, MinIO
- GCS (Google Cloud Storage)
- PostgreSQL, MySQL, MongoDB
- Qdrant (Vector Database)
- Apache Superset
- Apache Airflow, cron, Cloud Scheduler
- Docker, Docker Compose
- Kubernetes, Helm
- Terraform, GitHub Actions
- FastAPI
- Git, GitHub (version control & collaboration)
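As a sketch of the Debezium CDC work listed above: Debezium publishes each row change to Kafka as a JSON envelope carrying `before`/`after` row images and a single-letter `op` code. A minimal, stdlib-only parser for that envelope (field names follow the standard Debezium format; the Kafka consumer wiring and any downstream sink are out of scope here):

```python
import json

# Standard Debezium op codes: c=create, u=update, d=delete, r=snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}

def parse_change_event(raw: str) -> dict:
    """Extract the operation, row images, and source table from a Debezium envelope."""
    payload = json.loads(raw).get("payload", {})
    return {
        "op": OPS.get(payload.get("op"), "unknown"),
        "before": payload.get("before"),  # row state before the change (None for inserts)
        "after": payload.get("after"),    # row state after the change (None for deletes)
        "table": payload.get("source", {}).get("table"),
    }

# Example: an update event shaped like Debezium's output for a Postgres table.
event = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 1, "email": "old@example.com"},
        "after": {"id": 1, "email": "new@example.com"},
        "source": {"table": "users"},
    }
})
change = parse_change_event(event)
print(change["op"], change["table"])  # update users
```

In a real pipeline this parsing step sits between the Kafka consumer and the lakehouse writer, deciding whether each record becomes an upsert or a delete.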
- 🏗️ Lakehouse with Iceberg + Spark – End-to-end data lakehouse with schema evolution & time travel
- 🔄 CDC Data Pipelines with Debezium + Kafka – Real-time CDC ingestion from Postgres/MySQL/MongoDB into the lakehouse
- 📊 BigQuery ETL Framework – Managed ETL workflows using Airflow + BigQuery
- ☁️ Data Platform on GCP – Orchestration with Cloud Scheduler, storage in GCS, analytics in BigQuery
- 📈 BI Dashboard with Superset – Interactive dashboards on top of data warehouse
- 🐳 K8s Data Service Deployment – Deploying scalable data services with Helm & Kubernetes
- 🔎 Vector Search with Qdrant – Semantic search and embedding-powered retrieval pipeline
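At its core, the Qdrant retrieval project above is nearest-neighbour search over embeddings. A stdlib-only sketch of that core idea using brute-force cosine-similarity ranking (Qdrant's actual HNSW index and the embedding model are assumed and out of scope; the toy vectors stand in for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, collection, top_k=2):
    """Rank stored (id, vector) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in collection]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

# Toy "collection": pretend these are document embeddings.
collection = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]
results = search([1.0, 0.05, 0.0], collection)
print(results[0][0])  # doc-a
```

A vector database replaces the brute-force scan with an approximate index so the same ranking scales to millions of vectors.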
- Data mesh & federated query engines (Trino/Presto, Dremio)
- Advanced Iceberg optimizations (partitioning, compaction, metadata scaling)
- Hybrid pipelines (batch + streaming with Flink + Spark)
- AI/LLM integration with vector databases (Qdrant)
⭐️ From ducdn

