bala93kumar/README.md

👋 Hi, I'm Balakumar

💼 Data Engineer | Spark & Databricks Specialist | Cloud Data Pipelines

Welcome to my GitHub! I love building scalable, optimized data pipelines that power analytics and business decisions.


🚀 Tech Stack & Expertise

🔹 Big Data & Distributed Processing

  • Apache Spark (PySpark, Spark SQL)
  • Databricks (Workflows, Delta Lake, Z-Ordering, Optimizations)
  • SAP HANA Data Extraction & Performance Tuning
  • Parallelism, Shuffle Optimization, Cluster Tuning

🔹 Data Engineering & ETL

  • Complex SQL Transformations & CTE Pipelines
  • Metadata-driven ETL Frameworks
  • Snapshot Validation, Partition Management
  • Incremental Loads & Rolling-window Logic
  • Job Monitoring Dashboards

🔹 Cloud & Storage

  • AWS Glue ETL
  • Delta Lake
  • Lakehouse Architectures
  • Snowflake (Community Edition)

🔹 Tools & Languages

  • Python (ETL frameworks, automation)
  • SQL (Analytical queries, join optimization)
  • REST APIs (Databricks Jobs API)

📊 What I Work On

  • Optimizing large-scale Spark SQL jobs
  • Improving slow ETL pipelines
  • Building data quality & monitoring frameworks
  • Snapshot comparison systems for B2B analytics
  • Designing scalable metadata-based ETL workflows

📚 Currently Learning

  • Kubernetes (K8s)
  • Docker
  • GitHub Actions
  • Spark on Kubernetes (future goal)

🛠️ Projects You'll Find Here

  • Automated Snapshot Validation System
  • Databricks Job Monitoring Dashboard
  • Metadata-driven PySpark ETL Framework
  • Delta Lake Optimization Scripts

📫 Contact

If you'd like to collaborate or discuss data engineering ideas, feel free to reach out via my LinkedIn profile: Bala.


Thanks for visiting my profile!

Popular repositories

  1. spring_flight_reserv_app (Public)

    Simple flight reservation app using Spring

    Java 1

  2. Item_Based_collaborative_filtering (Public)

    Sample code for an item-based collaborative filtering recommendation engine

    Scala

  3. Machine-learning-on-spark (Public)

    Trying out machine learning modules on PySpark

    Jupyter Notebook

  4. Machine-learning-101 (Public)

    ML 101

    Jupyter Notebook 1

  5. Dice-Roll-Game (Public)

    Developed using JavaScript

    JavaScript

  6. NodeJs (Public)

    Node.js tutorials

    JavaScript