
Hi, I’m Dheeraj Joshi (DJ) 👋

Senior Systems & Machine Learning Engineer designing large-scale personalization, search, recommendation, and GenAI systems.

Background

  • 🇮🇳 Born & raised in Mumbai, India
  • 📍 Currently based in Boston, MA, US
  • 🧠 ML Systems • Recommenders • GenAI / Agentic Workflows
  • ⚡ Focus: retrieval → ranking → serving → experimentation
  • 📧 [[email protected]]

What I work on

  • AI Safety & Alignment: Constitutional AI, preference learning (RLHF/DPO), guardrails, red-teaming
  • Foundational ML: Transformer architectures, pre-training techniques, parameter-efficient fine-tuning
  • LLM Evaluation: Automated metrics, LLM-as-judge, faithfulness analysis, interpretability
  • Production GenAI: RAG systems, agentic workflows, tool use, prompt engineering
  • ML Serving Systems: Low-latency inference, ranking, retrieval at scale
  • Experimentation: Bridging offline metrics ↔ online impact via A/B testing
  • ML Infrastructure: Feature stores, embeddings, vector search, training pipelines

Featured Projects

🎯 AI Research Portfolio (12 Projects)

Demonstrating both applied AI research and deep foundational knowledge: https://github.com/DJ92/ai-research-portfolio

Foundational Knowledge (Projects 9-12):

  • Transformer Architecture from Scratch - Self-attention, multi-head attention, positional encodings (sinusoidal, learned, RoPE)

    • 38 tests, 74% coverage, full GPT implementation with generation
  • Pre-training Techniques - Causal Language Modeling (CLM) & Masked Language Modeling (MLM)

    • 11 tests, 87% coverage, 80/10/10 masking strategy
  • Post-Training Methods - Supervised Fine-Tuning, RLHF (reward model + PPO), DPO

    • 12 tests, 86% coverage, Bradley-Terry preference learning
  • Parameter-Efficient Fine-Tuning - LoRA implementation, 4-bit/8-bit quantization

    • 8 tests, 81% coverage, 99% parameter reduction, 4-8× compression
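The 80/10/10 masking strategy used in the pre-training project is the standard BERT-style MLM corruption scheme. A minimal sketch (plain Python, not the repository's code; `MASK_ID` and the vocabulary size are illustrative placeholders):

```python
import random

MASK_ID = 103       # placeholder [MASK] token id; illustrative only
VOCAB_SIZE = 30522  # illustrative vocabulary size

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption: of the ~15% selected positions,
    80% become [MASK], 10% become a random token, 10% stay unchanged."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [-100] * len(tokens)  # -100 = position ignored by the loss
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok        # model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = MASK_ID
            elif roll < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)
            # else: leave the token as-is (the 10% "unchanged" bucket)
    return corrupted, labels
```

Leaving 10% of selected tokens unchanged keeps the model from assuming every non-`[MASK]` input token is correct.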

Alignment & Safety Research (Projects 5-7):

  • Constitutional AI & Preference Learning - Anthropic's critique-revision loop, RLHF simulation

    • 92% test coverage, 112% improvement in harmlessness, 82% preference accuracy
  • CoT Faithfulness Analysis - Testing if reasoning actually drives answers via counterfactual interventions

    • 91% test coverage, 94% precision detecting unfaithful reasoning
  • Agent Safety & Guardrails - Production safety mechanisms, prompt injection detection

    • 94% precision, 88% recall on injection detection, 73% attack reduction
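The precision and recall figures quoted for injection detection follow the standard definitions. A minimal sketch over hypothetical detector outputs (the example ids are made up, not the project's data):

```python
def precision_recall(predicted, actual):
    """predicted / actual: sets of example ids flagged as attacks.
    Precision = flagged-and-real / flagged; recall = flagged-and-real / real."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall

# Detector flags 4 inputs; 3 of them are among the 4 real attacks:
p, r = precision_recall({1, 2, 3, 9}, {1, 2, 3, 4})
print(p, r)  # 0.75 0.75
```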

Evaluation & Interpretability (Projects 1, 8):

  • LLM Evaluation Framework - Automated metrics + LLM-as-judge, cost tracking

    • 87% coverage, 0.82 judge-human correlation
  • Model Interpretability Toolkit - Attention analysis, logit attribution, perplexity metrics

    • r = -0.78 PPL-accuracy correlation, 82% attribution precision
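The perplexity metric used in the interpretability toolkit is the exponential of the mean token-level negative log-likelihood. A minimal sketch (plain Python with illustrative token probabilities, not the toolkit's implementation):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) of the model's
    probabilities for the observed tokens. Lower is better."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning probability 0.25 to every token has perplexity ~4,
# i.e. it is as uncertain as a uniform choice over 4 tokens:
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```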

Production Systems (Projects 2-4):

  • Production RAG System - Chunking strategies, retrieval metrics, hybrid search

    • 85% MRR@10 with semantic chunking
  • Tool Use & Function Calling - ReAct agents, schema validation

    • 94% tool selection accuracy
  • Prompt Engineering Lab - Systematic evaluation of techniques

    • Few-shot optimal: 3-5 examples (91% accuracy)
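The MRR@10 figure quoted for the RAG project averages the reciprocal rank of the first relevant document within the top 10 results, over all queries. A minimal sketch (hypothetical ranked lists, not the project's evaluation harness):

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank: 1/rank of the first relevant hit in the
    top-k results, averaged over queries (0 contribution if no hit)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Query 1 hits at rank 1, query 2 at rank 2: (1 + 0.5) / 2 = 0.75
print(mrr_at_k([["a", "b"], ["x", "y"]], [{"a"}, {"y"}]))  # 0.75
```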

📘 Technical Writing & Blog

5 Deep Dives on LLM Foundations & Production: https://dj92.github.io/interview-notes

  • Transformer Architecture - Self-attention mechanics, positional encodings (sinusoidal/RoPE), O(n²) complexity analysis
  • LLM Pre-training - CLM vs MLM, 80/10/10 masking, warmup+cosine decay, Chinchilla scaling laws
  • Alignment: RLHF vs DPO - Reward modeling, PPO, Direct Preference Optimization, Constitutional AI
  • Parameter-Efficient Fine-Tuning - LoRA (99% param reduction), 4-bit/8-bit quantization, QLoRA
  • Long Context LLMs - RAG vs Long Context vs Hybrid, "lost in the middle" problem, cost-quality tradeoffs
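The RLHF-vs-DPO comparison in the alignment post rests on the Bradley-Terry preference model; DPO turns it into a direct loss on policy log-probabilities, with no reward model or PPO loop. A minimal scalar sketch (plain Python with made-up log-prob values, not the blog's code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid of the beta-scaled
    difference between the policy-vs-reference log-ratios of the
    chosen and rejected responses."""
    ratio_chosen = logp_chosen - ref_logp_chosen
    ratio_rejected = logp_rejected - ref_logp_rejected
    margin = beta * (ratio_chosen - ratio_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log σ(margin)
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does; at zero margin it equals log 2.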

🧠 ML System Design

Complete Architecture Designs: https://github.com/DJ92/ml-system-design

  • News Feed Ranking (COMPLETE) - Two-stage retrieval+ranking, multi-task DNN, <100ms P99, 100M+ DAU scale
  • 7 System Designs Outlined - Video recommendations, search ranking, RTB ads, fraud detection, email ranking, explore/exploit, embeddings
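The two-stage retrieval + ranking pattern behind the News Feed design can be sketched as a cheap candidate-generation pass (dot-product similarity over all items) followed by an expensive scoring pass over only the shortlist. Everything below — the vectors, ids, and scoring function — is a made-up illustration of the pattern, not the design doc's code:

```python
import heapq

def retrieve(query_vec, item_vecs, k=100):
    """Stage 1: cheap top-k candidate generation by dot product."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scored = ((dot(query_vec, vec), item_id)
              for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

def rank(candidates, heavy_score):
    """Stage 2: an expensive model scores only the shortlist."""
    return sorted(candidates, key=heavy_score, reverse=True)
```

The split matters because the stage-2 model is too slow to run over the full corpus; stage 1 trades a little recall for a large latency win.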

🤖 GenAI Applications

Production Patterns: https://github.com/DJ92/genai

  • 9 Project Implementations - Production RAG (hybrid search, reranking), agentic code review (ReAct pattern), multi-agent support, prompt optimization, guardrails, semantic code search, document intelligence, cost optimizer (65% reduction), multimodal RAG (88% accuracy on text+image)

🛠 Applied ML Research

Classical to Modern Techniques: https://github.com/DJ92/applied-ml

  • 11 Projects - Logistic regression from scratch, decision trees/random forests, gradient boosting, collaborative filtering, neural CF, sequential recommendations (GRU4Rec), ARIMA/LSTM time series, text classification (BERT), NER (BiLSTM-CRF), RL fundamentals (Q-learning → PPO, foundation for RLHF)
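The Q-learning → PPO progression starts from the tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. A minimal sketch with toy state/action spaces and illustrative hyperparameters (not the repository's implementation):

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, n_actions=2):
    """One tabular Q-learning step: move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q

q = defaultdict(float)            # Q-values default to 0.0
q = q_update(q, state=0, action=1, reward=1.0, next_state=1)
print(q[(0, 1)])  # 0.1 (alpha * reward, since all Q-values start at zero)
```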

🤗 HuggingFace Model


Tech Stack

Python • PyTorch • GCP • AWS • Kubernetes • Kafka • Postgres


How I think

  • Bias for action, with mechanisms to course-correct quickly
  • Treat ML models as production software, not research artifacts
  • Prefer simple, well-scoped designs with explicit trade-offs and failure modes
  • Make decisions with imperfect data, then validate via experiments and metrics
  • Care deeply about end-to-end ownership: data → model → serving → impact
  • Balance model quality, latency, and reliability based on user and business goals
  • Document assumptions, measure outcomes, and learn in public (within teams)
  • Optimize for observability, debuggability, and iteration speed over premature complexity

Portfolio Architecture

My work follows a structured progression from foundations to production systems:

┌─────────────────────────────────────────────────────────────────┐
│                    FOUNDATIONS (4 projects)                      │
│  Transformer Architecture • Pre-training • RLHF/DPO • LoRA      │
│              ai-research-portfolio/09-12                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              RESEARCH & SAFETY (8 projects)                      │
│  Constitutional AI • CoT Faithfulness • Agent Safety             │
│  LLM Evaluation • Interpretability • RAG • Tool Use              │
│              ai-research-portfolio/01-08                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│           PRODUCTION GenAI (9 projects + 5 blog posts)           │
│  Production RAG • Agents • Guardrails • Multimodal RAG           │
│  Cost Optimization • Long Context Strategies                     │
│         genai/ + interview-notes/                                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│          ML SYSTEMS (1 complete + 7 designs, 11 projects)        │
│  News Feed Ranking • RecSys • Time Series • Classical ML • RL    │
│         ml-system-design/ + applied-ml/                          │
└─────────────────────────────────────────────────────────────────┘

Flow: Foundation → Research → Production → Systems demonstrates both depth (can build transformers from scratch) and breadth (can deploy at scale).


Papers Implemented

Research papers implemented across the portfolio:

Transformers & Attention

Pre-training & Fine-tuning

Alignment & Safety

Parameter-Efficient Fine-tuning

Retrieval & RAG

Agents & Tool Use

Recommendation Systems

Reinforcement Learning

Total: 22 papers spanning transformers, alignment, efficiency, RAG, agents, and RL.


Portfolio Metrics

Quantitative overview of work across all repositories:

Code & Documentation

  • Lines of Code: 6,500+ (excluding tests and configs)
  • Test Coverage: 85%+ average across AI research projects
  • Documentation: 12,000+ lines across READMEs, blog posts, system designs
  • Total Projects: 33 (12 AI research + 9 GenAI + 11 Applied ML + 1 complete system design)

Research Quality

  • Papers Implemented: 22 foundational papers
  • Quantitative Metrics: Every project includes measurable results (accuracy, latency, cost, coverage)
  • Reproducibility: Fixed seeds, documented hyperparameters, requirements files
  • Test Coverage by Project:
    • Constitutional AI: 92%
    • CoT Faithfulness: 91%
    • LLM Evaluation: 87%
    • Pre-training: 87%
    • Post-training: 86%
    • PEFT: 81%
    • Transformer Architecture: 74%

Production Readiness

  • System Designs: 1 complete (News Feed Ranking), 7 outlined
  • Latency Optimizations: <100ms P99 (News Feed), <500ms P99 (RAG), <50ms (Guardrails)
  • Cost Optimizations: 65% reduction (GenAI cost optimizer), 100× cheaper (RAG vs Long Context)
  • Scale Targets: 100M+ DAU (News Feed), 10M ratings (RecSys), 1M tokens (Long Context)

Knowledge Breadth

  • AI Safety: 3 projects (Constitutional AI, CoT Faithfulness, Agent Safety)
  • Foundational ML: 4 projects (Transformers, Pre-training, RLHF/DPO, LoRA)
  • Production GenAI: 9 projects (RAG, Agents, Guardrails, Multimodal, etc.)
  • Classical ML: 11 projects (Logistic Regression → RL Fundamentals)
  • Technical Writing: 5 deep-dive blog posts (2,000+ lines)

Languages & Frameworks

  • Python: Primary (PyTorch, Transformers, LangChain, scikit-learn)
  • Infrastructure: Docker, Kubernetes, GCP, AWS
  • Monitoring: MLflow, wandb, custom evaluation frameworks
  • Serving: FastAPI, TorchServe, model optimization

Portfolio demonstrates: Foundational depth (can build transformers) + Applied research (Constitutional AI, CoT) + Production systems (RAG, ranking, serving at scale).

