Dheeraj Joshi DJ92

Hi, I’m Dheeraj Joshi (DJ) 👋

Senior Systems & Machine Learning Engineer designing large-scale personalization, search, recommendation, and GenAI systems.

Background

🇮🇳 Born & raised in Mumbai, India
📍 Currently based in Boston, MA, US
🧠 ML Systems • Recommenders • GenAI / Agentic Workflows
⚡ Focus: retrieval → ranking → serving → experimentation
📧 [[email protected]]

Where to start

🔬 AI Research Portfolio (12 projects): https://github.com/DJ92/ai-research-portfolio
- Applied research: Constitutional AI, Safety, Evaluation, RAG, Interpretability
- Foundational: Transformers, Pre-training, RLHF/DPO, LoRA & Quantization
📘 Blog & Technical Writing: https://dj92.github.io/interview-notes
- 5 deep dives: Transformer Architecture, Pre-training, RLHF vs DPO, PEFT, Long Context LLMs
🧠 ML System Design: https://github.com/DJ92/ml-system-design
- News Feed Ranking system (complete design), 7 more outlined
🤖 GenAI Systems: https://github.com/DJ92/genai
- 9 production patterns: RAG, Agents, Guardrails, Multimodal RAG, Cost Optimization
🛠 Applied ML: https://github.com/DJ92/applied-ml
- 11 projects: Classical ML, RecSys, Time Series, NLP, RL Fundamentals

What I work on

AI Safety & Alignment: Constitutional AI, preference learning (RLHF/DPO), guardrails, red-teaming
Foundational ML: Transformer architectures, pre-training techniques, parameter-efficient fine-tuning
LLM Evaluation: Automated metrics, LLM-as-judge, faithfulness analysis, interpretability
Production GenAI: RAG systems, agentic workflows, tool use, prompt engineering
ML Serving Systems: Low-latency inference, ranking, retrieval at scale
Experimentation: Bridging offline metrics ↔ online impact via A/B testing
ML Infrastructure: Feature stores, embeddings, vector search, training pipelines

Featured Projects

🎯 AI Research Portfolio (12 Projects)

Demonstrating both applied AI research and deep foundational knowledge → https://github.com/DJ92/ai-research-portfolio

Foundational Knowledge (Projects 9-12):

Transformer Architecture from Scratch - Self-attention, multi-head attention, positional encodings (sinusoidal, learned, RoPE)
- 38 tests, 74% coverage, full GPT implementation with generation
Pre-training Techniques - Causal Language Modeling (CLM) & Masked Language Modeling (MLM)
- 11 tests, 87% coverage, 80/10/10 masking strategy
Post-Training Methods - Supervised Fine-Tuning, RLHF (reward model + PPO), DPO
- 12 tests, 86% coverage, Bradley-Terry preference learning
Parameter-Efficient Fine-Tuning - LoRA implementation, 4-bit/8-bit quantization
- 8 tests, 81% coverage, 99% parameter reduction, 4-8× compression

Alignment & Safety Research (Projects 5-7):

Constitutional AI & Preference Learning - Anthropic's critique-revision loop, RLHF simulation
- 92% test coverage, 112% improvement in harmlessness, 82% preference accuracy
CoT Faithfulness Analysis - Testing if reasoning actually drives answers via counterfactual interventions
- 91% test coverage, 94% precision detecting unfaithful reasoning
Agent Safety & Guardrails - Production safety mechanisms, prompt injection detection
- 94% precision, 88% recall on injection detection, 73% attack reduction

Evaluation & Interpretability (Projects 1, 8):

LLM Evaluation Framework - Automated metrics + LLM-as-judge, cost tracking
- 87% coverage, 0.82 judge-human correlation
Model Interpretability Toolkit - Attention analysis, logit attribution, perplexity metrics
- r = -0.78 PPL-accuracy correlation, 82% attribution precision

Production Systems (Projects 2-4):

Production RAG System - Chunking strategies, retrieval metrics, hybrid search
- 85% MRR@10 with semantic chunking
Tool Use & Function Calling - ReAct agents, schema validation
- 94% tool selection accuracy
Prompt Engineering Lab - Systematic evaluation of techniques
- Few-shot optimal: 3-5 examples (91% accuracy)

📘 Technical Writing & Blog

5 Deep Dives on LLM Foundations & Production → https://dj92.github.io/interview-notes

Transformer Architecture - Self-attention mechanics, positional encodings (sinusoidal/RoPE), O(n²) complexity analysis
LLM Pre-training - CLM vs MLM, 80/10/10 masking, warmup+cosine decay, Chinchilla scaling laws
Alignment: RLHF vs DPO - Reward modeling, PPO, Direct Preference Optimization, Constitutional AI
Parameter-Efficient Fine-Tuning - LoRA (99% param reduction), 4-bit/8-bit quantization, QLoRA
Long Context LLMs - RAG vs Long Context vs Hybrid, "lost in the middle" problem, cost-quality tradeoffs

🧠 ML System Design

Complete Architecture Designs → https://github.com/DJ92/ml-system-design

News Feed Ranking (COMPLETE) - Two-stage retrieval+ranking, multi-task DNN, <100ms P99, 100M+ DAU scale
8 System Designs Outlined - Video recommendations, search ranking, RTB ads, fraud detection, email ranking, explore/exploit, embeddings

🤖 GenAI Applications

Production Patterns → https://github.com/DJ92/genai

9 Project Implementations - Production RAG (hybrid search, reranking), agentic code review (ReAct pattern), multi-agent support, prompt optimization, guardrails, semantic code search, document intelligence, cost optimizer (65% reduction), multimodal RAG (88% accuracy on text+image)

🛠 Applied ML Research

Classical to Modern Techniques → https://github.com/DJ92/applied-ml

11 Projects - Logistic regression from scratch, decision trees/random forests, gradient boosting, collaborative filtering, neural CF, sequential recommendations (GRU4Rec), ARIMA/LSTM time series, text classification (BERT), NER (BiLSTM-CRF), RL fundamentals (Q-learning → PPO, foundation for RLHF)

🤗 HuggingFace Model

mb-str - Multi-Behavior Sequential Transformer Recommender → https://huggingface.co/Djosh1992/mb-str

Tech Stack

How I think

Bias for action, with mechanisms to course-correct quickly
Treat ML models as production software, not research artifacts
Prefer simple, well-scoped designs with explicit trade-offs and failure modes
Make decisions with imperfect data, then validate via experiments and metrics
Care deeply about end-to-end ownership: data → model → serving → impact
Balance model quality, latency, and reliability based on user and business goals
Document assumptions, measure outcomes, and learn in public (within teams)
Optimize for observability, debuggability, and iteration speed over premature complexity

Portfolio Architecture

My work follows a structured progression from foundations to production systems:

┌─────────────────────────────────────────────────────────────────┐
│                    FOUNDATIONS (12 projects)                     │
│  Transformer Architecture • Pre-training • RLHF/DPO • LoRA      │
│              ai-research-portfolio/09-12                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              RESEARCH & SAFETY (8 projects)                      │
│  Constitutional AI • CoT Faithfulness • Agent Safety             │
│  LLM Evaluation • Interpretability • RAG • Tool Use              │
│              ai-research-portfolio/01-08                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│           PRODUCTION GenAI (9 projects + 5 blog posts)           │
│  Production RAG • Agents • Guardrails • Multimodal RAG           │
│  Cost Optimization • Long Context Strategies                     │
│         genai/ + interview-notes/                                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│          ML SYSTEMS (1 complete + 7 designs, 11 projects)        │
│  News Feed Ranking • RecSys • Time Series • Classical ML • RL    │
│         ml-system-design/ + applied-ml/                          │
└─────────────────────────────────────────────────────────────────┘

Flow: Foundation → Research → Production → Systems demonstrates both depth (can build transformers from scratch) and breadth (can deploy at scale).

Papers Implemented

Research papers implemented across the portfolio:

Total: 22 papers spanning transformers, alignment, efficiency, RAG, agents, and RL.

Portfolio Metrics

Quantitative overview of work across all repositories:

Code & Documentation

Lines of Code: 6,500+ (excluding tests and configs)
Test Coverage: 85%+ average across AI research projects
Documentation: 12,000+ lines across READMEs, blog posts, system designs
Total Projects: 33 (12 AI research + 9 GenAI + 11 Applied ML + 1 complete system design)

Research Quality

Papers Implemented: 22 foundational papers
Quantitative Metrics: Every project includes measurable results (accuracy, latency, cost, coverage)
Reproducibility: Fixed seeds, documented hyperparameters, requirements files
Test Coverage by Project:
- Constitutional AI: 92%
- CoT Faithfulness: 91%
- LLM Evaluation: 87%
- Pre-training: 87%
- Post-training: 86%
- PEFT: 81%
- Transformer Architecture: 74%

Production Readiness

System Designs: 1 complete (News Feed Ranking), 7 outlined
Latency Optimizations: <100ms P99 (News Feed), <500ms P99 (RAG), <50ms (Guardrails)
Cost Optimizations: 65% reduction (GenAI cost optimizer), 100× cheaper (RAG vs Long Context)
Scale Targets: 100M+ DAU (News Feed), 10M ratings (RecSys), 1M tokens (Long Context)

Knowledge Breadth

AI Safety: 3 projects (Constitutional AI, CoT Faithfulness, Agent Safety)
Foundational ML: 4 projects (Transformers, Pre-training, RLHF/DPO, LoRA)
Production GenAI: 9 projects (RAG, Agents, Guardrails, Multimodal, etc.)
Classical ML: 11 projects (Logistic Regression → RL Fundamentals)
Technical Writing: 5 deep-dive blog posts (2,000+ lines)

Languages & Frameworks

Python: Primary (PyTorch, Transformers, LangChain, scikit-learn)
Infrastructure: Docker, Kubernetes, GCP, AWS
Monitoring: MLflow, wandb, custom evaluation frameworks
Serving: FastAPI, TorchServe, model optimization

Portfolio demonstrates: Foundational depth (can build transformers) + Applied research (Constitutional AI, CoT) + Production systems (RAG, ranking, serving at scale).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dheeraj Joshi DJ92

Achievements

Achievements

Block or report DJ92

Hi, I’m Dheeraj Joshi (DJ) 👋

Background

Where to start

What I work on

Featured Projects

🎯 AI Research Portfolio (12 Projects)

📘 Technical Writing & Blog

🧠 ML System Design

🤖 GenAI Applications

🛠 Applied ML Research

🤗 HuggingFace Model

Tech Stack

How I think

Portfolio Architecture

Papers Implemented

Transformers & Attention

Pre-training & Fine-tuning

Alignment & Safety

Parameter-Efficient Fine-tuning

Retrieval & RAG

Agents & Tool Use

Recommendation Systems

Reinforcement Learning

Portfolio Metrics

Code & Documentation

Research Quality

Production Readiness

Knowledge Breadth

Languages & Frameworks

GitHub Stats

Popular repositories Loading

Uh oh!