Senior Systems & Machine Learning Engineer designing large-scale personalization, search, recommendation, and GenAI systems.
- 🇮🇳 Born & raised in Mumbai, India
- 📍 Currently based in Boston, MA, US
- 🧠 ML Systems • Recommenders • GenAI / Agentic Workflows
- ⚡ Focus: retrieval → ranking → serving → experimentation
- 📧 [[email protected]]
- 🔬 AI Research Portfolio (12 projects): https://github.com/DJ92/ai-research-portfolio
- Applied research: Constitutional AI, Safety, Evaluation, RAG, Interpretability
- Foundational: Transformers, Pre-training, RLHF/DPO, LoRA & Quantization
- 📘 Blog & Technical Writing: https://dj92.github.io/interview-notes
- 5 deep dives: Transformer Architecture, Pre-training, RLHF vs DPO, PEFT, Long Context LLMs
- 🧠 ML System Design: https://github.com/DJ92/ml-system-design
- News Feed Ranking system (complete design), 7 more outlined
- 🤖 GenAI Systems: https://github.com/DJ92/genai
- 9 production patterns: RAG, Agents, Guardrails, Multimodal RAG, Cost Optimization
- 🛠 Applied ML: https://github.com/DJ92/applied-ml
- 11 projects: Classical ML, RecSys, Time Series, NLP, RL Fundamentals
- AI Safety & Alignment: Constitutional AI, preference learning (RLHF/DPO), guardrails, red-teaming
- Foundational ML: Transformer architectures, pre-training techniques, parameter-efficient fine-tuning
- LLM Evaluation: Automated metrics, LLM-as-judge, faithfulness analysis, interpretability
- Production GenAI: RAG systems, agentic workflows, tool use, prompt engineering
- ML Serving Systems: Low-latency inference, ranking, retrieval at scale
- Experimentation: Bridging offline metrics ↔ online impact via A/B testing
- ML Infrastructure: Feature stores, embeddings, vector search, training pipelines
Demonstrating both applied AI research and deep foundational knowledge → https://github.com/DJ92/ai-research-portfolio
Foundational Knowledge (Projects 9-12):
-
Transformer Architecture from Scratch - Self-attention, multi-head attention, positional encodings (sinusoidal, learned, RoPE)
- 38 tests, 74% coverage, full GPT implementation with generation
-
Pre-training Techniques - Causal Language Modeling (CLM) & Masked Language Modeling (MLM)
- 11 tests, 87% coverage, 80/10/10 masking strategy
-
Post-Training Methods - Supervised Fine-Tuning, RLHF (reward model + PPO), DPO
- 12 tests, 86% coverage, Bradley-Terry preference learning
-
Parameter-Efficient Fine-Tuning - LoRA implementation, 4-bit/8-bit quantization
- 8 tests, 81% coverage, 99% parameter reduction, 4-8× compression
Alignment & Safety Research (Projects 5-7):
-
Constitutional AI & Preference Learning - Anthropic's critique-revision loop, RLHF simulation
- 92% test coverage, 112% improvement in harmlessness, 82% preference accuracy
-
CoT Faithfulness Analysis - Testing if reasoning actually drives answers via counterfactual interventions
- 91% test coverage, 94% precision detecting unfaithful reasoning
-
Agent Safety & Guardrails - Production safety mechanisms, prompt injection detection
- 94% precision, 88% recall on injection detection, 73% attack reduction
Evaluation & Interpretability (Projects 1, 8):
-
LLM Evaluation Framework - Automated metrics + LLM-as-judge, cost tracking
- 87% coverage, 0.82 judge-human correlation
-
Model Interpretability Toolkit - Attention analysis, logit attribution, perplexity metrics
- r = -0.78 PPL-accuracy correlation, 82% attribution precision
Production Systems (Projects 2-4):
-
Production RAG System - Chunking strategies, retrieval metrics, hybrid search
- 85% MRR@10 with semantic chunking
-
Tool Use & Function Calling - ReAct agents, schema validation
- 94% tool selection accuracy
-
Prompt Engineering Lab - Systematic evaluation of techniques
- Few-shot optimal: 3-5 examples (91% accuracy)
5 Deep Dives on LLM Foundations & Production → https://dj92.github.io/interview-notes
- Transformer Architecture - Self-attention mechanics, positional encodings (sinusoidal/RoPE), O(n²) complexity analysis
- LLM Pre-training - CLM vs MLM, 80/10/10 masking, warmup+cosine decay, Chinchilla scaling laws
- Alignment: RLHF vs DPO - Reward modeling, PPO, Direct Preference Optimization, Constitutional AI
- Parameter-Efficient Fine-Tuning - LoRA (99% param reduction), 4-bit/8-bit quantization, QLoRA
- Long Context LLMs - RAG vs Long Context vs Hybrid, "lost in the middle" problem, cost-quality tradeoffs
Complete Architecture Designs → https://github.com/DJ92/ml-system-design
- News Feed Ranking (COMPLETE) - Two-stage retrieval+ranking, multi-task DNN, <100ms P99, 100M+ DAU scale
- 8 System Designs Outlined - Video recommendations, search ranking, RTB ads, fraud detection, email ranking, explore/exploit, embeddings
Production Patterns → https://github.com/DJ92/genai
- 9 Project Implementations - Production RAG (hybrid search, reranking), agentic code review (ReAct pattern), multi-agent support, prompt optimization, guardrails, semantic code search, document intelligence, cost optimizer (65% reduction), multimodal RAG (88% accuracy on text+image)
Classical to Modern Techniques → https://github.com/DJ92/applied-ml
- 11 Projects - Logistic regression from scratch, decision trees/random forests, gradient boosting, collaborative filtering, neural CF, sequential recommendations (GRU4Rec), ARIMA/LSTM time series, text classification (BERT), NER (BiLSTM-CRF), RL fundamentals (Q-learning → PPO, foundation for RLHF)
- mb-str - Multi-Behavior Sequential Transformer Recommender → https://huggingface.co/Djosh1992/mb-str
- Bias for action, with mechanisms to course-correct quickly
- Treat ML models as production software, not research artifacts
- Prefer simple, well-scoped designs with explicit trade-offs and failure modes
- Make decisions with imperfect data, then validate via experiments and metrics
- Care deeply about end-to-end ownership: data → model → serving → impact
- Balance model quality, latency, and reliability based on user and business goals
- Document assumptions, measure outcomes, and learn in public (within teams)
- Optimize for observability, debuggability, and iteration speed over premature complexity
My work follows a structured progression from foundations to production systems:
┌─────────────────────────────────────────────────────────────────┐
│ FOUNDATIONS (12 projects) │
│ Transformer Architecture • Pre-training • RLHF/DPO • LoRA │
│ ai-research-portfolio/09-12 │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RESEARCH & SAFETY (8 projects) │
│ Constitutional AI • CoT Faithfulness • Agent Safety │
│ LLM Evaluation • Interpretability • RAG • Tool Use │
│ ai-research-portfolio/01-08 │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION GenAI (9 projects + 5 blog posts) │
│ Production RAG • Agents • Guardrails • Multimodal RAG │
│ Cost Optimization • Long Context Strategies │
│ genai/ + interview-notes/ │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ML SYSTEMS (1 complete + 7 designs, 11 projects) │
│ News Feed Ranking • RecSys • Time Series • Classical ML • RL │
│ ml-system-design/ + applied-ml/ │
└─────────────────────────────────────────────────────────────────┘
Flow: Foundation → Research → Production → Systems demonstrates both depth (can build transformers from scratch) and breadth (can deploy at scale).
Research papers implemented across the portfolio:
- ✅ Attention Is All You Need - Transformer architecture (Project 09)
- ✅ RoFormer: Enhanced Transformer with Rotary Position Embedding - RoPE implementation (Project 09)
- ✅ BERT: Pre-training of Deep Bidirectional Transformers - MLM objective (Project 10)
- ✅ Language Models are Unsupervised Multitask Learners - GPT-2 CLM (Project 10)
- ✅ Training Compute-Optimal Large Language Models - Chinchilla scaling laws (Project 10)
- ✅ Training language models to follow instructions with human feedback - InstructGPT/RLHF (Project 11)
- ✅ Direct Preference Optimization - DPO implementation (Project 11)
- ✅ Constitutional AI: Harmlessness from AI Feedback - CAI critique-revision (Project 06)
- ✅ Measuring Faithfulness in Chain-of-Thought Reasoning - CoT faithfulness (Project 07)
- ✅ LoRA: Low-Rank Adaptation of Large Language Models - LoRA implementation (Project 12)
- ✅ QLoRA: Efficient Finetuning of Quantized LLMs - 4-bit quantization (Project 12)
- ✅ Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - RAG system (Project 03, GenAI Project 01)
- ✅ Lost in the Middle: How Language Models Use Long Contexts - Long context analysis (Blog post)
- ✅ ReAct: Synergizing Reasoning and Acting in Language Models - ReAct agents (Project 04, GenAI Project 02)
- ✅ Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - CoT prompting (Project 05)
- ✅ Neural Collaborative Filtering - NCF implementation (Applied ML Project 05)
- ✅ Session-based Recommendations with Recurrent Neural Networks - GRU4Rec (Applied ML Project 06)
- ✅ Proximal Policy Optimization Algorithms - PPO implementation (Applied ML Project 11)
- ✅ Playing Atari with Deep Reinforcement Learning - DQN (Applied ML Project 11)
Total: 22 papers spanning transformers, alignment, efficiency, RAG, agents, and RL.
Quantitative overview of work across all repositories:
- Lines of Code: 6,500+ (excluding tests and configs)
- Test Coverage: 85%+ average across AI research projects
- Documentation: 12,000+ lines across READMEs, blog posts, system designs
- Total Projects: 33 (12 AI research + 9 GenAI + 11 Applied ML + 1 complete system design)
- Papers Implemented: 22 foundational papers
- Quantitative Metrics: Every project includes measurable results (accuracy, latency, cost, coverage)
- Reproducibility: Fixed seeds, documented hyperparameters, requirements files
- Test Coverage by Project:
- Constitutional AI: 92%
- CoT Faithfulness: 91%
- LLM Evaluation: 87%
- Pre-training: 87%
- Post-training: 86%
- PEFT: 81%
- Transformer Architecture: 74%
- System Designs: 1 complete (News Feed Ranking), 7 outlined
- Latency Optimizations: <100ms P99 (News Feed), <500ms P99 (RAG), <50ms (Guardrails)
- Cost Optimizations: 65% reduction (GenAI cost optimizer), 100× cheaper (RAG vs Long Context)
- Scale Targets: 100M+ DAU (News Feed), 10M ratings (RecSys), 1M tokens (Long Context)
- AI Safety: 3 projects (Constitutional AI, CoT Faithfulness, Agent Safety)
- Foundational ML: 4 projects (Transformers, Pre-training, RLHF/DPO, LoRA)
- Production GenAI: 9 projects (RAG, Agents, Guardrails, Multimodal, etc.)
- Classical ML: 11 projects (Logistic Regression → RL Fundamentals)
- Technical Writing: 5 deep-dive blog posts (2,000+ lines)
- Python: Primary (PyTorch, Transformers, LangChain, scikit-learn)
- Infrastructure: Docker, Kubernetes, GCP, AWS
- Monitoring: MLflow, wandb, custom evaluation frameworks
- Serving: FastAPI, TorchServe, model optimization
Portfolio demonstrates: Foundational depth (can build transformers) + Applied research (Constitutional AI, CoT) + Production systems (RAG, ranking, serving at scale).

