I'm Kshitij Gupta
Machine Learning Engineer
Specializing in Large Language Models, Deep Learning, and Production ML Systems
I am a Machine Learning Engineer at Chubb, where I design, optimize, and deploy large-scale
LLM-powered systems in production. I graduated from BITS Pilani, Pilani Campus with a
Bachelor’s degree in Electrical and Electronics Engineering.
My work focuses on large language models, agentic AI systems, and high-performance inference.
I have built end-to-end fine-tuning pipelines for LLaMA-3.1 (70B) using PEFT techniques
such as LoRA and QLoRA, combined with RAG over internal domain data. These systems improved
task accuracy by 25%, reduced production drift by 15%, and reliably serve over 10,000 daily
requests under peak load.
I specialize in scalable inference and deployment, using vLLM on Kubernetes (AKS) across
A100 and H100 GPUs. Through KV-cache and GPU memory optimizations, I reduced p95 latency by
40% and increased throughput by 50%, while maintaining strict SLA guarantees. My work also
includes production-grade APIs, CI/CD pipelines, monitoring, and automated scaling for
ML systems.
Previously, I was an NLP Research Intern at Nanyang Technological University, where I worked
on code-switching language models and published at ACIIDS and IALP. My research has also been
accepted through ACL Rolling Review (ARR) and at AACL-IJCNLP. I enjoy building systems that combine
strong theoretical grounding with real-world impact, spanning agentic hiring copilots,
applied NLP research, and large-scale AI infrastructure.
• Fine-tuning at scale (LLaMA-3.1 70B): Built an end-to-end pipeline with PEFT (LoRA/QLoRA) + RAG over internal domain data, improving task accuracy on internal benchmarks by 25% and reducing production drift by 15% over the evaluation window; governed by offline holdout tests and progressive traffic gating.
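The parameter efficiency behind LoRA can be sketched in a few lines: the update to a frozen weight matrix W is factored as ΔW = (α/r)·BA, so only the two low-rank factors A and B are trained. A minimal pure-Python illustration (the production pipeline uses Hugging Face PEFT; the matrix sizes and values below are made up for demonstration):

```python
# Illustrative LoRA merge: W' = W + (alpha / r) * (B @ A).
# Pure-Python sketch; a real pipeline would use the Hugging Face `peft` library.

def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    full = d * k        # every entry of the d x k matrix W is trainable
    lora = r * (d + k)  # only A (r x k) and B (d x r) are trainable
    return full, lora

def apply_lora(W, A, B, alpha: float, r: int):
    """Merge a LoRA adapter into a frozen weight matrix (nested lists)."""
    d, k = len(W), len(W[0])
    scale = alpha / r
    # delta[i][j] = scale * sum_t B[i][t] * A[t][j]
    return [
        [W[i][j] + scale * sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
        for i in range(d)
    ]

full, lora = lora_param_counts(d=8192, k=8192, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

For a single 8192x8192 projection at rank 16, the adapter trains roughly 256x fewer parameters than full fine-tuning, which is what makes 70B-scale adaptation tractable.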
• Agentic AI (orchestration & planning): Integrated a multi-agent workflow directly into the same application. A planner/router decomposes user intents into sub-goals, selects tools (internal search, scraping, structured DB lookups) based on a question taxonomy, and emits step-by-step CoT plans to guide execution.
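The routing step above can be sketched as a small classifier over a question taxonomy that maps each intent category to an ordered tool list. This is an illustrative stand-in only; the category names, cue phrases, and tool names below are hypothetical, and the real planner emits richer CoT plans:

```python
# Minimal planner/router sketch: classify an intent, then emit tool-call steps.
# Categories, cues, and tool names are hypothetical placeholders.

TAXONOMY = {
    "lookup":   ["db_lookup"],                  # structured facts -> database
    "research": ["internal_search", "scrape"],  # open-ended -> search, then scrape
    "compare":  ["internal_search", "db_lookup"],
}

KEYWORDS = {
    "lookup":  ("how many", "what is the", "status of"),
    "compare": ("compare", "versus", "difference"),
}

def classify(intent: str) -> str:
    text = intent.lower()
    for category, cues in KEYWORDS.items():
        if any(cue in text for cue in cues):
            return category
    return "research"  # default bucket for anything unmatched

def plan(intent: str) -> list[str]:
    """Decompose an intent into an ordered list of tool-call steps."""
    category = classify(intent)
    steps = [f"step {i + 1}: call {tool}" for i, tool in enumerate(TAXONOMY[category])]
    return [f"classified as '{category}'"] + steps
```

In practice the classification would itself be an LLM call, but keeping the taxonomy-to-tools mapping explicit makes routing decisions easy to audit.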
• Agentic AI (evidence retrieval & verification): Implemented retrieval/scrape agents for structured and unstructured sources with source-level citation tracking, plus a verification agent that runs CoT-based cross-checks. This improved factual grounding vs. a single-agent baseline by 18% while keeping response times within the application SLA through caching and bounded tool-use.
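The caching and bounded tool-use pattern mentioned above can be sketched as follows; the budget value and the `fetch_evidence` stand-in are hypothetical, but the shape (memoized tool calls plus a hard per-request cap) is the mechanism that keeps verification inside the SLA:

```python
# Sketch of bounded, cached tool use for a verification agent.
from functools import lru_cache

MAX_TOOL_CALLS = 3  # hypothetical per-request budget that bounds added latency

@lru_cache(maxsize=1024)
def fetch_evidence(source: str, claim: str) -> str:
    # Stand-in for a retrieval/scrape tool call; lru_cache means a repeated
    # (source, claim) pair costs nothing on later requests.
    return f"[{source}] supporting text for: {claim}"

def verify(claim: str, sources: tuple[str, ...]) -> dict:
    """Cross-check a claim against sources, citing each one, under a tool budget."""
    citations = [fetch_evidence(s, claim) for s in sources[:MAX_TOOL_CALLS]]
    return {"claim": claim, "citations": citations, "tool_calls": len(citations)}
```

Returning the citations alongside the verdict is what gives each answer source-level traceability.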
• High-performance inference & deployment: Productionized with vLLM on AKS across A100/H100 nodes; GPU memory/KV cache optimizations cut p95 latency by 40% and lifted throughput by 50%, reliably serving 10K+ daily requests under peak load.
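For context on the latency figures above: p95 is a tail metric computed over per-request timings, not an average, which is why KV-cache and memory optimizations show up most clearly there. A self-contained sketch of the nearest-rank method (the sample latencies are made up):

```python
# Computing a tail-latency percentile with the nearest-rank method.
import math

def percentile(samples: list[float], p: float) -> float:
    """Smallest sample value such that at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 80, 95, 400, 110, 105, 90, 85, 300, 100]
print("p50:", percentile(latencies_ms, 50), "ms  p95:", percentile(latencies_ms, 95), "ms")
```

Note how a handful of slow requests dominates p95 while barely moving the median, so a 40% p95 cut reflects real tail behavior, not just faster averages.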
• Integrated CI/CD, monitoring, and automated scaling: Established robust processes that ensure continuous model improvement and reliable production deployments across cloud-based environments.
• Architected scalable data pipelines: Implemented robust data processing solutions using SQL Server, Azure Databricks, and PySpark, cutting processing times by 30% and significantly enhancing overall system performance.
• Developed a language model tailored for English-Malay code-switched data, achieving a 20% accuracy improvement over baseline models by applying statistical and neural data-augmentation techniques.
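One common form of statistical augmentation for code-switched data is lexical substitution: synthesizing mixed-language sentences by swapping words through a bilingual lexicon at a controlled rate. A toy sketch of the idea (the tiny English-Malay lexicon and switch rule here are hypothetical stand-ins, not the published method):

```python
# Toy lexical-substitution augmentation for code-switched text.
import random

EN_MS = {"eat": "makan", "want": "nak", "already": "sudah"}  # illustrative lexicon

def augment(sentence: str, switch_prob: float, seed: int = 0) -> str:
    """Replace known English tokens with Malay equivalents at rate switch_prob."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for tok in sentence.split():
        if tok.lower() in EN_MS and rng.random() < switch_prob:
            out.append(EN_MS[tok.lower()])
        else:
            out.append(tok)
    return " ".join(out)
```

Varying `switch_prob` lets the training set cover a spectrum from monolingual to heavily code-switched sentences.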
• Integrated linguistically informed features, including part-of-speech tagging and grammatical-coherence constraints, improving robustness across diverse code-switching patterns and advancing the state of the art in code-switching language processing.
• Contributed to bilingual communication technologies by applying modern machine learning techniques to code-switching scenarios.
Assemble AI is my open-source initiative for applying AI to real-world problems. Through this platform, I build LLM-powered tools that demonstrate practical applications of modern AI, from personalized content generation to intelligent data processing.
SmartChef is an intelligent recipe-generation system that uses GPT and NLP techniques to create custom recipes from available ingredients, cuisine preferences, dietary restrictions, and nutritional needs.
Technologies: GPT, OpenAI API, Python, NLP, Docker, AWS
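As a rough illustration of the kind of prompt assembly such a system performs before calling the OpenAI API (the function name and field layout below are hypothetical, not SmartChef's actual code):

```python
# Hypothetical sketch: assembling a recipe-generation prompt from user constraints.

def build_recipe_prompt(ingredients, cuisine=None, dietary=(), nutrition=None) -> str:
    """Combine available constraints into a single instruction for the model."""
    parts = [f"Create a recipe using: {', '.join(ingredients)}."]
    if cuisine:
        parts.append(f"Cuisine: {cuisine}.")
    if dietary:
        parts.append(f"Dietary restrictions: {', '.join(dietary)}.")
    if nutrition:
        parts.append(f"Nutritional target: {nutrition}.")
    return " ".join(parts)
```

Keeping each constraint optional means the same builder handles a bare ingredient list and a fully specified request.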
HawkHire is an end-to-end AI-powered hiring copilot that assists recruiters and interview panels with resume normalization, explainable job description matching, and evidence-backed interview analysis. It combines multi-agent orchestration with structured reasoning to deliver auditable, high-confidence hiring decisions.
Technologies: GPT, OpenAI API, Python, Multi-Agent Systems, RAG, NLP, Data Processing
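One way explainable matching can work, sketched here with a simple Jaccard overlap over extracted skills (illustrative only; HawkHire's actual matching logic is richer and LLM-assisted):

```python
# Illustrative explainable resume-to-JD match: score plus the evidence behind it.

def match_score(resume_skills: set[str], jd_skills: set[str]) -> tuple[float, dict]:
    """Jaccard similarity over skill sets, with matched/missing skills exposed."""
    overlap = resume_skills & jd_skills
    union = resume_skills | jd_skills
    score = len(overlap) / len(union) if union else 0.0
    # Returning the evidence alongside the score keeps the decision auditable.
    return score, {"matched": sorted(overlap), "missing": sorted(jd_skills - resume_skills)}
```

Surfacing the matched and missing skills, not just a number, is what makes the recommendation auditable for a hiring panel.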
Interested in collaborating on LLM tools or learning more about these projects?
Python, Java, C, C++, C#, JavaScript
PyTorch, TensorFlow, Transformers, Hugging Face, vLLM, Keras
AWS, Azure, Docker, Kubernetes, Git, GitHub
Flask, Spring Boot, Databricks, Maven, LaTeX
SQL, RDS
Large Language Models, NLP, Computer Vision, MLOps