ML/AI Engineer & Researcher

Master's student in Computer Science at UC San Diego, specializing in Artificial Intelligence. Experienced in building scalable ML systems, RAG pipelines, and deploying production-ready AI solutions.

3.97 MS CGPA (UCSD) · 15+ Projects · 4 Internships

Experience

Software Engineer Intern

Artemis Life Sciences · San Diego, CA
Jul 2025 – Sep 2025
Built AI-powered FDA regulatory system processing 3,000+ documents with RAG and vector search
  • Built a large-scale FDA regulatory intelligence system by implementing PDF chunking (3,000+ documents) and metadata extraction pipelines using OpenAI models
  • Designed and deployed Pinecone-based vector indexing for both PDF-derived content and heterogeneous JSON datasets from multiple FDA databases
  • Developed an interactive Streamlit chat interface integrated with GPT-5.1 and the RAG system, accelerating medical device submission drafting across 15+ categories
Python OpenAI Pinecone Streamlit RAG AWS S3 GPT-5.1

Research Assistant

CAIDA, SDSC · UC San Diego
May 2025 – Jul 2025
Advanced LLM research extracting structured metadata from 30+ papers using transformer models
  • Analyzed 30+ scientific research papers using transformer-based LLMs (Llama 3, DeepSeek, Gemma) to extract structured metadata
  • Formulated and evaluated multi-shot prompting strategies, including few-shot, retrieval-augmented, and context-aware techniques
  • Built end-to-end pipelines for benchmarking LLM-driven metadata extraction models supporting reproducible internet measurement research
Python Llama 3 DeepSeek Gemma NLP RAG Transformers

Machine Learning Researcher

IIT Madras
Apr 2023 – Sep 2023
Created music analysis ML model achieving 90.3% accuracy on 6,500+ Carnatic percussion samples
  • Performed research on stroke identification in Carnatic music concerts, specializing in Mridangam analysis
  • Developed a Random Forest classifier achieving 90.3% accuracy on 6,500+ single-stroke audio samples by extracting 8 time-frequency features
  • Led data preprocessing and feature engineering efforts contributing to real-time transcription for South Indian classical percussion
Python TensorFlow Essentia Librosa Scikit-Learn Signal Processing

NLP Engineer Intern

Willings Inc. · Tokyo, Japan (Remote)
Jun 2022 – Jul 2022
Delivered production NLP system classifying resumes with 85% accuracy in 6 weeks
  • Built a resume categorization system using BERT and spaCy, classifying resumes into 8 job categories with 85% accuracy
  • Designed and deployed an OpenAPI-based endpoint to support real-time job category retrieval
  • Collaborated with a cross-functional team to deliver a production-ready solution within a 6-week timeline
Python BERT spaCy NLP OpenAPI Scikit-Learn

Projects

Document Q&A with RAG using Fine-Tuned LLaMA-2

🤖

Built a Retrieval-Augmented Generation pipeline with a 4-bit quantized LLaMA-2 model fine-tuned using Low-Rank Adaptation (LoRA). Integrated a FAISS vector index for efficient document retrieval and similarity search. Fine-tuned with cosine learning rate scheduling, and reduced memory use and latency through dynamic batching and quantized inference.

LLaMA-2 RAG FAISS LoRA HuggingFace PEFT BitsAndBytes

Real-Time Fraud Detection Pipeline

Built a real-time fraud detection system using Apache Kafka, PySpark Structured Streaming, and FastAPI, capable of identifying fraudulent transactions with over 95% precision on an imbalanced dataset. Trained a class-imbalance-aware XGBoost model and deployed it as a lightweight microservice using Docker, achieving sub-50ms prediction latency. The pipeline scales to 1M+ transactions/day.

Apache Kafka PySpark FastAPI XGBoost Docker MLOps

LearningSnake - Deep Reinforcement Learning

🐍

Developed a reinforcement learning agent using a Deep Q-Network (DQN) architecture to play a custom-built Snake game. Designed and trained a neural network with checkpointing, experience replay, and fine-tuning strategies, enabling the agent to achieve superhuman performance with an average high score of 150 across 120 games.

PyTorch Deep Q-Learning NumPy Matplotlib Reinforcement Learning

Obesity Risk Prediction Dashboard

📊

Built and deployed an interactive Streamlit web application that predicts obesity risk from 14 health and lifestyle features using machine learning models with dynamic hyperparameter tuning. Enhanced model explainability with LIME, SHAP, and PDPBox to visualize feature impact and provide transparent health risk insights for end users.

Streamlit XGBoost SHAP LIME PDPBox Scikit-Learn

Streamboard - Data Visualization Dashboard

📈

Interactive data visualization dashboard built with Streamlit for exploratory data analysis and ML model deployment. Features comprehensive data analysis tools, machine learning model integration, and dynamic visualization capabilities using Pandas, XGBoost, and Seaborn for professional-grade data insights.

Streamlit Pandas XGBoost Seaborn Data Visualization

To-Do-Stream - Task Management App

A web-based to-do list application built with Streamlit and SQLite3. Features a clean, intuitive interface for task management with persistent storage in a SQLite database. Implements CRUD operations for managing tasks, with real-time updates and a responsive design.

Streamlit SQLite3 Python Database

Placement-Gyaan Portal

🎓

A comprehensive placement preparation portal built with a Python-Flask backend, featuring real-time communication via sockets, RESTful API integration, and a responsive frontend using HTML, CSS, and JavaScript. Includes YAML-based configuration management and structured data handling for student placement resources and company information.

Python-Flask Sockets YAML API HTML CSS JavaScript

Flask Flash Cards

🎴

Full-stack flashcard application built with a Python-Flask backend and a Vue.js frontend. Features asynchronous task processing with Celery, YAML-based configuration, and a modern responsive interface. Implements spaced repetition algorithms and user authentication for personalized learning experiences.

Python-Flask Vue.js Celery HTML CSS JavaScript YAML

Opticals Data Analysis

👓

Comprehensive data analysis project using Jupyter Notebook to analyze optical sales and customer data. Applies statistical analysis, data visualization, and predictive modeling techniques to derive actionable business insights. Features exploratory data analysis, trend identification, and customer segmentation using the Python data science stack.

Jupyter Python Data Analysis Pandas Matplotlib Seaborn

Conway's Game of Life

🎮

Implementation of Conway's Game of Life cellular automaton using Python and PyGame. Features interactive grid manipulation, multiple pattern presets, adjustable simulation speed, and real-time visualization. Demonstrates computational thinking and algorithmic problem-solving in simulating complex emergent behaviors from simple rules.

Python PyGame Algorithms Simulation

Virtual Assistant

🎙️

Voice-activated virtual assistant built with Python using speech recognition and text-to-speech capabilities. Features include web browsing automation, system command execution, information retrieval, and natural language processing. Integrates multiple APIs for weather, news, and general queries with hands-free voice control.

Python Speech Recognition NLP Web Browser API Integration

Calculator Extension

🔢

Browser extension calculator built with vanilla HTML, CSS, and JavaScript. Features a clean, responsive interface with support for basic arithmetic operations, keyboard shortcuts, and memory functions. Demonstrates proficiency in browser extension development and front-end engineering fundamentals.

HTML CSS JavaScript Browser Extension

Technical Skills

Get In Touch

I'm currently pursuing my Master's in Computer Science at UC San Diego and actively seeking opportunities in ML/AI engineering and research. Feel free to reach out!

Location: San Diego, CA, USA