CMU · Carnegie Mellon University · M.S. Computational Data Science

Sai Gopal Reddy Kovvuri

At CMU, I work on machine learning systems spanning LLM inference, compiler-backed kernels, information retrieval, and distributed ML infrastructure.

About

Researcher and engineer at the intersection of ML and systems.

I am Sai Gopal Reddy Kovvuri, a Master's student in Computational Data Science at Carnegie Mellon University (CMU) and a Research Assistant with both the CX Group and the Catalyst Group. My current work focuses on compiler and inference components across open-source ML systems, including Apache TVM, MLC-LLM, WebLLM, and FlashInfer-Bench.

Previously, I was a Product Engineer at Juspay, where I built production payment systems and RAG-based developer tooling. I completed my B.Tech in Computer Science at Shiv Nadar University with high distinction, and my research has appeared at BMVC and NCC.

LLM inference · ML compilers · GPU kernels · Information retrieval · RAG systems · Distributed ML

Education

Academic Background

Carnegie Mellon University

M.S. in Computational Data Science

Aug 2025 - Dec 2026

Coursework includes Search Engines, Large Language Models Applications, Cloud Computing, and Machine Learning Systems.

Shiv Nadar University

B.Tech in Computer Science

Aug 2020 - May 2024

Graduated with a CGPA of 9.12/10, high distinction, and four Dean's List honors.

Experience

Research and Engineering

Feb 2026 - Present
Carnegie Mellon University

Research Assistant, CX Group

  • Engineering a distributed web crawler with fine-tuned LLM raters to acquire high-quality pretraining data at scale, with RL and reward-model filtering.

Feb 2026 - Present
Carnegie Mellon University

Research Assistant, Catalyst Group

  • Built compiler and inference components across Apache TVM, MLC-LLM, and WebLLM to improve deployment performance and device compatibility.
  • Extended FlashInfer-Bench for newer model architectures and real SGLang inference workloads, improving benchmarking coverage.
  • Built FP16 GEMM kernels for NVIDIA Blackwell GPUs using TVM/TIR and Python kernel DSLs.

Jun 2024 - Jul 2025
Juspay

Product Engineer, Juspay Technologies

  • Built CodeGen, an internal RAG-based developer tool that grounded LLMs in the codebase and automated 28% of payment gateway integration work.
  • Integrated six payment gateways into the payment orchestrator, maintaining critical payment logic and encryption workflows.
  • Enabled On-Us transaction processing for HSBC, reducing per-transaction network fees through direct in-network routing.

Dec 2023 - May 2024
Juspay

Product Engineer Intern, Juspay Technologies

  • Used Kibana and structured logging to analyze transaction logs and surface Redis cache performance issues per API flow.
  • Contributed to microservices handling 175M+ daily transactions by delivering product requirements, resolving production issues, and improving reliability.

Jul 2023 - Aug 2023
Code for GovTech

Data Science Intern, Code for GovTech 2023

  • Built an on-demand data generation and fine-tuning pipeline for Hugging Face models from natural-language user prompts.
  • Applied Stanford NLP's Demonstrate-Search-Predict framework to improve responses for government scheme queries.
  • Built fuzzy-matching retrieval that improved rural village document matching accuracy.

Work

Projects

Cloud Computing, CMU

Twitter Recommendation Engine

Architected a three-tier Go microservice with gRPC serving 1,000+ RPS, backed by Spark/Scala ETL, MySQL, and AWS infrastructure managed with Terraform, Helm, and Kubernetes.

Machine Learning Systems, CMU

Distributed Training Systems

Implemented 2D parallel training from scratch using MPI collectives and Megatron-style tensor parallel communication, with ZeRO Stage-3 parameter sharding.

Search Engines, CMU

QryEval

Built an end-to-end neural search and RAG engine using BM25, dense retrieval, BERT reranking, SVMRank, and pseudo-relevance feedback.

Blockchain Application

WarrantyNFT

Blockchain-based platform for NFT warranty cards, enabling issuance, transfer control, and warranty management.

NLP Toolkit

AutoTuneNLP

A toolkit for seamless data generation and fine-tuning of NLP models, packed into a single convenient workflow.

Publications

Peer-Reviewed Research

National Conference on Communications, 2025

Revisiting Subject-Action Relevance for Egocentric Activity Recognition

Reddy, K.S.G., Prabhakar, M., and Mukherjee, S. Dual-stream CNN-LSTM approach for egocentric activity recognition.

IEEE Xplore

British Machine Vision Conference, 2024

UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters

Reddy, K.S.G., Bodduluri, S., Adityaja, A.M., Shigwan, S., Kumar, N., and Mukherjee, S. Graph-neural unsupervised image segmentation with ARMA filters.

BMVC Proceedings

Skills

Technical Toolkit

Programming

Python, Go, C++

ML and Data

PyTorch, scikit-learn, Hugging Face, NumPy, Pandas, FAISS

Systems and Cloud

AWS, Azure, GCP, Docker, Kubernetes, Terraform, Helm

Databases and Tools

MySQL, PostgreSQL, MongoDB, Redis, Git, Kibana