Saurabh Ghatnekar

The Short Version

I’m an AI systems engineer progressing from data engineering through LLMOps and GPU engineering toward RL infrastructure — the most complex distributed systems problem in AI today.

This is not a story of pivots and detours. It’s a deliberate, logical progression toward a single, ambitious goal. Each layer builds on the last.


The Four Layers

My career maps to a 4-layer stack. Each layer solves a harder problem, and each one requires everything from the layers below it.

  • Layer 4 — RL Infrastructure Expert: train a million agents to learn overnight (the goal)
  • Layer 3 — GPU Engineer: make models run 10x faster on specific hardware (next)
  • Layer 2 — LLMOps Engineer: serve, monitor, and update LLMs in production (in progress)
  • Layer 1 — Data Engineer: move, store, and process massive data reliably (completed)

How Each Layer Builds

Think of it as building a skyscraper. Each stage adds a more specialized layer on top of a solid foundation.

Layer 1 → Layer 2: Data Engineering → LLMOps

My foundational skills in building robust, scalable data pipelines directly applied to the unique challenges of LLMs:

  • ETL pipelines became the foundation for RAG and embedding pipelines
  • Data quality and monitoring became the foundation for LLM observability
  • Scalable systems knowledge enabled reliable inference APIs and agentic workflows
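The first of those bullets can be made concrete: a RAG embedding pipeline has the same extract/transform/load shape as a classic ETL job. Here's a minimal sketch — the `embed` function is a toy stand-in I've invented for illustration; a real pipeline would call an embedding model and write to a real vector store:

```python
def embed(text):
    # Stand-in embedder (illustrative only): a real pipeline would call an
    # embedding model here and get back a high-dimensional float vector.
    return [len(text), sum(map(ord, text)) % 1000]

def extract(docs):
    # "Extract": pull raw text from a source (here, an in-memory list),
    # dropping empty records.
    return [d.strip() for d in docs if d.strip()]

def transform(texts, chunk_size=100):
    # "Transform": split each document into fixed-size chunks and attach
    # a vector to each chunk.
    chunks = []
    for text in texts:
        for i in range(0, len(text), chunk_size):
            chunk = text[i:i + chunk_size]
            chunks.append({"text": chunk, "vector": embed(chunk)})
    return chunks

def load(chunks, index):
    # "Load": write chunk + vector records into a (toy) vector index.
    index.extend(chunks)
    return index
```

The point is structural: the same pipeline discipline — idempotent stages, clean boundaries between extract, transform, and load — carries over unchanged; only the transform step (chunking + embedding) is new.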

Layer 2 → Layer 3: LLMOps → GPU Engineering

This is the pivot from the application layer to the hardware layer — moving from using the tools to understanding how they work. It’s driven by asking “why?”:

  • “Why is my vector search slow?” → Because the indexing algorithm causes non-coalesced memory access on the GPU, and the kernel isn’t optimized for this hardware’s cache size.
  • “How can I serve more concurrent users?” → By implementing paged attention and speculative decoding at the kernel level.

LLMOps gives you the problems. GPU engineering gives you the solutions.
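The second question above can be sketched in miniature. Below is a toy greedy speculative-decoding loop: a cheap draft model proposes k tokens autoregressively, and the target model verifies them, accepting the longest correct prefix. Both "models" here are deterministic toy functions I've invented for illustration — in a real system the verification step is a single batched forward pass on the GPU, which is exactly where the kernel-level work pays off:

```python
def target_next(prefix):
    # Toy "target" model: next token is the sum of the last two tokens mod 10.
    return (prefix[-1] + prefix[-2]) % 10

def draft_next(prefix):
    # Toy "draft" model: agrees with the target except after a 0 token,
    # so the rejection path below actually gets exercised.
    if prefix[-1] == 0:
        return 7
    return (prefix[-1] + prefix[-2]) % 10

def speculative_decode(prefix, n_tokens, k=4):
    out = list(prefix)
    while len(out) < len(prefix) + n_tokens:
        # 1. Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposals; accept the longest correct
        #    prefix, then substitute the target's token at the first miss.
        accepted, ctx = [], list(out)
        for t in proposal:
            correct = target_next(ctx)
            if t == correct:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(correct)
                break
        out.extend(accepted)
    return out[:len(prefix) + n_tokens]
```

Because rejected proposals are replaced by the target's own token, the output is bit-identical to plain greedy decoding with the target model — the speedup comes purely from verifying several positions per target pass instead of one.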

Layer 3 → Layer 4: GPU Engineering → RL Infrastructure

RL Infrastructure is the final and most important leap. It’s the most complex distributed systems problem in AI, and it requires mastery of all previous layers:

  • RL requires massive data pipelines for experience replay, often petabytes in scale — a Data Engineering problem
  • RL requires serving multiple models (actors, critics, world models) with different performance characteristics — an LLMOps problem
  • RL requires extreme performance optimization to make training feasible — a GPU Engineering problem
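The first of those requirements can be sketched at toy scale. Here's a minimal in-memory experience replay buffer — my own illustrative sketch, not production code; at the petabyte scale described above, the same interface would be backed by sharded, distributed storage rather than a single deque:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (toy, single-process sketch)."""

    def __init__(self, capacity):
        # A bounded deque: once full, the oldest experience is evicted.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive environment steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

The hard infrastructure problems start exactly where this sketch stops: thousands of actors appending concurrently, learners sampling without contention, and eviction and prioritization policies over storage no single machine can hold.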

My Journey: From Data to Decisions

My career has been a deliberate progression up the stack, from the fundamentals of data to the frontiers of AI infrastructure. Each step has been driven by a desire to solve harder problems and build more capable systems.

The Foundation: Data Engineering

My career began in data engineering, where I learned the fundamental principles of building robust, scalable, and reliable systems. I spent five years designing and implementing the data pipelines that power large-scale applications, mastering the art of moving and processing massive datasets. This experience taught me to think in terms of systems, not just code.

The Application: LLMOps

As large language models emerged, I pivoted to LLMOps, applying my systems-thinking to the unique challenges of deploying and managing AI in production. At Measurebit, I built the infrastructure for agentic workflows and LLM observability, learning how to bridge the gap between research and real-world application. This is where I first encountered the deep infrastructure challenges that limit the potential of AI.

The Deep Dive: GPU Engineering

To solve these challenges, I need to go deeper. I’m now studying GPU architecture, CUDA programming, and distributed training paradigms — moving from using the tools to understanding how they work at a low level. This is the pivot from the application layer to the hardware layer, driven by the need to optimize performance at its most fundamental level.

The Frontier: RL Infrastructure

The end goal is the most complex distributed systems problem in AI: Reinforcement Learning Infrastructure. Training a million agents to learn a new skill requires mastery of the entire stack, from the high-level RL algorithm down to the low-level CUDA kernel. This is where every layer of my career converges, and I believe it’s where the most important work in AI will be done over the next decade.


Currently

  • Senior Data Engineer @ Measurebit — building data and AI systems
  • MTech in Data Engineering @ IIT Jodhpur — GenAI + RL specialization
  • Writing about what I learn as I progress through each layer
  • Currently at Layer 2 (LLMOps), building toward Layer 3 (GPU Engineering)

Get in Touch

I’m always happy to connect with others working on AI infrastructure, distributed systems, or the path from data engineering to RL.

LAUNCHING MARCH 2026

Build an AI Agent in 5 Days

10 minutes a day. 5 days. One working AI agent. Join the waitlist — launching first week of March.
