Saurabh Ghatnekar

The Short Version

I’m an AI systems engineer progressing from data engineering through LLMOps and GPU engineering toward RL infrastructure — the most complex distributed systems problem in AI today.

This is not a story of pivots and detours. It’s a deliberate, logical progression toward a single, ambitious goal. Each layer builds on the last.


The Four Layers

My career maps to a 4-layer stack. Each layer solves a harder problem, and each one requires everything from the layers below it.

  • Layer 4 — RL Infrastructure Expert: train a million agents to learn overnight (the goal)
  • Layer 3 — GPU Engineer: make models run 10x faster on specific hardware (next)
  • Layer 2 — LLMOps Engineer: serve, monitor, and update LLMs in production (in progress)
  • Layer 1 — Data Engineer: move, store, and process massive data reliably (completed)

How Each Layer Builds

Think of it as building a skyscraper. Each stage adds a more specialized layer on top of a solid foundation.

Layer 1 → Layer 2: Data Engineering → LLMOps

My foundational skills in building robust, scalable data pipelines directly applied to the unique challenges of LLMs:

  • ETL pipelines became the foundation for RAG and embedding pipelines
  • Data quality and monitoring became the foundation for LLM observability
  • Scalable systems knowledge enabled reliable inference APIs and agentic workflows
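The first of those bullets can be made concrete: a RAG embedding pipeline has the same extract/transform/load shape as a classic ETL job. Here's a minimal sketch — the `embed` function is a toy stand-in I've invented for illustration; a real pipeline would call an embedding model and write to a real vector store:

```python
def embed(text):
    # Stand-in embedder (illustrative only): a real pipeline would call an
    # embedding model here and get back a high-dimensional float vector.
    return [len(text), sum(map(ord, text)) % 1000]

def extract(docs):
    # "Extract": pull raw text from a source (here, an in-memory list),
    # dropping empty records.
    return [d.strip() for d in docs if d.strip()]

def transform(texts, chunk_size=100):
    # "Transform": split each document into fixed-size chunks and attach
    # a vector to each chunk.
    chunks = []
    for text in texts:
        for i in range(0, len(text), chunk_size):
            chunk = text[i:i + chunk_size]
            chunks.append({"text": chunk, "vector": embed(chunk)})
    return chunks

def load(chunks, index):
    # "Load": write chunk + vector records into a (toy) vector index.
    index.extend(chunks)
    return index
```

The point is structural: the same pipeline discipline — idempotent stages, clean boundaries between extract, transform, and load — carries over unchanged; only the transform step (chunking + embedding) is new.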

Layer 2 → Layer 3: LLMOps → GPU Engineering

This is the pivot from the application layer to the hardware layer — moving from using the tools to understanding how they work. It’s driven by asking “why?”:

  • “Why is my vector search slow?” → Because the indexing algorithm causes non-coalesced memory access on the GPU, and the kernel isn’t optimized for this hardware’s cache size.
  • “How can I serve more concurrent users?” → By implementing paged attention and speculative decoding at the kernel level.

LLMOps gives you the problems. GPU engineering gives you the solutions.
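The second question above can be sketched in miniature. Below is a toy greedy speculative-decoding loop: a cheap draft model proposes k tokens autoregressively, and the target model verifies them, accepting the longest correct prefix. Both "models" here are deterministic toy functions I've invented for illustration — in a real system the verification step is a single batched forward pass on the GPU, which is exactly where the kernel-level work pays off:

```python
def target_next(prefix):
    # Toy "target" model: next token is the sum of the last two tokens mod 10.
    return (prefix[-1] + prefix[-2]) % 10

def draft_next(prefix):
    # Toy "draft" model: agrees with the target except after a 0 token,
    # so the rejection path below actually gets exercised.
    if prefix[-1] == 0:
        return 7
    return (prefix[-1] + prefix[-2]) % 10

def speculative_decode(prefix, n_tokens, k=4):
    out = list(prefix)
    while len(out) < len(prefix) + n_tokens:
        # 1. Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposals; accept the longest correct
        #    prefix, then substitute the target's token at the first miss.
        accepted, ctx = [], list(out)
        for t in proposal:
            correct = target_next(ctx)
            if t == correct:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(correct)
                break
        out.extend(accepted)
    return out[:len(prefix) + n_tokens]
```

Because rejected proposals are replaced by the target's own token, the output is bit-identical to plain greedy decoding with the target model — the speedup comes purely from verifying several positions per target pass instead of one.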

Layer 3 → Layer 4: GPU Engineering → RL Infrastructure

RL Infrastructure is the final and most important leap. It’s the most complex distributed systems problem in AI, and it requires mastery of all previous layers:

  • RL requires massive data pipelines for experience replay, often petabytes in scale — a Data Engineering problem
  • RL requires serving multiple models (actors, critics, world models) with different performance characteristics — an LLMOps problem
  • RL requires extreme performance optimization to make training feasible — a GPU Engineering problem
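The first of those requirements can be sketched at toy scale. Here's a minimal in-memory experience replay buffer — my own illustrative sketch, not production code; at the petabyte scale described above, the same interface would be backed by sharded, distributed storage rather than a single deque:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (toy, single-process sketch)."""

    def __init__(self, capacity):
        # A bounded deque: once full, the oldest experience is evicted.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive environment steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

The hard infrastructure problems start exactly where this sketch stops: thousands of actors appending concurrently, learners sampling without contention, and eviction and prioritization policies over storage no single machine can hold.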

My Journey: From Data to Decisions

My career has been a deliberate progression up the stack, from the fundamentals of data to the frontiers of AI infrastructure. Each step has been driven by a desire to solve harder problems and build more capable systems.

The Foundation: Data Engineering

My career began in data engineering, where I learned the fundamental principles of building robust, scalable, and reliable systems. I spent five years designing and implementing the data pipelines that power large-scale applications, mastering the art of moving and processing massive datasets. This experience taught me to think in terms of systems, not just code.

The Application: LLMOps

As large language models emerged, I pivoted to LLMOps, applying my systems-thinking to the unique challenges of deploying and managing AI in production. At Measurebit, I built the infrastructure for agentic workflows and LLM observability, learning how to bridge the gap between research and real-world application. This is where I first encountered the deep infrastructure challenges that limit the potential of AI.

The Deep Dive: GPU Engineering

To solve these challenges, I need to go deeper. I’m now studying GPU architecture, CUDA programming, and distributed training paradigms — moving from using the tools to understanding how they work at a low level. This is the pivot from the application layer to the hardware layer, driven by the need to optimize performance at its most fundamental level.

The Frontier: RL Infrastructure

The end goal is the most complex distributed systems problem in AI: Reinforcement Learning Infrastructure. Training a million agents to learn a new skill requires mastery of the entire stack, from the high-level RL algorithm down to the low-level CUDA kernel. This is where every layer of my career converges, and I believe it’s where the most important work in AI will be done over the next decade.


Currently

  • Senior Data Engineer @ Measurebit — building data and AI systems
  • MTech in Data Engineering @ IIT Jodhpur — GenAI + RL specialization
  • Writing about what I learn as I progress through each layer
  • Currently at Layer 2 (LLMOps), building toward Layer 3 (GPU Engineering)

Get in Touch

I’m always happy to connect with others working on AI infrastructure, distributed systems, or the path from data engineering to RL.

LAUNCHING MARCH 2026

Build an AI Agent in 5 Days

10 minutes a day. 5 days. One working AI agent. Join the waitlist — launching first week of March.
