About
I’m Saurabh Ghatnekar — an AI systems engineer building toward RL infrastructure expertise. From data pipelines to GPU kernels, each layer builds on the last.
The Short Version
I’m an AI systems engineer progressing from data engineering through LLMOps and GPU engineering toward RL infrastructure — the most complex distributed systems problem in AI today.
This is not a story of pivots and detours. It’s a deliberate, logical progression toward a single, ambitious goal. Each layer builds on the last.
The Four Layers
My career maps to a four-layer stack. Each layer solves a harder problem, and each one requires everything from the layers below it.
How Each Layer Builds
Think of it as building a skyscraper. Each stage adds a more specialized layer on top of a solid foundation.
Layer 1 → Layer 2: Data Engineering → LLMOps
My foundational skills in building robust, scalable data pipelines directly applied to the unique challenges of LLMs:
- ETL pipelines became the foundation for RAG and embedding pipelines
- Data quality and monitoring became the foundation for LLM observability
- Scalable systems knowledge enabled reliable inference APIs and agentic workflows
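The mapping above can be made concrete with a toy sketch: an embedding pipeline is structurally an ETL job — extract documents, transform them into chunks and vectors, load them into an index. The `embed()` function here is a deliberately crude stand-in (a character-frequency vector), not a real model; a production pipeline would call an embedding model and write to a vector database.

```python
def embed(text):
    # Stand-in embedding: normalized character-frequency vector.
    # A real pipeline would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def chunk(doc, size=50):
    # Transform step 1: split each document into fixed-size chunks.
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def pipeline(docs):
    index = []  # Load step: in practice, a vector database.
    for doc_id, doc in enumerate(docs):      # Extract
        for piece in chunk(doc):             # Transform
            index.append((doc_id, piece, embed(piece)))
    return index

index = pipeline(["GPU kernels and memory coalescing", "data pipelines at scale"])
```

The shape is the same as any ETL job: the only thing that changed is that the "transform" now produces vectors instead of rows.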
Layer 2 → Layer 3: LLMOps → GPU Engineering
This is the shift from the application layer to the hardware layer — moving from using the tools to understanding how they work. It’s driven by asking “why?”:
- “Why is my vector search slow?” → Because the indexing algorithm can cause non-coalesced memory access on the GPU, and the kernel isn’t optimized for this hardware’s cache size.
- “How can I serve more concurrent users?” → By implementing paged attention and speculative decoding at the kernel level.
LLMOps gives you the problems. GPU engineering gives you the solutions.
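To make the paged-attention idea above concrete, here is a minimal sketch of the memory-management trick behind it: instead of reserving one contiguous KV-cache slab per request, split the cache into fixed-size blocks and give each sequence a page table mapping its tokens to physical blocks. This is illustrative only — the block size and class are toy inventions of mine, and real systems (e.g. vLLM) implement this on the GPU with attention kernels that follow the page table.

```python
BLOCK_SIZE = 4  # tokens per cache block (toy value)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.page_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}      # seq_id -> number of tokens cached

    def append_token(self, seq_id):
        # Reserve space for one more token, allocating a block on demand.
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full (or first token)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.page_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        # Return the sequence's blocks to the free pool when it finishes.
        self.free_blocks.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):
    cache.append_token("req-A")  # 6 tokens with BLOCK_SIZE=4 occupy 2 blocks
```

The payoff is the one described above: memory is reserved per block rather than per worst-case sequence length, so far more concurrent requests fit on the same GPU.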
Layer 3 → Layer 4: GPU Engineering → RL Infrastructure
RL Infrastructure is the final and most important leap. It’s the most complex distributed systems problem in AI, and it requires mastery of all previous layers:
- RL requires massive data pipelines for experience replay, often at petabyte scale — a Data Engineering problem
- RL requires serving multiple models (actors, critics, world models) with different performance characteristics — an LLMOps problem
- RL requires extreme performance optimization to make training feasible — a GPU Engineering problem
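The data-engineering side of the list above can be sketched in a few lines: at its core, experience replay is a bounded store of transitions that training samples from. This toy buffer is my own illustration, not a production design — real RL systems shard the buffer across machines and stream it at high throughput, which is exactly where the pipeline problem lives.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transitions automatically.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch for a training step.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=1000)
for t in range(10):
    buf.add(state=t, action=t % 2, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(4)
```

Scale this from one in-memory deque to petabytes of transitions flowing between thousands of actors and learners, and the "simple" buffer becomes the distributed-systems problem the rest of the stack exists to solve.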
My Journey: From Data to Decisions
My career has been a deliberate progression up the stack, from the fundamentals of data to the frontiers of AI infrastructure. Each step has been driven by a desire to solve harder problems and build more capable systems.
The Foundation: Data Engineering
My career began in data engineering, where I learned the fundamental principles of building robust, scalable, and reliable systems. I spent five years designing and implementing the data pipelines that power large-scale applications, mastering the art of moving and processing massive datasets. This experience taught me to think in terms of systems, not just code.
The Application: LLMOps
As large language models emerged, I moved into LLMOps, applying my systems thinking to the unique challenges of deploying and managing AI in production. At Measurebit, I built the infrastructure for agentic workflows and LLM observability, learning how to bridge the gap between research and real-world application. This is where I first encountered the deep infrastructure challenges that limit the potential of AI.
The Deep Dive: GPU Engineering
To solve these challenges, I need to go deeper. I’m now studying GPU architecture, CUDA programming, and distributed training paradigms, learning how the tools themselves work at a low level so I can optimize performance at the most fundamental level.
The Frontier: RL Infrastructure
The end goal is the most complex distributed systems problem in AI: Reinforcement Learning Infrastructure. Training a million robots to learn a new skill requires mastery of the entire stack, from the high-level RL algorithm down to the low-level CUDA kernel. This is where every layer of my career converges, and I believe it’s where the most important work in AI will be done over the next decade.
Currently
- Senior Data Engineer @ Measurebit — building data and AI systems
- MTech in Data Engineering @ IIT Jodhpur — GenAI + RL specialization
- Writing about what I learn as I progress through each layer
- Currently at Layer 2 (LLMOps), building toward Layer 3 (GPU Engineering)
Get in Touch
I’m always happy to connect with others working on AI infrastructure, distributed systems, or the path from data engineering to RL.
- GitHub: github.com/saurabhghatnekar
- LinkedIn: linkedin.com/in/saurabhgghatnekar
- X/Twitter: x.com/saurabh_works