About
I'm Ashraf Bhuiyan. This site hosts my technical writing, starting with a 15-part series on building an LLM inference engine from scratch.
About the Series
The series, Building an LLM Inference Engine from Scratch, explains how production inference servers like vLLM and SGLang work by isolating each technique in a standalone blog post with a runnable code implementation.
Each post comes with 200-800 lines of Python that you can read, run, and modify. By the end, you'll understand every major subsystem well enough to navigate the real codebases, debug production issues, or contribute upstream.
Source Code
All code for the series is available on GitHub.
Contact
Find me on GitHub.