About

I'm Ashraf Bhuiyan. This site hosts my technical writing, starting with a 15-part series on building an LLM inference engine from scratch.

About the Series

Building an LLM Inference Engine from Scratch teaches how production inference servers like vLLM and SGLang work by isolating each technique into a standalone blog post with a runnable code implementation.

Each post's implementation is 200-800 lines of Python that you can read, run, and modify. By the end, you'll understand every major subsystem well enough to navigate the real vLLM and SGLang codebases, debug production issues, or contribute upstream.

Source Code

All code for the series is available on GitHub.

Contact

Find me on GitHub.