I once heard, “If you fear something, learn about it, disassemble it into the tiniest pieces, and the fear will just go away.”
Well, it didn’t work. I read a book about building an LLM from scratch, which helped me understand the model's architecture and how it works inside. However, I am still concerned about the power of AI models and the risks our world may face in the future. Although we still don't understand many fundamental concepts of how the human brain works, and some scientists say we are not even close to human-level intelligence, I am still a bit worried about the scale and speed at which new technologies emerge. Many scientific discoveries were made not through logical reasoning or inference but by accident or trial and error. Spawning millions of model instances and automating them to run “random” experiments until they discover something new doesn’t seem that impossible to me.
The book shows how to build the smallest version of GPT-2 in Python and load the pre-trained model weights published by OpenAI. By the way, GPT-3 uses the same architecture; the model is just scaled up to a much larger number of parameters.
I was curious whether this could be done in C#, and I found the TorchSharp library. It is a .NET wrapper around the same native libraries that PyTorch uses. The API was intentionally kept as close to the Python one as possible, so the code does not look like idiomatic .NET at all, but that makes the library easy to learn and use, since the vast majority of examples out there are written in Python. What surprised me is that the actual LLM implementation in C# takes only about 200 lines of code; all the magic is in the model weights. PyTorch/TorchSharp provides a very nice abstraction over the primitives from which deep neural networks are composed.
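To give a flavor of how close TorchSharp feels to Python, here is a minimal sketch of the feed-forward sub-layer found in every GPT-2 transformer block. This is my own illustration, not code from the book: the class name FeedForward and the parameter embDim are mine, while the 4x expansion and GELU activation follow the standard GPT-2 design.

```csharp
using TorchSharp;
using static TorchSharp.torch;
using static TorchSharp.torch.nn;

// A sketch of the two-layer MLP inside each GPT-2 transformer block.
sealed class FeedForward : Module<Tensor, Tensor>
{
    private readonly Module<Tensor, Tensor> layers;

    public FeedForward(long embDim) : base(nameof(FeedForward))
    {
        layers = Sequential(
            Linear(embDim, 4 * embDim),   // expand: 768 -> 3072 in the smallest GPT-2
            GELU(),                       // the smooth non-linearity GPT-2 uses
            Linear(4 * embDim, embDim));  // project back: 3072 -> 768
        RegisterComponents();             // lets TorchSharp track the child modules' weights
    }

    public override Tensor forward(Tensor x) => layers.forward(x);
}
```

Calling ff.forward(...) with a (batch, tokens, 768) tensor pushes the token embeddings through the block. Stack twelve transformer blocks that pair multi-head attention with a feed-forward layer like this, add token and position embeddings and an output projection, and you essentially have the whole 200-line model.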
I was wondering whether it would make sense to do a session about it, for example, at our MeetUpdate. There are a few problems, though. First, I am not an AI scientist, and the topic is hard; I think I understand all the crucial aspects and can explain what is going on, but there are still many things I have practically no hands-on experience with. Second, following the session requires at least some knowledge of how neural networks work and the basics of linear algebra, and I am not sure what background the audience would have. And finally, I would be speaking about something that is not my creation at all: it would be merely a description of things others have invented, and my only added value would be trying to explain them in a short meetup session.
On the other hand, playing with it was really fun, and maybe it can motivate someone to start learning more about ML and neural networks.
Shall I do it?