🐦‍🔥 myT-LLM

A from-scratch implementation of a modern Generative Pre-trained Transformer (GPT) for research and understanding.

Mahanth Yalla

M.Tech Artificial Intelligence

Indian Institute of Science, Bengaluru

Objective & Overview

myT-LLM is a full-stack, from-scratch implementation of a Generative Pre-trained Transformer (GPT) designed to build, train, and evaluate a decoder-only Transformer from first principles. As a continuation of the miniGPT project, it integrates recent architectural and optimization techniques. The core goal is to understand each component's role in LLM design by training scalable GPT variants and benchmarking the impact of SOTA enhancements such as SwiGLU, RMSNorm, GQA, and Flash Attention, all within a clean, open-source research framework.
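For orientation, a hypothetical configuration for a small variant might look like the sketch below. The field names and values are illustrative only and are not taken from the repository's configs/ directory.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    """Illustrative hyperparameters for a small decoder-only variant (not the repo's actual config)."""
    vocab_size: int = 32_000
    n_layers: int = 12
    n_heads: int = 12
    n_kv_heads: int = 4          # fewer K/V heads than query heads enables Grouped-Query Attention
    d_model: int = 768
    ffn_hidden: int = 2 * 768    # SwiGLU hidden width ("2x FFN" in the table below)
    max_seq_len: int = 2048
    rmsnorm_eps: float = 1e-6
    init_std: float = 0.02       # GPT-2 style initialization (Radford et al., 2019)
```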

Project Under Active Development

Please note that myT-LLM is an ongoing research project. The architecture, training pipelines, and results are subject to change. New features and stability improvements are being added regularly. Stay tuned for future updates!

Architecture Highlights

The model architecture is a modern, decoder-only Transformer inspired by the QWEN and LLaMA lineage, incorporating the following state-of-the-art components:

| Component | Choice / Implementation | Reference |
| --- | --- | --- |
| Model Type | Decoder-only Transformer | QWEN / LLaMA lineage |
| Normalization | Pre-RMSNorm | Zhang et al., 2019 |
| Attention | Multi-Head / Grouped-Query / Flash Attention | Dao et al., 2022 |
| Activation | SwiGLU (2× FFN) | Shazeer, 2020 |
| Optimizer | AdamW (with fused kernels) | - |
| Scheduler | Cosine Decay + Warmup | Hoffmann et al., 2022 (Chinchilla) |
| Tokenizer | Custom BPE / SentencePiece | - |
| Initialization | GPT-2 style (std = 0.02) | Radford et al., 2019 |
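As a rough illustration of how the normalization and activation choices in the table fit together, the sketch below implements RMSNorm and a SwiGLU feed-forward layer in PyTorch and wires them into a pre-norm residual block. The class and parameter names (RMSNorm, SwiGLU, DecoderBlock, w_gate, ...) are illustrative, not the names used in architecture.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization (Zhang et al., 2019): no mean centering, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward (Shazeer, 2020): SiLU-gated up-projection, then a down-projection."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class DecoderBlock(nn.Module):
    """Pre-norm residual wiring: x + sublayer(norm(x)) for both the attention and FFN sub-layers."""
    def __init__(self, dim: int, hidden: int, attention: nn.Module):
        super().__init__()
        self.attn_norm, self.attn = RMSNorm(dim), attention
        self.ffn_norm, self.ffn = RMSNorm(dim), SwiGLU(dim, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))
        x = x + self.ffn(self.ffn_norm(x))
        return x
```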

Key Features

Modern Architecture

Integrates RMSNorm, SwiGLU, and Grouped-Query Attention (GQA) for a highly efficient and powerful model foundation.
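A minimal sketch of the Grouped-Query Attention idea, assuming a PyTorch implementation: there are fewer key/value heads than query heads, and each K/V head is shared by a group of query heads. The module and projection names here are hypothetical.

```python
import math
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """GQA sketch: n_kv_heads < n_heads; each K/V head serves a group of query heads."""
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0 and dim % n_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)  # smaller K projection
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)  # smaller V projection
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Duplicate each K/V head so every query head in its group attends to the same K/V.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
        return self.wo((att @ v).transpose(1, 2).reshape(B, T, -1))

# Usage: GroupedQueryAttention(768, n_heads=12, n_kv_heads=4)(torch.randn(2, 16, 768)) -> (2, 16, 768)
```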

Flash Attention

Utilizes Flash Attention for faster training and inference on long context windows with significantly reduced memory usage.
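One common way to pick up FlashAttention-style kernels in PyTorch is torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused kernel when the dtype, head size, and hardware allow it; whether this repository uses that route or the standalone flash-attn package is not shown here, so treat the snippet as a sketch.

```python
import torch
import torch.nn.functional as F

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Causal attention over (batch, heads, seq, head_dim) tensors.

    scaled_dot_product_attention selects a fused (FlashAttention-style) kernel when supported,
    so the full seq x seq score matrix is never materialized in memory.
    """
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The memory/speed win is most visible with half precision and long sequences on a GPU.
if torch.cuda.is_available():
    q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.bfloat16)
    k, v = torch.randn_like(q), torch.randn_like(q)
    print(causal_attention(q, k, v).shape)  # torch.Size([1, 8, 4096, 64])
```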

End-to-End Training

Provides a complete suite for LLM pre-training, including checkpointing, resumption, and detailed logging.
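A minimal sketch of what checkpointing, resumption, and the warmup-plus-cosine schedule from the table above can look like in PyTorch; the function names and checkpoint layout are illustrative and are not taken from trainer.py.

```python
import math
import torch

def lr_at(step: int, max_lr: float, min_lr: float, warmup: int, total: int) -> float:
    """Linear warmup followed by cosine decay to min_lr (illustrative schedule helper)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def save_checkpoint(path: str, model, optimizer, step: int) -> None:
    """Persist everything needed to resume: weights, optimizer state, and the step counter."""
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)

def load_checkpoint(path: str, model, optimizer) -> int:
    """Restore model/optimizer state in place and return the step to resume from."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```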

Custom Tokenizer

Includes tools to train your own Byte-Pair Encoding (BPE) tokenizer from scratch for any corpus.
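A toy byte-level BPE trainer, sketched to show the core merge loop: start from raw bytes and repeatedly merge the most frequent adjacent pair. A production tokenizer like the one described here would also handle regex pre-splitting, special tokens, and serialization.

```python
from collections import Counter

def train_bpe(text: str, vocab_size: int) -> dict:
    """Minimal byte-level BPE trainer; returns the learned merges as {(id, id): new_id}."""
    ids = list(text.encode("utf-8"))   # base vocabulary: the 256 byte values
    merges = {}
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]       # most frequent adjacent pair
        merges[best] = next_id
        # Replace every occurrence of `best` with the new token id.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

# Example: learn 10 merges on a toy corpus.
merges = train_bpe("low lower lowest " * 50, vocab_size=266)
```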

Repository Structure & Lineage

Code Structure

The repository is organized into research, development, staging, and production directories, reflecting the project's evolution from experimentation to a finalized training pipeline.

```
myT-LLM/
├── Research/                 # Papers, notes, experiments
├── dev/                      # Tokenization & preprocessing pipelines
├── stage/                    # Archived early stages
├── prod/                     # Production-ready LLM trainer
│   ├── architecture.py
│   ├── tokenizer.py
│   ├── trainer.py
│   ├── configs/
│   └── main.py
└── assets/                   # Images
```

Research Lineage

This project is the direct successor to the miniGPT project, evolving from a theoretical prototype to a scaled-up, SOTA implementation with modern training infrastructure. Key influences include the QWEN and LLaMA model families, GPT-2 (Radford et al., 2019), FlashAttention (Dao et al., 2022), RMSNorm (Zhang et al., 2019), SwiGLU (Shazeer, 2020), and the Chinchilla training recipe (Hoffmann et al., 2022).

Motto

Don't just use Transformers — understand them.

See all of my projects →