Objective & Overview
Project Under Active Development
Please note that myT-LLM is an ongoing research project. The architecture, training pipelines, and results are subject to change. New features and stability improvements are being added regularly. Stay tuned for future updates!
Architecture Highlights
The model architecture is a modern, decoder-only Transformer inspired by the Qwen and LLaMA lineage, incorporating the following state-of-the-art components:
| Component | Choice / Implementation | Reference |
|---|---|---|
| Model Type | Decoder-only Transformer | Qwen / LLaMA lineage |
| Normalization | Pre-RMSNorm | Zhang & Sennrich, 2019 |
| Attention | Multi-Head / Grouped-Query / Flash Attention | Ainslie et al., 2023; Dao et al., 2022 |
| Activation | SwiGLU (2× FFN) | Shazeer, 2020 |
| Optimizer | AdamW (with fused kernels) | - |
| Scheduler | Cosine Decay + Warmup | Hoffmann et al., 2022 (Chinchilla) |
| Tokenizer | Custom BPE / SentencePiece | - |
| Initialization | GPT-2 style (std = 0.02) | Radford et al., 2019 |
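For concreteness, the optimizer and scheduler rows above map to only a few lines of PyTorch. The sketch below is illustrative rather than the repository's code: it assumes `torch.optim.AdamW` with `fused=True` (CUDA only) and builds linear warmup followed by cosine decay with `LambdaLR`; the function name and hyperparameter defaults are placeholders.

```python
import math
import torch

def build_optimizer_and_scheduler(model, max_steps, warmup_steps=2000,
                                  peak_lr=3e-4, min_lr_ratio=0.1):
    """Illustrative AdamW + warmup/cosine-decay setup (not the repo's exact code)."""
    # Fused AdamW kernels require CUDA parameters; this assumes the model lives
    # on a CUDA device whenever one is available.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=peak_lr, betas=(0.9, 0.95),
        weight_decay=0.1, fused=torch.cuda.is_available(),
    )

    def lr_lambda(step):
        # Linear warmup up to the peak learning rate...
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        # ...then cosine decay down to min_lr_ratio * peak_lr.
        progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```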
Key Features
Modern Architecture
Integrates RMSNorm, SwiGLU, and Grouped-Query Attention (GQA) for a highly efficient and powerful model foundation.
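For readers who want to see these pieces spelled out (GQA is sketched under Flash Attention below), here is a minimal RMSNorm and SwiGLU feed-forward block following the published formulations (Zhang & Sennrich, 2019; Shazeer, 2020). This is not the code in `prod/architecture.py`; the module names are illustrative and the 2× hidden-size multiplier mirrors the table above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescale by 1/RMS(x), then apply a learned gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x W_gate) * x W_up) W_down, with a 2x hidden size."""
    def __init__(self, dim, hidden_mult=2):
        super().__init__()
        hidden = hidden_mult * dim
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```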
Flash Attention
Utilizes Flash Attention for faster training and inference on long context windows with significantly reduced memory usage.
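The sketch below shows one way grouped-query attention and a Flash-style kernel combine: PyTorch's `torch.nn.functional.scaled_dot_product_attention` dispatches to a FlashAttention kernel when the hardware and dtype allow it, and the smaller set of key/value heads is repeated across groups of query heads. The head counts and class name are illustrative assumptions, not the implementation in `prod/architecture.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Causal self-attention with fewer KV heads than query heads (GQA)."""
    def __init__(self, dim, n_heads=16, n_kv_heads=4):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).view(B, T, 2, self.n_kv_heads, self.head_dim) \
                              .permute(2, 0, 3, 1, 4)
        # Repeat each KV head so it is shared by a group of query heads.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        # SDPA selects a FlashAttention kernel when the hardware/dtype allow it.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))
```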
End-to-End Training
Provides a complete suite for LLM pre-training, including checkpointing, resumption, and detailed logging.
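Checkpointing and resumption follow the standard PyTorch pattern of saving model, optimizer, scheduler, and step state together. The snippet below is a generic sketch with placeholder keys and paths, not the exact format used by `prod/trainer.py`.

```python
import torch

def save_checkpoint(path, model, optimizer, scheduler, step):
    # Persist everything needed to resume training exactly where it stopped.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(path, model, optimizer, scheduler, device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["step"]  # resume the training loop from this step
```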
Custom Tokenizer
Includes tools to train your own Byte-Pair Encoding (BPE) tokenizer from scratch for any corpus.
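As one possible approach, the sketch below trains a byte-level BPE tokenizer with the Hugging Face `tokenizers` library. The repository ships its own tooling in `prod/tokenizer.py`, so the corpus path, vocabulary size, and special tokens here are placeholder assumptions.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Byte-level BPE: every input byte is representable, so no <unk> fallback is needed.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                  # placeholder vocabulary size
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],   # placeholder special tokens
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),  # cover all 256 bytes
)
tokenizer.train(files=["data/corpus.txt"], trainer=trainer)  # placeholder corpus path
tokenizer.save("tokenizer.json")

print(tokenizer.encode("Don't just use Transformers, understand them.").tokens)
```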
Repository Structure & Lineage
Code Structure
The repository is organized into research, development, staging, and production directories, reflecting the project's evolution from experimentation to a finalized training pipeline.
myT-LLM/
├── Research/ # Papers, notes, experiments
├── dev/ # Tokenization & preprocessing pipelines
├── stage/ # Archived early stages
├── prod/ # Production-ready LLM trainer
│ ├── architecture.py
│ ├── tokenizer.py
│ ├── trainer.py
│ ├── configs/
│ └── main.py
└── assets/ # Images
Research Lineage
This project is the direct successor to the miniGPT project, evolving from a theoretical prototype to a scaled-up, SOTA implementation with modern training infrastructure. Key influences include:
- Attention is All You Need — Vaswani et al., 2017
- Language Models are Few-Shot Learners — Brown et al. (OpenAI), 2020
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness — Dao et al., 2022
- Training Compute-Optimal Large Language Models (Chinchilla) — Hoffmann et al. (DeepMind), 2022
- LLaMA: Open and Efficient Foundation Language Models — Touvron et al. (Meta), 2023
Motto
Don't just use Transformers — understand them.