Objective & Overview
Project Under Active Development
Please note that myT-LLM is an ongoing research project. The architecture, training pipelines, and results are subject to change. New features and stability improvements are being added regularly. Stay tuned for future updates!
Architecture Highlights
The model architecture is a modern, decoder-only Transformer inspired by the Qwen and LLaMA lineage, incorporating the following state-of-the-art components:
| Component | Choice / Implementation | Reference |
|---|---|---|
| Model Type | Decoder-only Transformer | Qwen / LLaMA lineage |
| Normalization | Pre-RMSNorm | Zhang & Sennrich, 2019 |
| Attention | Multi-Head / Grouped-Query / Flash Attention | Ainslie et al., 2023; Dao et al., 2022 |
| Activation | SwiGLU (2× FFN) | Shazeer, 2020 |
| Optimizer | AdamW (with fused kernels) | - |
| Scheduler | Cosine Decay + Warmup | Hoffmann et al., 2022 (Chinchilla) |
| Tokenizer | Custom BPE / SentencePiece | - |
| Initialization | GPT-2 style (std = 0.02) | Radford et al., 2019 |
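For concreteness, the optimizer and scheduler rows above map to only a few lines of PyTorch. The sketch below is illustrative rather than the repository's code: it assumes `torch.optim.AdamW` with `fused=True` (CUDA only) and builds linear warmup followed by cosine decay with `LambdaLR`; the function name and hyperparameter defaults are placeholders.

```python
import math
import torch

def build_optimizer_and_scheduler(model, max_steps, warmup_steps=2000,
                                  peak_lr=3e-4, min_lr_ratio=0.1):
    """Illustrative AdamW + warmup/cosine-decay setup (not the repo's exact code)."""
    # Fused AdamW kernels require CUDA parameters; this assumes the model lives
    # on a CUDA device whenever one is available.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=peak_lr, betas=(0.9, 0.95),
        weight_decay=0.1, fused=torch.cuda.is_available(),
    )

    def lr_lambda(step):
        # Linear warmup up to the peak learning rate...
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        # ...then cosine decay down to min_lr_ratio * peak_lr.
        progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```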
Key Features
Modern Architecture
Integrates RMSNorm, SwiGLU, and Grouped-Query Attention (GQA) for a highly efficient and powerful model foundation.
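For readers who want to see these pieces spelled out (GQA is sketched under Flash Attention below), here is a minimal RMSNorm and SwiGLU feed-forward block following the published formulations (Zhang & Sennrich, 2019; Shazeer, 2020). This is not the code in `prod/architecture.py`; the module names are illustrative and the 2× hidden-size multiplier mirrors the table above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescale by 1/RMS(x), then apply a learned gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x W_gate) * x W_up) W_down, with a 2x hidden size."""
    def __init__(self, dim, hidden_mult=2):
        super().__init__()
        hidden = hidden_mult * dim
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```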
Flash Attention
Utilizes Flash Attention for faster training and inference on long context windows with significantly reduced memory usage.
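The sketch below shows one way grouped-query attention and a Flash-style kernel combine: PyTorch's `torch.nn.functional.scaled_dot_product_attention` dispatches to a FlashAttention kernel when the hardware and dtype allow it, and the smaller set of key/value heads is repeated across groups of query heads. The head counts and class name are illustrative assumptions, not the implementation in `prod/architecture.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Causal self-attention with fewer KV heads than query heads (GQA)."""
    def __init__(self, dim, n_heads=16, n_kv_heads=4):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).view(B, T, 2, self.n_kv_heads, self.head_dim) \
                              .permute(2, 0, 3, 1, 4)
        # Repeat each KV head so it is shared by a group of query heads.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        # SDPA selects a FlashAttention kernel when the hardware/dtype allow it.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))
```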
End-to-End Training
Provides a complete suite for LLM pre-training, including checkpointing, resumption, and detailed logging.
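Checkpointing and resumption follow the standard PyTorch pattern of saving model, optimizer, scheduler, and step state together. The snippet below is a generic sketch with placeholder keys and paths, not the exact format used by `prod/trainer.py`.

```python
import torch

def save_checkpoint(path, model, optimizer, scheduler, step):
    # Persist everything needed to resume training exactly where it stopped.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(path, model, optimizer, scheduler, device="cpu"):
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["step"]  # resume the training loop from this step
```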
Custom Tokenizer
Includes tools to train your own Byte-Pair Encoding (BPE) tokenizer from scratch for any corpus.
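As one possible approach, the sketch below trains a byte-level BPE tokenizer with the Hugging Face `tokenizers` library. The repository ships its own tooling in `prod/tokenizer.py`, so the corpus path, vocabulary size, and special tokens here are placeholder assumptions.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Byte-level BPE: every input byte is representable, so no <unk> fallback is needed.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                  # placeholder vocabulary size
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],   # placeholder special tokens
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),  # cover all 256 bytes
)
tokenizer.train(files=["data/corpus.txt"], trainer=trainer)  # placeholder corpus path
tokenizer.save("tokenizer.json")

print(tokenizer.encode("Don't just use Transformers, understand them.").tokens)
```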
Repository Structure & Lineage
Code Structure
The repository is organized into research, development, staging, and production directories, reflecting the project's evolution from experimentation to a finalized training pipeline.
myT-LLM/
├── Research/ # Papers, notes, experiments
├── dev/ # Tokenization & preprocessing pipelines
├── stage/ # Archived early stages
├── prod/ # Production-ready LLM trainer
│ ├── architecture.py
│ ├── tokenizer.py
│ ├── trainer.py
│ ├── configs/
│ └── main.py
└── assets/ # Images
Research Lineage
This project is the direct successor to the miniGPT project, evolving from a theoretical prototype to a scaled-up, SOTA implementation with modern training infrastructure. Key influences include:
- Attention is All You Need — Vaswani et al., 2017
- Language Models are Few-Shot Learners — Brown et al. (OpenAI), 2020
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness — Dao et al., 2022
- Training Compute-Optimal Large Language Models (Chinchilla) — Hoffmann et al. (DeepMind), 2022
- LLaMA: Open and Efficient Foundation Language Models — Touvron et al. (Meta), 2023
Motto
Don't just use Transformers — understand them.