🏗️ Architecture Basics
The architecture of a Large Language Model (LLM) refers to the internal structure and mechanisms that allow the model to understand, process, and generate human-like text.
Most modern LLMs are based on a highly successful design known as the Transformer architecture.
🏛️ Main Types of AI/LLM Architectures
1️⃣ Transformer Architecture
The backbone of modern LLMs
- Introduced in Google's 2017 paper "Attention Is All You Need"
- Uses self-attention to process the entire sequence at once (see the sketch after this list)
- Great for long-range dependencies
- Examples: GPT series, BERT, T5, LLaMA, Gemini
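To make the self-attention idea concrete, here is a minimal NumPy sketch of a single scaled dot-product attention head. The dimensions, random weights, and the `self_attention` helper are purely illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a whole sequence at once.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices
    """
    Q = X @ W_q                                      # queries
    K = X @ W_k                                      # keys
    V = X @ W_v                                      # values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # (seq_len, d_head) context vectors

# Toy example: 4 tokens, 8-dim embeddings, one attention head of size 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)        # (4, 4)
```

Because the attention weights are computed for all token pairs in one matrix product, distant tokens can influence each other directly, which is why Transformers handle long-range dependencies so well.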
2️⃣ Encoder-Decoder Architecture
Common in translation and summarization
- Encoder: Reads and encodes the input text
- Decoder: Generates the output from that encoded representation (see the sketch after this list)
- Best when the output differs from the input in language, length, or form (e.g., translation)
- Examples: T5, BART, mT5
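A minimal sketch of the encoder-decoder flow, assuming the Hugging Face `transformers` library is installed; the `t5-small` checkpoint and the translation prompt are just illustrative choices, and any T5/BART-style model works the same way.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the full input; the decoder then generates the output token by token.
inputs = tokenizer("translate English to German: The house is small.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```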
3️⃣ Decoder-Only Architecture
Widely used for text generation tasks
- Uses only the decoder block
- Predicts the next token from the previous context (see the sketch after this list)
- Well suited to chat, creative writing, and question answering
- Examples: GPT-2, GPT-3, GPT-4, Claude
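The loop below makes the "predict the next token, append it, repeat" behaviour explicit. It assumes the Hugging Face `transformers` library and uses the small `gpt2` checkpoint and greedy decoding purely for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The Transformer architecture is", return_tensors="pt").input_ids
for _ in range(20):                                   # greedy decoding, one token at a time
    with torch.no_grad():
        logits = model(ids).logits                    # (1, seq_len, vocab_size)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # most likely next token
    ids = torch.cat([ids, next_id], dim=-1)           # append it and predict again
print(tokenizer.decode(ids[0]))
```

In practice, `model.generate()` does the same thing with sampling, caching, and stopping criteria built in; the loop is only spelled out to show the decoder-only mechanism.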
4️⃣ Encoder-Only Architecture
Focused on understanding, not generating
- Uses only the encoder
- Great for classification, semantic search, and other NLU tasks (see the sketch after this list)
- Examples: BERT, RoBERTa, DeBERTa
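A minimal semantic-search sketch with an encoder-only model, again assuming the Hugging Face `transformers` library; the `bert-base-uncased` checkpoint, the `embed` helper, and the mean-pooling strategy are illustrative assumptions rather than a prescribed recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state    # (batch, seq_len, 768)
    mask = batch.attention_mask.unsqueeze(-1)        # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)      # mean-pooled sentence vectors

docs = embed(["Transformers use self-attention.", "Bananas are yellow."])
query = embed(["How does attention work?"])
scores = torch.nn.functional.cosine_similarity(query, docs)
print(scores)   # higher score = more semantically related document
```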
🎯 Quick Summary Table
| Architecture Type | Focus | Examples |
| --- | --- | --- |
| Transformer | Language understanding/generation | GPT, BERT, T5 |
| Encoder-Decoder | Translation, summarization | T5, BART, mT5 |
| Decoder-Only | Text generation | GPT-2, GPT-3, GPT-4 |
| Encoder-Only | Text understanding | BERT, RoBERTa, DeBERTa |
| Retrieval-Augmented (RAG) | Knowledge retrieval + generation | RAG, OpenAI Retrieval Plugin |
| Mixture of Experts (MoE) | Efficient scaling | Switch Transformer, GShard, Mixtral |
| Multimodal | Text + images + video | GPT-4V, Gemini 1.5, Flamingo |