🏗️ Architecture Basics

The architecture of a Large Language Model (LLM) refers to the internal structure and mechanisms that allow the model to understand, process, and generate human-like text. Most modern LLMs are based on a highly successful design known as the Transformer architecture.

🏛️ Main Types of AI/LLM Architectures

1️⃣ Transformer Architecture

The backbone of modern LLMs

  • Introduced by Vaswani et al. at Google in “Attention Is All You Need” (2017)
  • Uses self-attention to process entire sequences in parallel (see the sketch after this list)
  • Captures long-range dependencies far better than earlier recurrent models
  • Examples: GPT series, BERT, T5, LLaMA, Gemini
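
To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The sequence length, dimensions, and random projection matrices are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ W_q   # queries (seq_len, d_k)
    K = X @ W_k   # keys    (seq_len, d_k)
    V = X @ W_v   # values  (seq_len, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # every token scored against every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ V   # each output token is a weighted mix of all value vectors

# Illustrative sizes: 4 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once, which is what makes long-range dependencies easy to capture.
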
2️⃣ Encoder-Decoder Architecture

Common in translation and summarization

  • Encoder: Reads and understands input text
  • Decoder: Generates output from encoded understanding
  • Best when the output differs structurally from the input (e.g., translation, as sketched below)
  • Examples: T5, BART, mT5
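
A minimal usage sketch of the encoder-decoder pattern, assuming the Hugging Face transformers package is installed; the small public t5-small checkpoint is used purely for illustration. The encoder reads the prefixed input, and the decoder generates the translated output.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative encoder-decoder use: English-to-German translation with a small public T5 checkpoint.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The encoder reads the (prefixed) input; the decoder generates the output sequence token by token.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
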
3️⃣ Decoder-Only Architecture

Widely used for text generation tasks

  • Uses only the decoder block
  • Predicts the next token from the preceding context (autoregressive generation, as sketched below)
  • Well suited to chat, writing, and question answering
  • Examples: GPT-2, GPT-3, GPT-4, Claude
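
A minimal sketch of the decoder-only pattern, assuming the Hugging Face transformers and torch packages; the small public gpt2 checkpoint stands in for larger models. It shows a single next-token prediction, then lets the model extend the prompt autoregressively.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative decoder-only use: predict the next token, then autoregressively extend the prompt.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The Transformer architecture is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits                 # (1, seq_len, vocab_size)
next_token_id = logits[0, -1].argmax().item()        # most likely next token given the previous context
print("Next token:", tokenizer.decode(next_token_id))

# The same model can keep sampling one token at a time to generate longer text.
generated = model.generate(input_ids, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```
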
4️⃣ Encoder-Only Architecture

Focused on understanding, not generating

  • Uses only the encoder stack to build contextual representations of the input
  • Great for classification, semantic search, and other NLU tasks (see the embedding sketch below)
  • Examples: BERT, RoBERTa, DeBERTa
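
A minimal sketch of the encoder-only pattern for semantic search, assuming the Hugging Face transformers and torch packages and the public bert-base-uncased checkpoint. Mean pooling the hidden states is a simple illustrative choice; production semantic-search systems usually use dedicated sentence encoders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative encoder-only use: turn sentences into embeddings and compare them.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the encoder's final hidden states into one vector per sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_size)
    return hidden.mean(dim=1).squeeze(0)

a = embed("How do I reset my password?")
b = embed("I forgot my login credentials.")
c = embed("The weather is sunny today.")
cos = torch.nn.functional.cosine_similarity
print(cos(a, b, dim=0).item(), cos(a, c, dim=0).item())  # the related pair should score higher
```
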

🎯 Quick Summary Table

| Architecture Type | Focus | Examples |
|---|---|---|
| Transformer | Language understanding/generation | GPT, BERT, T5 |
| Encoder-Decoder | Translation, summarization | T5, BART, mT5 |
| Decoder-Only | Text generation | GPT-2, GPT-3, GPT-4 |
| Encoder-Only | Text understanding | BERT, RoBERTa, DeBERTa |
| Retrieval-Augmented (RAG) | Knowledge retrieval + generation | RAG, OpenAI Retrieval Plugin |
| Mixture of Experts (MoE) | Efficient scaling | Switch Transformer, GShard, Mixtral |
| Multimodal | Text + images + video | GPT-4V, Gemini 1.5, Flamingo |