Glossary
A
Agent Frameworks: Software that helps AI coordinate complex, multi-step tasks. Think of them as project managers for AI – breaking down big goals, choosing the right tools, and handling errors when things go wrong.
API (Application Programming Interface): The direct line to AI models, bypassing all the packaging of chat apps. Send a request, get a response, with full control over all the settings.
API Key: Your secret credential for accessing AI services. Like a house key – anyone who holds it can act as you, so keep it private.
Attention Mechanism: The Transformer's breakthrough feature that lets AI understand how words relate to each other across entire sentences. Every word can "look at" every other word simultaneously.
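A minimal sketch of the idea in plain Python, using hypothetical two-dimensional vectors: each query is compared against every key, the similarity scores become weights, and the weights blend the values together.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query: a list of floats; keys and values: lists of equally sized vectors.
    Returns a mix of the values, weighted by query-key similarity.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns raw scores into weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the value vectors according to the weights
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
```

Real models run this for every word against every other word, in many heads at once; this is only the core arithmetic.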
Augmentation: Giving AI superpowers beyond its training – access to current information, databases, tools, and the ability to take actions in the real world.
Autoregressive Generation: How AI writes – one token at a time, with each new token based on everything that came before. Like building a sentence where each word must fit perfectly with all previous words.
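A toy sketch of that loop, with a hard-coded lookup table standing in for a real model: each step conditions on everything generated so far, until the end-of-sequence token appears.

```python
# Toy next-token table mapping a context (tuple of tokens) to the next token.
# A real model scores the entire vocabulary; here the choices are hard-coded.
NEXT_TOKEN = {
    (): "The",
    ("The",): "cat",
    ("The", "cat"): "sat",
    ("The", "cat", "sat"): "<eos>",
}

def generate(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        # Each new token depends on everything that came before
        next_token = NEXT_TOKEN.get(tuple(tokens), "<eos>")
        if next_token == "<eos>":  # the end-of-sequence token stops generation
            break
        tokens.append(next_token)
    return " ".join(tokens)
```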
B
Base Models: The wild, untamed version of LLMs. Trained on massive text but not taught manners. They complete text rather than follow instructions, and might say anything.
Batch Inference: Processing multiple AI requests together. Efficient but not real-time.
Benchmarks: Standardized tests for AI models (MMLU, HumanEval, etc.). Useful for comparison but often poor predictors of real-world usefulness.
Bias (in AI): Unfair prejudices AI learns from its training data. A major safety concern requiring active detection and mitigation.
C
Chain-of-Thought (CoT): A prompting technique where you ask AI to "think step by step." Often dramatically improves performance on complex problems.
Claude: Anthropic's family of AI models, known for thoughtful responses and strong safety features. Comes in Opus (powerful), Sonnet (balanced), and Haiku (fast) versions.
Closed-source Models: AI models accessed only through APIs, where the weights and training details are kept secret. You rent, not own.
Concept Drift: When the world changes but your AI's knowledge doesn't. A model trained in 2023 won't know about events in 2024.
Context Window: How much text an AI can "see" at once – including your prompt, conversation history, and its response. Like the size of the AI's desk.
D
Data Drift: When the types of questions users ask start differing from what the model was trained on.
Decode Phase: The second part of inference where AI generates its response token by token.
Decoder-only Architecture: The design used by most modern generative AI (GPT, Claude, Gemini). Optimized for generating text rather than translating between languages.
E
Embeddings: Mathematical representations of text meaning. How AI converts words into numbers it can search and compare.
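A small illustration with made-up vectors: embeddings are typically compared with cosine similarity, which measures the angle between two vectors rather than their raw distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.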
End-of-Sequence (EOS) Token: The special token that tells AI to stop generating. Without it, AI would ramble forever.
Evaluation: Systematic assessment of AI performance, safety, and reliability. The quality control department for AI.
F
Few-shot Prompting: Providing examples in your prompt to show AI exactly what you want. Like teaching by demonstration.
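A sketch of how such a prompt might be assembled, assuming a hypothetical sentiment-labeling task; the demonstrations come first, then the real input in the same format.

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked examples first, then the actual input."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The prompt ends mid-pattern, inviting the model to complete it
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)
```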
Fine-tuning: Continuing to train a pre-trained model on specialized data. Taking a generalist and making it an expert in your specific domain.
Full Fine-tuning: Retraining all parameters of a model. Powerful but expensive – like sending someone back to university.
Function Calling: Giving AI the ability to use tools and take actions. Transforms AI from a thinker to a doer.
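A minimal sketch of the receiving side, assuming the model signals a tool call as a JSON object with hypothetical `name` and `arguments` fields (real APIs differ in the exact format):

```python
import json

# Hypothetical tool the model is allowed to call
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

TOOLS = {"get_weather": get_weather}

def handle_model_output(model_output: str) -> str:
    """If the model emitted a JSON tool call, run the tool; otherwise pass the text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text answer, no tool needed
    func = TOOLS[call["name"]]
    return func(**call["arguments"])
```

In a full system, the tool's result is fed back to the model so it can write a final answer.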
G
Gemini: Google's family of AI models. Includes Pro (powerful) and Flash (fast and efficient) versions.
GPT (Generative Pre-trained Transformer): OpenAI's model family. GPT-4 remains highly capable despite newer releases.
Guardrails: Safety filters that block harmful AI inputs or outputs. The safety fence around AI systems.
H
Hallucination: When AI confidently makes things up. A fundamental challenge since AI generates plausible-sounding text whether it's true or not.
Human Evaluation: The gold standard for assessing AI quality. Automated metrics help, but humans judge what really matters.
I
Inference: The process of using a trained model to generate responses. What happens when you actually use AI.
Instruct Models: LLMs fine-tuned to follow instructions helpfully and safely. The polite, helpful versions of base models.
Instruction Fine-tuning: Teaching models to follow commands through examples of good behavior. How wild base models become helpful assistants.
K
Key (K): In attention mechanisms, represents "what information do I have to offer?" Part of the Query-Key-Value system.
KV Cache: Optimization that stores previous calculations during text generation. Why AI can maintain long conversations efficiently.
L
LangChain: Popular framework for building AI applications, especially those using RAG or agents.
Large Language Model (LLM): AI systems trained on massive text datasets to understand and generate language. The technology behind ChatGPT, Claude, and others.
LLaMA: Meta's family of openly available models. Democratized AI by making powerful model weights free to download and run.
LoRA (Low-Rank Adaptation): Efficient fine-tuning technique that adds small "adapter" modules instead of retraining the whole model. Like Post-It notes on encyclopedia pages.
M
Max Tokens: Limit on how much text AI can generate in one response. Prevents infinite rambling.
MCP (Model Context Protocol): An open standard, introduced by Anthropic, for how AI connects to tools and data. Like USB-C for AI integrations.
Multi-Head Attention: Running attention mechanisms multiple times in parallel, each focusing on different aspects (grammar, meaning, tone, etc.).
Multimodal Models: AI that handles multiple types of input/output – text, images, audio, video. Not just reading and writing anymore.
O
Open-source Models: Models with publicly available weights you can download and run yourself. Full control but requires your own hardware.
Output Control Parameters: Settings that shape AI responses – temperature, top-p, max length, etc. The dials and knobs for fine-tuning output.
P
Parameter-Efficient Fine-Tuning (PEFT): Techniques for adapting models by training only a small fraction of parameters. Often delivers most of the benefit of full fine-tuning at a fraction of the cost.
Parameters: The billions of numbers inside a model that encode its knowledge. More parameters generally means more capability but also more cost.
Perplexity: Metric measuring how "surprised" a model is by text. Lower is better – it means the model predicts the text more accurately.
Positional Embeddings: How AI remembers word order when processing everything simultaneously. Like seat numbers at a theater.
Prefill Phase: First part of inference where AI rapidly processes your entire prompt to understand context.
Prompt Engineering: The art and science of crafting effective AI inputs. Learning to speak AI's language.
Prompting: The basic act of giving instructions to AI. The quality of your prompt largely determines the quality of the output.
Q
QLoRA: Even more efficient than LoRA – quantizes the base model to lower precision while training small adapter modules on top. Maximum efficiency for fine-tuning.
Quantization: Reducing model precision to save memory and increase speed. Like compressing a photo – slightly lower quality but much smaller file.
Query (Q): In attention mechanisms, represents "what information am I looking for?" Works with Keys and Values.
R
RAG (Retrieval-Augmented Generation): Automatically finding and injecting relevant information into AI prompts. Like giving AI a research assistant.
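A toy sketch of the idea, substituting simple word overlap for the embedding search a real RAG system would use: retrieve the most relevant documents, then inject them into the prompt.

```python
def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (real systems use embeddings)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(question, documents):
    """Inject the retrieved context into the prompt ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```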
Rate Limits: Restrictions on how many API requests you can make. Prevents overwhelming the service.
Reasoning Models: Latest evolution of AI that can work through problems step-by-step and check their own logic. Think before they speak.
Red-teaming: Security experts trying to break AI safety features. Ethical hacking for AI systems.
Reinforcement Learning from Human Feedback (RLHF): Teaching AI to be helpful by learning from human preferences. How models learn manners.
S
Stop Sequences: Specific text that makes AI immediately stop generating. The emergency brake.
Streaming: Getting AI responses word-by-word as they're generated. Like watching someone type rather than waiting for the full message.
Supervised Fine-Tuning (SFT): Training models on labeled examples. The foundation of most fine-tuning efforts.
System Prompt: Hidden instructions that shape AI behavior throughout a conversation. The personality and rules you never see in chat apps.
T
Temperature: Controls randomness in AI responses. Low = predictable and safe. High = creative and wild.
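Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature reshapes the distribution.

    Low temperature sharpens it (the top token dominates);
    high temperature flattens it (more tokens get a real chance).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```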
Token: The basic unit AI processes – usually parts of words. "Unbelievable" might be "un-believ-able" in tokens.
Tokenization: Breaking text into tokens. How AI converts human language into something it can process.
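A toy illustration using greedy longest-match against a tiny hand-made vocabulary (real tokenizers such as BPE learn their vocabulary from data, but the splitting idea is similar):

```python
# Toy vocabulary; real tokenizers learn tens of thousands of subwords from text
VOCAB = {"un", "believ", "able", "the", "cat"}

def tokenize(word):
    """Greedy longest-match subword tokenization, a simplification of BPE-style tokenizers."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character becomes its own token
            i += 1
    return tokens
```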
Top-P (Nucleus Sampling): Another randomness control. Affects vocabulary diversity rather than overall wildness.
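A sketch of the filtering step, using made-up token probabilities: keep the smallest set of top tokens whose probabilities add up to p, renormalize, and sample only from that "nucleus."

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    probs: dict of token -> probability. Returns the renormalized nucleus.
    """
    nucleus = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[token] = prob
        cumulative += prob
        if cumulative >= p:
            break  # everything less likely than this is cut off
    total = sum(nucleus.values())
    return {t: pr / total for t, pr in nucleus.items()}
```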
Transformer Architecture: The revolutionary design behind all modern LLMs. Enables understanding of long-range word relationships.
V
Value (V): In attention mechanisms, the actual information content. Works with Queries and Keys to create understanding.
Vector Database: Specialized storage for embeddings. Enables semantic search – finding documents by meaning, not just keywords.
Z
Zero-shot Prompting: Asking AI to do something without providing examples – relying entirely on what the model already knows.
License
© 2025 Uli Hitzel This book is released under the Creative Commons Attribution–NonCommercial 4.0 International license (CC BY-NC 4.0). You may copy, distribute, and adapt the material for any non-commercial purpose, provided you give appropriate credit, include a link to the license, and indicate if changes were made. For commercial uses, please contact the author.
Version 0.1, last updated July 8th 2025