Glossary
A
Agent Frameworks: Software that helps AI coordinate complex, multi-step tasks. Think of them as project managers for AI – breaking down big goals, choosing the right tools, and handling errors when things go wrong.
API (Application Programming Interface): The direct line to AI models, bypassing all the packaging of chat apps. Send a request, get a response, with full control over all the settings.
API Key: Your secret credential for accessing AI services. Like a house key – anyone who holds it can act as you, so keep it private.
Attention Mechanism: The Transformer's breakthrough feature that lets AI understand how words relate to each other across entire sentences. Every word can "look at" every other word simultaneously.
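A minimal sketch of the idea in plain Python, using hypothetical two-dimensional vectors: each query is compared against every key, the similarity scores become weights, and the weights blend the values together.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query: a list of floats; keys and values: lists of equally sized vectors.
    Returns a mix of the values, weighted by query-key similarity.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns raw scores into weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the value vectors according to the weights
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
```

Real models run this for every word against every other word, in many heads at once; this is only the core arithmetic.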
Augmentation: Giving AI superpowers beyond its training – access to current information, databases, tools, and the ability to take actions in the real world.
Autoregressive Generation: How AI writes – one token at a time, with each new token based on everything that came before. Like building a sentence where each word must fit perfectly with all previous words.
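A toy sketch of that loop, with a hard-coded lookup table standing in for a real model: each step conditions on everything generated so far, until the end-of-sequence token appears.

```python
# Toy next-token table mapping a context (tuple of tokens) to the next token.
# A real model scores the entire vocabulary; here the choices are hard-coded.
NEXT_TOKEN = {
    (): "The",
    ("The",): "cat",
    ("The", "cat"): "sat",
    ("The", "cat", "sat"): "<eos>",
}

def generate(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        # Each new token depends on everything that came before
        next_token = NEXT_TOKEN.get(tuple(tokens), "<eos>")
        if next_token == "<eos>":  # the end-of-sequence token stops generation
            break
        tokens.append(next_token)
    return " ".join(tokens)
```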
B
Base Models: The wild, untamed version of LLMs. Trained on massive text but not taught manners. They complete text rather than follow instructions, and might say anything.
Batch Inference: Processing multiple AI requests together. Efficient but not real-time.
Benchmarks: Standardized tests for AI models (MMLU, HumanEval, etc.). Useful for comparison but often poor predictors of real-world usefulness.
Bias (in AI): Unfair prejudices AI learns from its training data. A major safety concern requiring active detection and mitigation.
C
Chain-of-Thought (CoT): A prompting technique where you ask AI to "think step by step." Often dramatically improves performance on complex problems.
Claude: Anthropic's family of AI models, known for thoughtful responses and strong safety features. Comes in Opus (powerful), Sonnet (balanced), and Haiku (fast) versions.
Closed-source Models: AI models accessed only through APIs, where the weights and training details are kept secret. You rent, not own.
Concept Drift: When the world changes but your AI's knowledge doesn't. A model trained in 2023 won't know about events in 2024.
Context Window: How much text an AI can "see" at once – including your prompt, conversation history, and its response. Like the size of the AI's desk.
D
Data Drift: When the types of questions users ask start differing from what the model was trained on.
Decode Phase: The second part of inference where AI generates its response token by token.
Decoder-only Architecture: The design used by most modern generative AI (GPT, Claude, Gemini). Optimized for generating text rather than translating between languages.
E
Embeddings: Mathematical representations of text meaning. How AI converts words into numbers it can search and compare.
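A small illustration with made-up vectors: embeddings are typically compared with cosine similarity, which measures the angle between two vectors rather than their raw distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.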
End-of-Sequence (EOS) Token: The special token that tells AI to stop generating. Without it, AI would ramble forever.
Evaluation: Systematic assessment of AI performance, safety, and reliability. The quality control department for AI.
F
Few-shot Prompting: Providing examples in your prompt to show AI exactly what you want. Like teaching by demonstration.
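A sketch of how such a prompt might be assembled, assuming a hypothetical sentiment-labeling task; the demonstrations come first, then the real input in the same format.

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked examples first, then the actual input."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The prompt ends mid-pattern, inviting the model to complete it
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)
```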
Fine-tuning: Continuing to train a pre-trained model on specialized data. Taking a generalist and making it an expert in your specific domain.
Full Fine-tuning: Retraining all parameters of a model. Powerful but expensive – like sending someone back to university.
Function Calling: Giving AI the ability to use tools and take actions. Transforms AI from a thinker to a doer.
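A minimal sketch of the receiving side, assuming the model signals a tool call as a JSON object with hypothetical `name` and `arguments` fields (real APIs differ in the exact format):

```python
import json

# Hypothetical tool the model is allowed to call
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

TOOLS = {"get_weather": get_weather}

def handle_model_output(model_output: str) -> str:
    """If the model emitted a JSON tool call, run the tool; otherwise pass the text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text answer, no tool needed
    func = TOOLS[call["name"]]
    return func(**call["arguments"])
```

In a full system, the tool's result is fed back to the model so it can write a final answer.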
G
Gemini: Google's family of AI models. Includes Pro (powerful) and Flash (fast and efficient) versions.
GPT (Generative Pre-trained Transformer): OpenAI's model family. GPT-4 remains highly capable despite newer releases.
Guardrails: Safety filters that block harmful AI inputs or outputs. The safety fence around AI systems.
H
Hallucination: When AI confidently makes things up. A fundamental challenge since AI generates plausible-sounding text whether it's true or not.
Human Evaluation: The gold standard for assessing AI quality. Automated metrics help, but humans judge what really matters.
I
Inference: The process of using a trained model to generate responses. What happens when you actually use AI.
Instruct Models: LLMs fine-tuned to follow instructions helpfully and safely. The polite, helpful versions of base models.
Instruction Fine-tuning: Teaching models to follow commands through examples of good behavior. How wild base models become helpful assistants.
K
Key (K): In attention mechanisms, represents "what information do I have to offer?" Part of the Query-Key-Value system.
KV Cache: Optimization that stores previous calculations during text generation. Why AI can maintain long conversations efficiently.
L
LangChain: Popular framework for building AI applications, especially those using RAG or agents.
Large Language Model (LLM): AI systems trained on massive text datasets to understand and generate language. The technology behind ChatGPT, Claude, and others.
LLaMA: Meta's family of openly available models. Democratized AI by making powerful model weights free to download and run.
LoRA (Low-Rank Adaptation): Efficient fine-tuning technique that adds small "adapter" modules instead of retraining the whole model. Like Post-It notes on encyclopedia pages.
M
Max Tokens: Limit on how much text AI can generate in one response. Prevents infinite rambling.
MCP (Model Context Protocol): An open standard, introduced by Anthropic, for how AI connects to tools and data. Like USB-C for AI integrations.
Multi-Head Attention: Running attention mechanisms multiple times in parallel, each focusing on different aspects (grammar, meaning, tone, etc.).
Multimodal Models: AI that handles multiple types of input/output – text, images, audio, video. Not just reading and writing anymore.
O
Open-source Models: Models with publicly available weights you can download and run yourself. Full control but requires your own hardware.
Output Control Parameters: Settings that shape AI responses – temperature, top-p, max length, etc. The dials and knobs for fine-tuning output.
P
Parameter-Efficient Fine-Tuning (PEFT): Techniques for adapting models by training only a small fraction of parameters. Often delivers most of the benefit of full fine-tuning at a fraction of the cost.
Parameters: The billions of numbers inside a model that encode its knowledge. More parameters generally means more capability but also more cost.
Perplexity: Metric measuring how "surprised" a model is by text. Lower is better – it means the model predicts the text more accurately.
Positional Embeddings: How AI remembers word order when processing everything simultaneously. Like seat numbers at a theater.
Prefill Phase: First part of inference where AI rapidly processes your entire prompt to understand context.
Prompt Engineering: The art and science of crafting effective AI inputs. Learning to speak AI's language.
Prompting: The basic act of giving instructions to AI. The quality of your prompt largely determines the quality of the output.
Q
QLoRA: Even more efficient than LoRA – quantizes the base model to lower precision while training small adapter modules on top. Maximum efficiency for fine-tuning.
Quantization: Reducing model precision to save memory and increase speed. Like compressing a photo – slightly lower quality but much smaller file.
Query (Q): In attention mechanisms, represents "what information am I looking for?" Works with Keys and Values.
R
RAG (Retrieval-Augmented Generation): Automatically finding and injecting relevant information into AI prompts. Like giving AI a research assistant.
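A toy sketch of the idea, substituting simple word overlap for the embedding search a real RAG system would use: retrieve the most relevant documents, then inject them into the prompt.

```python
def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (real systems use embeddings)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(question, documents):
    """Inject the retrieved context into the prompt ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```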
Rate Limits: Restrictions on how many API requests you can make. Prevents overwhelming the service.
Reasoning Models: Latest evolution of AI that can work through problems step-by-step and check their own logic. Think before they speak.
Red-teaming: Security experts trying to break AI safety features. Ethical hacking for AI systems.
Reinforcement Learning from Human Feedback (RLHF): Teaching AI to be helpful by learning from human preferences. How models learn manners.
S
Stop Sequences: Specific text that makes AI immediately stop generating. The emergency brake.
Streaming: Getting AI responses word-by-word as they're generated. Like watching someone type rather than waiting for the full message.
Supervised Fine-Tuning (SFT): Training models on labeled examples. The foundation of most fine-tuning efforts.
System Prompt: Hidden instructions that shape AI behavior throughout a conversation. The personality and rules you never see in chat apps.
T
Temperature: Controls randomness in AI responses. Low = predictable and safe. High = creative and wild.
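Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature reshapes the distribution.

    Low temperature sharpens it (the top token dominates);
    high temperature flattens it (more tokens get a real chance).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```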
Token: The basic unit AI processes – usually parts of words. "Unbelievable" might be "un-believ-able" in tokens.
Tokenization: Breaking text into tokens. How AI converts human language into something it can process.
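A toy illustration using greedy longest-match against a tiny hand-made vocabulary (real tokenizers such as BPE learn their vocabulary from data, but the splitting idea is similar):

```python
# Toy vocabulary; real tokenizers learn tens of thousands of subwords from text
VOCAB = {"un", "believ", "able", "the", "cat"}

def tokenize(word):
    """Greedy longest-match subword tokenization, a simplification of BPE-style tokenizers."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character becomes its own token
            i += 1
    return tokens
```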
Top-P (Nucleus Sampling): Another randomness control. Affects vocabulary diversity rather than overall wildness.
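A sketch of the filtering step, using made-up token probabilities: keep the smallest set of top tokens whose probabilities add up to p, renormalize, and sample only from that "nucleus."

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    probs: dict of token -> probability. Returns the renormalized nucleus.
    """
    nucleus = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[token] = prob
        cumulative += prob
        if cumulative >= p:
            break  # everything less likely than this is cut off
    total = sum(nucleus.values())
    return {t: pr / total for t, pr in nucleus.items()}
```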
Transformer Architecture: The revolutionary design behind all modern LLMs. Enables understanding of long-range word relationships.
V
Value (V): In attention mechanisms, the actual information content. Works with Queries and Keys to create understanding.
Vector Database: Specialized storage for embeddings. Enables semantic search – finding documents by meaning, not just keywords.
Z
Zero-shot Prompting: Asking AI to do something without providing examples – relying entirely on what the model already knows.
License
© 2025 Uli Hitzel This book is released under the Creative Commons Attribution–NonCommercial 4.0 International license (CC BY-NC 4.0). You may copy, distribute, and adapt the material for any non-commercial purpose, provided you give appropriate credit, include a link to the license, and indicate if changes were made. For commercial uses, please contact the author.
Version 0.1, last updated July 8th 2025