ionews
Learn AI

Glossary

55 terms, defined in plain language — the vocabulary behind everything on the Learn page. No circular definitions, no jargon explained with more jargon.

Core concepts

AI (Artificial Intelligence)
Umbrella term for software that performs tasks we associate with human intelligence. Today it almost always means machine learning systems, not hand-coded rules.
Machine Learning (ML)
Programs that learn patterns from data instead of following explicit instructions. You provide examples; the system tunes itself to fit them.
Neural network
A model built from layers of simple weighted connections, loosely inspired by neurons. Stacking many layers is what "deep learning" means.
LLM (Large Language Model)
A neural network trained on massive amounts of text to predict the next token. Chat assistants like Claude and ChatGPT are LLMs with extra training to be helpful.
Transformer
The neural network architecture behind modern AI, introduced in 2017 ("Attention Is All You Need"). Its key trick is attention: weighing every part of the input against every other part.
Attention
The mechanism inside transformers that lets the model decide which earlier words matter for predicting the next one — how "it" finds what it refers to.
Parameters / weights
The numbers inside a model that training adjusts — the learned knowledge itself. "A 70B model" means 70 billion parameters.
Token
The unit models actually read and write: a word fragment of roughly 3–4 characters of English. Pricing, speed, and context limits are all measured in tokens.
Context window
The maximum amount of text (in tokens) a model can consider at once — its working memory. Anything beyond it is invisible to the model.
Training
The compute-heavy process of adjusting a model's parameters against data. Happens once, in a datacenter — the model does not learn while you chat with it.
Inference
Running a trained model to get output. When you chat with an assistant, you are doing inference.
Fine-tuning
Taking a trained model and training it a little more on your own data to specialise it. Vastly cheaper than training from scratch.
RLHF (Reinforcement Learning from Human Feedback)
Training a model against human preference ratings so it behaves helpfully rather than just predicting internet text. A key step that turns a raw LLM into an assistant.
Benchmark
A standardised test used to compare models. Useful but gameable — treat day-one benchmark claims with patience.
Overfitting
When a model memorises its training data instead of learning the general pattern, so it fails on anything new.
Open weights
A model whose parameters are published so anyone can run or fine-tune it locally (e.g. DeepSeek, GLM, Qwen, Llama). Distinct from open source: the training data and code usually stay private.
Frontier model
A model at the current capability edge — the flagship tier from the major labs. What counts as frontier turns over every few months.
Quantization
Shrinking a model by storing its weights at lower numeric precision so it fits on smaller hardware, at a small quality cost. How a 70B model runs on a home GPU.
GPU / VRAM
Graphics processors do the parallel math neural networks need; VRAM is their onboard memory. Model size is limited by VRAM, which is why it is the spec that matters.

Using LLMs

Prompt
Everything you send the model: the question, instructions, and any included documents. The model's only view of your problem.
Prompt engineering
Writing prompts deliberately — clear instructions, examples, output format — to get reliable results. Less magic than the name implies; mostly clear technical writing.
System prompt
Hidden standing instructions that frame the whole conversation (persona, rules, tools) before the user says anything.
Temperature
A dial for randomness in output. Low = consistent and repetitive; high = varied and creative. Zero-ish for facts and code, higher for brainstorming.
Hallucination
When a model states something false with total confidence. It is a prediction engine, not a database — always verify names, numbers, and citations.
Embedding
A list of numbers representing the meaning of some text, where similar meanings land near each other. The foundation of semantic search.
RAG (Retrieval-Augmented Generation)
Fetching relevant documents (usually via embeddings) and pasting them into the prompt so the model answers from your data instead of its memory. The standard cure for hallucination about private knowledge.
Agent
An LLM given tools (browse, run code, edit files) and a goal, looping autonomously: act, observe the result, act again.
Tool use / function calling
The mechanism that lets a model invoke external functions — search, calculators, APIs — instead of guessing.
MCP (Model Context Protocol)
An open standard for plugging tools and data sources into AI assistants, so any client can talk to any tool server.
Chain-of-thought / reasoning models
Models that write out intermediate thinking before answering, trading time and tokens for accuracy on hard problems.
Multimodal
A model that handles more than text — images, audio, or video, in or out.
AI IDE / agentic coding
Code editors (Cursor, Antigravity, Devin Desktop) and terminal agents (Claude Code, Codex) where the AI edits files, runs commands, and iterates — you review and steer rather than type everything.
Vibe coding
Building software by describing what you want and accepting AI-written code with light review. Great for prototypes; risky for anything that handles money or user data.
Copilot (Microsoft)
Microsoft's brand for AI assistants across its products: the free consumer chatbot, GitHub Copilot for code, and the paid Microsoft 365 Copilot inside Word, Excel, and friends. Same name, different products.

Image & video generation

Diffusion model
The technique behind most image and video generators: start with pure noise and iteratively denoise it toward an image matching your prompt.
Latent space
The compressed internal representation where diffusion actually happens — the model works on a small "essence" of the image, then decodes it to pixels. Cheaper than working on pixels directly.
Checkpoint
A saved snapshot of model weights. In the image-gen world, "a checkpoint" usually means a full downloadable model, often a community fine-tune with its own style.
LoRA
A tiny add-on trained on top of a checkpoint that injects a style, character, or concept. Megabytes instead of gigabytes, and stackable.
ControlNet
A method for steering image generation with a structural guide — a pose skeleton, depth map, or sketch — instead of words alone.
Sampler / steps
The algorithm and iteration count used during denoising. More steps is slower and usually sharper, with diminishing returns.
CFG scale
Classifier-free guidance: how strongly generation is pulled toward your prompt. Low drifts off-prompt; too high looks overcooked.
Negative prompt
A second prompt listing what you do not want (blur, extra fingers, watermarks). Supported by most diffusion UIs.
Inpainting / outpainting
Regenerating only a masked region of an image (inpainting) or extending the canvas beyond its borders (outpainting).
Upscaling
Enlarging an image with a model that invents plausible detail, rather than simple interpolation.
Text-to-video
Generating video clips from a prompt. Hosted leaders in 2026 are Google's Veo, Kling, and Grok Imagine; open models like Wan run locally. Clips stay short, but now with native audio.
Image-to-video (i2v)
Animating a still image into a clip — often better-controlled than pure text-to-video because you fix the first frame yourself.

Databases & data

SQL
The standard language for querying relational databases — SELECT, JOIN, WHERE. Fifty years old and more relevant than ever; every AI application sits on one.
SQLite
A complete database in a single file, embedded in your program — no server to run. The default choice for local apps, prototypes, and anything single-user. Example: your browser history is SQLite.
PostgreSQL
The leading open-source client-server database — multi-user, concurrent, extensible. The default choice when an application grows past SQLite. Example: most production web apps you use daily.
pgvector
A PostgreSQL extension that stores embeddings and searches them by similarity — letting one database serve both your app data and your RAG retrieval.
Vector database
A database specialised for storing embeddings and answering "what is most similar to this?" (Qdrant, Chroma, Milvus). For modest datasets, pgvector or SQLite extensions do the same job with less infrastructure.
DuckDB
SQLite's analytics-focused sibling: an embedded database built for crunching large tables and Parquet/CSV files fast. Excellent for exploring ML datasets.
NoSQL
Catch-all for databases that skip the relational model — document stores (MongoDB), key-value stores (Redis). Flexible schemas, different trade-offs.
Index
A lookup structure that makes queries fast in exchange for slightly slower writes and more storage. The first thing to check when a query is slow.
Schema
The declared structure of a database: tables, columns, types, and how they relate. Designing it well is most of database craft.