AI Glossary
A comprehensive guide to artificial intelligence and machine learning terminology. 65+ terms explained for practitioners and decision-makers.
A
Active Learning
A machine learning approach where the model strategically selects the most informative unlabeled examples for human annotation to maximize learning efficiency.
Agentic Workflow
An AI-driven process where language models autonomously plan, execute, and iterate through multi-step tasks using tools, memory, and decision-making.
AI Agent
An autonomous AI system that can perceive its environment, make decisions, use tools, and take actions to accomplish goals with minimal human intervention.
AI Safety
The research and engineering discipline focused on ensuring AI systems behave reliably, avoid harmful outcomes, and remain aligned with human values.
Alignment
The challenge of ensuring AI systems pursue goals and exhibit behaviors that are consistent with human intentions, values, and expectations.
Attention Mechanism
A neural network component that dynamically weighs the importance of different input elements when producing an output, enabling models to focus on relevant context.
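The mechanism can be sketched in a few lines of plain Python. This is a toy single-query version of scaled dot-product attention; real models batch it across many heads and learn the query, key, and value projections.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # importance of each input element
    # Output blends the value vectors by their attention weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# A query aligned with the second key attends mostly to the second value.
query = [0.0, 1.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention(query, keys, values)
```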
Autoencoder
A neural network architecture that learns compressed representations of data by training to reconstruct its input through a bottleneck layer.
C
Chain-of-Thought (CoT)
A prompting technique that improves AI reasoning by instructing the model to break down complex problems into explicit intermediate steps.
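A hypothetical prompt illustrates the technique; the instruction to reason step by step before answering is the core of it.

```python
# Hypothetical chain-of-thought prompt construction.
question = "A store sells pens in packs of 12. How many pens are in 5 packs?"

cot_prompt = (
    f"Q: {question}\n"
    "Think step by step, then give the final answer on the last line.\n"
    "A:"
)
```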
Computer Vision
The field of AI that enables machines to interpret and understand visual information from images, video, and other visual inputs.
CUDA
NVIDIA's proprietary parallel computing platform and API that let developers use NVIDIA GPUs for general-purpose processing and AI workloads.
D
Data Labeling
The process of annotating raw data with informative tags or labels that enable supervised machine learning models to learn from examples.
Data Pipeline
An automated workflow that extracts, transforms, and loads data from various sources into formats suitable for AI model training and inference.
Deep Learning
A subset of machine learning using neural networks with many layers to automatically learn hierarchical representations from large amounts of data.
Differential Privacy
A mathematical framework that provides provable privacy guarantees by adding calibrated noise to data or computations, preventing individual identification.
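A minimal sketch of the classic Laplace mechanism, assuming a simple counting query with sensitivity 1: noise scaled to sensitivity/epsilon is added before release, so a larger epsilon means less noise and weaker privacy.

```python
import math
import random

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) via an inverse-CDF transform of a uniform draw.
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, sensitivity=1.0, seed=0):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    scale = sensitivity / epsilon  # larger epsilon -> less noise, less privacy
    rng = random.Random(seed)
    return true_count + laplace_noise(scale, rng)

noisy = private_count(42, epsilon=0.5)
```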
Diffusion Model
A generative AI architecture that creates data by learning to reverse a gradual noise-addition process, excelling at high-quality image and video generation.
E
Edge Inference
Running AI model inference directly on local devices or edge hardware near the data source, rather than sending data to cloud servers for processing.
Embeddings
Dense numerical vector representations that capture the semantic meaning of text, images, or other data in a high-dimensional space.
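Similarity between embeddings is typically measured by cosine similarity. A toy sketch with hypothetical 3-dimensional vectors (real models use hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: semantically close words get nearby vectors.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
truck = [0.1, 0.2, 0.9]
```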
F
Feature Store
A centralized platform for managing, storing, and serving machine learning features consistently across training and inference pipelines.
Federated Learning
A distributed machine learning approach where models are trained across multiple devices or organizations without sharing raw data, preserving privacy.
Few-Shot Learning
The ability of AI models to learn and perform tasks from only a small number of examples provided in the prompt or training data.
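In the prompting form, few-shot learning amounts to placing a handful of labeled examples before the query. A hypothetical sentiment-labeling prompt:

```python
# Three in-context examples teach the format and the task.
examples = [
    ("The food was amazing", "positive"),
    ("Terrible service, never again", "negative"),
    ("It was fine, nothing special", "neutral"),
]

def few_shot_prompt(examples, query):
    lines = [f"Text: {text}\nLabel: {label}" for text, label in examples]
    lines.append(f"Text: {query}\nLabel:")  # model completes the last label
    return "\n\n".join(lines)

prompt = few_shot_prompt(examples, "Great value for the price")
```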
Fine-Tuning
The process of further training a pre-trained model on a domain-specific dataset to improve its performance on targeted tasks.
Foundation Model
A large-scale AI model pre-trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting.
Function Calling
The ability of language models to generate structured output that invokes external functions or APIs, enabling interaction with external systems and data.
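The receiving side can be sketched as parsing the model's structured output and dispatching to a registered tool. The tool names and JSON shape below are hypothetical; real APIs describe tools with JSON Schemas.

```python
import json

def get_weather(city):
    return f"Sunny in {city}"  # stub standing in for a real API call

TOOLS = {"get_weather": get_weather}  # hypothetical tool registry

# Pretend the model emitted this structured call instead of plain text.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```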
G
GAN (Generative Adversarial Network)
A generative model architecture consisting of two competing neural networks, a generator and a discriminator, trained against each other to produce increasingly realistic outputs.
Generative AI
AI systems capable of creating new content including text, images, code, audio, and video based on patterns learned from training data.
GPU Computing
The use of graphics processing units for general-purpose parallel computation, providing the massive throughput needed for training and running AI models.
Guardrails
Safety mechanisms and content filters applied to AI systems to prevent harmful, off-topic, or non-compliant outputs in production.
K
Knowledge Distillation
A training methodology where a compact student model learns to replicate the outputs and reasoning patterns of a larger, more capable teacher model.
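A common distillation loss is the KL divergence between temperature-softened teacher and student outputs; the higher temperature exposes the teacher's "dark knowledge" about relative class similarities. A toy sketch:

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```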
Knowledge Graph
A structured representation of entities and their relationships that enables machines to understand connections and reason about domain knowledge.
L
Large Language Model (LLM)
A neural network with billions of parameters trained on massive text corpora that can understand, generate, and reason about natural language.
Latency Optimization
Techniques and engineering practices that reduce the response time of AI systems from input to output for better user experience and throughput.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains small adapter matrices instead of updating all model weights, dramatically reducing compute requirements.
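The saving is easy to see from the parameter counts. For a hypothetical 4096x4096 layer with adapter rank 8, LoRA replaces the full weight update dW (d x k values) with the product of two small matrices B (d x r) and A (r x k):

```python
d, k, r = 4096, 4096, 8  # hypothetical layer size and adapter rank

full_update_params = d * k          # training the whole weight matrix
lora_params = d * r + r * k         # training only the B and A adapters

# The adapted forward pass computes W x + B (A x); W stays frozen.
reduction = full_update_params / lora_params
```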
M
Machine Learning
A branch of artificial intelligence where systems learn patterns from data to make predictions or decisions without being explicitly programmed for each scenario.
Mixture of Experts (MoE)
A neural network architecture that uses multiple specialized sub-networks and a routing mechanism to activate only relevant experts for each input.
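Routing can be sketched in miniature: a gate scores the experts and only the top-scoring one runs, so compute per input stays low while total model capacity grows. Everything below is a hypothetical stand-in for real learned gates and sub-networks.

```python
# Hypothetical specialized sub-networks (real experts are neural layers).
EXPERTS = {
    "math": lambda x: x * 2,
    "text": lambda x: x + 100,
}

def route(gate_scores, x):
    """Top-1 routing: run only the expert the gate scores highest."""
    best = max(gate_scores, key=gate_scores.get)
    return best, EXPERTS[best](x)

expert, output = route({"math": 0.9, "text": 0.1}, 21)
```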
MLOps
A set of practices combining machine learning, DevOps, and data engineering to reliably deploy and maintain ML models in production.
Model Distillation
A compression technique where a smaller student model is trained to replicate the behavior and performance of a larger teacher model.
Model Monitoring
The practice of continuously tracking AI model performance, data quality, and system health in production to detect degradation and trigger remediation.
Model Registry
A centralized repository for storing, versioning, and managing machine learning models throughout their lifecycle from development to production.
Model Serving
The infrastructure and systems that host trained AI models and handle incoming prediction requests in production environments.
Multimodal AI
AI systems that can process, understand, and generate content across multiple data types including text, images, audio, and video simultaneously.
N
Natural Language Processing (NLP)
The field of AI focused on enabling computers to understand, interpret, generate, and interact with human language in useful ways.
Neural Network
A computing system inspired by biological neural networks, consisting of interconnected layers of nodes that learn patterns from data through training.
NVIDIA NIM
NVIDIA Inference Microservices: a set of optimized containers that package AI models with inference engines such as TensorRT-LLM for high-performance, GPU-accelerated inference.
P
Perplexity
A metric measuring how well a language model predicts a text sample, with lower values indicating the model assigns higher probability to the actual text.
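Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each token. A model that is uniformly unsure over a 50-token vocabulary scores exactly 50:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability over the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

uniform = perplexity([1 / 50] * 10)        # uniformly unsure -> 50.0
confident = perplexity([0.9, 0.8, 0.95])   # high probabilities -> low perplexity
```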
Prompt Engineering
The systematic practice of designing and optimizing input prompts to elicit accurate, relevant, and useful outputs from large language models.
Pruning
A model compression technique that removes unnecessary or redundant parameters from neural networks to reduce size and computational requirements.
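The simplest variant is magnitude pruning: weights below a threshold are zeroed, yielding a sparser model. A toy sketch over a flat weight list:

```python
def magnitude_prune(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.8, -0.03, 0.5, 0.01, -0.9, 0.002]
pruned = magnitude_prune(weights, threshold=0.1)
sparsity = pruned.count(0.0) / len(pruned)  # fraction of weights removed
```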
R
RAG (Retrieval-Augmented Generation)
A technique that enhances large language model outputs by retrieving relevant documents from an external knowledge base before generating a response.
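The loop can be sketched end to end: retrieve the most relevant document, then prepend it to the prompt. The word-overlap scoring below is a toy stand-in; production systems rank with vector embeddings.

```python
# Hypothetical knowledge base of two documents.
DOCS = [
    "The return policy allows refunds within 30 days of purchase.",
    "Shipping takes 3-5 business days within the EU.",
]

def retrieve(query, docs):
    """Pick the document sharing the most words with the query (toy ranker)."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return max(docs, key=overlap)

query = "How many days do I have to return an item?"
context = retrieve(query, DOCS)
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
```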
Red Teaming
The practice of systematically probing AI systems for vulnerabilities, failure modes, and harmful outputs through adversarial testing before deployment.
Reinforcement Learning
A machine learning paradigm where an agent learns optimal behavior through trial and error, receiving rewards or penalties for its actions in an environment.
Retrieval-Augmented Generation
The full term for RAG (see above): an architecture that combines document retrieval with language model generation to produce grounded, accurate responses.
S
Semantic Search
Search technology that understands the meaning and intent behind queries rather than matching keywords, using vector embeddings for relevance.
Small Language Model (SLM)
A language model with fewer parameters, typically under 10 billion, optimized for specific tasks with lower compute requirements and faster inference.
Sovereign AI
AI infrastructure and models deployed within specific jurisdictional boundaries to comply with data residency, privacy, and regulatory requirements.
T
TensorRT
NVIDIA's high-performance deep learning inference optimizer and runtime, which maximizes throughput and minimizes latency on NVIDIA GPUs.
Tokenization
The process of splitting text into smaller units called tokens that language models process as their fundamental input and output elements.
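A toy greedy subword tokenizer makes the idea concrete: match the longest vocabulary entry at each position. The tiny vocabulary here is invented; real tokenizers (BPE, WordPiece) learn theirs from data.

```python
# Hypothetical subword vocabulary mapping pieces to token ids.
VOCAB = {"un": 0, "break": 1, "able": 2, "a": 3, "b": 4, "l": 5, "e": 6}

def tokenize(word):
    """Greedy longest-match tokenization against the vocabulary."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return tokens

tokens = tokenize("unbreakable")
ids = [VOCAB[t] for t in tokens]
```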
Tokens
The fundamental units of text that language models process, representing words, subwords, or characters depending on the tokenization method.
Training Data
The curated dataset used to train or fine-tune machine learning models, directly determining model capabilities, biases, and limitations.
Transfer Learning
A machine learning technique where knowledge gained from training on one task is applied to improve performance on a different but related task.
Transformer
A neural network architecture based on self-attention mechanisms that processes input sequences in parallel, forming the foundation of modern large language models.