Memory‑Efficient Attention
Attention kernels that reduce memory use and runtime by computing attention in tiles (e.g., FlashAttention) or with linearized variants, avoiding materialization of the full n × n score matrix.
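A minimal NumPy sketch of the tiling idea: attention is computed one query/key block at a time with an online softmax, so the full score matrix is never built. The function name, block size, and single-head unbatched shapes are illustrative assumptions, not the FlashAttention kernel itself.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Tiled softmax(Q K^T / sqrt(d)) V: process key/value blocks one at a time
    per query block, keeping a running max and sum (online softmax) so the
    (n x n) score matrix is never materialized."""
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    scale = 1.0 / np.sqrt(d)
    for qs in range(0, n, block):
        q = Q[qs:qs + block] * scale                      # (bq, d)
        m = np.full(q.shape[0], -np.inf)                  # running row max
        l = np.zeros(q.shape[0])                          # running softmax denominator
        acc = np.zeros((q.shape[0], V.shape[1]))          # running weighted-V numerator
        for ks in range(0, n, block):
            s = q @ K[ks:ks + block].T                    # (bq, bk) scores for this tile
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)                # rescale previous partial sums
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ V[ks:ks + block]
            m = m_new
        out[qs:qs + block] = acc / l[:, None]
    return out

# Matches dense attention up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
dense = np.exp(S - S.max(axis=1, keepdims=True))
dense = (dense / dense.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), dense)
```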
Monte Carlo Tree Search (MCTS)
Planning algorithm that balances exploration and exploitation by sampling trajectories from the current state; widely used in agents.
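A compact sketch of one such planner, Monte Carlo Tree Search with UCT selection, on a hypothetical toy environment: random rollouts sample trajectories, and their returns are backed up the tree. The environment, hyperparameters, and all names are illustrative.

```python
import math
import random

# Hypothetical toy environment: start at position 0, actions move -1/+1,
# reward 1 for reaching +3 within 6 steps.
ACTIONS = (-1, +1)

def step(state, action):
    pos, t = state
    return (pos + action, t + 1)

def terminal(state):
    pos, t = state
    return pos == 3 or t == 6

def reward(state):
    return 1.0 if state[0] == 3 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> child Node
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    # UCT score: exploitation (mean return) plus an exploration bonus.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state):
    # Sample a random trajectory to estimate the value of a leaf.
    while not terminal(state):
        state = step(state, random.choice(ACTIONS))
    return reward(state)

def mcts(root_state, n_sims=500):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not terminal(node.state) and len(node.children) == len(ACTIONS):
            node = uct_select(node)
        # 2. Expansion: add one untried action.
        if not terminal(node.state):
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(step(node.state, action), parent=node)
            node = node.children[action]
        # 3. Simulation and 4. Backpropagation.
        value = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)

print(mcts((0, 0)))   # expected to prefer +1, moving toward the goal
```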
Momentum
Technique that accumulates a velocity (exponentially decayed running sum) of past gradients to smooth updates; the basis for SGD with momentum and a component of Adam.
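A minimal sketch of the classic momentum update on a toy quadratic objective; the learning rate, decay factor, and objective are illustrative choices.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Heavy-ball momentum: keep an exponentially decayed velocity of past
    gradients and step along it, which smooths noisy updates."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(w)   # very close to the optimum at the origin
```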
Feed‑Forward Network (MLP)
Multi‑layer perceptron applied inside each transformer block for position‑wise (token‑wise) transformations.
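A NumPy sketch of the position-wise feed-forward block, assuming the common 4× hidden expansion and a GELU activation; the shapes and initialization scale are illustrative.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common activation in transformer MLPs
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def transformer_mlp(x, W1, b1, W2, b2):
    """Position-wise feed-forward block: each token (row of x) is transformed
    independently by expand -> nonlinearity -> project."""
    return gelu(x @ W1 + b1) @ W2 + b2

d_model, d_ff, n_tokens = 64, 256, 10    # 4x expansion is a common choice
rng = np.random.default_rng(0)
x = rng.standard_normal((n_tokens, d_model))
W1, b1 = rng.standard_normal((d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)) * 0.02, np.zeros(d_model)
print(transformer_mlp(x, W1, b1, W2, b2).shape)  # (10, 64): per-token shape preserved
```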
Matrix Multiplication (GEMM)
Core linear algebra operation that dominates transformer compute; optimized with tiling and tensor cores.
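A NumPy sketch of blocked (tiled) matrix multiplication, the same idea GPU kernels use to keep working tiles in fast memory; the tile size is an illustrative assumption, and NumPy's `@` already dispatches to an optimized BLAS GEMM, so the loop is purely didactic.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked GEMM: accumulate C = A @ B tile by tile so each working set
    (one tile of A, B, and C) stays small, mirroring shared-memory tiling."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((96, 80)), rng.standard_normal((80, 64))
assert np.allclose(tiled_matmul(A, B), A @ B)
```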
Mixed‑Precision Training
Training with lower‑precision formats (e.g., FP16) to save memory and increase throughput, using loss scaling to avoid gradient underflow.
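A small NumPy illustration of why loss scaling matters: gradient values below FP16's smallest subnormal round to zero unless scaled up before the cast and unscaled back in FP32 for the weight update. The scale factor and gradient values are illustrative.

```python
import numpy as np

# Sketch of loss scaling: keep FP32 "master" weights, run the backward pass in
# FP16, and scale the loss (hence the gradients) so tiny gradients survive.
grads_fp32 = np.array([1e-8, 2.5e-8, 3e-4], dtype=np.float32)
scale = np.float32(2.0 ** 16)

unscaled_cast = grads_fp32.astype(np.float16)           # entries below ~6e-8 round to 0
scaled_cast = (grads_fp32 * scale).astype(np.float16)   # survive the cast after scaling
recovered = scaled_cast.astype(np.float32) / scale      # unscale back in FP32

master_w = np.zeros(3, dtype=np.float32)
master_w -= np.float32(0.1) * recovered                 # optimizer step on FP32 weights

print(unscaled_cast)   # the small gradients are flushed to zero
print(recovered)       # close to the original FP32 gradients
```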
Quantization
Reducing numerical precision (e.g., FP16 → INT8/INT4) to shrink model size and speed up inference.
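A minimal sketch of symmetric per-tensor INT8 post-training quantization; real schemes add per-channel scales, zero points, or calibration data, all omitted here.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    with a single scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.dtype, f"max abs error {err:.4f}")   # int8 storage, small reconstruction error
```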
Alignment
Training methods and controls that ensure models follow human intent and avoid harmful behavior.
Mixture of Experts (MoE)
Architecture in which a router activates a small subset of expert MLPs per token, increasing model capacity without a proportional increase in compute.
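A NumPy sketch of top-k routing, assuming tiny ReLU MLP experts, a softmax router, and gate weights renormalized over the chosen experts; dimensions and top_k are illustrative, and load balancing is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, W_router, experts, top_k=2):
    """Top-k MoE: the router scores every expert per token, but only the k
    best experts actually run for that token."""
    probs = softmax(x @ W_router)                 # (tokens, n_experts)
    top = np.argsort(-probs, axis=-1)[:, :top_k]  # chosen expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            gate = probs[t, e] / probs[t, top[t]].sum()       # renormalized gate
            W1, W2 = experts[e]
            out[t] += gate * (np.maximum(x[t] @ W1, 0) @ W2)  # expert = small ReLU MLP
    return out

d, d_ff, n_experts, n_tokens = 32, 64, 8, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((n_tokens, d))
W_router = rng.standard_normal((d, n_experts)) * 0.02
experts = [(rng.standard_normal((d, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d)) * 0.02) for _ in range(n_experts)]
print(moe_layer(x, W_router, experts).shape)  # (16, 32): same shape, sparse compute
```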
Masked Language Modeling (MLM)
Pretraining objective in which a fraction of tokens is masked and predicted from context; used in BERT‑style models.
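A simplified NumPy sketch of the objective: select roughly 15% of positions, replace them with a [MASK] id, and compute cross-entropy only at those positions. The toy random "logits", vocabulary ids, ignore-label convention, and the omission of BERT's 80/10/10 corruption split are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, mask_id, seq_len = 100, 0, 12

tokens = rng.integers(1, vocab_size, size=seq_len)   # original token ids
mask = rng.random(seq_len) < 0.15                    # ~15% of positions selected
inputs = np.where(mask, mask_id, tokens)             # corrupted input sequence
labels = np.where(mask, tokens, -100)                # -100: ignore position in the loss

# Toy stand-in for model output; a real encoder would produce these logits.
logits = rng.standard_normal((seq_len, vocab_size))
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

# Cross-entropy over masked positions only.
masked = labels != -100
loss = -log_probs[masked, labels[masked]].mean() if masked.any() else 0.0
print(inputs, labels, loss, sep="\n")
```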