Model trained on a single modality such as text or images, as opposed to multimodal systems that process multiple input types together.