Image model that applies transformer architectures to patch tokens, scaling well with data and compute, often used in multimodal and perception stacks. ← Fair Launch