Lightweight transformer module that learns query tokens to extract useful features from vision encoders for multimodal models. ← Quadratic Funding Fee Market →