![OpenAI and the road to text-guided image generation: DALL·E, CLIP, GLIDE, DALL·E 2 (unCLIP) | by Grigory Sapunov | Intento OpenAI and the road to text-guided image generation: DALL·E, CLIP, GLIDE, DALL·E 2 (unCLIP) | by Grigory Sapunov | Intento](https://miro.medium.com/max/1400/1*ofeJ4nWLdw8l27OqHJHM2w.png)
OpenAI and the road to text-guided image generation: DALL·E, CLIP, GLIDE, DALL·E 2 (unCLIP) | by Grigory Sapunov | Intento
Aran Komatsuzaki on Twitter: "+ our own CLIP ViT-B/32 model trained on LAION-400M that matches the performance of OpenaI's CLIP ViT-B/32 (as a taste of much bigger CLIP models to come). search
![Niels Rogge on Twitter: "The model simply adds bounding box and class heads to the vision encoder of CLIP, and is fine-tuned using DETR's clever matching loss. 🔥 📃 Docs: https://t.co/fm2zxNU7Jn 🖼️Gradio Niels Rogge on Twitter: "The model simply adds bounding box and class heads to the vision encoder of CLIP, and is fine-tuned using DETR's clever matching loss. 🔥 📃 Docs: https://t.co/fm2zxNU7Jn 🖼️Gradio](https://pbs.twimg.com/media/FZZ5L93WYAEBBXZ.jpg:large)