
Exploring how AI connects text, vision, and audio to interact with the real world.
How models "see" images and describe the visual world.
The tech behind human-like voices and real-time conversation.
Using one modality to drive another (e.g., text to video).
Build an app that tells a story based on photos you take.
An interactive demo that narrates real-world surroundings.