Welcome to the Dreamscape
A small, wide-eyed robot closes its eyes. A glowing bubble floats above its head, filled with images of laundry, soft toys, and a robotic arm folding shirts in a sunlit bedroom. This isn't a Pixar short. It's a test project built using Google Veo, inspired by a very real breakthrough in robotic learning: NVIDIA's new DreamGen framework. (To clarify, I read a paper by Jim Fan about robots that dream of folding laundry and decided to try out Google Veo on the same day. Hence, today, you get Google Veo videos of robots folding laundry while I talk about the Jim Fan paper. To be clear, these images were generated by me, not by NVIDIA.)
DreamGen is what happens when robotics meets imagination. Or more precisely, when foundation video models simulate millions of "what if" scenarios, giving robots access to experiences they've never actually lived. It's a kind of synthetic dreaming, and it just might redefine how machines learn physical tasks. As I discovered when I tried various prompts for robots folding laundry, the failures far outnumbered the successes.
The Paper: Introducing DreamGen
Unveiled by Jim Fan and the NVIDIA GEAR team, DreamGen proposes a four-step recipe:
* Fine-tune a SOTA (state-of-the-art) video model on your target robot.
* Prompt it with language, asking how your robot would behave in imagined scenarios.
* Recover pseudo-actions from those videos using inverse dynamics or latent action modeling.
* Train a robot foundation model on this hallucination-turned-data, as if the robot had experienced it.
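To make the data flow concrete, here is a minimal Python sketch of the four steps. Every function name and data structure below is hypothetical, stubbed out for illustration; the real DreamGen pipeline uses an actual fine-tuned video model and a learned inverse dynamics model, not these placeholders.

```python
# Hypothetical sketch of a DreamGen-style pipeline. All names here are
# illustrative stand-ins, not NVIDIA's actual API.

def finetune_video_model(base_model, robot_demos):
    # Step 1: adapt a pretrained state-of-the-art video model to the
    # target robot's embodiment using a small set of real demos.
    return {"base": base_model, "robot": robot_demos["robot"]}

def dream_videos(video_model, prompts):
    # Step 2: prompt the fine-tuned model with imagined scenarios
    # described in language; each prompt yields a synthetic rollout.
    return [{"prompt": p, "frames": f"frames_of_{p}"} for p in prompts]

def recover_pseudo_actions(video):
    # Step 3: an inverse dynamics (or latent action) model maps
    # consecutive frames to the actions that would produce them.
    return [f"pseudo_action_for_{video['prompt']}"]

def build_training_set(videos):
    # Step 4: pair dreamed observations with recovered pseudo-actions,
    # producing synthetic trajectories to train a robot foundation model.
    return [
        {"obs": v["frames"], "actions": recover_pseudo_actions(v)}
        for v in videos
    ]

model = finetune_video_model("sota_video_model", {"robot": "humanoid"})
dreams = dream_videos(model, ["fold a towel", "pour water into a cup"])
dataset = build_training_set(dreams)
print(len(dataset))  # one synthetic trajectory per dreamed video
```

The key design point the sketch tries to capture: the robot never executes anything during data generation. Experience is hallucinated by the video model, and only the final policy training treats those dreams as if they were lived.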
And the results? DreamGen took a robot trained only on a single task—pick-and-place—and enabled it to generalize to 22 new behaviors. That includes pouring, scooping, ironing, and hammering. With no human demos. No teleoperation. Just dreams.
Why It Matters
This is more than clever data augmentation. It's a shift in how we think about experience itself. Robots trained this way can generalize to unseen verbs and objects. In internal tests, success rates on new verbs jumped from 0% to over 43%. For unseen environments, performance went from 0% to 28%.
In other words: the robot imagined folding a towel—then actually did it.
#robotics #robotdreams #droidsnewsletter
www.droidsnewsletter.com