Generative Depth Supervision for Embodied Vision-Language Models

Description

A vision-language model that adds generative depth prediction during pre-training to strengthen physical grounding. It achieves state-of-the-art results on embodied benchmarks and transfers directly to real-robot tasks.
