
Description

The 15 papers in this episode are as follows:

[00:24] 🔗 Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

[01:07] 🎬 LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

[01:55] 🌍 Yume-1.5: A Text-Controlled Interactive World Generation Model

[02:30] 🔍 SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

[02:59] 🔮 Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

[03:40] 🎯 SpotEdit: Selective Region Editing in Diffusion Transformers

[04:23] 🚀 Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

[05:09] 🔍 GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

[05:56] 🤖 Act2Goal: From World Model To General Goal-conditioned Policy

[06:31] ⚡ Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

[06:59] 🌐 Web World Models

[07:34] 🚀 DiRL: An Efficient Post-Training Framework for Diffusion Language Models

[08:19] 🎬 Video-BrowseComp: Benchmarking Agentic Video Research on Open Web

[09:02] 🧠 Training AI Co-Scientists Using Rubric Rewards

[09:39] 🧩 Monadic Context Engineering

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递