This Meta paper, published November 18, 2025, details the development, training, and evaluation of **Segment Anything Model 3 (SAM 3)**, a promptable segmentation model for images and videos. A major focus is the creation of the **Segment Anything with Concepts (SA-Co) benchmark**, which uses a multi-stage data engine combining noisy pseudo-labels, human annotators, and AI verifiers to produce high-quality, large-scale training data with broad ontological coverage of concepts. The document also explores **model architecture components**, such as temporal disambiguation strategies for multi-object tracking in videos and an ambiguity head that handles multiple valid interpretations of a phrase. Finally, extensive **quantitative results** compare SAM 3's performance against various state-of-the-art models on tasks such as instance segmentation and object counting.
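To make "promptable concept segmentation" concrete, below is a minimal, hypothetical sketch of what prompting a model with an open-vocabulary noun phrase and getting back all matching instance masks could look like. None of these names (`segment_concept`, `InstanceMask`) come from the SAM 3 release; they are placeholders illustrating the task interface, not the actual API.

```python
# Hypothetical interface sketch for promptable concept segmentation.
# Names and signatures are illustrative only; they do not reflect the
# real SAM 3 codebase or its released API.

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class InstanceMask:
    """One predicted object instance for the prompted concept."""
    mask: np.ndarray   # boolean H x W array marking the instance
    score: float       # confidence that the instance matches the phrase


def segment_concept(image: np.ndarray, phrase: str) -> List[InstanceMask]:
    """Placeholder for a SAM 3-style call: return every instance in
    `image` that matches the open-vocabulary noun phrase `phrase`.

    A real model would run image and text encoders and a mask decoder;
    this stub only shows the expected inputs and outputs.
    """
    h, w = image.shape[:2]
    return []  # no model here, so no detections


if __name__ == "__main__":
    img = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy image
    detections = segment_concept(img, "yellow school bus")
    print(f"found {len(detections)} instances")
```

The key point the sketch captures is that, unlike point- or box-prompted segmentation in earlier SAM versions, a single concept prompt is expected to yield *all* matching instances, which is also why the paper needs an ambiguity head for phrases with multiple valid interpretations.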
Source:
https://scontent-sjc6-1.xx.fbcdn.net/v/t39.2365-6/586037495_2236299700208804_3520531923593328648_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=nmZfwAXlWFIQ7kNvwGuKXcX&_nc_oc=Adnm9S5A81iwt1v5NK0_vEawxh12xF9LXksgiuxyQBYKt0QgFzDZlMMCfu1GtGLRR7g&_nc_zt=14&_nc_ht=scontent-sjc6-1.xx&_nc_gid=1CWvrmVm88pkpnwup5jdnA&oh=00_AfjvGlCU_0PFdvGqnjcfyQuKxfa3Qz18c_452htHpqMptw&oe=69251C89