arXiv research summaries for Computer Vision and Pattern Recognition from October 24, 2023.
Today's Themes - Fair Warning - LLM-Generated Summary 😆
Image and Video Synthesis, Editing, and Manipulation
- Methods such as image inpainting, colorization, style transfer, text-to-image generation, text-guided video editing, and 3D scene synthesis from images and text.
3D Computer Vision
- 3D object detection, 3D scene understanding, point cloud segmentation, and inverse rendering of 3D objects from images.
Self-supervised and Semi-supervised Learning Techniques
- Methods that make use of unlabeled data across images, video, and multimodal inputs.
Object Detection and Recognition Architectures
- Includes transformer-based models such as DETR. Research targets better localization, classification, and occlusion handling.
Visual Question Answering and Reasoning
- Reasoning over images, video, and multimodal data, with a focus on improving large language models. Techniques aim to reduce bias and hallucination.
Generation
- Methods for generating visually and semantically diverse image outputs for restoration tasks rather than sampling the posterior. Aims to provide more meaningful diversity.
Validation
- Using synthetic data for validation and continual learning to improve model robustness, avoid overfitting, and handle domain shift.
Applications
- Autonomous vehicles, robotics, medical imaging, human action analysis, image privacy and security, biometrics, etc.