https://arxiv.org/html/2505.12540v2
This document introduces vec2vec, the first method for translating text embeddings between different models' spaces without any paired data or access to the original encoders. The approach leverages a universal latent representation of text semantics, lending support to the Strong Platonic Representation Hypothesis. The authors show that vec2vec translates embeddings while preserving their geometric structure, which enables attribute inference and text inversion from embeddings of unknown provenance; this has significant implications for the security and privacy of data stored in vector databases. Experiments demonstrate high translation accuracy and the extraction of sensitive information even from out-of-distribution data and across unimodal and multimodal embedding spaces.
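To make the setup concrete, the following is a minimal sketch of unsupervised embedding translation through a shared latent space, in the spirit of vec2vec. All names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' released architecture, and the sketch keeps only reconstruction and cycle-consistency objectives (the paper's full training additionally aligns the two distributions, e.g. with adversarial terms, which are omitted here for brevity):

```python
import torch
import torch.nn as nn

def mlp(dim_in, dim_out, hidden=512):
    # Hypothetical adapter: a small MLP mapping one vector space to another.
    return nn.Sequential(
        nn.Linear(dim_in, hidden), nn.SiLU(),
        nn.Linear(hidden, dim_out),
    )

class Vec2VecSketch(nn.Module):
    """Illustrative translator between two embedding spaces A and B.

    Each space gets an encoder into a shared latent space and a decoder
    back out; translation is encode-in-one-space, decode-in-the-other.
    """
    def __init__(self, dim_a, dim_b, dim_latent=256):
        super().__init__()
        self.enc_a = mlp(dim_a, dim_latent)   # space A -> shared latent
        self.enc_b = mlp(dim_b, dim_latent)   # space B -> shared latent
        self.dec_a = mlp(dim_latent, dim_a)   # shared latent -> space A
        self.dec_b = mlp(dim_latent, dim_b)   # shared latent -> space B

    def a_to_b(self, x_a):
        return self.dec_b(self.enc_a(x_a))

    def b_to_a(self, x_b):
        return self.dec_a(self.enc_b(x_b))

def unsupervised_losses(model, x_a, x_b):
    # Reconstruction: a round trip through the latent within each space
    # should return the input embedding.
    rec = ((model.dec_a(model.enc_a(x_a)) - x_a).pow(2).mean()
           + (model.dec_b(model.enc_b(x_b)) - x_b).pow(2).mean())
    # Cycle consistency: A -> B -> A (and B -> A -> B) should also return
    # the input. Crucially, both terms use only unpaired batches, so no
    # aligned (x_a, x_b) examples are ever needed.
    cyc = ((model.b_to_a(model.a_to_b(x_a)) - x_a).pow(2).mean()
           + (model.a_to_b(model.b_to_a(x_b)) - x_b).pow(2).mean())
    return rec + cyc

if __name__ == "__main__":
    # Unpaired batches from two hypothetical embedding models (768-d and 384-d).
    model = Vec2VecSketch(dim_a=768, dim_b=384)
    x_a, x_b = torch.randn(32, 768), torch.randn(32, 384)
    loss = unsupervised_losses(model, x_a, x_b)
    loss.backward()
    print(f"combined unsupervised loss: {loss.item():.4f}")
```

The shared latent space is what carries the paper's security implication: once a translator like this is trained, embeddings leaked from an unknown model can be mapped into a space where existing attribute-inference and inversion tools apply.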