The Arc Institute introduces State, a machine learning model designed to predict how cells respond to perturbations such as genetic or chemical treatments. The model targets two obstacles that often limit how well existing computational models generalize: cellular heterogeneity and technical variation across datasets. State uses a multi-scale architecture with two components: a State Transition (ST) model that learns perturbation effects across cell populations, and a State Embedding (SE) model that generates robust cell representations from large-scale observational data. Evaluated with a comprehensive framework called Cell-Eval, State consistently outperforms prior approaches at predicting gene expression changes and identifying differentially expressed genes across diverse cellular contexts, including "zero-shot" settings in which no perturbation data for a given context was seen during training. The authors also provide a theoretical analysis connecting State to optimal transport theory, suggesting that the model can learn precise transformations between cell states.
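To make the two evaluation targets named above concrete (predicted gene expression changes and recovery of differentially expressed genes), here is a minimal sketch in Python. It is not Cell-Eval's implementation or State's code: the function names, the toy data, and the simple "top-k genes by absolute mean shift" criterion are all assumptions chosen for illustration, standing in for whatever statistics the actual framework uses.

```python
from statistics import mean

def mean_shift(control, perturbed):
    """Per-gene difference in mean expression (perturbed minus control).

    Each argument is a list of cells; each cell is a list of per-gene values.
    """
    n_genes = len(control[0])
    return [
        mean(cell[g] for cell in perturbed) - mean(cell[g] for cell in control)
        for g in range(n_genes)
    ]

def top_k_genes(shift, k):
    """Indices of the k genes with the largest absolute shift (hypothetical DE proxy)."""
    ranked = sorted(range(len(shift)), key=lambda g: abs(shift[g]), reverse=True)
    return set(ranked[:k])

def top_k_overlap(true_shift, predicted_shift, k):
    """Fraction of the true top-k genes that also appear in the predicted top-k."""
    shared = top_k_genes(true_shift, k) & top_k_genes(predicted_shift, k)
    return len(shared) / k

# Toy data (invented for illustration): 3 cells x 4 genes per population.
control = [[1.0, 2.0, 3.0, 4.0]] * 3
observed = [[3.0, 2.0, 1.0, 4.0]] * 3    # measured perturbed cells
predicted = [[2.5, 2.0, 3.0, 5.0]] * 3   # a hypothetical model's prediction

true_shift = mean_shift(control, observed)
predicted_shift = mean_shift(control, predicted)
overlap = top_k_overlap(true_shift, predicted_shift, k=2)
print(true_shift, predicted_shift, overlap)
```

In this toy case the prediction recovers one of the two most strongly shifted genes, so the overlap is 0.5. A real evaluation would use proper differential expression statistics rather than raw mean shifts.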
References: