The paper introduces Evo 2, a biological foundation model for genome modeling and design across all domains of life (bacteria, archaea, and eukaryotes), as well as bacteriophages. The model, available in 7B- and 40B-parameter versions, was trained on OpenGenome2, a dataset of over 9.3 trillion nucleotides, with an emphasis on generalist capabilities rather than task-specific optimization. Evo 2 shows strong predictive performance for the effects of mutations on protein, RNA, and organismal fitness, including noncoding variants, and it can generate novel genomic sequences at the scale of full mitochondrial genomes and minimal bacterial genomes. The research also examines Evo 2 through mechanistic interpretability, showing how the model learns and represents complex biological features, and demonstrates controllable genomic design guided by external predictive models.
Reference: