The paper introduces CellWhisperer, a multimodal artificial intelligence model and accompanying software for interactively exploring single-cell RNA sequencing (scRNA-seq) data through natural-language chats. The system links transcriptomes (RNA profiles) to textual annotations through multimodal deep learning, trained on a large, AI-curated dataset of more than a million matched transcriptome-text pairs. CellWhisperer uses a fine-tuned large language model (LLM) so that users can ask questions about cells and genes in plain English, and it supports tasks such as free-text search and automated cell-cluster annotation in a zero-shot manner, that is, without task-specific training on labeled examples. The authors evaluate the model on predicting cell types, diseases, and tissues, and showcase its integration into the widely used CELLxGENE Explorer for user-friendly data analysis. The work presents natural language as an intuitive channel for bioinformatics and as a building block for future AI-based research assistants.
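To make the idea of zero-shot annotation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that transcriptomes and text descriptions have already been mapped into one shared embedding space, and that a label is chosen by cosine similarity; the summary above does not state either detail. The function names `cosine` and `zero_shot_annotate` and the three-dimensional toy vectors are hypothetical placeholders for real model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_annotate(cell_embedding, label_embeddings):
    """Rank candidate text labels by similarity to one cell embedding.

    No classifier is trained on these labels: any free-text label can be
    scored as long as it can be embedded (assumption: shared space).
    """
    scores = {
        label: cosine(cell_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, hand-made vectors standing in for embeddings of text labels
# (hypothetical values, not produced by any real model).
label_embeddings = {
    "T cell": [0.9, 0.1, 0.0],
    "B cell": [0.1, 0.9, 0.0],
    "hepatocyte": [0.0, 0.1, 0.9],
}

# Toy vector standing in for the embedding of one cell's transcriptome.
cell_embedding = [0.8, 0.2, 0.1]

ranking = zero_shot_annotate(cell_embedding, label_embeddings)
best_label = ranking[0][0]
print(best_label)  # prints "T cell"
```

Under the same assumption, free-text search would be the reverse lookup: embed the query text once, then rank cells by their similarity to it.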
References:
- Schaefer M, Peneder P, Malzl D, et al. Multimodal learning of transcriptomes and text enables interactive single-cell RNA-seq data exploration with natural-language chats[J]. bioRxiv, 2024: 2024.10.15.618501.