Base by Base
Malbranke C et al., Proceedings of the National Academy of Sciences (PNAS) - ProteomeLM is a transformer-based language model trained on complete proteomes that produces contextualized protein embeddings and attention signals which recover protein–protein interactions unsupervised and support supervised PPI and gene essentiality prediction across diverse taxa.

Key terms: proteome language model, protein–protein interactions, gene essentiality, ProteomeLM, deep learning.

Study Highlights: ProteomeLM was trained on ~32,000 proteomes using ESM‑C embeddings and a custom polar loss to reconstruct masked protein embeddings in proteome context. Its attention heads encode protein–protein interactions without supervision and distinguish direct physical binding, complex membership, and broader functional associations. As a fast first-pass filter, it outperforms amino-acid coevolution (DCA) in recall while reducing compute by orders of magnitude. Downstream supervised models (ProteomeLM-PPI and ProteomeLM-Ess) achieve state-of-the-art cross-species PPI prediction and strong gene essentiality prediction that generalizes to held-out and synthetic minimal genomes.

Conclusion: Representing proteins in whole-proteome context yields interpretable attention signals that capture functional and physical relationships, enabling rapid, accurate interactome screening and improved gene essentiality prediction across the tree of life.

Music: Enjoy the music based on this article at the end of the episode.

Article title: ProteomeLM: A proteome-scale language model enables accurate and rapid prediction of protein–protein interactions and gene essentiality across taxa
First author: Malbranke C
Journal: Proceedings of the National Academy of Sciences (PNAS)
DOI: 10.1073/pnas.2524201123
Reference: Malbranke C, Zalaffi GP, Bitbol A-F. ProteomeLM: A proteome-scale language model enabling accurate and rapid prediction of protein–protein interactions and gene essentiality across taxa. Proc Natl Acad Sci U S A. 2026;123:e2524201123. doi:10.1073/pnas.2524201123
License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

Support: Base by Base, Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/proteomelm-interactomes-essentiality

QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-05-26.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited substantive scientific content in transcript: ProteomeLM architecture, functional encoding, polar loss, unsupervised PPI via attention, speed/screening benefits, supervised PPI (ProteomeLM-PPI), gene essentiality predictions (ProteomeLM-Ess), and cross-species/minimal cells.
- transcript topics: ProteomeLM architecture and training on whole proteomes; Functional encoding using orthology (OrthoDB); Polar loss and avoiding reliance on coarse functional encoding; Attention coefficients encoding protein-protein interactions (PPI) in unsupervised manner; Unsupervised PPI detection and protein complex membership; Speed and scalability of whole-interactome screening vs DCA

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 6
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- ProteomeLM trained on ~32,000 annotated proteomes spanning the tree of life and uses a functional encoding via orthologous groups (OrthoDB).
- ProteomeLM’s attention coefficients encode PPI without supervision (no interaction labels during training).
- ProteomeLM enables fast whole-interactome screening and is substantially faster than DCA (up to six orders of magnitude); inference under 10 minutes per proteome on a single GPU.
- Unsupervised PPI performance in Escherichia coli: a single attention head (head7, layer3) achieves AUC = 0.92.
- ProteomeLM can distinguish direct interactions, same-complex interactions, and genetic associations; ribosome and TRiC/CCT complex analyses yield high AUC (>= 0.99 for some tests).
- ProteomeLM-PPI achieves state-of-the-art supervised PPI predictions across species; ProteomeLM-Ess predicts gene essentiality; best reported AUC = 0.93 with layer-8 embeddings from

QC result: Pass.
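Illustration (not from the article): the notes above report unsupervised PPI performance as an AUC computed from attention coefficients. The sketch below shows one plausible way such a number could be obtained, assuming a square protein-by-protein attention matrix from a single head and a set of known interacting pairs. The function names, the symmetrization step, and the toy numbers are all hypothetical and are not taken from the ProteomeLM code or data.

```python
from itertools import combinations


def symmetrize(attn):
    """Average attn[i][j] and attn[j][i] so each unordered protein pair gets one score.

    Assumption: averaging is only one possible way to combine the two directions.
    """
    n = len(attn)
    return {
        (i, j): 0.5 * (attn[i][j] + attn[j][i])
        for i, j in combinations(range(n), 2)
    }


def auc(scores, positives):
    """Probability that a random interacting pair outscores a random
    non-interacting pair (ties count as half a win)."""
    pos = [s for pair, s in scores.items() if pair in positives]
    neg = [s for pair, s in scores.items() if pair not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy 4-protein attention matrix with made-up values (hypothetical data).
toy_attention = [
    [0.0, 0.6, 0.1, 0.3],
    [0.5, 0.0, 0.2, 0.3],
    [0.1, 0.1, 0.0, 0.8],
    [0.2, 0.1, 0.7, 0.0],
]
# Hypothetical "known" interactions, given as (smaller index, larger index).
known_pairs = {(0, 1), (2, 3)}

pair_scores = symmetrize(toy_attention)
toy_auc = auc(pair_scores, known_pairs)
print(f"toy AUC = {toy_auc:.2f}")
```

In this toy case both interacting pairs score higher than every non-interacting pair, so the AUC is 1.0. Real proteomes have thousands of proteins, where a vectorized implementation would be preferable to the pairwise loop used here.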