Towards multimodal foundation models in molecular cell biology

2025-04-16 15:26:33

Author: Wang, Bo

References

  1. Alberts, B. et al. Molecular Biology of the Cell 6th edn (W. W. Norton, 2020).

  2. Keller, E. F. Making Sense of Life: Explaining Biological Development with Models, Metaphors, and Machines (Harvard Univ. Press, 2002).

  3. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004). A seminal review on network biology, elucidating how molecular interactions shape cellular and organismal function.

  4. Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).

  5. Goldberg, A. P. et al. Emerging whole-cell modeling principles and methods. Curr. Opin. Biotechnol. 51, 97–102 (2018).

  6. Johnson, G. T. et al. Building the next generation of virtual cells to understand cellular biology. Biophys. J. 122, 3560–3569 (2023).

  7. Karr, J. R., Takahashi, K. & Funahashi, A. The principles of whole-cell modeling. Curr. Opin. Microbiol. 27, 18–24 (2015).

  8. Freddolino, P. L. & Tavazoie, S. The dawn of virtual cell biology. Cell 150, 248–250 (2012).

  9. Georgouli, K., Yeom, J.-S., Blake, R. C. & Navid, A. Multi-scale models of whole cells: progress and challenges. Front. Cell Dev. Biol. 11, 1260507 (2023).

  10. Karr, J. R. et al. A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401 (2012).

  11. HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).

  12. Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 83 (2017). The potential of multi-omics in uncovering molecular underpinnings of diseases and informing precision medicine.

  13. Regev, A. et al. Science Forum: the Human Cell Atlas. eLife https://doi.org/10.7554/eLife.27041 (2017). An introduction of the HCA initiative, a pivotal project for mapping cellular diversity across human tissues.

  14. Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).

  15. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

  16. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e20 (2020).

  17. Deng, Y. et al. Spatial-CUT&Tag: spatially resolved chromatin modification profiling at the cellular level. Science 375, 681–686 (2022).

  18. Swanson, E. et al. Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. eLife 10, e63632 (2021).

  19. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).

  20. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

  21. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021). An overview of the concept, opportunities and challenges of foundation models for diverse artificial intelligence applications.

  22. Vaswani, A. et al. Attention is all you need. Preprint at https://arxiv.org/abs/1706.03762 (2017). An introduction of the transformer architecture, the cornerstone of modern foundation models.

  23. Brown, T. et al. Language models are few-shot learners. In Proc. 34th International Conference on Neural Information Processing Systems 1877–1901 (Curran Associates Inc., 2020). An introduction of GPT-3, a 175-billion parameter language model demonstrating strong few-shot learning capabilities across diverse natural language processing tasks.

  24. Ouyang, L. et al. Training language models to follow instructions with human feedback. In Proc. 36th International Conference on Neural Information Processing Systems 27730–27744 (Curran Associates Inc., 2022).

  25. Touvron, H. et al. LLaMA: open and efficient foundation language models. Preprint at https://arxiv.org/abs/2302.13971 (2023). An introduction to LLaMA, a suite of open-source language models (7B to 65B parameters) trained on publicly available data.

  26. Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

  27. llama3: The official Meta Llama 3 GitHub site. GitHub https://github.com/meta-llama/llama3 (2024).

  28. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10674–10685 (IEEE/CVF, 2022).

  29. Podell, D. et al. SDXL: improving latent diffusion models for high-resolution image synthesis. Preprint at https://arxiv.org/abs/2307.01952 (2023).

  30. Blattmann, A. et al. Stable video diffusion: scaling latent video diffusion models to large datasets. Preprint at https://arxiv.org/abs/2311.15127 (2023).

  31. Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th International Conference on Neural Information Processing Systems 34892–34916 (Curran Associates Inc., 2023).

  32. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

  33. Li, R., Li, L., Xu, Y. & Yang, J. Erratum to: Machine learning meets omics applications and perspectives. Brief. Bioinform. 23, bbab560 (2022).

  34. Klein, D. et al. Mapping cells through time and space with moscot. Nature 638, 1065–1075 (2025).

  35. Brbić, M. et al. MARS: discovering novel cell types across heterogeneous single-cell experiments. Nat. Methods 17, 1200–1206 (2020).

  36. Brbić, M. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat. Methods 19, 1411–1418 (2022).

  37. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  38. Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023).

  39. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024). A deep learning model integrating gene–gene relationship knowledge graphs to predict transcriptional responses.

  40. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). An introduction to AlphaFold, a deep learning model achieving near-experimental accuracy in predicting protein structures.

  41. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  42. Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. Preprint at bioRxiv https://doi.org/10.1101/2022.07.20.500902 (2022).

  43. ESM3: simulating 500 million years of evolution with a language model. EvolutionaryScale https://www.evolutionaryscale.ai/blog/esm3-release (2024). A frontier language model for biology that simultaneously reasons over the sequence, structure and function of proteins.

  44. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  45. Cui, H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024). The development of scGPT, a generative pre-trained transformer model, leveraging over 33 million single cells to advance single-cell biology.

  46. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023). A large model pretrained on 30 million single-cell transcriptomes, facilitating accurate predictions in gene network biology.

  47. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

  48. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).

  49. Sverchkov, Y. & Craven, M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput. Biol. 13, e1005466 (2017).

  50. Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).

  51. Foster, A., Ivanova, D. R., Malik, I. & Rainforth, T. Deep adaptive design: amortizing sequential Bayesian experimental design. In Proc. 38th International Conference on Machine Learning Vol. 139 3384–3395 (PMLR, 2021).

  52. Rainforth, T., Foster, A., Ivanova, D. R. & Smith, F. B. Modern Bayesian experimental design. Statist. Sci. 39, 100–114 (2024).

  53. Vanlier, J., Tiemann, C. A., Hilbers, P. A. J. & van Riel, N. A. W. A Bayesian approach to targeted experiment design. Bioinformatics 28, 1136–1142 (2012).

  54. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).

  55. Eyler, C. E. et al. Single-cell lineage analysis reveals genetic and epigenetic interplay in glioblastoma drug resistance. Genome Biol. 21, 174 (2020).

  56. Chevrier, S. et al. An immune atlas of clear cell renal cell carcinoma. Cell 169, 736–749.e18 (2017).

  57. Zhu, C., Preissl, S. & Ren, B. Single-cell multimodal omics: the power of many. Nat. Methods 17, 11–14 (2020).

  58. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  59. Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).

  60. Battich, N. et al. Sequencing metabolically labeled transcripts in single cells reveals mRNA turnover strategies. Science 367, 1151–1156 (2020).

  61. Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).

  62. Qiu, Q. et al. Massively parallel and time-resolved RNA sequencing in single cells with scNT-seq. Nat. Methods 17, 991–1001 (2020).

  63. Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).

  64. Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).

  65. Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genome. Preprint at https://arxiv.org/abs/2306.15006 (2023).

  66. Han, H. et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 46, D380–D386 (2018).

  67. Liu, Z.-P., Wu, C., Miao, H. & Wu, H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database 2015, bav095 (2015).

  68. Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).

  69. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

  70. Badia-I-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).

  71. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).

  72. Qin, Q. et al. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol. 21, 32 (2020).

  73. Kim, S. & Wysocka, J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol. Cell 83, 373–392 (2023).

  74. Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).

  75. Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).

  76. Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Proc. 36th International Conference on Neural Information Processing Systems 26711–26722 (Curran Associates Inc., 2022).

  77. Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229.e26 (2023).

  78. Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575.e28 (2022).

  79. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).

  80. Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550, 451–453 (2017).

  81. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).

  82. Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1897 (2016).

  83. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  84. Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  85. Yao, Z. et al. A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex. Nature 598, 103–110 (2021).

  86. CZI Single-Cell Biology Program et al. CZ CELL×GENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025).

  87. Chameleon Team. Chameleon: mixed-modal early-fusion foundation models. Preprint at https://arxiv.org/abs/2405.09818 (2024).

  88. Gage, P. A new algorithm for data compression. C Users J. Arch. 12, 23–38 (1994).

  89. OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  90. Barnum, G., Talukder, S. & Yue, Y. On the benefits of early fusion in multimodal representation learning. Preprint at https://arxiv.org/abs/2011.07191 (2020). An investigation into early-fusion strategies in multimodal learning, demonstrating that immediate integration of inputs enhances model performance and robustness.

  91. Liu, Z. et al. Swin Transformer: hierarchical vision transformer using Shifted Windows. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9992–10002 (IEEE/CVF, 2021).

  92. Fan, H. et al. Multiscale vision transformers. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6804–6815 (IEEE/CVF, 2021).

  93. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171–4186 (Association for Computational Linguistics, 2019).

  94. Grill, J.-B. et al. Bootstrap your own latent—a new approach to self-supervised learning. In Proc. 34th International Conference on Neural Information Processing Systems 21271–21284 (Curran Associates Inc., 2020).

  95. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé III, H. & Singh, A.) 1597–1607 (PMLR, 2020).

  96. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning Vol. 139 8748–8763 (PMLR, 2021).

  97. AlQuraishi, M. & Sorger, P. K. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat. Methods 18, 1169–1180 (2021).

  98. Pan, S. et al. Unifying large language models and knowledge graphs: a roadmap. IEEE Trans. Knowl. Data Eng. 36, 3580–3599 (2024).

  99. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).

  100. Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  101. Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (Association for Computing Machinery, 2016).

  102. Hamilton, W. L., Ying, R. & Leskovec, J. Inductive representation learning on large graphs. In Proc. 31st International Conference on Neural Information Processing Systems 1–19 (Curran Associates Inc., 2017).

  103. Zhao, W. X., Liu, J., Ren, R. & Wen, J.-R. Dense text retrieval based on pretrained language models: a survey. ACM Trans. Inf. Syst. 42, 1–60 (2024).

  104. Jeong, J. et al. Multimodal image-text matching improves retrieval-based chest X-ray report generation. In Proc. Machine Learning Research. Medical Imaging with Deep Learning Vol. 227 (eds Oguz, I. et al.) 978–990 (PMLR, 2024).

  105. Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).

  106. Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

  107. Lv, L. et al. ProLLaMA: a protein large language model for multi-task protein language processing. Preprint at https://arxiv.org/abs/2402.16445 (2024).

  108. Debus, C., Piraud, M., Streit, A., Theis, F. & Götz, M. Reporting electricity consumption is essential for sustainable AI. Nat. Mach. Intell. 5, 1176–1178 (2023).

  109. Hu, E. J. et al. LoRA: low-rank adaptation of large language models. Preprint at https://arxiv.org/abs/2106.09685 (2021).

  110. Pfeiffer, J. et al. AdapterHub: a framework for adapting transformers. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 46–54 (Association for Computational Linguistics, 2020).

  111. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  112. Meyer, P. & Saez-Rodriguez, J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst. 12, 636–653 (2021).

  113. Saez-Rodriguez, J. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486 (2016).

  114. Lance, C. et al. Multimodal single cell data integration challenge: results and lessons learned. In Proc. NeurIPS 2021 Competitions and Demonstrations Track Vol. 176 (eds Kiela, D., Ciccone, M. & Caputo, B.) 162–176 (PMLR, 2022).

  115. Liu, Z. et al. KAN: Kolmogorov–Arnold networks. Preprint at https://arxiv.org/abs/2404.19756 (2024).

  116. Maynez, J., Narayan, S., Bohnet, B. & McDonald, R. On faithfulness and factuality in abstractive summarization. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 1906–1919 (Association for Computational Linguistics, 2020).

  117. Ji, Z. et al. Survey of hallucination in natural language generation. ACM Comput. Surv. 55, 248 (2022).

  118. Manakul, P., Liusie, A. & Gales, M. J. F. SelfCheckGPT: zero-resource black-box hallucination detection for generative large language models. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing 9004–9017 (Association for Computational Linguistics, 2023).

  119. Yin, Z. et al. Do large language models know what they don’t know? In Proc. Findings of the Association for Computational Linguistics: ACL 2023 8653–8665 (Association for Computational Linguistics, 2023).

  120. Tian, K., Mitchell, E., Yao, H., Manning, C. D. & Finn, C. Fine-tuning language models for factuality. Preprint at https://arxiv.org/abs/2311.08401 (2023).

  121. Bommasani, R. et al. The foundation model transparency index. Preprint at https://arxiv.org/abs/2310.12941 (2023).

  122. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. Preprint at https://arxiv.org/abs/2303.12712 (2023).

  123. Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. & Regev, A. Impact of the Human Cell Atlas on medicine. Nat. Med. 28, 2486–2496 (2022).

  124. Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).

  125. Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nat. Rev. Mol. Cell Biol. 24, 695–713 (2023).

Summary

This page collects citations from scientific articles and conference papers on machine learning, natural language processing, systems biology, and computational genomics. An overview of some key entries:

### Machine Learning and Natural Language Processing

1. **Foundation Models and Transparency**:
   - **Bommasani et al. (2023)**: Introduce the Foundation Model Transparency Index, which assesses how transparent foundation models and their developers are.
2. **Fine-Tuning for Factuality in Language Models**:
   - **Tian et al. (2023)**: Explore methods to fine-tune large language models to improve factuality, a crucial aspect of ensuring the reliability of generated text.
3. **Detecting Hallucinations and Unreliable Information**:
   - **Manakul et al. (2023)**: Introduce SelfCheckGPT, a zero-resource, black-box method for detecting hallucinations in generative large language models.
   - **Maynez et al. (2020)**: Examine faithfulness and factuality in abstractive summarization, both of which matter for generating accurate summaries.

### Systems Biology and Computational Genomics

1. **Crowdsourced DREAM Challenges**:
   - **Meyer & Saez-Rodriguez (2021)**: Review ten years of crowdsourced DREAM challenges and the advances they brought to systems biology modeling.
2. **Single-Cell Data Integration**:
   - **Luecken et al. (2022)**: Benchmark methods for atlas-level data integration in single-cell genomics.
3. **Human Cell Atlas Impact**:
   - **Rood et al. (2022)**: Discuss the impact of the Human Cell Atlas on medicine.
4. **Single-Cell Multi-Omics Technologies**:
   - **Baysoy et al. (2023)**: Review the technological landscape and applications of single-cell multi-omics, which enables comprehensive profiling of individual cells.

### Machine Learning Techniques

1. **Kolmogorov–Arnold Networks (KAN)**:
   - **Liu et al. (2024)**: Introduce a neural network architecture inspired by the Kolmogorov–Arnold representation theorem, which could offer new approaches to modeling complex functions.

### Multi-Modal and Single Cell Data Integration

1. **Multi-Modal Single Cell Challenge**:
   - **Lance et al. (2022)**: Report the results of a multimodal single-cell data integration challenge and the lessons learned from integrating diverse types of biological data.

These references cover model transparency, factuality and hallucination in language generation, systems biology modeling, single-cell genomics, and machine learning architectures. Together they reflect the multidisciplinary nature of current research that combines computational methods with the biological sciences to better understand complex biological systems and to build more reliable AI technologies.