Improving AI coaching with Gemini using real-world Fitbit data

2025-09-17 12:57:31

Authors: Justin Khasentino, Anastasiya Belyaeva, Cory Y. McLean

We created and curated three benchmark datasets to assess large language model (LLM) performance on sleep and fitness tasks ranging from answering expert questions to real-world coaching scenarios. Fine-tuning the Gemini LLM on real-world coaching tasks and self-reported sleep-quality outcomes improved its performance and provided a benchmark for further development.

Additional information

This is a summary of: Khasentino, J. et al. A personal health large language model for sleep and fitness coaching. Nat. Med. https://doi.org/10.1038/s41591-025-03888-0 (2025).

About this article

Cite this article

Improving AI coaching with Gemini using real-world Fitbit data. Nat Med (2025). https://doi.org/10.1038/s41591-025-03988-x

Summary

Three benchmark datasets were created and curated to evaluate large language models (LLMs) on sleep and fitness tasks, ranging from expert question answering to real-world coaching scenarios. Fine-tuning the Gemini LLM on real-world coaching tasks and self-reported sleep-quality outcomes improved its performance and provided a benchmark for further development.