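The review was conducted following the PRISMA-ScR guidance. Six databases (MEDLINE, Embase, Web of Science, IEEE Xplore, PubMed and Scopus) were searched for relevant records published before 2 February 2024. Search terms related to the concepts of artificial intelligence, prediction, health records, longitudinal data, and cancer. Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework of terminology relating to the reporting of cancer detection and risk prediction models.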
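Of the 653 records screened, 33 were included in the review: 10 predicted risk of cancer, 18 performed cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most commonly predicted cancers were colorectal (n = 9) and pancreatic (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models that take direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of studies. This risk was commonly introduced by inappropriate study design (n = 26) and sample size (n = 26).
Conclusions
This review highlights the breadth of approaches to cancer prediction from longitudinal data.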
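Many cancer prediction models use EHR data with a cross-sectional approach, without considering the temporal aspect of the data. However, longitudinal data can be explored to make full use of the information stored in EHRs [8]. Patient measurements should be viewed in context: changes over time may provide more information about a patient's health than static observations, and recent observations may be more informative than distant ones. For example, while long-standing diabetes is a risk factor for pancreatic cancer, new-onset diabetes has been suggested as an indicator of asymptomatic cancer [9, 10, 11]. Quantities such as laboratory tests are also subject to inter-patient variability, and changes in these values may be more informative than instantaneous measurements [12]. Longitudinal data have been used successfully in other medical domains, such as mortality and sepsis prediction [13, 14, 15].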
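The search strategy was developed iteratively by the authors. Six databases were searched: MEDLINE, Embase, Web of Science, IEEE Xplore, Scopus and PubMed. The search strategy was adapted for each database; however, every search contained terms relating to each of the concepts relevant to the scoping review question. These concepts were "artificial intelligence", "prediction", "EHRs", "longitudinal" and "cancer". The full search terms are provided in Additional File 1. Searches were conducted on 15th August 2023 and updated on both 2nd February 2024 and 9th August 2024, with no limit on year. The citations and reference lists of each eligible study were searched to retrieve additional records not found by the initial search.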
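The concept-based structure described above can be sketched in a few lines. This is a minimal illustration, assuming that synonyms within a concept are joined with OR and the concepts are joined with AND. The synonyms listed are placeholders rather than the review's actual search terms (those are in Additional File 1), and the function name `build_query` is hypothetical.

```python
# Illustrative concept blocks. The synonyms are placeholders, NOT the
# review's actual search terms (see Additional File 1 for those).
CONCEPTS = {
    "artificial intelligence": ["artificial intelligence", "machine learning", "deep learning"],
    "prediction": ["prediction", "risk model"],
    "EHRs": ["electronic health record", "electronic medical record"],
    "longitudinal": ["longitudinal", "temporal"],
    "cancer": ["cancer", "neoplasm"],
}


def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    blocks = []
    for terms in concepts.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        blocks.append(f"({quoted})")
    return " AND ".join(blocks)


query = build_query(CONCEPTS)
print(query)
```

In practice each database has its own field tags and controlled vocabulary, so a generic string like this would still need adapting per database, as the text notes.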
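Studies used populations from the USA (19, 54%), the Netherlands (4, 11%), Taiwan (5, 14%), Denmark (2, 6%), Sweden (1, 3%), South Korea (1, 3%), Israel (1, 3%) and Singapore (1, 3%). One study did not report where the population originated, and one study used an additional dataset from the UK as a validation set. Five studies (14%) used single-centre data for model development, 20 (57%) used data from multiple centres connected by location or healthcare provider, nine (25%) used nationwide datasets, and one (3%) did not report the study setting. The nationwide studies originated from Sweden, Taiwan, South Korea and Denmark, while the multi-centre studies used data from affiliated practices in the USA (n = 15), the Netherlands (n = 4) and Israel (n = 1). Studies used case-control (9, 26%), nested case-control (6, 17%) or cohort (20, 57%) study designs. Regarding the setting of the datasets, four studies used primary care data (11%), seven used secondary care data (20%), 23 used both primary and secondary care data (66%), and one did not report the setting.
Outcome/prediction task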
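The clinical variables chosen differed between the selected approaches. All studies using feature engineering included laboratory tests in their models, whereas only a third of the models using sequential inputs did the same. In addition, all but one of the feature engineering models used demographic data, in contrast to around two thirds (n = 12) of the sequential input models.
Model characteristics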
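Twenty studies used deep learning models with sequential inputs, either as raw sequences or "binned" into discrete time intervals. The methods used are summarised in Table 2.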
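Binning can be illustrated with a short sketch. It assumes events arrive as (day, value) pairs and that values falling in the same interval are averaged. The aggregation rule, the bin width and the function name `bin_sequence` are assumptions made for illustration, not choices reported by the included studies.

```python
from collections import defaultdict


def bin_sequence(events, bin_days=30, n_bins=12):
    """Aggregate (day, value) events into fixed-width time bins.

    `events` holds (days_since_start, value) pairs at irregular times.
    Returns one mean value per bin, or None where a bin has no events.
    """
    buckets = defaultdict(list)
    for day, value in events:
        index = int(day // bin_days)
        if 0 <= index < n_bins:  # drop events outside the window
            buckets[index].append(value)
    return [
        sum(buckets[i]) / len(buckets[i]) if i in buckets else None
        for i in range(n_bins)
    ]


# Hypothetical haemoglobin readings (g/L) taken on irregular days.
raw = [(3, 140.0), (20, 136.0), (75, 130.0), (200, 121.0)]
binned = bin_sequence(raw, bin_days=30, n_bins=8)
print(binned)
```

Empty bins are returned as None, which makes explicit that binning turns irregular sampling into a missing-data problem that the downstream model still has to handle.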
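The windows used by metastasis and recurrence prediction models are shown in Table 5. One study included follow-up of controls [53]. Two studies defined the start of the observation window as a specific clinical event relating to the primary cancer [35, 40].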
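Thirteen studies (36%) had code available online. Two studies used data conforming to a common data model: Kim et al. used the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) [47], and Jia et al. [28] used data adhering to the TriNetX standard data model [66]. One study [47] used data that are freely available online. Fifteen studies used datasets that can be requested or purchased: the Veterans Affairs Corporate Data Warehouse [31, 32, 34, 52, 55], the Kaiser Permanente Southern California database [30, 31, 32, 33, 36], the Julius General Practitioners' Network [38, 39, 46], Cerner Health Facts [53, 54], the HCUP State Inpatient Databases (SID) [50], IQVIA datasets [51] and TriNetX [28]. Six studies used data that are only available to researchers in the country of origin [27, 55, 58, 59, 61, 62].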
The challenges of using EHR data for prediction models are well documented [8] and relate mainly to data quality issues and inconsistent recording between clinicians and sites [67]. Recording of data may also change over time within a healthcare centre, so additional care should be taken when developing longitudinal models to ensure that they are robust to temporal shift [68]. While EHR data pose extra challenges for analysis, models intended for use within EHR systems are likely to encounter the same quality issues. Consideration of these aspects in model development should make the resulting algorithms more robust to similar issues upon deployment. EHR data also provide numerous benefits over prospectively collected data, as they are more reflective of clinical practice and are not as expensive or time-consuming to collect.
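One common way to probe robustness to temporal shift is to hold out the most recent period for evaluation instead of splitting at random. The sketch below assumes each record carries an index year. The field name `index_year` and the function `temporal_split` are hypothetical, and this is one possible check, not a method prescribed by the review.

```python
def temporal_split(records, cutoff_year):
    """Train on records before `cutoff_year`, evaluate on those from it onward."""
    train = [r for r in records if r["index_year"] < cutoff_year]
    test = [r for r in records if r["index_year"] >= cutoff_year]
    return train, test


# Hypothetical patient-level records; only the index year matters here.
cohort = [
    {"patient": "a", "index_year": 2012},
    {"patient": "b", "index_year": 2015},
    {"patient": "c", "index_year": 2018},
    {"patient": "d", "index_year": 2019},
]
earlier, later = temporal_split(cohort, cutoff_year=2017)
print(len(earlier), len(later))
```

A marked drop in performance on the later period, relative to a random split, would suggest that recording practices or case mix have shifted over time.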
The intended use case of a model is a key consideration when selecting data sources. If a model is intended for early detection, this should be reflected in the dataset by using data that would be available at the point of use. Where studies use linked primary and secondary care data, it should be considered whether these data would be linked in practice, as this has implications for clinical applicability. However, proof-of-concept research demonstrating improved disease detection from linked data can still be valuable: it provides motivation for cohesive electronic health record systems across healthcare networks, and many countries are aiming towards linked health data in practice.
The most frequently considered cancers were colorectal and pancreatic, accounting for more than 50% of the included studies. These are likely chosen because of their global impact: colorectal cancer is the third most common cancer and the second leading cause of cancer death. Pancreatic cancer is less common, ranking around 12th, but it accounts for the sixth largest number of cancer deaths and is known to be difficult to diagnose. There is an unmet need for earlier diagnosis of rarer cancers. Although more data are available for patients with more common cancers, there is an opportunity to establish methods on those datasets so that they can then be implemented and optimised for rarer cancers.
The choice of features has an impact on the choice of model and vice versa: many of the approaches to feature engineering shown in Table 1, such as trend features and signal decomposition, would not be appropriate for categorical information such as diagnoses. Similarly, approaches to missing data differ between types of variables. For categorical features, where the value indicates whether the feature was present at that time, missing data do not need addressing, whereas for numerical features such as laboratory tests, missing data must be imputed. This is particularly a problem in models requiring fixed-length inputs, such as RNN-based models and CNNs.
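To make the fixed-length input problem concrete, the following sketch imputes a numerical series by carrying the last observation forward and then pads or truncates it to a set length, returning a mask of which values were actually observed. Forward filling is only one simple option, chosen here for illustration and not a recommendation from the review. The function name `impute_and_pad` and the default fill value are assumptions.

```python
def impute_and_pad(values, length, fill=0.0):
    """Forward-fill missing values, then pad or truncate to `length`.

    `values` is a time-ordered list where None marks a missing measurement.
    Returns (filled, mask): mask is 1 where a value was actually observed
    and 0 where it was imputed or padded.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    filled, mask = [], []
    last = fill  # used until the first real observation is seen
    for value in values:
        if value is None:
            filled.append(last)
            mask.append(0)
        else:
            last = value
            filled.append(value)
            mask.append(1)
    # Keep the most recent `length` steps; pad at the front if too short.
    filled, mask = filled[-length:], mask[-length:]
    pad = length - len(filled)
    return [fill] * pad + filled, [0] * pad + mask


series, observed = impute_and_pad([None, 5.1, None, 5.6], length=6)
print(series, observed)
```

Passing the mask to the model alongside the values lets it distinguish a measured value from an imputed one, which matters when the absence of a test is itself informative.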
The feature engineering and sequential input methods identified in this review are summarised, along with their advantages and limitations, in Tables 1 and 2 respectively. The most commonly used feature engineering method was absolute change in measurement, which is likely chosen for its ease of computation but requires expert knowledge to determine which time points to calculate the change between. The most common approach using sequential inputs was to use models based on RNNs. A general advantage of feature engineering is that the features can be used in relatively simple artificial intelligence algorithms, reducing the computational cost, although human input is required to craft meaningful features. Alternatively, deep learning approaches can learn hidden patterns, including potentially undiscovered predictors, without the need for explicit crafting by an expert. This gives rise to a key question: does the added complexity increase accuracy, and does this increase justify the increased cost? The two approaches are rarely compared, and future research should aim to do this.
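The absolute change feature mentioned above can be sketched as follows. The example takes the signed difference, in the measurement's own units, between the latest value and the value recorded closest to a chosen interval earlier. The interval `lookback_days` stands in for the expert-chosen time points, and the function name and example readings are hypothetical.

```python
def absolute_change(measurements, lookback_days):
    """Change between the latest value and the value nearest `lookback_days` earlier.

    `measurements` is a list of (day, value) pairs. `lookback_days` is the
    expert-chosen interval; choosing it is the step that needs domain knowledge.
    Returns None if fewer than two measurements are available.
    """
    if len(measurements) < 2:
        return None
    ordered = sorted(measurements)
    latest_day, latest_value = ordered[-1]
    target = latest_day - lookback_days
    # Pick the earlier measurement taken closest to the target day.
    _, baseline_value = min(ordered[:-1], key=lambda m: abs(m[0] - target))
    return latest_value - baseline_value


# Hypothetical HbA1c readings (mmol/mol) as (day, value) pairs.
hba1c = [(0, 40.0), (180, 41.0), (350, 44.0), (365, 52.0)]
change = absolute_change(hba1c, lookback_days=180)
print(change)
```

The result depends heavily on `lookback_days`: with a 15-day lookback the same series yields a change of 8.0 instead of 11.0, which illustrates why the choice of time points matters.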
In addition, research should consider whether longitudinal data do improve the predictive capability of models. Few studies in this review compared longitudinal models to cross-sectional approaches. Those that did were not definitive in finding an improvement in performance, although there was weak evidence to support one, and no studies reported that longitudinal data harm predictions. Given the additional complexity and cost of incorporating longitudinal data, the question of whether this is justified should be considered.
As previously described, longitudinal data in healthcare provide specific challenges for prediction models. The methods found in this review address these in varying ways. Data irregularity was commonly addressed in feature engineering models by modelling patients' trajectories individually to infer values at specific time-points or by calculating slopes from available data. Sequential methods often coded the relative times of observations to provide context to the models or required direct imputation of missing data in the temporal axis. Data opacity was considered in a number of studies aiming to develop explainable methods. The level of explainability achieved by models varied by the approach taken; feature engineering models were more likely to provide model level explanations, which are often simple to implement. However, these may not be as useful for clinicians as prediction level explanations, which can help a user understand why the model classified a patient a certain way, but do require more complex methods to implement, increasing the computational cost of a model.
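Two of the strategies for irregular data described above can be sketched briefly: a slope fitted to whatever measurements are available, and the time elapsed between observations as an extra input for sequential models. Ordinary least squares is assumed for the slope, and both function names are hypothetical. The included studies may have used other trend estimators or time encodings.

```python
def slope(measurements):
    """Ordinary least-squares slope of value against time (units per day).

    Works on irregularly spaced (day, value) pairs; returns None when the
    slope is undefined (fewer than two distinct time points).
    """
    n = len(measurements)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in measurements) / n
    mean_v = sum(v for _, v in measurements) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in measurements)
    if sxx == 0:
        return None
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in measurements)
    return sxy / sxx


def time_deltas(days):
    """Days elapsed since the previous observation (0 for the first)."""
    if not days:
        return []
    return [0] + [later - earlier for earlier, later in zip(days, days[1:])]


trend = slope([(0, 1.0), (10, 2.0), (30, 4.0)])
gaps = time_deltas([0, 10, 30])
print(trend, gaps)
```

The slope collapses an irregular series into a single feature for a simple classifier, while the gaps keep the timing information available to a model that consumes the full sequence.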
Risk prediction models were evaluated against the longitudinal model framework described in Fig. 1. All risk prediction models except one used the full study period as the observation window, and no studies evaluated models using different observation windows. In risk prediction studies, this was generally a universal time window for the entire cohort, for example from 2003 to 2011. Using all available data as the observation window may result in better performance, as the full history provides more context for patient data. Conversely, it may hinder performance by introducing additional noise into models. In addition, using longer time sequences may increase the complexity and computational expense of models. Given this potential trade-off, studies should aim to evaluate the impact of various observation windows on model performance. Similarly, only one study experimented with different prediction windows. The prediction windows used by other studies varied significantly, even when predicting the same cancer, suggesting that there is no clear window that should be assumed without investigation.
Cancer detection models should report three quantities: the observation window, the lead time window, and the follow-up time. A number of studies used the full patient history as the observation window. This has the potential to introduce bias, as cancer patients may consequently have systematically shorter trajectories, a difference that sequential input models may detect. Potential bias should also be mitigated by ensuring sufficient follow-up of the control population, as controls may have been diagnosed with the cancer of interest at a later date, indicating a cancer that was present but as yet undiagnosed. Only one study reported including any follow-up time [27]. Lead time is a key parameter to consider in early detection models. Most early detection studies experimented with different lead times, which allows interpretation of how prediction accuracy changes with distance from the event of interest.
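The three quantities can be made concrete with a small sketch. It assumes the observation window ends one lead time before the index date (the diagnosis date, or a matched pseudo-date for controls) and that controls need a minimum period of cancer-free follow-up after that date. These definitions, the function names and the example record are assumptions for illustration and may differ in detail from the framework in Fig. 1.

```python
from datetime import date, timedelta


def observation_events(events, index_date, lead_days, observation_days):
    """Keep events inside the observation window.

    The window ends `lead_days` before `index_date` (the diagnosis date, or a
    matched pseudo-date for controls) and spans `observation_days` before that.
    """
    end = index_date - timedelta(days=lead_days)
    start = end - timedelta(days=observation_days)
    return [(day, code) for day, code in events if start <= day < end]


def control_is_eligible(index_date, last_contact, follow_up_days):
    """A control needs enough cancer-free follow-up after the index date."""
    return (last_contact - index_date).days >= follow_up_days


# Hypothetical patient record: (event date, clinical code).
record = [
    (date(2019, 1, 10), "anaemia"),
    (date(2020, 3, 5), "weight loss"),
    (date(2020, 11, 20), "abdominal pain"),
]
kept = observation_events(record, date(2021, 1, 1), lead_days=180, observation_days=365)
print(kept)
```

Applying the same fixed-length window to cases and controls avoids the trajectory-length difference described above, and repeating the extraction with several values of `lead_days` is one way to examine how accuracy changes with distance from diagnosis.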
In general, the reporting of time windows was poor in metastasis and recurrence prediction models. This not only makes it difficult to assess potential bias in the models but also leaves the intended use case unclear, i.e., at what point the prediction would be made and how it would aid a clinician. As previously explained, follow-up time should be reported in studies predicting recurrence or metastasis to rule out potentially undiagnosed patients and hence mislabelled occurrences.
Given that research into the use of longitudinal health records is at an early stage and studies are generally proof-of-concept, reproducibility is vital to ensure that future work can build upon findings. Despite this, only around a third of the studies included in the review have code available. Due to the confidential nature of health data, open-access data are rare; however, the availability of commercial datasets such as those used by studies in this review provides the opportunity for comparative work. For research using these sources, it is especially important to be clear about how cohorts were selected. Clear reporting of methods and study setting is vital for reproducibility. The recent publication of an AI extension to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis statement (TRIPOD+AI) provides a checklist of reporting items that should be followed by predictive models using AI [69].
The PROBAST assessment showed that most studies were at high risk of bias, with only three studies achieving low risk of bias overall. This is unsurprising given that the studies are generally exploratory; however, the results highlight the common areas where risk of bias is introduced. The highest risk domains were domains 1 and 4, concerning study participants and analysis respectively. High risk in domain 1 was introduced by case-control study designs and restrictions on participants (e.g., age-based) without acknowledgement of how this affects the applicability of the model. These factors may be unavoidable due to data access restrictions; however, researchers should make the potential impact on the risk of bias clear, and nested case-control studies should adjust for outcome frequency as described in the PROBAST framework [22]. In domain 4, common areas introducing risk of bias were an insufficient number of participants with the outcome, lack of follow-up periods for controls, inappropriate performance measures, and no accounting for overfitting. While low numbers of patients with the outcome are often determined by the available data, the remaining areas can be mitigated by researchers through the following actions: ensuring follow-up of controls; reporting comprehensive performance measures, including both discrimination and calibration measures; and using cross-validation or bootstrapping to account for overfitting in the model.
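As an illustration of reporting both kinds of performance measure, the sketch below computes one discrimination measure (the area under the ROC curve, via pairwise comparison of cases and controls) and one simple calibration measure (calibration-in-the-large, the mean predicted risk minus the observed outcome rate). These two measures are examples chosen here, not a list mandated by PROBAST, and the labels and scores are made up.

```python
def auroc(labels, scores):
    """Discrimination: probability a random case outranks a random control."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def calibration_in_the_large(labels, scores):
    """Calibration: mean predicted risk minus observed outcome rate."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Made-up outcomes (1 = cancer) and predicted risks for four patients.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p), calibration_in_the_large(y, p))
```

A model can rank patients well while systematically over- or under-estimating risk, which is why the two measures are reported together. To account for optimism, both would be computed on held-out folds or bootstrap resamples instead of on the training data.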
In conducting this review, we identified several common areas for improvement which future work should aim to address:
Models should be reported using clear terminology, such as that provided here or by Lauritsen et al. [25]. Given the clinical application, it is especially important to clearly explain at which point the prediction model would be used, and which data would be available.
Given the lack of consensus on appropriate prediction windows for each of the cancers, studies should evaluate models at various time points to assess the optimal time windows for prediction. In addition, models should be evaluated against cross-sectional methods. It is not a given that adding longitudinal data improves performance, but it is likely to increase complexity. Researchers should aim to evaluate whether this added expense is justified for the problem.
To ensure reproducibility of research, studies using AI for prediction of cancer should adhere to the TRIPOD+AI statement to ensure methods are transparent [69]. Where possible, data should be made available for sharing. As this is often not appropriate for patient data, reporting of datasets should be comprehensive. Researchers should make the code used in the research publicly available.
When conducting cancer prediction research, researchers should be mindful of how bias may be introduced to the model. The forthcoming PROBAST-AI will provide guidance [24]. Mitigation strategies include ensuring sufficient follow-up of controls; reporting a variety of performance measures, including discrimination and calibration; and accounting for optimism and overfitting in the model using cross-validation or bootstrapping.
Strengths of the current study
This review has multiple strengths. Firstly, the scope of the review covered all longitudinal methods to include a wide range of methodologies, found through an exhaustive search strategy. In addition, the review adheres to the PRISMA guidance for conducting scoping reviews. The review also provides a PROBAST assessment, highlighting common areas where risk of bias is high in longitudinal models.
Limitations of the current study
This review has four main limitations:
Firstly, the records were only screened by one author. The impact of this was mitigated by taking a lenient approach to inclusion; articles were only excluded initially if the author was confident in their ineligibility. Where this was not the case, fellow authors were consulted to reach a consensus.
Secondly, a number of studies included in this review were found via citation and reference searching and were not captured as part of the initial search strategy. These studies were missed by the search strategy due to several factors; some did not state in the title or abstract that they included temporal data while some did not mention health records. Two terms were identified from these results that can be used to describe longitudinal data: "sequential" and "trajectory". While we are confident the most significant studies in the area were found, inclusion of these terms could have made the search strategy more comprehensive.
Thirdly, this review did not quantify the retrieved works as precisely as the framework described by Lauritsen et al. [25], as this granularity would have impeded the ability to compare similar studies.
Finally, this review does not comment on the relative performance of each of the methods due to the heterogeneity of applications and datasets. The review also cannot provide an answer as to whether longitudinal methods improve upon cross-sectional methods, as this was rarely evaluated in the studies and is likely to be problem dependent.
The dataset supporting the conclusions of this article is included within the article and its additional files.
Code used for searching each database is provided in Additional File 1.
This research was financially supported by the UK Research and Innovation Engineering and Physical Sciences Research Council (grant number EP/S024336/1).
Additional File 5. This file contains the extracted data items describing the feature engineering and sequential input methods of the studies.
Moglia, V., Johnson, O., Cook, G. et al. Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review. BMC Med Res Methodol 25, 24 (2025). https://doi.org/10.1186/s12874-025-02473-w