Identifying prenatal risk factors of postpartum depression with machine learning

2025-10-04 05:19:14 English original

Author: Schwabe, Inga

Introduction

Postpartum depression (PPD) is characterized by a depressive episode within 12 months after childbirth¹. PPD affects approximately 12–15% of mothers worldwide² and about 10% of new mothers in the Netherlands specifically³, making it the most common psychiatric disorder among new mothers. PPD can have far-reaching consequences for the well-being of both mothers and their children⁴. For mothers, PPD symptoms can negatively affect physical and psychological health, social relationships, interactions with their partners, and overall quality of life⁵. For newborns, these maternal symptoms can have a negative impact on growth and development^6,7. Moreover, PPD can negatively influence breastfeeding behaviors, mother-infant interactions, and the bonding process between mother and child^5,6.

Fortunately, research suggests that early intervention, before symptoms arise, can prevent the onset of PPD⁸. The key to successful intervention is the early detection of women at risk for developing PPD^9,10. Therefore, accurately predicting which women are at risk has been a long-standing goal in clinical practice¹¹. Traditionally, researchers have relied on statistical models such as linear regression to identify risk factors and predict the onset of PPD. These efforts resulted in the identification of risk factors such as a history of depression and depression during pregnancy¹². Although linear regression can handle many risk factors simultaneously, its predictive performance deteriorates when the number of predictors is large relative to the sample size, because the model then tends to overfit the training data¹³. Given that PPD likely arises from a combination of psychological, biological, and social factors, relying on traditional, unregularized linear regression models may limit our ability to accurately predict which mothers are at risk for developing PPD¹⁴.

To overcome these challenges, more recent research has used machine learning (ML) methods to identify risk factors for PPD¹⁵. Unlike traditional models, ML approaches can handle large datasets with numerous variables, allowing a wide range of psychological, biological, and social factors to be integrated into one comprehensive predictive model. For example, Andersson et al.¹⁶ used ML models to analyze data from 4313 mothers, each assessed on 50 pre- and postnatal variables, and identified self-reported symptoms of depression and anxiety during pregnancy as the most important risk factors for the later onset of PPD. Their best-performing ML model predicted new cases with 73% accuracy. Another ML study, involving 266,544 women and using variables from electronic health records (e.g., labor complications, history of medical diagnoses) to predict PPD, showed similarly good predictive performance (AUC = 0.84)¹⁷. Here, the best-performing ML model identified the total number of drug prescriptions before and during pregnancy as the most predictive risk factors. This is plausible, as the number of prescriptions likely reflects underlying psycho-physiological conditions that could increase the risk of PPD. However, because their models did not include variables such as depressive symptoms during pregnancy, the predictive value of these symptoms cannot be assessed from their study.

The results of the two aforementioned studies demonstrate the potential of the ML framework for identifying mothers at risk for developing PPD. However, despite these promising results, several challenges remain unaddressed. Prior studies have focused on a narrow subset of potential risk factors, exploring only medical, biological, or psychosocial variables, and may therefore have overlooked important factors or the predictive power of their combination^14,18. Moreover, when studies examine only a few predictors at a time, it becomes difficult to assess the relative importance of these factors compared with others within one model, which may also obscure how they interact or contribute collectively to PPD risk (e.g., Amit et al.¹⁷). Furthermore, most studies (e.g., Andersson et al.¹⁶) include postpartum variables in their predictive models¹⁸. If the aim is to predict PPD proactively, relying on risk factors that only emerge after its onset misses the critical window for early intervention. Additionally, existing literature indicates that the relative importance of risk factors may vary throughout pregnancy¹⁹: certain factors may emerge as risk factors in one trimester, yet diminish in predictive value in subsequent trimesters. This variability suggests that the approach to assessing and responding to PPD risk should be tailored to each trimester. Finally, prior research has not yet addressed how early during pregnancy PPD can be accurately predicted²⁰.

Addressing these gaps, our study aims (1) to determine the earliest point during pregnancy at which PPD severity (as measured by Edinburgh Depression Scale scores at 8–10 weeks after delivery) can be reliably predicted and (2) to identify the most predictive risk factors for each trimester separately. The Edinburgh (Postnatal) Depression Scale (E(P)DS)²¹ is a validated 10-item questionnaire widely used to screen for postpartum depression that assesses symptoms experienced over the past week²². To address our research question regarding early prediction, we start with data collected during the first trimester and progressively add data from the second and third trimesters. This process aligns with real-world clinical workflows, which continuously integrate all available data from prenatal check-ups with midwives and/or gynecologists, so that the entire set of available risk factors (for example, from patient records) can be used at each stage of pregnancy. The data comprised 2865 women and 233 variables (including biological, psychological, and social variables) measured at 12, 20, and 28 weeks of gestation (i.e., once per trimester). Importantly, many of the variables considered in this study are often already measured during standard check-ups by midwives or gynecologists. Therefore, our results can inform clinical practice in two ways: first, by informing midwives and gynecologists which variables are important to monitor in which trimester (to predict whether a mother is at risk for developing PPD), and second, by identifying risk factors that are not currently measured by default but turn out to be important in predicting the onset of PPD.
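To make the outcome concrete: the E(P)DS sum score is the total of its 10 items, each scored 0–3, giving a range of 0–30. The sketch below assumes that reverse-keyed items have already been recoded to 0–3 before summing; the function name `eds_sum_score` is hypothetical and not taken from the authors' code.

```python
def eds_sum_score(items):
    """Return the E(P)DS sum score for one respondent.

    Assumes `items` holds exactly 10 item scores, each already coded 0-3
    (reverse-keyed items recoded beforehand), so the total lies in 0-30.
    """
    if len(items) != 10:
        raise ValueError("the E(P)DS has exactly 10 items")
    for score in items:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"item score out of range: {score!r}")
    return sum(items)


# Hypothetical respondent.
print(eds_sum_score([1, 2, 1, 0, 2, 1, 1, 2, 1, 1]))  # 12
```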

Results

Table 1 presents descriptive statistics for the Brabant Study sample, categorized by PPD status. Participants with an EDS sum score of 10 or above were classified into the PPD group (N = 397), and those with scores below 10 into the non-PPD group (N = 2468). Depending on the variable type, either the median or the percentage per category was calculated, and Wilcoxon rank-sum tests or chi-square tests, respectively, were conducted to determine whether the two groups differed on that variable. Mothers in the PPD group differed from those in the non-PPD group on known PPD risk factors, such as a history of anxiety and depression and experiencing a significant life event during pregnancy. Mothers in this sample were mostly highly educated and married or cohabiting.
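The group comparisons in Table 1 rely on Wilcoxon rank-sum tests for continuous variables. A minimal sketch of that test follows, assuming a large-sample normal approximation with average ranks for ties but no tie-corrected variance and no continuity correction; library implementations (e.g., SciPy's `scipy.stats.mannwhitneyu`) handle these details more carefully. The function name and example values are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum (Mann-Whitney U) test via a normal approximation.

    Tied values share the average of the ranks they span. The variance is not
    tie-corrected and no continuity correction is applied, so the p-value is
    approximate and best suited to large samples.
    """
    # Pool both groups, remembering membership (0 = group_a, 1 = group_b).
    pooled = sorted([(x, 0) for x in group_a] + [(x, 1) for x in group_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_two_sided


# Hypothetical EDS scores for two small groups.
u, z, p = rank_sum_test([3, 5, 6, 8], [9, 11, 12, 15])
print(u, round(z, 2), round(p, 3))
```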

Table 1 Descriptive statistics.

Model performance

We evaluated model performance at each stage using the mean squared error (MSE) in the testing sample, which represents the mean squared difference between predicted and actual continuous EDS scores (see Fig. 1). We compared three models: (1) one trained only on first-trimester data; (2) one incorporating data from the first and second trimesters; and (3) one including data from all three trimesters.
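For reference, the two performance measures can be computed as below: MSE averages the squared prediction errors, and R² compares the residual sum of squares with the total sum of squares around the observed mean, so R² = 1 − MSE / Var(y) when the variance uses the same 1/n scaling. This is a plain-Python sketch with hypothetical values; scikit-learn's `mean_squared_error` and `r2_score` implement the same definitions.

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared difference between observed and predicted."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted EDS scores.
observed = [4, 12, 7, 2]
predicted = [5, 9, 7, 3]
print(mse(observed, predicted), round(r_squared(observed, predicted), 3))  # 2.75 0.806
```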

In the first step, using only first-trimester data, the LassoLars model achieved the lowest test MSE of all models evaluated. Similarly, in the second step, with the addition of second-trimester data, the LassoLars model again performed best. In the final step, when third-trimester data were included (thus using all available prenatal data), the Lasso linear regression model performed best. The test MSE and test R² for each best-performing model are presented in Table 2. Statistics for the remaining models are not presented here but can be obtained from the first author.

Table 2 Best-performing model and performance measures per modeling step. MSE = mean squared error.

Classification performance metrics for the best-performing models at each modeling step are presented in Table 3. Overall, accuracy, negative predictive value (NPV), and specificity remained stable and consistently high across all modeling steps. The area under the precision-recall curve (AUC-PR), sensitivity, and positive predictive value (PPV) were lowest when using only first-trimester data but improved as data from later trimesters were incorporated. However, sensitivity remained relatively low in all models. This means that our models perform well at identifying non-cases (i.e., women who will not develop PPD after giving birth) but considerably worse at identifying cases (i.e., women who will develop PPD after birth).
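The pattern in Table 3 (high accuracy, specificity, and NPV alongside low sensitivity) is typical when cases are relatively rare. The sketch below computes these metrics from confusion-matrix counts; the counts in the example are hypothetical (1000 women, 10% cases) and are not taken from this study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Classification metrics from confusion-matrix counts (NaN when undefined)."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that are flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases that are cleared
        "ppv": ratio(tp, tp + fp),  # share of flagged women who are cases
        "npv": ratio(tn, tn + fn),  # share of cleared women who are non-cases
    }


# Hypothetical counts: 1000 women, 100 cases, only 20 of which are flagged.
metrics = screening_metrics(tp=20, fp=20, tn=880, fn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.90 and specificity about 0.98 even though only one in five cases is detected (sensitivity 0.20), because non-cases dominate the denominators of accuracy and specificity.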

Table 3 Classification performance metrics per modeling step, based on the best-performing model at each step (i.e., LassoLars for the models using first-trimester data only and first- and second-trimester data, and Lasso for the model using all prenatal data).

Hyperparameters

We applied k-Nearest Neighbors (kNN) imputation, where each missing value is inferred from the k most similar cases, and tuned k to minimize error²³. We found that k = 7 resulted in the lowest overall prediction error across all trimesters and modeling steps. For the lambda hyperparameter in the Lasso-based models, setting λ = 0.1 consistently achieved the best performance across all trimesters.
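For reference, the Lasso estimate minimizes a squared-error loss plus an L1 penalty weighted by λ. The 1/(2n) scaling below follows the scikit-learn convention (where the penalty weight is called `alpha`), and the intercept is assumed to be absorbed by centering the predictors and outcome; whether the reported λ = 0.1 refers to exactly this scaling is an assumption. Because the L1 penalty can set coefficients exactly to zero, larger values of λ retain fewer predictors, which is what provides the embedded variable selection.

```latex
\hat{\beta}_{\lambda} = \underset{\beta}{\arg\min}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i^{\top}\beta\right)^2
  + \lambda \sum_{j=1}^{p}\lvert \beta_j \rvert
```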

Predictive risk factors

The predictive risk factors identified by the best-performing models at each modeling step are presented in Figs. 2, 3 and 4. Descriptive statistics concerning the top 25 most predictive risk factors can be retrieved from Supplementary Table 1.
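One simple way to produce a ranked list such as a top 25 from a Lasso-type model is to sort the nonzero coefficients by absolute size. The sketch below assumes predictors were standardized before fitting, so that coefficient magnitudes are comparable; this ranking rule and the predictor names are illustrative assumptions rather than a description of how the figures were produced.

```python
def top_k_predictors(coefficients, k=25):
    """Return up to k (name, coefficient) pairs with the largest absolute nonzero coefficients.

    Assumes coefficients come from a linear model fitted on standardized predictors,
    so magnitudes are comparable; zero coefficients (dropped by the Lasso) are skipped.
    """
    nonzero = [(name, coef) for name, coef in coefficients.items() if coef != 0]
    nonzero.sort(key=lambda item: abs(item[1]), reverse=True)
    return nonzero[:k]


# Hypothetical coefficients keyed by hypothetical predictor names.
coefs = {"eds_t1": 2.1, "ds14_na_t1": 0.8, "bmi_t2": -0.3, "rheumatism": 0.0}
print(top_k_predictors(coefs, k=2))  # [('eds_t1', 2.1), ('ds14_na_t1', 0.8)]
```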

First trimester

When using only first-trimester data, the most predictive risk factor was a mother’s sum score on the EDS measured in the first trimester (around 12 weeks of pregnancy), with higher scores predicting higher EDS scores after delivery (see Fig. 2). Other important predictors of a high EDS score after delivery were a high sum score on the negative affectivity subscale of the Type D Scale-14 (DS14)²⁴ (a higher score reflecting a stronger tendency towards negative emotions) and on the Tilburg Pregnancy Distress Scale (TPDS)²⁵ (a higher score reflecting higher levels of pregnancy distress). Finally, a self-reported lifetime history of mental disorder treatment and self-reported lifetime rheumatism were also important predictors.

First and second trimesters

When second-trimester data were added to the model, EDS scores from both the first and second trimesters emerged as important predictors (see Fig. 3). Apart from the EDS score at 20 weeks of pregnancy, the important predictors were largely similar to those identified using first-trimester data only. However, second-trimester BMI (calculated from self-reported height and weight) emerged as an important risk factor, whereas rheumatism no longer ranked among the top 25 predictors.

First, second and third trimesters

With the addition of third-trimester data, EDS scores from all trimesters consistently appeared as the most important predictors. Similar to the second modeling step, negative affectivity (as measured by the DS14 at 12 weeks), second-trimester BMI, and the (self-reported) history of mental disorder treatment continued to be important risk factors for a high EDS score after delivery (see Fig. 4). Additionally, a mother’s score on the neuroticism subscale from the Big Five Inventory 2-S (BFI-2-S)²⁶ was identified as an important variable in this final modeling step.

Discussion

The aims of this study were (1) to determine how early during pregnancy postpartum depression (PPD) can be predicted and (2) to identify the most predictive prenatal risk factors for developing PPD (measured at 8–10 weeks after delivery). To achieve this, we evaluated the performance of nine machine learning models. These models were progressively trained with increasing amounts of data: the first model included only first-trimester data, and the final model used variables collected throughout the entire pregnancy (i.e., data collected in all three trimesters). This process mimics clinical practice, in which health professionals accumulate data from pregnant women over the course of pregnancy. First, our results suggest that women at low risk for PPD can already be identified based on variables collected during the first trimester. Second, we identified the following variables as the most important prenatal predictors of PPD: depressive symptoms during pregnancy, negative affectivity, pregnancy distress, and history of mental disorder treatment.

Our findings indicate that PPD can be predicted as early in pregnancy as 12 weeks, using variables collected during the first trimester. Although model performance improved slightly with the addition of data from the second and third trimesters, the test MSE remained relatively stable. This suggests that additional information from later trimesters does not substantially improve the prediction models and that first-trimester variables are sufficient for early prediction of PPD. Importantly, the high specificity and negative predictive value of our models indicate that we can accurately identify women who are unlikely to develop PPD, allowing healthcare providers to withhold intensive interventions from this low-risk group and focus resources on those who need them most. However, comparing our MSE values with those from other studies is difficult, because most researchers treat PPD prediction as a classification task, dichotomizing the EDS score at a clinical cutoff, rather than as a regression problem¹⁸. Nevertheless, most classification performance metrics found in this study are comparable to those reported in similar studies.

We must, however, note that the sensitivity found in this study (0.19) was lower than that reported in previous similar work. For example, Andersson et al.¹⁶ predicted PPD at six weeks postpartum and reported a sensitivity of 0.71. Although predictive performance cannot be directly compared across studies due to differences in sample size, variables, and ML algorithms, we consider several possible explanations for this difference. First, many existing studies focus on “nowcasting” rather than actual forecasting, using postpartum data collected at the same time as the depressive symptoms to be predicted. This essentially describes how PPD presents itself (i.e., its indicators and manifestation) rather than predicting it. In contrast, our study aimed to predict PPD with data gathered during pregnancy alone. This strategy is clinically preferable, as it enables intervention during pregnancy to prevent PPD before its onset. Second, our models were optimized to minimize MSE rather than classification-specific metrics such as sensitivity, which might partially account for the lower sensitivity. Third, the EDS distribution was heavily right-skewed (i.e., a floor effect), with most mothers scoring below 10. This skewness limits the model’s ability to learn patterns from mothers at higher risk of developing PPD, as relatively little data are available for this group. Finally, predicting PPD may be challenging because research indicates that it is not a single, homogeneous disorder²⁷. Many studies, including Osborne et al.²⁸, suggest that PPD consists of multiple subtypes, each associated with different risk factors. Specifically, Osborne et al.²⁸ propose that women who have no prenatal depressive symptoms yet develop PPD differ hormonally from those who experience depression during pregnancy and then develop PPD, highlighting the disorder’s heterogeneity.

Our findings further indicate that self-reported depressive symptoms during pregnancy are the most predictive risk factor across all trimesters. This finding, consistent with other recent research, is encouraging because the EDS is a simple 10-item questionnaire that is easy to administer and score, making it feasible for routine screening during pregnancy²⁹. This aligns closely with the priorities of healthcare providers, who have indicated a need for quick, easy-to-administer screening tools³⁰.

In addition to self-reported depressive symptoms, several other psychological risk factors were identified as important. These included self-reported measures of pregnancy distress (TPDS)²⁵, neuroticism (BFI-2-S)²⁶, negative affectivity (DS14)²⁴, and a history of treatment for mental disorders. Personality traits such as neuroticism and negative affectivity (the latter being a dimension of Type D personality) have both been linked to PPD in previous research^31,32. Individuals with high levels of neuroticism tend to experience more negative emotions due to heightened stress reactivity³³. The postpartum period, being particularly stressful, has the potential to magnify these negative emotions, increasing the risk of PPD. Similarly, individuals with high negative affectivity have a general predisposition toward negative emotions, which has been associated with an increased risk of PPD³⁴.

In addition to the psychological variables, biological risk factors such as low BMI and rheumatism were also predictive of PPD. Previous research has shown that low BMI is a risk factor for PPD independent of prior depression history, possibly because mood regulation is closely linked to nutritional status³⁵. Note that elevated BMI has also been linked to increased PPD risk, likely via pro‑inflammatory pathways³⁶. However, the underrepresentation of women with high BMI in our cohort reduced the statistical power to replicate this finding. Various types of rheumatic diseases have previously been linked to PPD³⁷. Rheumatoid arthritis, specifically, was found to be associated with an increased risk of PPD among women without a history of psychiatric disorders³⁸. Although our study did not distinguish between different types of rheumatic conditions, it is plausible that similar physiological and psychological mechanisms, such as chronic inflammation, hormonal fluctuations, and immune system dysregulation, underlie the association between rheumatism and PPD.

Our study has several strengths and limitations. One strength is that our research closely aligns with clinical practice, as risk factors were measured in parallel with routine check-ups by midwives and/or gynecologists in the Netherlands. This means that the identified risk factors can be more easily implemented into routine screening protocols. Another strength is the inclusion of a wide range of potential risk factors, which allowed us to evaluate the relative predictive performance of these variables within a single model. Furthermore, the large sample size enhances the robustness and generalizability of our machine learning models.

However, a limitation of this study is the non-representative sample, as it primarily includes highly educated white mothers who are married or cohabiting with their partners. This may limit the generalizability of our findings to more diverse populations. In addition, the right-skewed distribution of EDS scores and our choice to optimize for mean squared error rather than sensitivity potentially constrained the model’s ability to detect true positives among women at highest risk. To address these limitations, future studies should recruit more varied demographic and clinical subgroups (including different socioeconomic and ethnic groups) and incorporate additional predictive data such as perinatal biomarkers (e.g. DNA methylation biomarkers, inflammatory markers) and longitudinal data collected at more frequent intervals to improve sensitivity and ensure that models can accurately identify women who will develop PPD.

Nevertheless, our findings are highly relevant for clinical practice. Midwives can use trimester-specific risk factors to tailor their screening at each stage of pregnancy, as our research suggests that while some risk factors remain consistent, others may vary across trimesters. By identifying the most predictive risk factors early, healthcare providers can implement targeted interventions sooner. Moreover, our results indicate that we can accurately identify mothers-to-be who are unlikely to develop PPD, thereby enabling more efficient allocation of healthcare resources to those at higher risk and potentially improving prevention and intervention efforts. It should be noted that, while including second- and third-trimester data only slightly enhances predictive performance, continued monitoring of risk factors throughout pregnancy and into the postpartum period remains essential, as certain symptoms may emerge later.

In summary, this study demonstrates the potential of machine learning models, particularly regularized linear regression techniques, to predict PPD using prenatal data only. The consistent role of depressive symptoms, along with psychological traits such as neuroticism and negative affectivity across trimesters, underscores the need for early mental health screening during pregnancy. By identifying those at risk earlier, healthcare professionals can intervene sooner, improving both maternal and child outcomes. These findings suggest that integrating routine psychological assessments into prenatal care could enhance early detection of PPD risk, allowing for timely referrals to mental health services, ultimately reducing the prevalence and impact of PPD.

Methods

This study was pre-registered at the Open Science Framework (https://osf.io/6kdt9). We deviated from the pre-registered plan by not including potential risk factors collected after delivery (postpartum), as these would represent indicators of PPD rather than true prenatal predictors. Our focus was on identifying prenatal variables to predict the onset of PPD and facilitate early intervention.

Data description

The data for this study were collected from the Brabant Study³⁹, a longitudinal, prospective cohort study conducted in the Netherlands (South-East Brabant) between May 2018 and January 2023. Pregnant women aged 18 years or older, who had a sufficient understanding of the Dutch language and attended their antenatal visit before 12 weeks of gestation, were eligible for inclusion. Exclusion criteria included multiple pregnancies; known endocrine disorders prior to pregnancy (excluding thyroid function issues); type 1 diabetes; rheumatoid arthritis; severe psychiatric conditions such as schizophrenia, borderline personality disorder, or bipolar disorder; HIV infection; drug or alcohol addiction problems; any other disease requiring treatment with medications potentially harmful to the fetus and necessitating careful monitoring during pregnancy; and lack of internet access. Participants were followed throughout their pregnancy up to 10 weeks postpartum, with data collected at four time points: 12, 20, and 28 weeks of gestation, and 8–10 weeks postpartum. The outcome variable was the Edinburgh Postnatal Depression Scale (EPDS) sum score at 8–10 weeks postpartum. Predictor variables consisted of psychological factors (e.g., anxiety, depression during gestation, depression history), biological factors (e.g., BMI, thyroid function parameters like TSH and fT4 obtained via blood samples), and social factors (e.g., work performance, partner support). For more details on the Brabant Study, see the Brabant Study design paper by Meems et al.³⁹. A complete list of all included variables, including a description, can be retrieved from the first author.

Ethics declaration

The Brabant Study was approved by the Medical Ethics Committee at the Máxima Medical Centre Veldhoven (NL64091.015.17). The study was conducted in compliance with the 1975 Declaration of Helsinki and its 2013 revision. All participants provided written informed consent.

Data preprocessing

Data partitioning for train/test

The complete dataset was randomly split into training and held-out partitions in an 80:20 ratio. Eighty percent of the data was used to train each model, whereas the remaining 20% served as a held-out validation set (referred to as the test set in the Results) for assessing each model’s final performance.
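A reproducible 80:20 split can be sketched as follows. The function name and seed value are hypothetical; scikit-learn's `train_test_split` (with `test_size=0.2` and a fixed `random_state`) provides equivalent functionality.

```python
import random


def split_indices(n, test_fraction=0.2, seed=2024):
    """Shuffle row indices 0..n-1 reproducibly and split them into train and held-out sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded generator: same split every run
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(2865)
print(len(train_idx), len(test_idx))  # 2292 573
```

For 2865 rows this sketch yields 2292 training and 573 held-out rows; the exact partition sizes used in the study may differ depending on rounding.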

Missing data

Variables from the Brabant Study with more than 25% missing values were excluded from model training. The remaining missing values were imputed using a k-Nearest Neighbors (kNN) based method²³. Following established practice in machine learning research⁴⁰, we chose the number of neighbors (k) based on the MSE on the validation set.
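The two missing-data steps can be sketched as follows: columns with more than 25% missing values are dropped, and each remaining missing entry is replaced by the mean of that variable across the k most similar rows. Missing values are represented as `None`, the distance rescaling mirrors the nan-Euclidean idea used by scikit-learn's `KNNImputer`, and all function names are hypothetical; this is not the authors' implementation.

```python
import math


def columns_to_keep(rows, max_missing=0.25):
    """Indices of columns whose share of missing values (None) is at most max_missing."""
    n_rows = len(rows)
    keep = []
    for j in range(len(rows[0])):
        n_missing = sum(1 for row in rows if row[j] is None)
        if n_missing / n_rows <= max_missing:
            keep.append(j)
    return keep


def knn_impute(rows, k=7):
    """Replace each None with the mean of that column over the k nearest donor rows.

    Distance: Euclidean over columns observed in both rows, scaled up by
    (total columns / shared columns), in the spirit of a nan-Euclidean metric.
    Donors must have the target column observed. Returns a new list of rows.
    """
    n_cols = len(rows[0])
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for d, donor in enumerate(rows):
                if d == i or donor[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and donor[c] is not None]
                if not shared:
                    continue
                sq_dist = sum((row[c] - donor[c]) ** 2 for c in shared)
                candidates.append((math.sqrt(sq_dist * n_cols / len(shared)), donor[j]))
            candidates.sort(key=lambda item: item[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:  # leave None if no donor is available
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical rows; the first row's third value is missing.
rows = [[1.0, 2.0, None], [1.1, 2.1, 10.0], [5.0, 6.0, 20.0], [1.2, 1.9, 12.0]]
print(knn_impute(rows, k=2)[0])  # [1.0, 2.0, 11.0]
```

In a full pipeline, donors would typically be drawn from the training partition only, so that held-out rows do not leak information into model training.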

Modeling

Computational models

Given our dual interest in predicting the postpartum EDS score and the identification of risk factors, we exclusively focused on ML algorithms that incorporated embedded variable selection. We applied the following algorithms: linear regression with Lasso regularization⁴¹, least angle regression with Lasso modification⁴², linear regression with elastic net regularization⁴³, support vector machine for regression with lasso regularization⁴⁴, support vector machine for regression with elastic net regularization⁴⁵, regression tree⁴⁶, random forest⁴⁷, gradient boosting machine⁴⁸ and XGBoost⁴⁹.
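To illustrate the embedded variable selection of the Lasso-type models, the sketch below fits a Lasso by cyclic coordinate descent with soft-thresholding (the strategy used by scikit-learn's `Lasso`; `LassoLars` reaches the same objective via least angle regression, which is not shown). It assumes centered predictors and outcome, and the toy data are hypothetical: a strong effect is kept but shrunk, while a weak effect is set exactly to zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within [-threshold, threshold] become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the outcome y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 at the start
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0:
                continue  # constant column: coefficient stays 0
            # Correlation of feature j with the residual after adding back its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta_j = soft_threshold(rho, lam) / col_scale[j]
            delta = new_beta_j - beta[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            beta[j] = new_beta_j
    return beta


# Hypothetical centered data: a strong effect (3.0) on x1 and a weak one (0.05) on x2.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [3 * x1 + 0.05 * x2 for x1, x2 in X]
print(lasso_coordinate_descent(X, y, lam=0.1))
print(lasso_coordinate_descent(X, y, lam=0.0))
```

With λ = 0.1, the first coefficient is shrunk from 3 to 2.95 and the second, whose true value is 0.05, is set exactly to zero; with λ = 0 the fit reduces to ordinary least squares and recovers both.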

Model evaluation

Model selection and hyperparameter tuning were performed to optimize the mean squared error (MSE) on the validation set. We also report the explained variance on the validation set (R²). After the primary model evaluation, we conducted a post hoc classification of mothers in the test set into either the PPD or non-PPD category based on their EDS scores. Mothers with an EDS score of 10 or higher were classified as PPD. This cutoff has been validated for case identification and is designed to maximize sensitivity at the cost of a modest rate of false positives²². We report summary statistics for the most predictive variables from the best-performing model: the median and IQR for continuous variables, and percentage distributions across categories for categorical variables. Depending on the model type, either variable importance scores or regression coefficients are presented for the final models.
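The post hoc classification can be sketched as follows, assuming that both observed and predicted EDS scores are dichotomized at the same cutoff of 10 (an assumption about how predictions were thresholded). The function name and example scores are hypothetical.

```python
def confusion_counts(observed, predicted, cutoff=10):
    """Dichotomize observed and predicted EDS scores at `cutoff` and tally the outcomes.

    A score >= cutoff counts as PPD (observed case or predicted flag).
    """
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        is_case = obs >= cutoff
        flagged = pred >= cutoff
        if is_case and flagged:
            tp += 1
        elif flagged:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Hypothetical observed scores and continuous model predictions.
print(confusion_counts([12, 4, 15, 9, 3], [10.4, 5.1, 7.8, 9.9, 2.0]))
# {'tp': 1, 'fp': 0, 'tn': 3, 'fn': 1}
```

Because models fitted to minimize squared error tend to pull predictions toward the sample mean, few predicted scores may cross the cutoff when the outcome is right-skewed, which is one plausible route to the low sensitivity discussed earlier.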

Analysis plan

This study aimed to determine how early in pregnancy postpartum depression (PPD) can be predicted and to identify the most predictive risk factors in each trimester. Initially, we restricted the ML models to use data from the first trimester only, mimicking the information typically available at a midwife’s first consultation. We then gradually incorporated data from later trimesters, eventually including all available prenatal data.
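The stepwise design amounts to three nested predictor sets. The sketch below builds them from a mapping of trimester to column names; the column names are hypothetical placeholders, not the Brabant Study variable names.

```python
# Hypothetical column names grouped by the trimester in which they were measured.
FEATURES_BY_TRIMESTER = {
    1: ["eds_t1", "ds14_na_t1", "tpds_t1"],
    2: ["eds_t2", "bmi_t2"],
    3: ["eds_t3"],
}


def cumulative_feature_sets(features_by_trimester):
    """Predictor sets per modeling step: trimester 1, then 1-2, then 1-3."""
    steps = []
    current = []
    for trimester in sorted(features_by_trimester):
        current = current + features_by_trimester[trimester]  # new list; input untouched
        steps.append(current)
    return steps


for step, columns in enumerate(cumulative_feature_sets(FEATURES_BY_TRIMESTER), start=1):
    print(f"step {step}: {columns}")
```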

At each modeling step, we trained and optimized each model using cross-validation. The model with optimal hyperparameters was then used to predict the EDS scores in the validation set. From the best-performing model at each step, we selected the most predictive risk factors. Assessing model performance at each stage provided insights into when during pregnancy PPD can be accurately predicted.
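Hyperparameter selection by k-fold cross-validation can be sketched in plain Python as below: the training data are split into folds, each candidate value is scored by its average held-out MSE, and the value with the lowest score is kept. The toy model (a scaled training mean), the fold count, and the seed are hypothetical; in scikit-learn this is typically handled by `GridSearchCV`.

```python
import random


def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cv_mse(X, y, fit, predict, param, n_folds=5, seed=0):
    """Average held-out MSE across folds for one hyperparameter value."""
    fold_mses = []
    for fold in k_fold_indices(len(y), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train], param)
        errors = [(y[i] - predict(model, X[i])) ** 2 for i in fold]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / len(fold_mses)


def select_param(X, y, fit, predict, candidates, n_folds=5, seed=0):
    """Return the candidate with the lowest cross-validated MSE, plus all scores."""
    scores = {c: cv_mse(X, y, fit, predict, c, n_folds, seed) for c in candidates}
    return min(scores, key=scores.get), scores


# Toy model (hypothetical): predict the training mean multiplied by `scale`.
def fit_mean(X_train, y_train, scale):
    return scale * sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[0]] * 10
y = [10, 12, 11, 9, 13, 10, 12, 11, 9, 13]
best, scores = select_param(X, y, fit_mean, predict_mean, [0.0, 0.5, 1.0])
print(best)  # 1.0
```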

All analyses were implemented in Python 3. The code used for analysis is publicly available (https://github.com/LSibbald/PPD_prediction).

Data availability

The dataset used in this study is accessible upon reasonable request from the corresponding author, provided that data transfer agreements are established in compliance with current regulations.

References

Batt, M. M., Duffy, K. A., Novick, A. M., Metcalf, C. A. & Epperson, C. N. Is postpartum depression different from depression occurring outside of the perinatal period? Rev. Evid. Focus Am. Psych. Publ. 18(2), 106–119. https://doi.org/10.1176/appi.focus.20190045 (2020).
Liu, X., Wang, S. & Wang, G. Prevalence and risk factors of postpartum depression in women: A systematic review and meta-analysis. J. Clin. Nurs. 31(19–20), 2665–2677. https://doi.org/10.1111/jocn.16121 (2022).
Trimbos Instituut (n.d.). Zwangerschap en Postpartum Depressie. https://www.trimbos.nl/kennis/depressiepreventie/zwangerschap-en-depressie/
Stewart, D. E. & Vigod, S. N. Postpartum depression: Pathophysiology, treatment, and emerging therapeutics. Annu. Rev. Med. 70, 183–196. https://doi.org/10.1146/annurev-med-041217-011106 (2019).
Slomian, J., Honvo, G., Emonts, P., Reginster, J. Y. & Bruyère, O. Consequences of maternal postpartum depression: A systematic review of maternal and infant outcomes. Womens Health. 15, 1745506519844044. https://doi.org/10.1177/1745506519844044 (2019).
Oyetunji, A. & Chandra, P. Postpartum stress and infant outcome: A review of current literature. Psych. Res. 284, 112769. https://doi.org/10.1016/j.psychres.2020.112769 (2020).
Rogers, A. et al. Association between maternal perinatal depression and anxiety and child and adolescent development: A meta-analysis. JAMA Pediatr. 174(11), 1082–1092. https://doi.org/10.1001/jamapediatrics.2020.2910 (2020).
Werner, E., Le, H. N., Babineau, V. & Grubb, M. Preventive interventions for perinatal mood and anxiety disorders: A review of selected programs. Semin. Perinatol. 48(6), 151944. https://doi.org/10.1016/j.semperi.2024.151944 (2024).
Garapati, J. et al. Postpartum mood disorders: Insights into diagnosis, prevention, and treatment. Cureus 15(7), e42107. https://doi.org/10.7759/cureus.42107 (2023).
Sockol, L. E., Epperson, C. N. & Barber, J. P. Preventing postpartum depression: A meta-analytic review. Clin. Psychol. Rev. 33(8), 1205–1217. https://doi.org/10.1016/j.cpr.2013.10.004 (2013).
O’Hara, M. W. & McCabe, J. E. Postpartum depression: Current status and future directions. Annu. Rev. Clin. Psychol. 9, 379–407. https://doi.org/10.1146/annurev-clinpsy-050212-185612 (2013).
Hymas, R. & Girard, L. C. Predicting postpartum depression among adolescent mothers: A systematic review of risk. J. Affect. Disord. 246, 873–885. https://doi.org/10.1016/j.jad.2018.12.041 (2019).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning Second edition. (Springer, New York, 2013).
Yim, I. S., Tanner Stapleton, L. R., Guardino, C. M., Hahn-Holbrook, J. & Dunkel Schetter, C. Biological and psychosocial predictors of postpartum depression: Systematic review and call for integration. Annu. Rev. Clin. Psychol. 11, 99–137. https://doi.org/10.1146/annurev-clinpsy-101414-020426 (2015).
Zhong, M., Zhang, H., Yu, C., Jiang, J. & Duan, X. Application of machine learning in predicting the risk of postpartum depression: A systematic review. J. Affect. Disord. https://doi.org/10.1016/j.jad.2022.08.070 (2022).
Andersson, S., Bathula, D. R., Iliadis, S. I., Walter, M. & Skalkidou, A. Predicting women with depressive symptoms postpartum with machine learning methods. Sci. Rep. 11(1), 7877. https://doi.org/10.1038/s41598-021-86368-y (2021).
Amit, G. et al. Estimation of postpartum depression risk from electronic health records using machine learning. BMC Pregnancy Childbirth 21(1), 630. https://doi.org/10.1186/s12884-021-04087-8 (2021).
Article PubMed PubMed Central Google Scholar
Cellini, P., Pigoni, A., Delvecchio, G., Moltrasio, C. & Brambilla, P. Machine learning in the prediction of postpartum depression: A review. J. Affect. Disord. https://doi.org/10.1016/j.jad.2022.04.093 (2022).
Article PubMed Google Scholar
Yoo, H. et al. Factors influencing prenatal and postpartum depression in Korea: A prospective cohort study. Korean J. Women Health Nurs. 27(4), 326–336. https://doi.org/10.4069/kjwhn.2021.11.17 (2021).
Article PubMed PubMed Central Google Scholar
Miller, E. S. et al. Screening and treatment after implementation of a universal perinatal depression screening program. Obstet. Gynecol. 134(2), 303–309. https://doi.org/10.1097/AOG.0000000000003369 (2019).
Article PubMed Google Scholar
Cox, J. L., Holden, J. M. & Sagovsky, R. Detection of postnatal depression: Development of the 10-item Edinburgh postnatal depression scale. Br. J. Psych. 150(6), 782–786. https://doi.org/10.1192/bjp.150.6.782 (1987).
Article CAS Google Scholar
Gibson, J., McKenzie-McHarg, K., Shakespeare, J., Price, J. & Gray, R. A systematic review of studies validating the Edinburgh postnatal depression scale in antepartum and postpartum women. Acta Psychiatr. Scand. 119(5), 350–364. https://doi.org/10.1111/j.1600-0447.2009.01363.x (2009).
Article CAS PubMed Google Scholar
Troyanskaya, O. et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525. https://doi.org/10.1093/bioinformatics/17.6.520 (2001).
Article CAS PubMed Google Scholar
Denollet, J. DS14: Standard assessment of negative affectivity, social inhibition, and type D personality. Psychosom. Med. 67(1), 89–97. https://doi.org/10.1097/01.psy.0000149256.81953.49 (2005).
Article PubMed Google Scholar
Pop, V. J. et al. Development of the Tilburg pregnancy distress scale: The TPDS. BMC Pregnancy Childbirth 11, 80. https://doi.org/10.1186/1471-2393-11-80 (2011).
Article PubMed PubMed Central Google Scholar
Soto, C. J. & John, O. P. Short and extra-short forms of the big five inventory–2: The BFI-2-S and BFI-2-XS. J. Res. Pers. 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004 (2017).
Article Google Scholar
Postpartum Depression: Action Towards Causes and Treatment (PACT) Consortium. Heterogeneity of postpartum depression: A latent class analysis. The lancet. Psychiatry 2(1), 59–67. https://doi.org/10.1016/S2215-0366(14)00055-8 (2015).
Article Google Scholar
Osborne, L. et al. Replication of epigenetic postpartum depression biomarkers and variation with hormone levels. Neuropsychopharmacol. Off. Publ. Am. College Neuropsychopharmacol. 41(6), 1648–1658. https://doi.org/10.1038/npp.2015.333 (2016).
Article CAS Google Scholar
Garbazza, C. et al. A machine learning model to predict the risk of perinatal depression: Psychosocial and sleep-related factors in the life-ON study cohort. Psych. Res. 337, 115957 (2024).
Article Google Scholar
van den Heuvel, M. I. From the womb into the world: Protecting the fetal brain from maternal stress during pregnancy. Policy Insights Behav. Brain Sci. 9(1), 96–103. https://doi.org/10.1177/23727322211068024 (2022).
Article Google Scholar
Bos, S. C. et al. Is positive affect in pregnancy protective of postpartum depression?. Rev. Bras. Psiquiatr. 35(1), 5–12. https://doi.org/10.1016/j.rbp.2011.11.002 (2013).
Article PubMed Google Scholar
Puyané, M. et al. Personality traits as a risk factor for postpartum depression: A systematic review and meta-analysis. J. Affect. Disord. 298(Pt A), 577–589. https://doi.org/10.1016/j.jad.2021.11.010 (2022).
Article PubMed Google Scholar
Barlow, D. H., Ellard, K. K., Sauer-Zavala, S., Bullis, J. R. & Carl, J. R. The origins of neuroticism. Perspect. Psychol. Sci. J. Assoc. Psychol. Sci. 9(5), 481–496. https://doi.org/10.1177/1745691614544528 (2014).
Article Google Scholar
Zanardo, V. et al. Maternity blues: A risk factor for anhedonia, anxiety, and depression components of Edinburgh postnatal depression scale. J. Matern. Fetal. Neonatal. Med. 33(23), 3962–3968. https://doi.org/10.1080/14767058.2019.1593363 (2020).
Article PubMed Google Scholar
Silverman, M. E., Smith, L., Lichtenstein, P., Reichenberg, A. & Sandin, S. The association between body mass index and postpartum depression: A population-based study. J. Affect. Disord. 240, 193–198. https://doi.org/10.1016/j.jad.2018.07.063 (2018).
Article PubMed Google Scholar
da Cruz, K. L. D. O. et al. The impact of obesity-related neuroinflammation on postpartum depression: A narrative review. Int. Dev. Neurosci. 82(5), 375–384 (2022).
Article Google Scholar
Shridharmurthy, D. et al. Postpartum depression in reproductive-age women with and without Rheumatic disease: A population-based matched cohort study. J. Rheumatol. 50(10), 1287–1295. https://doi.org/10.3899/jrheum.2023-0105 (2023).
Article PubMed Google Scholar
Luan, M. et al. Rheumatoid arthritis and the risk of postpartum psychiatric disorders: A Nordic population-based cohort study. BMC Med. 21(1), 126. https://doi.org/10.1186/s12916-023-02837-3 (2023).
Article PubMed PubMed Central Google Scholar
Meems, M. et al. The Brabant study: Design of a large prospective perinatal cohort study among pregnant women investigating obstetric outcome from a biopsychosocial perspective. BMJ Open 10(10), e038891. https://doi.org/10.1136/bmjopen-2020-038891 (2020).
Article PubMed PubMed Central Google Scholar
Hasan, M. K. et al. Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021). Inf. Med. Unlocked 27, 100799. https://doi.org/10.1016/j.imu.2021.100799 (2021).
Article Google Scholar
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat Methodol. 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x (1996).
Article MathSciNet Google Scholar
Efron, B., Hastie, T., Johnstone, I. & Tibshirani, R. Least angle regression. Ann. Stat. https://doi.org/10.1214/009053604000000067 (2004).
Article MathSciNet Google Scholar
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01 (2010).
Article PubMed PubMed Central Google Scholar
Robbins, H. & Monro, S. A stochastic approximation method. Annals Math. Stat. https://doi.org/10.1214/aoms/1177729586 (1951).
Article MathSciNet Google Scholar
Zhou, Q., Chen, W., Song, S., Gardner, J., Weinberger, K., & Chen, Y. (2015). A reduction of the elastic net to support vector machines with an application to GPU computing. In: Proc. of the AAAI Conference on Artificial Intelligence. Vol. 29, pp. 1. https://doi.org/10.1609/aaai.v29i1.9625
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees (Wadsworth, 1984).
Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32. https://doi.org/10.1023/A:1010933404324 (2001).
Article Google Scholar
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451 (2001).
Article MathSciNet Google Scholar
Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system.In: Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794. https://doi.org/10.1145/2939672.2939785

Acknowledgements

The authors wish to express their gratitude to the participants of the Brabant Study and the study coordinators for their involvement and support. Special thanks are extended to Bram Meijer for his efforts in supporting the data cleaning process.

Author information

Authors and Affiliations

Department of Methodology and Statistics, Tilburg University, Prof. Cobbenhagenlaan 125, 5037 DB, Tilburg, The Netherlands
Lisette Sibbald, Caspar J. van Lissa, Joran Jongerling & Inga Schwabe
Tranzo Scientific Center for Care and Wellbeing, Tilburg University, Prof. Cobbenhagenlaan 125, 5037 DB, Tilburg, The Netherlands
Lisette Sibbald, Marion I. van den Heuvel, Hedwig J. A. van Bakel & Lotte Muskens
Department of Public Health and Primary Care, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
Marcel R. Haas
Department of Clinical Psychology, Open University, Valkenburgerweg 177, 6419 AT, Heerlen, The Netherlands
Hedwig J. A. van Bakel
Department of Developmental Psychology, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands
Lianne P. Hulsbosch
Department of Medical and Clinical Psychology, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands
Myrthe G. B. M. Boekhorst

Authors

Lisette Sibbald
Marion I. van den Heuvel
Marcel R. Haas
Caspar J. van Lissa
Hedwig J. A. van Bakel
Joran Jongerling
Lianne P. Hulsbosch
Lotte Muskens
Myrthe G. B. M. Boekhorst
Inga Schwabe

Contributions

L.S., M.I.H., M.R.H., C.J.L. and I.S. wrote the main manuscript text. L.S. wrote the code for analysis as well as prepared the figures and tables. All authors made contributions to the interpretation of the results and reviewed the manuscript.

Corresponding author

Correspondence to Lisette Sibbald.

Ethics declarations

Competing interests

The authors declare that there are no competing interests as defined by Nature Research, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Sibbald, L., van den Heuvel, M.I., Haas, M.R. et al. Identifying prenatal risk factors of postpartum depression with machine learning. Sci Rep 15, 34610 (2025). https://doi.org/10.1038/s41598-025-18204-6

Received: 12 December 2024
Accepted: 29 August 2025
Published: 03 October 2025
DOI: https://doi.org/10.1038/s41598-025-18204-6

Summary

The study "Identifying prenatal risk factors of postpartum depression with machine learning" by Sibbald et al., published in Scientific Reports, aims to identify and predict prenatal risk factors for postpartum depression (PPD) using machine learning techniques. The research utilized data from the Brabant Study, a large prospective perinatal cohort study investigating obstetric outcomes from a biopsychosocial perspective. Here is a summary of the key points:

### Objectives
- To identify and predict prenatal risk factors for postpartum depression (PPD) using machine learning techniques.
- To evaluate the performance of different machine learning algorithms in predicting PPD.

### Methodology
1. **Data Source**: The study used data from the Brabant Study, which is a large prospective cohort following pregnant women to investigate obstetric outcomes and related biopsychosocial factors.
2. **Variables**:
   - **Predictor Variables (Prenatal Risk Factors)**: Various demographic, medical, psychological, and social variables collected during pregnancy.
   - **Outcome Variable**: Postpartum depression (PPD) status at 6 months post-delivery.
3. **Machine Learning Algorithms**:
   - Logistic Regression
   - Lasso Regularization
   - Elastic Net
   - XGBoost
   - Random Forests
4. **Performance Metrics**:
   - Area Under the Curve (AUC)
   - Accuracy, Precision, Recall, F1-Score
   - Cohen's Kappa

### Results
- The study found that machine learning algorithms outperformed traditional statistical methods in predicting PPD.
- XGBoost and Random Forests showed the highest performance metrics among all models tested.
- Several prenatal risk factors were identified as significant predictors of PPD, including:
  - **Psychological Factors**: High levels of anxiety, depressive symptoms during pregnancy
  - **Social Support**: Lack of social support and perceived stress during pregnancy
  - **Medical Factors**: History of psychiatric disorders, chronic medical conditions

### Discussion
- The findings suggest that machine learning techniques can be effectively used to predict PPD based on prenatal factors.
- Early identification of at-risk women through predictive models may help in implementing early interventions to prevent or mitigate the severity of PPD.
- The study emphasizes the importance of considering a comprehensive set of biopsychosocial factors for accurate prediction.

### Implications
- **Clinical Practice**: Healthcare providers can use these predictive models to identify high-risk pregnant women and offer targeted preventive measures.
- **Research**: Future studies should focus on validating these models in other populations and exploring additional predictors, such as genetic and environmental factors.
- **Policy**: Public health policies could benefit from integrating these predictive tools to improve mental healthcare for new mothers.

### Conclusion
The study highlights the potential of machine learning techniques in predicting postpartum depression based on prenatal risk factors. By leveraging advanced algorithms, researchers can better understand complex relationships between various biopsychosocial variables and PPD, ultimately leading to improved prevention strategies and early intervention programs.

This summary provides an overview of the key aspects of the study, including its objectives, methods, findings, and implications for both clinical practice and further research.
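To make the performance metrics listed above concrete, here is a minimal Python sketch; it is not the authors' analysis code. It assumes a binary outcome (1 = PPD, 0 = no PPD), ten hypothetical predicted risk scores, and a classification threshold of 0.5. The data, the threshold, and the function names (`confusion_counts`, `classification_metrics`, `auc`) are all invented for illustration. AUC is computed in its rank form: the probability that a randomly chosen case receives a higher score than a randomly chosen non-case.

```python
from itertools import product

# Hypothetical example data (NOT from the Brabant Study):
# 1 = developed PPD, 0 = did not; scores are predicted risks from some model.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.81, 0.30, 0.55, 0.62, 0.12, 0.40, 0.35, 0.22, 0.70, 0.05]
THRESHOLD = 0.5  # assumed cut-off for labelling a woman "at risk"
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = PPD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-dependent metrics computed from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance, given how often the model and the outcome each say "PPD".
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    # Kappa is undefined when chance agreement is 1; report 0.0 by convention.
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


def auc(y_true, scores):
    """Rank-based AUC: share of (case, non-case) pairs in which the case
    gets the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {auc(y_true, scores):.3f}")
```

Accuracy, precision, recall, F1 and kappa all depend on the chosen threshold, whereas AUC depends only on how the scores rank cases above non-cases. Because PPD affects a minority of mothers, accuracy alone can look high even for a model that misses most cases, which is one reason to report several of these metrics together.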