Author: Hattab, Georges
Mental health has recently received significant media attention, bringing it into the public eye. A critical aspect of this visibility is how mental illness is portrayed in the media. Numerous publications have shown that mental illness is frequently depicted in a dramatic and inaccurate light, emphasizing danger, criminality, and unpredictability1,2,3,4. Such distorted portrayals shape public perceptions and perpetuate stigma and discrimination against people with mental illness. These negative portrayals can lower self-esteem5 and discourage help-seeking behavior. They can also reduce medication adherence and impair overall recovery outcomes6. As a result, these effects can delay treatment, increase symptom severity, and worsen the long-term prognosis for those affected by mental illness7.
From a public health perspective, addressing media portrayals of mental health is critical to both promoting well-being and achieving health equity. The media is a primary source of information for the public and shapes societal norms and beliefs about mental health. Misrepresentation and stigma in the media can exacerbate existing prejudices, create barriers to effective mental health care, and increase health disparities. This dynamic is particularly harmful for marginalized groups who may already face greater barriers to accessing care. Accurate portrayals can challenge these barriers by providing a more compassionate, inclusive understanding of mental health that supports well-being and helps reduce disparities in mental health outcomes6. Balanced and factual portrayals of mental health in the media are essential to normalizing discussions about mental health, promoting early intervention, and encouraging people to seek help. For example, positive portrayals of people recovering from mental illness can reduce stigma and encourage others to seek care, ultimately contributing to a healthier society8,9. In this way, the media has the potential to actively promote the mental well-being of the population by reducing fear and misunderstanding of mental illness.
Given the wide reach of the media, the responsibility to present a realistic and balanced picture of mental health is essential. Natural Language Processing (NLP) provides an opportunity to analyze and shape media portrayals, identifying both harmful and constructive narratives that influence public perception. Advocacy groups, such as the Time to Change campaign in the United Kingdom10, hold the media accountable for promoting stigma and discrimination, while recognizing its potential to combat public prejudice through positive storytelling. NLP can support these efforts by systematically quantifying the extent and nature of stigmatizing narratives in large media datasets. For example, using co-occurrence network analysis, researchers can identify how mental health terms (e.g., “depression,” “schizophrenia”) are frequently associated with negative contexts (e.g., “violence,” “unemployment”), highlighting where stereotypes persist. This technique provides evidence-based insights to inform public awareness campaigns, allowing advocacy groups to target specific harmful narratives. Research suggests that targeted interventions, particularly those involving education and contact with people with mental illness, can be effective in reducing stigma6,11.
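The co-occurrence idea described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited study: the two term lists, the example sentences, and the function names are all hypothetical, and co-occurrence is assumed to mean "appears in the same sentence".

```python
from collections import Counter
from itertools import product
import re

# Hypothetical term lists (assumption); a real study would use a validated lexicon.
MENTAL_HEALTH_TERMS = {"depression", "schizophrenia", "anxiety"}
NEGATIVE_CONTEXT_TERMS = {"violence", "crime", "unemployment", "danger"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def cooccurrence_edges(sentences):
    """Count how often a mental health term shares a sentence with a negative term."""
    edges = Counter()
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        health = tokens & MENTAL_HEALTH_TERMS
        negative = tokens & NEGATIVE_CONTEXT_TERMS
        # Every (health term, negative term) pair in the sentence is one edge hit.
        for pair in product(sorted(health), sorted(negative)):
            edges[pair] += 1
    return edges


# Invented example sentences, for illustration only.
sentences = [
    "Police linked the suspect's schizophrenia to the violence.",
    "Rising unemployment has been tied to depression.",
    "The report associates schizophrenia with violence and crime.",
    "A new clinic offers support for anxiety.",
]

edges = cooccurrence_edges(sentences)
for (term, context), weight in edges.most_common():
    print(f"{term} -- {context}: {weight}")
```

The resulting weighted pairs can be read as edges of a network whose nodes are terms; heavier edges mark the associations that occur most often in the sample.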
In addition, media discussions often reflect societal responses to mental health issues, and their analysis can provide insights into population-level mental health trends. For example, NLP techniques such as sentiment analysis, emotion detection, and event extraction can be used to assess media responses to societal events, such as economic crises or health emergencies, and can reveal patterns of anxiety, depression, and well-being at the population level12,13. During the COVID-19 pandemic, for instance, event extraction could automatically identify key moments (e.g., lockdowns, economic downturns) associated with spikes in anxiety or depression in media discussions, helping researchers identify which events contribute to shifts in public sentiment about mental health12,13.
This method also allows public health officials to monitor real-time responses to crises, providing actionable insights for public mental health interventions. Understanding how mental health narratives are constructed and communicated is critical, especially with large-scale media data. NLP techniques offer robust tools for analyzing these narratives and identifying recurring themes, biases, and emotional tones that shape public perceptions. This study explores the potential of NLP techniques to analyze how mental health topics such as depression, anxiety, and well-being are represented in online news media. Specifically, we assess how NLP techniques can detect and interpret complex constructs, which in psychology refer to concepts that describe real-world phenomena14.
Our review highlights the precision and depth that advanced NLP techniques, such as transformer-based architectures and contextual embeddings, offer in identifying nuanced expressions of mental health issues in media narratives. Techniques such as topic modeling, emotion detection, and named entity recognition (NER) are used to capture how complex mental health constructs are framed and interpreted in news content. By focusing on how these techniques extract and analyze the subtle language used in mental health discussions15, we demonstrate how they contribute to a more informed understanding of the media’s influence on public perceptions of mental health. To improve the accuracy and balance of media portrayals of mental health, we propose ten recommendations for the development of NLP tools specifically tailored to mental health and news media analysis. By advancing NLP techniques in mental health media analysis, we aim to provide practical insights for developers to enable tools that more accurately capture both the stigmatizing and destigmatizing elements of mental health discussions in the news.
This review aims to synthesize and critically assess how natural language processing has been applied to the study of mental health in news media. While research on news media and on mental health has each grown substantially, their intersection remains underexplored. As shown in Fig. 1, most retrieved studies focus on one domain in isolation, with only a small subset addressing both. This imbalance motivates the present review.
Fig. 1. Distribution of the Related Work on News Media, Mental Health, and their intersection. This Venn diagram illustrates the categorization of the included publications into three distinct groups. Of the publications, twenty were solely about news media, fourteen were solely about mental health, and six examined the intersection of NLP for news media and mental health.
In this section, we present the results of our scoping review, grouped into traditional and artificial intelligence-based NLP techniques. We then highlight our findings for each group: traditional content analysis methods, NLP for news media, NLP for mental health, and NLP for the portrayal of mental health in news media. As NLP has become increasingly important to the study of news media and mental health, our scoping review includes relevant research since 2012.
Advances in NLP have allowed researchers to address misinformation and mental health issues, as exemplified during the COVID-19 pandemic16,17, which underscored the urgency of developing effective NLP techniques. The pandemic led to a surge in mental health issues due to increased stress, anxiety, and social isolation18,19, and it accelerated research to address these new challenges and mitigate long-term effects. It also heightened the spread of misinformation, necessitating the development of NLP tools to detect and combat fake news early20,21.
This review follows an exploratory, purposive approach rather than a systematic protocol, aiming to surface diverse methodological applications of NLP in the context of mental health and news discourse. We used combinations of the search terms “mental health”, “news media”, and “natural language processing”, selected for their alignment with controlled vocabularies in public health and computer science, as well as their broad coverage across domains. Alternative or narrower terms (e.g., “depression,” “newspapers,” “media framing”) were considered during pilot searches but yielded results that were either overly clinical, limited to social media, or too narrow in scope. We thus retained the broader terms to ensure inclusiveness and conceptual coverage across disciplines.
Sources consulted include Google Scholar, Scopus, Web of Science, arXiv, Connected Papers, and Semantic Scholar. These platforms were chosen to cover both peer-reviewed and preprint literature, and to represent diverse indexing strategies across social sciences, public health, and computer science. The search was performed in multiple rounds from April to July 2024, with backward and forward citation chaining to ensure thematic saturation.
The chosen time frame spans from January 1, 2012, to July 1, 2024. This range was selected to capture the rise of deep learning in NLP as well as key public health moments (notably the COVID-19 pandemic) that significantly influenced both media coverage and computational research trends in mental health.
Our initial search yielded approximately 180 documents (see Fig. 2). We screened titles and abstracts for relevance to both (a) the application of NLP methods and (b) the context of mental health in news or social media. Publications focused solely on clinical health records, without a media component, or that employed NLP in unrelated contexts (e.g., biomedical entity extraction) were excluded. After full-text review, 46 publications were selected based on conceptual richness, relevance to the intersection of news media and mental health, and the diversity of NLP techniques applied.
To contextualize our approach, we emphasize that this study was designed as a scoping review rather than a systematic review: it aims to capture a broad and interdisciplinary sample rather than apply a narrowly systematic protocol. By including diverse sources across public health, computer science, and social sciences, we aimed to map methodological variety and highlight key trends and gaps. While not exhaustive, the final set of 46 studies provides a sufficiently representative overview to derive meaningful insights and recommendations.
In this review, we define “news media” as any form of digitally available textual publication by news broadcasting services. This includes information-based texts that report on events and occurrences, as opposed to social media texts that primarily share opinions and positions. News media texts differ from social media texts in that the latter use informal, direct language with slang and abbreviations22.
In analyzing the different NLP techniques, we focused on three primary factors. These were chosen to reflect not only the technical capacities of NLP tools, but also their relevance to understanding and improving mental health representations in public discourse. First, we assessed how NLP techniques can detect complex psychological constructs (such as depression, anxiety, and well-being) in media narratives. Second, we examined the precision and depth of analysis these tools offer, emphasizing the portrayal of mental health in a balanced and accurate manner. Third, we identified key gaps in existing research and suggested areas for further development to enhance the role of NLP in improving mental health portrayals.
By “key gaps”, we refer to recurring methodological or ethical limitations observed across the reviewed literature. These include the dominance of English-language datasets and tools, limited model transparency and explainability, insufficient cultural or contextual adaptation, and the lack of interdisciplinary evaluation frameworks tailored to mental health media analysis.
Traditional approaches to content analysis have been instrumental in the study of media representations and their impact on public perceptions in psychology and the social sciences. While these methods are labor-intensive and limited in scope, they provide important insights into how mental health is portrayed in the media. What such an analysis reports depends on whether it is qualitative or quantitative.
Qualitative content analysis is a method that involves the systematic and subjective interpretation of textual data through the classification of themes or patterns. Central to this approach is the development of specialized lexicons for specific constructs related to mental health, such as depression or anxiety. Manual coding by experts, guided by predefined coding rules and decision trees, ensures detailed and contextual analysis. This method captures subtle implications and contexts that automated methods may miss. However, it is labor-intensive, subjective, and not scalable to large datasets23.
Quantitative content analysis systematically quantifies the presence of specific words, phrases, or themes in a text using statistical techniques. Coding schemes define categories for analysis, and the data are statistically analyzed to identify patterns and trends. This method offers greater objectivity and efficiency, allowing for the processing of larger datasets. However, it can miss nuanced meanings and contextual subtleties within the text, limiting its ability to capture the full complexity of media representations24.
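As a rough illustration of how a coding scheme turns text into counts, the sketch below tallies category indicators across a handful of articles and reports each category's share. The coding scheme, the example articles, and the function names are hypothetical; a real study would rely on a validated codebook and trained coders or a tested dictionary.

```python
from collections import Counter
import re

# Hypothetical coding scheme (assumption): category -> indicator words.
CODING_SCHEME = {
    "danger": {"violent", "dangerous", "attack"},
    "recovery": {"recovery", "treatment", "support"},
}


def code_article(text, scheme=CODING_SCHEME):
    """Count indicator-word hits per category in one article."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in scheme.items():
            if token in words:
                counts[category] += 1
    return counts


def category_shares(articles, scheme=CODING_SCHEME):
    """Aggregate counts over all articles and convert them to proportions."""
    totals = Counter()
    for article in articles:
        totals.update(code_article(article, scheme))
    n_hits = sum(totals.values())
    if n_hits == 0:
        return totals, {}
    return totals, {category: totals[category] / n_hits for category in scheme}


# Invented example articles, for illustration only.
articles = [
    "The dangerous patient was involved in a violent attack.",
    "Peer support aids recovery.",
]

totals, shares = category_shares(articles)
print(dict(totals))
print(shares)
```

Exact word matching is what makes this approach fast and reproducible, and also why it misses context: "aids" and "patient" carry meaning here that the scheme never sees.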
Researchers often use mixed methods to leverage the strengths of both qualitative and quantitative analysis. In this iterative process, qualitative coding first provides a nuanced understanding, and quantitative analysis then validates and generalizes the findings. While providing balanced insights, combined approaches are complex and resource-intensive, requiring expertise in both qualitative and quantitative techniques25,26.
Examining the trends in NLP for news media and mental health analysis reveals a significant increase in research activity, particularly around 2021 and 2022. This trend is expected to continue, driven by several factors. Interest in understanding the impact of the media on public well-being has grown with increased awareness of mental health. Advances in NLP technology, particularly with models like Bidirectional Encoder Representations from Transformers (BERT)27 and Generative Pre-trained Transformer (GPT)28, have enabled more sophisticated text analysis. Additionally, there is growing interest in digital health solutions, such as mental health monitoring through media analysis. Ethical and inclusive development in NLP also drives research to ensure that AI tools responsibly serve diverse populations29. These factors collectively fuel the expansion of this field (see Fig. 3).
Fig. 3. Yearly Distribution of Publications. The figure illustrates the number of reviewed publications on applying natural language processing in news media and mental health from 2012 to 2024. An exponential regression line, fitted to the publication data, indicates an upward trend in research activity. The prediction for 2024 suggests continued growth in this research area.
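The caption does not state how the exponential regression line was fitted. One common approach, shown here only as an assumption, is ordinary least squares on the logarithm of the counts, which fits y = a * exp(b * x). The yearly counts below are invented so the arithmetic is easy to follow; they are not the review's data.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on log(y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fitting needs strictly positive counts")
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = sxy / sxx                         # growth rate per year
    a = math.exp(mean_log - b * mean_x)   # fitted value at x = 0
    return a, b


def predict(a, b, x):
    """Evaluate the fitted curve at x."""
    return a * math.exp(b * x)


# Invented counts (assumption), chosen to double every year.
years = [2019, 2020, 2021, 2022]
counts = [1, 2, 4, 8]

# Use offsets from the first year to keep the exponent numerically small.
offsets = [year - years[0] for year in years]
a, b = fit_exponential(offsets, counts)
forecast = predict(a, b, 2023 - years[0])
print(f"a={a:.3f}, b={b:.3f}, forecast for 2023={forecast:.1f}")
```

Because the fit happens in log space, years with zero publications must be handled separately (for example by dropping them or by fitting a count model instead), and extrapolations from so few points should be read as a trend indication rather than a forecast.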
We illustrate how advances in NLP are critical to addressing the challenges of analyzing mental health narratives in the news media by examining techniques used in the field. NLP tasks such as sentiment analysis and topic modeling have proven particularly valuable in identifying patterns in media representations of mental health. Sentiment analysis, for example, helps measure the emotional tone of news articles or phrases, revealing potential biases in coverage that contribute to stigma or misunderstanding. Topic modeling uncovers recurring themes, such as the association of mental illness with crime or social instability, that shape public perceptions. Other NLP tasks, such as text classification, have also been explored in the context of mental health and the media. However, much of the research remains focused on detecting misinformation rather than mental health narratives specifically. As shown in Figure 4, sentiment analysis and text classification are among the most commonly used techniques in news media-related publications, reflecting their critical role in understanding the framing of mental health issues.
Recent advances in deep learning have significantly improved natural language understanding (NLU) capabilities, particularly in media analysis. Transformer-based architectures have been particularly effective, outperforming earlier models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks30,31,32 in capturing the nuances of mental health narratives in news media. Large language models (LLMs), built on transformer architectures, have improved the ability to analyze mental health-related content in news articles. These models excel at capturing long-range dependencies and complex linguistic patterns, which are critical for understanding nuanced language in mental health contexts. For example, the BERT model27 achieved state-of-the-art performance on the General Language Understanding Evaluation (GLUE) benchmark with a score of 80.5%, demonstrating its strong language understanding capabilities. In mental health-specific tasks, domain-adapted models have shown remarkable performance. MentalBERT and MentalRoBERTa, introduced by Ji et al.33, outperformed general-purpose models in detecting mental disorders from social media text. On the eRisk 2018 depression detection task, MentalBERT achieved an F1 score of 0.86, compared to 0.83 for BERT-base. Garg34 developed the WELLXPLAIN dataset for wellness dimension classification and compared different models, including GPT-3 and MentalBERT. The study showed that GPT-3 achieved the highest accuracy in classifying wellness dimensions (72.3%), followed closely by MentalBERT (71.8%). While true NLU remains a complex goal that requires consideration of philosophical, cognitive-linguistic, and technical perspectives, transformers have become the preferred basis for many NLU tasks. They consistently outperform RNN-based models on various benchmarks, including those relevant to mental health analysis in media contexts.
Fig. 4. Number of publications by Natural Language Processing Task for News Media and Mental Health. This graph illustrates the distribution of research publications across different NLP tasks. The data are sorted in ascending order based on the number of studies. Notably, ‘text classification’ and ‘sentiment analysis’ are the most common tasks in the field, reflecting their critical importance and wide applicability in the NLP domain. On the other hand, tasks such as ‘Suicidal Tendency Prediction’ and ‘Speech Recognition’ have less related work, indicating potential areas for future research focus.
LLMs extend media content analysis capabilities by providing scalable NLP solutions. They excel at tasks ranging from text generation to summarization and entity recognition, helping to decipher mental health narratives in media. However, issues of interpretability and hallucination risk require caution in critical domains such as mental health. While most related work focuses on single-modal data, multimodal approaches that integrate text, audio, and images are gaining interest35. These methods, while less common, promise a more comprehensive view of mental health representations by considering both verbal and nonverbal cues in media such as news broadcasts and social media posts. This area remains largely unexplored and offers opportunities for future research.
NLP has become an indispensable tool for analyzing the vast amounts of textual content generated by news outlets. It allows researchers to explore the structure, framing, and bias of media coverage, providing valuable insights into how public perception is shaped. Using methods such as text classification, topic modeling, and sentiment analysis, NLP enables the detection of misinformation and media framing, providing a more nuanced understanding of how news is communicated and its impact on society.
Traditional media use formal language and adhere to professional journalistic standards to provide objective, balanced information. In contrast to the informal, subjective content typical of social media, traditional news articles are structured, detailed, and comprehensive, contributing significantly to opinion formation and narrative delivery. This includes a variety of reporting formats, such as news articles, editorials, and commentaries. Understanding these linguistic and structural characteristics is critical to applying NLP techniques to news content, particularly when analyzing how mental health issues are framed.
Related work in this area has demonstrated the utility of NLP for analyzing media bias36,37,38, extracting semantic knowledge36, and understanding political framing39,40. For example, NLP has been instrumental in tracking COVID-19 news coverage41, assessing the accuracy of information shared, and identifying linguistic patterns in reporting. In public health contexts, NLP has enabled event-based monitoring of news articles42, revealing patterns of misinformation43 or bias that influence health-related behaviors and attitudes. The structured and professional nature of traditional news media provides a well-defined linguistic framework for NLP applications, allowing for more accurate detection of bias, framing, and thematic patterns. However, while much progress has been made, these tools still focus primarily on topics such as political reporting or general public health, with less emphasis on mental health-specific topics.
NLP for mental health has gained traction as a transformative tool for analyzing linguistic patterns associated with mental health conditions. NLP techniques are used to identify emotional states, assess mental health risks, and personalize interventions based on language use. Recent publications have explored the potential of NLP for emotion recognition, identification of at-risk individuals, and personalization of mental health support15,44. For example, NLP techniques have been used to analyze text-based communication in crisis support services to identify linguistic cues that indicate heightened emotional distress45, and to assess communication patterns in counseling46, helping to refine intervention strategies. Text classification in this domain can be used to categorize individuals’ mental states based on their language use, while emotion detection can identify specific emotional shifts in patients, contributing to more effective interventions and treatment plans.
In clinical settings, NLP models have demonstrated potential for extracting psychopathological cues from text, analyzing therapy transcripts, and providing computational support for traditional diagnostic methods47. For example, the analysis of language in patients with bipolar disorder48 and other mental illnesses49,50 has shown promising applications for early diagnosis, treatment evaluation, and ongoing mental health monitoring. NLP-based systems can effectively detect depression severity from social media posts, aligning with clinical metrics such as PHQ-9 and underscoring the importance of emotion detection and text classification in mental health interventions51. Similarly, transformer models such as Bio_ClinicalBERT have outperformed traditional methods in extracting personal and family histories of suicidal behavior from electronic health records (EHRs), aiding in risk assessment52.
Large language models applied to Reddit posts have revealed topics related to suicidality, consistent with modern suicide theories, by identifying common patterns of distress53. Beyond diagnostics, AI-driven NLP tools extend into patient engagement, with personalized chatbots improving mental health service access, increasing referrals, and reducing stigma54.
However, significant challenges remain. Issues such as language bias, ethical use of data, and concerns about data quality are major obstacles55. As the field continues to evolve, more research is needed to address these limitations and refine the applicability of NLP models in mental health interventions56.
The application of NLP to news media on mental health enables the systematic analysis of how mental illness is portrayed and how such portrayals may shape public perception and stigma. Given the media’s influence on societal attitudes and health behaviors, this intersection is a key focus for advancing public mental health through informed and balanced reporting.
Sentiment analysis has been applied to mental health news coverage, uncovering the emotional biases that influence public understanding of mental health disorders. For example, media outlets might frame mental illness negatively, associating it with violence or danger, which can reinforce public stigma. Topic modeling in this intersection has helped identify recurring themes in media coverage of mental health57,58, such as the association between mental illness and societal instability, which may perpetuate negative stereotypes. By analyzing the frequency and tone of these topics, NLP can provide insights into how mental health is discussed across different news outlets, shedding light on societal attitudes towards mental health issues12,13. In addition, NER can be used to track specific entities (such as organizations, public figures, or mental health conditions) and how they are discussed in relation to mental health in the media. This can highlight patterns of misinformation or under-reporting of critical mental health issues and influence public understanding and policy development. NER has already been successfully applied to extract psychiatric attributes from German mental health records59, using GermanBERT60. Recent work has demonstrated tangible progress in extending NLP capabilities to low-resource settings. For instance, Mabokela et al.61 developed sentiment classifiers for five Southern African languages (Sepedi, Setswana, Sesotho, isiZulu, and isiXhosa) using multilingual pretrained language models such as AfroLM and mBERT, achieving strong F1 scores through ensemble learning and language-specific tuning. Similarly, Joshi et al.62 introduced Nemotron-Mini-Hindi, a bilingual Hindi-English language model adapted through continued pretraining on synthetic corpora, which significantly improved benchmark performance on Indic tasks. These studies highlight scalable, transferable techniques that can address linguistic inequality in NLP research.
The intersection of NLP in news media and mental health opens up new possibilities for analyzing how mental health issues are framed and perceived by the media. By applying techniques such as sentiment analysis, topic modeling, and misinformation detection, researchers can better understand how news outlets shape public discourse about mental health and how this framing influences societal attitudes and stigma. While these advances are promising, a review of the existing literature reveals several gaps and challenges that must be addressed to fully realize the potential of NLP in this area. These gaps include issues such as language bias, limited data diversity, and a lack of interpretability in LLMs. Addressing these challenges requires targeted recommendations for NLP developers and researchers working at the intersection of mental health and news media.
The related work provides valuable insights into the development and performance of models tailored to detect stress, bias, and mental disorders in social media and other text data. It highlights the effectiveness of domain-specific embeddings, the importance of extensive context for long-sequence modeling, and the need for bias-neutralizing tools. However, it also points out significant limitations, such as the reliance on English-language data, potential biases in training data, and the challenges in handling longer text sequences. These insights form the foundation for the following ten recommendations, which aim to address the identified gaps and propel future research and applications toward more robust, fair, and comprehensive NLP solutions in mental health and news media analysis:
It is essential to develop and utilize comprehensive datasets that are representative of the diversity found in news media content. This should include different languages, regions, and media types to ensure models are generalizable across various contexts. Domain-specific datasets from social media and medical health records are also crucial for effectively fine-tuning language models such as those of Ji et al.33.
Strategies must be implemented to detect and mitigate biases in both training data and models. Addressing bias is critical to achieving objective and fair analysis outcomes in mental health and news media NLP applications.
Rigorous validation and testing of NLP models should be performed to ensure their reliability and accuracy. Models must be tested against diverse real-world scenarios to ensure robustness and applicability.
Pre-trained domain-specific models such as MentalBERT and ClinicalBERT should be leveraged to enhance the relevance and accuracy of analyses, particularly in mental health. Despite their utility, the current lack of explainability in LLMs requires ongoing expert validation of their results.
Multi-dimensional analysis pipelines should be developed. These pipelines should incorporate sentiment analysis tailored to mental health discourse, topic modeling for emerging issues, and NER to identify mental health resources and events.
Improving the interpretability of LLMs is crucial. Techniques should be developed to make these models more transparent and understandable, facilitating a clearer understanding of their decision-making processes.
Greater attention should be given to linguistic diversity. NLP techniques must be designed to handle a broad range of languages and dialects, addressing the current bias toward English-language data and supporting the development of multilingual language models.
Improving reproducibility in NLP research is vital. Standardizing evaluation metrics and enhancing the availability of open data and code will contribute to greater transparency and replicability in the field.
Focusing on multimodal analysis that incorporates text, images, and audio is recommended to provide richer and more accurate insights into mental health and media representation.
An emphasis on ethical AI practices is essential. NLP applications in mental health and news media must be developed with fairness, transparency, and responsibility in mind to ensure they serve the broader public good.
These recommendations serve as a guide for NLP developers aiming to work at the intersection of news media and mental health. They focus on improving the quality, fairness, and transparency of NLP applications.
Building on the challenges and recommendations, the next sections provide a detailed summary of the NLP techniques used in the reviewed related work. By categorizing these techniques by task, we aim to provide domain experts with a clear and organized overview of the approaches most applicable to specific problems in news media and mental health analysis.
Somandepalli et al.63 used multimodal learning to analyze media representations of gender, race, and age. They integrated audio, video, and text data using NER, sentiment analysis, and RNNs with attention mechanisms. Their approach included a mixture-of-experts model for predicting emotional responses and cross-modal autoencoders for classifying TV commercials. This methodology offers potential for systematic analysis of media narratives, particularly in the portrayal of mental health. Zhang64 presented DCLSTM-MLP, a news text classification model that combines a Convolutional Neural Network (CNN), an LSTM, and a multilayer perceptron (MLP). It uses word vectors and word scatter to capture spatiotemporal relationships and word-category associations. Experiments demonstrated DCLSTM-MLP’s superior performance in accuracy, recall, and overall measures compared to existing methods, thus advancing automatic news classification in NLP.
Analysis of Topic Model Networks (ANTMN), introduced by Walter and Ophir65, uses Latent Dirichlet Allocation (LDA)66 for topic modeling, network analysis, and community detection to identify media frames. It constructs networks with topics as nodes and cosine similarity as edges, using algorithms such as Walktrap67 and Louvain68 for frame clustering. Ghasiya and Okamura69 analyzed COVID-19 news using Top2Vec70 and Robustly optimized BERT approach (RoBERTa)71 for sentiment analysis, revealing cross-cultural differences in media coverage and public sentiment. Choi and Um72 used LDA and Structural Topic Modeling (STM) to analyze Korean COVID-19 news, identifying five major themes and highlighting the underrepresentation of psychological effects in media coverage. Lu et al.73 used Sentence-Bidirectional Encoder Representations from Transformers (Sentence-BERT)74, spaCy75, and Top2Vec to analyze economic news during COVID-19, revealing shifts in media focus from health to economic impacts over time. The related work demonstrates the effectiveness of combining topic modeling and sentiment analysis in understanding media narratives and public responses during global crises and highlights the influence of cultural and political contexts on media coverage.
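The network step described for ANTMN can be sketched in a few lines. The example below is a simplified illustration, not the authors' implementation: it assumes each topic from an already fitted topic model is represented by its weight in each document, links topics whose cosine similarity exceeds a threshold, and groups them with plain connected components as a stand-in for Walktrap or Louvain. The topic names, weights, and threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_network(topic_doc, threshold):
    """Return {(topic_a, topic_b): similarity} for sufficiently similar topics."""
    edges = {}
    for a, b in combinations(sorted(topic_doc), 2):
        sim = cosine(topic_doc[a], topic_doc[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges


def components(nodes, edges):
    """Connected components; a simple stand-in for Walktrap or Louvain."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(adj[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical topic weights across four documents (rows sum is not required).
topic_doc = {
    "treatment": [0.6, 0.5, 0.0, 0.1],
    "recovery": [0.5, 0.6, 0.1, 0.0],
    "crime": [0.0, 0.1, 0.7, 0.6],
    "violence": [0.1, 0.0, 0.6, 0.7],
}

edges = topic_network(topic_doc, threshold=0.5)
frames = components(topic_doc, edges)
print(frames)
```

In this toy case the topics fall into two candidate frames, one around care and one around danger. A dedicated community detection algorithm would be preferable on real topic networks, where thresholded components tend to merge into one large cluster.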
Nemes and Kiss76 analyzed COVID-19 tweets using BERT, RNNs, and Natural Language Toolkit (NLTK) with VADER77. They combined information extraction (IE), NER, and sentiment analysis for contextual understanding. Succar et al.78 used DeBERTa79 for aspect-based sentiment analysis of Twitter data, using transfer entropy and convergent cross mapping (CCM) to assess media-sentiment relationships. Gottipati et al.80 studied media representations of mental disorders using NLP and sentiment analysis. Lin et al.81 developed a BERT-based ensemble model to identify harmful news content. Mittal and De Choudhury82 compared moral framing in mental health discourse between social media and news using a BERT-based framework, demonstrating that Twitter is more aligned with positive moral foundations compared to news articles. The related work showcases advanced NLP techniques for sentiment analysis in different contexts, highlighting their potential for understanding public discourse and media impact on sensitive issues.
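To make the idea of lexicon-based sentiment scoring concrete, the following sketch scores a text by averaging the valence of known words and flipping the sign after a negation. It is only in the spirit of rule-based tools such as VADER and does not reproduce VADER's lexicon or rules; the word list, the valence values, and the negation set are hypothetical.

```python
import re

# Hypothetical mini-lexicon; validated tools ship far larger word lists.
LEXICON = {
    "dangerous": -2.0, "violent": -2.0, "unstable": -1.5,
    "recovery": 1.5, "support": 1.0, "hope": 1.5,
}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text):
    """Mean valence of lexicon words; a directly preceding negation flips the sign."""
    tokens = re.findall(r"[a-z']+", text.lower())
    values = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            val = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                val = -val
            values.append(val)
    # Texts without any lexicon word are treated as neutral.
    return sum(values) / len(values) if values else 0.0


headlines = [
    "Patients described as dangerous and violent",
    "People with depression are not dangerous",
    "Community support gives hope for recovery",
]
for h in headlines:
    print(round(sentiment_score(h), 2), h)
```

The transformer-based approaches discussed above replace the fixed lexicon with learned contextual representations, which is what allows them to handle sarcasm, aspect-level sentiment, and domain-specific wording that a word list misses.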
Spinde et al.37 developed a bias detection method for German news using Inverse Document Frequency (IDF), bias lexicons, and word embeddings. Acken and Demszky39 analyzed the framing of the 2020 presidential election using Named Entity Recognition (NER) and semantic lexicons83. Lei et al.38 used the Robustly Optimized BERT Approach (RoBERTa) and BiLSTM models with knowledge distillation84 for sentence-level bias analysis. Choubey et al.85 proposed an Integer Linear Programming (ILP) system for event coreference resolution using discourse structures. Doumit and Minai36 combined Latent Dirichlet Allocation (LDA) and Antelope86 to analyze media bias through cognitive networks. Bach et al.40 used fine-tuned BERT embeddings to analyze political news consumption in three European countries, linking browsing patterns to political engagement. These approaches demonstrate advanced NLP techniques for detecting bias, framing, and content structure in news media and provide scalable methods for political communication research and media analysis.
Kaliyar et al.20 proposed FakeBERT, which combines BERT with CNNs for improved fake news detection, achieving 98.90% accuracy on a U.S. presidential election dataset. Nasir et al.21 presented a hybrid CNN-RNN model evaluated on the FA-KES87 and ISOT datasets, highlighting the importance of generalization techniques. Ref. 88 presents a hybrid approach combining NLP and expert opinion, using BERT-based architectures and blockchain technology for transparent news authenticity assessment. The related work demonstrates the evolution of fake news detection techniques, from pure NLP approaches to hybrid models that integrate human expertise and advanced machine learning architectures.
Murarka et al.50 used the Robustly Optimized BERT Approach (RoBERTa) to classify mental health posts on Reddit, outperforming the LSTM and BERT models. Ji et al.89 reviewed methods for identifying suicidal ideation, highlighting CNNs, LSTMs, and hybrid models such as BERT with Gated Recurrent Units (GRU). Garg34 developed the WELLXPLAIN dataset for wellness dimension classification, comparing various models including Generative Pre-trained Transformer 3 (GPT-3) and MentalBERT. Turcan and McKeown90 presented “Dreaddit” to identify stress in social media using logistic regression and neural models. Ji et al.91 developed MentalXLNet and MentalLongformer for long-sequence modeling in mental health, outperforming BERT and RoBERTa in certain tasks. Choey92 created a system to neutralize biased language in discussions of mental illness, based on the CONCURRENT model93. Ji et al.33 presented MentalBERT and MentalRoBERTa, pre-trained models for detecting mental disorders from social media text. The related work demonstrates advanced NLP applications in mental health assessment and highlights the potential and limitations of different models and datasets.
LLMs have significantly advanced the analysis of mental health narratives by improving existing NLP methods. The related work demonstrates the potential of LLMs for detecting mental health disorders and analyzing social media data53,94,95. LLMs show promise in psychological assessment96 and analysis of online depressive and suicidal behavior97. Fine-tuned models such as MentalBERT, MentalRoBERTa33, MentalXLNet, and MentalLongformer91 provide nuanced text understanding. However, concerns remain about bias in artificial intelligence (AI) models of mental health98. Rigorous evaluation of LLMs against traditional methods is critical to validate their efficacy and reliability. Researchers emphasize the need for collaboration between clinicians and data scientists to address biases and ensure equitable mental health care99.
This review provides an in-depth analysis of NLP techniques used for news media analysis, with a specific focus on mental health-related issues. The introduction highlighted the importance of understanding how mental health is represented in the media and the potential impact on public perception and stigma. The methods section explored NLP techniques including sentiment analysis, topic modeling, and fake news detection, and examined their applicability, benefits, and limitations in analyzing news media content. The application of NLP to news media and mental health spans several methods, each with its own strengths and limitations:
First, various NLP techniques are very effective at processing large amounts of news media content at scale, providing automated analysis that traditional manual methods cannot achieve. This scalability allows for continuous monitoring of media representations, making it possible to identify emerging trends in discussions of mental health. In addition, this automated approach enables real-time monitoring and rapid identification of negative portrayals of mental health, facilitating timely intervention. Public health officials and policymakers can use these insights to promptly address stigmatizing content or misinformation and promote more supportive narratives.
Second, sentiment analysis in NLP helps track public sentiment about mental health over time, providing valuable insights into societal reactions to media coverage. Combined with its real-time capability, sentiment analysis is particularly useful for monitoring the impact of news stories on public perception and mental health stigma.
Third, topic modeling is another key strength of NLP for this domain. It allows the identification of dominant topics in news media. For example, it has been used to analyze the portrayal of mental health during crises such as the COVID-19 pandemic57,72,73, helping researchers uncover how different narratives are framed across regions and time periods. However, topic modeling lacks robust benchmarks, which makes it difficult to assess its consistency and reliability across datasets. This is largely due to its unsupervised nature and the variability of its results.
Fourth, NLP’s ability to detect fake news and misinformation using advanced models such as FakeBERT20 and hybrid CNN-RNN architectures21 enhances the credibility of media content. This is critical to ensuring that mental health is accurately represented in the news and minimizing the spread of harmful misinformation.
Fifth, a key benefit of NLP is its ability to provide objective and consistent analysis by minimizing the influence of personal bias. Using machine learning models, NLP reduces the subjectivity that often accompanies manual content analysis, leading to more reliable and standardized results when assessing media portrayals of mental health. However, one must still account for bias that may be inherent in the data itself or introduced into the analysis in hidden ways by LLMs.
Sixth, despite these strengths, a major limitation of NLP lies in the interpretability of LLMs, which often function as “black boxes”. Their decision-making processes are opaque, making it difficult to validate results or ensure the accuracy of predictions, especially when dealing with sensitive topics such as mental health.
Seventh, bias in training data is another significant challenge. NLP models trained on biased datasets run the risk of perpetuating biases in their analysis, potentially reinforcing harmful stereotypes about mental health. Addressing this issue is critical to ensure fair and unbiased media analysis.
Eighth, NLP’s overreliance on English-language data limits its ability to provide comprehensive global insights. Current models have an English bias, which limits their applicability to diverse linguistic contexts. The development of multilingual NLP systems is critical to accurately analyze mental health representations across cultures and languages. The success of ensemble-based PLMs in African languages61 and the continued pretraining of bilingual LLMs62 offer promising strategies to advance NLP inclusivity across both formal and informal media in low-resource contexts.
Ninth, while NLP excels at large-scale automated analysis, it often fails to capture the rich contextual depth provided by traditional qualitative methods. This limitation highlights the need to combine NLP with manual approaches to gain a more nuanced understanding of mental health representations in the media45.
Tenth, the issues of reproducibility and transparency also challenge the efficacy of NLP, which aligns with the findings of Malgaroli et al.15. The lack of standardized metrics, coupled with the limited availability of open data and code, hinders the ability to replicate previous published work and validate findings, which is critical for reliable research on mental health representations in the media.
Before turning to future research directions, we summarize here the major findings that emerge from the reviewed literature:
NLP enables large-scale analysis of mental health discourse: Automated methods make it possible to study news media coverage at scales far beyond manual content analysis, revealing long-term patterns and trends (see Sections Introduction and Discussion).
Emotional framing and stigma can be systematically measured: Sentiment analysis has been used to capture emotional framings of mental health, showing that media often convey negative or fear-related associations (see Section Sentiment Analysis).
Topic modeling highlights dominant themes but lacks benchmarking: Studies have identified recurring topics such as depression, suicide, and public health crises, yet systematic benchmarking across methods and corpora remains limited (see Section Topic Modeling).
Misinformation and credibility assessment are emerging areas: NLP has been applied to detect misinformation in mental health reporting, though applications remain sparse and exploratory (see Section Fake News Detection).
Bias, language coverage, and interpretability remain key challenges: Current work is restricted largely to English corpora, with limited attention to transparency or fairness. Data biases risk reinforcing stereotypes, and the interpretability of large language models remains a pressing issue (see Section Overview of NLP Applications in Mental Health and News Media).
Together, these findings demonstrate both the promise of NLP for understanding mental health discourse in news media and the critical gaps that future work must address.
Future research in NLP for news media analysis should prioritize four key areas to improve the effectiveness and fairness of these methods.
First, improving the interpretability and transparency of large language models (LLMs) is critical. Developing strategies that make these models more explicable will help in understanding their decision-making processes. In addition, combining qualitative and quantitative approaches can leverage the strengths of both methods to provide a more comprehensive analysis. Addressing biases in training data and models is essential to achieving objective results, especially since many current systems are biased toward English data and lack linguistic diversity. Advancing multilingual NLP capabilities will allow better analysis of global news media and promote more inclusive research efforts.
Second, ethical considerations must be at the forefront of NLP research. Emphasizing ethical AI practices will ensure that applications are fair and responsible. In particular, unintended consequences of automated classification, such as the mislabeling of sensitive or stigmatizing content, pose ethical risks. These require mechanisms for uncertainty estimation, human-in-the-loop validation, and continuous retraining. Moreover, interpretability is paramount: real-world deployment of NLP in high-stakes contexts like mental health should be accompanied by model explanation tools and regular expert review. Developing explainable AI techniques will increase the trustworthiness and usability of NLP models. Continuous integration of expert evaluation into the development and validation processes is essential to maintain accuracy and reliability. In addition, addressing privacy and security concerns is critical; researchers should develop methods for anonymizing data without sacrificing analytical value to protect sensitive information.
Third, practical applications of NLP should include the creation of real-time analytics systems that can identify and flag harmful or stigmatizing content in a timely manner. Such systems would facilitate faster responses and promote supportive narratives in media coverage. It is also important to develop user-friendly tools that are accessible to a wide range of users, including journalists and policymakers. Training programs can be established to help users make effective use of these tools. In addition, it is essential to develop comprehensive evaluation frameworks to assess the impact of NLP tools on media portrayals and public perceptions of mental health, including both quantitative and qualitative measures.
Fourth and finally, exploring the broader societal and policy implications of NLP-based media analysis is critical. Research should focus on developing strategies that use insights from NLP analysis to inform public health campaigns and policy decisions that promote accurate representations of mental health in the media. Interdisciplinary collaboration between NLP researchers, mental health professionals, and media experts will be essential to developing well-rounded approaches. By addressing these key areas, future research can significantly improve the effectiveness, fairness, and applicability of NLP techniques in the analysis of news media content related to mental health.
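The uncertainty estimation and human-in-the-loop validation raised in the second research direction can be sketched as a simple routing rule. The example assumes a classifier that already outputs a probability for each framing label; it accepts confident predictions automatically and sends uncertain ones to expert review. The article identifiers, label names, probabilities, and the entropy threshold of 1.0 bit are hypothetical and would need to be calibrated on real data.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def route(predictions, max_entropy_bits=1.0):
    """Split items into auto-accepted labels and a human review queue."""
    accepted, review = [], []
    for item_id, dist in predictions:
        label = max(dist, key=dist.get)  # most probable label
        if entropy(dist.values()) <= max_entropy_bits:
            accepted.append((item_id, label))
        else:
            review.append(item_id)  # too uncertain, defer to an expert
    return accepted, review


# Hypothetical classifier outputs for three news articles.
predictions = [
    ("article-1", {"stigmatizing": 0.95, "neutral": 0.03, "supportive": 0.02}),
    ("article-2", {"stigmatizing": 0.40, "neutral": 0.35, "supportive": 0.25}),
    ("article-3", {"stigmatizing": 0.05, "neutral": 0.10, "supportive": 0.85}),
]

accepted, review = route(predictions)
print(accepted)
print(review)
```

Entropy is only one possible uncertainty signal, and it is meaningful only if the model's probabilities are reasonably calibrated. Labels corrected by reviewers can then feed the continuous retraining mentioned above.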
This review set out to examine how NLP techniques can help identify, interpret, and improve media portrayals of mental health. We surveyed both traditional and advanced methods, such as topic modeling, contextual embeddings, and emotion detection, and assessed their capacity to capture nuanced representations of psychological constructs in the news. We highlighted key limitations, including language bias, lack of explainability, and insufficient attention to stigmatizing versus supportive framing. To address these challenges, we proposed ten actionable recommendations aimed at guiding the development of more accurate, ethical, and socially aware NLP tools. Looking ahead, interdisciplinary collaboration will be crucial to fully realizing the potential of NLP in promoting more balanced and empathetic mental health discourse across media platforms.
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Stuart, H. Media portrayal of mental illness and its treatments: What effect does it have on people with mental illness?. CNS Drugs 20(2), 99–106. https://doi.org/10.2165/00023210-200620020-00002 (2006) (ISSN 1172-7047).
Corrigan, P. W., Powell, K. J. & Michaels, P. J. The effects of news stories on the stigma of mental illness. Journal of Nervous & Mental Disease 201(3), 179–182. https://doi.org/10.1097/nmd.0b013e3182848c24 (2013) (ISSN 0022-3018).
Graham, M., Morgan, A., Paton, E. & Ross, A. Examining the quality of news media reporting of complex mental illness in relation to violent crime in australia. Int. J. Soc. Psychiatry 69(8), 2110–2120. https://doi.org/10.1177/00207640231194481 (2023) (ISSN 1741-2854).
Quintero Johnson, J. M. & Riles, J. “He acted like a crazy person”: Exploring the influence of college students’ recall of stereotypic media representations of mental illness. Psychology of Popular Media Culture 7(2), 146–163. https://doi.org/10.1037/ppm0000121 (2018).
Link, B. G., Struening, E. L., Neese-Todd, S., Asmussen, S. & Phelan, J. C. Stigma as a barrier to recovery: The consequences of stigma for the self-esteem of people with mental illnesses. Psychiatr. Serv. 52(12), 1621–1626. https://doi.org/10.1176/appi.ps.52.12.1621 (2001) (ISSN 1557-9700).
Evans-Lacko, S., Brohan, E., Mojtabai, R. & Thornicroft, G. Association between public views of mental illness and self-stigma among individuals with mental illness in 14 european countries. Psychol. Med. 42(8), 1741–1752. https://doi.org/10.1017/s0033291711002558 (2011) (ISSN 1469-8978).
Srivastava, K., Chaudhury, S., Bhat, P. S. & Mujawar, S. Media and mental health. Industrial Psychiatry Journal 27(1), 1. https://doi.org/10.4103/ipj.ipj_73_18 (2018) (ISSN 0972-6748).
Livingston, J. D. & Boyd, J. E. Correlates and consequences of internalized stigma for people living with mental illness: A systematic review and meta-analysis. Social Science & Medicine 71(12), 2150–2161. https://doi.org/10.1016/j.socscimed.2010.09.030 (2010) (ISSN 0277-9536).
Pescosolido, B. A. et al. “A disease like any other’’? a decade of change in public reactions to schizophrenia, depression, and alcohol dependence. Am. J. Psychiatry 167(11), 1321–1330. https://doi.org/10.1176/appi.ajp.2010.09121743 (2010) (ISSN 1535-7228).
Smith, M. Anti-stigma campaigns: Time to change. British Journal of Psychiatry 202(s55), s49–s50. https://doi.org/10.1192/bjp.bp.113.126813 (2013) (ISSN 1472-1465).
Corrigan, P. W., Morris, S. B., Michaels, P. J., Rafacz, J. D. & Rüsch, N. Challenging the public stigma of mental illness: A meta-analysis of outcome studies. Psychiatr. Serv. 63(10), 963–973. https://doi.org/10.1176/appi.ps.201100529 (2012) (ISSN 1557-9700).
Galea, S., Merchant, R. M. & Lurie, N. The mental health consequences of covid-19 and physical distancing: The need for prevention and early intervention. JAMA Intern. Med. 180(6), 817. https://doi.org/10.1001/jamainternmed.2020.1562 (2020) (ISSN 2168-6106).
Purtle, J., Nelson, K. L., Counts, N. Z. & Yudell, M. Population-based approaches to mental health: History, strategies, and evidence. Annu. Rev. Public Health 41(1), 201–221 (2020).
Fried, E. I. What are psychological constructs? on the nature and statistical modelling of emotions, intelligence, personality traits and mental disorders. Health Psychol. Rev. 11(2), 130–134. https://doi.org/10.1080/17437199.2017.1306718 (2017) (ISSN 1743-7202).
Malgaroli, M., Hull, T. D., Zech, J. M. & Althoff, T. Natural language processing for mental health interventions: a systematic review and research framework. Translational Psychiatry 13(1). https://doi.org/10.1038/s41398-023-02592-2 (2023) (ISSN 2158-3188).
Borges do Nascimento, I. J. et al. Infodemics and health misinformation: a systematic review of reviews. Bulletin of the World Health Organization 100(9), 544–561. https://doi.org/10.2471/blt.21.287654 (2022) (ISSN 0042-9686).
Kisa, S. & Kisa, A. A comprehensive analysis of covid-19 misinformation, public health impacts, and communication strategies: Scoping review. J. Med. Internet Res. 26, e56931. https://doi.org/10.2196/56931 (2024) (ISSN 1438-8871).
Salari, N. et al. Prevalence of stress, anxiety, depression among the general population during the covid-19 pandemic: a systematic review and meta-analysis. Globalization and Health 16(1). https://doi.org/10.1186/s12992-020-00589-w (2020) (ISSN 1744-8603).
Lakhan, R., Agrawal, A. & Sharma, M. Prevalence of depression, anxiety, and stress during covid-19 pandemic. Journal of Neurosciences in Rural Practice 11, 519–525. https://doi.org/10.1055/s-0040-1716442 (2020) (ISSN 0976-3147).
Kaliyar, R. K., Goswami, A. & Narang, P. Fakebert: Fake news detection in social media with a bert-based deep learning approach. Multimedia Tools and Applications 80(8), 11765–11788. https://doi.org/10.1007/s11042-020-10183-2 (2021) (ISSN 1573-7721).
Abdul Nasir, J., Khan, O. S. & Varlamis, I. Fake news detection: A hybrid cnn-rnn based deep learning approach. International Journal of Information Management Data Insights 1(1), 100007. https://doi.org/10.1016/j.jjimei.2020.100007 (2021) (ISSN 2667-0968).
Schwartz, H. A. et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE. 8(9), e73791. https://doi.org/10.1371/journal.pone.0073791 (2013) (ISSN 1932-6203).
Martin, F. & Oliver, T. A qualitative content analysis of online public mental health resources for covid-19. Frontiers in Psychiatry 13. https://doi.org/10.3389/fpsyt.2022.553158 (2022) (ISSN 1664-0640).
Rahman, M. M., Saifuzzaman, M., Ahmed, A., Ferdousi Mahin, M. & Farjana Shetu, S. Impact of covid-19 on mental health: A quantitative analysis of anxiety and depression based on regular life and internet use. Current Research in Behavioral Sciences 2, 100037. https://doi.org/10.1016/j.crbeha.2021.100037 (2021) (ISSN 2666-5182).
Prasad Wasti, S., Simkhada, P., van Teijlingen, E., Sathian, B. & Banerjee, I. The growing importance of mixed-methods research in health. Nepal Journal of Epidemiology 12(1), 1175–1178. https://doi.org/10.3126/nje.v12i1.43633 (2022) (ISSN 2091-0800).
Palinkas, L. A. et al. Mixed method designs in implementation research. Administration and Policy in Mental Health and Mental Health Services Research 38(1), 44–53. https://doi.org/10.1007/s10488-010-0314-z (2010) (ISSN 1573-3289).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423.
Radford, A. Improving language understanding by generative pre-training (2018).
Dubey, A., Yang, Z. & Hattab, G. A nested model for ai design and validation. iScience 27(9), 110603. https://doi.org/10.1016/j.isci.2024.110603 (2024) (ISSN 2589-0042).
Tran, K., Bisazza, A. & Monz, C. Recurrent memory networks for language modeling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics. https://doi.org/10.18653/v1/n16-1036 (2016).
Józefowicz, R., Vinyals, O., Schuster, M., Shazeer, N. M. & Wu, Y. Exploring the limits of language modeling. arXiv:1602.02410 (2016).
Hakkani-Tür, D. et al. Multidomain joint semantic frame parsing using bi-directional RNN-LSTM. In Interspeech 2016. ISCA (2016). https://doi.org/10.21437/interspeech.2016-402.
Ji, S. et al. Mentalbert: Publicly available pretrained language models for mental healthcare. In International Conference on Language Resources and Evaluation, (2021a).
Garg, M. Wellxplain: Wellness concept extraction and classification in reddit posts for analysis. Knowl.-Based Syst. 284, 111228. https://doi.org/10.1016/j.knosys.2023.111228 (2024) (ISSN 0950-7051).
O’Halloran, K. L., Pal, G. & Jin, M. Multimodal approach to analysing big social and news media data. Discourse, Context & Media 40, 100467. https://doi.org/10.1016/j.dcm.2021.100467 (2021) (ISSN 2211-6958).
Doumit, S. & Minai, A. Semantic knowledge inference from online news media using an lda-nlp approach. In The 2011 International Joint Conference on Neural Networks. IEEE (2011). https://doi.org/10.1109/ijcnn.2011.6033626.
Spinde, T., Hamborg, F. & Gipp, B. Media Bias in German News Articles: A Combined Approach, pages 581–590. Springer International Publishing, 2020. ISBN 9783030659653. https://doi.org/10.1007/978-3-030-65965-3_41.
Lei, Y., Huang, R., Wang, L. & Beauchamp, N. Sentence-level media bias analysis informed by discourse structures. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2022. https://doi.org/10.18653/v1/2022.emnlp-main.682.
Acken, A. & Demszky, D. Analyzing the framing of 2020 presidential candidates in the news. In Proceedings of the Fourth Widening Natural Language Processing Workshop. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.winlp-1.32 (2020).
Bach, R. L., Kern, C., Bonnay, D. & Kalaora, L. Understanding political news media consumption with digital trace data and natural language processing. J. R. Stat. Soc. Ser. A Stat. Soc. 185, S246–S269. https://doi.org/10.1111/rssa.12846 (2022) (ISSN 1467-985X).
Vipin, C., Harshit, S., Kumar, K. L. S., Vijay, K. & Zaid, M. Checking the truthfulness of news channels using nlp techniques. In 2023 International Conference on the Confluence of Advancements in Robotics, Vision and Interdisciplinary Technology Management (IC-RVITM). IEEE, November 2023. https://doi.org/10.1109/ic-rvitm60032.2023.10435241.
Ng, V., Rees, E. E., Niu, J. & Zaghlool, A. Application of natural language processing algorithms for extracting information from news articles in event-based surveillance. Canada Communicable Disease Report, 186–191. https://doi.org/10.14745/ccdr.v46i06a06 (2020) (ISSN 1481-8531).
Almandouh, M. E., Alrahmawy, M. F., Eisa, M., Elhoseny, M. & Tolba, A. S. Ensemble based high performance deep learning models for fake news detection. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-76286-0 (2024) (ISSN 2045-2322).
Calvo, R. A., Milne, D. N., Sazzad Hussain, M. & Christensen, H. Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering 23(5), 649–685. https://doi.org/10.1017/s1351324916000383 (2017) (ISSN 1469-8110).
Liu, Z. et al. Listening to mental health crisis needs at scale: Using natural language processing to understand and evaluate a mental health crisis text messaging service. Frontiers in Digital Health, 3 (2021).
Althoff, T., Clark, K. & Leskovec, J. Large-scale analysis of counseling conversations: An application of natural language processing to mental health. Transactions of the Association for Computational Linguistics 4, 463–476. https://doi.org/10.1162/tacl_a_00111 (2016) (ISSN 2307-387X).
Le Glaz, A. et al. Machine learning and natural language processing in mental health: Systematic review. Journal of Medical Internet Research 23(5), e15708. https://doi.org/10.2196/15708 (2021) (ISSN 1438-8871).
Harvey, D., Lobban, F., Rayson, P., Warner, A. & Jones, S. Natural language processing methods and bipolar disorder: Scoping review. JMIR Mental Health 9(4), e35928. https://doi.org/10.2196/35928 (2022) (ISSN 2368-7959).
Zhang, T., Schoene, A. M., Ji, S. & Ananiadou, S. Natural language processing applied to mental illness detection: a narrative review. npj Digital Medicine 5(1). https://doi.org/10.1038/s41746-022-00589-7 (2022) (ISSN 2398-6352).
Murarka, A., Radhakrishnan, B. & Ravichandran, S. Classification of mental illnesses on social media using RoBERTa. In Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, and Fabio Rinaldi, editors, Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis, 59–68, online, (2021). Association for Computational Linguistics.
Salas-Zárate, R. et al. Mental-health: An nlp-based system for detecting depression levels through user comments on twitter (x). Mathematics 12(13), 1926. https://doi.org/10.3390/math12131926 (2024) (ISSN 2227-7390).
Adekkanattu, P. et al. Deep learning for identifying personal and family history of suicidal thoughts and behaviors from ehrs. npj Digital Medicine 7(1). https://doi.org/10.1038/s41746-024-01266-7 (2024) (ISSN 2398-6352).
Bauer, B. et al. Using large language models to understand suicidality in a social media-based taxonomy of mental health disorders: Linguistic analysis of reddit posts. JMIR Mental Health 11, e57234–e57234. https://doi.org/10.2196/57234 (2024) (ISSN 2368-7959).
Habicht, J. et al. Closing the accessibility gap to mental health treatment with a personalized self-referral chatbot. Nat. Med. 30(2), 595–602. https://doi.org/10.1038/s41591-023-02766-x (2024) (ISSN 1546-170X).
Crema, C., Attardi, G., Sartiano, D. & Redolfi, A. Natural language processing in clinical neuroscience and psychiatry: A review. Frontiers in Psychiatry 13. https://doi.org/10.3389/fpsyt.2022.946387 (2022) (ISSN 1664-0640).
Zanwar, S., Wiechmann, D., Qiao, Y. & Kerz, E. SMHD-GER: A large-scale benchmark dataset for automatic mental health detection from social media in German. In Andreas Vlachos and Isabelle Augenstein, editors, Findings of the Association for Computational Linguistics: EACL 2023, pp 1526–1541, Dubrovnik, Croatia, (2023). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-eacl.113.
Leung, Y. T. & Khalvati, F. Exploring covid-19-related stressors: Topic modeling study. J. Med. Internet Res. 24(7), e37142. https://doi.org/10.2196/37142 (2022) (ISSN 1438-8871).
Bello, H. J., Palomar-Ciria, N., Lozano, C., Gutiérrez-Alonso, C. & Baca-García, E. Examining the relationship between covid-19 and suicide in media coverage through natural language processing analysis. The European Journal of Psychiatry 38(1), 100227. https://doi.org/10.1016/j.ejpsy.2023.100227 (2024) (ISSN 0213-6163).
Madan, S. et al. Deep learning-based detection of psychiatric attributes from german mental health records. International Journal of Medical Informatics, 161, 104724. https://doi.org/10.1016/j.ijmedinf.2022.104724 (2022) (ISSN 1386-5056).
Chan, B., Möller, T., Pietsch, M., Soni, T. & Yeung, C. M. Open sourcing german bert. https://deepset.ai/german-bert (2019). Accessed: 2019-09-14.
Ronny Mabokela, K., Primus, M. & Celik, T. Advancing sentiment analysis for low-resourced african languages using pre-trained language models. PLOS One 20(6), e0325102. https://doi.org/10.1371/journal.pone.0325102 (2025) (ISSN 1932-6203).
Joshi, R. et al. Adapting multilingual LLMs to low-resource languages using continued pre-training and synthetic corpus: A case study for Hindi LLMs. In Ruvan Weerasinghe, Isuri Anuradha, and Deshan Sumanathilaka, editors, Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages, pp 50–57, Abu Dhabi, January 2025. Association for Computational Linguistics. https://aclanthology.org/2025.indonlp-1.6/.
Somandepalli, K. et al. Computational media intelligence: Human-centered machine analysis of media. Proc. IEEE 109(5), 891–910. https://doi.org/10.1109/jproc.2020.3047978 (2021) (ISSN 1558-2256).
Zhang, M. Applications of Deep Learning in News Text Classification. Sci. Program. 2021(1), 6095354. https://doi.org/10.1155/2021/6095354 (2021) (ISSN 1875-919X).
Walter, D. & Ophir, Y. News frame analysis: An inductive mixed-method computational approach. Commun. Methods Meas. 13(4), 248–266. https://doi.org/10.1080/19312458.2019.1639145 (2019) (ISSN 1931-2466).
Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003) (ISSN 1532-4435).
Pons, P. & Latapy, M. Computing Communities in Large Networks Using Random Walks, pp 284–293 (Springer, Berlin Heidelberg, 2005). ISBN 9783540320852. https://doi.org/10.1007/11569596_31.
Blondel, V. D., Guillaume, J-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008. https://doi.org/10.1088/1742-5468/2008/10/p10008 (2008) (ISSN 1742-5468).
Ghasiya, P. & Okamura, K. Investigating COVID-19 news across four nations: A topic modeling and sentiment analysis approach. IEEE Access 9, 36645–36656. https://doi.org/10.1109/access.2021.3062875 (2021) (ISSN 2169-3536).
Angelov, D. Top2Vec: Distributed Representations of Topics. https://arxiv.org/abs/2008.09470 (2020). Version 1.
Liu, Y. et al. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
Choi, Y-J. & Um, Y-J. Topic models to analyze disaster-related newspaper articles: Focusing on COVID-19. International Journal of Mental Health Promotion 25(3), 421–431. https://doi.org/10.32604/ijmhp.2023.023255 (2023) (ISSN 1462-3730).
Guang, L. et al. Agenda-setting for COVID-19: A study of large-scale economic news coverage using natural language processing. International Journal of Data Science and Analytics 15(3), 291–312. https://doi.org/10.1007/s41060-022-00364-7 (2022) (ISSN 2364-4168).
Reimers, N. & Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv:1908.10084 [cs] (2019).
Honnibal, M. & Montani, I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear (2017).
Nemes, L. & Kiss, A. Information extraction and named entity recognition supported social media sentiment analysis during the COVID-19 pandemic. Appl. Sci. 11(22), 11017. https://doi.org/10.3390/app112211017 (2021) (ISSN 2076-3417).
Hutto, C. & Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550 (2014) (ISSN 2334-0770).
Succar, R., Ramallo, S., Das, R., Ventura, R. B. & Porfiri, M. Understanding the role of media in the formation of public sentiment towards the police. Communications Psychology 2(1). https://doi.org/10.1038/s44271-024-00059-8 (2024) (ISSN 2731-9121).
He, P., Liu, X., Gao, J. & Chen, W. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654 (2020).
Gottipati, S., Chong, M., Kiat, A. & Kawidiredjo, B. Exploring media portrayals of people with mental disorders using NLP. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. SCITEPRESS - Science and Technology Publications (2021). https://doi.org/10.5220/0010380007080715.
Lin, S.-Y., Kung, Y.-C. & Leu, F.-Y. Predictive intelligence in harmful news identification by BERT-based ensemble learning model with text sentiment analysis. Information Processing & Management 59(2), 102872. https://doi.org/10.1016/j.ipm.2022.102872 (2022) (ISSN 0306-4573).
Mittal, S. & De Choudhury, M. Moral framing of mental health discourse and its relationship to stigma: A comparison of social media and news. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23. ACM (2023). https://doi.org/10.1145/3544548.3580834.
Mohammad, S. Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Iryna Gurevych and Yusuke Miyao, editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 174–184, Melbourne, Australia, July 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1017.
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. https://arxiv.org/abs/1503.02531 (2015).
Choubey, P. K., Lee, A., Huang, R. & Wang, L. Discourse as a function of event: Profiling discourse structure in news articles around the main event. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.478 (2020).
Chaumartin, F-R. Antelope, une plate-forme de TAL permettant d’extraire les sens du texte: théorie et applications de l’interface syntaxe-sémantique. PhD thesis, Université Paris-Diderot-Paris VII, 2012.
Abu Salem, F. K., Al Feel, R., Elbassuoni, S., Jaber, M. & Farah, M. FA-KES: A fake news dataset around the Syrian war. Proceedings of the International AAAI Conference on Web and Social Media 13, 573–582. https://doi.org/10.1609/icwsm.v13i01.3254 (2019) (ISSN 2162-3449).
Islam Mahmud, M. A. et al. Toward news authenticity: Synthesizing natural language processing and human expert opinion to evaluate news. IEEE Access 11, 11405–11421. https://doi.org/10.1109/access.2023.3241483 (2023) (ISSN 2169-3536).
Ji, S., Li, X., Huang, Z. & Cambria, E. Suicidal ideation and mental disorder detection with attentive relation networks. Neural Comput. Appl. 34(13), 10309–10319. https://doi.org/10.1007/s00521-021-06208-y (2021) (ISSN 1433-3058).
Turcan, E. & McKeown, K. Dreaddit: A Reddit dataset for stress analysis in social media. In Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, and Fabio Rinaldi, editors, Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pp 97–107, Hong Kong, November 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-6213.
Ji, S. et al. Domain-specific continued pretraining of language models for capturing long context in mental health. (2023).
Choey, M. From stigma to support: A parallel monolingual corpus and NLP approach for neutralizing mental illness bias. In Ruslan Mitkov and Galia Angelova, editors, Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pp 249–254, Varna, Bulgaria, September 2023. INCOMA Ltd., Shoumen, Bulgaria.
Pryzant, R. et al. Automatically neutralizing subjective bias in text. Proceedings of the AAAI Conference on Artificial Intelligence 34(01), 480–489. https://doi.org/10.1609/aaai.v34i01.5385 (2020) (ISSN 2159-5399).
Hadar-Shoval, D., Asraf, K., Mizrachi, Y., Haber, Y. & Elyoseph, Z. Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values. JMIR Mental Health (2024).
van Heerden, A. C., Pozuelo, J. R. & Kohrt, B. Global Mental Health Services and the Impact of Artificial Intelligence-Powered Large Language Models. JAMA Psychiatry (2023).
Kjell, O. N. E., Kjell, K. & Schwartz, H. A. Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment. Psychiatry Research 333, 115667 (2024).
Malhotra, A. & Jindal, R. XAI Transformer based Approach for Interpreting Depressed and Suicidal User Behavior on Online Social Networks. Cognitive Systems Research (2023).
Straw, I. & Callison-Burch, C. Artificial Intelligence in mental health and the biases of language based models. PLoS ONE (2020).
Demszky, D. et al. Using large language models in psychology. Nature Reviews Psychology (2023).
The authors would like to thank the German Federal Ministry of Health (BMG) for its partial support through grant No. ZMI5-2523GHP027, provided for the project “Strengthening National Immunization Technical Advisory Groups and their Evidence-based Decision-making in the WHO European Region and Globally” (SENSE), which is part of the Global Health Protection Programme (GHPP).
Open Access funding enabled and organized by Projekt DEAL.
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Köckritz, J., İlgen, B., Cohrdes, C. et al. Current applications and future directions in natural language processing for news media and mental health. Sci Rep 15, 32532 (2025). https://doi.org/10.1038/s41598-025-18413-z