Portable Electronic Nose with Machine Learning Enhances VOC Detection in Forensic Science

2025-09-15

By John Chasse

Artificial intelligence (AI) and machine learning (ML) are increasingly transforming analytical science, enabling tools such as electronic nose (e-nose) systems for detecting volatile organic compounds (VOCs).

A recent joint study between Linköping University and the Department of Forensic Genetics and Forensic Toxicology of the National Board of Forensic Medicine (both in Linköping, Sweden) demonstrated a 32-element metal oxide semiconductor (MOS)-based e-nose, integrated with advanced supervised ML algorithms, for forensic applications including distinguishing human vs. animal samples, postmortem vs. antemortem states, and estimating postmortem intervals. Phase-randomized validation, sensor ranking based on discriminative utility, and majority voting ensured robust, reproducible classification across varied biological VOC profiles. Alcohol-based co-solvents improved VOC detection range, and sensor cross-reactivity enhanced classification accuracy. The system offers rapid, cost-effective analysis (10 min per measurement plus minutes for classification), presenting a practical alternative to traditional GC-MS in forensics. Beyond forensics, such AI-driven e-nose technology holds promise for environmental monitoring, disease diagnostics, food quality control, and public safety, leveraging unique organism volatilomes to detect disease states, environmental hazards, and decomposition stages in real time. LCGC International spoke to Donatella Puglisi, associate professor at Linköping University, and corresponding author of the paper (1) that resulted from this work.

How has the field of artificial olfaction evolved since Persaud and Dodd’s foundational work in the 1980s (2), and what limitations did early AI/ML methods face in VOC detection?

The seminal paper by Persaud and Dodd introduced the use of gas sensor arrays and artificial neural networks for odor recognition and classification. Since then, sensor-array hardware and AI/ML techniques have been actively explored for olfactory systems, reflecting significant research interest in the field. However, persistent challenges, such as the limited diversity and stability of sensor arrays, have constrained progress, a limitation that remains relevant today. Additionally, the general inaccessibility of advanced ML methods at the time posed challenges for efficient data processing. Early studies often relied on basic techniques such as k-nearest neighbors, support vector machines, and discriminant analysis, which were less robust in handling the complexity of electronic nose (e-nose) sensor outputs compared to modern ensemble learning methods, such as gentle adaptive boosting (GentleBoost), introduced in the 2000s. Material and computational drawbacks likely limited the accuracy and applicability of early e-nose systems. Nevertheless, these early contributions laid the groundwork for the significant advances of recent years, driven by accessible and sophisticated ML techniques that enhance the performance of e-nose data analysis, as investigated in our study.

Today, the widespread availability of ML methods, bolstered by access to substantial computational power and the ability to perform big-data analysis, opens up new opportunities for implementing ML-enhanced e-noses in many crucial areas, such as the forensic sciences.

Why was a 32-element MOS sensor array selected for this study, and how does sensor cross-reactivity enhance the e-nose’s forensic capabilities?

Using a 32-element MOS sensor array offers significant advantages over smaller arrays, particularly in advanced odor detection for forensic applications. The main benefit lies in the increased diversity and redundancy provided by the larger array. Each sensor in the array has overlapping but slightly different sensitivities to a wide range of VOCs, and this diversity enables the system to generate highly detailed and distinctive odor signatures. With more sensors contributing to the data, the e-nose becomes more sensitive to subtle differences in complex odor mixtures, which is essential for identifying trace compounds in forensic evidence.

Compared to arrays with fewer sensors, a 32-element system improves classification accuracy and robustness by providing a higher-dimensional input for data analysis techniques like ML or statistical modeling. It also offers better performance in complex or variable environments, where individual sensors might be affected by noise or environmental fluctuations. In contrast, while other technologies like quartz crystal microbalances, surface acoustic wave devices, or electrochemical sensors may provide higher specificity or lower detection limits for certain substances, they tend to be more expensive, delicate, and less suited for field use. MOS sensors are generally more robust, cost-effective, and fast-responding, making them particularly suitable for portable forensic devices.

Although cross-reactivity is usually considered a weakness of MOS-based sensors, here we change perspective and consider it a key strength. While each sensor is not highly selective on its own, the collective pattern of responses across the array forms a unique signature for each odor. This pattern-based recognition enables the e-nose to distinguish between similar but distinct odor sources, even when the individual chemical components are unknown or degraded. In forensic work, where odor profiles are often complex and contaminated, this ability to detect and classify based on the overall signature rather than isolated compounds is particularly valuable. Cross-reactivity, when harnessed properly with advanced algorithms, transforms a potential weakness into a powerful advantage, allowing the e-nose to operate effectively in real-world scenarios with high variability and complexity.
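To make this pattern-based recognition concrete, here is a minimal Python sketch. It is purely illustrative and not the authors' implementation: the odor labels and the 4-element response vectors are hypothetical stand-ins for the real 32-element array, and a new response is classified simply by its cosine similarity to stored array signatures.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sensor-response vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical signatures for a 4-sensor array (the real array has 32
# elements). Each odor excites several cross-reactive sensors at once;
# it is the overall pattern, not any single channel, that identifies it.
signatures = {
    "odor_A": [0.9, 0.4, 0.1, 0.7],
    "odor_B": [0.2, 0.8, 0.9, 0.3],
}

def classify(response):
    """Assign the label whose stored signature is most similar."""
    return max(signatures, key=lambda k: cosine(signatures[k], response))

print(classify([0.85, 0.45, 0.15, 0.65]))  # → odor_A
```

In the study itself, supervised ML on extracted features replaces this raw similarity matching, but the principle is the same: the joint response pattern across cross-reactive sensors carries the identity of the odor.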

What led to the selection of the Optimizable Ensemble model for classification, and how did it outperform traditional methods like PCA or SVM in your forensic cases?

As shown in Figure 2a of the article (1), the PCA plot of the full dataset exhibited significant overlap between classes, meaning poor class separation and hence limited effectiveness of PCA on its own. This required the use of more advanced methods for data analysis and evaluation.

To select the most effective machine learning model, we evaluated all 43 classification models in MATLAB’s Classification Learner app using the complete feature set. The Optimizable Ensemble demonstrated superior performance through automated hyperparameter optimization, including ensemble aggregation methods and learning parameters, making it our optimal choice for binary classification. The final selected classifier was the one that minimized the estimated cross-validation loss, ensuring an optimal balance between bias and variance.
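The selection criterion, choosing the configuration that minimizes estimated cross-validation loss, can be sketched in a few lines. The toy Python example below illustrates the principle only: the study used MATLAB's Classification Learner and an Optimizable Ensemble, whereas this sketch tunes a single hypothetical k-NN hyperparameter on synthetic 1-D data.

```python
import random

random.seed(0)

# Toy 1-D two-class dataset: class 0 centered at 0.0, class 1 at 1.0.
data = [(random.gauss(0.0, 0.3), 0) for _ in range(40)] + \
       [(random.gauss(1.0, 0.3), 1) for _ in range(40)]
random.shuffle(data)

def knn_predict(train, k, x):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(lbl for _, lbl in nearest)
    return 1 if votes * 2 > k else 0

def cv_loss(k, folds=5):
    """5-fold cross-validation misclassification rate for a given k."""
    errs = 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        errs += sum(knn_predict(train, k, x) != y for x, y in test)
    return errs / len(data)

# Select the hyperparameter value that minimizes estimated CV loss,
# mirroring the bias/variance balancing described in the interview.
best_k = min([1, 3, 5, 7, 9], key=cv_loss)
print(best_k, cv_loss(best_k))
```

The same loop structure generalizes to any family of candidate models: evaluate each on held-out folds and keep the one with the lowest estimated loss.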

How did your team address potential data leakage and overfitting in the ML pipeline, especially with limited and ethically constrained biological samples?

Data leakage is a multifaceted problem that can occur at different levels, namely at the sample and sensor level. Regarding sample-level data leakage, we demonstrated that rigorous control over data distribution prevents leakage. Specifically, we ensured that observations from a single sample (each yielding 32 signals) were not split across cross-validation folds or between training and test datasets during 5-fold cross-validation. Our results showed that models trained with random data splits performed comparably to those with controlled sample distribution, both in validation and testing phases. This indicates that sample-level leakage was not a significant factor affecting model performance, as models were consistently evaluated on unseen data.
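The sample-level control described above amounts to group-aware splitting: every observation derived from one physical sample must land in the same fold. A minimal Python sketch of the idea follows (the data layout and fold assignment are hypothetical, not the authors' MATLAB pipeline):

```python
from collections import defaultdict

def group_kfold(observations, n_folds=5):
    """Split observations into folds so that all rows from the same
    sample stay together (no sample is split across folds)."""
    by_sample = defaultdict(list)
    for obs in observations:
        by_sample[obs["sample_id"]].append(obs)
    folds = [[] for _ in range(n_folds)]
    # Round-robin whole samples into folds, keeping each sample intact.
    for i, (sid, rows) in enumerate(sorted(by_sample.items())):
        folds[i % n_folds].extend(rows)
    return folds

# Hypothetical data: 10 samples, each yielding 32 sensor signals.
obs = [{"sample_id": s, "signal": f"s{s}_ch{c}"}
       for s in range(10) for c in range(32)]
folds = group_kfold(obs)

# Verify: no sample_id appears in more than one fold (no leakage).
first_fold = {}
for f_idx, fold in enumerate(folds):
    for row in fold:
        assert first_fold.setdefault(row["sample_id"], f_idx) == f_idx

print([len(f) for f in folds])  # → [64, 64, 64, 64, 64]
```

Comparing models trained with this controlled split against random splits, as the authors did, is what shows whether sample-level leakage actually affects performance.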

Sensor-level data leakage is more challenging to avoid. In theory, one might attempt to split training and testing sets in such a way that sensor data from the same sample are not shared across subsets. However, due to the limited number of sensors (32), implementing such a strategy would impose severe constraints, which we explain in our paper. Sensor-level separation is neither practical nor desirable in the context of sensor ablation experiments. Furthermore, since the sensor exclusion was done prior to data splitting, and was based on an external ranking independent of train/test distributions, the risk of sensor-level leakage influencing model results was minimal.

To eliminate any doubts, we considered a potential indirect channel of data leakage related to our application of the sensor utility algorithm to the entire dataset. The sensor utility algorithm evaluated sensor importance independently of the model training and testing process, providing a ranking that did not influence the data used for model development. Nonetheless, we conducted an additional experiment. We first split the data into 90% training and 10% test subsets, applying the sensor utility algorithm solely to the training dataset. The results, presented in Supporting Information #1 (Figures S4 and S5 for CASE I), revealed a sensor utility ranking identical to that obtained when the algorithm was applied to the entire dataset (see Figure 2 in the paper). Furthermore, as shown in Figure S6 (Supporting Information #1), model performance on the unseen test dataset remained high, despite the test data being excluded from the sensor utility evaluation.

We conducted similar experiments for CASE II and CASE III (see results in the article's Supporting Information #1: Figures S4, S5, and S6 for CASE II, and Figures S10, S11, and S12 for CASE III [1]).

These findings show that the sensor utility algorithm is robust, producing consistent sensor rankings regardless of whether it is applied to the full dataset or only the training subset. The high performance on unseen test data further confirms that our methodology minimizes data leakage, ensuring generalizable models. This robustness validates our approach and demonstrates that potential leakage channels do not have a significant impact on model performance.

Your study demonstrated strong classification performance between postmortem and antemortem samples. What were the key VOC features or sensor responses driving this separation?

We extracted 85 features from raw and smoothed-normalized sensor signals, encompassing statistical, time-domain, and frequency-domain characteristics. The complete list, including a description of each feature's meaning, can be found in the article's Supporting Information #1, Tables S15 and S16 (1).

How do sensor-level similarity coefficients inform sensor optimization and utility, and how might this method generalize to other e-nose configurations?

We conducted a feature importance analysis, identifying the top 10 predictors that significantly influenced classification accuracy (Figure 3f in the article [1]). Notably, individual predictor contributions remained below 7.0%, emphasizing the necessity of incorporating all 85 features for optimal model performance.
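One standard way to obtain such per-predictor contributions, shown here only as an illustration since the article's exact importance method is not detailed in this interview, is permutation importance: shuffle one feature column at a time and measure the resulting drop in accuracy. A toy Python sketch with a hypothetical dataset and a deliberately simple classifier:

```python
import random

random.seed(1)

# Toy dataset: 3 features, of which only feature 0 is truly informative.
def make_row():
    y = int(random.random() < 0.5)
    x0 = (1.0 if y else 0.0) + random.gauss(0, 0.3)
    return ([x0, random.gauss(0, 1), random.gauss(0, 1)], y)

data = [make_row() for _ in range(200)]

def accuracy(rows):
    """Trivial one-feature stump: predict class 1 when feature 0 > 0.5."""
    return sum((x[0] > 0.5) == bool(y) for x, y in rows) / len(rows)

base = accuracy(data)

# Permutation importance: shuffle one feature column at a time and
# record the drop in accuracy; a larger drop marks a more important feature.
drops = {}
for f in range(3):
    col = [x[f] for x, _ in data]
    random.shuffle(col)
    permuted = [(x[:f] + [v] + x[f + 1:], y) for (x, y), v in zip(data, col)]
    drops[f] = base - accuracy(permuted)

print(drops)  # only feature 0 matters to this stump
```

With 85 correlated features, as in the study, no single permutation drop dominates, which is consistent with the observation that individual contributions stayed below 7.0%.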

In Figure 4 of the article (1), we illustrate an overview of the classification pipeline from signal acquisition to final output, which can be adapted and generalized to other e-nose configurations.

What challenges arise when using pig tissue as a proxy for human decomposition, especially in early PMI estimation, and how did your models account for these differences?

Due to ethical reasons, it was necessary to use an animal proxy to estimate PMIs. Methodologically, our approach remains applicable to other animal tissues and, ultimately, to human decomposition. However, more experiments are needed to understand the challenges of using pig tissues rather than other types of tissues.

Given your results, how do you envision e-nose systems complementing or even replacing traditional forensic tools like GC-MS or canine detection?

Given our results and looking forward, e-nose systems could become an invaluable tool for forensic sciences, especially in field settings or rapid response scenarios. Their ability to quickly detect a broad range of odors, combined with portability, cost-effectiveness, and integration with machine learning for advanced pattern recognition, makes them a promising approach for on-site, early-stage investigations and forensic analysis. However, specialized tools like GC-MS and canine units remain essential and it is unlikely that e-nose systems will completely replace them. Instead, they will likely work alongside these tools, forming a more comprehensive and efficient forensic toolkit.

What potential do you see for portable e-nose systems in real-time forensic fieldwork, such as disaster victim identification or mass casualty events?

Portable e-nose systems hold significant potential for real-time forensic fieldwork, particularly in disaster victim identification and mass casualty events. These systems could quickly scan large areas for decomposition odors or hazardous chemicals, helping first responders identify human remains or assess chemical exposure in real time. E-noses could complement traditional methods by providing rapid, on-site detection, reducing the need for costly and time-consuming laboratory analysis. In mass casualty situations, they could aid in triage by identifying chemical or biological markers and assist with victim profiling. Additionally, e-noses could integrate with other field technologies like drones for wider coverage. While challenges such as sensitivity and environmental factors remain, the portability and speed of e-noses make them valuable tools for enhancing response times, prioritizing search efforts, and improving outcomes in forensic investigations.

How did feature importance analysis (for example, max1stDeriv, complexity, SNR) contribute to understanding model decisions, and do you foresee these features being standardized in future forensic ML workflows?

Feature importance analysis plays a crucial role in understanding how ML models make decisions, particularly in complex systems like forensic odor detection. By evaluating metrics such as max1stDeriv, complexity, and SNR, it becomes possible to interpret which features most significantly contribute to the model’s predictions. These features are indicative of different aspects of the data—max1stDeriv reflects the rate of change in the sensor’s response, complexity measures the variability or richness of the signal, and SNR gauges the clarity of the signal relative to background noise. By analyzing these factors, forensic experts can gain insights into which specific patterns or sensor responses the model relies on for accurate classification, thus enhancing the transparency and trustworthiness of the system.
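For illustration, the three features named above can be computed from a raw response curve roughly as follows. This Python sketch is not the authors' feature-extraction code; in particular, the exact definition of "complexity" in the paper may differ from the first-difference variability used here, which is an assumption.

```python
import math

def max_first_deriv(signal):
    """Largest sample-to-sample change: how fast the sensor responds."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

def snr(signal):
    """Mean over standard deviation: signal clarity relative to noise."""
    m = sum(signal) / len(signal)
    sd = math.sqrt(sum((x - m) ** 2 for x in signal) / len(signal))
    return m / sd if sd else float("inf")

def complexity(signal):
    """Assumed richness measure: standard deviation of the first
    differences, a simple proxy for signal variability."""
    d = [b - a for a, b in zip(signal, signal[1:])]
    m = sum(d) / len(d)
    return math.sqrt(sum((x - m) ** 2 for x in d) / len(d))

# Toy rise-and-plateau response typical of a MOS sensor exposure cycle.
sig = [0.0, 0.1, 0.5, 1.2, 1.8, 2.0, 2.05, 2.07, 2.06, 2.05]
print(round(max_first_deriv(sig), 3),
      round(snr(sig), 3),
      round(complexity(sig), 3))
```

Feeding descriptors like these into the classifier, rather than raw curves, is what lets the model's decisions be traced back to interpretable aspects of the sensor response.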

Standardization in future forensic ML workflows would allow for consistent model interpretation and cross-application compatibility, making it easier to deploy these systems in varied forensic contexts, from crime scene investigation to disaster victim identification. It would also enable better calibration of e-noses and other sensor-based systems, improving the robustness and reliability of forensic decision-making. As ML-driven forensic tools become more prevalent, standardizing feature importance metrics will be essential for ensuring model reliability, interpretability, and general acceptance in legal and ethical frameworks as well as within scientific communities.

With future work aiming to classify specific days postmortem, what technical or data challenges must be overcome to achieve reliable day-level PMI resolution in diverse environments?

Achieving reliable day-level PMI estimation in diverse environments presents several challenges. Variability in environmental conditions, such as temperature and humidity, significantly affects decomposition rates, requiring systems to normalize and correlate environmental data with odor signals. Current sensors need improved sensitivity to detect subtle decomposition compounds over time and distinguish them from background odors. Additionally, the evolving nature of decomposition mixtures complicates precise PMI estimation, as different compounds dominate at various stages. A major hurdle is obtaining diverse, well-labeled datasets for training models, as ethical constraints limit access to controlled data. To address these issues, machine learning models must be robust enough to generalize across different environments, handle time-series data, and integrate multimodal data such as temperature and humidity. Overcoming these challenges will be essential to accurately predicting PMI in real-world forensic scenarios.

References

  1. Shtepliuk, I.; Montelius, K.; Eriksson, J.; Puglisi, D. Adaptive Machine Learning for Electronic Nose-Based Forensic VOC Classification. Adv. Sci. (Weinh.) 2025, e04657. DOI: 10.1002/advs.202504657
  2. Persaud, K.; Dodd, G. Analysis of Discrimination Mechanisms in the Mammalian Olfactory System Using a Model Nose. Nature 1982, 299 (5881), 352–355. DOI: 10.1038/299352a0

Donatella Puglisi is an Associate Professor at Linköping University, Sweden, with a robust academic career widely recognized internationally. With expertise in interdisciplinary applied physics, her work focuses on gas sensor systems, particularly machine learning-enhanced electronic noses for advanced odor detection in a variety of applications such as forensic and health diagnostics, indoor air quality and environmental monitoring. Puglisi has led over 25 high-impact research and innovation projects across various fields, securing over €17 million in funding and contributing significantly to the advancement of gas sensor technologies addressing societal challenges and real-world applications. She has served as an evaluation expert for major research funding bodies, such as the COST Action and Dutch Research Council, evaluator of academic positions and doctoral theses, member of expert panels, working group member, and research collaborator in international contexts. Puglisi has published 67 peer-reviewed papers and two book chapters, and has presented her research at over 100 conferences and workshops. Photo courtesy of Puglisi.


