Computational machine learning estimation of digitoxin solubility in supercritical solvent at different temperatures utilizing ensemble methods

Article
Open access
Published: 10 August 2025

Scientific Reports volume 15, Article number: 29248 (2025) Cite this article

Abstract

The solubility of medications in supercritical solvent is the most important factor that can be determined via appropriate computational tools. This work explores the modeling of digitoxin solubility as the case study in supercritical CO₂ and solvent density utilizing ensemble methods. Temperature and pressure are the input parameters, while solvent density and digitoxin solubility are the output parameters. Several machine learning models along with optimizer were used for correlation of the dataset. Employing AdaBoost as an ensemble method, predictions from Bayesian Ridge Regression (BRR), Gaussian process regression (GPR), and K-nearest neighbors (KNN) are amalgamated. Sailfish Optimizer (SFO) is utilized for hyper-parameter tuning to enhance model performance. Results reveal that AdaBoost combined with ADA-GPR exhibits the lowest Average Absolute Relative Deviation (AARD%) values, with solubility achieving 7.74 and solvent density reaching 2.76, respectively. This underscores the efficacy of ensemble methods and hyper-parameter tuning in accurately predicting complex chemical properties in supercritical CO₂ systems.

Introduction

To enhance the solubility of medicines, size reduction is the most commonly used method which can be implemented for various solid-dosage formulations. Those drugs that are classified as BCS class II, possess low solubility in aqueous solutions. The drugs with low solubility and high permeability are classified in this group according to Biopharmaceutical Classification System (BCS)¹. Size reduction will enhance the solubility and consequently the bioavailability owing to the higher surface energy of drug powder in the nanosize². Therefore, this approach is suitable in the pharmaceutical industry for improving the properties of medications with poor solubility. Another approach for enhancing the solubility is amorphization by which a drug is transformed from crystalline to amorphous state with higher solubility. However, the problem with the amorphous state is its poor stability which recrystallizes over time and loses the properties of amorphous state. As such, it must be stabilized which can be done via distribution of drugs inside a polymeric matrix to prevent mobility and recrystallization^3,4.

For size reduction, supercritical method can be employed, which is a newly developed process for pharmaceutical processing where it has the ability to reduce the size of particles in a continuous model of operation. The method is also mentioned as green processing due to the lack of organic solvent in the size reduction operation, thereby offering sustainable processing^5,6,7. For size reduction using this method, the drug must be dissolved in the solvent at an acceptable value, thus the solubility of drug in supercritical solvent is the limiting step for development of this green process. As such, the solubility of medicine in the supercritical solvent should be estimated via robust methods to ensure the reliability of method in precise determination of solubility prior to the operation. Computational techniques can be used as screening tools for assessment of drugs nanoparticle production in this process. Thermodynamic approach is a reliable tool for estimating drug solubility in supercritical solvents (e.g., CO₂), which is built based on phase equilibria, such as solid-liquid equilibrium^8,9.

Despite the advantages of thermodynamic models for correlating medicines solubility in supercritical solvents, the models are not easy to be applied and there is sometimes complexity in implementing these models. As such, development of models with less complexity and generality would be preferred for estimating drugs solubility in supercritical solvents. Data-driven approach is an alternative for correlation of drugs solubility which rely on the availability of measurements and dataset. Some machine learning and optimizer algorithms have been implemented for estimation of drugs solubility in supercritical carbon dioxide, and reports have shown the models are of great accuracy in this field^10,11,12,13. One of the data-driven models is machine learning (ML) where its domain is currently undergoing significant growth and has been widely employed in various applications, including the analysis of measured data for the purpose of optimization, regression, and prediction¹⁴. AI (Artificial Intelligence) based models and deep learning (DL) have been recently studied for pharmaceutical and energy applications^15,16,17 which can be further explored for solubility analysis of medications.

Machine learning techniques facilitate the creation of accurate and reliable models that can identify patterns within complex datasets and produce predictions based on those patterns^18,19,20. This study utilizes AdaBoost as an ensemble method to incorporate predictions from Gaussian process regression (GPR), Bayesian Ridge Regression (BRR), and K-nearest neighbors (KNN). The Sailfish Optimizer (SFO) is employed for the purpose of hyper-parameter tuning in to enhance the performance of the model.

In the field of Bayesian theory and statistical learning theory, GPR has emerged as a useful machine learning technique. The framework provides a flexible structure for executing probabilistic regression and is widely utilized for tackling regression issues marked by high dimensionality, limited sample sizes, or nonlinearity²¹. Bayesian Ridge Regression models are a combination of Bayesian inference and regression techniques. This method provides more reliable and efficient estimation of coefficients in linear models compared to least squares regression²². The KNN regression model is a data-driven technique employed for the purpose of predicting short-term traffic flow. The algorithm operates by identifying the K most analogous instances to a provided input and calculating their mean as the prediction²³.

So, the current study develops a methodology based on ML for modeling drug solubility in supercritical CO₂ and makes correlation with pressure and temperature. The models included GPR, BRR, and KNN which are optimized using Sailfish Optimizer. Digitoxin is selected in this study for implementing methodology and assessing the reliability of models and optimizer.

Materials and methods

Data of digitoxin solubility

A number of measured data was applied for developing and testing the models which are collected from reference²⁴ and its statistical analysis is presented in Table 1 which are based on raw data. The drug used for the analysis is digitoxin, and its solubility in supercritical CO₂ was collected and used in this work for building ML models and optimizer. The same data was already used by Li et al.²⁵ to build several ML techniques in estimation of digitoxin solubility. Temperature and pressure are the two input features in this dataset owing to their significant impact on the solubility of digitoxin. As the solvent is compressible, the influence of pressure should be evaluated to understand its effect on digitoxin solubility changes. Figure 1 shows histograms of all parameters of the dataset. Thus, the machine learning models were developed utilizing two inputs and two outputs which are solubility of drug and the density of solvent.

Table 1 Statistical overview of dgitoxin solubility data used in this study.

Sailfish optimizer (SFO)

Enhancing the effectiveness of machine learning algorithms hinges crucially on optimizing hyper-parameters. The choice of appropriate hyper-parameters profoundly influences both the accuracy and the generalization capabilities of these algorithms²⁶. In the last several years, there has been escalating interest in nature-inspired optimization methods, owing to their efficacy in fine-tuning hyper-parameters²⁷. The Sailfish Optimizer (SFO) is an algorithm that is influenced by the synchronized swimming behavior observed in sailfish²⁸.

The SFO algorithm is a metaheuristic method inspired by the hunting and cooperative behavior of marine sailfish²⁹. Mathematically, the Sailfish Optimizer functions by iteratively adjusting the hyper-parameters through a series of equations that mimic the motion of the sailfish. The equation governing the update of the position of the i-th sailfish in D-dimensional space can be expressed as follows²⁸:

$$\:{X}_{i}^{t+1}={X}_{i}^{t}+{V}_{i}^{t+1}$$

where $\:{X}_{i}^{t}$ stands for the position of the i-th sailfish at t-th iteration, and$\:{V}_{i}^{t+1}$ denotes the velocity vector of the i-th sailfish at the next iteration²⁸.

ADABOOST

AdaBoost, an abbreviation for “adaptive boosting,” is a widely utilized machine learning approach that surpasses simpler algorithms in predictive accuracy. This method involves adjusting a weighted combination of functions to align with the aggregated data, determining the total error, and then modifying an initial function accordingly. At each step, multiple base models are implemented on the adjusted dataset, and their respective errors are assessed. Consequently, the mistake from one iteration influences the subsequent model in every cycle. The final prediction is derived by assigning appropriate weights to each function and summing their contributions once the adjusted error is minimized to zero^30,31.

Base models

We utilized Gaussian Process Regression (GPR) as one of the base models. A Gaussian process (GP) is a collection of random variables some of which have Gaussian distributions (GDs)^32,33. The covariance and mean functions serve as effective metrics for evaluating the performance of a GP. In the GPR framework, GDs are extended, with the mean represented as a vector and the covariance as a matrix^34,35.

What distinguishes GPR from other regression models is its lack of necessity for a precise specification of a fitting function. Instead, it utilizes statistical models that approximate a random sample from a multidimensional GD to interpret field data³⁴.

The second base model is Bayesian Ridge Regression (BRR). BRR combines Bayesian inference with regression models. This method estimates coefficients in linear models more robustly and efficiently than least squares regression. BRR estimates linear model coefficients using prior data beliefs. A prior distribution encodes prior beliefs, which are integrated with the likelihood function to create the posterior distribution. Estimating linear model coefficients using this distribution allows data estimation. We assume a normal distribution of regression coefficients with a zero mean and an alpha hyper-parameter that determines accuracy (inverse variance).

The likelihood function follows a normal distribution, where the linear regression model predicts the mean and the additional hyperparameter, lambda, determines the variance. The aim is to ascertain the probable values of the regression coefficients β by leveraging existing data and prior information. The subsequent equation defines the posterior distribution of β³⁶:

$$\:p(\beta\:\mid\:X,y,\alpha\:,\lambda\:)=\text{N}(\beta\:\mid\:\mu\:,\varSigma\:)$$

The mean vector represented by $\:\mu\:$ and the covariance matrix $\:\varSigma\:$ in the provided equation denote the statistical properties of the posterior distribution. Analytical calculations are performed using the Bayesian formula to determine the values of these parameters³⁷:

$$\:\mu\:={(\lambda\:\cdot\:{X{\prime\:}X}^{\cdot\:}+\alpha\:\cdot\:I)}^{-1}\cdot\:{X{\prime\:}y}^{\cdot\:}$$

$$\:\varSigma\:={(\lambda\:\cdot\:{X{\prime\:}X}^{\cdot\:}+\alpha\:\cdot\:I)}^{-1}$$

In the context mentioned, $\:X{\prime\:}X$ stands for the result of transposing the matrix of independent variables and then multiplying it by itself, while $\:X{\prime\:}y$ denotes the product of transposing the matrix of independent variables (input) and multiplying it by the matrix of dependent variables (output). Additionally, the symbol $\:I$ denotes the identity matrix. More details are reported elsewhere about the model^38,39,40.

Results and discussion

The models implementation was performed using Python software, 3.8 version, accessible at: https://www.python.org. The hyperparameters of the proposed models were optimized via SFO algorithm as previously outlined. The optimized models were used for evaluation of drug solubility and finding which model can best predict digitoxin solubility in the supercritical solvent. The results obtained for each regression model in terms of Average Absolute Relative Deviation (AARD%), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) are summarized in Tables 2 and 3 for both solubility and density, respectively. The errors between the estimated and measured values have been calculated and used as the main metrics for evaluation of the models’ precision.

The findings reveal that the ADA-GPR model outperforms the other models in terms of solubility and solvent density prediction. Figures 2 and 3 show a comparison of expected and predicted values for both outputs using this model. The ADA-GPR model achieved the lowest AARD% values for both predictions. Specifically, for solubility prediction, ADA-GPR achieved an AARD% of 7.74721E + 00, while for solvent density prediction, it attained an AARD% of 2.76323E + 00. Additionally, the ADA-GPR model demonstrated the lowest RMSE and MAE values, further highlighting its robust performance for estimation of digitoxin solubility.

The comparative analysis of the best model in this work is indicated in Table 4. ADA-GPR model optimized in this study has AARD% of 7.74 which is lower than the traditional thermodynamic models based on Equation of State (EoS) developed by Sheikhi-Kouhsar et al.²⁴.

Table 2 Solubility predictions.

Table 3 Density of solvent predictions.

Table 4 Comparison between this work and previous study for solubility estimation.

The exceptional efficacy demonstrated by the ADA-GPR model can be ascribed to its adeptness in capturing intricate correlations within the dataset and its adaptability in depicting nonlinear patterns. The ADA-BRR model also performed reasonably well, particularly for solubility prediction, but it is behind the ADA-GPR model in terms of fitting accuracy. On the other hand, the ADA-KNN model exhibited the highest AARD% values and the poorest performance among the three models. The final analysis was conducted using the ADA-GPR model based on the available data. Final prediction surfaces are shown in Figs. 4 and 5. Also, Figs. 6, 7, 8 and 9 are the partial dependencies between inputs and outputs. Confirmation is observed for the results obtained in this research through comparing with the previous studies on computation of pharmaceutical solubility, while the variations have been seen to be similar²⁵. The pressure and temperature effects on the solubility and density in this work show agreement with previous works which reported the use of machine learning in drugs solubility correlation^25,41,42,43.

The results confirm the validity of the methodology designed in this work, as the observations match the experimental trends. For the drug solubility versus temperature, a decreasing trend is seen in Fig. 9 for low pressure, while for higher pressure, the solubility increases with temperature. This observation in the drug solubility is related to the cross-over pressure zone which can change the path of solubility variations by enhancing the temperature. As for density, the trend is justified, and one can observe that the density is increased with rising pressure and decreased with increasing T. The trend of pressure is owing to the compressibility of the solvent that is supercritical CO₂^25,44. Thus, it is clearly revealed by the models in this study that the digitoxin solubility in the solvent can be significantly varied by adjusting the pressure as well as temperature to achieve the target value for solubility⁴². A function can be then defined to reversely estimate the values of pressure and temperature for a given digitoxin solubility by the aid of optimized ML model. Furthermore, the combined effects of density variations on the solubility can be analyzed as the models are able to determine density values versus temperature and pressure.

Conclusion

In this research work, we investigated the prediction of solubility of digitoxin and the density of the solvent in supercritical CO₂ utilizing ensemble methods and regression models. We employed the AdaBoost ensemble method to combine the predictions from GPR, BRR, and KNN regression models. Hyper-parameter optimization was conducted utilizing the Sailfish Optimizer (SFO) to optimize the performance of the models. Our findings reveal that the AdaBoost ensemble method combined with GPR yielded the most accurate predictions for both solvent density and solubility, as evidenced by the lowest Average Absolute Relative Deviation (AARD%) values. Specifically, for solubility prediction, the AdaBoost-GPR combination achieved an AARD% of 7.74, while for solvent density prediction, it attained an AARD% of 2.76. These results underscore the effectiveness of ensemble methods and hyper-parameter tuning methods in accurately predicting complex chemical properties in supercritical CO₂ systems. The findings obtained from this study have the potential to contribute valuable knowledge to future research efforts focused on enhancing hyper-parameters and refining predictive modeling methods for analogous chemical systems.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

References

Csicsák, D. et al. The effect of the particle size reduction on the biorelevant solubility and dissolution of poorly soluble drugs with different acid-base character. Pharmaceutics 15(1), 278 (2023).
PubMed PubMed Central Google Scholar
Kumar, R. et al. Particle size reduction techniques of pharmaceutical compounds for the enhancement of their dissolution rate and bioavailability. J. Pharm. Innov. 17(2), 333–352 (2022).
Google Scholar
Rams-Baron, M. et al. Amorphous Drug Solubility and Absorption Enhancement, in Amorphous Drugs: Benefits and Challenges 41–68 (Springer International Publishing, 2018).
Zhuo, X. et al. Mechanisms of drug solubility enhancement induced by β-lactoglobulin-based amorphous solid dispersions. Mol. Pharm. 20(10), 5206–5213 (2023).
PubMed Google Scholar
Abourehab, M. A. S. et al. Theoretical investigations on the manufacture of drug nanoparticles using green supercritical processing: Estimation and prediction of drug solubility in the solvent using advanced methods. J. Mol. Liq. 120559 (2022).
Faris Alotaibi, H. et al. Pharmaceutical nanonization by green supercritical processing: investigation of exemestane anti-estrogenic medicine solubility using machine learning. J. Mol. Liq. 392, 123353 (2023).
Google Scholar
Zhang, Y. Analysis of nanonization and purification of organic compounds via green supercritical processing: model development using advanced hybrid techniques. Case Stud. Therm. Eng. 55, 104159 (2024).
Google Scholar
Faraz, O. et al. Thermodynamic modeling of pharmaceuticals solubility in pure, mixed and supercritical solvents. J. Mol. Liq. 353, 118809 (2022).
Google Scholar
Zarei, A., Haghbakhsh, R. & Raeissi, S. Overview and thermodynamic modelling of deep eutectic solvents as co-solvents to enhance drug solubilities in water. Eur. J. Pharm. Biopharm. 193, 1–15 (2023).
PubMed Google Scholar
Alanazi, M. et al. Development of a novel machine learning approach to optimize important parameters for improving the solubility of an anti-cancer drug within green chemistry solvent. Case Stud. Therm. Eng. 49, 103273 (2023).
Google Scholar
Cenci, F. et al. Predicting drug solubility in organic solvents mixtures: A machine-learning approach supported by high-throughput experimentation. Int. J. Pharm. 660, 124233 (2024).
PubMed Google Scholar
Ghazwani, M. et al. Development of advanced model for understanding the behavior of drug solubility in green solvents: machine learning modeling for small-molecule API solubility prediction. J. Mol. Liq. 386, 122446 (2023).
Google Scholar
Wang, C. et al. Prediction of enhanced drug solubility related to clathrate compositions and operating conditions: machine learning study. Int. J. Pharm. 646, 123458 (2023).
PubMed Google Scholar
Rabbani, Y. et al. Application of artificial neural networks and support vector regression modeling in prediction of magnetorheological fluid rheometery. Colloids Surf., A. 520, 268–278 (2017).
Google Scholar
Togun, H. et al. Advancing organic photovoltaic cells for a sustainable future: the role of artificial intelligence (AI) and deep learning (DL) in enhancing performance and innovation. Sol. Energy. 291, 113378 (2025).
Google Scholar
Halawani, R. F. et al. An advanced heat design-CO₂ capture network for an oxy-biogas fuel combustion cycle combined with a CAES-based method for peak shaving: An artificial intelligent-driven optimization. Renew. Energy. 242, 122474 (2025).
Google Scholar
Saini, J. P. S., Thakur, A. & Yadav, D. AI-driven innovations in pharmaceuticals: optimizing drug discovery and industry operations. RSC Pharm. 2(3), 437–454 (2025).
Google Scholar
Alpaydin, E. Introduction To Machine Learning (MIT Press, 2020).
Graish, M. S. et al. Prediction of the viscosity of iron-CuO/water-ethylene glycol non-Newtonian hybrid nanofluids using different machine learning algorithms. Case Stud. Chem. Environ. Eng. 11, 101180 (2025).
Google Scholar
Hajinajaf, N. et al. Integrated CO₂ capture and nutrient removal by microalgae chlorella vulgaris and optimization using neural network and support vector regression. Waste Biomass Valoriz. 13(12), 4749–4770 (2022).
Google Scholar
Zhikun, H. et al. Overview of Gaussian process regression. Control Decis. 28(8), 1121–1129 (2013).
Google Scholar
Shi, Q., Abdel-Aty, M. & Lee, J. A Bayesian ridge regression analysis of congestion’s impact on urban expressway safety. Accid. Anal. Prev. 88, 124–137 (2016).
Kang, S. K-nearest neighbor learning with graph neural networks. Mathematics 9(8), 830 (2021).
Google Scholar
Sheikhi-Kouhsar, M. et al. Solubility of digitoxin in supercritical CO₂: Experimental study and modeling. Eur. J. Pharm. Sci., 106731. (2024).
Li, M. et al. Employment of artificial intelligence approach for optimizing the solubility of drug in the supercritical CO₂ system. Case Stud. Therm. Eng. 57, 104326 (2024).
Google Scholar
Shang, Y. et al. Artificial neural network hyperparameters optimization for predicting the thermal conductivity of mxene/graphene nanofluids. J. Taiwan Inst. Chem. Eng. 164, 105673 (2024).
Google Scholar
Zhou, H. et al. Combination of group method of data handling neural network with multi-objective Gray Wolf optimizer to predict the viscosity of MWCNT-TiO2 -oil SAE50 nanofluid. Case Stud. Therm. Eng. 64, 105541 (2024).
Google Scholar
Shadravan, S., Naji, H. R. & Bardsiri, V. K. The sailfish optimizer: A novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng. Appl. Artif. Intell. 80, 20–34 (2019).
Google Scholar
Srivastava, A. & Das, D. K. A sailfish optimization technique to solve combined heat and power economic dispatch problem. In 2020 IEEE Students Conference on Engineering & Systems (SCES). (IEEE, 2020).
Schapire, R. E. The strength of weak learnability. Mach. Learn. 5(2), 197–227 (1990).
Google Scholar
Freund, Y. & Schapire, R. E. Experiments with a new boosting algorithm. In icml. ( Citeseer, 1996).
Grbić, R., Kurtagić, D. & Slišković, D. Stream water temperature prediction based on Gaussian process regression. Expert Syst. Appl. 40(18), 7407–7414 (2013).
Google Scholar
Ma, X., Xu, F. & Chen, B. Interpolation of wind pressures using Gaussian process regression. J. Wind Eng. Ind. Aerodyn. 188, 30–42 (2019).
Google Scholar
Rasmussen, C. E. Gaussian processes in machine learning. In Summer School on Machine Learning (Springer, 2003).
Song, H. et al. Advancing nanomedicine production via green method: modeling and simulation of pharmaceutical solubility at different temperatures and pressures. J. Mol. Liq. 411, 125806 (2024).
Google Scholar
Williams, P. M. Bayesian regularization and pruning using a Laplace prior. Neural Comput. 7(1), 117–143 (1995).
Google Scholar
Kruschke, J. K. Bayesian data analysis. Wiley Interdisciplinary Reviews: Cogn. Sci. 1(5), 658–676 (2010).
Google Scholar
Kudraszow, N. L. & Vieu, P. Uniform consistency of kNN regressors for functional variables. Stat. Probab. Lett. 83(8), 1863–1870 (2013).
Google Scholar
Cover, T. Estimation by the nearest neighbor rule. IEEE Trans. Inf. Theory. 14(1), 50–55 (1968).
MATH Google Scholar
Chen, C. R. & Three Kartini, U. K-nearest neighbor neural network models for very short-term global solar irradiance forecasting based on meteorological data. Energies. 10(2), 186 (2017).
Li, M. et al. Optimization of drug solubility inside the supercritical CO₂ system via numerical simulation based on artificial intelligence approach. Sci. Rep. 14(1), 22779 (2024).
PubMed PubMed Central Google Scholar
Almehizia, A. A. et al. Numerical optimization of drug solubility inside the supercritical carbon dioxide system using different machine learning models. J. Mol. Liq. 392, 123466 (2023).
Google Scholar
Meng, D. & Liu, Z. Machine learning aided pharmaceutical engineering: model development and validation for estimation of drug solubility in green solvent. J. Mol. Liq. 392, 123286 (2023).
Google Scholar
Aldawsari, M. F., Mahdi, W. A. & Alamoudi, J. A. Data-driven models and comparison for correlation of pharmaceutical solubility in supercritical solvent based on pressure and temperature as inputs. Case Stud. Therm. Eng. 49, 103236 (2023).
Google Scholar

Download references

Acknowledgements

The authors extend their appreciation to Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia for funding this work under researcher supporting project number (PNURSP2025R205).

Funding

This work was supported by Princess Nourah bint Abdulrahman University researchers supporting project number (PNURSP2025R205), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

Department of Pharmaceutical Sciences, College of Pharmacy, Princess Nourah Bint AbdulRahman University, 11671, Riyadh, Saudi Arabia
Hadil Faris Alotaibi
University of Warith Al-Anbiyaa, Kerbala, 56001, Iraq
Waqed H. Hassan
Department of Civil Engineering, College of Engineering, University of Kerbala, Kerbala, 56001, Iraq
Waqed H. Hassan
Al-Manara College for Medical Sciences, Amarah, Maysan, Iraq
Ahmed Kateb Jumaah Al-Nussairi
Faculty of Data Science and Information Technology, INTI International University, Persiaran Perdana BBN, Putra Nilai, 71800, Nilai, Malaysia
Narinderjit Singh Sawaran Singh
Department of Chemical Engineering and Petroleum Refining, Kut University College, Kut, Wasit, 52001, Iraq
Ahmed Salah Al-Shati
Department of Chemistry and Biochemistry, School of Sciences, JAIN (Deemed to be University), Bangalore, Karnataka, India
M. M. Rekha
Department of Biochemistry, IMS and SUM Hospital, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, 751003, India
Subhashree Ray
School of Applied and Life Sciences, Division of Research and Innovation, Uttaranchal University, Dehradun, Uttarakhand, India
Aashna Sinha
Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India
Gunjan Garg

Authors

Hadil Faris Alotaibi
Waqed H. Hassan
Ahmed Kateb Jumaah Al-Nussairi
Narinderjit Singh Sawaran Singh
Ahmed Salah Al-Shati
M. M. Rekha
Subhashree Ray
Aashna Sinha
Gunjan Garg

Contributions

Hadil Faris Alotaibi: Writing, Methodology, Investigation, Validation, Supervision, Funding acquisition.Waqed H. Hassan: Writing, Modeling, Investigation, Validation, Software.Ahmed Kateb Jumaah Al-Nussairi: Writing, Methodology, Resources, Software.Narinderjit Singh Sawaran Singh: Writing, Resources, Investigation, Software.Ahmed Salah Al-Shati: Writing, Methodology, Formal analysis, Software.Rekha M. M.: Writing, Methodology, Investigation, Validation, Conceptualization.Subhashree Ray: Writing, Conceptualization, Investigation, Validation, Software.Aashna Sinha: Writing, Methodology, Investigation, Validation, Resources.Gunjan Garg: Writing, Resources, Validation, Conceptualization.All authors reviewed the manuscript.

Corresponding author

Correspondence to Hadil Faris Alotaibi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Alotaibi, H.F., Hassan, W.H., Al-Nussairi, A.K.J. et al. Computational machine learning estimation of digitoxin solubility in supercritical solvent at different temperatures utilizing ensemble methods. Sci Rep 15, 29248 (2025). https://doi.org/10.1038/s41598-025-15049-x

Download citation

Received: 24 June 2025
Accepted: 05 August 2025
Published: 10 August 2025
DOI: https://doi.org/10.1038/s41598-025-15049-x