Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance

Osphanie Mentari; Muhammad Shujaat; Hilal Tayara; Kil Chong

doi:10.2174/0115748936269606231001140647

Authors: Mentari O.¹, Shujaat M.¹, Tayara H.², Chong K.¹
Affiliations:
1. Department of Electronics and Information Engineering, Jeonbuk National University
2. School of International Engineering and Science, Jeonbuk National University
Issue: Vol 19, No 7 (2024)
Pages: 641-650
Section: Life Sciences
URL: https://jdigitaldiagnostics.com/1574-8936/article/view/643987
DOI: https://doi.org/10.2174/0115748936269606231001140647
ID: 643987

Abstract

Background: Toxicity prediction is one of the problems in drug discovery that artificial intelligence can help solve. In drug-induced immune thrombocytopenia, toxicity can manifest in patients after five to ten days as significant bleeding caused by drug-dependent antibodies. In clinical trials, when this condition occurs, all drugs taken by the patient should be stopped, although this is sometimes not possible, especially for older patients who depend on their medication. Being able to predict toxicity in drug-induced immune thrombocytopenia is therefore very important. Computational approaches such as machine learning can help predict toxicity at lower cost and with faster processing than empirical techniques.

Objective: Previous studies used the k-nearest neighbors (KNN) method, but the performance of that approach leaves room for improvement. This study proposes a logistic regression model to improve accuracy.

Methods: We present a new machine learning model for predicting drug-induced immune thrombocytopenia. The model extracts several sets of features from Simplified Molecular Input Line Entry System (SMILES) strings. These features are fused and cleaned, and the most important ones are selected with the SelectKBest method. The classifier is a logistic regression whose hyperparameters are tuned by grid search with cross-validation.
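The selection and tuning steps described in the Methods can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for a fused descriptor matrix, and the scaling step, the `f_classif` score function, and the parameter grid values are assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fused, cleaned descriptor matrix
# (rows: compounds, columns: molecular descriptors). Not real data.
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection lives inside the pipeline so that it is refit on
# each cross-validation fold and never sees the held-out fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),                    # assumption: standardize descriptors
    ("select", SelectKBest(score_func=f_classif)),  # assumption: ANOVA F-score
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid; the values used in the paper are not given in the abstract.
param_grid = {
    "select__k": [10, 20, 40],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Placing SelectKBest inside the pipeline, rather than applying it once to the whole dataset, keeps the cross-validated accuracy from being inflated by features chosen with knowledge of the validation folds.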

Results: The highest accuracy, 80%, was obtained with the combination of features from PaDEL, CDK, RDKit, Mordred, and BlueDesc.

Conclusion: The proposed model outperforms previous studies in terms of accuracy. The information and source code are accessible online on GitHub: https://github.com/Osphanie/Thrombocytopenia

Keywords

Drug, immune, thrombocytopenia, toxicity, feature selection, machine learning, prediction, molecular descriptors.

