Prediction of Plant Ubiquitylation Proteins and Sites by Fusing Multiple Features


Abstract

Introduction: Protein ubiquitylation is an important post-translational modification (PTM) and is considered one of the most important processes regulating cell function and various diseases. Accurate prediction of ubiquitylation proteins and their PTM sites is therefore of great significance for the study of basic biological processes and the development of related drugs. Researchers have developed several large-scale computational methods to predict ubiquitylation sites, but there is still much room for improvement. Much ubiquitylation research is cross-species, yet life patterns are diverse and prediction methods tend to be species-specific in practical application. This study therefore focuses on plants and constructs computational methods for identifying plant ubiquitylation proteins and ubiquitylation sites.

Method: In this work, we constructed two predictive models to identify plant ubiquitylation proteins and sites. In the ubiquitylation protein prediction model, to better reflect protein sequence information and obtain better prediction results, features are extracted with a KNN scoring matrix model based on functional domain Gene Ontology (GO) annotation and with the word embedding models Skip-Gram and Continuous Bag of Words (CBOW), and the light gradient boosting machine (LGBM) is selected as the prediction engine.
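To make the word embedding step more concrete, the following minimal sketch shows how training examples for Skip-Gram and CBOW could be derived from a protein sequence. It is an illustration only, not the authors' implementation: treating overlapping k-mers as "words", the value k = 3, the context window size, and the function names `kmer_tokens`, `skipgram_pairs` and `cbow_examples` are all assumptions introduced here.

```python
from typing import List, Tuple


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (assumed "words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: each center word is used to predict each context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:  # skip sequences too short to have any context
            examples.append((context, center))
    return examples


# Hypothetical toy sequence, used only to show the shape of the output.
tokens = kmer_tokens("MKVLAAG", k=3)
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
print(tokens)
print(sg[:3])
print(cb[1])
```

The two models consume the same tokens but in opposite directions: Skip-Gram maps one center word to each neighbour, while CBOW maps the pooled neighbours to the center word. In practice the learned word vectors would typically be aggregated per protein (for example by averaging) before being passed to a classifier such as LGBM; that aggregation choice is likewise an assumption here.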

Results: For the protein prediction model, the accuracy (ACC), precision, recall, F1 score and AUC are 85.12%, 80.96%, 72.80%, 76.37% and 0.9193, respectively, in 10-fold cross-validation on the independent dataset. In the ubiquitylation site prediction model, Skip-Gram, CBOW and enhanced amino acid composition (EAAC) encodings were used to extract features from protein sequence fragments, and the predictions on the training and independent test data also achieved good performance.
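As an illustration of the EAAC encoding mentioned above, the sketch below computes amino acid frequencies inside a sliding window over a sequence fragment. It is a minimal sketch under stated assumptions rather than the authors' code: the window size of 5, the normalisation by window length, the alphabet ordering in `AMINO_ACIDS`, and the function name `eaac` are choices made here for illustration.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(fragment: str, window: int = 5) -> List[float]:
    """Sliding-window amino acid composition of a sequence fragment.

    For each window position, the frequency of every standard amino acid
    inside the window is appended, giving (len(fragment) - window + 1) * 20
    features. Non-standard symbols (e.g. padding characters) match no amino
    acid and so only lower the frequencies in their window (an assumption).
    """
    if window < 1 or window > len(fragment):
        raise ValueError("window must be between 1 and the fragment length")
    features: List[float] = []
    for start in range(len(fragment) - window + 1):
        sub = fragment[start:start + window]
        for aa in AMINO_ACIDS:
            features.append(sub.count(aa) / window)
    return features


# Hypothetical toy fragment; real fragments would be centred on a lysine (K).
vec = eaac("AAAAAK", window=5)
print(len(vec))
```

Unlike plain amino acid composition, which yields one 20-dimensional vector per fragment, this windowed variant keeps positional information, because each window contributes its own block of 20 frequencies.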

Conclusion: In summary, the comparison results demonstrate that our models have a clear advantage in predicting ubiquitylation proteins and sites, and they may provide useful insights for studying the mechanisms and modulation of ubiquitination pathways.

About the authors

Meng-Yue Guan

School of Information Engineering, Jingdezhen Ceramic University

Email: info@benthamscience.net

Wang-Ren Qiu

School of Information Engineering, Jingdezhen Ceramic University

Author for correspondence.
Email: info@benthamscience.net

Qian-Kun Wang

School of Information Engineering, Jingdezhen Ceramic University

Email: info@benthamscience.net

Xuan Xiao

School of Information Engineering, Jingdezhen Ceramic University

Author for correspondence.
Email: info@benthamscience.net


Copyright (c) 2024 Bentham Science Publishers