Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data


Abstract

In the current state of genomics and biomedical research, Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) have emerged as paradigm shifters. Traditional NGS DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, but the volume and complexity of sequencing data have surged, creating a demand for more efficient and accurate methods of analysis. This has led to a growing dependence on AI/ML and DL approaches. This paper highlights how these approaches help overcome the limitations of traditional pipelines and generate better results. With pipeline automation and the integration of these tools into the NGS DNA and RNA-seq pipeline, the quality of research can be improved, as large data sets can be processed using deep learning tools. Automation reduces labor-intensive tasks and lets researchers focus on other frontiers of research. In the traditional pipeline, every task from quality check to variant identification (in the case of SNP detection) takes a large amount of computational time, and the researcher has to enter commands manually, a step prone to human error. With automation, the whole process can run more smoothly and in comparatively less time, because the automated pipeline can process multiple files instead of the single file handled at a time in the traditional pipeline. In conclusion, this review sheds light on the transformative impact of integrating DL into traditional pipelines and its role in optimizing computational time. It also highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.
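As a rough illustration of the multi-file automation described above, the following minimal Python sketch plans and runs the same ordered stages for every input file. The stage names, the `plan_pipeline` and `run_pipeline` helpers, and the `runner` callback are hypothetical and are not taken from the paper; a real pipeline would replace the step labels with calls to actual QC, trimming, alignment and variant-calling tools.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stage labels loosely following the abstract
# (quality check through variant identification). They are
# placeholders, not real tool invocations.
STAGES: List[str] = ["quality_check", "trimming", "alignment", "variant_calling"]


def plan_pipeline(fastq_files: List[str]) -> Dict[str, List[str]]:
    """Return, for every input file, the ordered list of step labels to run."""
    plan: Dict[str, List[str]] = {}
    for path in fastq_files:
        # Derive a sample name, e.g. "dir/sampleA.fastq.gz" -> "sampleA".
        sample = Path(path).name.split(".")[0]
        plan[sample] = [f"{stage}:{sample}" for stage in STAGES]
    return plan


def run_pipeline(fastq_files: List[str], runner: Callable[[str], None]) -> int:
    """Run every planned step for every file; return the number of steps run."""
    executed = 0
    for steps in plan_pipeline(fastq_files).values():
        for step in steps:
            runner(step)  # assumption: the runner executes or logs one step
            executed += 1
    return executed
```

Passing `print` (or a list's `append` method) as `runner` gives a dry run that lists the steps for all files without executing any external tool.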

About the authors

Chandrashekar K.

Department of Biotechnology, RV College of Engineering, Bengaluru-560059, affiliated to Visvesvaraya Technological University (VTU)

Email: info@benthamscience.net

Vidya Niranjan

Department of Biotechnology, RV College of Engineering, affiliated to Visvesvaraya Technological University (VTU)

Author for correspondence.
Email: info@benthamscience.net

Adarsh Vishal

Centre of Excellence Computational Genomics, RV College of Engineering

Email: info@benthamscience.net

Anagha Setlur

Department of Biotechnology, RV College of Engineering, Bengaluru-560059, affiliated to Visvesvaraya Technological University (VTU)

Email: info@benthamscience.net

Copyright (c) 2024 Bentham Science Publishers