Recommendations for Bioinformatic Tools in lncRNA Research


Abstract

Long non-coding RNAs (lncRNAs) are typically defined as non-protein-coding RNAs longer than 200 nucleotides. Although lncRNAs were historically dismissed as transcripts of junk DNA, more than two decades of research have revealed that they bind to other macromolecules (e.g., DNA, RNA, and/or proteins) to modulate signaling pathways and maintain organism viability. Their discovery has been significantly aided by the development of bioinformatics tools in recent years. However, the diversity of tools for lncRNA discovery and functional prediction can present a challenge for researchers, especially bench scientists and clinicians. This Perspective article surveys the current landscape of bioinformatic tools suitable for both protein-coding and lncRNA genes. It is intended as a guide to help bench scientists and clinicians select the appropriate tools for their research questions and experimental designs.

About the authors

Rebecca Distefano

Department of Biology, University of Copenhagen

Email: info@benthamscience.net

Mirolyuba Ilieva

Center for RNA Medicine, Department of Clinical Medicine, Aalborg University

Email: info@benthamscience.net

Sarah Rennie

Department of Biology, University of Copenhagen

Author for correspondence.
Email: info@benthamscience.net

Shizuka Uchida

Center for RNA Medicine, Department of Clinical Medicine, Aalborg University

Author for correspondence.
Email: info@benthamscience.net

Copyright (c) 2024 Bentham Science Publishers