Vol 19, No 1 (2024)

Life Sciences

pages 1-2

Translation of Circular RNAs: Functions of Translated Products and Related Bioinformatics Approaches

Hwang J., Kook T., Paulus S., Park J.

Abstract

Over the past two decades, studies have discovered a special form of alternative splicing (AS), known as back-splicing, that produces circular RNA (circRNA). This stands in contrast to canonical AS, which produces linear RNA. Although circRNAs have garnered considerable attention in the scientific community for their biogenesis and functions, most studies have focused on their regulatory roles, on the assumption that circRNAs are non-coding. As non-coding RNAs, they may regulate mRNA transcription, tumor initiation, and translation by sponging miRNAs and RNA-binding proteins (RBPs). Beyond these regulatory roles, however, recent studies have provided strong evidence that circRNAs can be translated. circRNA translation is expected to play an important role in promoting cancer cell growth and activating molecular pathways related to cancer development. In some cases, circRNA translation has been shown to be driven efficiently by an internal ribosome entry site (IRES). Computational tools that identify and characterize circRNA translation using high-throughput sequencing data and IRES information increase the number of identifiable circRNA-derived proteins, which in turn substantially helps researchers understand the functional roles of these proteins. New web resources have also been developed to aggregate, catalog, and visualize translational information on circRNAs from previous studies. In this paper, general concepts of circRNA, circRNA biogenesis, circRNA translation, and existing circRNA tools and databases are summarized to provide new insight into circRNA studies.
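
Because a circRNA has no free 5' or 3' end, an open reading frame (ORF) can run through the back-splice junction and, if no in-frame stop codon is met, keep going around the circle. The sketch below is a minimal illustration of that idea, not a tool from the review. It assumes the circRNA is given as a DNA-alphabet string whose last base joins its first, and the names `circular_orfs`, `min_codons`, and `max_laps` are hypothetical. Real pipelines combine such scans with evidence such as ribosome profiling, mass spectrometry, or IRES prediction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(seq: str, pos: int) -> str:
    """Return the codon starting at pos, wrapping across the back-splice junction."""
    n = len(seq)
    return "".join(seq[(pos + k) % n] for k in range(3))


def circular_orfs(seq: str, min_codons: int = 10, max_laps: int = 3):
    """Scan a circular sequence for ATG-initiated ORFs (hypothetical helper).

    Returns (start, length_nt, crosses_junction) tuples. length_nt is None when
    no in-frame stop codon is found within max_laps passes (a "rolling" ORF).
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        pos, codons = start, 0
        while pos - start < max_laps * n:
            codon = codon_at(seq, pos)
            pos += 3
            if codon in STOP_CODONS:
                length = pos - start  # includes the stop codon
                if codons >= min_codons:  # sense codons before the stop
                    results.append((start, length, start + length > n))
                break
            codons += 1
        else:
            # No stop codon within max_laps passes: translation could continue around the circle
            results.append((start, None, True))
    return results


# Toy circRNA: the ORF starts at the ATG at index 6, crosses the junction, and stops at TAA
print(circular_orfs("CCCTAAATGGGG", min_codons=3))  # [(6, 12, True)]
```

An ORF that only exists because reading wraps past the end of the linear string is exactly the kind of junction-spanning product that distinguishes circRNA translation from translation of the corresponding linear transcript.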

Current Bioinformatics. 2024;19(1):3-13

Recommendations for Bioinformatic Tools in lncRNA Research

Distefano R., Ilieva M., Rennie S., Uchida S.

Abstract

Long non-coding RNAs (lncRNAs) typically refer to non-protein-coding RNAs longer than 200 nucleotides. Although lncRNAs were historically dismissed as products of junk DNA, over two decades of research have revealed that they bind to other macromolecules (e.g., DNA, RNA, and/or proteins) to modulate signaling pathways and maintain organism viability. Their discovery has been significantly aided by the development of bioinformatics tools in recent years. However, the diversity of tools for lncRNA discovery and functional prediction can present a challenge for researchers, especially bench scientists and clinicians. This Perspective article navigates the current landscape of bioinformatic tools suitable for both protein-coding and lncRNA genes, and aims to guide bench scientists and clinicians in selecting the appropriate tools for their research questions and experimental designs.

Current Bioinformatics. 2024;19(1):14-20

Computational Methods for Functional Characterization of lncRNAs in Human Diseases: A Focus on Co-Expression Networks

Jha P., Barbeiro M., Lupieri A., Aikawa E., Uchida S., Aikawa M.

Abstract

Treatment of many human diseases involves small-molecule drugs. Some target proteins, however, are not druggable with traditional strategies. Innovative RNA-targeted therapeutics may overcome such a challenge. Long noncoding RNAs (lncRNAs) are transcribed RNAs that are not translated into proteins. Their ability to interact with DNA, RNA, microRNAs (miRNAs), and proteins makes them an interesting target for regulating gene expression and signaling pathways. In the past decade, a catalog of lncRNAs has been studied in several human diseases. One of the challenges in lncRNA studies is their lack of coding potential, which makes it difficult to characterize them functionally in wet-lab experiments. Several computational tools have thus been designed to characterize lncRNA functions, centered on lncRNA interactions with proteins and RNA, especially miRNAs. This review comprehensively summarizes the methods and tools for predicting lncRNA-RNA and lncRNA-protein interactions. We discuss tools for lncRNA interaction prediction built on commonly used models: ensemble-based, machine-learning-based, molecular-docking, and network-based computational models. In biology, genes that are co-expressed tend to have similar functions. Co-expression network analysis is, therefore, one of the most widely used methods for understanding the function of lncRNAs. A major focus of our study is to compile literature related to the functional prediction of lncRNAs in human diseases using co-expression network analysis. In summary, this article provides relevant information on the use of appropriate computational tools for the functional characterization of lncRNAs to help wet-lab researchers design mechanistic and functional experiments.
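
The guilt-by-association idea behind co-expression analysis can be sketched in a few lines: correlate an lncRNA's expression profile with every gene's profile, keep strongly correlated genes as network neighbors, and tally their known annotations as candidate functions. This is a minimal sketch, not a method from the review. The Pearson threshold of 0.9, the use of absolute correlation (so negatively correlated genes also count), and the toy expression and annotation data are all assumptions. Dedicated packages such as WGCNA add soft thresholding and module detection on top of this basic idea.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_neighbors(target, expression, threshold=0.9):
    """Genes whose |r| with the target's profile meets the (assumed) threshold."""
    ref = expression[target]
    neighbors = {}
    for gene, profile in expression.items():
        if gene == target:
            continue
        r = pearson(ref, profile)
        if abs(r) >= threshold:
            neighbors[gene] = round(r, 3)
    return neighbors


def candidate_functions(neighbors, annotations):
    """Count annotation terms among co-expressed neighbors (guilt by association)."""
    return Counter(term for gene in neighbors for term in annotations.get(gene, []))


# Hypothetical expression values across five samples
expression = {
    "LNC1": [1, 2, 3, 4, 5],
    "GENE_A": [2, 4, 6, 8, 10],
    "GENE_B": [5, 4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 5, 4],
}
annotations = {
    "GENE_A": ["inflammation"],
    "GENE_B": ["inflammation", "apoptosis"],
    "GENE_C": ["cell cycle"],
}

neighbors = coexpressed_neighbors("LNC1", expression)
print(neighbors)  # {'GENE_A': 1.0, 'GENE_B': -1.0}
print(candidate_functions(neighbors, annotations).most_common(1))  # [('inflammation', 2)]
```

GENE_C (r = 0.8) falls below the assumed cut-off, so its "cell cycle" annotation is not transferred to LNC1.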

Current Bioinformatics. 2024;19(1):21-38

miRNA, siRNA, and lncRNA: Recent Development of Bioinformatics Tools and Databases in Support of Combating Different Diseases

Chakraborty C., Bhattacharya M., Ranjan Sharma A.

Abstract

Today, bioinformatics tool and database development is one of the most significant research areas in computational biology. Computational biologists are developing diverse bioinformatics tools and databases across the various fields of biological science. Several non-coding RNAs (ncRNAs), which act as mediators in the regulation of gene expression, have been studied extensively. An ncRNA is a functional RNA molecule that is transcribed from the mammalian genome but not translated into protein. ncRNAs also participate in disease-related regulatory pathways. Based on size, ncRNAs can be classified into three categories: small ncRNA (~18–30 nt), medium ncRNA (~30–200 nt), and long ncRNA (from 200 nt to several hundred kb). miRNAs and siRNAs are two types of small ncRNA. Various bioinformatics tools and databases have recently been developed to study the disease associations of different ncRNAs (miRNAs, siRNAs, and lncRNAs). We have illustrated different bioinformatics resources, such as in silico tools and databases, currently available for researching miRNAs, siRNAs, and lncRNAs. Bioinformatics-based miRNA tools include miRBase, miRecords, miRCancer, miRSystem, miRGator, miRNEST, mirtronPred, and miRIAD. Bioinformatics-based siRNA tools include siPRED, siDRM, sIR, and siDirect 2.0. Bioinformatics-based lncRNA tools include lncRNAdb v2, lncRNAtor, LncDisease, and iLoc-lncRNA. These tools and databases benefit molecular biologists, biomedical researchers, and computational biologists.
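
The size-based classification of ncRNAs in this abstract can be written as a simple rule. The exact cut-offs here are an assumption: the quoted ranges overlap at 30 nt and 200 nt, so this sketch places 30 nt in the medium class and, following the common lncRNA definition of "longer than 200 nucleotides", treats only lengths above 200 nt as long. Lengths below 18 nt are left unclassified.

```python
def ncrna_size_class(length_nt: int) -> str:
    """Assign an ncRNA to a size class (boundary choices are assumptions)."""
    if length_nt < 18:
        return "unclassified"
    if length_nt < 30:
        return "small"  # e.g., miRNAs and siRNAs
    if length_nt <= 200:
        return "medium"
    return "long"


for length in (22, 150, 2500):
    print(length, ncrna_size_class(length))
```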

Current Bioinformatics. 2024;19(1):39-60

Representation Learning of Biological Concepts: A Systematic Review

Yang Y., Zuo X., Das A., Xu H., Zheng W.

Abstract

Objective: Representation learning of biological concepts involves acquiring their numerical representations from various sources of biological information, such as sequences, interactions, and literature. This study conducted a comprehensive systematic review, analyzing both quantitative and qualitative data, to provide an overview of this field.

Methods: Our systematic review involved searching PubMed and EMBASE for articles on the representation learning of biological concepts. Among the 507 articles published between 2015 and 2022, we carefully screened and selected 65 papers for inclusion. We then developed a structured workflow that involved identifying relevant biological concepts and data types, reviewing various representation learning techniques, and evaluating the downstream applications used to assess the quality of the learned representations.

Results: The primary focus of this review was on developing numerical representations for gene/DNA/RNA entities. We found Word2Vec to be the most commonly used method for biological representation learning. Moreover, a growing number of studies use state-of-the-art large language models to learn numerical representations of biological concepts. We also observed that representations learned from specific sources were typically used for single downstream applications relevant to that source.
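
Word2Vec learns an embedding for each word by predicting the words that appear near it. A common way to apply it to biological sequences, assumed here rather than taken from the reviewed studies, is to split each sequence into overlapping k-mers and treat them as words. The minimal sketch below only builds the k-mer "sentence" and the (center, context) pairs a skip-gram model trains on. The values k = 3 and window = 1 are illustrative, and the embedding training itself (for example, with gensim's `Word2Vec` class) is not shown.

```python
def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens, window=2):
    """(center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = kmer_tokens("ATGCGTA")
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
print(skipgram_pairs(tokens, window=1)[:3])  # [('ATG', 'TGC'), ('TGC', 'ATG'), ('TGC', 'GCG')]
```

Larger k gives a richer vocabulary (4^k possible DNA k-mers) but sparser training data per token, which is one reason k is treated as a tunable choice.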

Conclusion: Existing methods for biological representation learning primarily focus on learning representations from a single data type, with the output fed into predictive models for downstream applications. Although some studies have explored the use of multiple data types to improve the performance of learned representations, such research is still relatively scarce. In this systematic review, we have summarized the data types, models, and downstream applications used in this task.

Current Bioinformatics. 2024;19(1):61-72

Interplay of miRNA-TF-Gene Through a Novel Six-node Feed-forward Loop Identified Inflammatory Genes as Key Regulators in Type-2 Diabetes

Bhat G., Keshav T., Hariharapura R., Fayaz S.M.

Abstract

Background: The intricacy of the pathological processes in type 2 diabetes (T2D) calls for an understanding of gene regulation at the systems level. Deciphering this complex gene modulation, however, requires the construction of regulatory networks.

Objective: The study aims to construct a six-node feed-forward loop (FFL) to analyze the diverse interactions within and between microRNAs (miRNAs) and transcription factors (TFs) involved in gene regulation.

Methods: The study included 644 genes, 64 TFs, and 448 miRNAs. A cumulative hypergeometric test was employed to identify significant miRNA-miRNA and miRNA-TF interaction pairs. In addition, experimentally proven TF-TF pairs were, for the first time, incorporated into the regulatory network to discern gene regulation. The networks were analyzed to identify crucial genes involved in T2D. Gene ontology terms were then predicted to recognize the biological functions crucial in T2D.
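
The cumulative hypergeometric test named in these Methods asks how surprising an observed overlap is. For a miRNA-miRNA pair, for example, with N genes in total, K targets of the first miRNA, n targets of the second, and k shared targets, the upper-tail p-value is P(X >= k) = sum over i = k..min(K, n) of C(K, i) * C(N - K, n - i) / C(N, n). The abstract does not state how the authors defined N, K, n, and k, so that mapping is an assumption; the sketch below only computes the tail probability with exact integer arithmetic (`math.comb` requires Python 3.8+), and the overlap numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Cumulative (upper-tail) hypergeometric p-value P(X >= k).

    N: population size (e.g., all genes considered)
    K: items with the property (e.g., targets of regulator 1)
    n: items drawn (e.g., targets of regulator 2)
    k: observed overlap (e.g., shared targets)
    """
    upper = min(K, n)
    # Sum the exact integer numerators first, then divide once by C(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical pair: 644 genes, regulators with 40 and 35 targets sharing 8
p = hypergeom_sf(8, 644, 40, 35)
print(f"p = {p:.3g}")
```

With these toy numbers about two shared targets would be expected by chance (40 * 35 / 644), so an overlap of 8 yields a small p-value; in a network study such p-values would normally also be corrected for multiple testing.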

Results: In T2D, the lowest gene regulation for a composite FFL occurs through the four-node FFL variant 1 (TF-miRNA-miRNA-gene, n=14) and the highest through the five-node FFL variant 2 (TF-TF-miRNA-gene, n=353). However, the maximum gene regulation occurs via the six-node miRNA FFL (miRNA-miRNA-TF-TF-gene-gene, n=23987). Subnetworks derived from the six-node miRNA-TF-gene regulatory networks identified interactions among TP53, NFkB, hsa-miR-125-5p, and hsa-miR-155-5p.

Conclusion: The core regulation occurs through the TP53, NFkB, hsa-miR-125-5p, and hsa-miR-155-5p FFL, implicating inflammation in the pathogenesis of T2D; this regulation occurs mainly via the six-node miRNA FFL. Thus, the regulatory network provides broader insights into the pathogenesis of T2D and can be extended to study inflammatory mechanisms in various infections.
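
A feed-forward loop (FFL) is a network motif in which one regulator controls a second regulator and both control a shared target. The composite four-, five-, and six-node FFLs in this study involve more regulator layers than the classic three-node form, and the abstract does not give their exact construction. The sketch below therefore only enumerates three-node FFLs from a directed edge list, as an illustration of the motif; the function name, node names, and edges are hypothetical.

```python
from itertools import permutations


def three_node_ffls(edges, regulators, targets):
    """Enumerate (A, B, gene) triples where A -> B, A -> gene, and B -> gene."""
    edge_set = set(edges)
    found = []
    for a, b in permutations(regulators, 2):  # ordered regulator pairs
        if (a, b) not in edge_set:
            continue
        for gene in targets:
            if (a, gene) in edge_set and (b, gene) in edge_set:
                found.append((a, b, gene))
    return found


# Hypothetical network: TF1 regulates miR-X, and both regulate GENE1
edges = [("TF1", "miR-X"), ("TF1", "GENE1"), ("miR-X", "GENE1"), ("TF2", "GENE1")]
print(three_node_ffls(edges, ["TF1", "miR-X", "TF2"], ["GENE1"]))
# [('TF1', 'miR-X', 'GENE1')]
```

TF2 also targets GENE1 but is not linked to TF1 or miR-X, so it does not complete a loop.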

Current Bioinformatics. 2024;19(1):73-90

SVM-Root: Identification of Root-Associated Proteins in Plants by Employing the Support Vector Machine with Sequence-Derived Features

Kumar Meher P., Hati S., Sahu T., Pradhan U., Gupta A., Rath S.

Abstract

Background: Root traits are desirable targets for modern plant breeding programs, as roots play a pivotal role in plant growth and development. Identifying the genes governing root traits is therefore an essential research component. For identifying root-associated genes/proteins, existing wet-lab experiments are resource-intensive and gene expression studies are species-specific. We therefore proposed a supervised learning-based computational method for identifying root-associated proteins.

Method: The problem was formulated as binary classification, with root-associated and non-root-associated proteins constituting the two classes. Four machine learning algorithms, namely support vector machine (SVM), extreme gradient boosting, random forest, and adaptive boosting, were employed to classify the proteins. Sequence-derived features such as amino acid composition (AAC), dipeptide composition (DPC), composition-transition-distribution (CTD), pseudo amino acid composition (PAAC), and ACF were used as input for the learning algorithms.
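
Two of these sequence-derived features have simple definitions: amino acid composition (AAC) is the fraction of each of the 20 standard amino acids in a protein (20 values), and dipeptide composition (DPC) is the fraction of each of the 400 ordered pairs of adjacent residues. The minimal sketch below computes both; it is not the authors' code, it skips non-standard residues, it assumes at least two standard residues, and it omits CTD, PAAC, ACF, and the feature selection step. The example peptide is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list:
    """Amino acid composition: 20 fractions over standard residues."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dpc(seq: str) -> list:
    """Dipeptide composition: 400 fractions over adjacent standard-residue pairs."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)
             if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS]
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def aac_dpc_features(seq: str) -> list:
    """Concatenate AAC and DPC into one 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)


features = aac_dpc_features("MKTAYIAKQR")  # hypothetical peptide
print(len(features))  # 420
```

Because both vectors are fractions rather than raw counts, proteins of different lengths map to comparable feature vectors, which is what a fixed-input classifier such as an SVM needs.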

Results: The SVM with 250 selected features from AAC+DPC+CTD achieved higher accuracy than any other combination of feature sets and learning algorithms. Specifically, with the selected features, the SVM achieved overall accuracies of 0.74, 0.73, and 0.73 when evaluated with a single 5-fold cross-validation (5F-CV), repeated 5F-CV, and an independent test set, respectively.
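
The single and repeated 5-fold cross-validation (5F-CV) schemes reported here can be sketched without a machine learning library: shuffle the samples, split them into five folds, train on four and test on the fifth, and average accuracy over folds (and over repeats with different shuffles). The nearest-centroid classifier and the toy data below are hypothetical stand-ins for the SVM and the protein features, and the folds are not stratified. In practice, scikit-learn's `SVC` together with `RepeatedStratifiedKFold` and `cross_val_score` covers the same workflow.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(X, y, fit, predict, k=5, repeats=1):
    """Mean test accuracy over k folds, repeated with a new shuffle per repeat."""
    scores = []
    for r in range(repeats):
        for train, test in kfold_indices(len(y), k, seed=r):
            model = fit([X[i] for i in train], [y[i] for i in train])
            preds = predict(model, [X[i] for i in test])
            correct = sum(p == y[i] for p, i in zip(preds, test))
            scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: assign each sample to the nearest class centroid
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, X):
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in X]


# Toy, perfectly separable data: class 0 near 0.2, class 1 near 0.8
X = [[0.10], [0.15], [0.20], [0.25], [0.30], [0.70], [0.75], [0.80], [0.85], [0.90]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_val_accuracy(X, y, fit_centroids, predict_centroids, k=5))             # 1.0
print(cross_val_accuracy(X, y, fit_centroids, predict_centroids, k=5, repeats=3))  # 1.0
```

Repeating the split with different shuffles reduces the dependence of the estimate on one particular fold assignment, which is why the repeated and single 5F-CV accuracies can differ slightly on real data.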

Conclusions: A web-enabled prediction tool, SVM-Root (https://iasri-sg.icar.gov.in/svmroot/), has been developed for the computational prediction of root-associated proteins. As the first tool of its kind, the proposed model is expected to supplement existing experimental methods as well as high-throughput GWAS and transcriptome studies.

Current Bioinformatics. 2024;19(1):91-102