Vol 19, No 6 (2024)
- Year: 2024
- Articles: 5
- URL: https://jdigitaldiagnostics.com/1574-8936/issue/view/9964
Life Sciences
Bioinformatic Resources for Plant Genomic Research
Abstract
Genome assembly and annotation are crucial steps in plant genomics research as they provide valuable insights into plant genetic makeup, gene regulation, evolutionary history, and biological processes. With the emergence of high-throughput sequencing technologies, a plethora of genome assembly tools have been developed to meet the diverse needs of plant genome researchers. Choosing the most suitable tool for a specific research need can be daunting because plant genomes and sequencer reads are complex and varied. To support informed selection of genome assembly and annotation tools, this review offers an extensive overview of the most widely used genome and transcriptome assembly tools. It presents the specific information on each tool, including the data types it can process, in tabular form. In addition, the review delves into transcriptome assembly tools, plant resource databases and repositories (12 for Arabidopsis, 9 for Rice, 5 for Tomato, and 8 general-use resources), which are vital for gene expression profiling, and functional annotation and ontology tools that facilitate data integration and analysis.



A Review of Drug-related Associations Prediction Based on Artificial Intelligence Methods
Abstract
Background: Predicting drug-related associations is an important task in drug development and discovery. With the rapid advancement of high-throughput technologies and the growth of biological and medical data, artificial intelligence (AI), especially machine learning (ML) and deep learning (DL), has opened a new path for drug-related association prediction. Many studies in the literature have addressed this task. This study examines the computational methods used for drug-related association prediction to gain better insight into them.
Methods: The computational methods involved in drug-related association prediction are reviewed in this work. We first summarize the mainstream public datasets related to drugs, targets, and diseases. We then discuss existing drug similarity, target similarity, and integrated similarity measurement approaches and group them according to their suitability. Next, we comprehensively investigate drug-related associations and introduce the relevant computational methods. Finally, we briefly discuss the challenges involved in predicting drug-related associations.
Result: We found that quite a few studies have implemented ML and DL approaches for drug-related association prediction. The key challenges noted were constructing datasets with reasonable negative samples, extracting rich features, and developing powerful prediction models or ensemble strategies.
Conclusion: This review presents useful knowledge and future challenges on the subject, with the hope of promoting further studies on predicting drug-related associations.
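As a small illustration of the drug similarity measurement step mentioned in the abstract above, the sketch below computes a Tanimoto (Jaccard) similarity between binary feature sets. The abstract does not say which similarity measures the review covers, so the choice of Tanimoto, the function names, and the toy feature sets are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical substructure-key sets for three drugs (illustrative only,
# not taken from any dataset named in the review).
drug_features: Dict[str, Set[int]] = {
    "drug_A": {1, 4, 7, 9},
    "drug_B": {1, 4, 8},
    "drug_C": {2, 3},
}


def similarity_matrix(features: Dict[str, Set[int]]) -> Dict[Tuple[str, str], float]:
    """Pairwise similarity for every ordered pair of drugs."""
    names = sorted(features)
    return {(x, y): tanimoto(features[x], features[y]) for x in names for y in names}


sim = similarity_matrix(drug_features)
```

A matrix like `sim` is one possible input to the similarity-based prediction models the abstract refers to; target similarity could be handled the same way with a different feature representation.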



A Systematic Review of Medical Expert Systems for Cardiac Arrest Prediction
Abstract
Background: Predicting cardiac arrest is crucial for timely intervention and improved patient outcomes. Machine learning has yielded strong results by offering tailored prediction analyses on complex data. Despite advancements in medical expert systems, there remains a need for a comprehensive analysis of their effectiveness and limitations in cardiac arrest prediction, because few existing studies cover the topic thoroughly.
Objective: This systematic review aims to analyze the existing literature on medical expert systems for cardiac arrest prediction, filling gaps in knowledge and identifying key challenges.
Methods: This paper adopts the PRISMA methodology to conduct a systematic review of 37 publications obtained from PubMed, Springer, ScienceDirect, and IEEE, published within the last decade. Careful inclusion and exclusion criteria were applied during the selection process, resulting in a comprehensive analysis that uses five integrated layers: research objectives, data collection, feature set generation, model training, and validation, employing various machine learning techniques.
Results and Conclusion: The findings indicate that current studies frequently use ensemble and deep learning methods to improve the accuracy of machine learning predictions. However, they lack adequate implementation of proper pre-processing techniques. Further research is needed to address challenges related to external validation, implementation, and adoption of machine learning models in real clinical settings, as well as integrating machine learning with AI technologies like NLP. This review aims to be a valuable resource for both novice and experienced researchers, offering insights into current methods and potential future recommendations.



A Metric to Characterize Differentially Methylated Region Sets Detected from Methylation Array Data
Abstract
Background: Identifying differentially methylated regions (DMRs) is a basic but important task in epigenomics, which can help investigate the mechanisms of diseases and provide methylation biomarkers for disease screening. A number of methods have been proposed to identify DMRs from methylation array data. However, effective metrics to characterize different DMR sets and enable a straightforward comparison are lacking.
Methods: In this study, we introduce a metric, DMRn, to characterize DMR sets detected by different methods from methylation array data. To calculate DMRn, the methylation differences of DMRs are first recalculated by incorporating the correlations between probes and the CpGs they represent. Then, DMRn is calculated based on the number of probes and the density of CpGs in DMRs with methylation differences falling in each interval.
Result & Discussion: By comparing the DMRn of DMR sets predicted by seven methods in four scenarios, the results demonstrate that DMRn can provide efficient guidance for selecting DMR sets and offer new insights in cancer genomics studies by comparing the DMR sets from related pathological states. For example, many regions with subtle methylation alterations in subtypes of prostate cancer are altered in the opposite direction in the benign state, which may indicate a possible revision mechanism in benign prostate cancer.
Conclusion: Furthermore, when applied to datasets that underwent different runs of batch effect removal, DMRn can help visualize the bias introduced by multiple runs of batch effect removal. The tool for calculating DMRn is available in the GitHub repository (https://github.com/xqpeng/DMRArrayMetric).



RDR100: A Robust Computational Method for Identification of Krüppel-like Factors
Abstract
Background: Krüppel-like factors (KLFs) are a family of transcription factors containing zinc fingers that regulate various cellular processes. KLF proteins are associated with human diseases, such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins is crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.
Methods: In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained their respective models using five distinct machine learning (ML) classifiers.
Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.
Conclusion: Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web server is available at https://procarb.org/RDR100/.
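To make the idea of encoding a primary sequence as features more concrete, the sketch below computes amino acid composition, a common sequence encoding. The abstract above does not name the ten feature encodings used by RDR100, so this particular encoding, the function name, and the toy sequence are assumptions for illustration only and are not the authors' implementation.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)  # no recognizable residues
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence (hypothetical, not a real KLF protein).
vec = amino_acid_composition("MKCHCC")
```

A fixed-length vector like `vec` could then be passed to a feature selection step and a classifier such as a random forest, which is the general kind of pipeline the abstract describes.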


