DMR_Kmeans: Identifying Differentially Methylated Regions Based on k-means Clustering and Read Methylation Haplotype Filtering
- Authors: Peng X.¹, Cui W.², Kong X.², Huang Y.³, Li J.⁴
Affiliations:
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University
- School of Computer Science and Engineering, Central South University
- School of Information Science, Guangdong University of Finance and Economics
- Department of Hematology, Second Xiangya Hospital of Central South University
- Issue: Vol 19, No 5 (2024)
- Pages: 490-501
- Section: Life Sciences
- URL: https://jdigitaldiagnostics.com/1574-8936/article/view/643927
- DOI: https://doi.org/10.2174/0115748936245495230925112419
- ID: 643927
Abstract
Introduction: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can be used to reveal mechanisms of gene regulation and to screen for diseases. To date, many methods have been proposed to detect DMRs from bisulfite sequencing data. In these methods, differentially methylated CpG sites and DMRs are usually identified with statistical tests or distribution models. Such approaches neglect the joint methylation status of the CpGs covered by each read, which leads to inaccurate DMR boundaries.
Methods: This paper proposes DMR_Kmeans, a method that detects DMRs based on k-means clustering and read methylation haplotype filtering. For each CpG site, DMR_Kmeans uses the k-means algorithm to cluster the methylation levels from the two groups, and measures the methylation difference of the CpG from how the two groups are distributed across the clusters. Methylation haplotypes of reads are used to extract the methylation patterns in each candidate region. Finally, DMRs are identified from the methylation differences and the methylation patterns of the candidate regions.
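The three steps in the Methods paragraph can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not specify the number of clusters, the exact difference measure, how candidate regions are formed, or the haplotype criterion. The sketch therefore assumes two clusters per CpG, scores a CpG by how differently the two groups' samples fall into the high-methylation cluster, merges nearby high-scoring CpGs into candidate regions, and keeps a region only if the share of mostly methylated reads differs between the groups. All function names and thresholds (`min_score`, `max_gap`, `min_cpgs`, `min_read_diff`) are illustrative assumptions.

```python
# Hypothetical sketch of a DMR_Kmeans-style pipeline; names, thresholds and
# the scoring rule are illustrative assumptions, not the published method.
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 2,
              iters: int = 100) -> List[int]:
    """Lloyd's k-means on scalar methylation levels; one label per value."""
    lo, hi = min(values), max(values)
    # Evenly spaced starting centers between the lowest and highest level.
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def cpg_separation(group_a: Sequence[float],
                   group_b: Sequence[float]) -> float:
    """Cluster all samples' levels at one CpG into two clusters and return
    |share of group A in the high cluster - share of group B in it|."""
    levels = list(group_a) + list(group_b)
    labels = kmeans_1d(levels, k=2)
    # Identify the high-methylation cluster by its mean level.
    means = []
    for c in (0, 1):
        members = [v for v, lab in zip(levels, labels) if lab == c]
        means.append(sum(members) / len(members) if members else float("-inf"))
    high = 0 if means[0] > means[1] else 1
    n_a = len(group_a)
    share_a = sum(lab == high for lab in labels[:n_a]) / n_a
    share_b = sum(lab == high for lab in labels[n_a:]) / len(group_b)
    return abs(share_a - share_b)


def candidate_regions(positions: Sequence[int], scores: Sequence[float],
                      min_score: float = 0.5, max_gap: int = 200,
                      min_cpgs: int = 3) -> List[Tuple[int, int]]:
    """Merge consecutive high-scoring CpGs (at most max_gap bp apart) into
    (start, end) regions that contain at least min_cpgs CpGs."""
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    for pos, score in zip(positions, scores):
        if score >= min_score and (not run or pos - run[-1] <= max_gap):
            run.append(pos)
            continue
        # The current run is broken: keep it if it is long enough.
        if len(run) >= min_cpgs:
            regions.append((run[0], run[-1]))
        run = [pos] if score >= min_score else []
    if len(run) >= min_cpgs:
        regions.append((run[0], run[-1]))
    return regions


def haplotype_filter(reads_a: Sequence[str], reads_b: Sequence[str],
                     min_read_diff: float = 0.5) -> bool:
    """Reads are methylation haplotypes over one candidate region's CpGs,
    e.g. "1101" (1 = methylated, 0 = unmethylated). Keep the region only if
    the share of mostly methylated reads differs enough between groups."""
    def mostly_methylated_share(reads: Sequence[str]) -> float:
        if not reads:
            return 0.0
        hits = sum(r.count("1") > len(r) / 2 for r in reads)
        return hits / len(reads)

    diff = mostly_methylated_share(reads_a) - mostly_methylated_share(reads_b)
    return abs(diff) >= min_read_diff


if __name__ == "__main__":
    print(cpg_separation([0.9, 0.85, 0.95], [0.1, 0.15, 0.05]))  # 1.0
```

A hand-written one-dimensional k-means keeps the example dependency-free; a library implementation such as scikit-learn's `KMeans` could be substituted for `kmeans_1d`.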
Result: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. When the methylation-difference threshold is set above 0.4, DMR_Kmeans achieves higher Qn and Ql and overlaps more promoters than the other methods. This indicates that the DMRs predicted by DMR_Kmeans have accurate boundaries and contain fewer CpGs with small methylation differences than those predicted by other methods.
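The Result paragraph rests on the idea that a good DMR contains few CpGs with small methylation differences. The abstract does not define Qn and Ql, so the sketch below uses two simplified, hypothetical stand-ins in that spirit (a count-based share and a length-weighted share of CpGs whose absolute difference reaches a threshold `t`) plus a promoter-overlap count. It illustrates the idea under these assumptions and does not reproduce the published metrics.

```python
# Hypothetical proxies for count- and length-based DMR quality metrics
# and promoter overlap; not the published Qn/Ql definitions.
from typing import Dict, List, Sequence, Tuple

Region = Tuple[int, int]  # inclusive (start, end) on one chromosome


def diffs_in(region: Region, cpg_diff: Dict[int, float]) -> List[float]:
    """Methylation differences of the CpGs located inside the region."""
    start, end = region
    return [d for pos, d in cpg_diff.items() if start <= pos <= end]


def count_based_quality(dmrs: Sequence[Region], cpg_diff: Dict[int, float],
                        t: float = 0.4) -> float:
    """Share of all CpGs inside the DMRs whose |difference| is >= t."""
    diffs = [d for r in dmrs for d in diffs_in(r, cpg_diff)]
    return sum(abs(d) >= t for d in diffs) / len(diffs) if diffs else 0.0


def length_based_quality(dmrs: Sequence[Region], cpg_diff: Dict[int, float],
                         t: float = 0.4) -> float:
    """Per-DMR share of CpGs with |difference| >= t, weighted by DMR length."""
    total_len, weighted = 0, 0.0
    for region in dmrs:
        diffs = diffs_in(region, cpg_diff)
        if not diffs:
            continue  # skip DMRs without any measured CpG
        length = region[1] - region[0] + 1
        total_len += length
        weighted += length * sum(abs(d) >= t for d in diffs) / len(diffs)
    return weighted / total_len if total_len else 0.0


def overlapped_promoters(dmrs: Sequence[Region],
                         promoters: Sequence[Region]) -> int:
    """Number of promoters that overlap at least one DMR."""
    return sum(any(s <= p_end and p_start <= e for s, e in dmrs)
               for p_start, p_end in promoters)
```

For example, with DMRs `(100, 180)` and `(500, 520)` and per-CpG differences `{100: 0.6, 150: 0.5, 180: 0.1, 500: 0.9, 520: 0.8}`, four of the five covered CpGs reach `t = 0.4`, so `count_based_quality` returns 0.8.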
Conclusion: The results further suggest that DMR_Kmeans can provide a high-quality DMR set for downstream analysis, since the DMRs it predicts have a greater total length and contain more CpG sites in total than those predicted by other methods.
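The two quantities used in the conclusion, the total length of the DMRs and the total number of CpG sites inside them, are simple sums over the predicted set. A minimal sketch, assuming inclusive coordinates on a single chromosome and non-overlapping DMRs (the function name is hypothetical):

```python
# Hypothetical helper: summarize a DMR set by total length and CpG count.
from bisect import bisect_left, bisect_right
from typing import Iterable, Sequence, Tuple


def dmr_set_summary(dmrs: Sequence[Tuple[int, int]],
                    cpg_positions: Iterable[int]) -> Tuple[int, int]:
    """Return (total length in bp, total CpGs inside DMRs) for one chromosome.

    Assumes inclusive coordinates and non-overlapping DMRs."""
    positions = sorted(cpg_positions)
    total_len = sum(end - start + 1 for start, end in dmrs)
    # Binary search counts the CpGs falling in each [start, end] interval.
    n_cpgs = sum(bisect_right(positions, end) - bisect_left(positions, start)
                 for start, end in dmrs)
    return total_len, n_cpgs


if __name__ == "__main__":
    print(dmr_set_summary([(100, 180), (500, 520)],
                          [100, 150, 180, 300, 500, 520]))  # (102, 5)
```

Sorting the CpG positions once and using binary search keeps the count efficient even when the DMR set is large.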
About the authors
Xiaoqing Peng
Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University
Email: info@benthamscience.net
Wanxin Cui
School of Computer Science and Engineering, Central South University
Email: info@benthamscience.net
Xiangyan Kong
School of Computer Science and Engineering, Central South University
Email: info@benthamscience.net
Yuannan Huang
School of Information Science, Guangdong University of Finance and Economics
Email: info@benthamscience.net
Ji Li
Department of Hematology, Second Xiangya Hospital of Central South University
Author for correspondence.
Email: info@benthamscience.net
