DMR_Kmeans: Identifying Differentially Methylated Regions Based on k-means Clustering and Read Methylation Haplotype Filtering



Abstract

Introduction: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can be used to reveal the mechanisms of gene regulation and to screen for diseases. To date, many methods have been proposed to detect DMRs from bisulfite sequencing data. In these methods, differentially methylated CpG sites and DMRs are usually identified with statistical tests or distribution models, which neglect the joint methylation status of the CpG sites covered by each read and therefore yield inaccurate DMR boundaries.
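The limitation noted above can be illustrated with a toy example. The sketch below is a minimal illustration, not part of DMR_Kmeans: it assumes each read is encoded as a string of '1' (methylated) and '0' (unmethylated) over the same three consecutive CpG sites, and the function names are hypothetical. Two samples with identical per-CpG methylation levels can still differ completely in their read methylation haplotypes.

```python
from collections import Counter


def per_cpg_levels(reads):
    """Fraction of reads methylated at each CpG position ('1' = methylated)."""
    n_sites = len(reads[0])
    return [sum(r[i] == "1" for r in reads) / len(reads) for i in range(n_sites)]


def haplotype_frequencies(reads):
    """Relative frequency of each full-read methylation haplotype."""
    counts = Counter(reads)
    return {h: c / len(reads) for h, c in counts.items()}


# Hypothetical toy data: every read covers the same three CpG sites.
sample_a = ["111", "111", "000", "000"]  # concordant reads: all methylated or all unmethylated
sample_b = ["101", "101", "010", "010"]  # discordant reads: mixed patterns within each read

print(per_cpg_levels(sample_a))          # [0.5, 0.5, 0.5]
print(per_cpg_levels(sample_b))          # [0.5, 0.5, 0.5]
print(haplotype_frequencies(sample_a))   # {'111': 0.5, '000': 0.5}
print(haplotype_frequencies(sample_b))   # {'101': 0.5, '010': 0.5}
```

A comparison based only on per-CpG levels sees no difference between these two samples, whereas their haplotype compositions share no pattern at all.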

Methods: In this paper, a method named DMR_Kmeans is proposed to detect DMRs based on k-means clustering and read methylation haplotype filtering. In DMR_Kmeans, for each CpG site, the k-means algorithm is used to cluster the methylation levels from the two groups, and the methylation difference at that CpG is measured from how differently the two groups are distributed across the clusters. The methylation haplotypes of reads are then used to extract the methylation patterns within each candidate region. Finally, DMRs are identified based on the methylation differences and the methylation patterns in the candidate regions.
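The per-CpG clustering step can be sketched as follows. This is a hedged illustration under stated assumptions: k = 2, a deterministic initialization at the minimum and maximum methylation levels, and a hypothetical difference score (the gap between the two cluster centroids, weighted by how differently the two groups populate the high-methylation cluster). The abstract does not give the exact score used by DMR_Kmeans, so the formula and the function names here are illustrative only.

```python
def kmeans_1d_two_clusters(values, max_iter=100):
    """Lloyd's k-means with k = 2 on methylation levels in [0, 1].

    Returns (labels, centroids); label 1 marks the higher-methylation cluster
    because the centroids start at min(values) and max(values) and keep that order.
    """
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each level goes to its nearest centroid (ties go to cluster 0).
        new_labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                      for v in values]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, new_labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels, centroids


def cpg_difference_score(group1, group2):
    """HYPOTHETICAL per-CpG difference score (not the published DMR_Kmeans formula).

    Pools both groups' levels, clusters them with k = 2, and weights the centroid gap
    by the difference in the fraction of each group assigned to the high cluster.
    """
    values = list(group1) + list(group2)
    labels, (c_low, c_high) = kmeans_1d_two_clusters(values)
    n1 = len(group1)
    frac_high_1 = sum(labels[:n1]) / n1
    frac_high_2 = sum(labels[n1:]) / len(group2)
    return abs(frac_high_1 - frac_high_2) * (c_high - c_low)


# Hypothetical methylation levels at one CpG site for the samples of two groups.
print(round(cpg_difference_score([0.9, 0.85, 0.95], [0.1, 0.2, 0.15]), 3))  # 0.75
print(cpg_difference_score([0.2, 0.8], [0.25, 0.75]))                       # 0.0
```

When the two groups separate perfectly, the score equals the centroid gap; in the second example each group places one sample in each cluster, so the score is 0 even though each group is internally heterogeneous.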

Result: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. When the threshold on methylation difference is greater than 0.4, DMR_Kmeans achieves higher Qn and Ql values, and its DMRs overlap more promoters than those of the other methods. This indicates that the DMRs predicted by DMR_Kmeans have accurate boundaries and contain fewer CpGs with small methylation differences than those predicted by the other methods.

Conclusion: The results further suggest that DMR_Kmeans can provide a high-quality DMR set for downstream analysis, since the DMRs predicted by DMR_Kmeans have a greater total length and contain more CpG sites in total than those predicted by the other methods.
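The two summary statistics used in this comparison, the total length of a DMR set and the total number of CpG sites it contains, are straightforward to compute. The following is a minimal sketch, assuming non-overlapping DMRs given in 0-based, half-open (BED-style) coordinates and a sorted list of CpG positions per chromosome; the function name and the toy data are hypothetical.

```python
from bisect import bisect_left


def dmr_set_summary(dmrs, cpg_positions):
    """Total length and total CpG count of a set of non-overlapping DMRs.

    dmrs: list of (chrom, start, end) in 0-based, half-open coordinates.
    cpg_positions: dict mapping chrom -> sorted list of 0-based CpG positions.
    """
    total_length = 0
    total_cpgs = 0
    for chrom, start, end in dmrs:
        total_length += end - start
        positions = cpg_positions.get(chrom, [])
        # Count CpGs with start <= pos < end via binary search on the sorted list.
        total_cpgs += bisect_left(positions, end) - bisect_left(positions, start)
    return total_length, total_cpgs


# Hypothetical toy data.
cpgs = {"chr1": [100, 150, 180, 400, 420], "chr2": [50, 70]}
dmrs = [("chr1", 100, 200), ("chr1", 390, 425), ("chr2", 0, 60)]
print(dmr_set_summary(dmrs, cpgs))  # (195, 6)
```

Overlapping DMRs would be double-counted by this sketch, so they would need to be merged before the totals are compared across methods.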

About the authors

Xiaoqing Peng

Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University

Email: info@benthamscience.net

Wanxin Cui

School of Computer Science and Engineering, Central South University

Email: info@benthamscience.net

Xiangyan Kong

School of Computer Science and Engineering, Central South University

Email: info@benthamscience.net

Yuannan Huang

School of Information Science, Guangdong University of Finance and Economics

Email: info@benthamscience.net

Ji Li

Department of Hematology, Second Xiangya Hospital of Central South University

Author for correspondence.
Email: info@benthamscience.net


Copyright (c) 2024 Bentham Science Publishers