CapsEnhancer: An Effective Computational Framework for Identifying Enhancers Based on Chaos Game Representation and Capsule Network
- Lantian Yao (Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Peilin Xie (Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Jiahui Guan (School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Chia-Ru Chung (Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan)
- Yixian Huang (School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Yuxuan Pang (Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan)
- Huacong Wu (School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Ying-Chih Chiang* (Email: [email protected]; Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China)
- Tzong-Yi Lee* (Email: [email protected]; Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan; Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan)
Abstract
Enhancers are a class of noncoding DNA that serve as crucial regulatory elements, governing gene expression by binding to transcription factors. The identification of enhancers is of great importance in biology. However, traditional experimental methods for enhancer identification demand substantial human and material resources. Consequently, there is growing interest in computational methods for enhancer prediction. In this study, we propose a two-stage deep learning framework, termed CapsEnhancer, for the identification of enhancers and their strengths. CapsEnhancer uses chaos game representation to encode DNA sequences into unique images and employs a capsule network to extract local and global features from these sequence “images”. Experimental results demonstrate that CapsEnhancer achieves state-of-the-art performance in both stages. In the first and second stages, its accuracy surpasses the previous best methods by 8 and 3.5 percentage points, reaching 94.5 and 95%, respectively. Notably, this study represents the pioneering application of computer vision methods to enhancer identification tasks. Our work not only contributes novel insights to enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
no. | method | feature encoding | algorithm | year | reference |
---|---|---|---|---|---|
1 | iEnhancer-2L | PseKNC | SVM | 2016 | (14) |
2 | EnhancerPred | BPB, NC, PseNC | SVM | 2016 | (15) |
3 | iEnhancer-EL | Kmer, subsequence profile, PseKNC | ensemble learning | 2018 | (17) |
4 | iEnhancer-ECNN | one-hot encoding, Kmers | CNN | 2019 | (20) |
5 | iEnhancer-XG | k-spectrum profile, mismatch k-tuple, subsequence profile, PSSM, PseDNCb | XGBoost | 2021 | (23) |
6 | BERT-Enhancer | BERT encoding | CNN | 2021 | (21) |
7 | iEnhancer-EBLSTM | Kmers | BiLSTM | 2021 | (22) |
8 | iEnhancer-RF | NBP, DBP, ANF, NCP, ENAC, XY K-GAP | RF | 2021 | (16) |
9 | iEnhancer-RD | Kmers, PseKNC, KPCV | DNN | 2021 | (24) |
10 | spEnhancer | Kmers | BiLSTM | 2021 | (25) |
11 | Enhancer-FRL | ANF, CKSNAP, DAC, ENAC, Kmers, NCP, PseTIIP, SCPseDNC, SCPseTNC, TACC | SVM, RF, KNN, naive Bayesian, LightGBM | 2022 | (18) |
12 | iEnhancer-BERT | BERT encoding | CNN | 2022 | (26) |
13 | iEnhancer-ELM | Kmer, BERT encoding | MLP | 2023 | (27) |
14 | iEnhancer-DCSA | Word2vec | dual-scale CNN, spatial attention | 2023 | (28) |
15 | iEnhancer-SKNN | Kmer, PseDNC, PCPseDNC and Z-Curve9 | ensemble learning | 2023 | (29) |
16 | NEPERS | PSTNPss, PSTNPdss, CKSNAP, NCP | deep forest | 2023 | (19) |
(1) We designed a two-stage computational framework called CapsEnhancer to identify enhancers and their strengths. The first stage of CapsEnhancer focuses on enhancer recognition, distinguishing between enhancer and nonenhancer. The second stage involves predicting enhancer strength, specifically discerning between strong and weak enhancers.
(2) CapsEnhancer uses CGR encoding to represent each DNA sequence as an image. Through this encoding method, it can effectively represent Kmers and their frequencies.
(3) CapsEnhancer employs a capsule network-based architecture to learn local and global features from the “images” transformed from DNA sequences. CapsEnhancer represents the pioneering adoption of computer vision strategies for enhancer identification.
(4) Experimental results demonstrate that CapsEnhancer attains state-of-the-art performance in the two-stage task. In comparison to previous methods, CapsEnhancer exhibits significant improvements, achieving an 8% increase in accuracy during the first stage and a 3.5% improvement in the second stage. Beyond providing a robust solution for enhancer identification, our framework introduces a novel perspective for other biological sequence analysis tasks.
Materials and Methods
Benchmark Data Set
Architecture Overview of CapsEnhancer
notation | description |
---|---|
N | the size of the FCGR images |
m | dimension of each primary capsule |
n | dimension of each type capsule |
u_i | primary capsules |
V_j | type capsules |
W | weight matrix in capsule networks |
c_i,j | coupling coefficients |
p | prediction probability |
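The notation above maps onto the dynamic routing procedure of Sabour et al. (ref 41). The sketch below is a generic NumPy version of that procedure, not the CapsEnhancer code: the tensor shapes, the number of routing iterations, and the reading of p as the length of a type capsule are assumptions, and the configuration actually used is described in the Supporting Information.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors so that their length lies in [0, 1), keeping direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u, W, num_iters=3):
    """Route primary capsules u_i to type capsules V_j.

    u: (num_primary, m)              primary capsules u_i
    W: (num_primary, num_type, n, m) weight matrices
    Returns V of shape (num_type, n) and coupling coefficients c_i,j.
    """
    # Prediction vectors: each primary capsule votes for each type capsule.
    u_hat = np.einsum("ijnm,im->ijn", W, u)
    b = np.zeros(u_hat.shape[:2])            # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)               # c_i,j sums to 1 over j
        s = np.einsum("ij,ijn->jn", c, u_hat)
        v = squash(s)                        # type capsules V_j
        b = b + np.einsum("ijn,jn->ij", u_hat, v)  # reward agreeing votes
    return v, c


# Hypothetical sizes: 6 primary capsules (m = 8), 2 type capsules (n = 16).
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))
W = rng.normal(size=(6, 2, 16, 8))
V, c = dynamic_routing(u, W)
p = np.linalg.norm(V, axis=1)  # assumed: capsule length as class probability
```

Because `squash` bounds each capsule length below 1, the lengths in `p` can be read as per-class scores, which is one common way to obtain a prediction probability from a capsule network.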
CGR Encoding
Capsule Network
Performance Assessment
Results and Discussion
Performance Comparison with Existing Methods
First Stage: Enhancer Versus Nonenhancer
method | accuracy (%) | sensitivity (%) | specificity (%) | MCC | AUC (%) |
---|---|---|---|---|---|
iEnhancer-2L | 73.0 | 71.0 | 75.0 | 0.460 | 80.6 |
EnhancerPred | 74.0 | 73.5 | 74.5 | 0.480 | 80.1 |
iEnhancer-EL | 74.8 | 71.0 | 78.5 | 0.496 | 81.7 |
iEnhancer-ECNN | 76.9 | 78.5 | 75.2 | 0.537 | 83.2 |
iEnhancer-XG | 75.8 | 74.0 | 77.5 | 0.515 | |
Enhancer-FRL | 78.0 | 80.5 | 75.5 | 0.561 | 85.7 |
BERT-Enhancer | 75.6 | 80.0 | 71.2 | 0.514 | |
iEnhancer-EBLSTM | 77.2 | 75.5 | 79.5 | 0.534 | 83.5 |
iEnhancer-RF | 79.8 | 78.5 | 81.0 | 0.595 | 86.0 |
iEnhancer-RD | 78.8 | 81.0 | 76.5 | 0.576 | 84.4 |
spEnhancer | 77.3 | 83.0 | 71.5 | 0.579 | 82.4 |
iEnhancer-DCSA | 82.5 | 79.5 | 85.5 | 0.651 | 85.6 |
NEPERS | 86.3 | 86.5 | 86.0 | 0.725 | 94.8 |
CapsEnhancer (ours) | 94.5 | 93.0 | 96.0 | 0.890 | 98.0 |
Second Stage: Strong Enhancer Versus Weak Enhancer
method | accuracy (%) | sensitivity (%) | specificity (%) | MCC | AUC (%) |
---|---|---|---|---|---|
iEnhancer-2L | 60.5 | 47.0 | 74.0 | 0.218 | 66.8 |
EnhancerPred | 55.0 | 45.0 | 65.0 | 0.102 | 57.9 |
iEnhancer-EL | 61.0 | 54.0 | 68.0 | 0.222 | 68.0 |
iEnhancer-ECNN | 67.8 | 79.1 | 56.4 | 0.368 | 74.8 |
iEnhancer-XG | 63.5 | 70.0 | 57.0 | 0.272 | |
Enhancer-FRL | 73.5 | 98.0 | 49.0 | 0.539 | 87.2 |
BERT-Enhancer | |||||
iEnhancer-EBLSTM | 65.8 | 81.2 | 53.6 | 0.324 | 68.8 |
iEnhancer-RF | 85.0 | 93.0 | 77.0 | 0.709 | 97.0 |
iEnhancer-RD | 70.5 | 84.0 | 57.0 | 0.426 | 79.2 |
spEnhancer | 62.0 | 91.0 | 33.0 | 0.370 | 62.5 |
iEnhancer-DCSA | 91.5 | 98.0 | 85.0 | 0.837 | 96.6 |
NEPERS | 89.0 | 94.0 | 84.0 | 0.784 | 95.1 |
CapsEnhancer (ours) | 95.0 | 99.0 | 91.0 | 0.903 | 99.2 |
Effectiveness of the Capsule Network Architecture
Ablation Experiment
stage | method | accuracy (%) | sensitivity (%) | specificity (%) | MCC | AUC (%) |
---|---|---|---|---|---|---|
first stage | without CapsNet | 77.3 | 79.0 | 75.5 | 0.55 | 82.8 |
first stage | CapsEnhancer | 94.5 | 93.0 | 96.0 | 0.89 | 98.0 |
second stage | without CapsNet | 86.0 | 98.0 | 74.0 | 0.72 | 91.8 |
second stage | CapsEnhancer | 95.0 | 99.0 | 91.0 | 0.90 | 99.2 |
Feature Analysis
Conclusions
Key Points
We proposed a two-stage framework, CapsEnhancer, based on deep learning, for accurate prediction of enhancers and their strength.
CapsEnhancer employs CGR encoding to represent each DNA sequence as an image. Through this encoding methodology, it enables effective representation of Kmers and their frequencies.
CapsEnhancer utilizes an architecture based on capsule networks to learn both local and global features from DNA “images”. Capsule networks overcome the limitations of traditional CNNs by capturing spatial relationships among features in DNA “images”, thereby enhancing the model’s performance.
The framework proposed in our study employs computer vision strategies to process biosequence data, complemented by the integration of a next-generation neural network, the capsule network. This presents a novel approach and perspective for tasks of biosequence data analysis.
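The two-stage decision logic summarized in the key points can be sketched as follows. This is an illustration only: `stage1` and `stage2` stand for any callables that return a probability for a sequence, and the 0.5 threshold is an assumed default rather than a value taken from the paper.

```python
from typing import Callable


def two_stage_predict(
    seq: str,
    stage1: Callable[[str], float],
    stage2: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Classify a sequence as non-enhancer, weak enhancer, or strong enhancer.

    stage1 returns P(enhancer); stage2 returns P(strong | enhancer).
    Stage 2 is only consulted when stage 1 predicts an enhancer.
    """
    if stage1(seq) < threshold:
        return "non-enhancer"
    if stage2(seq) >= threshold:
        return "strong enhancer"
    return "weak enhancer"
```

Keeping the two stages as separate models mirrors the framework description: the strength classifier only ever sees sequences that the first stage has accepted as enhancers.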
Data Availability
Availability and implementation: CapsEnhancer and the data sets of this study are available at https://github.com/Cpillar/CapsEnhancer.
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c00546.
GC content distribution of the data sets, FCGR images for two example sequences and PR curves for the ablation experiment, hyperparameters of CapsEnhancer, a case study, and description of the dynamic routing algorithm (PDF)
Acknowledgments
The authors sincerely appreciate the support of the Kobilka Institute of Innovative Drug Discovery, The Chinese University of Hong Kong (Shenzhen), and the “Center for Intelligent Drug Systems and Smart Biodevices” from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan. Y.-C.C. thanks The Royal Society for the Newton International Fellowship Alumni 2023 (AL\31027).
References
This article references 51 other publications.
- (1) Basith, S.; Hasan, M. M.; Lee, G.; Wei, L.; Manavalan, B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Briefings Bioinf. 2021, 22, bbab252, DOI: 10.1093/bib/bbab252
- (2) Corradin, O.; Scacheri, P. Enhancer variants: evaluating functions in common disease. Genome Med. 2014, 6 (10), 85, DOI: 10.1186/s13073-014-0085-3
- (3) Levine, M. Transcriptional enhancers in animal development and evolution. Curr. Biol. 2010, 20, R754–R763, DOI: 10.1016/j.cub.2010.06.070
- (4) Zhang, L.; Yang, Y.; Chai, L.; Li, Q.; Liu, J.; Lin, H.; Liu, L. A deep learning model to identify gene expression level using cobinding transcription factor signals. Briefings Bioinf. 2022, 23, bbab501, DOI: 10.1093/bib/bbab501
- (5) Heinz, S.; Romanoski, C. E.; Benner, C.; Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 2015, 16, 144–154, DOI: 10.1038/nrm3949
- (6) Furlong, E. E.; Levine, M. Developmental enhancers and chromosome topology. Science 2018, 361, 1341–1345, DOI: 10.1126/science.aau0320
- (7) Schoenfelder, S.; Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 2019, 20, 437–455, DOI: 10.1038/s41576-019-0128-0
- (8) Bauer, D. E.; Orkin, S. H. Hemoglobin switching’s surprise: the versatile transcription factor BCL11A is a master repressor of fetal hemoglobin. Curr. Opin. Genet. Dev. 2015, 33, 62–70, DOI: 10.1016/j.gde.2015.08.001
- (9) Chen, X.; Xu, H.; Yuan, P.; Fang, F.; Huss, M.; Vega, V. B.; Wong, E.; Orlov, Y. L.; Zhang, W.; Jiang, J. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 2008, 133, 1106–1117, DOI: 10.1016/j.cell.2008.04.043
- (10) May, D.; Blow, M. J.; Kaplan, T.; McCulley, D. J.; Jensen, B. C.; Akiyama, J. A.; Holt, A.; Plajzer-Frick, I.; Shoukry, M.; Wright, C. Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 2012, 44, 89–93, DOI: 10.1038/ng.1006
- (11) Visel, A.; Blow, M. J.; Li, Z.; Zhang, T.; Akiyama, J. A.; Holt, A.; Plajzer-Frick, I.; Shoukry, M.; Wright, C.; Chen, F. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 2009, 457, 854–858, DOI: 10.1038/nature07730
- (12) Pennacchio, L. A.; Bickmore, W.; Dean, A.; Nobrega, M. A.; Bejerano, G. Enhancers: five essential questions. Nat. Rev. Genet. 2013, 14, 288–295, DOI: 10.1038/nrg3458
- (13) Ku, C. S.; Naidoo, N.; Wu, M.; Soong, R. Studying the epigenome using next generation sequencing. J. Med. Genet. 2011, 48, 721–730, DOI: 10.1136/jmedgenet-2011-100242
- (14) Liu, B.; Fang, L.; Long, R.; Lan, X.; Chou, K.-C. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics 2016, 32, 362–369, DOI: 10.1093/bioinformatics/btv604
- (15) Jia, C.; He, W. EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features. Sci. Rep. 2016, 6, 38741, DOI: 10.1038/srep38741
- (16) Lim, D. Y.; Khanal, J.; Tayara, H.; Chong, K. T. iEnhancer-RF: identifying enhancers and their strength by enhanced feature representation using random forest. Chemom. Intell. Lab. Syst. 2021, 212, 104284, DOI: 10.1016/j.chemolab.2021.104284
- (17) Liu, B.; Li, K.; Huang, D.-S.; Chou, K.-C. iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach. Bioinformatics 2018, 34, 3835–3842, DOI: 10.1093/bioinformatics/bty458
- (18) Wang, C.; Zou, Q.; Ju, Y.; Shi, H. Enhancer-FRL: improved and robust identification of enhancers and their activities using feature representation learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 967–975, DOI: 10.1109/TCBB.2022.3204365
- (19) Gill, M.; Ahmed, S.; Kabir, M.; Hayat, M. A novel predictor for the analysis and prediction of enhancers and their strength via multi-view features and deep forest. Information 2023, 14, 636, DOI: 10.3390/info14120636
- (20) Nguyen, Q. H.; Nguyen-Vo, T.-H.; Le, N. Q. K.; Do, T. T.; Rahardja, S.; Nguyen, B. P. iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks. BMC Genom. 2019, 20, 951, DOI: 10.1186/s12864-019-6336-3
- (21) Le, N. Q. K.; Ho, Q.-T.; Nguyen, T.-T.-D.; Ou, Y.-Y. A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information. Briefings Bioinf. 2021, 22, bbab005, DOI: 10.1093/bib/bbab005
- (22) Niu, K.; Luo, X.; Zhang, S.; Teng, Z.; Zhang, T.; Zhao, Y. iEnhancer-EBLSTM: identifying enhancers and strengths by ensembles of bidirectional long short-term memory. Front. Genet. 2021, 12, 665498, DOI: 10.3389/fgene.2021.665498
- (23) Cai, L.; Ren, X.; Fu, X.; Peng, L.; Gao, M.; Zeng, X. iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor. Bioinformatics 2021, 37, 1060–1067, DOI: 10.1093/bioinformatics/btaa914
- (24) Yang, H.; Wang, S.; Xia, X. iEnhancer-RD: identification of enhancers and their strength using RKPK features and deep neural networks. Anal. Biochem. 2021, 630, 114318, DOI: 10.1016/j.ab.2021.114318
- (25) Mu, X.; Wang, Y.; Duan, M.; Liu, S.; Li, F.; Wang, X.; Zhang, K.; Huang, L.; Zhou, F. A novel position-specific encoding algorithm (SeqPose) of nucleotide sequences and its application for detecting enhancers. Int. J. Mol. Sci. 2021, 22, 3079, DOI: 10.3390/ijms22063079
- (26) Luo, H.; Chen, C.; Shan, W.; Ding, P.; Luo, L. iEnhancer-BERT: a novel transfer learning architecture based on DNA-Language model for identifying enhancers and their strength. In International Conference on Intelligent Computing, 2022; pp 153–165, DOI: 10.1007/978-3-031-13829-4_13
- (27) Li, J.; Wu, Z.; Lin, W.; Luo, J.; Zhang, J.; Chen, Q.; Chen, J. iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. Bioinform. Adv. 2023, 3, vbad043, DOI: 10.1093/bioadv/vbad043
- (28) Wang, W.; Wu, Q.; Li, C. iEnhancer-DCSA: identifying enhancers via dual-scale convolution and spatial attention. BMC Genom. 2023, 24, 393, DOI: 10.1186/s12864-023-09468-1
- (29) Wu, H.; Liu, M.; Zhang, P.; Zhang, H. iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information. Briefings Funct. Genomics 2023, 22, 302–311, DOI: 10.1093/bfgp/elac057
- (30) Ng, P. dna2vec: consistent vector representations of variable-length k-mers. arXiv 2017, arXiv:1701.06279
- (31) Iuchi, H.; Matsutani, T.; Yamada, K.; Iwano, N.; Sumi, S.; Hosoda, S.; Zhao, S.; Fukunaga, T.; Hamada, M. Representation learning applications in biological sequence analysis. Comput. Struct. Biotechnol. J. 2021, 19, 3198–3208, DOI: 10.1016/j.csbj.2021.05.039
- (32) Wen, J.; Liu, Y.; Shi, Y.; Huang, H.; Deng, B.; Xiao, X. A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network. BMC Bioinf. 2019, 20, 469, DOI: 10.1186/s12859-019-3039-3
- (33) Li, W.; Guo, Y.; Wang, B.; Yang, B. Learning spatiotemporal embedding with gated convolutional recurrent networks for translation initiation site prediction. Pattern Recogn. 2023, 136, 109234, DOI: 10.1016/j.patcog.2022.109234
- (34) Jeffrey, H. J. Chaos game representation of gene structure. Nucleic Acids Res. 1990, 18, 2163–2170, DOI: 10.1093/nar/18.8.2163
- (35) Löchel, H. F.; Eger, D.; Sperlea, T.; Heider, D. Deep learning on chaos game representation for proteins. Bioinformatics 2020, 36, 272–279, DOI: 10.1093/bioinformatics/btz493
- (36) LaLonde, R.; Bagci, U. Capsules for object segmentation. arXiv 2018, arXiv:1804.04241
- (37) Dong, Z.; Lin, S. Research on image classification based on capsnet. In 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2019; pp 1023–1026.
- (38) Guo, Y.; Zhou, D.; Ruan, X.; Cao, J. Variational gated autoencoder-based feature extraction model for inferring disease-miRNA associations based on multiview features. Neural Network. 2023, 165, 491–505, DOI: 10.1016/j.neunet.2023.05.052
- (39) Guo, Y.; Zhou, D.; Li, P.; Li, C.; Cao, J. Context-aware poly (a) signal prediction model via deep spatial–temporal neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 8241–8253, DOI: 10.1109/tnnls.2022.3226301
- (40) Wang, X.; Guan, Z.; Qian, W.; Cao, J.; Wang, C.; Ma, R. STFuse: infrared and visible image fusion via semisupervised transfer learning. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1–14, DOI: 10.1109/tnnls.2023.3328060
- (41) Sabour, S.; Frosst, N.; Hinton, G. E. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, 2017; Vol. 30.
- (42) Yao, L.; Pang, Y.; Wan, J.; Chung, C.-R.; Yu, J.; Guan, J.; Leung, C.; Chiang, Y.-C.; Lee, T.-Y. ABPCaps: a novel capsule network-based method for the prediction of antibacterial peptides. Appl. Sci. 2023, 13, 6965, DOI: 10.3390/app13126965
- (43) Huang, Y.; Huang, H.-Y.; Chen, Y.; Lin, Y.-C.-D.; Yao, L.; Lin, T.; Leng, J.; Chang, Y.; Zhang, Y.; Zhu, Z. A robust drug–target interaction prediction framework with capsule network and transfer learning. Int. J. Mol. Sci. 2023, 24, 14061, DOI: 10.3390/ijms241814061
- 44Wang, D.; Liang, Y.; Xu, D. Capsule network for protein post-translational modification site prediction. Bioinformatics 2019, 35, 2386– 2394, DOI: 10.1093/bioinformatics/bty977Google Scholar44https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXjvF2lsbc%253D&md5=824e5d94dc1f2ddb865ac97e1fc57be1Capsule network for protein post-translational modification site predictionWang, Duolin; Liang, Yanchun; Xu, DongBioinformatics (2019), 35 (14), 2386-2394CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)Motivation: Computational methods for protein post-translational modification (PTM) site prediction provide a useful approach for studying protein functions. The prediction accuracy of the existing methods has significant room for improvement. A recent deep-learning architecture, Capsule Network (CapsNet), which can characterize the internal hierarchical representation of input data, presents a great opportunity to solve this problem, esp. using small training data. Results: We proposed a CapsNet for predicting protein PTM sites, including phosphorylation, Nlinked glycosylation, N6-acetyllysine, methyl-arginine, S-palmitoyl-cysteine, pyrrolidonecarboxylic- acid and SUMOylation sites. The CapsNet outperformed the baseline convolutional neural network architecture MusiteDeep and other well-known tools in most cases and provided promising results for practical use, esp. in learning from small training data. The capsule length also gives an accurate est. for the confidence of the PTM prediction. We further demonstrated that the internal capsule features could be trained as a motif detector of phosphorylation sites when no kinase-specific phosphorylation labels were provided. In addn., CapsNet generates robust representations that have strong discriminant power in distinguishing kinase substrates from different kinase families. 
Our study sheds some light on the recognition mechanism of PTMs and applications of CapsNet on other bioinformatic problems.
- 45Khanal, J.; Tayara, H.; Zou, Q.; To Chong, K. DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network. Briefings Bioinf. 2022, 23, bbab492, DOI: 10.1093/bib/bbab492Google ScholarThere is no corresponding record for this reference.
- 46Shang, J.; Peng, C.; Tang, X.; Sun, Y. PhaVIP: Phage VIrion protein classification based on chaos game representation and vision transformer. arXiv 2023, arXiv:2301.12422Google ScholarThere is no corresponding record for this reference.
- 47Löchel, H. F.; Heider, D. Chaos game representation and its applications in bioinformatics. Comput. Struct. Biotechnol. J. 2021, 19, 6263– 6271, DOI: 10.1016/j.csbj.2021.11.008Google Scholar47https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BB2cblt1Ojsg%253D%253D&md5=8e8167b2552e037e8c046d3dcf24e119Chaos game representation and its applications in bioinformaticsLochel Hannah Franziska; Heider DominikComputational and structural biotechnology journal (2021), 19 (), 6263-6271 ISSN:2001-0370.Chaos game representation (CGR), a milestone in graphical bioinformatics, has become a powerful tool regarding alignment-free sequence comparison and feature encoding for machine learning. The algorithm maps a sequence to 2-dimensional space, while an extension of the CGR, the so-called frequency matrix representation (FCGR), transforms sequences of different lengths into equal-sized images or matrices. The CGR is a generalized Markov chain and includes various properties, which allow a unique representation of a sequence. Therefore, it has a broad spectrum of applications in bioinformatics, such as sequence comparison and phylogenetic analysis and as an encoding of sequences for machine learning. This review introduces the construction of CGRs and FCGRs, their applications on DNA and proteins, and gives an overview of recent applications and progress in bioinformatics.
- 48Kingma, D. P.; Ba, J. Adam: a method for stochastic optimization. arXiv 2014, arXiv:1412.6980Google ScholarThere is no corresponding record for this reference.
- 49Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Pytorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems , 2019; Vol. 32.Google ScholarThere is no corresponding record for this reference.
- 50Kishk, A.; Elzizy, A.; Galal, D.; Razek, E. A.; Fawzy, E.; Ahmed, G.; Gawish, M.; Hamad, S.; El-Hadidi, M. A hybrid machine learning approach for the phenotypic classification of metagenomic colon cancer reads based on kmer frequency and biomarker profiling. In 2018 9th Cairo International Biomedical Engineering Conference (CIBEC) , 2018; pp 118– 121.Google ScholarThere is no corresponding record for this reference.
- 51Yin, B.; Balvert, M.; Zambrano, D.; Schönhuth, A.; Bohte, S. An image representation based convolutional network for DNA classification. arXiv 2018, arXiv:1806.04931Google ScholarThere is no corresponding record for this reference.
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c00546.
GC content distribution of the data sets, FCGR images for two example sequences, PR curves for the ablation experiment, hyperparameters of CapsEnhancer, a case study, and a description of the dynamic routing algorithm (PDF)