Delving into Eukaryotic Origins of Replication Using DNA Structural Features
  • Open Access
Article

  • Venkata Rajesh Yella*
    Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
    *Email: [email protected]
  • Akkinepally Vanaja
    Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
    KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
  • Umasankar Kulandaivelu
    KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
  • Aditya Kumar*
    Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
    *Email: [email protected]

ACS Omega

Cite this: ACS Omega 2020, 5, 23, 13601–13611
https://doi.org/10.1021/acsomega.0c00441
Published June 1, 2020

Copyright © 2020 American Chemical Society. This publication is licensed under CC-BY-NC-ND.

Abstract

DNA replication in eukaryotes is an intricate process that is precisely synchronized by a set of regulatory proteins, and the replication fork emanates from discrete sites on chromatin called origins of replication (Oris). These sites are considered the gateway to chromosomal replication and are typified by sequence motifs. However, the cognate sequences are evident in only a small subset of origin regions or are entirely absent across different metazoans. DNA secondary structural features can therefore provide information beyond that in the primary sequence. In this article, we report the trends in DNA sequence-based structural properties of origin sequences in nine eukaryotic systems representing different families of life. Biologically relevant DNA secondary structural properties, namely, stability, propeller twist, flexibility, and minor groove shape, were studied in the sequences flanking replication start sites. The results indicate that Oris in yeasts show lower stability, more rigidity, and narrow minor groove preferences compared to the genomic sequences surrounding them. Yeast Oris also show a preference for A-tracts and the promoter element TATA box in the vicinity of replication start sites. In contrast, Drosophila melanogaster, humans, and Arabidopsis thaliana do not have such features in their Oris; instead, they show a high preponderance of G-rich sequence motifs such as putative G-quadruplexes or i-motifs and CpG islands. Our extensive study applies DNA structural feature computation to origins of replication across organisms ranging from yeasts to mammals and including a plant. Insights from this study should be significant in understanding origin architecture and may help in designing new algorithms for predicting DNA trans-acting factor recognition events.

Note

This paper was originally published ASAP on June 1, 2020. Due to a production error, there were mistakes in Table 1. The corrected version was reposted on June 2, 2020.

Introduction

Genetic information is preserved between generations by the mechanism of DNA replication, which forms the basis of heredity. Precision in the process is achieved by activating it only once during each cycle of cell division. (1) Initiation of DNA replication is accomplished through two successive regulated steps: origin licensing and origin activation. (2) The first step, origin licensing, occurs in the G1 phase, where highly conserved replication initiation proteins are sequentially loaded onto DNA sequences known as origins of replication (Oris) to form the pre-replicative complex (preRC). (3) Next, origin activation occurs during the S phase, when additional proteins are recruited to the preRC. Unwinding and synthesis of the two daughter DNAs start simultaneously after origin activation. A diverse set of factors such as regulatory proteins, remodelers, replicator sequences, small noncoding RNA, epigenetic mechanisms, chromatin configuration and domains, nuclear envelopes, and subcellular compartmentalization coordinate the replication mechanisms temporally and spatially. (2) Oris, as genetic determinants, form the anchoring site for the replication machinery and are known to have a signature sequence context.
In consonance with the classic replicon model, replication is initiated through recognition of cis-acting elements by trans-acting initiator proteins. (4) However, this cis-acting model is strictly applicable only to bacteria and a few lower eukaryotes; in complex genomes, the code is not yet clear. In metazoans, the proteins involved in the orchestration of replication are conserved, whereas the genetic determinants are rapidly evolving. In Saccharomyces cerevisiae, Oris have AT-rich autonomously replicating sequences (ARSs), which encompass 11–17 nucleotide consensus motifs. (5) In Schizosaccharomyces pombe, ARS consensus motifs are not observed, but long AT tracts can play their role instead. (6) Origins of the yeast species Schizosaccharomyces japonicus have GC-rich sequences. (7) In comparison, metazoan Oris are preponderant in genomic regions with high GC content, (8) such as CpG islands and G-rich elements (G-quadruplex-forming sequences). Further, an origin G-rich repeated element (OGRE), which can favor G-quadruplex formation, has been identified in the majority of D. melanogaster and mammalian Oris. (8) The metazoan initiation sites may have some common genetic determinants, but robust consensus sequence signals are not widely displayed. In short, research indicates that primary DNA sequences at Oris vary among families of life, and the current understanding of Oris is still far from satisfactory.
The replication process, which involves intricate DNA-protein recognition events (access and orchestration of replication machinery) and melting of origin DNA, occurs in the context of the three-dimensional structure of DNA. Hence, it is pivotal to study Oris in terms of descriptors of DNA structure. B-DNA is the most common structural form under physiological conditions. DNA can display extensive structural polymorphism both at the gross level of 3D structure (non-B-DNA) and at local scales (DNA structural features). More than 20 noncanonical DNA secondary structures have been reported in the literature, (9) and biologically relevant structures include cruciform DNA, G-quadruplexes, intercalated motifs, hairpins, triple helices (H-DNA), and slipped structures. Recent work suggests that DNA conformation may diverge from the canonical B-form in approximately 13% (394.2 Mb) of the human genome. (10) Quantitative studies on various X-ray crystal structures of free B-DNA illustrated that naked DNA explores a significant amount of conformational space, (11,12) and sequence-dependent perturbations have been characterized. (13) The sequence-dependent fluctuations in local helicoidal parameters (rotational and translational), which lead to variations at a gross level, are involved in DNA melting, DNA-protein recognition, nucleosome organization, chromatin configuration, and genome integrity. Extensive experimental measurements, theoretical simulations, and computational studies have led to the establishment of various DNA structural features, namely, duplex stability, intrinsic curvature, protein-induced bendability, groove shape, topography, (11,14−23) and DNA crookedness. (24) Further studies on the phylogenetics of conserved regions/regulatory elements showed that local topography and DNA shape are conserved and evolutionarily constrained compared to the DNA base sequences in various vertebrates. (25)
Various tools based on DNA structural features were designed for large-scale applications in genomics during the last decade. (26) In our earlier works, we have extensively studied DNA structural parameters to characterize prokaryotic and eukaryotic promoters, (21,25) to understand DNA transcription factor recognition (27) and conservation of DNA structural properties in promoter regions, (28) to predict promoter regions in genomes, (19) to delineate TATA-containing and TATA-less promoters, (17) to link the DNA structure with gene expression variability, (16,18) etc. Fewer studies have characterized origins of replication in S. cerevisiae, (29) D. melanogaster, and humans. (30) These reports are limited to one or two organisms and to smaller genomic regions surrounding the replication start sites. The current study focuses on DNA structural features in the vicinity of origin start sites in nine different eukaryotic systems, including species of yeasts, humans, and plants.

Results and Discussion

The origins of replication in nine eukaryotic systems, Saccharomyces cerevisiae, Kluyveromyces lactis, Candida glabrata, Pichia pastoris, Schizosaccharomyces pombe, Drosophila melanogaster, mice, humans, and Arabidopsis thaliana, were examined; these are regarded as model organisms for discerning eukaryotic replication at different levels and aspects. The systems vary in their genomic GC content (36–42%) (Table 1) and nucleotide base composition, are widely investigated, and have experimentally inferred replisome information available. (31) Here, we computed the sequence composition and sequence-dependent structural features in the Oris of these systems to understand their similarities and differences. Our analysis included long flanking regions spanning −5000 to +5000 nt relative to the starting genomic loci of the origins of replication listed in the DeOri database. (31) Throughout this manuscript, we refer to position “0” as the Ori start site and to these regions as Ori regions or Ori sequences. The strategy implemented for this work is outlined in Figure 1.

Figure 1. Analysis outline for computation of DNA structural features or motifs in origins of replication in the eukaryotic genomes. Experimentally mapped endogenous replication initiation sites were retrieved from the DeOri database (http://tubic.org/deori/). (31) Various physiologically relevant DNA structural features and motifs, including stability, propeller twist, minor groove shape, G-quadruplexes, i-motifs, etc., were computed using lookup tables of di/tri/tetranucleotide descriptors or regular expression patterns.

Table 1. Genomic Features of Data Sets and Compositional Analysis (Most Occurring Di-, Tri-, Tetra-, and Heptamers) of Oris in Eukaryotes (a)

| name of organism | genome size (in Mb) | no. of Chr | no. of Ori sites | genome GC % | GC % of Ori region | most represented dinucleotides | trinucleotides | tetranucleotides | heptanucleotides |
|---|---|---|---|---|---|---|---|---|---|
| Saccharomyces cerevisiae | 12.07 | 16 | 357 | 38.15 | 36.05 | AA (11.59) | AAA (4.49) | AAAA (1.85) | AAAAAAA (0.217) |
| | | | | | | TT (11.53) | TTT (4.43) | TTTT (1.79) | TTTTTTT (0.198) |
| | | | | | | AT (9.77) | AAT (3.23) | AAAT (1.22) | ATATATA (0.097) |
| | | | | | | TA (8.56) | ATT (3.21) | ATTT (1.21) | TATATAT (0.091) |
| | | | | | | TG (6.20) | ATA (3.05) | ATAT (1.14) | AAAAAAT (0.077) |
| Kluyveromyces lactis | 10.68 | 6 | 144 | 38.76 | 35.02 | TT (11.57) | AAA (4.16) | AAAA (1.49) | AAAAAAA (0.133) |
| | | | | | | AA (11.34) | TTT (4.16) | TTTT (1.44) | TTTTTTT (0.116) |
| | | | | | | AT (10.12) | AAT (3.32) | ATAT (1.24) | ATATATA (0.105) |
| | | | | | | TA (8.95) | ATT (3.29) | ATTT (1.21) | TATATAT (0.101) |
| | | | | | | TG (6.42) | TAT (3.21) | AAAT (1.17) | AAATAAA (0.063) |
| Candida glabrata strain CBS138 | 4.81 | 13 | 256 | 39.03 | 33.85 | AA (11.61) | AAA (4.29) | AAAA (1.64) | AAAAAAA (0.128) |
| | | | | | | TT (11.41) | TTT (4.20) | TTTT (1.61) | TTTTTTT (0.116) |
| | | | | | | AT (10.73) | TAT (3.61) | ATAT (1.41) | ATGTTTT (0.102) |
| | | | | | | TA (9.82) | ATA (3.58) | AAAT (1.31) | ACCAAAA (0.087) |
| | | | | | | TG (6.57) | AAT (3.54) | TATT (1.25) | TTTTTAT (0.084) |
| Pichia pastoris | 9.35 | 4 | 294 | 41.13 | 39.51 | AA (10.16) | AAA (3.42) | AAAA (1.21) | AAAAAAA (0.091) |
| | | | | | | TT (10.08) | TTT (3.40) | TTTT (1.19) | TTTTTTT (0.088) |
| | | | | | | AT (8.39) | AAT (2.75) | AAAT (0.89) | AAAAAAT (0.046) |
| | | | | | | TA (6.92) | ATT (2.68) | AATT (0.86) | ATTTTTT (0.045) |
| | | | | | | GA (6.66) | TTG (2.40) | ATTT (0.85) | TCTTTTT (0.043) |
| Schizosaccharomyces pombe | 12.59 | 3 | 345 | 36.06 | 30.79 | AA (14.27) | TTT (6.26) | TTTT (2.81) | TTTTTTT (0.459) |
| | | | | | | TT (14.25) | AAA (6.24) | AAAA (2.78) | AAAAAAA (0.428) |
| | | | | | | AT (10.55) | AAT (4.12) | AAAT (1.75) | TTTATTT (0.162) |
| | | | | | | TA (9.80) | ATT (4.09) | ATTT (1.74) | ATTTTTT (0.160) |
| | | | | | | TG (5.53) | TAA (3.59) | TAAA (1.59) | AAATAAA (0.160) |
| Drosophila melanogaster (S2) | 137.55 | 4 | 7156 | 42.29 | 43.8 | TT (9.40) | TTT (3.38) | TTTT (1.23) | AAAAAAA (0.140) |
| | | | | | | AA (9.37) | AAA (3.37) | AAAA (1.23) | TTTTTTT (0.131) |
| | | | | | | AT (7.59) | ATT (2.53) | ATTT (0.97) | TTTATTT (0.071) |
| | | | | | | CA (6.84) | AAT (2.51) | AAAT (0.96) | AAAAATA (0.062) |
| | | | | | | TG (6.84) | TTG (2.16) | AATT (0.83) | TTTTATT (0.061) |
| Arabidopsis thaliana | 119.16 | 5 | 1533 | 36.05 | 41.53 | AA (9.75) | AAA (3.33) | AAAA (1.15) | AAAAAAA (0.099) |
| | | | | | | TT (9.64) | TTT (3.28) | TTTT (1.14) | TTTTTTT (0.098) |
| | | | | | | AT (7.70) | AGA (2.41) | AAGA (0.87) | AAGAAGA (0.059) |
| | | | | | | GA (7.13) | TCT (2.39) | TCTT (0.87) | TCTTCTT (0.059) |
| | | | | | | TC (7.13) | GAA (2.32) | AGAA (0.85) | AGAAGAA (0.057) |
| mouse (P19) | 2716.96 | 20 | 2412 | 42 | 50.38 | CT (7.97) | CTG (2.65) | TTTT (0.87) | TTTTTTT (0.167) |
| | | | | | | AG (7.87) | CAG (2.61) | AAAA (0.82) | AAAAAAA (0.153) |
| | | | | | | TG (7.65) | TTT (2.45) | CTGG (0.78) | TGTGTGT (0.100) |
| | | | | | | CA (7.58) | AAA (2.37) | CCAG (0.78) | ACACACA (0.099) |
| | | | | | | CC (7.21) | CCT (2.33) | CCTG (0.78) | GTGTGTG (0.096) |
| human (MCF7) | 3259.56 | 23 | 94,195 | 41 | 57.76 | GG (10.33) | GGG (3.61) | CAGG (1.20) | CCCTCCC (0.064) |
| | | | | | | CC (10.32) | CCC (3.61) | CCTG (1.20) | GGGAGGG (0.063) |
| | | | | | | CT (8.17) | CAG (3.31) | CTGG (1.15) | GGCTGGG (0.062) |
| | | | | | | AG (8.17) | CTG (3.30) | CCCC (1.14) | GGGCAGG (0.062) |
| | | | | | | TG (8.05) | CCT (3.00) | GGGG (1.14) | CCCAGCC (0.062) |

(a) Origin sequences were downloaded from the DeOri database for computing the GC percent and k-mer frequencies (k = 2, 3, 4, and 7). The numbers in parentheses indicate the absolute percentage frequency of oligonucleotides observed in the data sets. The five most frequent words are displayed in the table. The frequency of a k-mer depends on the GC percentage and also on the arrangement of nucleotide steps, which is characteristic of Ori regions. Word composition for the different cell-type data sets of D. melanogaster, mouse, and human is shown in Supplementary Table 2.
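The composition statistics in Table 1 (GC percent and the most frequent k-mers) can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes overlapping k-mers counted on the given strand only, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def top_kmers(seqs, k, n=5):
    """Return the n most frequent overlapping k-mers as (word, percent) pairs.

    Assumption: words containing N or other ambiguity codes are skipped and
    only the given strand is counted.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(w, round(100.0 * c / total, 2)) for w, c in counts.most_common(n)]

# Hypothetical toy sequences standing in for DeOri Ori sequences.
toy_oris = ["AAAATTTTAT", "TTTTAAAAGC"]
print(gc_percent("".join(toy_oris)))
print(top_kmers(toy_oris, k=2))
```

Running `top_kmers` with k = 2, 3, 4, and 7 over all Ori sequences of one organism would give one column block of the table under these assumptions.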

Ori Regions Display Signature Structural Profiles

In recent years, intensive experimental and computational analyses have been carried out on the sequence-dependent secondary structural properties of regulatory genomic sequences. DNA secondary structure or shape has also been analyzed at origins of replication in yeast (29,32) and in D. melanogaster and humans. (30) Common DNA shape signatures in D. melanogaster and humans are reported to be marked by elevated propeller twist, roll angles, and minor groove width and reduced helical twist. (30) However, these studies cover few data sets and only one or two organisms. The current study focuses on the characterization of DNA structure in the vicinity of origins of replication. To explore the structural properties, we first aligned sequences encompassing origins of replication relative to their Ori start sites [−5000 nt to +5000 nt, where 0 indicates the genomic start locus of the Ori sequences compiled in the DeOri database] and then computed the structural features of the DNA sequences using lookup tables. We calculated the structural features for every k-mer (k = 2–4) in each DNA sequence as described in Materials and also in our previous work. (18,33) The structural profile averaged at each nucleotide position can be considered a consensus numerical signature or structural profile for a given organism. (34) Figure 2 displays the signature features, namely DNA duplex stability, melting temperature, propeller twist, bendability (DNase 1 and NPP models), and groove shape (minor groove width), of the Ori regions of S. cerevisiae, K. lactis, S. pombe, D. melanogaster, humans, and A. thaliana.
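The lookup-table computation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dinucleotide table `TOY_TABLE` holds made-up placeholder values (not the published stability or propeller twist parameters), the function names are hypothetical, and no smoothing window is applied because the text does not specify one.

```python
import math

# Hypothetical placeholder values, NOT published parameters: AT-only steps
# get -1.0, GC-only steps -2.0, and mixed steps -1.5 (arbitrary units).
TOY_TABLE = {
    a + b: -1.0 if (a in "AT" and b in "AT")
    else -2.0 if (a in "GC" and b in "GC")
    else -1.5
    for a in "ACGT" for b in "ACGT"
}

def kmer_values(seq, table, k=2):
    """Map every overlapping k-mer of seq to its table value (None if absent)."""
    seq = seq.upper()
    return [table.get(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def mean_profile(aligned_seqs, table, k=2):
    """Average the per-position values over sequences aligned at the Ori start.

    Assumption: all sequences have the same length (e.g. -5000 to +5000), so
    list index i corresponds to position i - 5000 relative to the Ori start.
    """
    profiles = [kmer_values(s, table, k) for s in aligned_seqs]
    means = []
    for column in zip(*profiles):
        vals = [v for v in column if v is not None]
        means.append(sum(vals) / len(vals) if vals else math.nan)
    return means

# Two short toy sequences standing in for aligned Ori regions.
print(mean_profile(["AAGC", "ATGG"], TOY_TABLE))
```

Swapping in a real dinucleotide, trinucleotide, or tetranucleotide table (with `k` set to 2, 3, or 4) would give one averaged profile per structural property.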

Figure 2. DNA structural profiles of S. cerevisiae, K. lactis, S. pombe, D. melanogaster, human, and A. thaliana Ori sequences. The x-axis in all the plots represents the sequences spanning the −5000 to +5000 region with respect to Ori start sites. The rows indicate the property, while the columns represent genomes. Average free energy, normalized melting temperature, propeller twist, flexibility (two models, DNase 1 sensitivity and nucleosome positioning preference), and minor groove width are shown. The models of normalized melting temperature, DNase 1 sensitivity, and nucleosome positioning preference measure the properties in arbitrary units. Blue error bars indicate the standard error of the mean property values. Experimentally identified genomic locations of Ori start sites were retrieved from the DeOri database (http://tubic.org/deori/). The y-axis range for each structural property is kept the same across genomes.

In S. cerevisiae and K. lactis, less negative free-energy values (lower stability) are observed from the Ori start site across a region of up to 1000 nucleotides, or a more extended region in S. cerevisiae, and a sharp free-energy maximum is observed around nucleotides 298 and 327 (Figure 2). In S. pombe, by contrast, the free-energy profile shows an abrupt transition from highly stable to less stable sequences in the vicinity of Ori start sites. The highly stable region spans from position −1000 to 0, while the less stable region extends to 2000 nucleotides. To understand this unexpected behavior of S. pombe, we computed the word composition in the two regions separately. Composition analysis revealed that the hexamers CCACCG, GCGGTC, GACCAC, CTGGGC, CGGGCC, and CTGGCG are at least 4 times more frequent in the region −1000 to −1 than in the region 0 to 2000. In contrast, the latter region shows an at least 4-fold higher preference for TTTTTT, TATTTA, AAAAAA, and AATTTA compared to the former region. Overall, yeasts display a low-stability region in the vicinity of Ori start sites. In D. melanogaster, humans, and Arabidopsis, the trends of the free-energy profiles are reversed, with high-stability peaks in the region downstream of Oris. The melting temperature profiles in Figure 2 are similar to the free-energy profiles. It should be noted that lower DNA stability or melting temperature is mainly influenced by AT/GC composition. AT-rich sequences are intrinsically prone to melting, and in our study, we observed such regions at Oris in lower eukaryotes. The results of this study are comparable to our previous work on free-energy profiles. The low-stability region, or free-energy maximum, in core promoters is a characteristic structural feature of all classes of bacteria and eukaryotes. (18,19,21,23,28) However, in Oris, the low-stability region spans a broad area of up to 1500 nucleotides relative to Ori start sites in yeast (while in promoters, it spans only −200 to −300 nucleotides relative to transcription start sites), (16) and the trends are not observed in Drosophila, plants, and mammals. Melting of the dsDNA origin is essential for propagation of the replication fork. Several initiator proteins and helicases orchestrate this process. (35) The exact mechanism of DNA melting and unwinding is not clearly understood owing to the lack of high-resolution structures. (36) AT-rich regions can facilitate replication. However, this principle applies only to prokaryotes (37) and lower eukaryotes (Figure 2).
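The region-wise word comparison described for S. pombe (hexamers at least 4 times more frequent in one region than in the other) could be computed along the following lines. This is a sketch under assumptions, not the authors' procedure: the pseudocount, the coordinate convention in `slice_region`, and all function names are hypothetical.

```python
from collections import Counter

def slice_region(seq, start, end, origin_index=5000):
    """Extract positions start..end (inclusive) relative to the Ori start site.

    Assumption: the sequence spans -5000..+5000, so position 0 is at index 5000.
    """
    return seq[origin_index + start: origin_index + end + 1]

def kmer_freq(seqs, k=6):
    """Relative frequency of every overlapping k-mer in a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def enriched(region_a, region_b, k=6, fold=4.0, pseudo=1e-6):
    """Words whose frequency in region_a is at least `fold` times that in region_b.

    The small pseudocount (an assumption) avoids division by zero for words
    that are absent from region_b.
    """
    fa, fb = kmer_freq(region_a, k), kmer_freq(region_b, k)
    return sorted(w for w in fa if fa[w] / (fb.get(w, 0.0) + pseudo) >= fold)

# Toy stand-ins for the -1000..-1 and 0..2000 regions of aligned Ori sequences.
upstream = ["CCACCGCCACCG"]
downstream = ["TTTTTTTTTTTT"]
print(enriched(upstream, downstream))
print(enriched(downstream, upstream))
```

With real data, `slice_region(seq, -1000, -1)` and `slice_region(seq, 0, 2000)` would supply the two region sets for each Ori sequence.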
Another dinucleotide property, the propeller twist, displays profiles similar to those of free energy and melting temperature. The propeller twist minima are observed at positions 255, 328, and 628 for S. cerevisiae, K. lactis, and S. pombe, respectively. Meanwhile, in D. melanogaster, high propeller twist angles are observed immediately downstream of the Ori start site (Figure 2). The propeller twist angle describes the rotation of the two bases in a base pair relative to each other and influences the rigidity of DNA. Sequences with more negative propeller twist values, such as A-tracts, are more rigid. The thermodynamic dinucleotide models (stability and melting temperature) and the conformational property (propeller twist) reveal the differences between Ori regions and the surrounding regions in six eukaryotes. A recent study utilized six helicoidal properties for predicting origins in S. cerevisiae based on the significant differences between Ori and non-Ori sequences. (38) A prediction accuracy of 84% was reported with the tool PseKNC for S. cerevisiae Ori sequences. (39) Another predictor was developed for human Oris in HeLa cells. (40) We therefore plotted the six rotational and translational features for the six systems to see whether such tools can be applied globally (Supplementary figure 1). The trends are consistent with the three dinucleotide features studied. In S. cerevisiae and K. lactis, Oris display lower roll, tilt, slide, and shift compared to the flanking sequences. In contrast, opposite trends in the profiles are observed in D. melanogaster and humans. We suggest that these tools could be applied to other species once the differences in these properties across species are taken into account, with additional strategies for implementation in the Oris of flies, mammals, plants, etc.
The trinucleotide bendability models (DNase 1 and NPP) can predict the flexibility of DNA in the context of genomic-scale experiments. The two models revealed that the Ori regions in yeasts are rigid compared to the surrounding sequences, while mammal and plant Oris are highly flexible. Earlier work by Chen et al. on 270 replication origins in S. cerevisiae showed that replication origins are significantly rigid relative to neighboring genomic DNA. (32) Our result on S. cerevisiae is consistent with their work. It is known that rigid DNA in genomes can enhance the sliding of DNA-binding proteins. (34,41,42) The proteins of the replication machinery may exploit DNA rigidity for scanning the genome or for efficient assembly of the replication machinery at Oris in lower eukaryotes. A common theme of regulatory regions such as promoters and origins is that they contain nucleosome-free regions, or that the DNA in these sequences is less conducive to nucleosome formation. (43)
Further, the DNA shape feature, groove shape, reveals that yeasts prefer sequences with narrow minor grooves in the vicinity of Oris. In contrast, in D. melanogaster, humans, and A. thaliana, wider minor grooves are predicted near the Oris over longer stretches of sequence. Here, we have examined minor groove preferences over larger regions of DNA. Our results on minor groove width and propeller twist are consistent with earlier published results on D. melanogaster and humans, in which common DNA shape signatures were marked by elevated propeller twist, roll angles, and minor groove width and reduced helical twist. (30) Altogether, Oris in lower organisms are less stable and more rigid and prefer narrow minor grooves, while humans and Arabidopsis show opposite trends, with high GC content, high stability, and a preference for flexible sequences with wider minor grooves. These observations could be due to the prevalence of CpG islands and GC-rich sequence motifs in these genomic regions. (2) The structural features observed for Oris are comparable to promoter features reported in earlier studies, with the exception that CpG islands are not observed in promoters of D. melanogaster. (17,44) However, one key difference between the profiles of Oris and promoters is that the structural feature signatures can extend up to 5000 nucleotides around Ori start sites, whereas the signals extend up to 1000 nt flanking transcription start sites in mammals. (19) In summary, unique structural signatures demarcate Oris from the surrounding genomic regions in eukaryotes.
The DNA replication initiation program is highly flexible: origins may differ among cell lineages, and cell type-specific origins display unique epigenetic signatures. (2) It is therefore necessary to understand the structural features of cell type-specific Oris in eukaryotes. Here, we have also carried out a separate structural feature computation for various cell types in D. melanogaster, mice, and humans. The data sets retrieved from DeOri comprise three cell types for D. melanogaster (Kc, Bg3, and S2), three for the mouse (ES, MEF, and P19), and three for humans (K562, MCF7, and HeLa). The Ori sequences of the three different cell types in the same species display similar structural profiles. However, we cannot draw conclusions about commonalities in mice and humans, as the data sets for human HeLa cells and for all three mouse cell types are too small for statistical comparisons.

Ori Sequences Are Enriched with Characteristic Sequence and Structural Motifs

We also revisited previously studied features such as GC content and sequence word composition to supplement the structural property preferences. The similarities and distinctions in the structural signatures of Ori regions in the systems shown above can be ascribed to varying nucleotide base composition along the sequence or to selective preference for a few oligonucleotides. Table 1 lists the most frequent words or k-mers (k = 2, 3, 4, and 7) in the sequences between the start and end positions of origins of replication (as listed in the DeOri database) for the nine systems. Word compositions for various cell types in D. melanogaster, mice, and humans were also computed (Supplementary table 2). The Oris have a typical nucleotide composition, with a preference for AT-rich k-mers in lower eukaryotes and plants (Table 1 and Supplementary table 2). The dinucleotides (AA and TT), trinucleotides (AAA and TTT), tetranucleotides (AAAA and TTTT), and heptanucleotides (AAAAAAA and TTTTTTT) are over-represented in the Oris of S. cerevisiae, K. lactis, C. glabrata, S. pombe, and D. melanogaster, while human Oris are enriched with G- or C-rich heptamer sequences, for instance, CCCTCCC, GGGAGGG, GGCTGGG, GGGCAGG, and CCCAGCC (Table 1). Our results are in line with recent work reported by Lin’s group. (45) The authors extensively investigated sequence motifs in Oris using the MEME tool and reported that CpG-rich sequence motifs were observed in humans, mice, and A. thaliana, while three yeasts, K. lactis, P. pastoris, and S. pombe, and D. melanogaster display preferences for AT-rich motifs. It should be noted that although D. melanogaster has a composition similar to that of yeasts, the trends of its structural property profiles are congruent with those of humans (Figure 2). On closer inspection of the word composition, we observed long repeats of CA or TG and TA steps.
The cell type-specific composition analysis also reveals common trends (Supplementary table 2). In D. melanogaster, heptamers with CA steps are observed in all cell types (S2, Bg3, and Kc). Mouse data sets (MEF, P19, ES1, and ES2) have A-tracts and CA-containing oligonucleotides. Human cell types MCF7 and K562 have similar word compositions, with GGGAGGG or its complementary sequence CCCTCCC being enriched, while the HeLa data sets (Hela1 and Hela2) show an abundance of A-tracts. The cell type similarities and differences are also consistent with earlier published results. (45) However, it should be noted that the ES1 and ES2 data sets for mouse and the human HeLa data sets are too small, statistically or on a genome-wide scale, to derive strong conclusions. (45) The high incidence of AT-rich sequences in Oris of lower eukaryotes is reflected in their lower DNA duplex stability, more negative propeller twist, and rigidity. Higher eukaryotes, like humans in this data set, seem to be enriched with G-quadruplex-forming G4 motifs, i-motifs, and oligo-G tracts, besides A-tracts. We therefore analyzed the preponderance of various structural motifs along with CpG islands in detail (Table 2 and Figure 3).

Figure 3. Positional distribution of (a) A-tracts, (b) G-tracts, G-quadruplexes, and intercalated motifs, and (c) CpG islands in Ori regions of various eukaryotes. The regular expressions “A{7} or T{7}”, “G{7} or C{7}”, “G{3,5}N{1,7}G{3,5}N{1,7}G{3,5}N{1,7}G{3,5}”, and “C{3,5}N{1,7}C{3,5}N{1,7}C{3,5}N{1,7}C{3,5}” were searched in the −5000 to +5000 region relative to origin start sites, and the matches were summed in each 200 nucleotide bin to define A-tracts, G-tracts, G-quadruplexes, and intercalated motifs, respectively. In yeasts (S. cerevisiae, K. lactis, and S. pombe), A-tracts are prevalent in the vicinity of Oris, while in D. melanogaster and humans, G-tracts, G-quadruplexes, and i-motifs are preferred. CpG islands are observed in D. melanogaster, humans, and A. thaliana. CpG islands in the −5000 to +5000 regions were searched using the “CpG island searcher” program with a 500 nt window. (46)
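The motif search and 200 nt binning described in the Figure 3 caption can be sketched with Python's `re` module. The patterns below are a direct reading of the caption (run lengths as repeat counts, N as any of A, C, G, T); counting non-overlapping matches on the given strand and the function name `binned_counts` are assumptions, not details taken from the paper.

```python
import re

# Patterns transcribed from the Figure 3 caption; N is read as [ACGT].
PATTERNS = {
    "A-tract": re.compile(r"A{7}|T{7}"),
    "G-tract": re.compile(r"G{7}|C{7}"),
    "G-quadruplex": re.compile(
        r"G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}"),
    "i-motif": re.compile(
        r"C{3,5}[ACGT]{1,7}C{3,5}[ACGT]{1,7}C{3,5}[ACGT]{1,7}C{3,5}"),
}

def binned_counts(seqs, pattern, flank=5000, bin_size=200):
    """Sum motif occurrences over all sequences in consecutive bins.

    Assumptions: each sequence starts at position -flank, matches are
    non-overlapping, and a match is assigned to the bin of its start index.
    """
    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    for seq in seqs:
        for match in pattern.finditer(seq.upper()):
            # Clamp so a match starting at the final base falls in the last bin.
            counts[min(match.start() // bin_size, n_bins - 1)] += 1
    return counts

# Toy sequence: 200 nt of filler, one A-tract, then more filler.
toy = "AC" * 100 + "A" * 7 + "CA" * 50
print(binned_counts([toy], PATTERNS["A-tract"], flank=200, bin_size=200))
```

With the default `flank=5000` and `bin_size=200`, each pattern yields 50 bins covering −5000 to +5000 relative to the Ori start site.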

Table 2. Propensity of Well-Characterized Sequence Motifs in Oris in Eukaryotes (a)

| organism | i-motif density | G-quad density | A-tracts | G-tracts | ARS | TATA box |
|---|---|---|---|---|---|---|
| S. cerevisiae | 0.01 | 0.02 | 0.99 | 0.11 | 0.26 | 0.95 |
| K. lactis | 0.00 | 0.01 | 0.94 | 0.08 | 0.13 | 0.89 |
| P. pastoris | 0.01 | 0.02 | 0.94 | 0.04 | 0.07 | 0.75 |
| C. glabrata | 0.09 | 0.05 | 0.96 | 0.35 | 0.21 | 0.93 |
| S. pombe | 0.01 | 0.01 | 1.00 | 0.08 | 0.33 | 0.97 |
| D. melanogaster | 0.19 | 0.20 | 0.96 | 0.33 | 0.28 | 0.92 |
| A. thaliana | 0.02 | 0.02 | 0.98 | 0.04 | 0.21 | 0.88 |
| mouse | 0.64 | 0.66 | 0.92 | 0.70 | 0.09 | 0.61 |
| human | 0.57 | 0.57 | 0.86 | 0.34 | 0.10 | 0.49 |

(a) Densities of i-motifs, G-quadruplexes, A-tracts, G-tracts, autonomously replicating sequences (ARS), and TATA boxes are shown in the table. The 1000 nt sequences downstream of the Ori start sites were considered in this table.

The replication process involves the generation of ssDNA, which provides an opportunity for the formation of secondary structure elements such as i-motifs, G-quadruplexes, and cruciform DNA. These structures may affect both the fidelity and processivity of the polymerization reaction. It is not yet clear how organisms handle the resulting genome instability and how such regions are conserved in metazoans. H-DNA can induce stalling of the replication machinery. (47,48) Here, we report the preponderance of structurally constrained B-DNA sequence motifs (A-tracts and G-tracts) and non-B-DNA-forming sequence motifs (G-quadruplexes and i-motifs). The occurrence of some well-characterized sequence elements, such as oligo-A or oligo-G tracts and G4 motifs, in the Ori regions of nine eukaryotic organisms is listed in Table 2. Oris of S. cerevisiae, S. pombe, and D. melanogaster are clearly highly enriched in oligo-A tracts and moderately enriched in TATA box-like sequences (Table 2). G-tracts, another structural motif, are observed in D. melanogaster and humans along with putative G-quadruplex and i-motif sequences. Further, the previously established feature of CpG islands in D. melanogaster and human origins of replication is now also revealed in A. thaliana. Altogether, the composition and motif search analyses reveal that motif preferences in origins of replication differ among systems: yeasts are AT-rich, particularly in A-tracts, while mammals have a high preference for GC-rich motifs. Although the GC composition differs among eukaryotes, the conservation of antinucleosomal sequences (A-tracts, G-tracts, and G4 motifs) is a common principle in eukaryotic origins.

Eukaryotic Origins of Replication May Be Linked to Promoter Regions

Promoters are crucial for transcription, and their activity is conferred by stereotypical sequence motifs such as Inr (initiator element), the TATA box, BRE (TFIIB recognition element), and DPE (downstream promoter element) at well-defined locations relative to the transcription start sites. (20) Origins of replication have a similar chromatin environment and share some genetic features with transcription-activating sequences or promoters. (2) Mounting evidence shows that Oris tend to be located in the vicinity of transcription start sites (TSSs). (2,49) The commonly noticed links between eukaryotic replication and transcription are attributed to shared nucleosome-depleted regions. (50) In metazoans, Oris are concentrated near core promoter regions. (8) This can further be due to preferential association with CpG islands in both promoter regions and origins of replication. (51,52) In yeast, Oris are associated with ARSs, antinucleosomal sequences, and precisely positioned nucleosomes (+1 and −1 nucleosomes). (53) In yeasts, the distance between Ori start sites and transcription start sites is less than 500 nucleotides in 31.46% of the sequences studied. (54) We therefore addressed the link between Oris and promoters by analyzing the distribution of consensus transcription factor binding sequences, or promoter elements, in the vicinity of Ori start sites, searching the Ori regions for the known core promoter elements. We observed no trend in the relation between Oris and promoters that is common to all the systems studied. However, a few promoter elements are prominent in the majority of the systems (Supplementary figure 3). Figure 4 shows the density of general transcription factor binding sites in yeasts and A. thaliana. The distribution of TATA boxes [consensus site TATAWAWR] in S. cerevisiae, K. lactis, P. pastoris, and S. pombe is shown in blue, and the distributions of BREu [SSRCGCC], DCE-I [CTTC], DCE-III [AGC], and Pause-button [KCGRWCG] in A. thaliana are shown as green bar plots. A preponderance of TATA boxes is observed in all yeast species in our data set, with peak occurrence at approximately positions 200, 400, 800, and 600 for S. cerevisiae, K. lactis, P. pastoris, and S. pombe, respectively (Figure 4a). However, it should be noted that the Ori regions in yeasts are AT-rich, so a natural enrichment of TATA boxes is expected. The plant genome, A. thaliana, displays characteristic results in connection with Oris and promoters (Figure 4b): the promoter elements BREu, DCE-I, DCE-III, and Pause-button are over-represented in these regions. From these results, we speculate that TATA box-containing genes are associated with origins of replication in yeasts. A more apparent link is observed in A. thaliana, where a few core promoter elements are preponderant, suggesting that promoters and origins of replication are linked.

Figure 4. Positional distribution of promoter sequence elements in Ori regions in (a) yeasts and (b) A. thaliana. The plots in (a) show the preponderance of the TATA box [TATAWAWR] in Ori sequences [−5000 to +5000 relative to Ori start sites at position 0] in the yeast species S. cerevisiae, K. lactis, P. pastoris, and S. pombe. Plots with green bars indicate the occurrence of the promoter elements BREu [SSRCGCC], DCE-I [CTTC], DCE-III [AGC], and Pause-button [KCGRWCG] in A. thaliana. In the IUPAC nucleotide code, K = G or T, R = A or G, S = C or G, and W = A or T. Promoter sequence motif information was retrieved from the literature (eukaryotic core promoters and the functional basis of transcription initiation). The positional distribution of promoter sequence elements in Ori regions in all systems is also displayed in Supplementary figure 3.
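
The positional distributions in Figure 4 can be approximated with a short script. The following Python sketch is illustrative only and is not the authors' code: it translates an IUPAC consensus into a regular expression and counts motif starts in bins relative to the Ori start site. The function names, the forward-strand-only scan, the counting of overlapping matches, and the 200-nucleotide bin size (borrowed from the binning described for Figure 3) are all assumptions.

```python
import re
from collections import Counter

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Compile an IUPAC consensus (e.g. TATAWAWR) into a regex.

    The lookahead makes overlapping occurrences count separately
    (an assumption; the source does not say how overlaps were handled).
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")

def binned_motif_counts(seq, consensus, flank=5000, bin_size=200):
    """Count motif starts per bin; keys are bin starts relative to position 0.

    `seq` is assumed to span -flank..+flank around the Ori start site,
    so index `flank` corresponds to position 0. Only the given strand is scanned.
    """
    pattern = iupac_to_regex(consensus)
    counts = Counter()
    for match in pattern.finditer(seq.upper()):
        relative = match.start() - flank
        counts[(relative // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

Summing the returned dictionaries over all Ori sequences of a genome would give a per-bin count comparable in spirit to the bar plots, although the exact normalization used for the figure is not stated.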

Conclusions

Our comprehensive work focuses on unveiling DNA structural features in the origins of replication of eukaryotic systems and concludes that eukaryotic Oris have characteristic signature structural profiles. We observed that Oris of lower eukaryotes are more meltable and more rigid than the surrounding sequences. The complex replication process depends on the interaction between cis-regulatory modules and a set of regulatory proteins. The structural signals may help in this interaction by making DNA nucleosome-free (anti-nucleosomal sequences such as A-tracts) and easy to melt (reduced free energy for DNA melting). This work is a conceptual update to the current knowledge of Ori sequences in the region where the replication fork emanates. The molecular mechanisms regulating DNA replication may be highly conserved, but the secondary structural elements of Oris vary among yeasts, invertebrates, vertebrates, and plants. Further, the CG-rich sequence motifs, which act as hot spots for DNA methylation in higher eukaryotes, suggest that epigenetic features may precisely modulate the replication mechanism. Our approach may enable a better understanding of the mechanisms involved in replication. Unraveling the DNA structure of dormant, constitutive, and facultative Oris is a natural extension of this work.

Materials and Methods

Origins of Replication Data Sets

Experimentally mapped endogenous replication initiation sites (Table 1) were retrieved from DeOri version 6 (http://tubic.org/deori/). (31) The database features eukaryotic DNA replication origins identified by genome-wide experimental studies. The genomic locations of Oris for Saccharomyces cerevisiae, Kluyveromyces lactis, Candida glabrata strain CBS138, Pichia pastoris, Schizosaccharomyces pombe, Drosophila melanogaster, mice, humans, and Arabidopsis thaliana were retrieved from the database. It should be noted that current experimental methods such as ChIP-seq, SNS-seq (sequencing of RNA-primed short nascent DNA strands), and replication bubble and Okazaki fragment-based methods cannot determine replication start sites precisely; their resolution varies from a few bases to kilobases. (49,55) They can only delimit small regions that contain Oris. (38) In this work, we have chosen the starting locus of each Ori region provided by DeOri and refer to these loci as Ori start sites.
The genomic locations were mapped to the genomes, and sequences from −5000 to +5000 nucleotides relative to Ori start sites (position 0 is the genomic start location provided by DeOri) were extracted for the analysis. The numbers of sequences used in this study are 357, 144, 256, 294, 345, 7156, 2412, 94,195, and 1533 for Saccharomyces cerevisiae, Kluyveromyces lactis, Candida glabrata strain CBS138, Pichia pastoris, Schizosaccharomyces pombe, Drosophila melanogaster, mice, humans, and Arabidopsis thaliana, respectively. The data set covers various lineages of eukaryotic life and can thus be considered broadly representative. Whole-genome sequences for S. cerevisiae, Kluyveromyces lactis, Candida glabrata strain CBS138, Pichia pastoris, Schizosaccharomyces pombe, Drosophila melanogaster, and Arabidopsis thaliana were retrieved from the NCBI data bank (https://www.ncbi.nlm.nih.gov/genome). Mouse (mm8) and human (hg19) genomes were downloaded from the UCSC Genome site (http://genome.ucsc.edu/). (56) Further, we have also included tissue-specific Ori data sets for D. melanogaster (Kc, Bg3, and S2), mice (MEF, P19, ES1, and ES2), and humans (MCF7, K562, Hela1, and Hela2) in our analysis. The sequence length of −5000 to +5000 relative to Ori start sites was chosen based on empirical evidence from DNA structural features such as free energy and flexibility around human Oris. The span of the signature regions in humans was observed to extend beyond 4000 nucleotides on both sides of the Ori. Hence, for comparative analysis, we selected the same region for all the organisms in our data set.
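
The window extraction described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name is hypothetical, the Ori start coordinate is assumed to be 1-based, and padding with N at chromosome ends is an assumption, since the text does not say how Oris close to chromosome ends were handled.

```python
def extract_ori_window(chromosome, ori_start, flank=5000):
    """Return the -flank..+flank sequence around a 1-based Ori start site.

    Positions outside the chromosome are padded with 'N' (an assumption)
    so that every window has length 2 * flank + 1 and position 0 always
    sits at index `flank`.
    """
    centre = ori_start - 1                       # convert to a 0-based index
    left, right = centre - flank, centre + flank + 1
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(chromosome))
    core = chromosome[max(0, left):min(len(chromosome), right)].upper()
    return pad_left + core + pad_right
```

With the default flank, each window contains 10,001 positions, which keeps positional averages across Oris aligned on position 0.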

DNA Structural Profile Enumeration

The initiation of replication involves the search for appropriate Ori sequences by the replication machinery proteins, the orchestration of different trans factors, DNA-protein recognition, the formation of stable complexes, and finally, open complex formation. Here, we used k-mer (k = 2–4) nucleotide descriptors to relate DNA structure to these steps of replication. DNA stability and melting temperature models can explain the sequence preferences for open complex formation, DNA bending models may explain sequence search and orchestration, and propeller twist and minor groove models explain DNA-protein recognition. The propeller twist can also explain the rigidity of DNA.

DNA Stability and Melting Temperature Models

DNA duplex stability or free energy of the fragment of DNA depends on hydrogen bonds between bases and the stacking interaction between consecutive bases and can be computed by summing the free energy of the constituent dinucleotides. (20) The melting temperature of a DNA fragment directly depends on DNA stability. A dinucleotide descriptor based on the collection of melting studies of 108 oligonucleotides (57) has been used for computing DNA stability. Further, another model based on normalized dinucleotide empirical melting temperature descriptors (58) was also utilized for comparison.
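
As an illustration of the summation over constituent dinucleotides, the Python sketch below computes the free energy of a fragment from a nearest-neighbor lookup table. The values are the unified nearest-neighbor free energies (kcal/mol, 37 °C) as commonly quoted from ref 57 and should be checked against that reference before use; the function name is hypothetical, and initiation, terminal, and symmetry corrections are deliberately left out.

```python
# Nearest-neighbour free energies in kcal/mol at 37 °C, as commonly quoted
# from the unified parameter set (ref 57). Treat these as values to verify.
# Each step shares its value with its reverse complement (e.g. CA and TG).
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}

def duplex_free_energy(seq):
    """Sum dinucleotide-step free energies along one strand.

    Steps containing non-ACGT characters (e.g. N) contribute 0.0, which is
    a simplification chosen for this sketch.
    """
    seq = seq.upper()
    return sum(NN_DG.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))
```

A less negative sum indicates a less stable fragment that is easier to melt, so an AT-rich stretch such as AATT (−2.88 kcal/mol) scores higher than GCGC (−6.65 kcal/mol).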

DNA Bendability Models

Bending flexibility, or bendability, of a sequence is the anisotropic bending of DNA under the influence of DNA-binding factors such as proteins. The bending propensity of sequences was computed using two genome context-derived trinucleotide descriptor models, the DNase 1 sensitivity model (59) and the nucleosome positioning preference (NPP) model. (60) More negative values from the DNase 1 sensitivity model or less positive values from the NPP model indicate greater rigidity of a given DNA fragment.

Propeller Twist and Minor Groove Width

The propeller twist is the inherent or induced non-planarity of a base pair, quantified as the relative angle of rotation of the paired bases about their common y-axis. DNA sequences with more negative propeller twist values, such as A-tracts, are more rigid. The propeller twist angle values based on X-ray crystal structures (12) were retrieved from DiProDB (dinucleotide property database) (61) for all 16 dinucleotides. In B-DNA, grooves arise because the two glycosyl bonds branch off from one side of the hydrogen-bonded base pair. In the minor groove, the backbones lie closer together, and the groove width is a key factor in indirect readout during DNA-protein recognition. The tetranucleotide model derived from protein-DNA crystal structure complexes (14) was employed for minor groove width computation in this study.
With the value of each unique dimer, trimer, or tetramer known, a one-nucleotide sliding window model converts a given sequence into a numerical profile. Smoothing windows of 15 nucleotides (corresponding to 14 dinucleotide steps) for dinucleotide models and 30 nucleotides for tri- or tetranucleotide structural descriptors were employed based on our previous studies. (18,20,21)
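
The sliding-window conversion can be sketched generically for any k-mer descriptor table. The function below is a hypothetical illustration, not the authors' implementation: it assumes that the window value is the mean over the steps inside the window and that a window of w nucleotides contains w − k + 1 steps, which matches the stated 15 nucleotides and 14 dinucleotide steps but is an extrapolation for tri- and tetranucleotide models.

```python
def structural_profile(seq, table, k, window):
    """Convert a sequence into a smoothed numerical profile.

    `table` maps every k-mer to a descriptor value (caller-supplied).
    The window slides one nucleotide at a time, and each output value is
    the mean over the k-mer steps inside that window (an assumption).
    """
    seq = seq.upper()
    steps = [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]
    n_steps = window - k + 1      # e.g. 15 nt window -> 14 dinucleotide steps
    return [
        sum(steps[i:i + n_steps]) / n_steps
        for i in range(len(steps) - n_steps + 1)
    ]
```

For example, calling it with a dinucleotide table, k = 2, and window = 15 on a 10,001-nucleotide Ori window yields 9,987 smoothed values, which can then be averaged position by position across all Oris of a genome.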

Computation of Structural Motifs

A DNA G-quadruplex is defined as a four-stranded DNA structure that is composed of stacked guanine tetrads. (62) G-quadruplex-forming sequences in the genomes are predicted from the primary sequence of the contextual DNA. Putative G-quadruplex sequences were identified using a simple pattern match, G3–5N1–7G3–5N1–7G3–5N1–7G3–5, (63) where N represents the linker nucleobases and can be any of the four nucleotides. The complementary sequence on the other strand of the G-quadruplex [C3–5N1–7C3–5N1–7C3–5N1–7C3–5] can form an intercalated motif. We searched for i-motifs (intercalated motifs) separately because their significance in the human genome was highlighted in a recent study. (64) G-quadruplex motifs, i-motifs, A-tracts, and G-tracts were computed using pattern search methods. Long stretches of A or G can act as antinucleosomal sequences. A-tracts constitute a stretch of four or more continuous A/T base pairs excluding the flexible TA dinucleotide step. G-tracts (G7 or C7) were also computed because poly(G), like poly(A), can act as an antinucleosomal sequence. (65,66)
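
The pattern searches can be expressed as regular expressions. The following Python sketch is one possible reading of the definitions above and is not the authors' script: the function name is hypothetical, matches are non-overlapping, only the given strand is scanned, a run longer than seven G or C is counted as a single G-tract, and A-tracts are taken as AnTm runs of at least four nucleotides so that a TA step terminates the tract.

```python
import re

LINKER = "[ACGT]{1,7}"
G4 = re.compile(LINKER.join(["G{3,5}"] * 4))        # G3-5 N1-7 G3-5 N1-7 G3-5 N1-7 G3-5
I_MOTIF = re.compile(LINKER.join(["C{3,5}"] * 4))   # C-rich complement of the G4 pattern
G_TRACT = re.compile(r"G{7,}|C{7,}")                # G7 or C7; longer runs count once
A_TRACT = re.compile(r"A+T*|T+")                    # AnTm runs; a TA step ends the run

def find_motifs(seq, min_a_tract=4):
    """Return 0-based start positions of each putative motif class."""
    seq = seq.upper()
    return {
        "g4": [m.start() for m in G4.finditer(seq)],
        "i_motif": [m.start() for m in I_MOTIF.finditer(seq)],
        "g_tract": [m.start() for m in G_TRACT.finditer(seq)],
        "a_tract": [
            m.start() for m in A_TRACT.finditer(seq)
            if len(m.group()) >= min_a_tract
        ],
    }
```

The `min_a_tract` parameter is exposed because the text specifies four or more A/T base pairs while the Figure 3 caption searches for A7 or T7; either threshold can be supplied.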

CpG Island Calculations and Promoter Motif Element Search

CpG islands (CGIs) are described as DNA sequences with length greater than 500 nucleotides, GC percentage ≥55, and the ratio of observed/expected CpG content ≥0.65. CGI start locations in the Ori regions are predicted using a published method. (46)
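
The three criteria can be checked for a candidate region as sketched below. This is a simplified illustration with hypothetical function names; it evaluates a single region and does not reproduce the window sliding and merging performed by the published search method (46). The observed/expected ratio is computed with the commonly used formula (number of CpG × length)/(number of C × number of G), and treating a region of exactly 500 nucleotides as passing is an assumption.

```python
def cpg_stats(seq):
    """Return (GC percentage, observed/expected CpG ratio) for a region."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                      # CG cannot overlap itself
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp

def meets_cgi_criteria(seq, min_len=500, min_gc=55.0, min_obs_exp=0.65):
    """Check the length, GC content, and CpG ratio thresholds together."""
    gc_percent, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_percent >= min_gc and obs_exp >= min_obs_exp
```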

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.0c00441.

  • (Supplementary figure 1) Structural profiles of DNA helicoidal parameters for Ori sequences, (Supplementary figure 2) a profile of DNA structural properties of tissue-specific Ori sequences, and (Supplementary figure 3) positional distribution of promoter elements in Ori sequences (PDF)

  • (Supplementary table 1) Characteristic structural feature values observed in Ori sequences and (Supplementary table 2) oligonucleotide compositional analysis of Ori sequences (XLS)

Author Information

  • Corresponding Authors
  • Authors
    • Akkinepally Vanaja - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India; KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
    • Umasankar Kulandaivelu - KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
  • Author Contributions

    V.R.Y. and A.K. conceived the project and performed data analysis. A.V. helped in data analysis and interpretation. All the authors read and approved the manuscript.

  • Notes
    The authors declare no competing financial interest.

Acknowledgments

The authors are grateful to the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India for the grant (ECR/2017/001100/LS) and for supporting A.V. with her JRF. We would like to thank the management of Koneru Lakshmaiah Education Foundation for helping us with the necessary resources.

Abbreviation

Oris: origins of replication

References

This article references 66 other publications.

  1. Masai, H.; Matsumoto, S.; You, Z.; Yoshizawa-Sugata, N.; Oda, M. Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 2010, 79, 89–130, DOI: 10.1146/annurev.biochem.052308.103205
  2. Aladjem, M. I.; Redon, C. E. Order from clutter: selective interactions at mammalian replication origins. Nat. Rev. Genet. 2017, 18, 101–116, DOI: 10.1038/nrg.2016.141
  3. Fragkos, M.; Ganier, O.; Coulombe, P.; Méchali, M. DNA replication origin activation in space and time. Nat. Rev. Mol. Cell Biol. 2015, 16, 360–374, DOI: 10.1038/nrm4002
  4. Jacob, F.; Brenner, S. On the regulation of DNA synthesis in bacteria: the hypothesis of the replicon. C R Hebd. Seances Acad. Sci. 1963, 256, 298–300
  5. Marahrens, Y.; Stillman, B. A yeast chromosomal origin of DNA replication defined by multiple functional elements. Science 1992, 255, 817–823, DOI: 10.1126/science.1536007
  6. Dai, J.; Chuang, R. Y.; Kelly, T. J. DNA replication origins in the Schizosaccharomyces pombe genome. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 337–342, DOI: 10.1073/pnas.0408811102
  7. Xu, J.; Yanagisawa, Y.; Tsankov, A. M.; Hart, C.; Aoki, K.; Kommajosyula, N.; Steinmann, K. E.; Bochicchio, J.; Russ, C.; Regev, A.; Rando, O. J.; Nusbaum, C.; Niki, H.; Milos, P.; Weng, Z.; Rhind, N. Genome-wide identification and characterization of replication origins by deep sequencing. Genome Biol. 2012, 13, R27, DOI: 10.1186/gb-2012-13-4-r27
  8. Cayrou, C.; Coulombe, P.; Puy, A.; Rialle, S.; Kaplan, N.; Segal, E.; Méchali, M. New insights into replication origin characteristics in metazoans. Cell Cycle 2012, 11, 658–667, DOI: 10.4161/cc.11.4.19097
  9. Ghosh, A.; Bansal, M. A glossary of DNA structures from A to Z. Acta Crystallogr. D. Biol. Crystallogr. 2003, 59, 620–626, DOI: 10.1107/S0907444903003251
  10. Guiblet, W. M.; Cremona, M. A.; Cechova, M.; Harris, R. S.; Kejnovská, I.; Kejnovsky, E.; Eckert, K.; Chiaromonte, F.; Makova, K. D. Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate. Genome Res. 2018, 28, 1767–1778, DOI: 10.1101/gr.241257.118
  11. Marathe, A.; Karandur, D.; Bansal, M. Small local variations in B-form DNA lead to a large variety of global geometries which can accommodate most DNA-binding protein motifs. BMC Struct. Biol. 2009, 9, 24, DOI: 10.1186/1472-6807-9-24
  12. Gorin, A. A.; Zhurkin, V. B.; Wima, K. B-DNA twisting correlates with base-pair morphology. J. Mol. Biol. 1995, 247, 34–48, DOI: 10.1006/jmbi.1994.0120
  13. Drew, H. R.; Dickerson, R. E. Structure of a B-DNA dodecamer. III. Geometry of hydration. J. Mol. Biol. 1981, 151, 535–556, DOI: 10.1016/0022-2836(81)90009-7
  14. Rohs, R.; West, S. M.; Sosinsky, A.; Liu, P.; Mann, R. S.; Honig, B. The role of DNA shape in protein-DNA recognition. Nature 2009, 461, 1248–1253, DOI: 10.1038/nature08473
  15. Morey, C.; Mookherjee, S.; Rajasekaran, G.; Bansal, M. DNA free energy-based promoter prediction and comparative analysis of Arabidopsis and rice genomes. Plant Physiol. 2011, 156, 1300–1315, DOI: 10.1104/pp.110.167809
  16. Yella, V. R.; Bansal, M. DNA structural features and architecture of promoter regions play a role in gene responsiveness of S. cerevisiae. J. Bioinform. Comput. Biol. 2013, 11, 1343001, DOI: 10.1142/S0219720013430014
  17. Yella, V. R.; Bansal, M. DNA structural features of eukaryotic TATA-containing and TATA-less promoters. FEBS Open Bio 2017, 7, 324–334, DOI: 10.1002/2211-5463.12166
  18. Yella, V. R.; Kumar, A.; Bansal, M. DNA Structure and Promoter Engineering. In Systems and Synthetic Biology; Singh, V.; Dhar, P. K., Eds.; Springer Netherlands: Dordrecht, 2015; ISBN 978-94-017-9514-2; pp 241–254.
  19. Yella, V. R.; Kumar, A.; Bansal, M. Identification of putative promoters in 48 eukaryotic genomes on the basis of DNA free energy. Sci. Rep. 2018, 8, 4520, DOI: 10.1038/s41598-018-22129-8
  20. Kumar, A.; Bansal, M. Modulation of Gene Expression by Gene Architecture and Promoter Structure. In Bioinformatics in the Era of Post Genomics and Big Data; Abdurakhmonov, I. Y., Ed.; IntechOpen: 2018; pp 37–53.
  21. Bansal, M.; Kumar, A.; Yella, V. R. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr. Opin. Struct. Biol. 2014, 25, 77–85, DOI: 10.1016/j.sbi.2014.01.007
  22. Kanhere, A.; Bansal, M. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res. 2005, 33, 3165–3175, DOI: 10.1093/nar/gki627
  23. Kumar, A.; Bansal, M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res. 2017, 24, 25–35, DOI: 10.1093/dnares/dsw045
  24. Marin-Gonzalez, A.; Vilhena, J. G.; Moreno-Herrero, F.; Perez, R. DNA Crookedness Regulates DNA Mechanical Properties at Short Length Scales. Phys. Rev. Lett. 2019, 122, 048102, DOI: 10.1103/PhysRevLett.122.048102
  25. Parker, S. C. J.; Hansen, L.; Abaan, H. O.; Tullius, T. D.; Margulies, E. H. Local DNA topography correlates with functional noncoding regions of the human genome. Science 2009, 324, 389–392, DOI: 10.1126/science.1169050
  26. Meysman, P.; Marchal, K.; Engelen, K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform. Biol. Insights 2012, 6, 155–168, DOI: 10.4137/BBI.S9426
  27. Yella, V. R.; Bhimsaria, D.; Ghoshdastidar, D.; Rodríguez-Martínez, J. A.; Ansari, A. Z.; Bansal, M. Flexibility and structure of flanking DNA impact transcription factor affinity for its core motif. Nucleic Acids Res. 2018, 46, 11883–11897, DOI: 10.1093/nar/gky1057
  28. Kumar, A.; Manivelan, V.; Bansal, M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol. Lett. 2016, 363, fnv207, DOI: 10.1093/femsle/fnw207
  29. Cao, X. Q.; Zeng, J.; Yan, H. Structural properties of replication origins in yeast DNA sequences. Phys. Biol. 2008, 5, 036012, DOI: 10.1088/1478-3975/5/3/036012
  30. Comoglio, F.; Schlumpf, T.; Schmid, V.; Rohs, R.; Beisel, C.; Paro, R. High-resolution profiling of Drosophila replication start sites reveals a DNA shape and chromatin signature of metazoan origins. Cell Rep. 2015, 11, 821–834, DOI: 10.1016/j.celrep.2015.03.070
  31. Gao, F.; Luo, H.; Zhang, C. T. DeOri: a database of eukaryotic DNA replication origins. Bioinformatics 2012, 28, 1551–1552, DOI: 10.1093/bioinformatics/bts151
  32. Chen, W.; Feng, P.; Lin, H. Prediction of replication origins by calculating DNA structural properties. FEBS Lett. 2012, 586, 934–938, DOI: 10.1016/j.febslet.2012.02.034
  33. Kumar, A.; Bansal, M. Characterization of structural and free energy properties of promoters associated with Primary and Operon TSS in Helicobacter pylori genome and their orthologs. J. Biosci. 2012, 37, 423–431, DOI: 10.1007/s12038-012-9214-6
  34. Cao, X. Q.; Zeng, J.; Yan, H. Physical signals for protein-DNA recognition. Phys. Biol. 2009, 6, 036012, DOI: 10.1088/1478-3975/6/3/036012
  35. Bleichert, F.; Botchan, M. R.; Berger, J. M. Mechanisms for initiating cellular DNA replication. Science 2017, 355, eaah6317, DOI: 10.1126/science.aah6317
  36. Gai, D.; Chang, Y. P.; Chen, X. S. Origin DNA melting and unwinding in DNA replication. Curr. Opin. Struct. Biol. 2010, 20, 756–762, DOI: 10.1016/j.sbi.2010.08.009
  37. Rajewska, M.; Wegrzyn, K.; Konieczny, I. AT-rich region and repeated sequences - the essential elements of replication origins of bacterial replicons. FEMS Microbiol. Rev. 2012, 36, 408–434, DOI: 10.1111/j.1574-6976.2011.00300.x
  38. Dao, F. Y.; Lv, H.; Wang, F.; Feng, C. Q.; Ding, H.; Chen, W.; Lin, H. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics 2019, 35, 2075–2083, DOI: 10.1093/bioinformatics/bty943
  39. Li, W.-C.; Deng, E.-Z.; Ding, H.; Chen, W.; Lin, H. iORI-PseKNC: A predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemom. Intell. Lab. Syst. 2015, 141, 100–106, DOI: 10.1016/j.chemolab.2014.12.011
  40. Zhang, C. J.; Tang, H.; Li, W. C.; Lin, H.; Chen, W.; Chou, K. C. iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2016, 7, 69783–69793, DOI: 10.18632/oncotarget.11975
  41. Gowers, D. M.; Wilson, G. G.; Halford, S. E. Measurement of the contributions of 1D and 3D pathways to the translocation of a protein along DNA. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15883–15888, DOI: 10.1073/pnas.0505378102
  42. Halford, S. E.; Marko, J. F. How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res. 2004, 32, 3040–3052, DOI: 10.1093/nar/gkh624
  43. Jiang, C.; Pugh, B. F. Nucleosome positioning and gene regulation: advances through genomics. Nat. Rev. Genet. 2009, 10, 161–172, DOI: 10.1038/nrg2522
  44. Hoskins, R. A.; Landolin, J. M.; Brown, J. B.; Sandler, J. E.; Takahashi, H.; Lassmann, T.; Yu, C.; Booth, B. W.; Zhang, D.; Wan, K. H.; Yang, L.; Boley, N.; Andrews, J.; Kaufman, T. C.; Graveley, B. R.; Bickel, P. J.; Carninci, P.; Carlson, J. W.; Celniker, S. E. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 2011, 21, 182–192, DOI: 10.1101/gr.112466.110
  45. Dao, F. Y.; Lv, H.; Zulfiqar, H.; Yang, H.; Su, W.; Gao, H.; Ding, H.; Lin, H. A computational platform to identify origins of replication sites in eukaryotes. Brief Bioinform 2020, DOI: 10.1093/bib/bbaa017
  46. Takai, D.; Jones, P. A. Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 3740–3745, DOI: 10.1073/pnas.052410099
  47. Mirkin, E. V.; Mirkin, S. M. Replication fork stalling at natural impediments. Microbiol. Mol. Biol. Rev. 2007, 71, 13–35, DOI: 10.1128/MMBR.00030-06
  48. Kaushik Tiwari, M.; Adaku, N.; Peart, N.; Rogers, F. A. Triplex structures induce DNA double strand breaks via replication fork collapse in NER deficient cells. Nucleic Acids Res. 2016, 44, 7742–7754, DOI: 10.1093/nar/gkw515
  49. Prioleau, M. N.; MacAlpine, D. M. DNA replication origins-where do we begin? Genes Dev. 2016, 30, 1683–1697, DOI: 10.1101/gad.285114.116
  50. Cayrou, C.; Ballester, B.; Peiffer, I.; Fenouil, R.; Coulombe, P.; Andrau, J.-C.; van Helden, J.; Méchali, M. The chromatin environment shapes DNA replication origin organization and defines origin classes. Genome Res. 2015, 25, 1873–1885, DOI: 10.1101/gr.192799.115
  51. Antequera, F. Structure, function and evolution of CpG island promoters. Cell. Mol. Life Sci. 2003, 60, 1647–1658, DOI: 10.1007/s00018-003-3088-6
  52. Delgado, S.; Gómez, M.; Bird, A.; Antequera, F. Initiation of DNA replication at CpG islands in mammalian chromosomes. EMBO J. 1998, 17, 2426–2435, DOI: 10.1093/emboj/17.8.2426
  53. Eaton, M. L.; Galani, K.; Kang, S.; Bell, S. P.; MacAlpine, D. M. Conserved nucleosome positioning defines replication origins. Genes Dev. 2010, 24, 748–753, DOI: 10.1101/gad.1913210
  54. Li, W.-C.; Zhong, Z.-J.; Zhu, P.-P.; Deng, E.-Z.; Ding, H.; Chen, W.; Lin, H. Sequence analysis of origins of replication in the Saccharomyces cerevisiae genomes. Front. Microbiol. 2014, 5, 574, DOI: 10.3389/fmicb.2014.00574
  55. Gilbert, D. M. Evaluating genome-scale approaches to eukaryotic DNA replication. Nat Rev Genet 2010, 11, 673–684, DOI: 10.1038/nrg2830
  56. Tyner, C.; Barber, G. P.; Casper, J.; Clawson, H.; Diekhans, M.; Eisenhart, C.; Fischer, C. M.; Gibson, D.; Gonzalez, J. N.; Guruvadoo, L.; Haeussler, M.; Heitner, S.; Hinrichs, A. S.; Karolchik, D.; Lee, B. T.; Lee, C. M.; Nejad, P.; Raney, B. J.; Rosenbloom, K. R.; Speir, M. L.; Villarreal, C.; Vivian, J.; Zweig, A. S.; Haussler, D.; Kuhn, R. M.; Kent, W. J. The UCSC Genome Browser database: 2017 update. Nucleic Acids Res. 2017, 45, D626–D634, DOI: 10.1093/nar/gkw1134
  57. SantaLucia, J., Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 1460–1465, DOI: 10.1073/pnas.95.4.1460
  58. Anselmi, C.; De Santis, P.; Paparcone, R.; Savino, M.; Scipioni, A. From the sequence to the superstructural properties of DNAs. Biophys. Chem. 2002, 95, 23–47, DOI: 10.1016/S0301-4622(01)00246-0
  59. Brukner, I.; Sánchez, R.; Suck, D.; Pongor, S. Trinucleotide models for DNA bending propensity: comparison of models based on DNaseI digestion and nucleosome packaging data. J. Biomol. Struct. Dyn. 1995, 13, 309–317, DOI: 10.1080/07391102.1995.10508842
  60. Satchwell, S. C.; Drew, H. R.; Travers, A. A. Sequence periodicities in chicken nucleosome core DNA. J. Mol. Biol. 1986, 191, 659–675, DOI: 10.1016/0022-2836(86)90452-3
  61. Friedel, M.; Nikolajewa, S.; Sühnel, J.; Wilhelm, T. DiProDB: a database for dinucleotide properties. Nucleic Acids Res. 2009, 37, D37–D40, DOI: 10.1093/nar/gkn597
  62. Qin, Y.; Hurley, L. H. Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie 2008, 90, 1149–1171, DOI: 10.1016/j.biochi.2008.02.020
  63. Todd, A. K.; Johnston, M.; Neidle, S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005, 33, 2901–2907, DOI: 10.1093/nar/gki553
  64. Zeraati, M.; Langley, D. B.; Schofield, P.; Moye, A. L.; Rouet, R.; Hughes, W. E.; Bryan, T. M.; Dinger, M. E.; Christ, D. I-motif DNA structures are formed in the nuclei of human cells. Nat. Chem. 2018, 10, 631–637, DOI: 10.1038/s41557-018-0046-3
  65. Drew, H. R.; Travers, A. A. DNA bending and its relation to nucleosome positioning. J. Mol. Biol. 1985, 186, 773–790, DOI: 10.1016/0022-2836(85)90396-1
  66. Tsankov, A.; Yanagisawa, Y.; Rhind, N.; Regev, A.; Rando, O. J. Evolutionary divergence of intrinsic and trans-regulated nucleosome positioning sequences reveals plastic rules for chromatin organization. Genome Res. 2011, 21, 1851–1862, DOI: 10.1101/gr.122267.111

Cited By

This article is cited by 12 publications.

  1. Subhojit Paul, Kaushika Olymon, Gustavo Sganzerla Martinez, Sharmilee Sarkar, Venkata Rajesh Yella, Aditya Kumar. MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI. Journal of Chemical Information and Modeling 2024, 64 (7) , 2705-2719. https://doi.org/10.1021/acs.jcim.3c02017
  2. Akkinepally Vanaja, Venkata Rajesh Yella. Delineation of the DNA Structural Features of Eukaryotic Core Promoter Classes. ACS Omega 2022, 7 (7) , 5657-5669. https://doi.org/10.1021/acsomega.1c04603
  3. Liujiang Song, Tomoko Hasegawa, Nolan J Brown, Jacquelyn J Bower, Richard J Samulski, Matthew L Hirsch. . Nucleic Acids Research 2025, 53 (3) https://doi.org/10.1093/nar/gkaf013
  4. Patrycja Obara, Paweł Wolski, Tomasz Pańczyk. Insights into the Molecular Structure, Stability, and Biological Significance of Non-Canonical DNA Forms, with a Focus on G-Quadruplexes and i-Motifs. Molecules 2024, 29 (19) , 4683. https://doi.org/10.3390/molecules29194683
  5. James G Davies, Georgina E Menzies. Utilising biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning. Bioinformatics Advances 2024, https://doi.org/10.1093/bioadv/vbae125
  6. Mireille Bétermier, Lawrence A. Klobutcher, Eduardo Orias. Programmed chromosome fragmentation in ciliated protozoa: multiple means to chromosome ends. Microbiology and Molecular Biology Reviews 2023, 87 (4) https://doi.org/10.1128/mmbr.00184-22
  7. Fumiaki Uchiumi. Biological roles of loop structures. 2023, 171-181. https://doi.org/10.1016/B978-0-12-818787-6.00001-1
  8. Hemanth Kari, Surya Manikhanta Sowri Bandi, Aditya Kumar, Venkata Rajesh Yella. DeePromClass: Delineator for Eukaryotic Core Promoters employing Deep Neural Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022, 14 , 1-1. https://doi.org/10.1109/TCBB.2022.3163418
  9. Feng Wu, Runtao Yang, Chengjin Zhang, Lina Zhang. A deep learning framework combined with word embedding to identify DNA replication origins. Scientific Reports 2021, 11 (1) https://doi.org/10.1038/s41598-020-80670-x
  10. Akkinepally Vanaja, Sarada Prasanna Mallick, Umasankar Kulandaivelu, Aditya Kumar, Venkata Rajesh Yella. Symphony of the DNA flexibility and sequence environment orchestrates p53 binding to its responsive elements. Gene 2021, 803 , 145892. https://doi.org/10.1016/j.gene.2021.145892
  11. Sharmilee Sarkar, Upalabdha Dey, Trust Boitumelo Khohliwe, Venkata Rajesh Yella, Aditya Kumar. Analysis of nucleoid‐associated protein‐binding regions reveals DNA structural features influencing genome organization in Mycobacterium tuberculosis. FEBS Letters 2021, 595 (19) , 2504-2521. https://doi.org/10.1002/1873-3468.14178
  12. Upalabdha Dey, Sharmilee Sarkar, Valentina Teronpi, Venkata Rajesh Yella, Aditya Kumar. G-quadruplex motifs are functionally conserved in cis-regulatory regions of pathogenic bacteria: An in-silico evaluation. Biochimie 2021, 184 , 40-51. https://doi.org/10.1016/j.biochi.2021.01.017

    Figure 1

    Figure 1. Analysis outline for computation of DNA structural features or motifs in origins of replication in the eukaryotic genomes. Experimentally mapped endogenous replication initiation sites are retrieved from the DeOri database (http://tubic.org/deori/). (31) Various different physiologically relevant DNA structural features and motifs, including stability, propeller twist, minor groove shape, G-quadruplexes, i-motifs, etc., were computed using lookup tables of di/tri/tetra nucleotide descriptors or regular expression patterns.

    Figure 2

    Figure 2. DNA structural profiles of S. cerevisiae, K. lactis, S. pombe, D. melanogaster, human, and A. thaliana Ori sequences. The x-axis in all plots spans the −5000 to +5000 region relative to Ori start sites. Rows indicate the property, and columns represent genomes. Average free energy, normalized melting temperature, propeller twist, flexibility (two models, DNase I sensitivity and nucleosome positioning preference), and minor groove width are shown. The normalized melting temperature, DNase I sensitivity, and nucleosome positioning preference models report the properties in arbitrary units. Blue error bars indicate the standard error of the mean property values. Experimentally identified genomic locations of Ori start sites were retrieved from the DeOri database (http://tubic.org/deori/). The y-axis range for each structural property is kept equal across plots.
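The averaged profiles and error bars described in Figure 2 can be sketched as follows. The sketch assumes each Ori sequence has already been converted to a per-position property profile of equal length, aligned on the Ori start site. The function name `mean_and_sem` and the use of `None` for missing values are illustrative assumptions.

```python
import math
import statistics


def mean_and_sem(profiles):
    """Position-wise mean and standard error of the mean (SEM).

    profiles: list of equal-length lists, one per Ori sequence, aligned on
    the Ori start site. None marks a missing value and is skipped.
    """
    means, sems = [], []
    for column in zip(*profiles):
        values = [v for v in column if v is not None]
        if len(values) < 2:
            # SEM is undefined with fewer than two observations.
            means.append(values[0] if values else None)
            sems.append(None)
            continue
        means.append(statistics.fmean(values))
        # Sample standard deviation divided by sqrt(n).
        sems.append(statistics.stdev(values) / math.sqrt(len(values)))
    return means, sems
```

Plotting the `means` against positions from −5000 to +5000, with `sems` as error bars, would give one panel of the kind the caption describes.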

    Figure 3

    Figure 3. Positional distribution of (a) A-tracts, (b) G-tracts, G-quadruplexes, and intercalated motifs (i-motifs), and (c) CpG islands in Ori regions of various eukaryotes. To define A-tracts, G-tracts, G-quadruplexes, and i-motifs, the regular expressions “A{7} or T{7}”, “G{7} or C{7}”, “G{3,5}N{1,7}G{3,5}N{1,7}G{3,5}N{1,7}G{3,5}”, and “C{3,5}N{1,7}C{3,5}N{1,7}C{3,5}N{1,7}C{3,5}”, respectively, were searched in the −5000 to +5000 region relative to origin start sites, and the matches were summed for each 200-nucleotide bin. In yeasts (S. cerevisiae, K. lactis, and S. pombe), A-tracts are prevalent in the vicinity of Oris, while in D. melanogaster and humans, G-tracts, G-quadruplexes, and i-motifs are preferred. CpG islands are observed in D. melanogaster, humans, and A. thaliana. CpG islands in the −5000 to +5000 regions were searched using the “CpG island searcher” program with a 500 nt window. (46)
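The motif search and 200-nucleotide binning described in Figure 3 can be sketched with Python's `re` module. This is a minimal illustration, not the authors' code: the names `PATTERNS` and `binned_counts` are hypothetical, the loop "N" is written as `[ACGT]`, only the given strand is scanned, and `finditer` counts non-overlapping matches, which is an assumption about how hits were tallied.

```python
import re
from collections import Counter

# Patterns transcribed from the caption; N (any nucleotide) is written [ACGT].
PATTERNS = {
    "A-tract": re.compile(r"A{7}|T{7}"),
    "G-tract": re.compile(r"G{7}|C{7}"),
    "G-quadruplex": re.compile(
        r"G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}"
    ),
    "i-motif": re.compile(
        r"C{3,5}[ACGT]{1,7}C{3,5}[ACGT]{1,7}C{3,5}[ACGT]{1,7}C{3,5}"
    ),
}


def binned_counts(seq, pattern, flank=5000, bin_size=200):
    """Count non-overlapping motif matches per bin.

    seq covers -flank..+flank around the Ori start site, so index `flank`
    corresponds to position 0. Keys are the left edge of each bin in
    Ori-relative coordinates.
    """
    counts = Counter()
    for match in pattern.finditer(seq.upper()):
        position = match.start() - flank
        # Floor division keeps negative positions in the correct bin.
        counts[(position // bin_size) * bin_size] += 1
    return dict(counts)
```

For example, `binned_counts(seq, PATTERNS["A-tract"])` would return a dictionary such as `{-200: 3, 0: 5}` for a sequence with three matches starting in the bin from −200 to −1 and five in the bin from 0 to 199.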

    Figure 4

    Figure 4. Positional distribution of promoter sequence elements in Ori regions in (a) yeasts and (b) A. thaliana. The plots show the preponderance of the TATA box [TATAWAWR] in Ori sequences [−5000 to +5000 relative to Ori start sites at position 0] in the yeast species S. cerevisiae, K. lactis, P. pastoris, and S. pombe. Plots with green bars indicate the occurrence of the promoter elements BREu [SSRCGCC], DCE-I [CTTC], DCE-III [AGC], and Pause-button [KCGRWCG] in A. thaliana. In the IUPAC nucleotide code, K = G or T, R = A or G, S = C or G, and W = A or T. Promoter sequence motif information was retrieved from the literature (eukaryotic core promoters and the functional basis of transcription initiation). The positional distribution of promoter sequence elements in Ori regions in all systems is also displayed in Supplementary figure 3.
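The IUPAC consensus strings in Figure 4 can be turned into searchable patterns. The following is a minimal sketch, assuming a forward-strand scan that reports overlapping occurrences; the names `IUPAC` and `motif_positions` are hypothetical and the approach is not necessarily the one used by the authors.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_positions(seq, motif):
    """Return 0-based start indices of every occurrence of an IUPAC motif.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the given strand is scanned (an assumption of this sketch).
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

Subtracting the flank length (5000 in the caption) from each index would convert the hits to Ori-relative coordinates for a positional histogram.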

  • References


    This article references 66 other publications.

    1. Masai, H.; Matsumoto, S.; You, Z.; Yoshizawa-Sugata, N.; Oda, M. Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 2010, 79, 89–130, DOI: 10.1146/annurev.biochem.052308.103205
    2. Aladjem, M. I.; Redon, C. E. Order from clutter: selective interactions at mammalian replication origins. Nat. Rev. Genet. 2017, 18, 101–116, DOI: 10.1038/nrg.2016.141
    3. Fragkos, M.; Ganier, O.; Coulombe, P.; Méchali, M. DNA replication origin activation in space and time. Nat. Rev. Mol. Cell Biol. 2015, 16, 360–374, DOI: 10.1038/nrm4002
    4. Jacob, F.; Brenner, S. On the regulation of DNA synthesis in bacteria: the hypothesis of the replicon. C. R. Hebd. Seances Acad. Sci. 1963, 256, 298–300
    5. Marahrens, Y.; Stillman, B. A yeast chromosomal origin of DNA replication defined by multiple functional elements. Science 1992, 255, 817–823, DOI: 10.1126/science.1536007
    6. Dai, J.; Chuang, R. Y.; Kelly, T. J. DNA replication origins in the Schizosaccharomyces pombe genome. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 337–342, DOI: 10.1073/pnas.0408811102
    7. Xu, J.; Yanagisawa, Y.; Tsankov, A. M.; Hart, C.; Aoki, K.; Kommajosyula, N.; Steinmann, K. E.; Bochicchio, J.; Russ, C.; Regev, A.; Rando, O. J.; Nusbaum, C.; Niki, H.; Milos, P.; Weng, Z.; Rhind, N. Genome-wide identification and characterization of replication origins by deep sequencing. Genome Biol. 2012, 13, R27, DOI: 10.1186/gb-2012-13-4-r27
    8. Cayrou, C.; Coulombe, P.; Puy, A.; Rialle, S.; Kaplan, N.; Segal, E.; Méchali, M. New insights into replication origin characteristics in metazoans. Cell Cycle 2012, 11, 658–667, DOI: 10.4161/cc.11.4.19097
    9. Ghosh, A.; Bansal, M. A glossary of DNA structures from A to Z. Acta Crystallogr. D. Biol. Crystallogr. 2003, 59, 620–626, DOI: 10.1107/S0907444903003251
    10. Guiblet, W. M.; Cremona, M. A.; Cechova, M.; Harris, R. S.; Kejnovská, I.; Kejnovsky, E.; Eckert, K.; Chiaromonte, F.; Makova, K. D. Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate. Genome Res. 2018, 28, 1767–1778, DOI: 10.1101/gr.241257.118
    11. Marathe, A.; Karandur, D.; Bansal, M. Small local variations in B-form DNA lead to a large variety of global geometries which can accommodate most DNA-binding protein motifs. BMC Struct. Biol. 2009, 9, 24, DOI: 10.1186/1472-6807-9-24
    12. Gorin, A. A.; Zhurkin, V. B.; Olson, W. K. B-DNA twisting correlates with base-pair morphology. J. Mol. Biol. 1995, 247, 34–48, DOI: 10.1006/jmbi.1994.0120
    13. Drew, H. R.; Dickerson, R. E. Structure of a B-DNA dodecamer. III. Geometry of hydration. J. Mol. Biol. 1981, 151, 535–556, DOI: 10.1016/0022-2836(81)90009-7
    14. Rohs, R.; West, S. M.; Sosinsky, A.; Liu, P.; Mann, R. S.; Honig, B. The role of DNA shape in protein-DNA recognition. Nature 2009, 461, 1248–1253, DOI: 10.1038/nature08473
    15. Morey, C.; Mookherjee, S.; Rajasekaran, G.; Bansal, M. DNA free energy-based promoter prediction and comparative analysis of Arabidopsis and rice genomes. Plant Physiol. 2011, 156, 1300–1315, DOI: 10.1104/pp.110.167809
    16. Yella, V. R.; Bansal, M. DNA structural features and architecture of promoter regions play a role in gene responsiveness of S. cerevisiae. J. Bioinform. Comput. Biol. 2013, 11, 1343001, DOI: 10.1142/S0219720013430014
    17. Yella, V. R.; Bansal, M. DNA structural features of eukaryotic TATA-containing and TATA-less promoters. FEBS Open Bio 2017, 7, 324–334, DOI: 10.1002/2211-5463.12166
    18. Yella, V. R.; Kumar, A.; Bansal, M. DNA Structure and Promoter Engineering. In Systems and Synthetic Biology; Singh, V.; Dhar, P. K., Eds.; Springer Netherlands: Dordrecht, 2015; ISBN 978-94-017-9514-2; pp 241–254.
    19. Yella, V. R.; Kumar, A.; Bansal, M. Identification of putative promoters in 48 eukaryotic genomes on the basis of DNA free energy. Sci. Rep. 2018, 8, 4520, DOI: 10.1038/s41598-018-22129-8
    20. Kumar, A.; Bansal, M. Modulation of Gene Expression by Gene Architecture and Promoter Structure. In Bioinformatics in the Era of Post Genomics and Big Data; Abdurakhmonov, I. Y., Ed.; IntechOpen: 2018; pp 37–53.
    21. Bansal, M.; Kumar, A.; Yella, V. R. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr. Opin. Struct. Biol. 2014, 25, 77–85, DOI: 10.1016/j.sbi.2014.01.007
    22. Kanhere, A.; Bansal, M. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res. 2005, 33, 3165–3175, DOI: 10.1093/nar/gki627
    23. Kumar, A.; Bansal, M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res. 2017, 24, 25–35, DOI: 10.1093/dnares/dsw045
    24. Marin-Gonzalez, A.; Vilhena, J. G.; Moreno-Herrero, F.; Perez, R. DNA Crookedness Regulates DNA Mechanical Properties at Short Length Scales. Phys. Rev. Lett. 2019, 122, 048102, DOI: 10.1103/PhysRevLett.122.048102
    25. Parker, S. C. J.; Hansen, L.; Abaan, H. O.; Tullius, T. D.; Margulies, E. H. Local DNA topography correlates with functional noncoding regions of the human genome. Science 2009, 324, 389–392, DOI: 10.1126/science.1169050
    26. Meysman, P.; Marchal, K.; Engelen, K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform. Biol. Insights 2012, 6, 155–168, DOI: 10.4137/BBI.S9426
    27. Yella, V. R.; Bhimsaria, D.; Ghoshdastidar, D.; Rodríguez-Martínez, J. A.; Ansari, A. Z.; Bansal, M. Flexibility and structure of flanking DNA impact transcription factor affinity for its core motif. Nucleic Acids Res. 2018, 46, 11883–11897, DOI: 10.1093/nar/gky1057
    28. Kumar, A.; Manivelan, V.; Bansal, M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol. Lett. 2016, 363, fnw207, DOI: 10.1093/femsle/fnw207
    29. Cao, X. Q.; Zeng, J.; Yan, H. Structural properties of replication origins in yeast DNA sequences. Phys. Biol. 2008, 5, 036012, DOI: 10.1088/1478-3975/5/3/036012
    30. Comoglio, F.; Schlumpf, T.; Schmid, V.; Rohs, R.; Beisel, C.; Paro, R. High-resolution profiling of Drosophila replication start sites reveals a DNA shape and chromatin signature of metazoan origins. Cell Rep. 2015, 11, 821–834, DOI: 10.1016/j.celrep.2015.03.070
    31. Gao, F.; Luo, H.; Zhang, C. T. DeOri: a database of eukaryotic DNA replication origins. Bioinformatics 2012, 28, 1551–1552, DOI: 10.1093/bioinformatics/bts151
    32. Chen, W.; Feng, P.; Lin, H. Prediction of replication origins by calculating DNA structural properties. FEBS Lett. 2012, 586, 934–938, DOI: 10.1016/j.febslet.2012.02.034
    33. Kumar, A.; Bansal, M. Characterization of structural and free energy properties of promoters associated with Primary and Operon TSS in Helicobacter pylori genome and their orthologs. J. Biosci. 2012, 37, 423–431, DOI: 10.1007/s12038-012-9214-6
    34. Cao, X. Q.; Zeng, J.; Yan, H. Physical signals for protein-DNA recognition. Phys. Biol. 2009, 6, 036012, DOI: 10.1088/1478-3975/6/3/036012
    35. Bleichert, F.; Botchan, M. R.; Berger, J. M. Mechanisms for initiating cellular DNA replication. Science 2017, 355, eaah6317, DOI: 10.1126/science.aah6317
    36. Gai, D.; Chang, Y. P.; Chen, X. S. Origin DNA melting and unwinding in DNA replication. Curr. Opin. Struct. Biol. 2010, 20, 756–762, DOI: 10.1016/j.sbi.2010.08.009
    37. Rajewska, M.; Wegrzyn, K.; Konieczny, I. AT-rich region and repeated sequences - the essential elements of replication origins of bacterial replicons. FEMS Microbiol. Rev. 2012, 36, 408–434, DOI: 10.1111/j.1574-6976.2011.00300.x
    38. Dao, F. Y.; Lv, H.; Wang, F.; Feng, C. Q.; Ding, H.; Chen, W.; Lin, H. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics 2019, 35, 2075–2083, DOI: 10.1093/bioinformatics/bty943
    39. Li, W.-C.; Deng, E.-Z.; Ding, H.; Chen, W.; Lin, H. iORI-PseKNC: A predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemom. Intell. Lab. Syst. 2015, 141, 100–106, DOI: 10.1016/j.chemolab.2014.12.011
    40. Zhang, C. J.; Tang, H.; Li, W. C.; Lin, H.; Chen, W.; Chou, K. C. iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2016, 7, 69783–69793, DOI: 10.18632/oncotarget.11975
    41. Gowers, D. M.; Wilson, G. G.; Halford, S. E. Measurement of the contributions of 1D and 3D pathways to the translocation of a protein along DNA. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15883–15888, DOI: 10.1073/pnas.0505378102
    42. Halford, S. E.; Marko, J. F. How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res. 2004, 32, 3040–3052, DOI: 10.1093/nar/gkh624
    43. Jiang, C.; Pugh, B. F. Nucleosome positioning and gene regulation: advances through genomics. Nat. Rev. Genet. 2009, 10, 161–172, DOI: 10.1038/nrg2522
    44. Hoskins, R. A.; Landolin, J. M.; Brown, J. B.; Sandler, J. E.; Takahashi, H.; Lassmann, T.; Yu, C.; Booth, B. W.; Zhang, D.; Wan, K. H.; Yang, L.; Boley, N.; Andrews, J.; Kaufman, T. C.; Graveley, B. R.; Bickel, P. J.; Carninci, P.; Carlson, J. W.; Celniker, S. E. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 2011, 21, 182–192, DOI: 10.1101/gr.112466.110
    45. Dao, F. Y.; Lv, H.; Zulfiqar, H.; Yang, H.; Su, W.; Gao, H.; Ding, H.; Lin, H. A computational platform to identify origins of replication sites in eukaryotes. Brief. Bioinform. 2020, DOI: 10.1093/bib/bbaa017
    46. Takai, D.; Jones, P. A. Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 3740–3745, DOI: 10.1073/pnas.052410099
    47. Mirkin, E. V.; Mirkin, S. M. Replication fork stalling at natural impediments. Microbiol. Mol. Biol. Rev. 2007, 71, 13–35, DOI: 10.1128/MMBR.00030-06
    48. Kaushik Tiwari, M.; Adaku, N.; Peart, N.; Rogers, F. A. Triplex structures induce DNA double strand breaks via replication fork collapse in NER deficient cells. Nucleic Acids Res. 2016, 44, 7742–7754, DOI: 10.1093/nar/gkw515
    49. Prioleau, M. N.; MacAlpine, D. M. DNA replication origins-where do we begin? Genes Dev. 2016, 30, 1683–1697, DOI: 10.1101/gad.285114.116
    50. Cayrou, C.; Ballester, B.; Peiffer, I.; Fenouil, R.; Coulombe, P.; Andrau, J.-C.; van Helden, J.; Méchali, M. The chromatin environment shapes DNA replication origin organization and defines origin classes. Genome Res. 2015, 25, 1873–1885, DOI: 10.1101/gr.192799.115
    51. Antequera, F. Structure, function and evolution of CpG island promoters. Cell. Mol. Life Sci. 2003, 60, 1647–1658, DOI: 10.1007/s00018-003-3088-6
    52. Delgado, S.; Gómez, M.; Bird, A.; Antequera, F. Initiation of DNA replication at CpG islands in mammalian chromosomes. EMBO J. 1998, 17, 2426–2435, DOI: 10.1093/emboj/17.8.2426
    53. Eaton, M. L.; Galani, K.; Kang, S.; Bell, S. P.; MacAlpine, D. M. Conserved nucleosome positioning defines replication origins. Genes Dev. 2010, 24, 748–753, DOI: 10.1101/gad.1913210
    54. Li, W.-C.; Zhong, Z.-J.; Zhu, P.-P.; Deng, E.-Z.; Ding, H.; Chen, W.; Lin, H. Sequence analysis of origins of replication in the Saccharomyces cerevisiae genomes. Front. Microbiol. 2014, 5, 574, DOI: 10.3389/fmicb.2014.00574
    55. Gilbert, D. M. Evaluating genome-scale approaches to eukaryotic DNA replication. Nat. Rev. Genet. 2010, 11, 673–684, DOI: 10.1038/nrg2830
    56. Tyner, C.; Barber, G. P.; Casper, J.; Clawson, H.; Diekhans, M.; Eisenhart, C.; Fischer, C. M.; Gibson, D.; Gonzalez, J. N.; Guruvadoo, L.; Haeussler, M.; Heitner, S.; Hinrichs, A. S.; Karolchik, D.; Lee, B. T.; Lee, C. M.; Nejad, P.; Raney, B. J.; Rosenbloom, K. R.; Speir, M. L.; Villarreal, C.; Vivian, J.; Zweig, A. S.; Haussler, D.; Kuhn, R. M.; Kent, W. J. The UCSC Genome Browser database: 2017 update. Nucleic Acids Res. 2017, 45, D626–D634, DOI: 10.1093/nar/gkw1134
    57. SantaLucia, J., Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 1460–1465, DOI: 10.1073/pnas.95.4.1460
    58. Anselmi, C.; De Santis, P.; Paparcone, R.; Savino, M.; Scipioni, A. From the sequence to the superstructural properties of DNAs. Biophys. Chem. 2002, 95, 23–47, DOI: 10.1016/S0301-4622(01)00246-0
    59. Brukner, I.; Sánchez, R.; Suck, D.; Pongor, S. Trinucleotide models for DNA bending propensity: comparison of models based on DNaseI digestion and nucleosome packaging data. J. Biomol. Struct. Dyn. 1995, 13, 309–317, DOI: 10.1080/07391102.1995.10508842
    60. Satchwell, S. C.; Drew, H. R.; Travers, A. A. Sequence periodicities in chicken nucleosome core DNA. J. Mol. Biol. 1986, 191, 659–675, DOI: 10.1016/0022-2836(86)90452-3
    61. Friedel, M.; Nikolajewa, S.; Sühnel, J.; Wilhelm, T. DiProDB: a database for dinucleotide properties. Nucleic Acids Res. 2009, 37, D37–D40, DOI: 10.1093/nar/gkn597
    62. Qin, Y.; Hurley, L. H. Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie 2008, 90, 1149–1171, DOI: 10.1016/j.biochi.2008.02.020
    63. Todd, A. K.; Johnston, M.; Neidle, S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005, 33, 2901–2907, DOI: 10.1093/nar/gki553
    64. Zeraati, M.; Langley, D. B.; Schofield, P.; Moye, A. L.; Rouet, R.; Hughes, W. E.; Bryan, T. M.; Dinger, M. E.; Christ, D. I-motif DNA structures are formed in the nuclei of human cells. Nat. Chem. 2018, 10, 631–637, DOI: 10.1038/s41557-018-0046-3
    65. Drew, H. R.; Travers, A. A. DNA bending and its relation to nucleosome positioning. J. Mol. Biol. 1985, 186, 773–790, DOI: 10.1016/0022-2836(85)90396-1
    66. Tsankov, A.; Yanagisawa, Y.; Rhind, N.; Regev, A.; Rando, O. J. Evolutionary divergence of intrinsic and trans-regulated nucleosome positioning sequences reveals plastic rules for chromatin organization. Genome Res. 2011, 21, 1851–1862, DOI: 10.1101/gr.122267.111
  • Supporting Information


    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.0c00441.

    • (Supplementary figure 1) Structural profiles of DNA helicoidal parameters for Ori sequences, (Supplementary figure 2) a profile of DNA structural properties of tissue-specific Ori sequences, and (Supplementary figure 3) positional distribution of promoter elements in Ori sequences (PDF)

    • (Supplementary table 1) Characteristic structural feature values observed in Ori sequences and (Supplementary table 2) oligonucleotide compositional analysis of Ori sequences (XLS)

