Interactions of a Novel Anthracycline with Oligonucleotide DNA and Cyclodextrins in an Aqueous Environment

Berubicin, a chemotherapy medication belonging to the class of anthracyclines, is simulated in double-stranded DNA sequences and cyclodextrins in an aqueous environment via full-atom molecular dynamics simulations on the time scale of microseconds. The drug is studied in both the neutral and protonated states so as to better comprehend the role of its charge in the formed complexes. The noncovalent berubicin–DNA and berubicin–cyclodextrin complexes are investigated in detail, paying special attention to their thermodynamic description by employing the double decoupling method, the solvent balance method, the weighted solvent accessible surface model, and the linear interaction energy method. A novel approach for extracting the desolvation thermodynamics of the binding process is also presented. Both the binding and desolvation Gibbs energies are decomposed into entropic and enthalpic contributions so as to elucidate the nature of complexation and its driving forces. Selected structural and geometrical properties of all the complexes, which are all stable, are analyzed. Both cyclodextrins under consideration are widely utilized for drug delivery purposes, and a comparative investigation between their bound states with berubicin is carried out.


S1. DOCKING SIMULATIONS
The molecular docking performed using the AutoDock program.The blind docking uses a grid box which is as big as the whole DNA molecule, in order to investigate all the possible binding sites.The grid spacing is 0.0375 nm and the average grid box dimensions are 2.3×2.7×3.7 nm 3 for 1AL9 and 2.2×2.4×2.7 nm 3 for 1AMD.The docking algorithm is the Lamarckian Genetic Algorithm (LGA).S1 A total of 2.5×10 6 energy evaluations are used in 50 independent runs and the Genetic Algorithm (GA) population size is 300.Regarding the specific-site docking, the grid boxes are smaller, with the same grid spacing.The average grid box dimensions are 2.3×2.4×1.9 nm 3 and 2.3×2.3×2.1 nm 3 for the two equivalent 1AL9 intercalation sites and 2.2×2.3×1.7 nm 3 and 2.4×2.4×1.5 nm 3 for the median and terminal intercalation sites of 1AMD, respectively.The docking algorithm is LGA, but the searching should be more detailed, so the energy evaluations are 2.0×10 7 and the GA population size is 150 in 50 independent runs.In all the cases, the charge models in use are Gasteiger for berubicin and Kollman for DNA.

S2. TECHNICAL DETAILS ABOUT THE DOUBLE DECOUPLING METHOD
Equation 2 of the article implements DDM, which is schematically described by the thermodynamic cycle of Figure 3 of the article.Technical details and explanations are given in ref 60, so here they will be briefly discussed.For the first term on the right-hand side of eq 2, Δ 1→3 , 1 s of molecular dynamics is conducted on all the berubicin -DNA systems in their bound states in water adding a harmonic force constant between one atom on the berubicin and one atom on the DNA with a harmonic constant,  r , equal to 1000 kJ/mol/nm 2 .Then, the free energy, accompanying the switching on of the harmonic distance restraint is given by the Zwanzig free energy perturbation equation: S2 , where  B is the Boltzmann constant,  res is the harmonic biasing potential and the mean value is extracted throughout the biased simulation.Next, - 4→5 is the free energy associated with switching on the harmonic constant in the gas phase (the interactions of the ligand are turned off) at standard concentration (1 M), in order to properly obtain the binding Gibbs energy at the standard state.This term is estimated theoretically as: 59,60,S3 -Δ 4→5 = - B  • ln , where  0 is the volume corresponding to the solute's standard concentration.Last, the terms Δ 3→4 and  5→2 are both extracted alchemically by turning off gradually the electrostatic interactions of the ligand first and then the Lennard-Jones ones.Twenty  values are used to reach gradually from 0 to 1 with a constant step of 0.05 for both the electrostatic and Lennard-Jones decouplings.In each one, 60 ns of stochastic MD are conducted with the inverse friction constant being 0.2 ps.Thus, Δ 3→4 and -Δ 5→2 are extracted by adding their decoupling free energies from the electrostatic and Lennard-Jones contributions, both of them calculated numerically as d in the context of the thermodynamic integration approach, S4 where ℋ is the Hamiltonian function.The separation of these terms into their electrostatic and Lennard-Jones contributions gives us the possibility to decompose the total binding Gibbs energy into the electrostatic and van der Waals contributions by combining the electrostatic and Lennard-Jones parts, respectively, of Δ 3→4 and Δ 5→2 (see Table S1).A last comment is about eq 3 of the article that adds a correction term to the binding Gibbs energy in the protonated berubicin cases.This accounts for the periodicity-induced netcharge interactions, periodicity-induced undersolvation, residual integrated potential effects and discrete solvent effects. 61The residual integrated potential effects are estimated by evaluating the Poisson-Boltzmann equation via the APBS (version 3.4.1)web service.S5 The theoretical and S7 technical details of the electrostatic correction, as well as the procedure that is followed to evaluate them, may be found in ref 61.
The Δ el and Δ vdW are calculated as follows; . This decomposition allows us to rewrite the binding free energies as Δ bind = Δ vdW + Δ el -Δ 1→3 -Δ 4→5 beginning from eqs 2 and 3 of the article.

S3. DETAILS ABOUT DESOLVATION GIBBS ENERGY AND THERMODYNAMICS
We are interested in extracting the desolvation thermodynamics in the berubicin -DNA complexes.To this end, a thermodynamic cycle, similar to those in refs 105 and S6 and in most continuous-solvent approaches extracting the Gibbs energy, such as MM-PBSA and MM-GBSA, S7,S8 is constructed in Figure S4.This thermodynamic cycle is summarized by the next equation: The left-hand side of eq S1 is the binding Gibbs energy itself, Δ bind .Δ 3→4 corresponds to the in-vacuo binding Gibbs energy, Δ drug-DNA .(Δ 1→3 + Δ 4→2 ) is the desolvation Gibbs energy, Δ des , since it takes into account the desolvation of both the solute molecules as well as the solvation of the formed complex.With these definitions, eq S1 gives: Of course, the same thermodynamic analysis is valid for any state function, such as enthalpy and entropy.Therefore, similar equations as eq S2 hold for these functions by replacing the Gibbs energy, , with enthalpy, , and entropy, .This approach of extracting desolvation Gibbs energies has similarities with the corresponding desolvation parts in MM-PBSA and MM-GBSA approaches, but it seems to overpass some of the main disadvantages of them.Firstly, the accuracy of these methods is highly dependent to the parameters in use to the solvation approaches, S8 such as solutes' dielectric constant, whereas our approach does not need any extra parametrization.Also, Poisson-Boltzmann approach gives high uncertainties for highly charged or polar compounds.S8,S9 The use of an implicit solvent introduces errors regarding possible water molecules which contribute to the binding, S8-S10 either by their existence or dislocation.Our approach keeps the solvent explicit.Last, the formulation, followed herein, allows the decomposition of the desolvation Gibbs energy into its enthalpic and entropic parts.Although this method overpasses some of the disadvantages of the above-mentioned ones, it is more time-consuming, due to the two alchemical decouplings in DDM.Another rigorous method that could be used to approach the desolvation Gibbs energy is the direct extraction of (Δ 1→3 + Δ 4→2 ) via three alchemical runs.Δ 4→2 is the coupling Gibbs energy (turning on the interactions) of the whole complex from vacuum to water and Δ 1→3 is the sum of the decoupling (turning off the interactions) Gibbs energies of the ligand and receptor from water to vacuum.But this approach would be much more time-consuming.A last comment on these is that the formulation, showed herein, allows any approximation of the involving quantities, Δ bind and Δ  drug-DNA , and their enthalpic and entropic parts, offering flexibility to the method.In conclusion, the method, showed herein, seems to combine well accuracy, flexibility and efficiency.The red and black objects stand for the receptor and the ligand, respectively.When the objects are inside the blue circle, then they are solvated, otherwise they are desolvated (the solvent is away).State (1) refers to the unbound state in which both the solutes are solvated in solution, whereas state (2) refers to the solvated formed complex between them in solution.State (3) refers to the unbound desolvated state in which the molecules are far from each other in-vacuo and state (4) refers to the bound desolvated complex.The materialization of this thermodynamic cycle is executed via eq S1, the description of which is given in Sections 2 and S3.

DEGREES OF FREEDOM
An important aspect of MD simulations is the convergence of the rotational and translational degrees of freedom of the studied systems, because their convergence indicate that the phase space is properly sampled.Regarding the rotational degrees of freedom, the three principal axes of inertia of the solutes of all the unbound systems (neutral berubicin, protonated berubicin, -CD, HP--CD, 1AL9 and 1AMD) are computed as functions of the simulation time.Then, the autocorrelation functions of these axes are estimated as the mean values of the second-order Legendre polynomial of the cosine of the angle between the same axis at two different times: where  2 () = assuming that the decay is exponential.Table S2 contains all the characteristic times.It is seen that all of them are significantly lower than the 1 μs duration of the main MD runs.

S13
As far as the translational degrees of freedom are concerned, the Fickian diffusion of the unbound systems reveals the rate of the convergence of the molecules' COMs.To this end, the mean square displacement (MSD) is extracted as a function of the simulation time.Indicative plots for the same three cases, as in Figure S5, are given in Figure S6.In all the cases, we see a linear  Table S2.Correlation times for every unbound case.The correlation times are referred to the major (top value), middle (middle value) and minor (bottom value) principal axis of inertia.The time units are picoseconds (ps).S3.Lag times and the coefficients of determination of the linear regression between MSD and  for every unbound case.

S5. CONFORMATIONAL CONVERGENCE OF THE SIMULATED SYSTEMS
Unbiased MD offers us the opportunity to explore the system's phase space, according to the physical laws and without biases.But often they lack of convergence or non-ergodicity, due to high-energy barriers at the potential energies which result in trapping the studied systems in specific local minima of the potential energy.Therefore, there is a need to validate whether this is not happening at the present simulations and the ergodicity is not violated.A well-known strategy to this kind of simulations is the Simulated Annealing (SA) and is generally used in the literature.S12,S13 SA consists of simulations starting from a high temperature and gradually linearly decreasing it with time.The high temperature, in combination with a slow annealing, gives to the system the ability to overpass certain potential energy local minima, due to the higher thermal energy they possess.
The SA simulation is conducted starting from a temperature of 345 K and it gradually decreases with a rate of 1.5 K per 20 ns down to 310 K, which is the temperature of the productive MD runs, and then it is kept constant at 310 K for the rest time.Therefore, the first 460 ns of the SA simulation the temperature is decreasing and the rest has become an ordinary NPT simulation.The initial temperature should not be much higher than the selected one in order not to exceed the melting temperature and face undesired conformational changes to the oligonucleotide, as the melting temperature is in the region of 340 -350 K. S14,S15 The main output of this calculation is the RMSD (of the same heavy atoms and the same reference structure for proper comparisons with the constant-temperature NPT simulations, conducted herein), depicted in Figure S7.It is about the neutral berubicin -1AL9 case.We see that the RMSD has a significant fluctuation at the beginning of the SA simulation, but then it stabilizes at low values, due to the decrease of the

S6. PROTOCOLS FOR THE EQUILIBRATION AND PRODUCTIVE MD RUNS
Prior to the beginning of the production simulations, the systems are properly equilibrated.The equilibration protocol for the cyclodextrin-based complexes is exactly the same as in ref 35 (Section 2 therein), which investigates a complex formed between -cyclodextrin and fluoxetine, a widely-prescribed drug acting on the central nervous system.The production stage of all drugcyclodextrin simulations is 1 μs long.Regarding the berubicin -DNA systems, the equilibration protocol, which is applied in Amber 14 making use of the SANDER module, is as follows.First,

S18
an energy minimization step is applied to the solvated systems, which comprises 10000 iterations, using positional restraints on all DNA and drug atoms with a harmonic constant of 1000 kcal/mol/nm 2 .Next, a new energy minimization step of 10000 iterations is carried out in absence of these positional restraints.After this two-step energy minimization, the systems are heated to 310 K from 0 K by conducting MD for 200 ps under constant volume with the use of a Langevin thermostat S16 and a collision frequency equal to 1.0 ps -1 and under the same positional restraints as imposed during the first minimization step.Then, a MD simulation of 2 ns is conducted at the temperature of 310 K and pressure of 1 bar, without the restraints and by applying the Berendsen barostat S17 with the pressure relaxation time being 1.0 ps.The last two simulations are conducted employing an integration time step of 2 fs with all bonds containing hydrogen atoms constrained to their equilibrium length by use of the SHAKE algorithm.S18 Subsequently, a MD simulation of 25 ns is carried out at the same temperature and pressure in GROMACS, using a time step of 2 fs and the LINCS algorithm S19 for constraining the bonds containing hydrogen atoms, and without positional restraints.
The production MD simulations are carried out in the NPT statistical ensemble and particularly at the temperature of 310 K, using the Nosé-Hoover thermostat S20,S21 with a time constant of 1.0 ps, and at a pressure of 1 bar, using the Parrinello-Rahman barostat S22 with a time constant of 1.0 ps.
The numerical integration time step for the equations of motion, is set to 2.0 fs, whereas all bonds involving hydrogen atoms are constrained to their equilibrium length employing the LINCS algorithm.The long-range electrostatics is treated with the well-known particle mesh Ewald method, S23 while the Lennard-Jones interactions exerted between the atoms are calculated with a cutoff radius of 1.0 nm.In the latter case, the corresponding forces are smoothly switched to zero in the range between 0.9 nm and the cut-off distance, 1.0 nm, by using the force-switch option which is available in the GROMACS package.In addition, long-range energy and pressure correction terms are applied due to the truncation of the non-bonded potentials.Regarding the molecular visualization, in addition to VMD, most of the berubicin -DNA complexes are visualized with the UCSF Chimera software S24 in which the DNA strands are represented as ribbons, while the drug is depicted in the united atom representation.

Figure S1 .
Figure S1.Graphical representations of berubicin bound in DNA as an intercalating agent

Figure S2 .
Figure S2.RMSD plots of 1AL9 and 1AMD in the unbound state in water.

Figure S3 .
Figure S3.Amino-group's position relative to the -CD cavity in the former's (a) neutral and (b)

Figure S4 .
Figure S4.Thermodynamic cycle employed for the calculation of the desolvation Gibbs energies.
-1) is the second-order Legendre polynomial and cos[()] = () • (0) is the angle between the unit vector on the principal axis, , at different time-frames with a difference of time  between them.The brackets indicate the mean value from the several timeframes with time-difference of .The denominator  2 [〈cos[(0)]〉] normalizes the autocorrelation function.Indicatively, FigureS5shows this autocorrelation function for the major principal axis in three characteristic cases; protonated berubicin, -CD and 1AL9.We see an exponential decay of the autocorrelation functions down to zero, indicating that after some time any memory about the axes' orientation is lost.This randomization of the principal axes' orientations indicates that the main rotational degrees of freedom are properly sampled.To quantify this conclusion, the decorrelation characteristic times are extracted by integrating the autocorrelation functions,S11

Figure S5 .
Figure S5.Rotational autocorrelation functions of the major principal axes of inertia in three

Figure S6 .
Figure S6.Mean square displacement of the same three indicative unbound cases, as in Figure temperature and consequently of the molecular mobility.It is observed that the converged values of RMSD are very similar with the ones obtained from the productive MD simulations of this study (Figures4 and S2).

Figure S7 .
Figure S7.RMSD plot of the neutral berubicin -1AL9 during the SA run (see text for more

Table S1 .
Van derWaals and electrostatic contributions to the binding Gibbs energies.a These are eqs 6 of the article and are used in current desolvation thermodynamics.DDM and SBM provide all the binding parameters, i.e.Δ bind , Δ  bind and Δ bind .Δ drug-DNA is easily obtainable by computing the direct intermolecular and intramolecular interactions, involving only berubicin and DNA in the unbound and bound states (in a similar manner to the molecular mechanics term of the MM-PBSA and MM-GBSA methods) and then taking the differences of the form 〈〉 AB -(〈〉 A + 〈〉 B ), which are assumed to be approximately equal to  AB -( A +  B ), where 〈〉 stands for the intramolecular and intermolecular potential energy interactions only between the solutes. drug-DNA is estimated by the WSAS model, being employed in both the unbound and bound states and then taking differences of the form  AB -( A +  B ).The indices A, B and AB refer to the solvated simulations containing only the receptor, ligand and complex, respectively.Having these, it is possible to obtain the desolvation parameters by applying the eqs 5, 6 and S2.