Charge Engineering Reveals the Roles of Ionizable Side Chains in Electrospray Ionization Mass Spectrometry

In solution, the charge of a protein is intricately linked to its stability, but electrospray ionization distorts this connection, potentially limiting the ability of native mass spectrometry to inform about protein structure and dynamics. How the behavior of intact proteins in the gas phase depends on the presence and distribution of ionizable surface residues has been difficult to answer because multiple chargeable sites are present in virtually all proteins. Turning to protein engineering, we show that ionizable side chains are completely dispensable for charging under native conditions, but if present, they are preferential protonation sites. The absence of ionizable side chains results in identical charge state distributions under native-like and denaturing conditions, while coexisting conformers can be distinguished using ion mobility separation. An excess of ionizable side chains, on the other hand, effectively modulates protein ion stability. In fact, moving a single ionizable group can dramatically alter the gas-phase conformation of a protein ion. We conclude that although the sum of the charges is governed solely by Coulombic terms, their locations affect the stability of the protein in the gas phase.

To model the proteins in the MS experiments, which were purified using a His-tag, the three structures were complemented with a sequence of six histidines following one N-terminal Met and a Gly residue. The His-tails were modelled using PyMOL to stick out straight from the rest of the structure in a beta-strand conformation, under the assumption that the tail can pick up charges and reduce the Coulombic repulsion by protruding in a straight conformation.
The His-tagged GFP structures were used as input for the Monte Carlo (MC) simulations. It is known that side chains can rearrange in response to gas-phase conditions [6], and it is also reasonable to assume that side chains and other chemical groups will move to solvate charged groups, form salt bridges when favourable, etc. The finer details in crystallographic structures therefore risk being invalid for gas-phase proteins, or at least biasing the simulation towards charge locations that are present in the crystal. We therefore took a residue-level coarse-grained approach, where charges are moved around between ionizable sites in a Metropolis MC scheme and allowed to relax in a short energy minimization at each step. The energy of any state is defined by three components: a) the electrostatic interaction between charges, b) the solvation energy of charges at the protein surface, and c) the gas-phase basicity (GB) of protonated groups.
In the coarse-graining we discard most atomic coordinates in the input structure, keeping only the Cɑ atoms, the Cβ atoms of ionizable residues (His, Lys, Arg, Glu, Asp), the N-atom in the N-terminal amine and the C-atom in the C-terminal carboxylic group. In our model the Cβ and the terminal N- and C-atoms serve as anchor points for charges that are free to move within a distance from the anchor they are attached to, where the distance is determined by the number of covalent bonds in the side chain from the charged group to the Cβ. More precisely, the maximum allowed distance d is determined by dC-C = 1.54 Å, the length of a C-C bond; n, the number of bonds separating the charge from the anchor point; and a sine function reflecting the tetrahedral geometry of the bonds around the C-atoms. See Table S1 for the values of n for the different types of ionizable sites. To prevent opposite charges from coming unphysically close to each other, we imposed a minimum distance dmin = 3 Å between charges, reflecting an approximate N-O distance in salt bridges between amino acid side chains [7].
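The tether constraint described above can be illustrated with a short sketch. The closed form used here, d = n · dC-C · sin(θ/2) with θ the tetrahedral angle, is an assumption that is consistent with the description (the projection of a zig-zag chain of n bonds onto its axis), not the authors' stated equation. The function names and the way the minimum-distance check is applied are likewise hypothetical.

```python
import math

D_CC = 1.54   # C-C bond length in Å (from the text)
D_MIN = 3.0   # minimum charge-charge distance in Å (from the text)
TETRAHEDRAL_ANGLE = math.acos(-1.0 / 3.0)  # about 109.47 degrees, in radians


def max_tether_distance(n_bonds: int) -> float:
    """ASSUMED form: length of n zig-zag C-C bonds projected on the chain axis."""
    return n_bonds * D_CC * math.sin(TETRAHEDRAL_ANGLE / 2.0)


def charge_position_allowed(charge_xyz, anchor_xyz, n_bonds, other_charges):
    """Check the tether limit and the minimum distance to all other charges."""
    within_tether = math.dist(charge_xyz, anchor_xyz) <= max_tether_distance(n_bonds)
    clear_of_others = all(math.dist(charge_xyz, o) >= D_MIN for o in other_charges)
    return within_tether and clear_of_others
```

Under this assumed form each bond contributes about 1.26 Å to the reach of a charge, so the actual values of n in Table S1 would set how far a given side-chain charge can stray from its anchor.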
Coulombic interactions between charged sites were calculated using Coulomb's law. A relative dielectric constant of 2.0 was used as a compromise between the presence of the protein and the surrounding vacuum; it is therefore low compared to values commonly employed for continuum electrostatics in proteins. Schnier et al. [8] reported that a dielectric constant of 2.0 ± 0.2 gives the best match to experimental observations of charging in cytochrome c. While GFP is larger than cytochrome c, a charge near the GFP surface will not experience a much larger protein dielectric than on cytochrome c, hence we are confident that a value of 2.0 will yield qualitatively sound results. It should be noted, however, that the calculations might be sensitive to the choice of dielectric constant and other parameters, and the exact quantitative results must therefore be treated with appropriate caution. Importantly, this does not preclude a qualitative analysis of trends observed using a reasonable parameter set.
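As a minimal sketch of the electrostatic term, the pairwise Coulomb sum with a relative dielectric constant of 2.0 could be written as follows. The function name, the unit choice (kJ/mol, Å, elementary charges) and the conversion constant are assumptions made for illustration; the source does not specify an implementation.

```python
import math
from itertools import combinations

# e^2 / (4 * pi * eps0) expressed in kJ * Å / (mol * e^2); unit choice is an assumption
COULOMB_CONST = 1389.35458
EPS_R = 2.0  # relative dielectric constant (from the text)


def coulomb_energy(charges, coords, eps_r=EPS_R):
    """Sum q_i * q_j / (eps_r * r_ij) over all unique pairs, in kJ/mol."""
    total = 0.0
    for (qi, ri), (qj, rj) in combinations(zip(charges, coords), 2):
        total += COULOMB_CONST * qi * qj / (eps_r * math.dist(ri, rj))
    return total


# Two unit positive charges 10 Å apart repel by roughly 69.5 kJ/mol at eps_r = 2.0
print(coulomb_energy([1, 1], [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))
```

Because the energy scales as 1/eps_r, changing the dielectric constant rescales every pair interaction by the same factor, which is one reason trends may be more robust than absolute values.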
To prevent the mobile charges from penetrating the protein and forming salt bridges through the backbone, we defined a physical barrier using the Cɑ atoms. A repulsive potential was defined around each Cɑ atom according to $U_r = k_r (d_r - r)^2$, where kr = 250 kJ/(mol Å²) is a force constant, dr = 3.5 Å is the distance where the repulsion starts, and r is the distance between a charge and the Cɑ atom.
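The backbone barrier can be sketched as a one-sided harmonic wall. This reading, Ur = kr (dr − r)² for r < dr and zero otherwise, is inferred from the units of kr and the statement that dr is where the repulsion starts; the function name is hypothetical.

```python
K_R = 250.0  # force constant in kJ/(mol Å^2) (from the text)
D_R = 3.5    # distance in Å where the repulsion starts (from the text)


def repulsion_energy(r, k_r=K_R, d_r=D_R):
    """One-sided harmonic wall around a C-alpha atom (ASSUMED zero beyond d_r)."""
    if r >= d_r:
        return 0.0
    return k_r * (d_r - r) ** 2
```

With these parameters a charge that approaches a Cɑ atom to 2.5 Å pays 250 kJ/mol, so the wall is steep relative to the other energy terms.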
The charge solvation at the protein surface, representing the rearrangement of local dipoles etc., was modelled with a potential Usol that switches from zero at long distances to Emin at short distances, with a cosine function used as the switching function. Here dsol = 5.5 Å is the distance where the switching starts, dr is where the repulsion starts (see above), and Emin is the energy minimum for this potential. The switching function is only evaluated on the interval dr ≤ r < dsol, and Usol is stipulated to be equal to zero and Emin at longer and shorter distances, respectively. Although the solvation energy can be assumed to vary across the protein surface [9], we take an agnostic approach and set Emin to a mean-field value of 62 760 J/mol that has been used before in the literature [10]. In contrast to all other contributions to the total potential, only the interaction with the closest Cɑ is used to calculate the solvation energy, which prevents multiple inclusions of this energy term for a given charge carrier. The interaction between a charge and the Cɑ from the same residue requires special treatment, however, because in that case there is no complete side chain blocking access to the backbone. The limits and reference distances used to calculate such self-interactions were therefore decreased by 2.5 Å for both Ur and Usol. The repulsion and the solvation potentials are illustrated in Fig. S9.
The last contribution to the energy comes from the gas-phase basicity (GB). For each site that is protonated, a contribution to the energy is made corresponding to the intrinsic GB for that site. GB values used in the simulations are shown in Table S1.
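A possible shape for the solvation term is sketched below. The exact switching expression is not given in the text, so the half-cosine form is an assumption chosen only to go smoothly from Emin at dr to zero at dsol. The negative sign of Emin (solvation taken as stabilizing), the function names, and the rule used to pick the closest Cɑ are also assumptions.

```python
import math

D_SOL = 5.5        # Å, where the switching starts (from the text)
D_R = 3.5          # Å, where the repulsion starts (from the text)
E_MIN = -62.76     # kJ/mol; magnitude from the text, negative sign is ASSUMED
SELF_SHIFT = 2.5   # Å, reduction of limits for a charge's own C-alpha (from the text)


def solvation_energy(r, e_min=E_MIN, d_r=D_R, d_sol=D_SOL, shift=0.0):
    """ASSUMED half-cosine switch: e_min for r <= d_r, zero for r >= d_sol."""
    lo, hi = d_r - shift, d_sol - shift
    if r >= hi:
        return 0.0
    if r <= lo:
        return e_min
    x = (r - lo) / (hi - lo)
    return 0.5 * e_min * (1.0 + math.cos(math.pi * x))


def site_solvation(charge_xyz, ca_coords, own_index):
    """Use only the closest C-alpha, shifting the limits if it is the site's own."""
    dists = [math.dist(charge_xyz, ca) for ca in ca_coords]
    k = min(range(len(dists)), key=dists.__getitem__)
    shift = SELF_SHIFT if k == own_index else 0.0
    return solvation_energy(dists[k], shift=shift)
```

Restricting the term to the nearest Cɑ means a charge sitting between several backbone atoms is counted as solvated once, as described above.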
Some internal ionizable residues can be assumed to have a constant protonation state throughout the simulations. We inspected the structures to determine which residues were to be given constant charge based on literature and interactions with surrounding residues.
Arg96 and Glu222 (residue numbering corresponding to the 2B3P structure), which flank the chromophore, were both assumed to be charged. His169 and His181, in contrast, were given a constant zero charge.
MC simulations were carried out for the three systems. Initial protonation states were generated by randomly placing protons on the ionizable sites until the net charge was 9+. At each MC step a trial move is made, where one protonated and one deprotonated site are chosen randomly to be deprotonated and protonated, respectively. A steepest descent energy minimization is then run to let the charged sites adjust according to the potential terms described above, whereupon the total energy is calculated. The move is then accepted or rejected according to the Metropolis criterion.

Tables

Table S1. Number of bonds from anchor point and intrinsic GB for ionizable side chains and termini. Values for side chains were taken from Marchese et al. [11]. The N-terminal amine was given the same GB as the Lys side chain, and the C-terminal carboxylate was given a GB equal to the average of the GBs of Asp and Glu side chains.
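The MC procedure described in the simulation paragraph (random initial protonation to a target net charge, proton-transfer trial moves, Metropolis acceptance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the site representation, the function names, and the temperature passed in are hypothetical, and `energy_fn` is assumed to perform the energy minimization and return the total energy in kJ/mol.

```python
import math
import random

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def net_charge(base_charges, protonated):
    """base_charges: 0 for basic sites, -1 for acidic sites when deprotonated."""
    return sum(base_charges) + sum(protonated)


def initial_state(base_charges, target_charge, rng):
    """Place protons on randomly chosen sites until the net charge is reached."""
    protonated = [False] * len(base_charges)
    free = list(range(len(base_charges)))
    rng.shuffle(free)
    while net_charge(base_charges, protonated) < target_charge:
        if not free:
            raise ValueError("target charge cannot be reached with these sites")
        protonated[free.pop()] = True
    return protonated


def metropolis_step(protonated, energy_fn, temperature, rng):
    """Move one proton from a protonated to a deprotonated site, then accept/reject."""
    occupied = [i for i, p in enumerate(protonated) if p]
    empty = [i for i, p in enumerate(protonated) if not p]
    if not occupied or not empty:
        return protonated, False
    trial = list(protonated)
    trial[rng.choice(occupied)] = False
    trial[rng.choice(empty)] = True
    # energy_fn is ASSUMED to relax the charges before returning the energy
    delta = energy_fn(trial) - energy_fn(protonated)
    if delta <= 0 or rng.random() < math.exp(-delta / (R_KJ * temperature)):
        return trial, True
    return protonated, False
```

Because each trial move transfers exactly one proton, the net charge set by `initial_state` (9+ in the simulations described here) is conserved throughout a run.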