Excited States of Xanthophylls Revisited: Toward the Simulation of Biologically Relevant Systems

Xanthophylls are a class of oxygen-containing carotenoids, which play a fundamental role in light-harvesting pigment–protein complexes and in many photoresponsive proteins. The complexity of the manifold of the electronic states and the large sensitivity to the environment still prevent a clear and coherent interpretation of their photophysics and photochemistry. In this Letter, we compare cutting-edge ab initio methods (CC3 and DMRG/NEVPT2) with time-dependent DFT and semiempirical CI (SECI) on model keto-carotenoids and show that SECI represents the right compromise between accuracy and computational cost to be applied to real xanthophylls in their biological environment. As an example, we investigate canthaxanthin in the orange carotenoid protein and show that the conical intersections between excited states and excited–ground states are mostly determined by the effective bond length alternation coordinate, which is significantly tuned by the protein through geometrical constraints and electrostatic effects.


Coupled Cluster Calculations
The CC3 and EOM-CC3 S1,S2 calculations have been performed with the Cfour package. S3 From now on we simply use CC3 for both ground and excited (EOM-CC3) calculations. We used the frozen-core approximation and the aug-cc-pVDZ atomic basis set for these calculations that were made considering the point group symmetry for obvious computational reasons. It is well-recognized that CC3 approach typically yields highly-accurate transition energies, on average within ca. 0.05 eV of the FCI limit. S4 However, as all single-reference method, CC3 has difficulty tackling transitions with a multi excitation character. More in details, one needs to distinguish the transitions that are "purely double" for which CC3 miserably fails, to those that have a significant, yet not dominant, double-excitation character, such as the A g transition in butadiene and hexatriene, for which CC3 slightly overshoots the transition energies. S5 For the latter category, CC3 seems in fact to provide an accuracy similar to the one obtained with CAS-PT2 or NEV-PT2. S5

DMRG(CAS)-SCF/NEVPT2 Calculations
DMRG-SCF calculations have been performed with in-house scripts based on the most recent version of PySCF S6,S7 which is able to perform CAS-SCF with orbital optimization decoupled from the Full Configuration Interaction (FCI) problem. S8 Within this setup one can use, in a modular way, approximate FCI solvers such as Quantum Chemical Density Matrix Renormalization Group (DMRG for short) or Quantum Monte Carlo (QMC) while performing a CAS-SCF like orbital optimization. All the CAS-CI calculation performed in this work have been performed using the "natural active space" for CO n system which include n π orbitals, n π * orbitals and 2 non bonding orbitals (located on the carbonyls' oxygen atoms) with 2n + 2 electrons. This choice leads to active spaces that are too large to be tractable with standard FCI solvers for systems larger than CO 6 . To overcome this limit S2 we used the DMRG solver for all the active spaces containing more than 16 active orbitals. DMRG calculations have been performed using StackBlock 1.5 coupled to the PySCF interface. S9 All DMRG calculations have been performed with a maximum bond length (m) of 2000; default StackBlock 1.5 parameters have been used for orbital ordering and initial guess for the MPS wave function. The initial active space for the CAS-SCF optimization was selected exploiting the symmetries of canonical SCF orbitals when possible. In all the other cases, we applied a simple localization/selection procedure which is described below. The optimization of active orbitals was done using a State-Average (SA) CAS-SCF procedure with equal weights for all the states extracted from the CI; when symmetry is used, all the electronic state selected in different irreducible representations (irrep) have been included in the state averaging, therefore in our calculation also electronic states from with different irrep are orthogonal. In all calculations, we used the default convergence parameters provided by PySCF. Density fitting was used in all the calculations to reduce the cost associated to the dimension of the basis set.

Localization Procedure
DMRG-SCF calculations on structures that do not belong to the C 2h point group have been performed using a localized molecular orbital basis set. Such orbitals are obtained from canonical SCF ones with the following procedure: (a) occupied orbitals are localized following the intrinsic bond orbital procedure, S10 (b) Localized Intrinsic Valence Virtual Orbitals (LIVVO) S11 are then constructed, defining a hard-virtual subspace which (c) is processed as suggested by Subotnik et al. S12 On the resulting set of orbital we selected the π-active space using the procedure proposed by Keal et al.: S13 a local conjugation plane is defined on the basis of molecule's geometry and a "π-score", which is later used for the selection, is computed using the overlap of the localized molecular orbitals with an atomic basis set aligned to the conjugation plane. This procedure was implemented in PySCF framework; the rotated basis set was computed using the recursive equations by Ivanic et S3 al. S14

TD-DFT Calculations
TD-DFT calculations have been performed using Gaussian 16 software package. S15 All of these calculations have been performed with the Gaussian 16 default parameters. Two different functionals have been tested, namely B3LYP, S16,S17 and CAM-B3LYP S18 ).

Semi-Empirical Configuration Interaction Calculations
Semi-Empirical Configuration Interaction (SECI) calculations have been performed with the MNDO software package. The orbitals used for the CI expansion are computed using the OM2 Hamiltonian in a SCF calculation targeting an open-shell singlet state having two singly occupied orbitals. Then we constructed an active space with all the π, n, and π * orbitals and included all single excitations from two reference determinants (the closed-shell HF singlet, and the doubly excited HOMO 2 → LUMO 2 ). The selection of π-type orbitals have been done using the so-called π-population approach proposed by Thiel and coworkers. S13

Geometry Optimization
All the geometry optimizations but the CC3 ones have been performed using Gaussian 16 software package. For MP2 calculations we used 6-31G(p) basis set.
For scan along the BLA and dihedral angles coordinates, we defined those variables in the redundant coordinate set used in the optimization and performed a relaxed scan along the variable itself. This was achieved using the so-called Generalized Internal Coordinates (GIC) input implemented in Gaussian 16. The BLA definition used is the one reported in Figure   4 of the main text: it should be noted that since we include C=O bonds in the average of double bonds length the value reported in this paper are not directly comparable with BLA of polyenes. For the canthaxanthin optimization we computed the BLA on the conjugated S4 backbone highlighted in Scheme 1 of the main text.
Optimizations in the protein have been performed with the ONIOM(QM:MM) model S19 as implemented in Gaussian 16; AMBER force field was used for the low-level calculations with the parameters used in previous works from some of us. S20 The CC3 optimization were performed with Cfour, S3 using Z-matrix coordinates enforcing the point group symmetry and correlating all electrons. Default geometry convergence parameters were applied.

Fitting of B + u CC3 Excitation Energies
To obtain a reference data set of B + u excitation energies on CO n compounds longer then CO 8 we used an extrapolation procedure. We performed a least-square fit of the excitation energies in eV computed for compounds CO 3 -CO 8 with the function:

S8
The optimized parameters are a=15.14348998, b=0.62407386 and c=1.81921892. The fitting yields a r 2 as large as 0.999992, confirming the regular evolution trend of the B + u excitation energies with increasing chain length in CO n oligomers.

S3 Additional tables and figures
A B Figure S1: Excitation energies of CO n systems computed at CC3 and CAS-SCF level (panel A and B respectively) on CC3 structures (dots) and MP2 structures (dashed lines).   Figure S3: Mutual Information (MI) diagrams of CO 13 computed from the DMRG-SCF converged wavefunctions; each dot corresponds to an active space orbital (the index number is the same as used in Figure S4) and its size is proportional to its single orbital entropy; the style and shape of the line connecting two dots is proportional to the mutual information of the two. Figure S4: Optimized active space orbitals for CO 13 ; orbitals are represented as isosurfaces at two arbitrary values (+0.001 yellow, -0.001 blue). Orbitals 9 and 10 are the n orbitals with B u and A g symmetries respectively; all the other orbitals are π orbitals with A u and B g symmetries.  Figure S5: Ground-to-excited state transition densities of CO 13 computed at DMRG-SCF (left) and TD-B3LYP level (right). From top to bottom: A g , B − u , B + u , A u and B g . Transition densities are represented as isodensity surfaces at ±0.001.

S27
A B C Figure S6: Evolution of the energies of CO 11 along the BLA coordinate. In all plots the solid lines refer to SECI energies calculated at MP2 geometries and they are the same reported in the main text while in ( Figure S7: Evolution of main determinant contributions to the SECI wavefunction along the BLA coordinate for the three lowest lying electronic states. Each determinant is represented with a different color; solid lines are used for HF and singly excited determinants while dashed lines are used for doubly excited determinants. The discontinuities can be due to the orbital switching/mixing during the geometry change. Figure S8: Ground and π → π * excited states of CAN computed at SECI level (left panel) on BLA scan performed at MP2 level in vacuum (right).