Nanoscale Infrared Spectroscopy and Chemometrics Enable Detection of Intracellular Protein Distribution

Determination of the intracellular location of proteins is one of the fundamental tasks of microbiology. Conventionally, label-based microscopy and super-resolution techniques are employed. In this work, we present a new technique that can determine intracellular protein distribution at nanometer spatial resolution. This method combines chemical imaging at nanoscale spatial resolution, using the photothermal-induced resonance (PTIR) technique, with multivariate modeling to reveal the intracellular distribution of cell components. Here, we demonstrate its viability by imaging the distribution of major cellulases and xylanases in Trichoderma reesei, using the colocalization of a fluorescent label (enhanced yellow fluorescent protein, EYFP) with the target enzymes to calibrate the chemometric model. The resulting partial least squares model successfully reveals the distribution of these proteins inside the cell and opens the door to further studies of protein secretion mechanisms using PTIR.

From transcription to secretion, proteins have a central role in life: a cell will invest up to 75% of its energy in translation alone. 1 Proteins are impressive in many ways. As catalysts, they are present in almost all metabolic pathways, catalyzing reactions that would otherwise not take place within an organism's lifetime 2 and doing so under mild conditions and with a selectivity that is hard to rival. 3 Not all proteins are enzymes, however; others serve a structural function or participate in cell signaling and signal transduction. Consequently, failures and errors in such a central system are the cause of a number of pathologies. 4 Furthermore, proteins are involved in several high-value industrial processes such as the production of pharmaceuticals, 5 biofuels, 6 and macromolecules. 3 It is therefore of great interest to have detailed knowledge of the intracellular synthesis and trafficking of proteins and their regulation. Gene expression, 7 protein quality control mechanisms, 1 and protein secretion 8 have been studied for decades in a continuous multidisciplinary effort. Imaging techniques such as electron microscopy 9−11 and visible-light microscopy techniques capable of resolution below the diffraction limit (STED, SIM, PALM, and STORM) 12−15 are typical of this kind of research and have given important insights into the ultrastructural features, processes, and metabolic pathways of cells. However, these super-resolution microscopy techniques are often limited by the choice of fluorophore (which must meet a series of strict criteria), tedious sample preparation, long acquisition times, extensive postprocessing, and a propensity for artifacts. 16−18 In this work, a new approach to the localization and detection of proteins and groups of proteins inside cells at nanoscale lateral resolution is introduced.
This method provides spatial resolution rivaling that of the super-resolution techniques mentioned above while also providing label-free information about protein distribution under ambient conditions. It is based on mid-infrared molecular spectroscopy at nanometer spatial resolution coupled with chemometric data evaluation. While the spatial resolution of conventional mid-IR microscopy is diffraction-limited to the micrometer range, 19 near-field techniques are not limited by diffraction. 20 Here, we use the photothermal-induced resonance technique (PTIR, also known as AFM-IR) 21,22 to perform near-field IR spectroscopy. PTIR combines atomic force microscopy (AFM) with infrared spectroscopy. In this technique, a pulsed, wavelength-tunable IR laser is aimed at the sample area under the AFM tip; absorption causes this area to rapidly heat up and expand. The thermal expansion is detected through changes in the deflection signal of the AFM cantilever, which oscillates with an amplitude proportional to the absorbed energy. 20 Because the signal is detected in the near-field region, lateral spatial resolution below the diffraction limit, down to 20 nm, is achievable. 20 In the vertical direction, PTIR can detect signals from features buried more than 1 μm deep in a sample. 23 The signal obtained in both PTIR and FTIR is proportional to the wavelength-dependent absorption coefficient of the sample. 20 Thus, PTIR spectral bands can be assigned to the localized vibrations of functional groups using well-established spectra−structure correlations. Like FTIR spectroscopy, PTIR also provides access to the fingerprint region of the infrared spectrum, which contains absorption bands arising from vibrations that involve more atoms than localized functional-group vibrations do. General spectral ranges can be assigned to biologically relevant functional groups that are characteristic building blocks of, for example, DNA/RNA, carbohydrates, lipids, and proteins.
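To put the resolution figures above into numbers, the short sketch below compares the Abbe diffraction limit, d = λ/(2 NA), of a far-field mid-IR microscope with the ~20 nm quoted for PTIR. The wavenumber of 1628 cm⁻¹ (the amide I maximum discussed in the Results) and the numerical aperture of 0.6 are illustrative assumptions, not instrument parameters from this work.

```python
"""Compare the far-field diffraction limit with the near-field PTIR resolution."""


def wavelength_um(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometers."""
    return 1e4 / wavenumber_cm1


def abbe_limit_um(wavenumber_cm1: float, numerical_aperture: float) -> float:
    """Abbe diffraction limit d = lambda / (2 NA), in micrometers."""
    return wavelength_um(wavenumber_cm1) / (2.0 * numerical_aperture)


if __name__ == "__main__":
    nu = 1628.0      # cm^-1, amide I maximum discussed in the Results
    na = 0.6         # hypothetical numerical aperture (assumption)
    ptir_um = 0.020  # ~20 nm lateral resolution quoted for PTIR
    d = abbe_limit_um(nu, na)
    print(f"wavelength: {wavelength_um(nu):.2f} um")
    print(f"far-field limit (NA={na}): {d:.2f} um")
    print(f"far-field / PTIR resolution: {d / ptir_um:.0f}x")
```

Under these assumptions the far-field limit is roughly 5 μm at a wavelength of about 6.1 μm, i.e., more than two orders of magnitude coarser than the near-field value.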
For evaluation beyond the quantitation of these major compound classes, multivariate methods or machine learning approaches are typically used in FTIR. 24 PTIR has found applications in materials science, 25−31 art conservation, 32,33 environmental analysis, 34 nano-optics, 35 geology, 36,37 and the life sciences. In the latter case, PTIR has been applied to the imaging of whole cells 38−41 and has successfully identified and characterized lipid droplets, 42 poly(hydroxybutyrate) vesicles, 43 extracellular vesicles, 44 liposomal drug nanocarriers, 45 and photosynthetic complexes. 46 Protein and polypeptide secondary structure has been extensively studied by PTIR, 47,48 including in aqueous media 49 and in human tissue. 50 Studies on the intracellular distribution of proteins with varying degrees of complexity have been reported. 38,40,51 In 2016, Baldassarre et al. 40 imaged E. coli and human (HeLa) cells and reported an inhomogeneous protein density in the latter. In 2018, Quaroni et al. 51 identified filaments visible in topographical images as belonging to the cytoskeleton of fibroblasts. In these studies, the linear relationship between the PTIR signal and concentration was used to determine the distribution of chemical species. Perez-Guaita et al. 38 located areas with relatively high hemoglobin content within red blood cells infected with P. falciparum using (unsupervised) cluster analysis. We take a different approach, using supervised machine learning to detect one predetermined group of analytes. We demonstrate the versatility of our method by analyzing the distribution of cellulases and enhanced yellow fluorescent protein (EYFP) in Trichoderma reesei. The filamentous mesophilic ascomycete T. reesei was found a little over 75 years ago on the Solomon Islands and soon caught the attention of researchers because of its exceptional cellulose-degrading capabilities. 6 In the wild, T. reesei is a saprobe, and the secretion of cellulases and hemicellulases in high amounts is essential for its survival. 52 T. reesei's efficient enzymatic secretory system has since been exploited for industrial-scale enzyme production. The cellulases and hemicellulases find application in numerous industries, such as the pulp, 53 food and animal feed, 54 and textile industries, 55 and are used for the production of second-generation biofuels. 6,56 The strain used, QM6a SecEYFP, derived from QM6a Δtmus53, bears an expression cassette for EYFP fused to the N-terminal secretion signal peptide of the main cellobiohydrolase CBHI, 55 under the control of the cbh1 promoter. Hence, the fluorescence brightness can be considered proportional to the cellulase abundance.
In this work, fluorescence images from an EYFP-secreting T. reesei strain were obtained and combined with AFM topographic images measured at the same locations. Using this information, a fluorescence value was attributed to the location of each PTIR spectrum, and these values were used to build a chemometric model based on partial least squares (PLS) regression. This PLS model relates the fluorescence intensity to the presence of β-sheet-containing proteins, such as cellulases and xylanases, and was then successfully applied to an independent test data set.

■ EXPERIMENTAL SECTION
Fungal Strains. The T. reesei strains QM6a Δtmus53 57 and QM6a SecEYFP used in this study were maintained on malt extract agar (MEX) at 30°C. Hygromycin B was added, when applicable, to a final concentration of 113 U/mL.
Plasmid Construction. Polymerase chain reactions (PCRs) for cloning purposes were performed with Q5 high-fidelity DNA polymerase (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's instructions. All primers used are listed in Table 1. First, the cbh1 promoter and the sequence encoding the first 18 amino acids of cbh1 were amplified with the primers Pcbh1_fwd_XhoI and Pcbh1_Q18r_NheI, using chromosomal DNA of QM6a Δtmus53 as the template, and inserted into EcoRV-digested pJET1.2 (Thermo Scientific, part of Thermo Fisher Scientific Inc., Waltham, MA, USA), yielding pJET-Pcbh1+18. Next, a codon-optimized eyfp was amplified with the primers Yfp-fwd-XbaI and Yfp-rev-NotI-NsiI using pCD-EYFP 58 as the template and then inserted into pRLM ex 30 59 via digestion with XbaI and NsiI. The eyfp:Tcbh2 fragment was released by digestion with XbaI and HindIII and inserted into correspondingly digested pJET1.2 (Thermo Scientific). This plasmid was digested with BspEI and XbaI, and the cbh1 promoter fragment, released from pJET-Pcbh1+18 by digestion with BspEI and NheI, was inserted, yielding pCD-SecYFP.
Fungal Transformation. Protoplast transformation of T. reesei was performed as described by Gruber et al. 60 Approximately 10⁷ protoplasts of the strain QM6a Δtmus53 were cotransformed with 5 μg of pCD-SecYFP and 1 μg of pAN7-1, 61 yielding the strain QM6a SecEYFP. The transformation reaction was added to 40 mL of molten MEX agar (50°C) containing 1.2 M sorbitol. Equal parts of this mixture were poured into four sterile Petri dishes and, after solidification, incubated at 30°C for 5 h. Subsequently, 10 mL of molten MEX agar (50°C) containing 1.2 M sorbitol and double the working concentration of hygromycin B was poured on top of each protoplast-containing layer. Plates were incubated at 30°C for 2−5 days until colonies were visible. The resulting candidates were subjected to homokaryon purification by streaking conidia on selection plates.
Fungal Sample Preparation. Conidiospores of QM6a SecEYFP were harvested from a MEX plate, suspended in 0.8% NaCl/0.05% Tween 80, and diluted to approximately 10²−10³ spores/mL. 100 μL of this spore suspension was spread on a […] To harvest and wash the mycelium, the cellophane foil was transferred into ice-cold deionized water using sterile tweezers. The floating mycelium was then transferred onto a CaF₂ disk (12 mm diameter × 2 mm thickness, Crystran) and dried in a freeze-drier (FreeZone 2.5 L benchtop freeze-drier, Labconco) at −50°C and 0.160 mbar for 30 min.
PTIR Measurements. All PTIR measurements were carried out using a Bruker nanoIR3-s coupled to a MIRcat external cavity quantum cascade laser (QCL) array from Daylight Solutions. Spectra covering the ranges from 1200 to 1758 cm⁻¹ (QCL transition at 1360.5 cm⁻¹) and from 2770 to 2900 cm⁻¹ were obtained using resonance-enhanced PTIR 63 in contact mode, tuning the laser repetition rate to match the second contact resonance frequency of the cantilever–sample system (roughly 190 to 220 kHz). The cantilevers used were gold-coated, with nominal first free resonance frequencies of 13 ± 4 kHz and spring constants between 0.07 and 0.4 N/m (PR-EX-nIR2, Anasys Instruments/Bruker). The laser source operated at a 3% duty cycle (corresponding to pulse lengths of 160 to 140 ns over this repetition rate range) and 14.75% power (before the beam splitter), and five spectra were recorded at each location at 2 cm⁻¹ spectral resolution. The instrument and all beam paths were purged with dry air generated by an adsorptive dry air generator.
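The pulse lengths quoted for the PTIR measurements follow from the duty cycle and the repetition rate: pulse length = duty cycle / repetition rate. A minimal check, using only the 3% duty cycle and the 190 to 220 kHz range stated above (the function name is illustrative):

```python
"""Pulse length implied by a fixed duty cycle at a given repetition rate."""


def pulse_length_ns(duty_cycle: float, repetition_rate_hz: float) -> float:
    """Pulse length = duty cycle x period = duty cycle / repetition rate (in ns)."""
    return duty_cycle / repetition_rate_hz * 1e9


if __name__ == "__main__":
    duty = 0.03  # 3% duty cycle, as stated in the text
    for rate_khz in (190, 220):  # contact-resonance range stated in the text
        print(f"{rate_khz} kHz -> {pulse_length_ns(duty, rate_khz * 1e3):.0f} ns")
```

This gives about 158 ns at 190 kHz and about 136 ns at 220 kHz: the pulse shortens as the repetition rate is tuned upward to follow the contact resonance, consistent with the rounded 160 to 140 ns range.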
Data Preprocessing. Recorded spectra in which no bands exceeded the noise level were rejected, and the remaining spectra were averaged by location. The spectra were then vector-normalized over the range from 1200 to 2900 cm⁻¹ (excluding from the calculation the wavenumbers at which the laser does not emit) and smoothed with a Savitzky−Golay filter (three-point window, first-order polynomial), applied separately to each of the two contiguous spectral ranges.
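The preprocessing chain can be sketched with NumPy and SciPy as follows. This is a minimal sketch under stated assumptions: spectra are stored on a concatenated wavenumber axis that contains only the points where the laser emits, and the rejection rule (maximum below `snr_threshold` times a noise level `noise_sigma`) is a hypothetical stand-in, since the exact criterion for "no bands above the noise level" is not specified. The function name `preprocess` is illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(spectra, location_ids, segments, noise_sigma, snr_threshold=3.0):
    """Reject, average, vector-normalize, and smooth PTIR spectra.

    spectra: (n_spectra, n_points) raw spectra on the concatenated axis of the
        measured ranges only (points where the laser does not emit are absent).
    location_ids: (n_spectra,) id of the location each spectrum belongs to.
    segments: list of slices, one per contiguous spectral range.
    noise_sigma, snr_threshold: hypothetical rejection rule (assumption).
    Returns (locations, processed) with one processed spectrum per location.
    """
    spectra = np.asarray(spectra, dtype=float)
    location_ids = np.asarray(location_ids)

    # 1. Reject spectra whose maximum does not exceed the assumed noise threshold.
    keep = spectra.max(axis=1) > snr_threshold * noise_sigma
    spectra, location_ids = spectra[keep], location_ids[keep]

    # 2. Average the remaining replicate spectra per location.
    locations = np.unique(location_ids)
    averaged = np.vstack(
        [spectra[location_ids == loc].mean(axis=0) for loc in locations]
    )

    # 3. Vector normalization: unit Euclidean norm over all emitted wavenumbers.
    normalized = averaged / np.linalg.norm(averaged, axis=1, keepdims=True)

    # 4. Savitzky-Golay smoothing (3-point window, first-order polynomial),
    #    applied separately to each contiguous range so the gap is not bridged.
    smoothed = normalized.copy()
    for seg in segments:
        smoothed[:, seg] = savgol_filter(
            normalized[:, seg], window_length=3, polyorder=1, axis=1
        )
    return locations, smoothed
```

For the ranges used here at 2 cm⁻¹ spacing, the axis would hold 280 points (1200 to 1758 cm⁻¹) followed by 66 points (2770 to 2900 cm⁻¹), i.e., `segments = [slice(0, 280), slice(280, 346)]`. In the interior of each range, a three-point, first-order Savitzky−Golay filter reduces to a three-point moving average.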
To align the fluorescence images with the AFM coordinate system, each fluorescence image was rotated, scaled, and shifted to match its AFM topography counterpart. The fluorescence intensity was taken from the green channel of the fluorescence image, since a GFP filter was used. Finally, for each location at which PTIR spectra were recorded, the fluorescence intensity was determined by linear interpolation between the neighboring pixels of the fluorescence image.
Analytical Chemistry pubs.acs.org/ac Article
Chemometric Modeling of Fluorescence. A PLS regression model was fitted to predict the fluorescence intensity from the PTIR spectra. The optimum number of components was determined by 5-fold cross-validation with the root-mean-square error (RMSE) as the metric. Modeling was performed using the scikit-learn 64 (v0.22.2) machine learning library for Python 3.
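The registration and interpolation step of the Data Preprocessing can be sketched as follows. The sketch assumes that the rotation, scale, and shift mapping fluorescence pixels onto AFM coordinates have already been determined (passed here as `angle_rad`, `scale`, and `shift`, hypothetical names), that the fluorescence image is an RGB array with the green channel at index 1, and that PTIR locations are given as (x, y) in AFM coordinates. SciPy's `map_coordinates` with `order=1` performs the linear interpolation between neighboring pixels.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def afm_to_pixels(xy_afm, angle_rad, scale, shift):
    """Map AFM coordinates (N, 2) to fluorescence-image (row, col) coordinates.

    Assumes the registration xy_afm = scale * R(angle) @ [col, row] + shift and
    applies its inverse: [col, row] = R(-angle) @ (xy_afm - shift) / scale.
    """
    xy = np.atleast_2d(np.asarray(xy_afm, dtype=float))
    xy = (xy - np.asarray(shift, dtype=float)) / scale
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot_inv = np.array([[c, s], [-s, c]])  # R(-angle)
    col_row = xy @ rot_inv.T               # rotate each row vector
    return col_row[:, 1], col_row[:, 0]    # rows, cols


def fluorescence_at_locations(rgb_image, xy_afm, angle_rad, scale, shift):
    """Linearly interpolated green-channel intensity at each PTIR location."""
    green = np.asarray(rgb_image, dtype=float)[..., 1]  # green channel (GFP filter)
    rows, cols = afm_to_pixels(xy_afm, angle_rad, scale, shift)
    # order=1: bilinear interpolation from the four neighboring pixels.
    return map_coordinates(green, [rows, cols], order=1, mode="nearest")
```

Working in the AFM frame and sampling the fluorescence image at transformed coordinates avoids resampling the whole image; only the pixels around each PTIR location are interpolated.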
To identify which spectral features were most important for the regression, the selectivity ratio (SR) 65 of the PLS model was calculated. In brief, for each variable, the SR is the ratio of the variance explained by the model to the residual (unexplained) variance. A high SR at a specific wavenumber therefore means that a large part of the variance at that wavenumber is correlated with the target variable. The SR was calculated as outlined by Farrés et al. 65
■ RESULTS AND DISCUSSION
QM6a SecEYFP was grown on Mandels−Andreotti medium containing lactose because this carbon source induces cellulase production in T. reesei. 66 In the strain QM6a SecEYFP, cellulase production is accompanied by EYFP production. Furthermore, EYFP carries the same signal peptide as the main cellulase; hence, the fluorescence intensity correlates with the presence of cellulase. After deposition on the sample carrier, a bright-field image (Figure 1b) and a fluorescence image (Figure 1a) were first collected before the sample was transferred into the AFM instrument. AFM topography images (Figure 1c) and PTIR spectra (Figure 1d) were recorded in the same area. After the fluorescence and AFM topography images were overlaid, each PTIR spectrum could be assigned a fluorescence brightness (denoted by the color of the PTIR spectra and of the location markers in Figure 1d,c, respectively), as described in the Experimental Section.
To establish a regression from the PTIR spectra to the fluorescence intensities in the training set, a PLS model was constructed. The RMSE of the fit, used here as the metric of fit quality, was 11% of the maximum measured fluorescence intensity.
A cutoff between statistically significant and nonsignificant variables according to the SR is given by the F-test. 65 Here, we use the critical value at a 90% confidence level (the horizontal line in Figure 2). Note that a low SR in part of the spectrum could be caused by a low signal-to-noise ratio, by high spectral variance uncorrelated with the target variable, or by the absence of biologically relevant spectral information.
Wavenumbers between 1618 and 1634 cm⁻¹ have the highest SR, with the maximum at 1628 cm⁻¹. This region of the spectrum is part of the amide I band (1600 to 1700 cm⁻¹), which is sensitive to both protein concentration and protein secondary structure. The central part of this range (here marked with α) contains spectral signatures of α-helical amide backbones, as well as contributions from turns and disordered proteins. The bands between ∼1600 and 1634 cm⁻¹ (maximum at 1628 cm⁻¹) and between 1670 and 1700 cm⁻¹ (maximum at 1685 cm⁻¹) correspond to proteins with β-sheet secondary structure. 67−71 EYFP, the protein used to calibrate this model, has a β-barrel structure; 72 in addition, because the expression of EYFP is driven by the cbh1 promoter and EYFP is fused to the secretion signal peptide, it should follow the same path as, and be colocalized with, CBHI, the most abundant cellulase, 55 whose structure also contains β-sheets. 73 Furthermore, the presence of lactose induces not only the expression of CBHI and EYFP but also that of other cellulases, such as CBHII, EGLI, and EGLII, and of some xylanases. 66,74 Of these, EGLI and the major xylanase XYNII have structures composed mostly of β-sheets. 75,76 Because PTIR picks up a signal from all β-sheet-containing proteins, proteins with this type of structure whose distribution coincides with that of EYFP contribute to the model and are, in turn, detected by it. Furthermore, as the literature suggests that these cellulases and xylanases are significantly more abundant than EYFP, 77 we posit that our model is mainly sensitive to these proteins, while EYFP serves only as a necessary crutch to establish the regression.
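The amide I assignments used in this discussion can be encoded in a small lookup, for example to label SR maxima automatically. The β-sheet ranges are those given above; the bounds of the central α-helix/turn/disordered region (1634 to 1670 cm⁻¹) are an assumption, taken as the gap between the two β-sheet ranges, and shared boundaries resolve to β-sheet by list order. The helper name is hypothetical.

```python
"""Hypothetical lookup of the amide I band assignments used in the discussion."""

# (low, high, label) in cm^-1. Beta-sheet ranges are listed first so that the
# shared boundaries (1634 and 1670 cm^-1) resolve to beta-sheet.
# Assumption: the central region is taken as the gap between the beta ranges.
AMIDE_I_RANGES = [
    (1600.0, 1634.0, "beta-sheet"),
    (1670.0, 1700.0, "beta-sheet"),
    (1634.0, 1670.0, "alpha-helix / turns / disordered"),
]


def assign_amide_i(wavenumber_cm1: float) -> str:
    """Return the secondary-structure label for a wavenumber in the amide I band."""
    for low, high, label in AMIDE_I_RANGES:
        if low <= wavenumber_cm1 <= high:
            return label
    return "outside amide I"


if __name__ == "__main__":
    # Band maxima discussed in this work
    for nu in (1628.0, 1685.0):
        print(nu, "->", assign_amide_i(nu))
```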
The absorption band at 2850 cm⁻¹ also has a high SR. The cellulases, xylanases, and the accompanying EYFP are the proteins tagged for secretion with a signal sequence. Co- or post-translationally, proteins carrying this signal sequence are translocated into the ER lumen, where they are modified (e.g., glycosylated) and folded into their correct conformation with the help of chaperones. Correctly folded proteins are then transported in vesicles to the Golgi complex, where they undergo further modifications and are again packaged into vesicles, this time bound for the plasma membrane and, ultimately, the outside of the cell. 78 Thus, these proteins are surrounded at every step by lipid membranes, either those of the vesicles or those of the ER or the Golgi apparatus. This colocalization of the enzymes with lipids likely explains the high SR of the CH₂ stretching vibration at 2850 cm⁻¹. Having thus established that there are indeed local differences in protein concentration across the hypha, and that these, rather than a spurious correlation with an unrelated latent variable or even noise, are used to model the fluorescence intensity, we applied the model to a sample outside the training set.
A second PTIR data set (testing set) was collected and pretreated in the same way as the training set before the model was applied. For the testing set, the model yielded a slightly higher RMSE of 13% of the maximum value, compared with 11% for the training set, with a comparable distribution of residuals (Figure 3). Because the RMSE increased only slightly from the training to the testing set, the model appears to generalize well to data that were not part of its training set and not to be overfitting; it can therefore be applied to PTIR spectra outside its training set.
Using the testing set also allows us to demonstrate an important property of using PTIR to determine the local fluorescence brightness: by collecting PTIR spectra on a grid, a fluorescence image can be calculated with the PLS model that closely matches the original fluorescence image (see Figure 4). However, as PTIR has a significantly higher spatial resolution (approximately 20 nm) than fluorescence microscopy, the PTIR image here is significantly undersampled. This likely explains some of the lack of fit in Figure 3 and part of the RMSE, as small local differences in protein concentration are picked up by PTIR but not by fluorescence microscopy. Nevertheless, a PLS model with an appropriately chosen number of components still fits the general trend in the data set, disregarding small deviations at a few measurement locations.
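Computing a fluorescence image from PTIR spectra recorded on a grid amounts to predicting each spectrum and reshaping the predictions back into the grid. A minimal sketch, assuming the spectra are already preprocessed and stored in raster order as an (ny, nx, n_points) array; the optional Gaussian blur (`blur_sigma_px`, a hypothetical parameter) mimics the coarser resolution of fluorescence microscopy before the maps are compared by Pearson correlation, and is not part of the original analysis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def prediction_map(predict, spectra_grid):
    """Predict fluorescence for spectra on an (ny, nx) grid and return a 2D map.

    predict: callable (n, n_points) -> (n,) or (n, 1), e.g. a fitted model's predict.
    spectra_grid: (ny, nx, n_points) preprocessed spectra in raster order.
    """
    spectra_grid = np.asarray(spectra_grid, dtype=float)
    ny, nx, n_points = spectra_grid.shape
    y = np.ravel(predict(spectra_grid.reshape(ny * nx, n_points)))
    return y.reshape(ny, nx)


def compare_maps(predicted, measured, blur_sigma_px=None):
    """Pearson correlation between predicted and measured fluorescence maps.

    blur_sigma_px (hypothetical) optionally blurs the predicted map to mimic
    the coarser resolution of fluorescence microscopy before comparison.
    """
    p = np.asarray(predicted, dtype=float)
    if blur_sigma_px:
        p = gaussian_filter(p, sigma=blur_sigma_px)
    return float(np.corrcoef(p.ravel(), np.ravel(measured))[0, 1])
```

With a fitted scikit-learn model, `prediction_map(model.predict, grid)` would return the computed image, which can then be compared with the fluorescence intensities sampled at the same grid points.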

■ CONCLUSIONS AND OUTLOOK
In this work, we demonstrated that PTIR spectroscopy and multivariate modeling enabled spatially resolved determination of the presence of proteins. The model was applied here to determine the distribution of cellulases and xylanases that contain β-sheets in a technologically relevant fungus, T. reesei.
Because of the well-established properties of mid-IR spectroscopy, 24 the procedure should translate well to other microorganisms and cell types, as well as to properties beyond protein distribution, such as the presence of inclusion bodies or metabolic imbalances in cells. The only requirement for applying this technique is an external reference that can be used to establish the chemometric model. The calibration procedure could be improved by using confocal fluorescence microscopy in future studies. It should be noted that, while in this study the label was expressed by the microorganism together with the molecule of interest, this is not strictly required. 24 Label-free applications of this method can therefore be conceived. Furthermore, as recent work has demonstrated PTIR measurements in aqueous media, 49 PTIR-based imaging of living microorganisms appears to be a promising endeavor.
■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.0c02228.
Raw data and script files to perform the data analysis as described in the manuscript (ZIP)
PTIR spectra of the T. reesei strain used in this work, a wild type, and a super producer strain (PDF)
[…] evaluation. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.