Proteome Variation with Collagen Yield in Ancient Bone

Isotope analyses are some of the most common analytical methods applied to ancient bone, aiding the interpretation of past diets and chronology. For this, the evaluation of “collagen yield” (as defined in radiocarbon dating and stable isotope research) is a routine step that allows for the selection of specimens that are deemed adequate for subsequent analyses, with samples containing less than ∼1% “collagen yield” normally being used for isotopic analysis but discounted for radiocarbon dating. The aims of this study were to use the proteomic methods of MALDI–TOF (matrix-assisted laser desorption ionization time-of-flight mass spectrometry) and LC−ESI−MS/MS (liquid chromatography electrospray ionization tandem mass spectrometry) to investigate the endogeneity of the dominant proteinaceous biomolecules within samples that are typically considered to contain poorly preserved protein. Taking 29 archaeological samples, we evaluated the proteome variability between different acid-soluble fractions removed prior to protein gelatinization and considered waste as part of the radiocarbon dating process. We then correlated these proteomes against the commonly used “collagen yield” proxy for preservation. We found that these waste fractions contained a significant amount of both collagenous and noncollagenous proteins (NCPs) but that the abundance of these was not correlated with the acquired “collagen yield”. Rather than a depleted protein load as would be expected from a low “collagen yield”, the variety of the extracted NCPs was comparable with that commonly obtained from ancient samples and included informative proteins useful for species identification, phylogenetic studies, and potentially even for isotopic analyses, given further method developments. Additionally, we did not observe any correlation between “collagen yield” and peptide mass fingerprint success or between the different fractions taken from the same sample but at different radiocarbon pretreatment stages.
Overall, these findings highlight the value of retaining and analyzing sample fractions that are otherwise discarded as waste during the radiocarbon dating process and, more importantly, show that low “collagen yield” specimens, which are often misinterpreted by archaeologists as being devoid of protein, can still yield useful molecular sequence-based information.


■ INTRODUCTION
The analysis of stable isotopes in bones and teeth is widely used in archaeological and paleontological sciences due to its potential to address questions related to past human activities and ecology. 1−3 In particular, while radiocarbon dating allows the determination of the absolute age of archaeological samples, stable isotope analysis facilitates investigations into past human diet and ecological history. 4 Stable isotopes most widely used for this type of analysis are carbon (δ 13 C) and nitrogen (δ 15 N) ratios. 5 This is due to their differential incorporation into the organic and inorganic phases of bones and teeth during their biosynthesis and remodeling, 6 which reflects the dietary habits of the specific individual. After death, isotopes are no longer introduced into the tissues, and while radioactive isotopes such as 14 C start decaying, the relative concentration of stable isotopes remains constant. Isotopic analyses in bones are most commonly conducted on proteins and in particular, on collagen, the most abundant protein in modern bone, accounting for ∼85−90% of the whole bone proteome. 7 For this reason, the utility of the "collagen yield" (as defined in radiocarbon dating and stable isotope research and sometimes also referred to as "gelatine yield", 8−10 see section "Collagen Yield" in Experimental Section) from archaeological specimens has been the subject of interest for decades. Particularly relevant here is the commonly held belief that samples yielding less than 1% "collagen" should be excluded from radiocarbon dating as a result of their increased susceptibility to contamination from exogenous proteins or carbon sources, 11 which would lead to incorrect data interpretations.
Table 1 (notes). Samples that were pretreated with solvents are indicated with "Y" in the "Solvent wash" column. This step was only applied to samples that were suspected of containing exogenous carbon derived from conservation treatment, as is standard procedure for radiocarbon dating. 20 Samples from which the second prewash (HCl) fraction was sampled are indicated by "A", and samples from which the incubation (HCl) fraction was sampled are indicated by "B" in the "Fractions sampled" column (for details see text). "AG" and "AF" collagen yields refer to measurements taken before and after ultrafiltration, respectively. Context information contains, where known, excavation trench ("Trench"), excavation square ("Sq."), and stratigraphic unit ("Layer"). For the latter, at Kozarnika, Arabic …
To date, several protocols have been proposed for "collagen" extraction for radiocarbon and stable isotope analysis, 8,12−17 and these pretreatment protocols generally share three main strategies: to demineralize the sample; to remove contaminants such as humic acids, soil contaminants, bone lipids, and exogenous proteins; 18 and finally to solubilize proteins. Several methodological steps are then applied to extract the final "collagen" fraction, and each of these produces liquid fractions that are normally discarded. Previously, Wadsworth and Buckley 19 showed that discarded biomolecular material from these procedures contains numerous noncollagenous proteins (NCPs), with the highest protein variety and abundance being present in the base-soluble fractions obtained from preliminary wash steps performed during the extraction of "collagen" for isotopic or radiocarbon analyses.
The overall process for pretreatment of our samples for collagen extraction at the Oxford Radiocarbon Accelerator Unit (ORAU) consists briefly of the following steps: (1) weighing of the pulverized bone sample, (2) pretreatment of the samples with solvents only if exogenous carbon from conservation/restoration efforts is suspected, (3a) first prewash step with acid, (3b) second prewash step with acid, (3c) incubation step with acid (i.e., demineralization), (3d) cleaning with acid, (4) gelatinization of the sample and extraction of unpurified collagen, (5) freeze-drying of unpurified collagen and weighing (obtaining "after gelatinization" or "AG" collagen yield), (6) collagen hydrolysis and ultrafiltration to obtain purified collagen, and (7) freeze-drying of purified collagen and weighing (obtaining "after filtration" or "AF" collagen yield). 20
Several studies have been conducted so far in order to find a cheap prescreening methodology for archaeological bones to determine if they could be suitable for radiocarbon dating and isotopic analysis or if they should be discarded from the study. 21 Previous studies proposed the use of attenuated total reflection Fourier transform infrared (ATR−FTIR) spectroscopy as a way to assess the collagen content of bones prior to subjecting them to subsequent disruptive processes such as isotopic, 22 palaeoproteomic, 23−25 and palaeogenetic 26 analyses. Additionally, Harvey et al. 27 proposed the use of Zooarchaeology by Mass Spectrometry (ZooMS) collagen fingerprinting to prescreen bone fragments for radiocarbon dating as a means to evaluate the collagen integrity of bone remains. ZooMS 28 is a peptide fingerprinting method based on mass spectrometric analyses (MALDI−TOF MS) that has been successfully applied in archaeology for species identification of bone fragments too small for conventional (i.e., morphological) identification. 29−32 Harvey et al. 27 found this collagen fingerprinting method to exhibit a 100% success rate in categorizing samples as suitable for dating or not, although further comparisons on a much wider range of specimens from different environments would need to be carried out to validate this. Regardless, this figure is considerably higher than the maximum success rates achieved using %N or C/N investigations (84% and 71%, respectively), which remain the two most commonly used screening techniques for commercial analysis. 33
Despite the notably greater abundance of collagen in bones in comparison to NCPs, proteomic studies on ancient materials are becoming increasingly popular for clarifying which other proteins survive for prolonged times in archaeological specimens. 34−36 Beyond interpretation of the dominant peptide signals in bone that are largely derived from collagen, a more complex mixture of proteins within a sample can be better analyzed using tandem mass spectrometry (MS/MS), whereby identification is achieved by comparing the collective tandem spectra to sequence databases. 37 NCPs can also be used to obtain phylogenetic information, 38 provide insights into protein diagenetic alterations 39 and into the geological 36 and chronological 40 age of specimens, and potentially assist in species identification of bone remains. 41 Moreover, NCPs could also provide an approach to performing isotopic analysis on proteins other than collagen. The opportunity to obtain informative NCPs (both for isotopic analyses and for species identification) and diagenetic information on samples normally considered too poorly preserved for radiocarbon dating would be extremely advantageous, especially for precious archaeological artefacts and human remains.
Wadsworth and Buckley 34 showed that several NCPs, such as serum albumin, fetuin-A, biglycan, chondroadherin, PEDF, lumican, and prothrombin can be recovered from ancient bone samples up to 900 ka and that in general, the proteome complexity of such samples is inversely proportional to geological age. They also postulated that fetuin-A and albumin might be the most useful NCPs commonly found in ancient samples regardless of their absolute age, proposing that fetuin-A could be the most suitable of all NCPs detected for phylogenetic analysis. 34 Therefore, the aim of this study was to investigate the survival of NCPs in archaeological specimens, comparing samples classified (by ORAU criteria) as low "collagen yield" samples (hereafter this terminology from radiocarbon dating is preserved for consistency with the literature, despite the fact that the extracted proteome is not purely collagen), with those deemed to have sufficient "collagen yield" for radiocarbon dating. We also aimed to compare the proteomes observed in different fractions obtained from the same sample, focusing on the second prewash and the acid incubation (demineralization) steps with hydrochloric acid, which form part of the routine radiocarbon bone pretreatment protocol at ORAU. Finally, we aimed to investigate limitations of cross-species proteomics in such analyses, investigating differences that may help the taxonomic verification of the samples using LC−ESI−MS/MS for those which generate poor peptide mass fingerprints (via ZooMS), 28 owing to the relatively low abundance of collagen present in the samples.

■ EXPERIMENTAL SECTION
This study was conducted on 29 specimens from four different archaeological sites (Table 1). Fourteen samples were collected from Kozarnika, a cave in northwestern Bulgaria; 12 were collected from Temnata Dupka, a cave in Bulgaria located about 52 km north of Sofia city; two were collected from Máriaremete (Remete Felső), a cave in Hungary located in the Bakony Mountains; and one sample was collected from Manastira, a cave located in Bulgaria in the Oblast Veliko Tarnovo region.

Protein Extraction
In all 29 cases, the subsamples for proteomic analysis were collected from waste fractions deriving from the radiocarbon dating pretreatment protocol for dating bone collagen. 20 Bone samples were carefully drilled to collect fine bone powder (410−1170 mg). A sequential solvent wash pretreatment (acetone, methanol, then chloroform [Sigma-Aldrich, UK]) was applied to 11 samples, as is customary at ORAU for samples that are suspected to contain exogenous carbon derived from conservation treatment (Table 1 and Figure 1). […] (Table 1 and Figure 1), thereby increasing the robustness of our analyses by helping to evaluate whether NCPs are fraction specific. The resulting total of 44 subsamples was split into two fractions each, 0.5 mL for freeze storage (backup) and 0.5 mL for further treatment and analyses (see Table S1 for accession names for LC−ESI−MS/MS analysis and associated sample names in the manuscript). Samples were ultrafiltered using 10 kDa molecular weight cut-off (MWCO) filters (Vivaspin, UK) and were buffer exchanged into 50 mM ammonium bicarbonate (ABC, Sigma-Aldrich, UK). Extracted proteins were reduced using 5 mM dithiothreitol (DTT, Sigma-Aldrich, UK) for 40 min at room temperature (RT), alkylated with 15 mM iodoacetamide (IAM, Sigma-Aldrich, UK) for 45 min in the dark at RT, and quenched with a further amount of 5 mM DTT as above. Proteins were then digested with 1 μg of sequencing grade trypsin (Promega, UK) at 37 °C for 5 h. Digestion was stopped by adding 1% trifluoroacetic acid (TFA, Sigma-Aldrich, UK) (to a 0.1% TFA concentration), and then samples were desalted, purified, and concentrated with OMIX C18 reversed-phase Zip-Tips (Agilent Technologies, UK) following the manufacturer's protocols. An elution buffer was prepared by mixing acetonitrile (ACN, Sigma-Aldrich, UK) with water and TFA to obtain 50% ACN/0.1% TFA.
Peptides were eluted from the Zip-Tips in 100 μL of 50% ACN/0.1% TFA; samples were then dried in a fume cupboard for 1 day and resuspended in 20 μL of 5% ACN/0.1% TFA for subsequent MALDI-ToF-MS and LC−ESI−MS/MS analysis.

ZooMS MALDI-ToF-MS Analyses
Each digest (1 μL) was cocrystallized with 1 μL of 10 mg/mL alpha hydroxycinnamic acid in 50% ACN/0.1% TFA and allowed to air dry on a stainless steel MALDI target plate. Up to 2,000 laser acquisitions were acquired over an m/z range of 700−3700 following Buckley et al. 28 and compared to the range of megafaunal collagen peptide mass fingerprints acquired for fauna typical of the European Palaeolithic. 42

LC−ESI−MS/MS Analyses
A total of 44 LC−ESI−MS/MS analyses were performed using an UltiMate 3000 Rapid Separation LC (RSLC, Dionex Corporation, Sunnyvale, CA, USA) coupled to an Orbitrap Elite (Thermo Fisher Scientific, Waltham, MA, USA) mass spectrometer (120 k resolution, full scan, positive mode, normal mass range 350−1500). Peptides were separated on an Ethylene Bridged Hybrid (BEH) C18 analytical column (75 μm i.d. × 250 mm, 1.7 μm particle size; Waters) using a gradient from 92% A (0.1% formic acid [FA] in water) and 8% B (0.1% FA in ACN) to 33% B in 44 min at a flow rate of 300 nL/min. Peptides were then automatically selected for fragmentation by data-dependent analysis; six MS/MS scans (Velos ion trap, product ion scans, rapid scan rate, centroid data; scan event: 500 count minimum signal threshold, top 6) were acquired per cycle, dynamic exclusion was employed, and one repeat scan (i.e., two MS/MS scans total) was acquired in a 30 s repeat duration with that precursor being excluded for the subsequent 30 s (activation: collision-induced dissociation (CID), 2+ default charge state, 2 m/z isolation width, 35 eV normalized collision energy, 0.25 activation Q, 10.0 ms activation time).

Proteomic Data Analysis
Journal of Proteome Research pubs.acs.org/jpr Article
The collective tandem mass spectra (.mgf) files were then searched against the Swiss-Prot database for matches to primary protein sequences using the Mascot search engine (version 2.5.1; Matrix Science, London, UK), without specific taxonomy filters. Each search included the fixed carbamidomethyl modification of cysteine (+57.02 Da) and the variable modifications for deamidation (asparagine and glutamine, +0.98 Da) and oxidation of lysine, proline, and methionine residues (all +15.99 Da) to account for post-translational modifications and diagenetic alterations (the oxidation of lysine and proline is equivalent to hydroxylation). Enzyme specificity was selected as trypsin-P (first batch of analyses) and semiTrypsin (second batch of analyses) with one missed cleavage allowed; mass tolerances were set at 5 ppm for the precursor ions and 0.5 Da for the fragment ions. All spectra were considered as having either 2+ or 3+ precursors. Scaffold (v4.10.0, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they exceeded specific search engine thresholds (i.e., the suggested peptide homology scores). Protein identifications were accepted if they contained at least two identified peptides. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. The display option chosen for this work was "total spectrum count" to allow a semiquantitative measurement of the proteins present in the sample. In order to count NCPs, we considered all proteins but excluded from the count collagenous proteins as well as common contaminants and non-intrinsic bone proteins, such as keratins and trypsin.
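The acceptance rules described above (a 5 ppm precursor tolerance, at least two peptides per protein, and the exclusion of collagens and contaminants from the NCP count) can be illustrated with a minimal Python sketch. This is not the Mascot or Scaffold implementation; the gene-name prefixes, function names, and toy protein list are assumptions made purely for illustration.

```python
# Illustrative sketch only: prefixes, function names, and toy data are assumptions,
# not the filtering logic of Mascot or Scaffold.
COLLAGEN_PREFIXES = ("COL",)             # assumed naming for collagen chains
CONTAMINANT_PREFIXES = ("KRT", "TRYP")   # assumed naming for keratins and trypsin


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the precursor mass error lies within +/- tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def intrinsic_ncps(peptides_per_protein, min_peptides=2):
    """Return accepted proteins that are neither collagens nor contaminants."""
    kept = []
    for gene, n_peptides in peptides_per_protein.items():
        name = gene.upper()
        if n_peptides < min_peptides:
            continue  # fewer than two identified peptides: not accepted
        if name.startswith(COLLAGEN_PREFIXES) or name.startswith(CONTAMINANT_PREFIXES):
            continue  # collagenous or non-intrinsic protein: not counted as an NCP
        kept.append(gene)
    return sorted(kept)


# Made-up example: protein -> number of identified peptides
toy_sample = {"COL1A1": 40, "COL1A2": 31, "BGN": 5, "ALB": 3,
              "AHSG": 1, "KRT1": 6, "TRYP_PIG": 4}
print(intrinsic_ncps(toy_sample))
```

In this toy case only two proteins are counted as NCPs; the single-peptide hit is dropped by the two-peptide rule rather than by the contaminant filter.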
In order to evaluate the presence of peptides that have originated by nonspecific trypsin cleavages, such as those cleaved through the process of diagenesis, an additional Mascot search was performed specifying the digestion enzyme as semiTrypsin.
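To make the distinction between fully tryptic and semitryptic peptides concrete, the following Python sketch classifies a peptide by checking both of its termini against a parent sequence. It assumes the trypsin/P rule (cleavage C-terminal to K or R, regardless of a following proline); the protein sequence is a made-up string, and the function names are hypothetical rather than part of any search engine.

```python
# Minimal sketch, assuming trypsin/P specificity (cleave after K or R).
# The sequence below is invented for illustration and is not a real protein.

def is_tryptic_site(protein, pos):
    """True if a cut just before index `pos` is a protein terminus or follows K/R."""
    if pos == 0 or pos == len(protein):
        return True
    return protein[pos - 1] in "KR"


def classify_peptide(protein, peptide):
    """Label a peptide as 'tryptic', 'semitryptic', or 'nonspecific'.

    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_term_ok = is_tryptic_site(protein, start)
    c_term_ok = is_tryptic_site(protein, end)
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semitryptic"
    return "nonspecific"


def percent_semitryptic(labels):
    """Percentage of peptide labels that are 'semitryptic'."""
    return 100.0 * sum(label == "semitryptic" for label in labels) / len(labels)


toy_protein = "MAGPKGETGPSGPAGPTGARGAPGDR"
for pep in ("GETGPSGPAGPTGAR", "TGPSGPAGPTGAR", "GPSGPAGP"):
    print(pep, classify_peptide(toy_protein, pep))
```

A semitryptic search admits the second kind of peptide, in which one terminus arises from a non-enzymatic break such as diagenetic backbone cleavage.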
Percentage coverages for specific selected proteins were extracted from Scaffold (display option selected "percent coverage"). RStudio software (version 1.3.959) was used to perform plots and to make statistical analyses using the library tidyverse and the package ggpubr. STRING software version 11.0 was used to calculate functional protein association networks.
"Collagen Yield" The "collagen yield" for each sample was calculated using data from the radiocarbon dating process, by following standard radiocarbon and stable isotope practices. As a result, we use standard radiocarbon dating terminology in the following description. There were two stages at which "bulk collagen weight" could be measured: before ("AG") and after ultrafiltration ("AF"). More precisely, after the fractions for proteome analyses were collected, the radiocarbon sample was further treated with 0.1 M NaOH (30 min, RT) and 0.5 M HCl (15 min, RT), gelatinized (20 h, 75°C), Ezee filtered and freeze-dried for 48 h. The weight of the resulting "bulk collagen" collected corresponds to the AG yield (AG refers to the protocol code used at ORAU 20 ). If the sample was deemed sufficiently well preserved, that is, AG yield exceeded 10 mg, the "bulk collagen" was subsequently hydrolyzed in 10 mL of ultrapure water, and filtered using 30 kDa MWCO ultrafilters (Vivaspin, UK) until circa 1.5 mL of solution remained. The retentate was freeze-dried for 24 h, and the "purified collagen" weighed, thus giving the AF yield. With the weight of the original bone powder before treatment considered 100%, the yield for both AG and AF "collagen" could be calculated. Where no information for the AF yield is provided in Table 1, the sample could not be ultrafiltered as a result of low "collagen" preservation determined from the AG yield. From a radiocarbon perspective, an AG yield of <1 wt % was classified as very poor preservation, 1−3 wt % as poor preservation (but would be dated), 3−6 wt % as low preservation, and 6−8 wt % as good preservation. Archaeological samples with an AG "collagen" yield of >10 wt % would be seen as well preserved. For comparison, fresh bone would result in a "collagen" yield of approximately 22 wt %. 11 ■ RESULTS

Comparison "Collagen Yield" and Number of NCPs
After refinement of the NCPs list per sample, the number of intrinsic NCPs obtained was compared with the derived "collagen yield" taken from ORAU analyses (specifically the "AG" yield as described in the Experimental Section, under "Collagen Yield") (Figure 2 and Table 3), and the total spectral counts matched with collagen α-1(I) (hereafter COL1A1) and α-2(I) (hereafter COL1A2) were also compared with the "% AG collagen yield" (Figure 3). Results showed that "collagen yield" and proteome variety were not correlated (Pearson's correlation p-value = 0.8833 and correlation coefficient = 0.0228). Specifically, several samples that generated a "collagen yield" of <1% contained up to 14 NCPs (TD1.B, Figure 2), whereas other samples with "collagen yields" of >7% contained fewer than four NCPs. Samples MR1.A and MR1.B generated the highest "AG collagen yield" in the dataset but contained three and zero NCPs, respectively (Figure 2). Furthermore, "AG collagen yield" was not correlated with the total spectrum counts for COL1A1 and COL1A2 (Pearson's correlation p-value = 0.1965 and 0.4691 and correlation coefficient = 0.1985 and 0.1120, respectively), with the majority of the samples showing total spectrum numbers ranging between 234 and 380 for COL1A1 and between 149 and 277 for COL1A2 despite having different "collagen yield" values and with only two samples falling outside this range (KZ-58 and MR1.B). Pearson's correlation index calculated between the total spectrum count of the most abundant NCP (biglycan) and the "AG collagen yield" for each sample (i.e., considering each fraction generated from each bone as an individual sample) resulted in a nonsignificant correlation (p-value = 0.9606 and correlation coefficient = −0.0077) (Figure 4).
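The correlation tests reported above can be sketched in a few lines of Python. The example computes Pearson's r from paired values and the associated t statistic (with n − 2 degrees of freedom) from which a p-value is derived; it is not the R code used in the study, the paired values are invented, and treating the 44 subsamples as the sample size is an assumption.

```python
import math

# Minimal sketch of a Pearson correlation test; toy data are invented.

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented paired values (e.g., yield in wt % versus number of NCPs)
toy_yield = [1, 2, 3, 4]
toy_ncps = [1, 3, 2, 4]
r = pearson_r(toy_yield, toy_ncps)
print(round(r, 3), round(t_statistic(r, len(toy_yield)), 3))

# Reported coefficient for yield versus NCP count, assuming n = 44 subsamples
print(round(t_statistic(0.0228, 44), 3))
```

Under that assumed sample size, the reported coefficient gives a t statistic close to zero, which is in line with the nonsignificant p-value stated in the text.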

Comparisons between Fraction A and Fraction B
There were 15 samples for which it was possible to collect analysis material at two different stages of the radiocarbon dating process, namely after the second acid wash (fraction A) and after the overnight acid incubation (fraction B). For these samples, we analyzed the variability in the concentration of collagen and NCPs between those two fractions. Although variations were observed, no significant correlation was found (Figure 5). Although some samples were characterized by similar amounts of COL1A1 but different amounts of NCPs in the two fractions (e.g., MAN1, TD16, TD1, and TD4), others contained different amounts of COL1A1 but similar amounts of NCPs (e.g., TD5). Furthermore, some samples were characterized by similar amounts of COL1A1 and NCPs (e.g., TD19, TD20) (Figure 5). As there are differences in the degradation rate of collagen and post-translational protein modifications (PTMs), and there may be differences in the degradation process between individual samples, we compared the coverage values obtained from fraction A and fraction B for COL1A1 and COL1A2, as well as the three most abundant NCPs found in our dataset (BGN, ALB, and AHSG). To avoid biases due to differences in the annotation of the protein databases of different species, this evaluation was limited to one species only, Bos taurus. Results showed an absence of any specific trends, with some proteins showing deeper coverage in fraction A compared to fraction B, and vice versa (Table 4).

Protein Diagenesis
Semitryptic searches that were performed to evaluate the extent of diagenesis (Table S2) showed a consistent percentage of total spectrum counts for COL1A1 and COL1A2 peptides regardless of whether they were from high- or low-protein-yield samples; the ratio of spectral counts identified using standard tryptic searches to the ones found using semitryptic searches was on average 46.6% and 46.7% for COL1A1 and COL1A2, respectively, in the 10 lowest protein yield samples, and 44.9% and 46.9%, respectively, in the 10 highest protein yield samples.

■ DISCUSSION
In the following, we discuss our findings on the endogeneity of the dominant proteinaceous biomolecules within archaeological samples typically considered by radiocarbon and stable isotope specialists to be poorly preserved. Following our main aims, we focused on variability between samples and fractions, correlations between proteomic results and radiocarbon "collagen yield" and species identification. First, we examine whether our decision on which waste fraction from the radiocarbon dating pretreatment to sample influences our analysis results. Second, we discuss the influence of bone diagenesis. This includes a comparison between the number of identified NCPs and sample age and between radiocarbon "collagen yield" (AG) and protein coverage. We subsequently extend this to focus on nonspecific trypsin cleavages caused by diagenesis. Third, we compare the success and nature of species identification results obtained through traditional (i.e., morphological) means, ZooMS, and proteomic analysis. Lastly, we take a closer look to assess how our results compare with other findings from the literature.

Proteomic Differences between Radiocarbon Dating Fractions
In this study, we compared the fractions obtained from the second HCl prewash step (fraction A) and the overnight HCl incubation (fraction B) to provide a comparison between the two processing stages (Table 1). While we did not observe any significant correlation between the concentration of collagen or NCPs and the fraction sampled, we did notice that, in general, greater variations between the two fractions can be observed for NCPs compared to COL1A1 abundance. In particular, NCPs were more abundant in fraction A than in fraction B in seven cases and were identical in one case. These results may be related to differences in the degradation rate of collagen and PTMs and in the diagenetic processes that could have affected the samples, with more fragmented proteins tending to be released during the second prewashing (fraction A) and with more intact proteins being released within fraction B. However, when comparing coverage values (restricted to Bos taurus) for COL1A1, COL1A2, and for three of the most abundant NCPs found in the dataset (BGN, ALB, and AHSG), no specific trends could be identified: some proteins showed a deeper coverage in fraction A and lower coverage in fraction B, and vice versa (Table 4).
We also did not observe any consistent change in the degradation of the proteome between specific fractions, as shown by the semitryptic searches (Table S2). For example, in some cases, fraction A had an increased percentage of collagen semitryptic peptides, and in other cases, fraction B had a higher amount of those. This may be due to the fact that bone demineralization was not complete after the second HCl treatment (fraction A), retaining sufficient material for proteome recovery after the overnight incubation (fraction B). By contrast, a previous study on modern bones showed that a demineralization length of 6 h allowed for a better proteome recovery than prolonged lengths (24 and 48 h). 43 It is worth emphasizing here that the radiocarbon samples had already completed a first HCl treatment for 2 h, followed by discarding of the soluble fraction of proteins, prior to the HCl treatment from which fraction A had been collected. We believe that this combined time was enough to demineralize the ancient samples sufficiently to extract a high number of NCPs and that the subsequent overnight incubation step did not significantly improve the overall extraction of the proteins embedded in the mineral matrix, nor did it significantly increase the protein damage induced by the interaction with the acid.
Looking at the common contaminants usually found in bone samples such as keratins, we found that seven out of 15 samples contained keratins only in fraction A, six samples contained keratins in both fractions and only two samples contained keratins exclusive to fraction B (Table S3). These results show that, although the pretreatment step can remove some of the common modern contaminants known to affect the dating results, further steps should be carried out to ensure removal of all of them, otherwise contamination with modern carbon would still be expected in the gelatinized fraction.

NCP Presence versus Sample Age
The samples that contained ten or more NCP matches (TD1, TD4, TD14, TD17, TD19, […] whereas TD4, one of the oldest specimens, had nine and 13 NCPs extracted from the prewash (fraction A) and from the acid incubation fraction (fraction B), respectively. Clearly the depositional environment would have played a major role in the survival of the NCPs in the specimens. For example, although samples MR1 and MR8 were excavated from the same cave, depth, and level, they were dated to the late Holocene and late Middle Palaeolithic to early Upper Palaeolithic periods, respectively, and they contained three and one NCP only, respectively, in both the analyzed fractions (A plus B). This suggests that the taphonomic processes that affected the bone proteome survival were more likely to be related to environmental factors than to aging phenomena.

"Collagen Yield" and Protein Coverage
We did not observe any obvious trends between a sample's "collagen yield" (as determined by the radiocarbon dating process) and the proteome complexity. We did identify specific globular serum proteins (such as albumin and fetuin-A) together with collagen-binding proteins (such as biglycan) in the majority of the samples, despite the large range in "collagen yield". To further investigate this lack of correlation, we also explored the percentage coverage for COL1A1, COL1A2, BGN, and ALB (Table 3). For this, we excluded samples identified as horse (n = 17) because the incompleteness of the database would not allow a reliable evaluation of the percentage coverage of each protein. The percentage protein coverage ranged between 41 and 52% for COL1A1 and between 37 and 58% for COL1A2 and did not follow any specific trends related to the "collagen yield" of the samples. For example, the highest "collagen" yielding sample, MR1.B, had 41% coverage for both COL1A1 and COL1A2, whereas one of the lowest "collagen" yielding samples, TD1.A, had 51 and 55% coverage, respectively (Table 3). The only exception was sample KZ-58, which yielded no collagen and poor coverage for both COL1A1 and COL1A2 (11% for both chains). The same pattern was observed for the most abundant NCPs found in the dataset, namely BGN and ALB, for which percentage coverage showed no correlation with "collagen yield".
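The percentage coverage values discussed above express the share of a protein's residues that is covered by at least one identified peptide. A minimal Python sketch of that calculation follows; it is not Scaffold's implementation, the sequence and peptides are invented, and exact substring matching is an assumption that ignores modifications and isobaric residues.

```python
# Minimal sketch of percent sequence coverage; sequence and peptides are invented.

def percent_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # skip empty strings, which would match everywhere
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, including overlapping ones
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


toy_protein = "MKTAYIAKQR"
toy_peptides = ["MKTA", "TAYI", "WWWW"]  # the last peptide does not match
print(percent_coverage(toy_protein, toy_peptides))
```

Because overlapping peptides are only counted once per residue, coverage depends on which regions are observed rather than on how many spectra are matched, which is one reason it can stay stable across very different yields.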

Protein Diagenesis
Results on the percentage of semitryptic peptides found in our samples showed a relatively high level (average ∼45%) in comparison with the amounts usually obtained under optimal conditions (e.g., when extracting proteins from fresh, modern tissues), for which the percentage of semitryptic peptides is very low, ranging from less than 3% for soft tissues 44 to around 15% for hard tissues, where a demineralization step similar to the one used in this study is required to allow for protein extraction. 43 The frequency of diagenetically altered peptides can increase during both preparation and digestion of samples, depending on the protocol used for the extraction; however, in this case, results were notably higher than the percentages expected from modern samples. We also compared the percentage of semitryptic peptides with the collagen yield obtained from our samples and noted that, overall, collagenous proteins accumulate damage over archaeological timeframes regardless of the total amount of collagen that can be extracted from the samples. For BGN, we found that percentages for semitryptic peptides averaged 47.4% for the ten lowest "collagen yield" samples (as defined by radiocarbon) and 48.5% for the ten highest "collagen yield" samples. This result suggests that globular proteins are subjected to a very similar decay rate to that of collagen and shows that the damage of NCPs is not directly related to the amount of "collagen" extracted during the radiocarbon dating pretreatment.

ZooMS and Cross-Species Proteomics
Focusing specifically on low "collagen yield" samples (as defined by the radiocarbon dating analysis), three samples were classified as having a yield of less than 1%, namely KZ-58, KZ-44, and TD1. KZ-58 was a morphologically unidentified mammal bone from Upper Paleolithic contexts of the Kozarnika site (Bulgaria), for which ZooMS analysis also failed to identify the species. Although shotgun proteomic analysis indicated that the specimen could be attributed to a bovine, a further search against a local database derived from protein BLAST searches of the cattle COL1A1 and COL1A2 sequences confirmed a greater match to a cervid sequence (with notable matches to the peptide sequences GETGPSGPAGPTGAR, GAPGAVGAPGPAGANGDR, and TGQPGAVGPAGIR differentiating them from cattle); interestingly, no NCPs were identified in this sample, consistent with its relatively poor molecular survival (Figure 6). The limitation of LC−MS/MS identification in comparison with ZooMS is the incompleteness of the available proteomic databases for some animal species, which, for example, does not allow bovine samples to be distinguished from cervine samples (Table 1). Conversely, an advantage of using LC−MS/MS proteomic analyses in combination with ZooMS is the possibility of using peptides from NCPs to identify specimens characterized by a poor collagen fingerprint spectrum. Among all NCPs identified in this work, albumin, biglycan, thrombospondin-1, and chondroadherin were identified in each of the four species present, and pigment epithelium-derived factor and prothrombin were identified in three out of four species (specifically in bovine, horse, and human) (Table 2). Fetuin-A, a protein normally identified in ancient bones, was not found in any horse or cervine samples.
We believe that this result is more likely due to the incompleteness of the proteomic databases for these species than to the decay of this specific protein or a failure to extract it. Because of the great potential of fetuin-A as a source of phylogenetic information, we suggest the creation of an ad hoc database of fetuin-A sequences for the species of interest, which would allow its peptides to be identified and matched and, ultimately, used in phylogenetic and species identification studies. Interestingly, fetuin-A was successfully identified in both fractions of one of the lowest "collagen yield" samples (TD1, whose "collagen yield" of 0.88% was among the lowest found in this work), a sample that would normally have been excluded by ORAU from subsequent radiocarbon dating because of the poor reliability of measurements in such cases. Further research may help to better evaluate whether a sample could provide a reliable date despite a low radiocarbon "collagen yield".
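The local database search described above can be pictured as scoring observed peptides against candidate reference sequences. The following is a minimal sketch of that idea only, not the search engine or database used in this study: the function names, species labels, and sequences are hypothetical toy values. The one domain assumption made is that isoleucine and leucine are treated as equivalent, since they have the same mass and are not distinguished by standard MS/MS.

```python
def normalize(seq):
    """Upper-case a sequence and merge I into L (isobaric residues)."""
    return seq.upper().replace("I", "L")


def rank_candidates(observed_peptides, references):
    """Rank candidate species by how many observed peptides occur in
    their reference sequence (exact match after I/L normalization)."""
    scores = {}
    for species, ref in references.items():
        ref_norm = normalize(ref)
        scores[species] = sum(
            1 for pep in observed_peptides if normalize(pep) in ref_norm
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy references, not real collagen or fetuin-A sequences.
toy_refs = {
    "species_a": "GPAGLRGEAGPK",
    "species_b": "GPAGIRGETGPK",
}
toy_observed = ["GPAGLR", "GETGPK"]
print(rank_candidates(toy_observed, toy_refs))
```

A species that is absent from the reference set can never be ranked first, which is the database-completeness limitation noted above for horse and cervine samples.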

Influence of Protein Extraction Protocols from Ancient Bone
When comparing our findings with previous work 34 in which acid-insoluble pellets were treated with guanidine hydrochloride (GuHCl) after the overnight incubation step in 0.6 M HCl and only this fraction (comparable to fraction B in this study) was analyzed, we found that the variety of NCPs in our batch was smaller (a maximum of 15 NCPs, versus 30 NCPs in the previous work), despite the totals over the dataset being similar, with 37 and 44 NCPs, respectively. This difference in protein number further supports the suggestion that incubation in GuHCl rather than HCl is a valuable method for increasing the number of identified NCPs in bone samples, as has also been shown in a study conducted on ancient bovid teeth and mandible bones. 36 The most commonly identified NCPs in this study were the same as those found by Wadsworth and Buckley, 34 despite the fact that there was no GuHCl incubation step in this study. The only exceptions were thrombospondin and SPARC (commonly found in this study but not in ref 34) and lumican (mentioned in ref 34 but found in only a limited number of samples here).
In previous work, 19 four bovid specimens (dated from ∼4 to 130 Ka) were treated with a protocol similar to the one used here, omitting the solvent pretreatment but including a prewash step with 0.6 M HCl for 2 h at R/T (similar to fraction A) prior to overnight incubation with the same acid at R/T (similar to fraction B); the two acid fractions were then pooled together as the "RC sol-fraction". Those results were comparable to the results of this study. In particular, from four to ten NCPs were previously observed in the "RC sol-fraction", and fetuin-A, PEDF, ALB, biglycan, lumican, complement C3, decorin, and prothrombin were the most commonly identified NCPs, despite not being found in all samples. Although the two acid fractions were not combined in our work, the comparability of our results with other proteomic analyses of ancient bones 19 suggests that either of the acid-soluble fractions generated during the processing of samples for radiocarbon dating contains a substantial variety of NCPs that can be used for phylogenetic purposes and for cross-species proteomic analyses, as well as potentially for isotopic and radiocarbon studies.

■ CONCLUSIONS
Overall, our results show that the indication provided by the "collagen yield" of archaeological samples (as defined in radiocarbon and stable isotope studies) should be used with caution, in that specimens considered "poor collagen" for isotopic purposes may still yield useful proteomic information. Furthermore, our results support previous studies highlighting that even the fractions typically discarded during the collagen extraction process can yield useful proteomes, with both the prewash and the acid incubation fractions containing several NCPs that can be successfully used to determine species identity. Moreover, the proteins contained in the acid fractions may be adequate for isotopic studies and radiocarbon dating of the specimens; in fact, our results showed that the total number of spectra found for collagen α-1, collagen α-2, and NCPs can be sufficient to conduct these types of studies despite the poor "collagen yield" calculated from the samples and the complete lack of correlation between these two variables. We did not find a clear correlation between proteome variety and the age of the specimens (in contrast with findings from other datasets); rather, the depositional environment played a more important role than aging in the survival of specific proteins. We also showed that both "fraction A" and "fraction B", generated during the collagen extraction methodology, can contain a high number of NCPs. Additionally, the overall level of protein decay in the two fractions is comparable, and common contaminants, such as keratins, are less abundant in the second fraction than in the first. Finally, we show that LC−MS/MS proteomic analysis can be valuable for identifying samples that fail species identification through ZooMS collagen peptide mass fingerprinting.
Supporting Table S1: Accession names found on PRIDE and associated sample names found in the manuscript; Supporting Table S2: Fully tryptic peptides (total spectrum count) and semitryptic peptides (total spectrum count) for COL1A1, COL1A2, and BGN for the ten lowest and the ten highest collagen yield samples, with the percentage of semitryptic peptides and the averages for the ten lowest and ten highest collagen yield samples; Supporting Table S3: Keratins found in fraction A and fraction B (PDF)

■ ACKNOWLEDGMENTS