Bone Proteomics Method Optimization for Forensic Investigations

The application of proteomic analysis to forensic skeletal remains has gained significant interest as a way to improve biological and chronological estimations in medico-legal investigations. To enhance the applicability of these analyses to forensic casework, it is crucial to maximize throughput and proteome recovery while minimizing interoperator variability and laboratory-induced post-translational protein modifications (PTMs). This work compared different workflows for extracting, purifying, and analyzing bone proteins using liquid chromatography with tandem mass spectrometry (LC–MS/MS), including an in-StageTip protocol previously optimized for forensic applications and two protocols using novel suspension-trap technology (S-Trap) with different lysis solutions. This study also compared data-dependent acquisition (DDA) with data-independent acquisition (DIA). When all workflows were tested on 30 human cortical tibia samples, the S-Trap workflows gave increased proteome recovery with both lysis solutions tested and decreased levels of induced deamidation. DIA mode gave greater sensitivity and a wider window for identifying lower-abundance proteins, especially when open-source software was used for data processing in both modes. The newly developed S-Trap protocol is therefore suitable for forensic bone proteomic workflows and, particularly when paired with DIA mode, can offer improved proteomic outcomes and increased reproducibility, showcasing its potential in forensic proteomics and contributing to the standardization of bone proteomic analyses for forensic applications.

Random-forest imputation is commonly used in proteomic workflows because it performs consistently and reliably across a variety of datasets (Egert et al., 2021; Jin et al., 2021; Stekhoven & Bühlmann, 2012). Nevertheless, it was compared with the above methods to validate its performance on the different workflow datasets in this investigation.
For each workflow, proteins were matched between comparator datasets. Missingness is summarised in Figures S2, S3 and S4 for workflows one, two and three, respectively. Artificially missing data were introduced for each workflow to compare imputation methods (Figure S1). Random-forest imputation outperformed the other techniques for each workflow (Tables S2, S3 and S4) and was used thereafter to impute missing data. For a given protein or peptide, imputation was performed if <50% (protein) or <35% (peptide) of the data were missing across the cohort. These cut-offs were determined empirically by artificially removing data (from 30% to 50% at 5% intervals) and assessing concordance between true and imputed values.
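The mask-and-score procedure described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the toy abundance values and the normalised RMSE score are assumptions made for illustration, and a simple mean imputer stands in for the random-forest imputer so that the example needs only the Python standard library.

```python
import math
import random

def mask_values(values, fraction, seed=0):
    """Hide roughly `fraction` of the observed values; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    n_mask = int(round(fraction * len(observed)))
    masked_idx = sorted(rng.sample(observed, n_mask))
    masked = list(values)
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx

def mean_impute(values):
    """Stand-in imputer (NOT random forest): fill gaps with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def nrmse(truth, imputed, masked_idx):
    """RMSE over the hidden positions, divided by the standard deviation of the hidden true values."""
    if not masked_idx:
        return float("nan")
    true_vals = [truth[i] for i in masked_idx]
    rmse = math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in masked_idx) / len(masked_idx))
    mu = sum(true_vals) / len(true_vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in true_vals) / len(true_vals))
    return rmse / sd if sd > 0 else float("nan")

# Hypothetical normalised abundances for one protein across ten samples.
abundances = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.5, 12.5, 11.5, 13.5]
for fraction in (0.30, 0.35, 0.40, 0.45, 0.50):
    masked, idx = mask_values(abundances, fraction, seed=1)
    score = nrmse(abundances, mean_impute(masked), idx)
    print(f"masked {fraction:.0%}: NRMSE = {score:.3f}")
```

Swapping `mean_impute` for each candidate imputer and comparing scores at every masking level mirrors the comparison summarised in Tables S2, S3 and S4; a lower score indicates closer agreement between true and imputed values.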

Choosing a Missingness (%) cut-off threshold:
The missingness (%) cut-off threshold was chosen by comparing the distributions of protein normalised abundances and peptide modification ratios per workflow. Distributions were compared across increasing imputation thresholds from 30% to 50% in 5% intervals. There were no major changes to the distribution of proteins after imputation up to 50% missingness; for peptides, changes in the distribution were evident above 35%. Therefore, the missingness (%) cut-off threshold was set at 50% for proteins and at 35% for peptides; proteins and peptides above their respective cut-offs were removed from further consideration.
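As an illustration of how these cut-offs act as a filter, the sketch below keeps only features whose missing fraction across the cohort is below the threshold. The table layout (feature name mapped to a list of per-sample values, with `None` for missing), the function names and the toy data are assumptions. Treating a feature exactly at the cut-off as removed is also an assumption, since the text only specifies "<50%" and "<35%" for imputation.

```python
PROTEIN_CUTOFF = 0.50  # proteins: impute only if < 50% missing
PEPTIDE_CUTOFF = 0.35  # peptides: impute only if < 35% missing

def missing_fraction(values):
    """Fraction of samples in which the feature was not quantified (None)."""
    return sum(v is None for v in values) / len(values)

def filter_by_missingness(table, cutoff):
    """Keep features whose missing fraction is strictly below `cutoff`."""
    return {name: vals for name, vals in table.items()
            if missing_fraction(vals) < cutoff}

# Hypothetical features measured across four samples.
toy_table = {
    "feature_A": [1.2, None, 1.1, 1.3],    # 25% missing
    "feature_B": [None, None, 0.9, 1.0],   # 50% missing
    "feature_C": [None, None, None, 0.8],  # 75% missing
}

kept_as_proteins = filter_by_missingness(toy_table, PROTEIN_CUTOFF)
kept_as_peptides = filter_by_missingness(toy_table, PEPTIDE_CUTOFF)
print(sorted(kept_as_proteins), sorted(kept_as_peptides))
```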

Proteome Coverage
Coverage in these studies was defined as the proportion of unique identifiable peptide sequences among the identified proteins, compared to the entire database reference sequences.
All proteins prior to missingness cleaning, as well as non-proteotypic proteins and those with <2 unique peptides, were included. Coverage was calculated at the individual sample level and per group. Proteins were matched between workflows to compare coverage for each protein of interest.
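One possible reading of this definition is residue-level sequence coverage: the fraction of residues in a protein's reference sequence that fall within at least one identified unique peptide. The sketch below follows that reading only; the function name and the toy sequences are hypothetical, and the exact calculation used in the study may differ.

```python
def sequence_coverage(reference, peptides):
    """Fraction of reference residues covered by at least one identified peptide."""
    if not reference:
        return 0.0
    covered = [False] * len(reference)
    for pep in set(peptides):  # count each unique peptide sequence once
        if not pep:
            continue
        start = reference.find(pep)
        while start != -1:  # mark every occurrence, including overlapping ones
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = reference.find(pep, start + 1)
    return sum(covered) / len(reference)

# Toy reference sequence and peptides, invented for illustration.
toy_reference = "ABCDEFGHIJ"
toy_peptides = ["ABC", "CDE", "ABC"]
coverage = sequence_coverage(toy_reference, toy_peptides)
print(f"coverage = {coverage:.2f}")
```

Per-sample coverage follows by applying the function to the peptides identified in each sample; a per-group value could then be obtained by averaging over the samples in the group.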
Statistical analysis was conducted on the normalised coverage values using Welch's t-test for both a group comparison and a per-protein comparison (Table S5).
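For reference, Welch's t-test does not assume equal variances between the two groups. A minimal standard-library sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is given below; the function name and the example coverage values are hypothetical, and the p-value (which requires the t-distribution) is left to a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / na
    se2_b = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical normalised coverage values for two workflow subgroups.
coverage_a = [0.21, 0.25, 0.19, 0.23, 0.22]
coverage_b = [0.30, 0.28, 0.35, 0.27, 0.33]
t_stat, dof = welch_t(coverage_a, coverage_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```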

Figure: Procopio and Buckley vs S-Trap Samples PTM Ratios (peptide label: LGLGHNQIR2_)
Fig. S2 - (A) Missingness reported for the Procopio and Buckley dataset of Workflow One. (B) Missingness reported for the S-Trap dataset of Workflow One.
Fig. S5 - Heatmap with Euclidean hierarchical clustering between workflow one subgroups 'Procopio and Buckley' and 'S-Trap' for modified peptides. Scale is in modification ratio percentage (%).