Automated Annotation of Sphingolipids Including Accurate Identification of Hydroxylation Sites Using MSn Data

Sphingolipids constitute a heterogeneous lipid category that is involved in many key cellular functions. For high-throughput analyses of sphingolipids, tandem mass spectrometry (MS/MS) is the method of choice, offering sufficient sensitivity, structural information, and quantitative precision for detecting hundreds to thousands of species simultaneously. While glycerolipids and phospholipids are predominantly non-hydroxylated, sphingolipids are typically dihydroxylated. However, species containing one or three hydroxylation sites can be detected frequently. This variability in the number of hydroxylation sites on the sphingolipid long-chain base and the fatty acyl moiety produces many more isobaric species and fragments than for other lipid categories. Due to this complexity, the automated annotation of sphingolipid species is challenging, and incorrect annotations are common. In this study, we present an extension of the Lipid Data Analyzer (LDA) “decision rule set” concept that considers the structural characteristics that are specific for this lipid category. To address the challenges inherent to automated annotation of sphingolipid structures from MS/MS data, we first developed decision rule sets using spectra from authentic standards and then tested the applicability on biological samples including murine brain and human plasma. A benchmark test based on the murine brain samples revealed a highly improved annotation quality as measured by sensitivity and reliability. The results of this benchmark test combined with the easy extensibility of the software to other (sphingo)lipid classes and the capability to detect and correctly annotate novel sphingolipid species make LDA broadly applicable to automated sphingolipid analysis, especially in high-throughput settings.

Note S-2. Algorithmic Differences between LDA and MS-DIAL.
The basis of the LDA's decision rule sets is an integrated combination of fragment rules (m/z values of fragments) and intensity rules (intensity relations between fragments). Both of these rules are utilized to verify structural annotation levels. Accordingly, the concept is organized hierarchically, i.e., there are rules to verify the lipid subclass/adduct, followed by rules for the fatty acid chain assignment (FA and/or LCB), and if possible, rules to determine the position of the chains.
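The hierarchical organization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LDA implementation or its rule file format: the `Level` class, the `deepest_verified_level` function, and the fragment names `HEAD`, `CHAIN_A`, `CHAIN_B` and `POS` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical representation: fragment name -> observed intensity.
Spectrum = Dict[str, float]


@dataclass
class Level:
    """One annotation level: mandatory fragments plus intensity relations."""
    name: str
    mandatory: List[str] = field(default_factory=list)
    intensity_rules: List[Callable[[Spectrum], bool]] = field(default_factory=list)

    def passes(self, spec: Spectrum) -> bool:
        # Fragment rules: every mandatory fragment must be detected.
        if any(spec.get(f, 0.0) <= 0.0 for f in self.mandatory):
            return False
        # Intensity rules: every relation between fragments must hold.
        return all(rule(spec) for rule in self.intensity_rules)


def deepest_verified_level(levels: List[Level], spec: Spectrum) -> Optional[str]:
    """Check levels in order and stop at the first one that fails."""
    verified = None
    for level in levels:
        if not level.passes(spec):
            break
        verified = level.name
    return verified


# Hypothetical three-level rule set: class/adduct -> chains -> chain position.
levels = [
    Level("class", mandatory=["HEAD"]),
    Level("chain", mandatory=["CHAIN_A"],
          intensity_rules=[lambda s: s.get("CHAIN_A", 0.0) > s.get("CHAIN_B", 0.0)]),
    Level("position", mandatory=["POS"]),
]

print(deepest_verified_level(levels, {"HEAD": 100.0, "CHAIN_A": 20.0, "CHAIN_B": 5.0}))
```

In this sketch, a spectrum that satisfies the class and chain levels but lacks the position fragment is reported at the chain level, so the annotation never claims more structural detail than the spectrum supports.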
In comparison, the MS-DIAL 20 hybrid scoring system consists of two consecutive steps (see Extended Data Fig. 1 of the MS-DIAL 4 publication on page 10):
• An MS/MS similarity search by a dot product and a reverse dot product to remove noisy spectra.
• A decision tree algorithm that works on distinct fragments to avoid structural overinterpretation. This decision tree algorithm has some common characteristics with the LDA approach using decision rule sets: i) decisive fragments are defined that must be present to verify the lipid subclass/adduct and derive chain information; ii) for validity, these fragments must exceed a certain intensity threshold (in relation to the base peak).
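To make the first step more concrete, the following sketch shows one common formulation of a dot product and a reverse dot product between a query and a reference spectrum. It is a generic illustration and not MS-DIAL's code: the cosine normalization, the binning to nominal m/z, and the function names are assumptions, and any intensity weighting or m/z tolerance handling is omitted.

```python
import math
from typing import Dict, Iterable

# Hypothetical representation: nominal m/z bin -> intensity.
Peaks = Dict[int, float]


def _cosine(a: Peaks, b: Peaks, bins: Iterable[int]) -> float:
    """Cosine similarity of two spectra restricted to the given m/z bins."""
    bins = list(bins)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in bins)
    norm_a = math.sqrt(sum(a.get(k, 0.0) ** 2 for k in bins))
    norm_b = math.sqrt(sum(b.get(k, 0.0) ** 2 for k in bins))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dot_product(query: Peaks, reference: Peaks) -> float:
    """Forward score: all peaks of both spectra contribute."""
    return _cosine(query, reference, set(query) | set(reference))


def reverse_dot_product(query: Peaks, reference: Peaks) -> float:
    """Reverse score: only bins present in the reference contribute,
    so additional (possibly noise) peaks in the query are ignored."""
    return _cosine(query, reference, set(reference))


# Invented intensities: the query matches the reference but has one extra peak.
reference = {264: 100.0, 282: 30.0}
query = {264: 100.0, 282: 30.0, 300: 80.0}

print(dot_product(query, reference), reverse_dot_product(query, reference))
```

With these invented numbers the reverse score stays at its maximum while the forward score drops, which illustrates how an additional intense peak in an experimental spectrum can lower a similarity score even when all reference fragments are present.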
It is not apparent whether the observed difference in performance arises in step one or in step two. There are arguments for both, which we explain in the following. The difference cannot be explained by the number of fragments used: In the pages 60-68 of the Supplementary  and m/z 644 in LDA) as subclass/adduct specific fragments. For LCB identification, both use the fragments at m/z 266 and m/z 284, and a fragment at m/z 254 that was not detected in the LDA example. Additionally, the LDA uses the fragment at m/z 302 to distinguish dihydroxylated from trihydroxylated LCBs.
Of note, the 'ideal' MS-DIAL spectrum shows little similarity with the experimental one (both shown on page 61 in Supplementary Fig. 1 of the MS-DIAL publication). Such differences in the spectra might cause removal by the similarity search used in MS-DIAL. From our experience, it is not essential to exclude noisy spectra entirely from the annotation process; it is sufficient to remove the noise within the spectra (see Supplementary Note 3 of the LDA 2 publication 10). As can be seen in the figure of Supplementary Note 3, even in noisy spectra it is possible to reliably detect the important fragments. The key to doing this is to define relations between fragments, a feature that is completely missing in the MS-DIAL decision tree concept. These intensity relations are typically quite stable between platforms, e.g., relations between the produced LCB (or SPB) fragments. Instead of intensity relations, MS-DIAL uses intensity cutoffs relative to the base peak. Comparing the intensities of the SPB (LCB) fragments and the NL_H2O fragments of the 'ideal' spectrum with the two experimental ones in the MS-DIAL and the LDA figure shows that these relations to the base peak vary strongly between MS platforms and collision energies. In the LDA decision rule concept, intensity rules can easily be adapted to match specific MS instruments, which makes the approach platform independent. This is the reason why we implemented a flexible solution that can be readily adjusted by the end user, and not a rigid database that is difficult to modify. Another important feature that is missing in the MS-DIAL concept is a set of rules to differentiate between isobaric/isomeric species. In contrast, the LDA can differentiate between, e.g., di- and trihydroxylated LCBs, and between spectra from protonated species and from protonated species that lost water.
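The difference between a cutoff relative to the base peak and a relation between two fragments can be illustrated with a small sketch. The function names, the 5% cutoff, the factor of seven, and all intensities below are invented for illustration; they are not taken from MS-DIAL or from measured spectra.

```python
from typing import Dict

# Hypothetical representation: fragment name -> observed intensity.
Spectrum = Dict[str, float]


def exceeds_base_peak_cutoff(spec: Spectrum, fragment: str, min_fraction: float) -> bool:
    """Cutoff-style check: fragment intensity relative to the base peak."""
    base_peak = max(spec.values())
    return spec.get(fragment, 0.0) >= min_fraction * base_peak


def satisfies_relation(spec: Spectrum, left: str, factor: float, right: str) -> bool:
    """Relation-style check: factor * I(left) must be greater than I(right)."""
    return factor * spec.get(left, 0.0) > spec.get(right, 0.0)


# Invented spectra of the same species on two platforms. On platform B the
# chain fragments are weaker relative to the base peak, but their ratio to
# each other is unchanged.
platform_a = {"NL_H2O": 100.0, "LCB-H2O": 10.0, "LCB-2H2O": 6.0}
platform_b = {"NL_H2O": 100.0, "LCB-H2O": 3.0, "LCB-2H2O": 1.8}

for name, spec in (("A", platform_a), ("B", platform_b)):
    print(name,
          exceeds_base_peak_cutoff(spec, "LCB-2H2O", 0.05),
          satisfies_relation(spec, "LCB-2H2O", 7.0, "LCB-H2O"))
```

In this toy case the base-peak cutoff accepts the fragment on platform A and rejects it on platform B, whereas the relation between the two chain fragments holds on both.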
In summary, we assume that the lower performance of MS-DIAL compared with LDA rests on two limitations: i) MS-DIAL uses a rigid database for the decision tree fragments and intensity thresholds; this global concept ignores any platform specific differences, and provides no means for adaptation; ii) the absence of means to algorithmically differentiate between isobaric/isomeric species, which is particularly problematic when they produce similar fragmentation patterns.
The authentic standards were prepared in two mixes to avoid overlaps. The last column indicates the assigned mix.
NL_H2O is typically the base peak for all protonated Cer species, and must be observed. For the annotation of a monohydroxylated LCB, the protonated LCB (LCB-Ion) and the protonated LCB that lost water (LCB-H2O) must be found. The LCB-H2O is quite intense: its intensity must be at least 40% of the base peak, and 1.5 times its intensity must be greater than the intensity of the LCB-Ion. To exclude false positive dihydroxylated LCB identifications, a fragment LCB-H2O minus one carbon is defined, which must be smaller than 2% of the LCB-H2O.
NL_H2O is typically the base peak for all protonated Cer species, and must be observed. Additionally, a fragment with two water losses of the precursor is observed quite often (NL_2xH2O_36). LCB fragments are rather small for protonated dihydroxylated Cer species, and will be explained in the following zoomed figure.
Dihydroxylated LCBs show the protonated LCB minus one water (LCB-H2O) and the LCB minus two water (LCB-2H2O), where the latter must be observed. For saturated species, the LCB-H2O is typically stronger than the LCB-2H2O; for unsaturated species, it is the other way round. Since the intensity relation between LCB-H2O and LCB-2H2O can change, the rule relating both is formulated rather leniently: seven times the intensity of the LCB-2H2O must be greater than the intensity of the LCB-H2O.
Furthermore, the LCB-2H2O fragment must be less intense than 60% of the NL_H2O. To remove false positive trihydroxylated identifications, the LCB-tri_WRONG fragment is defined, which is essentially the protonated LCB. This fragment must be smaller than two times the LCB-H2O, and 80% of its intensity must be smaller than the NL_2xH2O fragment. Sometimes, an additional fragment is observable that has the mass of LCB-H2O minus one carbon. This fragment must be smaller than the sum of the intensities of LCB-H2O and LCB-2H2O.
NL_H2O is typically the base peak for all protonated Cer species, and must be observed. Additionally, a fragment with two water losses of the precursor is observed quite often (NL_2xH2O_36), which is typically stronger than for dihydroxylated species. LCB fragments are rather small for protonated dihydroxylated Cer species, and will be explained in the following zoomed figure.
Spectra of precursors from protonated monohydroxylated Cer species that lost one water are primarily dominated by a fragment from the protonated LCB that lost one water (LCB-H2O), which is mandatory. For bigger molecules, neutral losses of the LCB might be observed. Since none of these fragments are reliably present, they are not used in any decision. The intensity rules are the same as for the protonated spectra (see monohydroxylated LCB [M+H-H2O]+), with the addition that LCB-H2O must be greater than 25% of the base peak, and the NL_H2O must be smaller than 5% of LCB-H2O.
The class specific fragments are neutral losses of formaldehyde, formaldehyde and water, and methanol; an unspecific loss of water may also be observed. For class verification, any of the specific fragments must be found. The sum of the intensities of those specific fragments is used in various intensity rules. This sum must exceed 90% of the precursor. In order to exclude false positive formate adducts, the neutral loss of formic acid must be smaller than 30% of this sum.
This sum must exceed 50%, 15% and 5% of the base peak for species containing two, three and four hydroxylation sites, respectively. For chain identification, the deprotonated LCB that loses ammonia and water (SPH_fragment) and the FA chain as deprotonated carboxylate (Carboxy) and as ketene (Ketene) can be found. The former two are mandatory. There are two intensity rules, i.e., the Carboxy fragment times 2.5 must be greater than the Ketene fragment, and the Carboxy fragment times 3.5 must be greater than the SPH_fragment. Additionally, a Carboxy_iso fragment is defined to exclude false positive chains. Since this mass would be the same as the plus one isotope of the Carboxy fragment, we called this indicator for false positives 'Carboxy_iso'. The Carboxy fragment times 1.2 must be greater than the Carboxy_iso fragment.
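The negative ion mode rules in this caption translate directly into a few comparisons. The sketch below is a hedged illustration only: the thresholds and fragment names come from the text, whereas the function names, the dictionary representation of a spectrum, and the example intensities are assumptions and do not reflect how the LDA stores or evaluates its rules.

```python
from typing import Dict

# Hypothetical representation: fragment name -> observed intensity.
Spectrum = Dict[str, float]

# Minimum fraction of the base peak that the summed class specific
# fragments must exceed, keyed by the number of hydroxylation sites.
CLASS_SUM_THRESHOLDS = {2: 0.50, 3: 0.15, 4: 0.05}


def class_sum_ok(specific_sum: float, base_peak: float, hydroxylation_sites: int) -> bool:
    """Sum of class specific fragments versus the base peak."""
    return specific_sum > CLASS_SUM_THRESHOLDS[hydroxylation_sites] * base_peak


def chain_ok(spec: Spectrum) -> bool:
    """Chain rules: Carboxy and SPH_fragment are mandatory, Ketene is optional."""
    carboxy = spec.get("Carboxy", 0.0)
    sph = spec.get("SPH_fragment", 0.0)
    if carboxy <= 0.0 or sph <= 0.0:
        return False
    return (
        2.5 * carboxy > spec.get("Ketene", 0.0)
        and 3.5 * carboxy > sph
        # Carboxy_iso marks a likely false positive chain if it is too intense.
        and 1.2 * carboxy > spec.get("Carboxy_iso", 0.0)
    )


# Invented intensities for illustration.
good = {"Carboxy": 40.0, "SPH_fragment": 60.0, "Ketene": 20.0, "Carboxy_iso": 5.0}
suspicious = dict(good, Carboxy_iso=60.0)

print(chain_ok(good), chain_ok(suspicious))
print(class_sum_ok(20.0, 100.0, 3), class_sum_ok(20.0, 100.0, 2))
```

Writing each rule as a scaled comparison between two fragments means that only the numeric factors would need adjusting for a different instrument.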

The fragmentation rules for class specific fragments are described in the previous figure (see Cer dihydroxylated LCB [M-H]-). The detectable chain fragments are essentially the same as for dihydroxylated species, except for the SPH_fragment, which is now the deprotonated LCB that loses CH7NO (SPH_fragment_3). For this fragment, the intensity rule is that the Carboxy fragment times six must be greater than the SPH_fragment_3.
Class specific fragments are the neutral loss of phosphate (NL_phosph) and water (NL_H2O). NL_phosph must be greater than 2% of the NL_H2O. The sum of both intensities must be greater than 40% of the base peak, and greater than a potentially overlapping neutral loss (141) of a PE head group. Furthermore, the NL_phosph must be greater than 2% of the base peak. The fragment rules are not very specific; thus, a rule was added accepting protonated adducts only if a protonated adduct that lost water or a sodiated adduct is detectable at the same retention time (an MS1 peak is sufficient). For chain annotation, only a single fragment is detectable, i.e., the protonated LCB that lost water (LCB-H2O). This fragment must be smaller than 10% of the base peak.
The class specific fragment is the neutral loss of phosphate (NL_phosph). The intensity of this fragment must be greater than 30% of the base peak, and the intensity times two must be greater than a potentially overlapping neutral loss (141) of a PE head group. These rules are not very specific; thus, a rule was added accepting protonated adducts only if a protonated adduct that lost water or a sodiated adduct is detectable at the same retention time (an MS1 peak is sufficient). For chain annotation, only a single fragment is detectable, i.e., the protonated LCB that lost two water (LCB-2H2O). This fragment must be greater than 2% of the base peak and the NL_phosph.
Class specific fragments are the neutral loss of phosphate (NL_phosph) and water (NL_H2O). The former one is mandatory.
50% of the intensity of NL_phosph must be greater than NL_H2O, and the intensity of NL_phosph must be greater than 30% of the base peak. Chain annotation is not possible for sodiated adducts.
Protonated dihydroxylated HexCer show a neutral loss of water (NL_H2O), which is mandatory. Furthermore, three additional fragments might be observed, i.e., a neutral loss of the hexosyl group (NL_Hex), a neutral loss of the hexosyl group plus water (NL_Hex_H2O), and the neutral loss of C6H6O1 (NL_Prop).
For accepting a spectrum as HexCer, one of the three specific fragments has to be found in addition to the unspecific NL_H2O fragment. The intensities of the three fragments vary, and they are hardly ever observed in the same spectrum. For chain annotation, the protonated LCB minus two water molecules (LCB-2H2O) must be observed.
Class specific fragmentation and intensity rules are the same as for protonated dihydroxylated HexCer (see HexCer dihydroxylated LCB [M+H]+). The observed chain fragment is the same too (LCB-2H2O); however, its intensity is much lower. To avoid misidentification of a dihydroxylated LCB, two intensity rules have been added, i.e., LCB-2H2O must be smaller than 5% of the base peak, and LCB-2H2O must be smaller than 3% of the sum of the intensities from NL_Hex and NL_Hex_H2O.
The major class specific fragment is the neutral loss of the hexosyl group plus water (NL_Hex_H2O), which is mandatory. This fragment is often accompanied by a neutral loss of the hexosyl group alone (NL_Hex). Furthermore, the NL_Hex_H2O may further lose formaldehyde (NL_Hex_formaldehyde_30). The sum of the intensities from NL_Hex_H2O and NL_Hex is used in intensity rules, i.e., this sum must exceed 10% of the base peak, and it must be greater than 5% of a neutral loss of water fragment to distinguish it from other isobaric/isomeric classes.
The mandatory class specific fragments are the neutral loss of the hexosyl group (NL_Hex) and the neutral loss of the hexosyl group plus water (NL_Hex_H2O). A neutral loss of water from the precursor might be observed (NL_H2O). The sum of intensities from NL_Hex and NL_Hex_H2O is used in intensity rules. To avoid false positive identifications, this sum must be greater than a neutral loss of an ethanolamine (NL_Ethanolamine_43_WRONG), and it must be greater than 40% of the base peak.
The mandatory fragment of sodiated LSM is the neutral loss of trimethylamine (NL_trimethylamine_59).
In addition, the neutral loss of the phosphocholine head group (NL_PChead_183) can be observed. All intensity rules relate to NL_trimethylamine_59, whose intensity must be greater than 30% of the base peak, 50% of NL_PChead_183, and 4 times a potential precursor fragment.
As for LSM, MS2 spectra of protonated SM species show two prominent fragments, i.e., the phosphocholine head group (PChead_184) and the neutral loss of water (NL_H2O). Only the unspecific NL_H2O fragment is set mandatory, since the PChead_184 might fall out of the scan range for heavier molecules due to the dynamic scan range adjustment of ion trap based CID. If present, the intensity of PChead_184 must be greater than 5% of the base peak and 10% of the precursor fragment. To provide the same specificity, the same intensity rules are repeated using the sum of PChead_184 and NL_H2O instead of PChead_184 alone. Chain annotation is not possible for protonated adducts.
As for chlorinated LSM adducts, MS2 spectra of the chlorinated adducts of SM show only a single neutral loss, i.e., CH3Cl (NL_PChead_50; mandatory). The intensity of NL_PChead_50 must be greater than 20% of the base peak, and two times the intensity of NL_PChead_50 must be greater than the neutral loss of 60 to remove isobaric species from formate adducts. Chain annotation is not possible for MS2 spectra of chlorinated adducts.
Protonated SphBase adducts show a fragmentation-rich spectrum. However, the detectable fragments change from species to species, and the intensity relations vary heavily. The only fragment that is always present is the unspecific neutral loss of water (NL_H2O; mandatory). Other frequently detectable fragments are at 71 Da (Frag71), 83 Da (Frag83), 95 Da (Frag95), 97 Da (Frag97) and 109 Da (Frag109), a neutral loss of ammonia (NL_NH3) and a neutral loss of ammonia and water (NL_NH3+H2O).
For an identification by LDA, one of those fragments has to be detected in addition to the NL_H2O, and 80% of the intensity of the NL_H2O fragment must be greater than the intensity of the NL_NH3 fragment. For trihydroxylated species, a fragment of high abundance with two water losses is detectable (NL_2xH2O_36; mandatory). Since the specificity of this rule set is low, a rule was added accepting protonated SphBase species only if a protonated adduct with a loss of water or a sodiated adduct is detectable at the same retention time (an MS1 peak is sufficient).
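The protonated SphBase rules combine a mandatory fragment, an "any of" condition, one intensity relation, and supporting MS1 evidence. The following sketch shows one way to express this combination; the function name, the boolean `ms1_support` argument, and the example intensities are assumptions. Treating the trihydroxylated requirement as an addition to the other rules is also an interpretation of the text, not a documented detail.

```python
from typing import Dict

# Hypothetical representation: fragment name -> observed intensity.
Spectrum = Dict[str, float]

# Fragments of which at least one must accompany the mandatory NL_H2O.
ADDITIONAL_FRAGMENTS = (
    "Frag71", "Frag83", "Frag95", "Frag97", "Frag109", "NL_NH3", "NL_NH3+H2O",
)


def sphbase_protonated_ok(spec: Spectrum, ms1_support: bool,
                          trihydroxylated: bool = False) -> bool:
    """Check a protonated SphBase spectrum against the rules described above.

    ms1_support: True if an [M+H-H2O]+ or sodiated adduct was detected at the
    same retention time (assumed to be determined elsewhere).
    """
    nl_h2o = spec.get("NL_H2O", 0.0)
    if nl_h2o <= 0.0:  # mandatory fragment missing
        return False
    if not any(spec.get(f, 0.0) > 0.0 for f in ADDITIONAL_FRAGMENTS):
        return False
    if not 0.8 * nl_h2o > spec.get("NL_NH3", 0.0):
        return False
    # Assumption: for trihydroxylated species NL_2xH2O_36 is required on top.
    if trihydroxylated and spec.get("NL_2xH2O_36", 0.0) <= 0.0:
        return False
    return ms1_support


# Invented intensities for illustration.
spectrum = {"NL_H2O": 100.0, "Frag83": 12.0, "NL_NH3": 30.0}

print(sphbase_protonated_ok(spectrum, ms1_support=True))
print(sphbase_protonated_ok(spectrum, ms1_support=False))
```

The MS1 condition is passed in as a flag because it depends on other adduct peaks at the same retention time rather than on the MS2 spectrum itself.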

Figure S-2:
Spectral evidence for novel lipid species and molecular species identified in murine brain and human plasma.
These figures highlight the annotated experimental spectra for the novel lipid molecular species identified. Evidence relies on spectra at full mass range and is supported by zoomed-in spectra where necessary. The spectra are low resolution CID spectra acquired on an Orbitrap Velos Pro. Where available, spectra in positive and negative ion mode are shown. In case the novel species is a lipid molecular species, and the other ion mode verifies the lipid species only, just the spectra of the ion mode proving the existence of the molecular species are shown.