TAILOR-MS, a Python Package that Deciphers Complex Triacylglycerol Fatty Acyl Structures: Applications for Bovine Milk and Infant Formulas

Liquid chromatography tandem mass spectrometry (LC/MS) and other mass spectrometric technologies have been widely applied for triacylglycerol profiling. One challenge for targeted identification of the fatty acyl moieties that constitute triacylglycerol species in biological samples is the large number of combinations of three fatty acyl groups that can form a triacylglycerol molecule. Manual determination of triacylglycerol structures based on peak intensities and retention times can be highly inefficient and error-prone. To resolve this, we have developed TAILOR-MS, a Python package that assists with (1) generating targeted LC/MS methods for triacylglycerol detection and (2) automating triacylglycerol structural determination and prediction. To assess the performance of TAILOR-MS, we conducted LC/MS triacylglycerol profiling of bovine milk and two infant formulas. Our results confirmed dissimilarities between the triacylglycerol compositions of bovine milk and infant formulas. Furthermore, we identified 247 triacylglycerol species and predicted the possible existence of another 317 in the bovine milk sample, representing one of the most comprehensive reports on the triacylglycerol composition of bovine milk thus far. Likewise, we present here a complete infant formula triacylglycerol profile and report >200 triacylglycerol species. TAILOR-MS shortened the time required for triacylglycerol structural identification from hours to seconds and produced reasonable structural predictions in the absence of some triacylglycerol constituent peaks. Taken together, TAILOR-MS is a valuable tool that can save time and improve accuracy for targeted LC/MS triacylglycerol profiling.


Lipid extraction from bovine milk and infant formulas.
A single-phase lipid extraction method was used to extract TG from bovine milk and two infant formulas of interest, similar to the method reported previously (1). Briefly, 20 µL of bovine milk (containing 3.8% total lipid) or infant formula (prepared to contain an amount of total lipid equivalent to that of bovine milk) was taken and mixed with 400 µL of CHCl3:MeOH (2:1) using a positive displacement pipette. The samples were subsequently shaken for 10 min and sonicated in a water bath for 30 min at room temperature. Afterwards, the samples were centrifuged at 16,000 × g for 10 min. Supernatants were transferred to a 96-well plate and then blown dry with N2 gas. The sample plate was stored at -20°C, and the samples were reconstituted prior to LC/MS analysis.
Reconstitution was achieved by adding 50 µL butanol and 50 µL methanol to the plate wells containing the samples. The plate was then centrifuged at 2,250 × g for 5 min. The reconstituted samples were transferred to glass vials with inserts for analysis. The experiments were performed in triplicate; 75% of the triplicate peak areas had a coefficient of variation (C.V.) below 17.1% (Supporting Table 4).
Triplicate log2 peak areas are presented in Figure 3. All other results presented in this study are based on peak areas from one of the three replicates.
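The replicate statistics described above can be illustrated with a minimal Python sketch. It is not taken from TAILOR-MS or from the authors' processing scripts. The peak areas are hypothetical values chosen for illustration, the function name percent_cv is our own, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics

def percent_cv(areas):
    """Coefficient of variation (%) using the sample standard deviation (assumed)."""
    return 100.0 * statistics.stdev(areas) / statistics.mean(areas)

# Hypothetical triplicate peak areas (arbitrary units); not data from the study.
triplicates = {
    "TG(16:0_18:1_18:2)": [1000.0, 1100.0, 900.0],
    "TG(4:0_16:0_18:1)": [2000.0, 2050.0, 1950.0],
    "TG(16:0_16:0_20:3)": [40.0, 70.0, 55.0],
}

# Per-species C.V. across the three replicates.
cv_by_species = {name: percent_cv(areas) for name, areas in triplicates.items()}

# log2-transformed peak areas, the scale used when plotting replicates.
log2_areas = {
    name: [math.log2(a) for a in areas] for name, areas in triplicates.items()
}

# Fraction of species whose C.V. falls below a chosen threshold.
threshold = 17.1
fraction_below = sum(cv < threshold for cv in cv_by_species.values()) / len(cv_by_species)

for name, cv in cv_by_species.items():
    print(f"{name}: C.V. = {cv:.1f}%")
print(f"Fraction below {threshold}%: {fraction_below:.2f}")
```

Low-abundance peaks such as the third hypothetical entry tend to show higher C.V. values, which is one reason to report the fraction of peaks under a threshold rather than a single average.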
LC/MS data acquisition and processing.
A Shimadzu Nexera X2 UHPLC system in tandem with a Shimadzu 8050 triple quadrupole mass spectrometer equipped with an ESI module was used for targeted TG data acquisition (Shimadzu Australia, Sydney, NSW). Briefly, 5 µL of each sample was injected onto a Waters Symmetry C18 column (100 Å, 3.5 µm, spherical silica, 4.6 × 75 mm) for chromatographic separation, with a column oven temperature of 40°C. A gradient was established over a 40 min run time with the following binary solvent system: solvent A, H2O:acetonitrile = 4:6 (v/v); solvent B, acetonitrile:isopropanol = 1:9 (v/v), both containing 10 mM ammonium acetate, similar to an earlier report (2). The flow rate was set at 0.4 mL/min.

Contents in the brackets after the TG species represent the neutral loss peaks used to identify this TG species. When more than one TG species share an ID peak, all the possible TG species are listed. See the TAILOR-MS method section and the manual for TAILOR-MS usage for detailed explanations. The dataset used is a sub-dataset (containing only the more abundant brutto-level TG) of the input bovine milk TG dataset used for TAILOR-MS automated TG structure identification. A total of 107 TG species were identified manually.

Supporting Table 3. Impacts of altered input settings on TAILOR-MS identifier performance and numbers of generated TG species. @: The Total, "I" and "P" columns indicate the numbers of TG species generated by the TAILOR-MS identifier. "I (identified)" means that all 3 FAs constituting the structure of a TG species are based on FA neutral losses in the acquisition method, whereas "P (predicted)" means that the third FA is predicted by TAILOR-MS based on the given FA list. Total is the sum of the "I" and "P" counts.
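The distinction between "I" and "P" species can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the TAILOR-MS implementation: the function names, the input format, and the rule "at most one fatty acyl position lacks an observed neutral loss" are our own simplifications, and the example inputs are hypothetical. Positional (sn) isomers are not distinguished.

```python
from itertools import combinations_with_replacement

def parse_fa(fa):
    """Split an FA label such as '18:1' into (carbons, double_bonds)."""
    carbons, double_bonds = fa.split(":")
    return int(carbons), int(double_bonds)

def classify_candidates(brutto, observed_losses, fa_list):
    """Enumerate FA triples matching a brutto TG and label them 'I' or 'P'.

    brutto: (total_carbons, total_double_bonds), e.g. (52, 3)
    observed_losses: FAs whose neutral losses were measured (assumed input)
    fa_list: FAs allowed when predicting an unmeasured third FA (assumed input)
    """
    observed = set(observed_losses)
    pool = sorted(set(fa_list) | observed, key=parse_fa)
    results = {"I": [], "P": []}
    for combo in combinations_with_replacement(pool, 3):
        totals = tuple(sum(x) for x in zip(*(parse_fa(fa) for fa in combo)))
        if totals != tuple(brutto):
            continue
        missing = [fa for fa in combo if fa not in observed]
        name = "TG(" + "_".join(combo) + ")"
        if not missing:
            results["I"].append(name)   # all three FAs backed by neutral losses
        elif len(missing) == 1:
            results["P"].append(name)   # one FA inferred from the FA list
    return results

# Hypothetical example for TG(52:3).
result = classify_candidates(
    brutto=(52, 3),
    observed_losses=["16:0", "18:1", "18:2"],
    fa_list=["16:0", "18:0", "18:1", "18:2", "20:3"],
)
print(result)
```

In this sketch the Total for a brutto-level TG is simply len(result["I"]) + len(result["P"]). Shortening or lengthening fa_list changes only the number of "P" candidates, which is the kind of input-setting effect the table summarizes.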

Bovine Milk
The FA groups used here were selected by abundance, with the least abundant FAs removed first; i.e., the FA list of 5 contains the 5 most abundant FA groups, as determined from the profiles of digested bovine milk and infant formulas.
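As a minimal illustration of how such abundance-ranked FA lists could be built, the sketch below keeps the n most abundant FAs from a profile. The function name top_n_fas and the abundance values are hypothetical and are not measurements from this study.

```python
def top_n_fas(abundance, n):
    """Return the n most abundant FA labels, highest abundance first."""
    ranked = sorted(abundance, key=lambda fa: abundance[fa], reverse=True)
    return ranked[:n]

# Hypothetical relative abundances (%); placeholders for illustration only.
fa_abundance = {
    "16:0": 30.0,
    "18:1": 25.0,
    "14:0": 11.0,
    "18:0": 10.0,
    "4:0": 4.0,
    "18:2": 3.0,
    "20:3": 0.2,
}

fa_list_5 = top_n_fas(fa_abundance, 5)
print(fa_list_5)
```

Shorter lists drop the low-abundance FAs first, so each shorter list is a subset of the longer one.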

Supporting Figure 1. Examples of TAILOR-MS assisted TG structure identification that may require the user's attention. Several infant formula TG ion peak examples are shown here to demonstrate key data interpretation and processing points. As panel A shows, TAILOR-MS may predict a structure that is unlikely to exist or is present at very low abundance, especially when the prediction is based solely on one known fatty acyl moiety. While TG(16:0_16:0_20:3) (AA#) is a predicted structure, it is more likely that most of the fatty acyl 16:0 neutral loss was contributed by the fragmentation of TG(16:0_18:1_18:2), as the 18:1 and 18:2 neutral losses both have abundances similarly high to that of 16:0. Also, the abundance of 20:3 in TG is expected to be low. Measuring the neutral loss of 20:3 from TG(52:3) would be necessary to confirm the existence of TG(16:0_16:0_20:3). If the two peaks are separated enough to be distinguished (e.g., peaks a and b in panel B), the expected peak elution time spans for both peaks are smaller than those of similar standalone peaks, as parts of the two peaks merge. A smaller retention time tolerance may be set to avoid missing TG structures containing these fatty acyl ion peaks when performing TG structure determination using TAILOR-MS. Alternatively, a partial separation can occur with only one discernible apex (e.g., peak a in panel C). In this case, the user must decide whether to treat this as one or two peaks and integrate the peak(s) accordingly. If treated as one peak, this peak will have a larger retention time span.
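To make the role of a retention time tolerance concrete, the following Python sketch shows how the choice of tolerance changes which neutral loss peaks are treated as co-eluting with a reference peak. It is a generic illustration, not the matching logic of TAILOR-MS; the function name, the peak list, and the retention times are hypothetical.

```python
def losses_within_tolerance(peaks, reference_rt, tolerance):
    """Return FA neutral losses whose apex lies within +/- tolerance (min)."""
    return [fa for fa, rt in peaks if abs(rt - reference_rt) <= tolerance]

# Hypothetical neutral loss peak apexes (min) for one brutto-level TG.
peaks = [("16:0", 20.10), ("18:1", 20.15), ("18:2", 20.40)]
reference_rt = 20.12

narrow = losses_within_tolerance(peaks, reference_rt, tolerance=0.1)
wide = losses_within_tolerance(peaks, reference_rt, tolerance=0.3)

print("tolerance 0.1 min:", narrow)
print("tolerance 0.3 min:", wide)
```

With partially merged peaks, the apex positions and integration boundaries chosen by the user feed directly into a comparison of this kind, so the tolerance should be set with the observed peak shapes in mind.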