Automated Feature Mining for Two-Dimensional Liquid Chromatography Applied to Polymers Enabled by Mass Remainder Analysis

A fast algorithm was developed for automated feature mining of synthetic (industrial) homopolymers or perfectly alternating copolymers. Comprehensive two-dimensional liquid chromatography–mass spectrometry (LC × LC–MS) data are processed in four distinct parts. In the first part, the data are reduced by selecting regions of interest (ROIs). All ROIs are then clustered in the time and mass-to-charge domains to obtain isotopic distributions, after which single-value clusters and background signals are removed from the data structure. In the second part, the isotopic distributions are used to determine the charge state of the polymeric units, and the charge-state reduced masses of the units are calculated. In the third part, the mass of the repeating unit (i.e., the monomer) is selected automatically by comparing all mass differences within the data structure. Using the mass of the repeating unit, mass remainder analysis is performed on the data, which yields groups that share the same end-group composition. In the fourth part, information from the clustering step in the first part is combined with the mass remainder analysis to create compositional series, which are mapped onto the chromatogram. Series with similar chromatographic behavior are separated in the mass-remainder domain, whereas series with an overlapping mass remainder are separated in the chromatographic domain. These series were extracted within a calculation time of 3 min. The false positives were then assessed within a reasonable time. The algorithm was verified with LC × LC–MS data of an industrial hexahydrophthalic anhydride-derivatized propylene glycol-terephthalic acid copolyester. Afterward, a chemical structure was proposed for each compositional series found within the data.
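
The mass remainder step described above can be illustrated with a minimal sketch. It assumes that the mass remainder is the charge-state reduced mass modulo the repeating-unit mass; the function names, the tolerance, and all numerical values are hypothetical and chosen only for illustration, not taken from the algorithm itself.

```python
def mass_remainder(mass, repeat_mass):
    """Remainder of a mass after removing whole repeating units (assumed definition)."""
    return mass % repeat_mass


def group_by_remainder(masses, repeat_mass, tol=0.01):
    """Group masses whose remainders agree within tol (Da).

    Simplification: remainders that wrap around the 0 / repeat_mass
    boundary are not merged in this sketch.
    """
    groups = []  # each entry: (representative remainder, list of member masses)
    for m in sorted(masses):
        mr = mass_remainder(m, repeat_mass)
        for rep, members in groups:
            if abs(mr - rep) <= tol:
                members.append(m)
                break
        else:
            groups.append((mr, [m]))
    return groups


# Hypothetical data: two series sharing one repeating unit but with
# different end-group masses (all values are illustrative only).
REPEAT = 100.05
series_a = [18.01 + n * REPEAT for n in range(3, 6)]
series_b = [60.02 + n * REPEAT for n in range(3, 6)]
groups = group_by_remainder(series_a + series_b, REPEAT)

for remainder, members in groups:
    print(f"MR = {remainder:.2f} Da, {len(members)} members")
```

Each resulting group collects oligomers that differ only in the number of repeating units, which is why members of one group are taken to share an end-group composition.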


S-1 Visual representation of the feature mining flowchart
This section gives a visual representation of the feature mining algorithm and its user-defined parameters.

Figure S1. Flowchart for the feature mining algorithm. Purple values are user-defined parameters, summarized in Table S1 below.

Table S1. User-defined parameters used in the algorithm as shown in Fig. S1.

Symbol      Parameter                                    Value
I_Min,dp    ROI analysis: minimum mass peak intensity    100 counts

S-2 Background removal
This section provides a visual example of an ROI that the algorithm deemed background. In this example, 12% of the datapoints within the ROI were higher than the 20% threshold, so the ROI was discarded as background.

Figure S2. Example of a background ROI. 12% of the intensities of this ROI are higher than the 20% threshold (red line).
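
A rough sketch of such a background test is given below. Two points are assumptions, because this section does not state them: that the 20% threshold is taken relative to the maximum intensity of the ROI, and that an ROI is flagged when the fraction of datapoints above the threshold exceeds a cutoff (`max_fraction`, a hypothetical parameter whose value must be supplied by the user). The function names are hypothetical as well.

```python
def fraction_above_threshold(intensities, rel_threshold=0.20):
    """Fraction of datapoints above rel_threshold * max intensity.

    Assumption: the threshold is relative to the ROI maximum.
    """
    cutoff = rel_threshold * max(intensities)
    above = sum(1 for value in intensities if value > cutoff)
    return above / len(intensities)


def is_background(intensities, max_fraction, rel_threshold=0.20):
    """Flag an ROI as background when too many points exceed the threshold.

    max_fraction is a hypothetical cutoff; the source does not give its value.
    """
    return fraction_above_threshold(intensities, rel_threshold) > max_fraction


# Synthetic ROI with 100 datapoints, 12 of which lie above 20% of the maximum.
roi = [100.0] + [30.0] * 11 + [10.0] * 88
fraction = fraction_above_threshold(roi)
print(f"{fraction:.0%} of datapoints above threshold")
```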

S-3 Visual representation of charge-state reduction within the data
This section gives a visual representation of the charge-state reduction within the algorithm. Differences in m/z within the isotopic distributions (Fig. S5) can be used to estimate the charge of the measured unit. Multiplying the m/z by the charge then gives the mass.
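
The charge estimate and the mass calculation can be sketched as follows. The sketch assumes that neighboring isotope peaks are spaced by about 1.0033 Da divided by the charge, and that every charge is carried by one sodium adduct (M Na = 22.9898 Da, the value quoted in S-5) which is subtracted after multiplying by the charge. The function names and example peaks are hypothetical.

```python
ISOTOPE_SPACING = 1.0033  # Da, spacing between isotope peaks at charge 1
SODIUM_MASS = 22.9898     # Da, mass of the ionization unit quoted in S-5


def estimate_charge(mz_peaks):
    """Estimate the charge from the mean m/z spacing of an isotopic distribution."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("at least two isotope peaks are needed")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return max(1, round(ISOTOPE_SPACING / mean_spacing))


def charge_reduced_mass(mz, charge, adduct_mass=SODIUM_MASS):
    """Mass after multiplying m/z by the charge and removing the adducts.

    Assumption: one adduct of mass adduct_mass per charge.
    """
    return charge * mz - charge * adduct_mass


# Hypothetical isotopic distribution with a spacing of about 0.5 in m/z.
peaks = [500.0, 500.5017, 501.0033]
z = estimate_charge(peaks)
reduced = charge_reduced_mass(peaks[0], z)
```

Averaging the spacings before rounding is a design choice made here to reduce the influence of a single poorly measured peak; with few peaks or low resolution the estimate can still be wrong.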

S-4 Grouping of information
This section illustrates the grouping of information for series 2 and 6. In Figure S7A, the grouping follows the mass remainder information, whereas in Figure S7B it follows the mass domain. Combining this information leads to the grouping and formation of series 2 and 6, as shown in Figure S7C. Figure S7D shows the location of each identified group. The position of each final compositional series is roughly indicated.

S-5 Information about each found compositional group
This section gives information about each compositional group found within the data. Table S2 lists the mass remainders and proposed structures. A contour plot showing each selected group and the mass remainder plot are shown in Figures S8 and S9. Figure S10 shows the assisted molecular formula determination within the MOREDISTRIBUTIONS user interface. Figures S11 to S36 show each group and whether it was selected; for a group that was not selected, the reason it was discarded is given. Red dots on the chromatogram of the compositional series indicate that the group was selected. If the dots remain white, the series was not selected. The red bars on the cumulative mass remainder plot show the newest group of mass remainders; all older mass remainders remain visible as black bars. The native mass spectrum of the group is shown together with the charge-state reduced mass spectrum. Note that the mass of the ionization unit (M Na = 22.9898 Da) has also been removed from the charge-state reduced spectrum.

Most wrongly assessed groups are series in which the charge state was not successfully reduced and which should have been part of other series. The mass differences for each compositional series should be a list of integers (or values close to 1.0033 Da). When the mass differences plot did not show good agreement between the found differences (i.e., no increments close to 1), the charge-state deconvolution was deemed wrong. For example, if the mass differences plot showed differences close to 0.5, 1.0, 1.5 and 2.0, a charge of 2 should have been found (as the mass differences increment by ½), the distribution should have had its charge state reduced to 1, and the found masses should therefore have been doubled. Wrong charge-state reduction most likely results from small deviations in the measured m/z values caused by the resolution of the mass spectrometer. Because of these small deviations, the m/z difference may not be close enough to be sure about the charge of that particular group. Furthermore, three instances of wrongly assessed groups can be explained by wrong clustering: the found mass remainder and location on the chromatogram are almost identical to those of previously determined compositional series, and these groups were therefore not selected.

Table S2. MRs and proposed structures of the 10 compositional series and 2 series that underwent sodium exchange within the mass spectrometer (series 10 and 12) within the copolyester sample. For each series, a number, the found MR, a proposed end-group composition, a proposed structure, the mass difference with related series, the exact mass of the proposed end-groups (with the non-aliased exact mass in parentheses), the mass error, the number average molecular weight, the molecular weight dispersity and the relative abundance are given.

Figure S8.

Figure S36. Compositional series number 12. Classified with end-groups HHPA-HHPA, but underwent sodium exchange at one of the free carboxylic acid groups of the cyclohexane dicarboxylic acid (reaction product of the derivatization with HHPA).
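
The charge-state check described in this section (mass differences that increment by ½ point to a missed charge of 2, so the masses should be doubled) can be sketched as follows. The function name, the tolerance and the maximum factor are hypothetical; the sketch simply looks for the smallest integer factor that brings all mass differences close to whole numbers.

```python
def suggest_charge_correction(mass_differences, max_factor=4, tol=0.05):
    """Smallest integer factor k for which every k * difference is near an integer.

    k == 1 means the charge-state reduction looks consistent; k == 2 means
    the found masses should be doubled, and so on. Returns None when no
    factor up to max_factor fits. tol and max_factor are illustrative choices.
    """
    for k in range(1, max_factor + 1):
        if all(abs(k * d - round(k * d)) <= tol for d in mass_differences):
            return k
    return None


# Differences as in the example from the text: increments of one half.
half_steps = [0.5, 1.0, 1.5, 2.0]
# Differences of a correctly reduced series: increments close to 1.0033 Da.
unit_steps = [1.0033, 2.0066, 3.0099]

factor_half = suggest_charge_correction(half_steps)
factor_unit = suggest_charge_correction(unit_steps)
```

A fixed absolute tolerance is used only to keep the sketch short; because the isotope spacing is 1.0033 Da rather than exactly 1, larger differences drift away from whole numbers, so a real check would have to account for that.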