Genetic Programming: A Novel Method for the Quantitative Analysis of Pyrolysis Mass Spectral Data
Abstract
A technique for the analysis of multivariate data by genetic programming (GP) is described, with particular reference to the quantitative analysis of orange juice adulteration data collected by pyrolysis mass spectrometry (PyMS). The dimensionality of the input space was reduced by ranking variables according to their product-moment correlation or mutual information with the outputs. The GP technique as described gives predictive errors equivalent to, if not better than, those of more widely used methods such as partial least squares and artificial neural networks, and it can additionally provide a means of easing the interpretation of the correlation between input and output variables. The described application demonstrates that, by using the GP method to analyze PyMS data, the adulteration of orange juice with 10% sucrose solution can be quantified reliably over a 0–20% range with an RMS error in the estimate of ∼1%.
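The variable-ranking step mentioned in the abstract can be illustrated with a minimal sketch in pure Python. It is not the authors' code, and the GP regression itself is not shown. The equal-width binning used to estimate mutual information, the default of 5 bins, the helper names (`pearson_r`, `mutual_information`, `rank_variables`, `abs_pearson`, `rms_error`) and the toy data are all assumptions for illustration. Columns of `X` stand in for PyMS m/z intensities and `y` for the percentage of adulteration.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Score = Callable[[Sequence[float], Sequence[float]], float]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear information
    return sxy / math.sqrt(sxx * syy)


def _equal_width_bins(values: Sequence[float], n_bins: int) -> list[int]:
    """Map each value to one of n_bins equal-width bins (assumed discretisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], n_bins: int = 5) -> float:
    """Plug-in histogram estimate of I(X;Y) in bits after equal-width binning (assumption)."""
    bx, by = _equal_width_bins(x, n_bins), _equal_width_bins(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # I = sum p(i,j) log2(p(i,j) / (p(i) p(j))), where p(i,j)/(p(i)p(j)) = c*n/(px*py)
    return sum((c / n) * math.log2(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """|r|, so strongly anti-correlated variables are retained as well."""
    return abs(pearson_r(x, y))


def rank_variables(X: Sequence[Sequence[float]], y: Sequence[float],
                   score: Score, top_k: int) -> list[int]:
    """Indices of the top_k columns of X ranked by score(column, y); ties go to the lower index."""
    scored = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scored.append((score(column, y), j))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:top_k]]


def rms_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error of the predictions, in the units of y (e.g. % adulteration)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical toy data: rows are samples, columns stand in for m/z intensities
    y = [0.0, 5.0, 10.0, 15.0, 20.0]  # % adulteration
    X = [[0, 1, 0], [5, 0, -5], [10, 1, -10], [15, 0, -15], [20, 1, -20]]
    print(rank_variables(X, y, abs_pearson, top_k=2))         # → [0, 2]
    print(rank_variables(X, y, mutual_information, top_k=2))  # → [0, 2]
```

Ranking by |r| keeps anti-correlated variables (column 2 in the toy data) as well as positively correlated ones, but it only detects linear association. Mutual information can also pick up nonlinear dependence, although plug-in histogram estimates are biased on small sample sets and the bin count is itself a tunable assumption. The retained columns would then form the reduced input space for the GP model, whose predictions could be scored with `rms_error`.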
Abstract published in Advance ACS Abstracts, October 1, 1997.