Proteomics and Informatics for Understanding Phases and Identifying Biomarkers in COVID-19 Disease

The emergence of novel coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 coronavirus, has necessitated the urgent development of new diagnostic and therapeutic strategies. Rapid research and development, on an international scale, has already generated assays for detecting SARS-CoV-2 RNA and host immunoglobulins. However, the complexities of COVID-19 are such that fuller definitions of patient status, trajectory, sequelae, and responses to therapy are now required. There is accumulating evidence—from studies of both COVID-19 and the related disease SARS—that protein biomarkers could help to provide this definition. Proteins associated with blood coagulation (D-dimer), cell damage (lactate dehydrogenase), and the inflammatory response (e.g., C-reactive protein) have already been identified as possible predictors of COVID-19 severity or mortality. Proteomics technologies, with their ability to detect many proteins per analysis, have begun to extend these early findings. To be effective, proteomics strategies must include not only methods for comprehensive data acquisition (e.g., using mass spectrometry) but also informatics approaches via which to derive actionable information from large data sets. Here we review applications of proteomics to COVID-19 and SARS and outline how pipelines involving technologies such as artificial intelligence could be of value for research on these diseases.


■ CHALLENGES FOR UNDERSTANDING PATHOGENESIS AND BIOMARKER DEVELOPMENT
A pandemic of a respiratory disease (COVID-19) 1 associated with the novel 2019 coronavirus (SARS-CoV-2) has highlighted the need for biomarkers that detect infection, track progression, support patient stratification, and indicate future sequelae in those surviving the disease. Tang et al. have highlighted the major issues facing clinicians in diagnosis and prognosis using viral RNA-based measurements. 2 COVID-19 has demonstrated high rates of infection (early calculations of the reproductive number were between 2 and 3 3,4) as well as high mortality rates (initially reported in China to be 2.3%), although these are age-dependent. 5 In the U.K. the overall death rate from COVID-19 has been estimated at 0.7%, rising to 7.8% in people aged over 80 and declining to 0.002% in children (<9 years). 6 Tang et al. discuss RNA-based assays as they relate to analytical and clinical sensitivity and the relationship to disease severity and determining the individual's disease trajectory. 2 The seriousness and scale of the pandemic require a comprehensive approach to biomarker discovery (Figure 1). Furthermore, hypoxia and inflammation can damage the kidneys, liver, heart, brain, and other organs. 7 This multiple organ involvement of COVID-19, as the disease progresses from an asymptomatic phase (where many patients remain) to a life-threatening disease, and potential follow-on morbidities require a full consideration of the longitudinal effects of SARS-CoV-2 infection on blood biomarkers such as proteins, in tandem with other clinical markers (lymphopenia, hypotension, heart palpitations, myalgia, and CT imaging data). Such an approach will detail the means for patient stratification and disease outcome prediction and provide a blueprint of preparedness for the next pandemic.
Because data about COVID-19 are only in their infancy, we can, in part, learn from the analysis of previous coronavirus-mediated diseases. SARS-CoV-2 virus is related to SARS-CoV, which was the infectious agent in the severe acute respiratory syndrome (SARS) outbreak. 9 SARS-CoV emerged in China in 2002, infecting 8098 people and causing about 800 deaths. Another coronavirus, Middle-East respiratory syndrome coronavirus 10 (MERS-CoV), seen first on the Arabian Peninsula, infected about 3000 individuals and led to approximately 850 deaths. At the time of writing, SARS-CoV-2 has infected over ten million people and caused greater than 500 000 deaths. The differences between these diseases are marked; however, both SARS-CoV and SARS-CoV-2 can adversely affect vital organs such as the lungs, heart, liver, and kidneys. During the 2003 outbreak of SARS, it was shown that major changes in proteins occurred in the peripheral blood. 11 A genotypic definition of the disease risk for COVID-19 will undoubtedly be an area of study. Given the age of those most seriously affected by SARS-CoV-2 and the concomitant comorbidities, these data will give only a partial picture of the risk and no reflection of the response to therapy or later complications. Proteomics and mass-spectrometry techniques are highly sensitive and can provide a comprehensive picture of a patient's state and underlying biomedical processes. Proteomics biomarkers are more useful because they are more readily available than human transcripts in infectious diseases (although transcripts have great value in cancer risk detection; see ref 12). In addition, they vary with different disease states and progressions, unlike the genome that effectively provides a "blueprint" detail of risk in some instances, such as cardiac disease. 13
The diagnosis of COVID-19 presently consists of two main approaches: immunoassays that detect antibodies against specific viruses (e.g., SARS-CoV-2) in patient samples 14,15 and assays employing real-time reverse-transcription polymerase chain reaction (RT-PCR), which can be carried out using a variety of clinical specimens, including bronchoalveolar lavage fluid, bronchoscopy, sputum, nasal swabs, pharyngeal swabs, feces, or blood. 16 However, a phalanx of other questions (and therefore an accompanying need for biomarker assays) arises during the course of this disease.
Most individuals infected with SARS-CoV-2 will display an immune response within a week of infection that leaves them able to remain healthy (and often asymptomatic). Those who do not achieve this see the response to infection begin to adversely affect many physiological processes, with symptoms including severe shortness of breath at rest or difficulty breathing, coughing up blood, pallor, syncope or collapse, confusion, and decreased urine output. Early identification of those who are infected, and those who are going to progress to this frightening second phase, could have tremendous implications for the management of this pandemic and the outcomes for those affected. This can, in part, be achieved by consideration of features such as body mass index 17 and potentially ethnicity. Biomarkers that offer further insight are also required. Monitoring those most at risk (e.g., those who have had an organ transplant, are undergoing cancer treatment, or have leukemia, respiratory disease, or a serious heart condition) for signs of onset or increasing disease severity is important.
Because the COVID-19 disease severity varies considerably, understanding who has had the disease but remained largely asymptomatic is also of critical importance. Presently, it is unclear if all asymptomatic patients produce measurable antibody titers that signify infection. Immunoglobulin seroconversion in response to COVID-19 occurs 5 to 12 days after symptoms are first detected. 18 Serological testing in the second week of the disease is then generally pursued where possible. 18 Lou et al. have reported the seroconversion rates of total antibody, IgM, and IgG, respectively, to be 98.8, 93.8, and 93.8%, with a median seroconversion time of 9, 10, and 12 days for these three entities post-onset of COVID-19. 19 From 148 people with a negative RT-PCR result, 7 tested positive for SARS-CoV-2-specific IgG and/or IgM, indicating the value of serological testing alongside RT-PCR-based assays of viral RNA. 20 Immunoassay studies suggest that neutralizing antibodies target the receptor-binding domain of the spike (S) protein, which is a similar observation to other coronaviruses. 21 As the antibody levels increase (starting at 6 days post-onset of disease), the viral load decreases. 19 However, seroconversion may not be a reliable measure in all cases, as some patients may not seroconvert during the testing period. For example, a mother and daughter remained seronegative throughout their hospitalization period. 20 Bentivegna et al. describe a case where a patient tested positive for COVID-19 using RT-PCR and had undergone seroconversion; following two negative RT-PCR tests, the patient was discharged. 22 Twenty-three days later, the patient was once again hospitalized but tested negative for COVID-19 four times, and serological testing revealed the presence of SARS-CoV-2-specific IgG but no IgM. Following exposure to COVID-19, the patient tested positive and had undergone a second IgM seroconversion. The patient was asymptomatic. The problem of the incomplete coverage of asymptomatic infections could potentially be addressed through the development of novel diagnostic methods. Then, when the disease is clearly present, predicting the severity can aid the patient and the healthcare provider. Symptomatic case fatality is age-related; however, determining which younger people with symptoms are likely to require ventilation early is just one aspect where biomarker tools would be of assistance. Over and above this, determining potential sequelae events in those undergoing recovery will be equally important in the months and years ahead. For example, in cases of SARS-CoV, neuromusculoskeletal disorders were observed in some patients. 23 Patients with COVID-19 can also display neurological symptoms such as cerebrovascular diseases, impaired consciousness and taste, and vision and smell impairment. 24 Cardiovascular function is also likely to be a matter for consideration. The widespread effects of SARS-CoV-2 on the patient during the acute phase strongly suggest that biomarkers of wellness or ill health are required; presently, we have no data on these.

Figure 1. Proteins are the main stimulants of cellular mechanisms and are responsible for cellular homeostasis. The disruption of these cellular mechanisms is generally associated with many disease phenotypes. 8 Therefore, in a complex multiorgan disease such as is associated with SARS-CoV-2, establishing the underlying proteins involved in the various stages of this disease will pave a way to the discovery of new biomarkers in the diagnosis and prognosis of COVID-19 and its complications. Markers for prognosis, diagnosis, and chronic effects are among those required. The second wave of disease depicted here implies that monitoring in the population for (re)infection would require biomarkers of sufficient specificity. In this respect, a generic inflammatory biomarker response would not be sufficiently specific. The measurement of IgG and IgM antibodies directed against the virus does offer opportunities for screening; however, these generally do not appear for several days following the onset of symptoms and are not always observed during the usual screening window.

Table 1 (fragment).
Procalcitonin. Four-study meta-analysis. Analysis of odds ratios. Patients with severe disease compared with patients with nonsevere disease, testing for an association between increased procalcitonin concentration and the odds of severe disease. The heterogeneity of the included results was moderate. The authors suggest that an increased concentration of procalcitonin may be an indicator of bacterial coinfection in COVID-19 patients.
Interleukin-6. Nine-study meta-analysis. Analysis of standardized mean differences. Sensitivity analysis. Patients with severe disease compared with patients with nonsevere disease. The heterogeneity of the included results was high. See also the reports by Huang

Journal of Proteome Research pubs.acs.org/jpr Reviews
Thus, in a complex, rapidly progressing pandemic such as COVID-19, there is a need for a fuller description of the proteomic biomarkers present in plasma or nasopharyngeal/throat swabs. Because the disease comprises several phases, biomarkers for distinct clinical decisions are required at each point. Technologies to measure hundreds of proteins in the plasma of each patient have improved tremendously since the proteomics work on SARS-CoV was published. SARS-CoV-2 research is moving swiftly, and there are currently several available biomarker assays (Table S1, Supporting Information). Nonetheless, a more complete picture of the disease
Table 1.
A cytokine storm profile is associated with severe COVID-19 disease, characterized by increased interleukin (IL)-2, IL-7, granulocyte-colony stimulating factor, interferon-γ inducible protein 10, monocyte chemoattractant protein 1, macrophage inflammatory protein 1-α, and tumor necrosis factor-α. 1 Predictors of fatality also include increased ferritin. 26 It has been suggested that these and other observations imply a secondary hemophagocytic lymphohistiocytosis (sHLH). 27 A recently described model for predicting disease severity utilizes neutrophils, lymphocytes, C-reactive protein (a biomarker for inflammation), and D-dimer (a measure of blood clotting). 28 A model for predicting mortality utilizes age, lymphocytes, C-reactive protein, and D-dimer. 29 Another model for predicting mortality utilizes lymphocytes, C-reactive protein, and lactate dehydrogenase (a marker of cell damage 30 ). 31 A further analysis on nonsevere and severe COVID-19 patients concluded that baseline levels of IL-2, IL-4, IL-10, TNF-α, and IFN-γ were within the normal range in severely affected patients, whereas the IL-6 level was significantly increased in severe cases. 32 These data suggest that a systematic and longitudinal assessment of cytokines is required in COVID-19 disease.
Lymphopenia has also been cited as a biomarker for disease severity, 33 whereas thrombocytopenia is an indicator of severe disease. 34 In the SARS outbreak, cardiac involvement was identified 35 with hypotension and tachycardia; in COVID-19, cardiac involvement is a major cause of death with elevated levels of the biomarkers troponin I and BNP observed and associated with a poor outcome. However, the role of myocarditis is not defined in COVID-19. The relationship between inflammation and thrombosis can involve endothelial cell activation, monocyte/macrophage activation, and platelet activation, with the additional production of pro-inflammatory mediators. This complexity of interactions implies that plasma profiling can provide more information on patients for their benefit during the acute and convalescent phases of their disease.
The previously described results illustrate how routine assays have begun to define the molecular and cellular events leading to severe COVID-19 and to identify potential biomarkers as well. Although routine biochemical assays are unlikely to provide a specific and complete description of the disease, they do appear to have predictive value. Even by itself, D-dimer, for example, may be useful in predicting mortality. 36 Lactate dehydrogenase, a relatively unselective marker (particularly unselective when isoenzyme analyses are not performed), has also shown some predictive ability, 37 especially when used in conjunction with other blood components. 31 Targeted biochemical assays will also continue to be useful for detecting damage to specific tissues.
Proteomics technologies have the potential to expand on the information gathered using cellular and biochemical assays, and findings have already started to emerge (see later for a discussion of the technologies). A proteomics analysis by Shen et al. identified 105 proteins whose abundances were altered in COVID-19 patients' sera, and 93 proteins, including several acute phase proteins, for which a correlation with disease severity was observed. 38 Forty-two proteins displayed differential abundance in severe cases as compared with nonsevere cases (Table 1). The authors used their data to formulate a predictive model (see later). In another proteomics analysis of COVID-19 patients' sera, Messner et al. identified 27 proteins whose abundances were associated with disease severity. 39 Differential abundance was observed for proteins involved in inflammation, blood coagulation, and the complement system. Some of Shen and coauthors' results, including associations observed for acute-phase proteins, were reproduced. (See Table 1.) In the context of COVID-19 research, proteomics analyses of urine and naso-oropharyngeal swabs have also been reported. 40,41 Prior to the definition of a signature or algorithm, the understanding of pathogenesis will benefit from the systematic analysis of protein, cellular, and disease data (Figure 2a). Informatics analysis of the literature is a rapid way to approach this as publications on the SARS-CoV-2 outbreak emerge.
Here we show such a piece of work linking cells, diseases, and immune-system proteins (cytokines and chemokines) in the available literature on SARS and coronaviruses more generally. 56 This approach enables clearer insight into potential biomarkers associated with any specific pathogenic feature of the disease (Figure 2b−d). As a greater empirical database is built on biomarkers, imaging data, and other features of COVID-19, data network analysis can provide confidence as putative biomarker signatures are developed. These signatures can enable algorithm development for the definition of patient trajectory or future related disease complications. Protein biomarker signatures will enhance evidence such as CT-imaging-defined scarred lungs, whereas follow-up measurements of serum IgG and IgM and SARS-CoV-2 in blood, feces, and nasopharyngeal swabs can provide a wider spectrum of information to inform on chronic effects and treatment choices during the acute phase.

■ HOW DO WE COMPREHENSIVELY ASSESS PROTEIN BIOMARKERS OF VALUE IN THE ASSESSMENT OF COVID-19 DISEASE PROGRESSION?
Humans have approximately 20 000 protein-encoding genes. There could be over 1 000 000 proteoforms including splice variants and essential post-translational modifications (PTMs). 57 By contrast, there are 29 predicted proteins in the SARS-CoV-2 virus with 4 main structural proteins (S, E, M, and N). 9 The S protein is notably glycosylated in SARS-CoV, and there is evidence to suggest that it is similarly modified in SARS-CoV-2. 58,59 Proteomics analysis of a particular sample (cell, plasma, sputum, etc.) will provide relative quantification of many of the human proteins involved in pathogenesis. Thus proteomics will enable us to see protein patterns (clusters) within different phases of a disease state, whereas virus proteins will not likely be consistently observed. In general, the human proteins will effectively swamp the signal from the viral proteins in mass spectrometry experiments on unfractionated samples. Given the heavy involvement of T lymphocytes in SARS-CoV-2, it is noteworthy that 1725 host-cell proteins and 4 HIV-1 proteins were quantified in a lymphoid cell line expressing the HIV virus using the SWATH-MS technique (see later). 60 There may be some opportunities in the analysis of buffy coat cells for viral proteins.
Protein PTMs could provide additional signatures via which to detect SARS-CoV-2 and its effects. The consideration of PTMs such as glycosylation and lipidation could provide more opportunities for the detection of viral proteins (if these are sufficiently abundant). A concentration on SARS-CoV-2 protein-derived glycoconjugated peptides may enable the specificity and sensitivity of detection of the viral protein to be improved. 59,61,62 Modifications to endogenous proteins could add detail to signatures of the body's response to infection. Ren et al. found that the concentration of a modified protein, N-terminally truncated α1-antitrypsin, was elevated in SARS patients' sera (Table 1). 55 Powerful mass-spectrometry-based approaches to PTM discovery have been developed in recent years, 63 and these could be employed in an early stage of biomarker development. Recent studies of the effects of environment on health 64 have demonstrated the possibility of implementing PTM discovery in an epidemiological setting. Thus testing in the population using mass-spectrometry-based methods may be possible in the future as manufacturers improve the sensitivity, speed, and resolution of the platform, enhancing the clinical application options available in pandemic situations and elsewhere.
RNA from coronaviruses is a biomarker, describing the infection but not the disease trajectory and outcome. Nucleic acid can be tested in various types of clinical specimens including blood, serum, plasma, urine, and sputum. A list of companies and others found to be involved in tests for COVID-19 is shown in Table S1 (Supporting Information). The RNA-based assays are complementary to the biomarkers that are required, as shown in Figure 1. Assays for IgM and IgG have equal value in determining (some of) those who have been infected with a virus, and the issue in the metrics here is specificity: Do these assays clearly show who has recovered from a SARS-CoV-2 infection? 65 The host humoral response assayed by measuring viral spike protein antibodies offers many advantages in terms of specificity and sensitivity. 66 In general, clinical proteomics follows a course of discovery, validation, and verification for diagnostic, prognostic and theranostic biomarkers. The most effective way to cover as many potential protein biomarkers as possible with high specificity (using either fixed or native material) is by using liquid chromatography and tandem mass spectrometry (LC-MS/MS). For this approach, there are two main ways of collecting the spectral data: data-dependent acquisition (DDA) and data-independent acquisition (DIA). Each type of acquisition has its strengths and weaknesses, and both types have been used, presently to a limited extent, in clinical COVID-19 research.
In DDA, a narrow m/z window is used to isolate precursor ions for fragmentation. The mass spectrometer sequentially captures data on specific peptide ions by adjusting the m/z value of selected ions. Selecting ions in this way establishes a strong link between a precursor and its products, enabling product-ion spectra to be readily identified by database searching. The approach therefore lends itself to discovery. Shen et al. used DDA for their analyses of COVID-19 patients' sera and detected a total of 894 proteins. 38 Li et al. analyzed urine samples and detected 1008 proteins that were common to both COVID-19 patients and healthy controls. 40 The main drawbacks of DDA stem from its inability to capture all of the incoming precursor ions.
DIA differs from DDA in that its m/z window is wider and multiple precursor ions are simultaneously fragmented, enabling the complete and permanent recording of all products of all precursor ions. The link between a given precursor ion and its products is weaker than in DDA, so proteins tend to be identified by searching data against a spectral reference library. A number of DIA methods are available for mass-spectrometry-based proteomics. 67 Some of these, such as all-ion fragmentation 68 and MS^E, 69 employ a single window that spans the full m/z range. Other methods, such as PAcIFIC, 70 SONAR, 71 and SWATH-MS, 72 employ smaller windows. So far, SONAR and SWATH-MS have been applied to clinical COVID-19 research. SONAR, a "scanning quadrupole DIA" method, was used by Akgun et al. to analyze naso-oropharyngeal swabs from SARS-CoV-2-infected individuals. 41 The authors detected 207 proteins across 30 samples. SWATH-MS, or sequential windowed acquisition of all theoretical fragment ion mass spectra, is an established biomarker discovery tool that has been employed in a number of epidemiological studies. 73 Given the large amount of data collected by DIA methods, artificial intelligence methods are best applied to extract information from the data, especially when there is also a significant quantity of multimodal clinical data available (e.g., comorbidities, imaging data, respiratory function, age, sex, and clinical biochemistry laboratory measurement of proteins such as troponin and D-dimer).
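The contrast between the two acquisition modes described above can be made concrete with a toy sketch. It is an illustration only, not an instrument method: the survey-scan values, the top-3 cutoff, the 25 m/z window width, and the function names (dda_select, dia_windows, dia_assign) are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity); values below are made up
Window = Tuple[float, float]     # (lower m/z bound, upper m/z bound)


def dda_select(precursors: List[Precursor], top_n: int) -> List[float]:
    """DDA-style selection: isolate only the top_n most intense precursors."""
    ranked = sorted(precursors, key=lambda p: p[1], reverse=True)
    return sorted(mz for mz, _ in ranked[:top_n])


def dia_windows(mz_start: float, mz_end: float, width: float) -> List[Window]:
    """DIA-style scheme: fixed, contiguous isolation windows over the m/z range."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def dia_assign(precursors: List[Precursor], windows: List[Window]) -> Dict[Window, List[float]]:
    """Group every precursor by the window in which it would be co-fragmented."""
    assignment: Dict[Window, List[float]] = {w: [] for w in windows}
    for mz, _ in precursors:
        for low, high in windows:
            if low <= mz < high:
                assignment[(low, high)].append(mz)
                break
    return assignment


# Hypothetical survey scan: five precursor ions with very different intensities.
survey_scan = [(402.2, 9.0e5), (415.7, 1.2e4), (433.1, 4.5e5), (447.9, 8.0e3), (468.3, 2.1e5)]

dda_hits = dda_select(survey_scan, top_n=3)      # low-intensity ions are skipped
windows = dia_windows(400.0, 475.0, 25.0)        # three 25 m/z windows
dia_hits = dia_assign(survey_scan, windows)      # every ion falls in some window

print("DDA isolates:", dda_hits)
for window, members in dia_hits.items():
    print("DIA window", window, "co-fragments", members)
```

In this toy case the DDA-style selection misses the two weakest precursors, whereas the windowed scheme records all five but mixes several precursors in one window, which is why library-based searching is typically needed to assign the resulting product ions.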
The Somalogic platform of DNA aptamers that bind to a wide range of proteins including those in plasma 77 is also being deployed for COVID-19 research. This may complement the mass-spectrometry-based methodologies in rapidly defining biomarkers for the questions posed in Figure 1. Protein arrays presently do not have the scope of available highly tested reagents to be taken forward in COVID-19 research.
Validation assesses biomarker measurement performance characteristics, determining reproducibility and accuracy and proving a linkage between biomarkers/algorithms and clinical end points. Selected reaction monitoring (SRM) mass spectrometry can be employed as a verification method because it offers specificity (detecting proteotypic peptides from specific proteins) and can be multiplexed to accommodate all proteins of interest found in the discovery phase. 78,79 Stable isotope dilution SRM can give absolute quantification values for a peptide and thus a protein. The effects of sample preparation on the protein/peptide structure can be accounted for in the quantitative procedure.
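The arithmetic behind stable isotope dilution can be sketched in a few lines. This is a minimal single-point illustration that assumes the labeled (heavy) peptide and the endogenous (light) peptide respond identically in the instrument and that the heavy standard is spiked at a known concentration; the function name sid_concentration and all numbers are hypothetical.

```python
def sid_concentration(light_area: float, heavy_area: float,
                      heavy_spike_fmol_per_ul: float) -> float:
    """Estimate endogenous peptide concentration from the light/heavy area ratio.

    Assumes a single-point calibration: equal response for the light and heavy
    forms, so concentration scales directly with the peak-area ratio.
    """
    if heavy_area <= 0:
        raise ValueError("heavy-standard peak area must be positive")
    return (light_area / heavy_area) * heavy_spike_fmol_per_ul


# Hypothetical SRM peak areas for one proteotypic peptide.
conc = sid_concentration(light_area=3.0e6, heavy_area=1.5e6,
                         heavy_spike_fmol_per_ul=50.0)
print(f"Estimated endogenous peptide: {conc:.1f} fmol/uL")
```

A light/heavy ratio of 2 against a 50 fmol/µL spike gives 100 fmol/µL here. In practice a multipoint calibration curve and several transitions per peptide would usually be preferred over a single ratio.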
Antibody-based relative quantification assays can be employed for validation (an approach that is more easily accommodated in most clinical biochemistry laboratories) with a view to a more rapid rollout of a point-of-care test. The lack of ability to first generate a point-of-care test and then demonstrate sensitivity and specificity has been a major issue in the COVID-19 pandemic, and it is for this reason that the tight integration of sample collection, discovery science, and validation has to be undertaken. 80 A useful discussion forum on this subject comes from the Association of Biomolecular Resource Facilities. 81 In a disease such as COVID-19, then, clinical-laboratory-based tests based on protein quantification and the use of algorithms can also be effective in assessing the disease severity and personalizing care for the COVID-19 patient. The ability to predict the disease severity in secondary care would be of great value using clinical biochemistry laboratory assays. In the screening of known protein biomarkers using antibody-based methods, however, there may be issues with fixed samples such as material from swabs where epitopes may have been modified. Bloodborne cells such as monocytes and T lymphocytes are involved in COVID-19 pathogenesis and have been shown to produce the virus. HIV-infected cells show an altered proteome in lymphoid cell lines, and HIV proteins can be detected within these cells. Given the availability and ease of use of immunomagnetic bead separation techniques for monocytes, CD4+ lymphocytes, CD8+ lymphocytes, and other specific cell types found in peripheral blood, there are opportunities for proteomics analysis on such cell populations. Relatively small numbers of cells are required, and the data acquired could lead to an understanding of the effects of the virus on hematopoietic cells using omics technologies. This could not only help to explain lymphopenia in COVID-19 but also inform further analysis on biomarker entities.

■ PANDEMICS REQUIRE INTEGRATED PIPELINES FOR OMICS AND INFORMATICS
Biomarker verification in COVID-19 and other research involves testing of the association between a biomarker-defined stratification process and outcome using samples drawn from a different sample set to provide a bridge between discovery and clinical application. The process of verification of the stratification markers has to adhere to specific criteria that are subject to International Organization for Standardization (ISO) plus regulatory authority requirements and guidelines. The involvement of parties such as Clinical Research Organizations, universities, or companies is required to access accredited laboratories (ISO 17025/FDA) for validation. These organizations provide not only routine monitoring and measurement but also proof that these have been done to a certain standard. Auditable evidence is provided for how to perform the work and how to apply policies and processes on the prosecution of the assays. Laboratory-developed tests for "in-house" in vitro diagnostic use must meet ISO 15189 and/or Clinical Laboratory Improvement Amendments (CLIA) requirements before any test results are released. The issues this creates with respect to research-laboratory-based contributions to clinical research are profound. Few are suitably resourced or prepared to develop a biomarker test through validation. Thus the maintenance of extant facilities ready to undertake such work in pandemic situations would be of value. Well-regulated discovery proteomics laboratories can save years in developing a validated test.
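One simple way to express the outcome of testing a stratification marker on an independent sample set is through clinical sensitivity and specificity. The sketch below assumes binary labels (1 = severe outcome, 0 = nonsevere) and uses invented data; the function name and the ten-patient verification set are hypothetical and stand in for whatever end point a real study defines.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(actual: Sequence[int],
                            predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary outcomes coded 1/0."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical verification set, drawn separately from the discovery samples.
observed_outcome = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
marker_call      = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(observed_outcome, marker_call)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these invented labels, three of four severe cases are called correctly (sensitivity 0.75) and five of six nonsevere cases are called correctly (specificity about 0.83). A regulated validation would add confidence intervals and predefined acceptance criteria.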
Similarly, integration with health and biomedical informatics is key to the rapid deployment of tests in clinical research or in the wider clinical context. (See Figure 3.)

Figure 3. The regenerative or host defense capabilities for protection against SARS-CoV-2 infection diminish with age. This has led to the use of algorithms consisting of multiple markers, and factors like comorbidities, age, or sex can be used to develop "score"-based indicators of risk, prognosis, or trajectory. This is one reason for the rapid deployment of the pipeline described here inclusive of artificial intelligence.

Multidimensional signatures for the prediction of COVID-19 trajectory or sequelae can be derived by taking proteomics data forward with advanced machine-learning approaches as well as integrated analysis with genomics (when this becomes available) and clinical data that will establish combined biomarker signatures associated with COVID-19, as shown in Figure 1. Methods including artificial neural networks, multiomic kernel machine learning, 82 and correlation network and community analysis 83 lend themselves to this task. To identify patient subgroups, unsupervised machine-learning methods for subgroup discovery and trajectory analysis can be employed. Latent class analysis, a statistical method for identifying unmeasured class membership among subjects using observed variables, is one such approach. 84−87 Other more advanced approaches such as topological data analysis (TDA), a method enabling group discovery from large, complex data by creating patient networks and linking those who display clinical and phenotypic similarities, can also assist in patient stratification. 88,89 Supervised classification and multivariate trajectory analysis (e.g., random forests, support vector machines) with different time shifts as well as statistical modeling (e.g., regression analyses) can strengthen these results and determine the predictive power of algorithms or signatures. Shen et al. have recently used a random forest model to classify COVID-19 patients as having severe or nonsevere disease based on proteomics data. 38 Common genomic variants can reflect risk for many conditions, making it possible to include such data in assessments of trajectory or prognosis. Also, recent data show that the incorporation of rare genetic variants into statistical models of risk can considerably increase the proportion of the observed population heritability of complex traits that can be accounted for. 90 This indicates that sequence data may reveal rare variants of large effects on some novel signatures and improve the quality of the test for clinical decision making. The availability of exome and whole-genome sequence data on the COVID-19 affected population could come from UK Biobank or other major epidemiological studies in the future. Signatures that combine genomic, proteomics, and other data can be of value in risk prediction, as seen with the CancerSEEK test. 91
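The correlation network and community analysis mentioned above can be illustrated with a deliberately small sketch. It assumes protein abundance profiles measured across the same patients, links proteins whose absolute Pearson correlation exceeds a threshold, and then reports connected components as a crude stand-in for communities (dedicated community-detection algorithms would normally be used instead). The protein names, abundance values, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_network(profiles: Dict[str, List[float]],
                        threshold: float) -> Set[Tuple[str, str]]:
    """Edges between proteins whose |r| meets or exceeds the threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            edges.add((a, b))
    return edges


def connected_components(nodes: Sequence[str],
                         edges: Set[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into connected components using a depth-first search."""
    neighbours: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen: Set[str] = set()
    groups = []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical relative abundances for four proteins across five patients.
profiles = {
    "protein_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_B": [2.1, 3.9, 6.2, 8.1, 9.8],
    "protein_C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "protein_D": [5.2, 0.9, 4.1, 2.2, 2.9],
}

edges = correlation_network(profiles, threshold=0.9)
groups = connected_components(list(profiles), edges)
print(groups)
```

In this invented data set, proteins A and B track each other closely, as do C and D, so two groups emerge. Real analyses would need many more samples, correction for multiple testing, and a principled choice of threshold.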

■ CONCLUSIONS
Given the apparently high infection rate of SARS-CoV-2 in the population, it is likely COVID-19 will be around for some time to come. The ability to stratify patients and assess possible sequelae is critically important in this new era in which we live. Proteomics has the potential to define the disease status in many scenarios. Much of what we know so far has come from hospital laboratories and relates to proteins that are quantified in the course of routine analyses. Elevated concentrations of immunological markers (e.g., interleukins) are consistent with a "cytokine storm" hypothesis, whereas elevated concentrations of D-dimer, for example, have implicated inappropriate blood coagulation as a possible contributor to pathogenesis. A number of studies have reported associations between protein concentrations and disease severity, and predictive algorithms have begun to be formulated on this basis. The first mass-spectrometry-based proteomics analyses of patients' blood were reported a few weeks into the pandemic, and the picture emerging from these is one of systemic perturbation. Now there is a need to establish the clinical sensitivity and specificity of any biomarker signature associated with COVID-19 before clinical decisions are made on the basis of biomarker data. Thus the initial assessment of major protein components in samples from COVID-19 patients will be followed by a deeper analysis of swab- and blood-derived material. The rapid deployment of plasma proteomics or throat swab proteomics in a well-regulated discovery proteomics environment with sufficient samples to lend power to the studies is going to be a key part of future developments in combatting this disease. The overall objective can be inclusive of genomics data and certainly should be inclusive of advanced health informatics approaches to turn data into clinically useful information.
■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00326.
Table S1. Assays for detecting SARS-CoV-2 nucleic acid and antibody responses (PDF)

A.D.W. is supported by the NIHR Manchester Biomedical Research Centre.