Intrinsically Disordered Proteins: Perspective on COVID-19 Infection and Drug Discovery

Since the beginning of the COVID-19 pandemic caused by SARS-CoV-2, millions of patients have been diagnosed and many of them have died from the disease worldwide. The identification of novel therapeutic targets is of utmost significance for the prevention and treatment of COVID-19. SARS-CoV-2 is a single-stranded RNA virus with a 30 kb genome packaged into a membrane-enveloped virion, encoding several tens of proteins. The belief that the amino acid sequence of a protein determines its 3D structure which, in turn, determines its function has long been a central principle of molecular biology. Recently, however, it has been increasingly realized that there is a large group of proteins that lack a fixed or ordered 3D structure yet exhibit important biological activities: the so-called intrinsically disordered proteins and protein regions (IDPs/IDRs). Disordered regions in viral proteins are generally associated with viral infectivity and pathogenicity because they endow the viral proteins with the ability to easily and promiscuously bind to host proteins; therefore, the proteome of SARS-CoV-2 has been thoroughly examined for intrinsic disorder. It has been recognized that, in fact, the SARS-CoV-2 proteome exhibits significant levels of structural order, with only the nucleocapsid (N) structural protein and two of the nonstructural proteins being highly disordered. The spike (S) protein of SARS-CoV-2 exhibits significant levels of structural order, yet its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV. Notably, however, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the existing ones play major roles in the functioning and virulence of the virus and are thus promising drug targets for rational antiviral drug design.
Presented here is a COVID-19 perspective on the intrinsically disordered proteins, summarizing recent results on the SARS-CoV-2 proteome disorder features, their physiological and pathological relevance, and their prominence as prospective drug target sites.


INTRODUCTION
For a long time, one of the central principles of molecular biology has been the belief that the amino acid sequence of each protein determines its three-dimensional structure which, in turn, determines its function. Recently, however, it has been increasingly realized that there is a large group of proteins and protein regions that lack a fixed or ordered 3D structure yet exhibit biological activities: the so-called intrinsically disordered proteins and intrinsically disordered regions (IDPs/IDRs) (Figure 1). 1−3 The highly dynamic disordered regions of these proteins have been linked to important phenomena such as enzyme catalysis and allosteric regulation and to vital physiological functions such as cell signaling and transcription. They are also key players in the cellular liquid−liquid phase separation that drives the formation of membraneless organelles, which concentrate biomolecules to increase the efficiency of biochemical reactions and protect nucleic acids or proteins to promote cell survival under stress. 4,5 In viral proteins, disordered regions have been strongly correlated with viral infectivity and pathogenicity because they provide the viral proteins with the ability to easily and promiscuously bind to host proteins. With the COVID-19 disease caused by infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) rapidly spreading around the world, the search for correlations between viral protein disorder and the infection pathway has intensified, in the hope of finding ways to mitigate the effects of the viral infection. Furthermore, the practice of rational drug design has so far largely ignored the presence of intrinsic disorder in target proteins. Understanding the structure of these regions in the SARS-CoV-2 proteome would be a valuable asset for high-throughput screening in drug design and development for the treatment of COVID-19.
In fact, the SARS-CoV-2 proteome has been found to exhibit significant levels of structural order. Except for the nucleocapsid (N) structural protein and two of the nonstructural proteins (ORF6 and ORF9b), the majority of SARS-CoV-2 proteins are predominantly ordered, with only a few disordered regions. 6 The spike (S) protein of SARS-CoV-2 exhibits significant levels of structural order, yet its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV. Furthermore, the IDRs located in the SARS-CoV-2 proteins appear to be of high functional importance. Most of the SARS-CoV-2 proteins contain intrinsic disorder-based protein−protein interaction sites utilized for molecular recognition and interaction with certain partner proteins. The accumulated knowledge on the SARS-CoV-2 proteome is important for understanding the virulence of coronaviruses and for finding ways to alleviate the effects of the viral infections.
Here, along with a concise review of the common knowledge on IDPs, we present a COVID-19 perspective on the intrinsically disordered proteins, summarizing and analyzing recent results on the SARS-CoV-2 proteome disorder features and their physiological and pathological relevance. As IDPs/IDRs typically undergo structural transitions upon attaching to their physiological associates, such knowledge generates an important base for better understanding the activity of these proteins, their interactions with host proteins, and their prominence as prospective drug target sites.

INTRINSICALLY DISORDERED PROTEINS
2.1. IDPs Lack a Fixed or Ordered 3D Structure. IDPs/IDRs do not exhibit a fixed three-dimensional structure. Instead, they fold dynamically into a series of conformations depending on the surrounding conditions. 7−9 This allows them to have a wide range of binding associates and thus serve significant roles in critical biological processes such as cell signaling and transcription. 2,10 It has been clear for quite some time now that IDPs/IDRs are functionally important and abundant in proteins implicated across the disease spectrum. 11 Whereas ordered proteins usually have a single, well-defined conformation representing a global free-energy minimum with the potential to bind small molecules with high affinity, the free-energy landscape of disordered proteins is characterized by a large number of local minima. These minima correspond to the many conformations within the structural ensemble populated by disordered proteins, which can transiently bind small molecules with weak affinity. Examples exist of disordered proteins interacting with other proteins or nucleic acids in which they undergo disorder-to-order transitions, resulting in low-affinity complexes. 12−15 Because of the difference in their structure, ordered and disordered proteins exhibit different degrees of hydration. The hydration is significantly higher for IDPs than for globular proteins of similar size. 16 IDPs also exhibit a high propensity for binding to charged solute ions. 17,18
IDPs are abundant in living systems. Eukaryotes generally have the highest amount of IDPs in their proteomes, estimated at between 30 and 45%. 19,20 More than half of eukaryotic proteins have long regions of disorder. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling and transcription and is subject to tight control by the organisms. 21 IDPs play key roles in many biological processes. They have numerous crucial functions that complement the functionality of ordered proteins.
Highly dynamic disordered proteins have been related to functionally important processes such as allosteric regulation and enzyme catalysis. For many disordered proteins, the binding affinity to their receptors is regulated by post-translational modifications. The flexibility of disordered proteins facilitates the conformational requirements for binding enzymes and their receptors. IDPs are involved in regulation of transcription and translation, cellular signal transduction, protein phosphorylation, and the regulation of the self-assembly of large multiprotein complexes.
Many intrinsically disordered proteins undergo transitions to more ordered states or fold into stable secondary or tertiary structures upon binding to their targets, i.e., they follow a coupled folding and binding route. Coupled folding and binding often produces a complex with high specificity and relatively low affinity, which is suitable for signal transduction proteins that must not only associate specifically to initiate the signaling process but also be capable of dissociation upon signaling completion. Another advantage of a system that undergoes coupled folding and binding is that the conformational flexibility expedites the post-translational modification of important transcription factors.
The difference between ordered and disordered proteins starts at the level of their amino acid sequences. There are noticeable differences between ordered and disordered proteins in terms of their amino acid compositions, charge, flexibility, and hydrophobicity. 22 Thus, the tendency of a protein to stay intrinsically disordered is programmed in its amino acid sequence. 23,24 Indeed, IDPs are notably depleted in bulky hydrophobic (Ile, Leu, and Val) and aromatic (Trp, Tyr, Phe) amino acid residues, which normally create the hydrophobic core of a folded globular protein, and they also have a low content of the hydrophobic and uncharged Cys and Asn residues. These depleted residues are considered order-promoting amino acids. On the other hand, natively unfolded proteins are substantially enriched in polar, hydrophilic, disorder-promoting amino acids: Ala, Arg, Gly, Gln, Ser, Pro, Glu, and Lys. Many IDPs have a substantial excess of basic or acidic amino acids and are thus highly charged at neutral pH. The charge destabilizes a compact structure in such proteins. The relationship between amino acid composition and sequence and protein order or disorder has been carefully explored in an effort to develop predictors of intrinsic disorder. At present, there is a multitude of such predictors based on a variety of amino acid attributes. 25−31
IDPs are suggested to be relevant to the origin of life. Scientists have long hypothesized that the early genetic code was simpler than it is now. The number of protein-building amino acids is thought to have been 12 or 14 rather than 20. 32 The last of the 20 amino acids to evolve was tryptophan, the largest amino acid and also the most structure-promoting one. It has been speculated that the earliest proteins were disordered and may have played a unique and crucial role in the origin of life.
Furthermore, it has been suggested that the amino acids of the earliest proteins lacked aromatic rings, which are required to stabilize the active site of structured proteins, so that the later-evolving amino acids may have been key to developing structure. 32 The evolution of intrinsic disorder has been supposed to exhibit a wavy pattern, with disordered primordial proteins having mainly chaperone activities gradually substituted by highly ordered enzymes, and protein intrinsic disorder reinvented at subsequent evolutionary steps along with the development of more complex organisms, 33−35 playing substantial roles in the evolution of multifunctionality. 36−38 Moreover, the relationship between structural disorder and organism complexity, along with proteome size, has been discussed, suggesting that structural disorder may effectively increase the complexity of the species. 39,40
Interest in IDPs in protein science has been rapidly increasing in the years since 2000, as demonstrated by a search in the CAS Content Collection. 41 Currently, there are over 6000 publications on IDP-related topics (Figure 2).
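The compositional bias described in this section (depletion in Ile, Leu, Val, Trp, Tyr, Phe, Cys, and Asn; enrichment in Ala, Arg, Gly, Gln, Ser, Pro, Glu, and Lys) can be illustrated with a short calculation. The following is a minimal sketch and not one of the published disorder predictors: the function name `composition_profile`, the "balance" score, and both toy sequences are assumptions introduced here purely for illustration. Only the two residue lists are taken from the text.

```python
# Residue classes exactly as listed in the text (one-letter codes).
ORDER_PROMOTING = set("ILVWYFCN")     # Ile, Leu, Val, Trp, Tyr, Phe, Cys, Asn
DISORDER_PROMOTING = set("ARGQSPEK")  # Ala, Arg, Gly, Gln, Ser, Pro, Glu, Lys


def composition_profile(sequence: str) -> dict:
    """Return the fractions of order- and disorder-promoting residues.

    'balance' is a hypothetical summary score (disorder fraction minus
    order fraction); it is not a validated disorder predictor.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    order = sum(1 for aa in seq if aa in ORDER_PROMOTING)
    disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    return {
        "order_promoting": order / n,
        "disorder_promoting": disorder / n,
        "balance": (disorder - order) / n,
    }


# Hypothetical toy sequences, invented for illustration only.
polar_rich_toy = "SRGSSRGRSSSPEKGQ"    # built only from disorder-promoting residues
hydrophobic_toy = "ILVFWYLIVCNFLIVW"   # built only from order-promoting residues

print(composition_profile(polar_rich_toy))
print(composition_profile(hydrophobic_toy))
```

A composition count of this kind ignores sequence order, charge patterning, and context, which is one reason the predictors cited above combine several amino acid attributes rather than relying on composition alone.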

IDPs in Diseases.
IDPs have numerous crucial biological functions that complement the functionality of ordered proteins. However, when malfunction occurs (e.g., misexpression, misprocessing, or misregulation), IDPs/IDRs tend to engage in undesirable interactions and become involved in the development of various pathological states. As a matter of fact, many proteins that are associated with neurodegeneration, diabetes, cardiovascular disease, amyloidosis, and genetic diseases, as well as the majority of the human cancer-related proteins, are either IDPs or contain long IDRs. 42−48 Recently, IDPs were suggested to be responsible for exothermic events observed in cerebrospinal fluid and brain proteome. These reversible exothermic transitions vary depending on neurodegenerative pathologies and possibly reflect processes of protein fibrillization and/or aggregation. 49,50 Further, it has become clear that IDPs such as α-synuclein and tau protein are common among neurodegenerative diseases, especially Parkinson's disease. It has been recognized that α-synuclein can form amyloid fibrils, which are implicated in the pathogenesis of Parkinson's disease and other neurodegenerative conditions, collectively termed synucleinopathies. 51 IDPs such as α-synuclein, tau protein, p53, and BRCA1 are appropriate targets for drugs modulating protein−protein interactions. From these and other examples, novel strategies for drug discovery based on IDPs have been developed. 52
The COVID-19 pandemic has been ongoing for almost 2 years. Although various treatments have been explored, efficient antiviral drugs are currently still in short supply. Targeting the IDPs/IDRs in the SARS-CoV-2 proteome could be an alternative strategy for rational antiviral drug design.

INTRINSICALLY DISORDERED PROTEINS IN SARS-CoV-2 PROTEOME AND THEIR ROLE IN COVID-19 INFECTION
Since the beginning of the COVID-19 pandemic caused by the SARS-CoV-2 virus, millions of patients have been diagnosed and many of them have died from the disease worldwide. The identification of novel therapeutic targets is of utmost significance for the prevention and treatment of COVID-19. The SARS-CoV-2 virus is a single-stranded RNA virus with a 30 kb genome packaged into a membrane-enveloped virion 80−90 nm in diameter, encoding several tens of proteins (Figure 3). 53 SARS-CoV-2 forms a virion including its genomic RNA bundled in a particle comprising four structural proteins: the spike (S) glycoprotein, which binds to the human angiotensin converting enzyme 2 (ACE2) receptor to mediate the entry of the virus into the host cell, 54 the membrane (M) protein, which facilitates viral assembly in the endoplasmic reticulum, 55,56 the ion channel small envelope (E) protein, 57 and the nucleocapsid (N) protein, 58−60 which assembles with viral RNA to form a ribonucleoprotein complex, the nucleocapsid. 61,62 A recent extensive NMR exploration provided thorough knowledge on the structure of the SARS-CoV-2 proteins, which is required for understanding the basic principles of the viral life cycle and the processes underlying viral infection and transmission. 63 Moreover, protocols for the large-scale production of more than 80% of all SARS-CoV-2 proteins were provided, which are highly valuable for further explorations.
Because viral proteomes typically contain noticeable levels of intrinsically disordered proteins and viral proteins utilize intrinsic disorder during host cell invasion, it is important to examine the intrinsic disorder of the viral proteins associated with the SARS-CoV-2 infection. 64 Intrinsically disordered proteins and intrinsically disordered regions play key roles in vital biological processes, including DNA/RNA and protein binding. RNA−protein recognition often needs conformational changes in both RNA and protein, which are facilitated by the structural flexibility of disordered residues. 65 Knowledge of coronavirus proteomes from the viewpoint of their intrinsic disorder propensities can provide a useful outlook on viral pathogenicity.
Disordered regions in viral proteins are generally associated with viral infectivity and pathogenicity; thus, it is of interest to evaluate the intrinsic disorder within the SARS-CoV-2 proteome. As a matter of fact, the SARS-CoV-2 proteome has been found to exhibit significant levels of structural order: except for the nucleocapsid (N) protein from the structural proteins (plus Nsp8 and ORF6 from the nonstructural proteins 6), the majority of SARS-CoV-2 proteins are highly ordered proteins containing very few intrinsically disordered protein regions. 6 Notably, however, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the existing ones contribute significantly to the functioning and virulence of the virus and are thus promising drug targets for antiviral drug discovery. 6,66,67
3.1. Nucleocapsid (N) Protein Exhibits Large IDRs. The RNA-binding nucleocapsid (N) protein stabilizes the genomic RNA inside the virus particle and regulates the viral genome transcription, replication, and packaging. It is composed of two structural domains, the N-terminal RNA-binding domain (NTD, residues 45−181) and the C-terminal dimerization domain (CTD, residues 248−365), bordered by three large intrinsically disordered regions (Figure 4). The primary role of the RNA-binding N protein in the coronavirus life cycle is to assemble with the genomic RNA into the viral RNA−protein complex and to mediate its packaging into 80−90 nm virions. 71 The N protein is highly disordered: its average percentage of predicted intrinsic disorder according to the various disorder prediction tools 27 has been estimated as 65%. 6 The IDRs in the SARS-CoV-2 nucleocapsid protein comprise three segments, including residues 1−44 (IDR1), 182−247 (IDR2), and 366−422 (IDR3) (Figure 4A). 66 Analysis of the N protein disorder propensity using the protein disorder prediction system (PrDOS) 28,29 indicates a significant degree of disorder in these three regions (Figure 4B).
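The domain boundaries quoted above for the N protein can be written down as a small lookup table, which makes it easy to ask which region a given residue falls in. This is a minimal sketch under stated assumptions: the names `N_PROTEIN_REGIONS`, `region_of`, and `idr_fraction` are introduced here for illustration, and the residue numbers are copied from the text as given rather than checked against a sequence database.

```python
# Region boundaries (1-based, inclusive) as quoted in the text.
N_PROTEIN_REGIONS = [
    ("IDR1", 1, 44),
    ("NTD", 45, 181),
    ("IDR2", 182, 247),
    ("CTD", 248, 365),
    ("IDR3", 366, 422),
]


def region_of(residue: int) -> str:
    """Return the name of the region containing the given residue number."""
    for name, start, end in N_PROTEIN_REGIONS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside the annotated range")


def idr_fraction() -> float:
    """Fraction of annotated residues that lie in the three IDR segments."""
    total = sum(end - start + 1 for _, start, end in N_PROTEIN_REGIONS)
    in_idr = sum(
        end - start + 1
        for name, start, end in N_PROTEIN_REGIONS
        if name.startswith("IDR")
    )
    return in_idr / total


print(region_of(200))          # residue in the linker between NTD and CTD
print(round(idr_fraction(), 2))
```

With these boundaries, the three IDR segments cover 167 of 422 annotated positions, or roughly 40%. This segment-based figure need not coincide with the 65% average quoted from disorder predictors, which score each residue individually and are averaged over several tools.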
The highly flexible intrinsically disordered linker region, which connects the NTD and CTD, is rich in serine and arginine residues, both of which are highly disorder-promoting. 24 The middle IDR is known to be responsible for the protein's RNA-binding activity. 68 An IDR that borders the CTD (the C-terminal tail peptide) plays a significant role in dimer−dimer association in human coronaviruses, whereas nucleotide binding is largely mediated by the central NTD core. 69 In fact, molecular recognition domains have been identified in all three IDRs of the N protein. RNA−protein recognition requires conformational changes in both RNA and protein, which are enabled by the structural flexibility of disordered regions. These findings support the key function of the N protein in ribonucleoprotein core formation via interaction with the genomic RNA, which is a crucial step for RNA encapsulation and the virus life cycle. Thus, the IDRs appear to be appropriate targets for inhibiting the interaction of the N protein with viral genomic RNA. 70
Many IDPs are known to undergo liquid−liquid phase separation into dense intracellular organelles with a multitude of important cellular functions. 71 Liquid−liquid phase separation is a process by which biomolecules, such as proteins and/or nucleic acids, condense into a dense phase that often resembles liquid droplets. 72,73 It typically occurs within the cell, forming compartments termed membraneless organelles. 73 Eukaryotes contain numerous such membraneless compartments, which are highly dynamic structures that can rapidly form on demand, thus concentrating proteins and biochemical reactions at distinct locations as needed. Virtually all such membraneless compartments contain a large proportion of IDPs. Their lack of defined secondary or tertiary structure and their high conformational flexibility match the dynamic behavior of the membraneless compartments.
One type of such membraneless aggregates, the stress granules, is composed of proteins and RNAs and forms when the cell is under stress and translation initiation is limited. 74−77 Here again, it would be worthwhile to examine whether and how stress enhances or diminishes the degree of intrinsic disorder of the IDPs involved in the stress granules.
It has been shown recently that the N protein of the SARS-CoV-2 virus undergoes liquid−liquid phase separation into stress granules through its N-terminal intrinsically disordered region IDR1. 58,78,79 The condensation of the N protein into stress granules is possibly a way for SARS-CoV-2 to inhibit host cell innate immunity. A model has been developed in which phase separation of the SARS-CoV-2 N protein contributes both to suppression of the host immune response and to packaging genomic RNA during virion assembly. 78 Disruption of the N protein liquid−liquid phase separation process holds promise for antiviral intervention and offers new targets and strategies for the development of drugs to combat COVID-19. Recent NMR and crystal studies have confirmed the disordered nature of the N protein domains, including the NTD 80,81 and CTD 82,83 domains, whereas fluorescence recovery after photobleaching (FRAP) experiments confirmed the N protein liquid−liquid phase separation, 79 which concentrates components of the SARS-CoV-2 replication machinery, thus providing a means for enhanced viral transcription and replication.
A key to holding back the COVID-19 pandemic is to understand how SARS-CoV-2 manages to overcome host antiviral defense mechanisms. Stress granules, which are assembled throughout viral infection and function to sequester host and viral mRNAs and proteins, are part of the antiviral responses. It has been shown that the SARS-CoV-2 nucleocapsid (N) protein, an RNA binding protein essential for viral production, interacts with Ras-GTPase-activating protein SH3-domain-binding protein (G3BP) and disrupts stress granule assembly, both of which require the intrinsically disordered region 1 (IDR1) in the N protein. The N protein segregates into stress granules through liquid−liquid phase separation with G3BP and obstructs the interaction of G3BP with other stress-granule-related proteins. Furthermore, the N protein IDR domains important for phase separation with G3BP and stress granule disassembly are found to be essential for SARS-CoV-2 viral production. It has been suggested that N-protein-mediated stress granule disassembly is crucial for SARS-CoV-2 production. 84 It is thus implied that inhibition of the RNA-induced phase separation of the N protein provides a viable new strategy for the design of COVID-19 therapeutics. 79
3.2. Spike (S) Protein Exhibits Small but Critical IDRs. The spike (S) protein is a large multidomain 1273 amino acid fusion protein creating the exterior of the CoV particles. 85 It protrudes from the virion and ornaments the viral surface like a crown. It is anchored in the viral membrane and mediates fusion of the viral membrane with the host cell membrane. 86 The spike protein is critical for the entry of the coronaviruses into the host, so it is an attractive antiviral target. It contains two distinct domains, S1 and S2, each of which consists of a number of functional subunits (Figure 5A).
The spike protein is cleaved by host proteases into the S1 and S2 subunits, which are responsible for receptor recognition and membrane fusion, respectively: subunit S1 activates viral infection by binding to host cell receptors, and S2 mediates the fusion of the virion and cellular membranes, thus promoting viral entry into the host cells. 62,86 The spike protein binds to a specific surface receptor, angiotensin converting enzyme 2 (ACE2), on the host cell plasma membrane via its N-terminal receptor-binding domain (RBD). 87 The S1 subunit consists of the NTD and the RBD; the S2 subunit contains a fusion peptide (FP), a heptad repeat 1 (HR1), a central helix (CH), a connector domain (CD), a heptad repeat 2 (HR2), a transmembrane domain (TM), and a cytoplasmic tail (CT) (Figure 5A). 88 Recent experimental studies confirmed that human ACE2 (hACE2) mediates SARS-CoV-2 S-protein-mediated entry into cells, establishing it as a functional entry receptor for this newly emerged coronavirus. 54 The site at the border between the S1 and S2 subunits, the S1/S2 protease cleavage site, is where the spike protein is cleaved into the S1 and S2 subunits during entry into the infected cells. The attachment of the spike protein to the host cell is activated by the host cell enzymes trypsin, cathepsin L, furin, and TMPRSS2. Comparison of the sequence of SARS-CoV-2 against other coronaviruses indicates that a unique amino acid sequence pattern, RRAR (Arg-Arg-Ala-Arg), which is cleaved by the furin enzyme, is present at the S1/S2 junction of the spike protein but is absent in other coronaviruses of the same clade, including SARS-CoV-1. 70,89 Moreover, the structure reported for the SARS-CoV-2 spike protein (PDB code: 6VSB) shows that the S1/S2 junction is in a disordered, solvent-exposed loop, 90 which is hypothesized to be responsible for the effective viral transmission. 91,92
Another cleavage event at an additional site inside the S2 subunit, the S2′ site (Figure 5A), brings the viral and cellular membranes together, eventually creating a fusion pore that lets the viral genome reach the cell cytoplasm. ACE2 binding by the virus exposes the S2′ site. The S2′ site cleavage by transmembrane protease serine 2 (TMPRSS2) results in the release of the fusion peptide. 86 In the SARS-CoV-2 S protein, the first S1/S2 cleavage site is at residue R685 89,93 (or R684 from a different study 64), whereas the second cleavage site generating the S2′ subunit is located at residue R816 94 (or R815 64,95), bordering the FP located at residues 816−855. 96 An intrinsic disorder profile generated for the spike protein of the SARS-CoV-2 virus by the disorder predictor PONDR-VLXT 97,98 (Figure 5B) indicates that both the S1/S2 cleavage site and the FP are located within IDRs. Considering the notion that proteolytic digestion is considerably faster in unstructured than in structured protein regions, 45,99 this structural specificity of the SARS-CoV-2 spike protein might be of high functional importance (Figure 5B).
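The reasoning above, that a cleavage site lies inside a predicted IDR, amounts to thresholding a per-residue disorder profile and checking whether the site falls in one of the resulting segments. The sketch below illustrates that check. It is not the PONDR-VLXT algorithm, it does not reproduce Figure 5B, and the score list is a hypothetical toy profile invented for illustration. The 0.5 cutoff is the conventional threshold for predictors of this kind and is treated here as an assumption; the function names are likewise introduced only for this example.

```python
from typing import List, Sequence, Tuple


def disordered_segments(
    scores: Sequence[float], threshold: float = 0.5, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score >= threshold.

    Runs shorter than min_len residues are discarded.
    """
    segments = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i  # a new disordered run begins here
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


def site_in_segments(site: int, segments: List[Tuple[int, int]]) -> bool:
    """True if the 1-based residue number lies inside any segment."""
    return any(start <= site <= end for start, end in segments)


# Hypothetical per-residue scores for a 12-residue toy protein.
toy_scores = [0.1, 0.2, 0.6, 0.7, 0.8, 0.3, 0.2, 0.55, 0.9, 0.4, 0.1, 0.6]

segments = disordered_segments(toy_scores)
print(segments)
print(site_in_segments(4, segments), site_in_segments(6, segments))
```

Applied to a real predictor output for the spike protein, the same check could be run for the residue numbers quoted in the text (R685 or R684 for S1/S2, R816 or R815 for S2′); the `min_len` parameter lets one ignore isolated single-residue spikes in the profile.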
The S1 subunit of the S protein contains an RBD and an amino-terminal (N-terminal) domain. The RBD (residues 333−526 100) has a β-sheet core, bordered on either side by a short helix, and contains a receptor-binding motif (RBM, residues 438−508 101), which makes extensive contact with the ACE2 receptor, thus accounting for the interaction with ACE2. The tyrosine-rich RBM, which is stabilized by disulfide bonds, is also characterized by recognizable structural flexibility; 101 it does not possess a regular secondary structure except for the two small β-sheets. 62 During SARS-CoV-2 virus infection, intrinsically disordered regions are detected at the interface of the spike protein and the ACE2 receptor, providing a shape match to the complex. The key residues of the spike protein have strong binding affinity to ACE2, which is a likely reason for the higher transmission rate of SARS-CoV-2. 70 Thus, receptor binding and membrane fusion are the initial and important steps in coronavirus infection and serve as primary targets for inhibiting viral entry. Both exhibit regions of substantial intrinsic disorder, supposedly responsible for the viral cycle and pathogenicity. 70 Altogether, even though the spike protein of SARS-CoV-2 exhibits significant levels of structural order, its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV (cf. 1.41 for SARS-CoV-2 vs 1.12 for SARS-CoV 6), which may correlate with the higher infectivity and pathogenicity of SARS-CoV-2. Moreover, it was recently shown that the SARS-CoV-2 S glycoprotein exhibits a furin cleavage site at the boundary between the S1 and S2 subunits, which sets this virus apart from other SARS-CoVs, supposedly enhancing its transmissibility due to the ubiquitous distribution of furin-like proteases in host cells. 54
A recent study made an advance in approaching anti-COVID-19 drug discovery by specifically focusing on targeting disordered protein regions in the virus proteome. 67 It was demonstrated how these IDRs can be targeted through molecular docking. As a result, 11 new drug candidates were identified that exhibited high binding and activity scores, as well as good antiviral properties. 67
3.3. M Protein Is Highly Ordered, Which May Contribute to the Greater Virulence of the SARS-CoV-2. The membrane (M) protein, one of the four structural proteins in the coronavirus virion, is a major transmembrane protein that is found in large numbers in the virion. It is a part of the protective proteinaceous layer responsible for the survival of the virus during its transmission between hosts. Estimates of the intrinsic disorder indicate that SARS-CoV-2 has one of the hardest protective outer shells among coronaviruses (Figure 6): the percentage of intrinsic disorder of its M protein is only 6% (cf. 8% for SARS-CoV, 9% for MERS-CoV, 11% for HCoV-NL63, etc.). 103 Structural studies have confirmed the highly ordered organization of the M protein. 104 Thus, the SARS-CoV-2 virus might be anticipated to be highly resistant to antimicrobial substances in saliva and/or other bodily fluids, as well as in the environment outside of the body. It is therefore expected to remain active for a longer time, which may account for the greater contagiousness of SARS-CoV-2. Indeed, a correlation has been confirmed between the virulence of various viruses and the percentage of intrinsic disorder of their M protein, with the less disordered viruses being more contagious. 105 A model that estimates the percentage of intrinsic disorder of viral proteins demonstrates that the degree of shell disorder in coronaviruses correlates with the levels of their fecal−oral and respiratory transmission. 106 The N protein is also important for the model, as it exhibits the greater disorder in the inner shell related to the mode of infection and virulence in other viruses. 103 In fact, it seems reasonable that the performance of the virus depends on a delicate balance between the rigidity of its protective outer shell on one side, with the M protein as a major constituent responsible for the virus survival, and the flexibility and adaptability of the inner core on the other side, especially the N nucleoprotein, upon RNA encapsulation throughout the virus life cycle, promoting binding to host proteins. Thus, the M protein orderliness and the N protein disorder seem to contribute to the high virulence and infectivity of SARS-CoV-2. It is also worth noting that, although the spike protein of SARS-CoV-2 exhibits significant levels of structural order, its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV, 6 which may also contribute to the higher infectivity and pathogenicity of SARS-CoV-2.
In the group of the nonstructural SARS-CoV-2 proteins, several moderately disordered proteinsNsp8, ORF6, and ORF9bhave been identified, as well. 6 The C-terminal region of the nonstructural protein Nsp2 has been predicted to be disordered. 107 Because it interacts with host proteins, which regulate translation initiation and endosome vesicle sorting, compounds that block these interactions could be valuable candidates for drug development. Although the other proteins exhibit lower disorder content, nearly all contain at least one IDR. Moreover, intrinsic disorder has been reported at the cleavage sites of replicase protein 1ab of SARS-CoV-2. 6 Generally, there is a large variation in the distribution of disordered residue fractions in the viral proteomes, with certain viral species highly enriched in intrinsic disorder. 40 Examples of viral proteins with a high degree of intrinsic disorder (>80%) according to the DisProt database 108 are shown in Table 1. Indeed, the high content of intrinsic disorder predicted in viruses agrees with a recent study which showed that viral proteins were significantly enriched in polar residues and depleted in hydrophobic residues compared with that of archaea and bacteria, 109 which correlates with their disorder- edness. 110,111 The high intrinsic disorder in viral proteins is supposedly linked to important functional implications, helping the viruses to highjack various pathways of the host cells and/or to accommodate to their hostile habitats. 110,111 The roles of IDPs/IDRs in certain viral proteins upon viral infections have been explored but are not yet completely understood, including membrane-binding protein λN of bacteriophage, horde virus protein TGBp1, influenza virus nonstructural protein 2, basic protein δAg of hepatitis B virus, and human adenovirus type 5. 1 4. 
OUTLINE OF THE PROSPECTIVE DRUG TARGET SITES
As discussed above, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the existing ones contribute significantly to the functioning and virulence of the virus and are thus promising targets for antiviral drug discovery. 6,66,67 Therefore, it might be productive to focus on targeting those intrinsically disordered protein regions that lack a stable structure instead of following the common drug discovery approach based on the conventional protein structure−function paradigm, which designs drugs to bind to fixed 3D structures. Moreover, such an approach has already proven valuable in designing new SARS-CoV-2 drug candidates targeting IDRs. 67
• The nucleocapsid (N) protein plays a key role in the formation of the ribonucleoprotein core via interaction with the genomic RNA, a critical step in the virus life cycle. Thus, the three IDRs in the N protein appear to be appropriate targets for inhibiting its interaction with the viral genomic RNA. 70
• The N protein segregates into stress granules through liquid−liquid phase separation with Ras-GTPase-activating protein SH3-domain-binding protein (G3BP) and obstructs the interaction of G3BP with other stress-granule-related proteins. The process requires active involvement of the intrinsically disordered region 1 (IDR1) of the N protein. Disruption of the N protein liquid−liquid phase separation process holds promise for antiviral intervention and offers new targets and strategies for the development of drugs to combat COVID-19.
• Because of the vital function of the spike (S) protein in the entry of coronaviruses into the host, it is a reasonable target for inhibition by neutralizing antibodies, and characterization of the spike protein structure provides strategic information for rational vaccine design. Receptor binding and membrane fusion are the initial and important steps in coronavirus infection; both involve specific intrinsically disordered regions of the protein as key players and serve as primary targets for inhibiting viral entry. Neutralizing antibodies targeting the SARS-CoV-2 S trimer have conferred protection from viral infection in animal models and are currently being assessed as therapeutics in humans. 62 Such antibodies include human monoclonal antibodies isolated from donors recovering from COVID-19 and single-domain nanobodies. Although some neutralizing antibodies target the NTD or the S2 domain, most of them bind to the RBD, producing steric hindrance and thus blocking ACE2 attachment. 84 Recent biophysical modeling suggested that the SARS-CoV-2 spike protein may provide an adaptable allosteric mechanism used to fine-tune the response to antibody binding, which may be useful for therapeutic intervention by targeting specific hotspots of allosteric interactions. 112
• Research consortia have been established with the goal of bringing together scientists with different expertise to advance our understanding of COVID-19 and speed up drug discovery. 113,114 A COVID-19 NMR research consortium was set up in 2020, aiming to support the search for antiviral drugs using NMR-based screening and to establish protocols for large-scale production of all druggable SARS-CoV-2 proteins and RNAs for rational drug design and fast mapping of compound binding sites. 63

CONCLUSIONS AND OUTLOOK
The notion of the abundance of IDPs and IDRs in proteomes is altering protein science. The concept of protein intrinsic disorder provides answers to many scientific problems that cannot be easily explained by the classic structure−function paradigm. Functions of IDPs/IDRs can be controlled by multiple processes, such as post-translational modifications and interactions with various other molecules. The multilevel structural and functional complexity of IDPs/IDRs makes them particularly sensitive to the environment. Furthermore, intrinsic disorder may be related to the emergent behavior of several systems characterized by the presence of specific patterns and can be used to explain biochemical reaction−diffusion processes. Understanding how structured and disordered proteins work in concert is crucial for understanding protein functions. The appearance of novel viruses and associated epidemics around the globe is currently a major concern. Knowledge of the structures and functions of the viral proteins is of utmost significance for the identification of novel therapeutic targets for the prevention and treatment of viral diseases. 63 In this Perspective, we summarized the information available on the SARS-CoV-2 proteome with regard to the occurrence of intrinsic disorder in SARS-CoV-2 proteins. It has been recognized that the SARS-CoV-2 proteome exhibits substantial levels of structural order. Within the SARS-CoV-2 proteome, only the nucleocapsid (N) protein is highly disordered, with an average percentage of predicted intrinsic disorder of 65%. The spike (S) protein of SARS-CoV-2 exhibits significant levels of structural order, yet its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV.
Furthermore, although the other structural proteins of SARS-CoV-2 apart from the N protein are characterized by a low degree of disorder, their existing IDRs contribute