Screening for Inhibitors of Main Protease in SARS-CoV-2: In Silico and In Vitro Approach Avoiding Peptidyl Secondary Amides

In addition to vaccines, antiviral drugs are essential for suppressing COVID-19. Although several inhibitor candidates have been reported for the SARS-CoV-2 main protease, most are highly polar peptidomimetics with poor oral bioavailability and cell membrane permeability. Here, we conducted structure-based virtual screening and in vitro assays to obtain hit compounds belonging to a new chemical space, excluding peptidyl secondary amides. In total, 180 compounds were subjected to the primary assay at 20 μM, and nine compounds with inhibition rates of >5% were obtained. The IC50 values of six compounds were determined in dose–response experiments and were on the order of 10⁻⁴ M. Although nitro groups were enriched in the substructure of the hit compounds, they did not significantly contribute to the binding interaction in the predicted docking poses. Prediction of physicochemical properties indicated good oral absorption. These new scaffolds are promising candidates for future optimization.


■ INTRODUCTION
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), became a pandemic in 2020 and is still highly prevalent. 1 Although some effective vaccines have been developed 2−4 and are being widely administered, 5,6 the disease is far from being completely eradicated because of poor public compliance with containment protocols, vaccine breakthrough infections, and the emergence of mutant strains. 7,8 In addition, although antiviral drugs such as remdesivir have shown some efficacy in drug repositioning studies, 9 no effective and specific SARS-CoV-2 antiviral drugs are available.
The most common strategies for identifying new anticoronaviral drugs target RNA-dependent RNA polymerase and the main protease (3-chymotrypsin-like protease). 10,11 The main protease is an enzyme that cleaves the viral polyprotein and is essential for viral replication. 12 It shows glutamine-specific cleavage activity that has not been observed in human proteases, 13,14 and it is highly conserved among coronaviruses such as those that cause severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome, 15 making it a suitable target for drug discovery. 16 Here, we list representative main protease inhibitors that have been discovered previously. Their structural formulas, including the earliest and most recent ones, are shown in Figure 1a. Many of these inhibitors are peptidomimetics.
• N3 is a substrate-mimicking covalent inhibitor identified in a study of SARS-coronavirus (CoV). 17 This inhibitor covalently binds to a cysteine residue in the active site as a Michael acceptor; however, because of its high polarity, it exhibits low membrane permeability and is not effective in vivo. 18,19
• GC376 is a dipeptide-based inhibitor of main protease that was originally developed for treating feline infectious peritonitis and is a broad-spectrum anticoronaviral drug. 20−22
• Pfizer is currently conducting clinical trials for oral (PF-07321332; nirmatrelvir) and intravenous (PF-07304814; lufotrelvir) candidate inhibitors. 23−25 The optimization of PF-07321332 started from PF-00835231 (active form of PF-07304814). A nitrile group was introduced as a covalent warhead to react with the cysteine residue in the active site (Figure 1b). 26 After several optimization steps, the final structure of PF-07321332 resulted in a derivative of the merged pharmacophores of boceprevir 19 and PF-00835231.
• GRL-2420 is a tripeptide-based inhibitor 27 that was originally found in a study of SARS-CoV.
• CVD-0013943 is an inhibitor discovered as part of the COVID Moonshot project, 28 an open science challenge to fight the global pandemic. 29−31 CVD-0013943 is smaller in size than other peptidic inhibitors and was shown to have low toxicity but also low metabolic stability. 32,33
Various other inhibitors are currently being developed by pharmaceutical companies; 34−36 however, some of the structural formulas of these inhibitors have not been disclosed.
The above survey of inhibitors is unfortunately consistent with the tendency of protease inhibitors to contain an amide structure, 37,38 and many active compounds identified to date are highly polar compounds containing amides, as shown in Figure 1a. The main protease is an enzyme that functions inside the virus-infected cell; 16 thus, compounds must penetrate the cell membrane to inhibit the protease. Amide compounds sometimes show low membrane permeability because of their polarity, or are degraded by proteases, 39,40 and structural conversion is necessary in some cases, particularly to ensure oral bioavailability. Therefore, nonamide active compounds should be identified to expand the chemical space of hit compounds and increase the success rate of novel drug discovery.
In this study, we created a subset of a screening compound library that excludes the peptidyl secondary amide structure and conducted a hit search for nonamide compounds using structure-based virtual screening (SBVS), a rational in silico physicochemical simulation method. Candidate compounds for the assay were selected by SBVS, which is more useful than ligand-based virtual screening (LBVS) for identifying novel scaffolds. 41−44 The 180 compounds extracted by SBVS were subjected to enzyme inhibition assays to confirm their activity, and six compounds showing activity were obtained.

■ RESULTS
In Silico Screening. The Enamine library (3 341 762 compounds) was filtered to 99 765 compounds to avoid peptidyl secondary amides. These compounds were ranked by conventional rigid docking simulation, 45 using three Protein Data Bank (PDB) structures as targets: 6M0K.PDB, 46,47 7JKV.PDB, 27,48 and Mpro ligand x12073. 49 According to the docking scores and visual inspection, 180 compounds were selected for in vitro assays. The list of 180 compounds is shown in Table S1 in the Supporting Information. The PDB ID of the docking target protein structure from which the selection of each compound was derived is also shown in this table. A comparison of the properties of the 180 compounds evaluated in this study and two sets of known hit compounds is shown in Figure 2. The two sets consist of ChEMBL registered compounds and COVID Moonshot compounds with submicromolar activities. The 180 compounds assayed in this study do not contain peptidyl secondary amides; however, compounds containing lactams or tertiary amides were not excluded (Figure 2a). Principal component analysis (PCA) plots of each group of hit compounds, based on Morgan fingerprints, are shown in Figures 2b and 2c. Although there was some overlap between the sets in the chemical space, the 180 compounds assayed in this study were generally located in a new space.
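The score-based part of this selection can be illustrated with a small sketch. The Experimental Section ranks compounds by the Rerank score and by LE3, defined there as the Rerank score divided by heavy atom count, and then picks low-scoring outliers by hand from a scatter plot. The code below computes LE3 as defined and approximates the manual outlier choice with a simple cutoff of one standard deviation below the mean on both axes. That cutoff, the function names, and the compound names and scores are all hypothetical and are not the authors' procedure or data.

```python
from statistics import mean, stdev


def le3(rerank_score: float, heavy_atoms: int) -> float:
    """LE3 as defined in the text: Rerank score divided by heavy atom count."""
    return rerank_score / heavy_atoms


def low_outliers(compounds, k=1.0):
    """Names of compounds whose Rerank score and LE3 are both more than
    k sample standard deviations below the mean (lower is better).
    This cutoff is an illustrative stand-in for manual selection."""
    rerank = [c["rerank"] for c in compounds]
    le = [le3(c["rerank"], c["heavy_atoms"]) for c in compounds]
    rerank_cut = mean(rerank) - k * stdev(rerank)
    le_cut = mean(le) - k * stdev(le)
    return [
        c["name"]
        for c, le_value in zip(compounds, le)
        if c["rerank"] < rerank_cut and le_value < le_cut
    ]


# Hypothetical docking results (not taken from the study).
compounds = [
    {"name": "cpd_A", "rerank": -120.0, "heavy_atoms": 24},
    {"name": "cpd_B", "rerank": -80.0, "heavy_atoms": 25},
    {"name": "cpd_C", "rerank": -85.0, "heavy_atoms": 28},
    {"name": "cpd_D", "rerank": -78.0, "heavy_atoms": 26},
    {"name": "cpd_E", "rerank": -82.0, "heavy_atoms": 27},
]

selected = low_outliers(compounds)
print(selected)
```

Dividing by heavy atom count keeps large molecules from dominating the ranking purely through size, which is why a size-normalized score is plotted alongside the raw Rerank score.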
Primary Assay. The 180 compounds selected by in silico screening were examined by an in vitro fluorescence assay. The mechanism of the assay system is shown in Figure 3a. The assay system was validated using GC376 20 as a positive control (Figure 3b). All test compounds were assayed at 20 μM. Among the 180 compounds, nine compounds showed inhibition rates of >5% (see Table S1 in the Supporting Information).
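The Experimental Section computes the inhibition rate from the fluorescence signal of each well, the mean of the plate controls (enzyme, no compound), and the mean of the plate blanks (saturating reference inhibitor). A minimal sketch of that normalization follows, assuming the conventional form in which the controls define 0% inhibition and the blanks define 100%. The function name, well names, and fluorescence values are hypothetical.

```python
from statistics import mean


def percent_inhibition(signal, controls, blanks):
    """Percent inhibition of one well, normalized so that the control
    mean maps to 0% and the blank mean maps to 100%."""
    control_mean = mean(controls)
    blank_mean = mean(blanks)
    return 100.0 * (control_mean - signal) / (control_mean - blank_mean)


# Hypothetical fluorescence readings in arbitrary units.
controls = [1000.0, 980.0, 1020.0]   # enzyme, no compound
blanks = [100.0, 90.0, 110.0]        # saturating reference inhibitor
wells = {"cpd_1": 991.0, "cpd_2": 910.0, "cpd_3": 1003.0}

# Apply the >5% threshold used to choose compounds for dose-response.
hits = [
    name
    for name, signal in wells.items()
    if percent_inhibition(signal, controls, blanks) > 5.0
]
print(hits)
```

A well reading slightly above the control mean gives a small negative value, which is expected noise and falls below the threshold.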
Dose Response Experiment. Dose−response experiments were conducted for compounds whose inhibition rates in the primary assay were >5%. The dose−response curves are shown in Figure 3c. The concentrations of the compounds that reduced enzyme activity by 50% (IC50) were determined for six compounds: Z391132396, Z166626994, Z819866548, Z2094146478, Z1159100304, and Z324552662. In the counter assay, no signal interference was detected for the six compounds. The structural formulas of these six hit compounds are shown in Figure 4a. Figure 4b shows the positions of these hit compounds in the PCA plot in Figure 2c.
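To see how a modest inhibition rate at 20 μM relates to an IC50 on the order of 10⁻⁴ M, a simple Hill-type dose-response curve can be evaluated. This is only a sketch: the text does not state which model was fitted, so the Hill form, the slope of 1, and the round IC50 of 100 μM are assumptions chosen for illustration.

```python
def hill_inhibition(conc_molar, ic50_molar, hill_slope=1.0):
    """Percent inhibition predicted by a Hill-type curve.
    Gives 50% when conc_molar equals ic50_molar."""
    if conc_molar <= 0 or ic50_molar <= 0:
        raise ValueError("concentrations must be positive")
    return 100.0 / (1.0 + (ic50_molar / conc_molar) ** hill_slope)


# Assumed IC50 of 100 uM (1e-4 M), evaluated at the 20 uM screening concentration.
at_screening_conc = hill_inhibition(20e-6, 1e-4)
print(round(at_screening_conc, 1))
```

Under these assumptions, a compound with an IC50 of 100 μM would show roughly one-sixth inhibition at the screening concentration, which is consistent with weak hits being detectable only with a low primary-assay threshold.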
Redocking of the Six Hit Compounds. Since five of the six hit compounds were candidates obtained from docking against ligand Mpro-x12073, each hit compound was redocked against ligand Mpro-x12073. A superimposed image of the docking poses of the six compounds (white) is shown in Figure 4c. Ligand x12073 is colored yellow. All ligands except for Z324552662 were in positions that roughly overlapped with ligand x12073 and occupied the P1−P2 pocket. Z324552662 protruded into the P1′ pocket. The nitro groups faced the outside of the cavity. The 2D diagrams of the docking poses are shown in Figure S1a in the Supporting Information.
ADME Prediction of Compounds. Absorption, distribution, metabolism, and excretion (ADME) predictions for the compounds assayed in this study and sets of known inhibitors were performed using SwissADME 50 (see Table S2 in the Supporting Information). The six compounds identified in this study generally satisfied Lipinski's Rule of Five and were predicted to be orally absorbable; structural alerts include nitro groups, whereas no structures corresponded to pan-assay interference compounds (PAINS; compounds prone to false positives). 51 IC50 values, structural alerts, and cLogP values for each compound are shown in Table 1.

■ DISCUSSION
Of the 180 compounds assayed, we obtained six compounds that showed main protease inhibitory activity at high concentrations and for which IC50 values could be determined. We identified a tertiary amide compound, a sulfonamide compound, a compound containing a lactam structure, and three compounds without an amide bond. According to the prediction by SwissADME, a certain level of membrane permeability appears likely because amides were avoided in the screening process. Some hit compounds are weakly reactive; for example, those containing a nitro group are weak electrophiles. As we did not perform a counter assay using other enzymes, the target specificity of these compounds remains unknown. Twelve of the 180 compounds assayed contained nitro groups, and 4 of the 6 hit compounds contained nitro groups. Because nitro-containing compounds were enriched among the hits, the electrophilic nature of the nitro groups may have conferred reactivity toward the cysteine protease. Formation of a thiohydroximate adduct in the reaction of a nitro group with a cysteine residue in the active site has been reported. 52 In fragment screening conducted prior to the COVID Moonshot project, 53 an electrophile library 54,55 oriented toward covalent bonding was used. Since these electrophile libraries did not cover nitro groups, they evaluated a different chemical space from that examined in the present study. Based on the predicted docking poses (Figure S1a in the Supporting Information), covalent interactions were not suggested by our results, because the nitro groups of each hit compound were not located close to the cysteine residues in the active site. In addition, because these nitro groups are not responsible for strong interactions in the docking poses, they are considered bioisosterically substitutable 56,57 or removable if necessary (Figure S1b in the Supporting Information).
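The nitro-group enrichment can be quantified from the counts given above (12 of 180 assayed compounds and 4 of 6 hits). The sketch below computes the fold enrichment and, as an added illustration that is not part of the original analysis, a one-sided hypergeometric tail probability. That test assumes the six hits behave like a random draw from the 180 assayed compounds, which is a simplification.

```python
from math import comb


def fold_enrichment(hits_with, hits_total, lib_with, lib_total):
    """Ratio of the feature frequency among hits to its frequency
    among all assayed compounds."""
    return (hits_with / hits_total) / (lib_with / lib_total)


def hypergeom_tail(hits_with, hits_total, lib_with, lib_total):
    """Probability of drawing at least hits_with feature-bearing compounds
    when hits_total compounds are drawn without replacement."""
    total = comb(lib_total, hits_total)
    upper = min(hits_total, lib_with)
    favorable = sum(
        comb(lib_with, k) * comb(lib_total - lib_with, hits_total - k)
        for k in range(hits_with, upper + 1)
    )
    return favorable / total


# Counts taken from the text: 4 of 6 hits and 12 of 180 assayed compounds.
enrichment = fold_enrichment(4, 6, 12, 180)
p_value = hypergeom_tail(4, 6, 12, 180)
print(f"fold enrichment: {enrichment:.1f}, one-sided p: {p_value:.2e}")
```

With only six hits, the estimate is coarse, and it says nothing about mechanism; it only shows that the over-representation is unlikely to arise by chance under the stated assumption.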
In the docking poses, the residue interactions were generally consistent with important hot spots reported previously. 58 All hit compounds showed IC50 values on the order of 10⁻⁴ M, which is weaker than the activities of currently known amide compounds with submicromolar activity. Depending on the case, hit-to-lead optimization can increase activity by hundreds- or thousands-fold. 59 For example, the activity of an amide compound can in some cases be improved by cyclization that restricts the dihedral degree of freedom of the amide bond to the active conformation. 60 However, because the structure of the active site of the main protease fluctuates widely, 61 it may be desirable for the compound to have some degree of freedom in its conformers to accommodate fluctuations in the active site. In contrast, compounds without a peptidyl secondary amide structure are thought to be more stable against cleavage by proteases, and structural optimization may yield a more stable and active inhibitor in vivo. 62 As our hit compounds did not contain a peptidyl secondary amide at the P1−P2 position (Figure 4c), the hit compound structures may be useful as reference scaffolds if amide substitution is necessary to optimize other peptidomimetic inhibitors. In addition, most assay datasets reported to date for main proteases, both positive and negative, consist of compounds containing peptidyl secondary amide structures; thus, our dataset is valuable because it expands the chemical space of assayed compounds. Some inhibitors are currently being evaluated in clinical trials; 23,24 however, it is important to have many candidate compounds, in view of the possible future emergence of resistant viruses. Expanding the number of hit compounds with higher structural diversity is beneficial for drug discovery.
In conclusion, because of their favorable predicted physicochemical properties as oral drugs, the new scaffolds identified in this study will contribute to the advancement of anticoronaviral drug research.

■ EXPERIMENTAL SECTION
Filtering of Screening Compound Library. The September 2020 version of the Enamine Collection, 63 consisting of 3 341 762 compounds, was used as the compound library. To obtain a set of compounds without peptidyl secondary amide bonds, the library was filtered using the following criteria.
• 20 ≤ heavy atom count
Structure-Based Virtual Screening. Compound Conformer Generation. After filtering as described above, conformer generation was performed for the 99 765 remaining compounds using GYPSUM-DL software (version 1.1.7). 64 The following execution options were used: --max_variants_per_compound 1 --use_durrant_lab_filters. The resulting structures were saved as SDF files.
Protein Model Preparation. The protein models for docking simulation were prepared using Molegro Virtual Docker (version 7.0.0). 45 The source PDB structures were 6M0K, 47 7JKV, 48 and Mpro-x12073 49 (COVID Moonshot project 31 ). The models were prepared using the Protein Preparation Wizard in Molegro Virtual Docker (default settings).
Compound Selection by Docking Simulation. Using the compound conformers and protein models, docking simulation was performed using Molegro Virtual Docker. 45 The search space was set as an 8 Å sphere centered at the active site. Docking simulation was performed via the Docking Wizard in Molegro Virtual Docker (Scoring function: PLANTS score, Algorithm: MolDock SE, After docking: Energy Minimization enabled, H-bonds optimization enabled). The compounds were ranked by the Rerank score (linear combination of steric, van der Waals, hydrogen bonding, and electrostatic interactions) and LE3 score (Rerank score divided by heavy atom count). A 2D scatter plot was drawn using the Rerank score and LE3, and we manually chose compounds that were outliers with better (lower) values in the distribution. High-scoring compounds were further assessed by visual inspection to check for key hydrogen bonds and shape fitting. Briefly, docking poses fixed by hydrogen bonds at both ends or at three or more points of the compound in the cavity of the active site were selected. Docking poses with geometric centers too close to the walls of the cavity and showing a low filling rate of the cavity were avoided. We also avoided compounds with unfavorable torsions in the conformer of the docked pose. Finally, 180 compounds were chosen for in vitro analysis.
Primary Assay Protocol. In one 384-well plate, 150 nL of compounds was placed in columns 3−22 and dimethyl sulfoxide was added to columns 1−2, except for wells A1−2 and B1−2, which were filled with the reference protease inhibitor GC-376 at the IC50. Columns 23 and 24 were filled with 150 nL of the reference compound GC-376 at a saturating concentration of 3.7 μM. Next, 7.5 μL of 3CL protease 2× solution (4.5 μg/mL in 1× assay buffer with 1 mM DTT) was added to all wells, using a Multidrop Combi Reagent Dispenser (Thermo Fisher Scientific, Waltham, MA, USA).
The enzyme was preincubated with the compounds for 30 min at room temperature (25 °C) with slow shaking. Each well of the plate was filled with 7.5 μL of 3CL substrate 2× solution (30 μM in assay buffer with 1 mM DTT) using a Multidrop Combi Dispenser (Thermo Fisher Scientific, Waltham, MA, USA). The final concentration of test compounds was 20 μM. The plate was incubated for 20 min at room temperature. The fluorescence (excitation 360 nm, emission 460 nm) was read on a Paradigm reader (Molecular Devices, Sunnyvale, CA, USA).
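The volumes in this protocol determine the dilution of each component in the well. The sketch below works through that arithmetic using the stated volumes (150 nL of compound, 7.5 μL of enzyme 2× solution, 7.5 μL of substrate 2× solution) and the stated final compound concentration of 20 μM. The implied stock concentration is an inference from those numbers and is not stated in the text; the variable and function names are hypothetical.

```python
def final_concentration(stock_conc, added_volume_nl, total_volume_nl):
    """Concentration after diluting added_volume_nl into total_volume_nl.
    The result carries the same unit as stock_conc."""
    return stock_conc * added_volume_nl / total_volume_nl


# Volumes per well, in nanoliters, taken from the protocol.
compound_nl = 150.0
enzyme_nl = 7500.0
substrate_nl = 7500.0
total_nl = compound_nl + enzyme_nl + substrate_nl

# Dilution of the compound and the stock concentration it implies
# for a 20 uM final concentration (inferred, not stated).
dilution_factor = total_nl / compound_nl
implied_stock_um = 20.0 * dilution_factor

# Final enzyme (ug/mL) and substrate (uM) concentrations from the 2x solutions.
enzyme_final = final_concentration(4.5, enzyme_nl, total_nl)
substrate_final = final_concentration(30.0, substrate_nl, total_nl)

print(f"dilution: {dilution_factor:.0f}x, implied stock: {implied_stock_um:.0f} uM")
print(f"enzyme: {enzyme_final:.2f} ug/mL, substrate: {substrate_final:.2f} uM")
```

Because the compound volume is about 1% of the well, the 2× enzyme and substrate solutions end up very close to, but slightly below, half of their starting concentrations.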
Data Analysis of Primary Assay. Each high-throughput screening plate contained a single test compound per well in columns 3−22, controls (enzyme, no compound) in columns 1 and 2, and blanks (saturating concentration of the reference compound GC-376) in columns 23 and 24. The high-throughput screening percent inhibition was calculated for each compound from the signal in fluorescence units, mean of the plate controls, and mean of the plate blanks using the following equation: percent inhibition = 100 × (mean of plate controls − signal) / (mean of plate controls − mean of plate blanks).
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c01087.
• Predicted binding poses of the six hit compounds found in this study (Figure S1) (PDF)
• List of assayed compounds in this study (Table S1) (XLSX)
• ADME prediction result of assayed compounds in this study (Table S2) (XLSX)
• Purity information on six hit compounds (