Analysis of Approaches to Anti-tuberculosis Compounds

Mycobacterium tuberculosis (Mtb) remains a deadly pathogen two decades after the announcement of tuberculosis (TB) as a global health emergency by the World Health Organization. Medicinal chemistry efforts to synthesize potential drugs to shorten TB treatments have not always been successful. Here, we analyze physiochemical properties of 39 TB drugs and 1271 synthetic compounds reported in 40 publications from 2006 to early 2020. We also propose a new TB space of physiochemical properties that may provide more appropriate guidelines for design of anti-TB drugs.


■ INTRODUCTION
Tuberculosis (TB), a communicable disease caused by Mycobacterium tuberculosis (Mtb), is one of the top 10 causes of death worldwide and the leading cause of death from a single infectious agent (ranked above HIV/AIDS). 1 It typically affects the lungs (pulmonary TB) but can also affect other sites (extrapulmonary TB). The disease spreads when people with pulmonary TB expel the bacteria into the air, for instance, by coughing or sneezing. TB has remained a global health problem because Mtb adopts different strategies to survive in a variety of host lesions. The pathogen has also become resistant to currently available drugs, and this is one of the main reasons for the failure to control the spread of TB.
The current initial treatment of TB involves taking four drugs (isoniazid 1, rifampicin 2, pyrazinamide 3, and ethambutol 4) daily for two months, followed by four months of rifampicin and isoniazid in the continuation stage. This regimen is currently used for most cases of TB and has been successful in the treatment of 80−90% of patients with drug-sensitive TB. However, due to the increasing number of multi-drug resistant TB (MDR-TB) and HIV/TB co-infection cases, TB is still a leading cause of death worldwide. 1 MDR-TB, caused by Mtb bacilli that are resistant to at least rifampicin and isoniazid, requires at least 20 months of treatment with drugs that are more toxic, less effective, and poorly tolerated, with cure rates of only 60−70%. 1 TB/HIV co-infection brings further complications. In developing countries, TB is the main cause of death among HIV-infected people. In addition, interactions between anti-TB drugs and anti-retrovirals enhance the risk of adverse effects and make the treatment more complicated. TB treatment is challenging and requires early diagnosis, accurate and effective chemotherapy regimens, and drug-resistance screening. Development of shorter and simpler drug regimens that are safe, well tolerated, and suitable for joint TB/HIV treatment is essential.
Here, we review synthetic anti-mycobacterium compounds reported in 40 publications from 2006 to 2020, selected to contain diverse chemical classes, and present an analysis of the drug-like properties of the reported compounds to inform better strategies for synthesis of new anti-TB compounds.
Assessing Physiochemical Properties and Druggability. In 1997, Lipinski proposed the "Rule of Five" (Ro5) as a result of the analysis of around 90% of orally active drug candidates that were in phase II clinical trials in order to understand which factors contributed to compound attrition in clinical development. 2 Ro5 is based on four simple physiochemical properties: hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), molecular weight (MW), and the logarithm of the partition coefficient of a molecule between lipophilic and aqueous phases, usually octanol and water (log P). The rule takes its name from the cut-offs for these four properties, which are five or multiples of five. To be orally bioavailable and cell permeable, a candidate molecule should have no more than five HBDs (HBD ≤ 5), no more than ten HBAs (HBA ≤ 10), a MW ≤ 500, and a log P ≤ 5. If a compound violates two or more of these cut-offs, there is a high probability that it will lack bioavailability and oral activity. 2 At the same time, there is no guarantee that a molecule is druggable if it passes the Ro5. The rule is used as a guide for better selection and design of compounds to reduce attrition in clinical development due to unsatisfactory pharmacokinetics, and it is not an absolute set of strict guidelines. 3 Lipinski's rule is sometimes misleading. For instance, some undesirable compounds can pass the Ro5 and, therefore, be considered druggable, whereas more appropriate compounds can fail due to the violation of one or more cut-offs. Subsequently, Hopkins and co-workers proposed the "quantitative estimate of drug-likeness" (QED), which is a measure of drug-likeness based on the concept of desirability. 4 QED integrates eight desirability functions, one for each of the following physiochemical properties: MW, log P, HBD, HBA, polar surface area (PSA), rotatable bonds (ROTBs), aromatic ring count (RNG), and number of alerts.
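The Ro5 cut-offs described above can be written as a short check. The following is a minimal Python sketch, assuming the four descriptors have already been calculated elsewhere; the `Descriptors` container, the function names, and the example values are hypothetical and are not taken from the analyzed publications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mw: float     # molecular weight, Da
    clogp: float  # calculated log P
    hbd: int      # number of hydrogen bond donors
    hba: int      # number of hydrogen bond acceptors


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the Ro5 cut-offs exceeded by the compound."""
    checks = {
        "MW > 500": d.mw > 500,
        "log P > 5": d.clogp > 5,
        "HBD > 5": d.hbd > 5,
        "HBA > 10": d.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as compliant if it has at most one violation."""
    return len(ro5_violations(d)) <= max_violations


# Illustrative values only (not measured data).
example = Descriptors(mw=555.0, clogp=7.13, hbd=1, hba=4)
print(ro5_violations(example), passes_ro5(example))
```

Allowing one violation by default mirrors the way compliance is counted in this analysis ("follow all Ro5 parameters or have just one violation"); setting `max_violations=0` gives the strict reading.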
One of the useful properties for predicting oral bioavailability and activity is the rotatable bond count. Every two ROTBs decrease ligand affinity by 0.5 kcal/mol on average. 5 If both a rigid and a flexible ligand bind to a protein with the same pattern of interaction (based on hydrogen bonding and hydrophobic interactions), the rigid ligand will bind much more strongly due to lower entropic losses. 6 A good orally bioavailable drug usually has 10 or fewer ROTBs and a polar surface area equal to or less than 140 Å², as well as following Lipinski's Ro5. 7 Although QED provides a richer and more reliable concept than Ro5, it is also not the final word in understanding drug-likeness features. In 1999, a lead-likeness space was also proposed by Oprea with the following parameters: MW ≤ 350 and log P ≤ 3. 8 Herein, the four individual properties MW, log P, HBD, and HBA as well as PSA and ROTB were analyzed for 39 TB drugs (approved drugs and candidates in clinical trials) and 1271 synthetic anti-tubercular compounds reported in 40 publications between 2006 and early 2020. 9 The TB drugs are listed in Table 1. 50 It is also noted that the anti-TB oxazolidinone posizolid (AZD5847), which completed a phase II clinical trial, showed similar potential activities to the other anti-TB agents listed; however, it is excluded from this analysis because its development has been discontinued. The percentage of TB drugs and synthetic anti-tubercular compounds compliant with Lipinski's rule is shown in Figure 1. While almost 72% of TB drugs (28 drugs) follow all Ro5 parameters or have just one violation, 28% of the drugs (11 drugs) have two or more violations. These values were 77 and 23%, respectively, in our previous study published in 2014. 51 Two of the drugs which violate Lipinski's rule are intravenous/injectable drugs (streptomycin 5 and amikacin 6). 
Four are orally bioavailable drugs (rifampicin 2, bedaquiline 16, delamanid 17, and clarithromycin 19) and five
The histograms of MW, calculated log P (clog P), HBDs, and HBAs as well as PSA and ROTBs for TB drugs (blue bars) and the analyzed synthetic compounds (pink bars) are shown in Figure 2. The combined histograms for the most active compounds among the analyzed synthetic compounds can also be found in the Supporting Information (Figure S1). About 49% of TB drugs have MWs of 300−500 Da, and all of these are either synthetic compounds with novel structures such as TBI-223 32, delpazolid 34, and SQ109 38 or semisynthetic compounds derived from natural products such as spectinamide 1810 24. About 28% of the TB drugs have MWs of more than 500 Da. The lowest and highest MWs belong to nature-derived TB drugs. Cycloserine 8, pyrazinamide 3, isoniazid 1, and p-aminosalicylic acid 7 possess the lowest MWs of 100−150 Da, while the highest MWs of more than 700 Da occur in rifampicin 2 and clarithromycin 19.
The distribution of clog P shows a wide range (Figure 2b). About 54% of TB drugs possess clog P values in the range −1 ≤ clog P ≤ 3. About 18% have a clog P of more than 5; these include clofazimine 15, bedaquiline 16, delamanid 17, TBAJ-587 25, TBAJ-876 26, TBI166 31, and telacebec Q203 39. The most polar TB drug is the natural product streptomycin 5 with a clog P of about −7. Overall, synthetic drugs have higher clog P values than nature-derived TB drugs.
The distribution of HBDs shows a steady decrease starting from a maximum at 1, while a wide range of variability is observed in the distribution histogram of HBAs. About 10% of the TB drugs violate the HBD cut-off values, while this number is slightly higher for HBA with about 15% violation. Rifampicin 2, streptomycin 5, amikacin 6, and spectinamide 1810 24 are four TB drugs violating Lipinski's rule for both HBD and HBA values.
Two more physiochemical properties which have been analyzed are PSA and ROTBs. Violations of the desirable values for PSA and ROTB occur in about 13 and 8% of TB drugs, respectively. All TB drugs having PSAs of more than 140 Å² are nature-derived drugs. All three TB drugs having more than 10 ROTBs are synthetic drugs currently in clinical trials: auranofin 33, TBAJ-587 25, and TBAJ-876 26.
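The PSA and ROTB percentages quoted above follow directly from the drug counts given in this analysis (five of 39 TB drugs with PSA > 140 Å² and three of 39 with more than 10 ROTBs). The following Python sketch shows the cut-off test and the percentage arithmetic; the function names are hypothetical, and the flag lists are built from the counts rather than from per-compound data.

```python
PSA_MAX = 140.0  # Å^2, desirable upper limit quoted in the text
ROTB_MAX = 10    # rotatable bonds, desirable upper limit quoted in the text


def exceeds_psa_or_rotb(psa: float, rotb: int) -> dict[str, bool]:
    """Flag which of the two cut-offs a compound exceeds."""
    return {"PSA": psa > PSA_MAX, "ROTB": rotb > ROTB_MAX}


def percent_flagged(flags: list[bool]) -> float:
    """Percentage of compounds whose flag is True."""
    if not flags:
        raise ValueError("flags must not be empty")
    return 100.0 * sum(flags) / len(flags)


# Counts taken from the text: 39 TB drugs, 5 above the PSA limit,
# 3 above the ROTB limit. Which entries are True is arbitrary here.
psa_flags = [True] * 5 + [False] * 34
rotb_flags = [True] * 3 + [False] * 36

print(round(percent_flagged(psa_flags)), round(percent_flagged(rotb_flags)))
```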
TB drugs include molecular exceptions to the Ro5. Previous studies by O'Shea and Moser found that antibacterial compounds, especially those active against Gram-negative bacteria, have higher average MWs and polarity in comparison to non-antibacterial compounds. 52,53 This may be due to the different cell wall architectures of Gram-positive and Gram-negative bacteria, which require different compound properties for penetration. 52 The marketed TB drugs are widely distributed within the physiochemical space.
Physiochemical Properties of Synthetic Anti-mycobacterium Compounds Reported in 40 Publications from 2006 to 2020. A data set of 1271 synthetic anti-TB compounds reported in 40 publications from 2006 to 2020, selected to contain diverse chemical classes, was compiled to analyze the strategies used in the design of these synthesized compounds. The majority of publications reported 15−50 compounds. Only two publications reported fewer than 10 compounds, and only one publication reported more than 50 compounds (115 compounds).
Histograms of the calculated physiochemical properties are depicted in Figure 2. Almost 83% of the synthetic compounds follow Lipinski's Ro5, and the majority of the remainder have violations in either MW, clog P, or both.
Unlike the widely variable distribution of TB drugs in the MW histogram, the synthetic compounds show a Gaussian distribution with a maximum at 350−400 Da. This distribution is similar to the distribution of anti-TB natural products analyzed in our previous studies. 51,54 However, the percentage of natural products with MWs of more than 700 Da was much higher than for the synthetic compounds. There is no synthetic compound with a MW of more than 1000 Da, while about 4% of the analyzed natural products were located in this region. About 21% of synthetic compounds violate Lipinski's rule with MWs of more than 500 Da. All 68 most active compounds reported in the analyzed 40 publications show a MW between 200 and 650 Da, with about 80% following Lipinski's cut-off for MW (Figure S1).
The histogram of clog P shows the biggest variance between TB drugs and the synthetic compounds. The histogram of clog P for the synthetic compounds also shows a Gaussian distribution, with a maximum at 4−5; however, the synthetic compounds are shifted to higher clog P values compared to TB drugs. This shift has also been observed in our previous studies for anti-mycobacterium natural products compared to all natural products. 51 The histograms of clog P for TB drugs and anti-mycobacterial natural products depicted a bimodal pattern in our previous studies; however, this is not observed in the analysis of synthetic anti-mycobacterium compounds using the updated set of TB drugs.
In contrast to TB drugs, of which about one-third have negative values of log P, only 1.5% of synthetic compounds show negative clog P values. The majority of TB drugs (54%) have a clog P value between −1 and 3, while 74% of synthetic compounds have a clog P in the range of 3 to 6. Among the synthetic compounds, 39% violate the Lipinski cut-off for clog P, whereas the violation rate is much smaller for TB drugs at 18%. The clog P histogram for the most active compounds also reveals a Gaussian distribution, with a peak at 5−6. Out of 68 most active synthetic compounds, almost half (51%) possess clog P values between 4 and 6. Only one compound shows a negative clog P value (−0.5), and none of them has a clog P of more than 9.
The distribution of HBDs for the synthetic compounds is similar to that of TB drugs, showing a maximum at 1 followed by a steady decrease. The histogram of HBAs for the synthetic compounds reveals a wide range of variability similar to that of TB drugs. The noticeable difference between TB drugs and synthetic compounds is the percentage of violations of the Lipinski cut-offs for HBD and HBA. The percentages of TB drugs violating the Ro5 are 10 and 15% for HBD and HBA, respectively; however, these percentages are much lower in synthetic compounds, with less than 1% violation for both HBD and HBA. The histograms of HBD and HBA for the most active compounds are also very similar to the related histograms of all synthetic compounds. About 6% of the most active compounds show HBD of more than 5, while all of them follow Lipinski's cut-off for HBA.

ACS Omega
http://pubs.acs.org/journal/acsodf Article
PSA shows a Gaussian distribution with a peak at 70−80 Å². Only 3% of the synthetic compounds have a PSA of over 140 Å², while this percentage is higher in TB drugs, with about 13% violation. The distribution of ROTBs is widely variable in both synthetic compounds and TB drugs. The percentages of synthetic compounds and TB drugs which have more than 10 ROTBs are 5 and 8%, respectively. The histograms of PSA and ROTB for the most active compounds also have the same distribution as the related histograms of all synthetic compounds. The violations of the cut-offs are less than 5% for both PSA and ROTB.
Comparison of Anti-mycobacterial Synthetic Compounds Space Versus Current TB Drugs. As noted above, antibacterial compounds have been reported to have higher average MWs and polarity in comparison to non-antibacterial compounds. 52 The largest difference in our analysis was between the clog P values, with many TB drugs having lower clog P values. Herein, we investigated a putative TB space with MW ≤ 700 and −4 ≤ clog P ≤ 3. Compounds with clog P values higher than 5 are often problematic from a safety perspective, and Figure 2b shows that the majority of TB drugs have a clog P ≤ 3. Scatterplots of TB drugs, the synthetic compounds in each publication and in total, and the most active compound(s) in each publication using MW and clog P as two variables were analyzed (Figure 3).
There are 27 TB drugs in the Lipinski space and 23 in the putative TB space. Out of 1271 synthetic compounds, 719 are located in the Lipinski space and 266 are observed in the TB space. A large number of synthetic compounds possess large MWs and clog P. Out of 68 most active compounds in the analyzed publications, 56 compounds are compliant with Ro5 with none or just one violation. Half of the most active compounds (34 compounds) are located in the Lipinski space and 12 are observed in the TB space. While they are identified as the most active compounds in the respective series, it does not mean that they are active enough to be further evaluated or considered as the potent anti-TB compounds.
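The two regions of the MW versus clog P scatterplot can be expressed as simple membership tests, and the reported counts can be converted into shares of the data set. The sketch below is illustrative only: it assumes that the "Lipinski space" in this plot means MW ≤ 500 and clog P ≤ 5 (the two Ro5 cut-offs that apply to these axes), and the function names are hypothetical.

```python
def in_lipinski_space(mw: float, clogp: float) -> bool:
    """Assumed Lipinski region of the MW-clog P plot: MW <= 500, clog P <= 5."""
    return mw <= 500 and clogp <= 5


def in_putative_tb_space(mw: float, clogp: float) -> bool:
    """Putative TB space investigated here: MW <= 700 and -4 <= clog P <= 3."""
    return mw <= 700 and -4 <= clogp <= 3


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts reported in the text for the 1271 synthetic compounds.
share_lipinski = percent(719, 1271)
share_tb = percent(266, 1271)
print(f"Lipinski space: {share_lipinski:.1f}%, putative TB space: {share_tb:.1f}%")
```

Note that neither region contains the other: a compound with MW between 500 and 700 Da and a low clog P falls in the putative TB space but outside the assumed Lipinski region, whereas a compound with MW ≤ 500 Da and 3 < clog P ≤ 5 does the opposite.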
Scatterplots of some synthetic compounds in each publication using MW and clog P as two variables are depicted in Figure 4. The remaining scatterplots are available in Figure S2.
Different patterns have been observed in the analyzed scatterplots. In some publications, such as publications 1, 7, 13, 17, 18, 21, 25, 28, 29, 30, and 38 and publication 5 (shown in Figure 4a), the synthetic compounds possess very similar MW and clog P values, forming clusters. The clusters are completely inside, partially inside, or completely outside the Lipinski or TB spaces in different publications. For example, publication 28 reported the synthesis of anti-TB compounds forming a cluster located in both the Lipinski and putative TB spaces (Figure 4b).
Since the scatterplots of the compounds using MW and clog P provided useful insight into the physiochemical properties, we decided to evaluate the scatterplots of TB drugs and the synthetic compounds using clog P and PSA, as these are the two physiochemical properties in which the largest differences between TB drugs and the analyzed synthetic compounds are observed. Figure 5a shows the scatterplot of TB drugs using clog P and PSA as two variables. A QED space with −5 ≤ clog P ≤ 5 and PSA ≤ 140 Å² and a putative TB space with −4 ≤ clog P ≤ 3 and 30 ≤ PSA ≤ 140 Å² were selected for further evaluation. The putative TB space is selected based on the appropriate ranges for clog P and PSA and the cluster of TB drugs observed in this region. About 70% of TB drugs are located in the QED space, and 56% (22 drugs) are in the putative TB space. The five TB drugs which have a PSA of more than 140 Å² are all nature-derived TB drugs: streptomycin 5, amikacin 6, rifampicin 2, clarithromycin 19, and spectinamide 1810 24. Three marketed TB drugs (clofazimine 15, bedaquiline 16, and delamanid 17) and four TB drugs in clinical trials (TBAJ-587 25, TBAJ-876 26, TBI166 31, and telacebec Q203 39) are outside and above both the QED and the putative TB spaces due to clog P values higher than 5. Pretomanid 20, BTZ043 27, macozinone 36, and SQ109 38 are four TB drugs located outside and above the putative TB space but still in the QED space, with 3 ≤ clog P ≤ 5.
Meropenem 18 is the only TB drug observed outside and below the putative TB space but still is in the QED space, with clog P of less than −4.
The scatterplot of synthetic compounds shown in Figure 5b reveals that 59% of synthetic compounds are in the QED space, while only 18% are in the putative TB space. The majority of compounds outside the QED space have higher clog P values, while only 3% of synthetic compounds show a PSA of more than 140 Å². Figure 5c shows that about half of the most active synthetic compounds (51%) are in the QED space and only 13% are located in the putative TB space. Similar to the scatterplot for all synthetic compounds, the majority of the most active compounds outside the QED space have higher clog P values, and only one compound is out of the QED space due to a higher PSA value.
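The regions of the clog P versus PSA plot can be sketched as a small classifier. This is a minimal illustration, assuming the boundaries stated in the text; the function name and the PSA values in the examples are hypothetical, and only the clog P of 7.13 for bedaquiline 16 is taken from this analysis.

```python
def classify_clogp_psa(clogp: float, psa: float) -> str:
    """Place a compound in the clog P-PSA plot.

    QED space:          -5 <= clog P <= 5 and PSA <= 140 Å^2
    Putative TB space:  -4 <= clog P <= 3 and 30 <= PSA <= 140 Å^2
    The putative TB space lies entirely inside the QED space.
    """
    in_qed = -5 <= clogp <= 5 and psa <= 140
    in_tb = -4 <= clogp <= 3 and 30 <= psa <= 140
    if in_tb:
        return "putative TB space"
    if in_qed:
        return "QED space only"
    return "outside both"


# clog P of bedaquiline 16 from the text; PSA value is a placeholder.
print(classify_clogp_psa(7.13, 50.0))
# A compound with clog P below -4 but within the QED range (placeholder values).
print(classify_clogp_psa(-4.5, 100.0))
# A compound with suitable clog P but PSA below 30 Å^2 (placeholder values).
print(classify_clogp_psa(2.0, 25.0))
```

Because the putative TB space is a subset of the QED space, the classifier tests the narrower region first, so each compound receives exactly one label.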
Scatterplots of the selected publications using clog P and PSA as two variables are also depicted in Figure 6. The clusters observed in publications 5 and 28 in Figure 4 are replaced by a spread in the PSA dimension in those publications in Figure 6. This pattern is also observed in almost half of the analyzed publications due to the synthesis of series of similar analogues (Figure S3). The same clog P variation is also detected in publications 16 and 19, similar to Figure 4, with further resolution in the PSA dimension. Similar to the scatterplots of compounds using MW and clog P, the remaining publications present a more diverse pattern or no specific pattern. In publication 4, compound 19 and one more compound have −4 ≤ clog P ≤ 3; however, they have very low PSA values of less than 30 Å². All reported synthetic compounds in publications 18, 29, and 30 are located in the QED space; however, no compound is observed in either the QED or the putative TB space in publications 9, 22, and 27. In some publications such as publications 5 and 19, some compounds are in the QED space; however, no compound is detected in the putative TB space.
Figure 6. Scatterplots of the selected publications using clog P and PSA as two variables. The green rectangle marks the QED space, and the pink rectangle represents our putative TB space.
TB drugs categorized according to their physiochemical properties are shown in Table 2. Note that some drugs are repeated in different categories. The first 22 TB drugs are located in the putative TB space. Meropenem 18, streptomycin 5, and amikacin 6 are three drugs which are administered intravenously with clog P < −4. Four drugs have 3 < clog P < 5, and seven drugs have clog P > 5, including bedaquiline 16 and its derivatives, TBAJ-587 25 and TBAJ-876 26. Streptomycin 5, amikacin 6, rifampicin 2, clarithromycin 19, and spectinamide 1810 24 are five drugs with PSA > 140 Å².
The analyzed synthetic compounds in the reported publications were evaluated by in vitro and in vivo assays, in vitro assays against possible targets, and in silico studies (Figure 7). All publications used in vitro assays against M. tuberculosis, except one publication which did not report any evaluation of the synthetic compounds. Only three out of 40 publications performed in vitro assays against the possible target/enzyme. Molecular docking of all or selected synthetic compounds in a series against the possible target/enzyme was reported in 10 publications. Only 7 publications evaluated the most active compound in the series using in vivo assays.
In this analysis, we evaluated 1271 synthetic anti-tubercular compounds using their physiochemical properties. Our analysis revealed that log P is a critical property which showed the largest difference between TB drugs and the synthetic compounds. A large shift to higher clog P values was observed among the synthetic compounds compared to TB drugs (Figure 2b). Selection of the second most apparent variation (Figure 2e) led to an analysis of clog P against PSA. Combining the 3 parameters led us to propose clog P, MW, and PSA as the three important properties arising from this analysis. Clog P−MW (Figures 3 and 4) and clog P−PSA (Figures 5 and 6) provide the largest discrimination. Also, a new TB space with more appropriate values of MW ≤ 500, −4 ≤ clog P ≤ 3, and 30 ≤ PSA ≤ 140 Å² is proposed. For example, bedaquiline 16, a second-line TB drug, is highly lipophilic and has a cardiac liability (prolongation of the QT interval) due to its potent inhibition of the cardiac potassium channel protein hERG. Therefore, synthesis of the bedaquiline 16 analogues TBAJ-587 25 and TBAJ-876 26 was reported with lower lipophilicity, higher clearance, and lower risk for QT prolongation. 55 The clog P value for bedaquiline 16 is 7.13, and a significant decrease is observed in the clog P values of its derivatives TBAJ-587 25 and TBAJ-876 26 (5.78 and 5.15, respectively); however, they may be further improved by lowering log P while retaining the PSA values.
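To make the remaining gap concrete, the clog P values quoted above can be compared with the upper clog P limit of the proposed TB space. The following Python sketch uses only the three clog P values given in the text; the function name and the idea of reporting a signed distance to the window are illustrative choices, not part of the original analysis.

```python
TB_CLOGP_MIN = -4.0  # lower clog P limit of the proposed TB space
TB_CLOGP_MAX = 3.0   # upper clog P limit of the proposed TB space


def clogp_distance_to_tb_window(clogp: float) -> float:
    """Signed distance from the clog P window; 0.0 means inside the window."""
    if clogp > TB_CLOGP_MAX:
        return clogp - TB_CLOGP_MAX
    if clogp < TB_CLOGP_MIN:
        return clogp - TB_CLOGP_MIN
    return 0.0


# clog P values as reported in the text.
reported_clogp = {
    "bedaquiline 16": 7.13,
    "TBAJ-587 25": 5.78,
    "TBAJ-876 26": 5.15,
}

for name, value in reported_clogp.items():
    excess = round(clogp_distance_to_tb_window(value), 2)
    print(f"{name}: clog P {value}, {excess} log units above the window")
```

On these numbers, both analogues are closer to the window than bedaquiline 16 but still more than two log units above the upper limit of 3, which is consistent with the suggestion that further reduction of log P may be beneficial.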
The proposed TB space may be a useful and reliable guide for the design of new anti-mycobacterium compounds. Evaluation of TB drugs showed that about 56% of TB drugs are in the proposed TB space (Figure 8). Out of 39 TB drugs, 21 are marketed drugs and 18 are in clinical trials. Almost the same percentages of marketed TB drugs and those in clinical trials are observed in the proposed TB space (57 and 56%, respectively). The proportion is significantly smaller among the anti-TB synthetic compounds: only 18% of all synthetic compounds and 13% of the most active synthetic compounds are located in the proposed TB space.
The analysis of the evaluation methods reported in the selected publications also reveals that there is a lack of identification of the molecular target. Only a low percentage of the publications report targets for synthetic anti-mycobacterium compounds. This issue may be caused by some difficulties in identification of mode of action.
In conclusion, we have identified an area of the physicochemical space that is relatively underexplored in efforts to develop new TB drugs.

Complete contact information is available at: https://pubs.acs.org/10.1021/acsomega.0c03177

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes
The authors declare no competing financial interest.