
Figure 2. Statistically significant biophores derived from the thiophene dataset highlighted within compounds from the TS in which they are contained. Numbers above the molecules are the CAS identifiers. Atoms represented explicitly in bold are part of the biophore.

Polycyclic Aromatic Compounds

The polycyclic aromatic compounds used in this study were representative of many different classes of polycyclic compounds. Using the bagging approach employed previously in analyzing this data set (He, Jurs et al. 2003), three model triplets (nine models in total) were generated, each containing an equivalent distribution of active and inactive compounds. The biophores and statistics for the consensus model are presented in Table 7 and depicted in Figure 3. Of the nine significant fragments, four were found in more than one model, and each was based upon an average of at least 2.8 TS molecules (mean of 4.7 TS molecules/fragment). The positive accuracy of the MultiCASE-derived biophores ranged from a high of 80% (biophore cH =cH -c.-NH -c = contained in molecule 312-73-2 as depicted in Figure 3) to a low of 0% (biophore cH =n -c = contained in molecule 1532-84-9 as depicted in Figure 3). Unlike the thiophene model above, this consensus model had high specificity (87%) but low sensitivity (28%), failing to identify 33 of the 46 genotoxic compounds in the data set, for an overall concordance of 77%.
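The consensus-model statistics above are related through a 2x2 confusion matrix, and that relationship can be checked directly. Below is a minimal Python sketch. The true-negative count is not stated in the text. It is an assumption, back-calculated from the 277-compound data set (46 genotoxic, so 231 non-genotoxic) and the reported 87% specificity.

```python
def classification_summary(tp, fn, tn, fp):
    """Return sensitivity, specificity and concordance for a binary classifier."""
    sensitivity = tp / (tp + fn)                    # fraction of genotoxic compounds flagged
    specificity = tn / (tn + fp)                    # fraction of non-genotoxic compounds cleared
    concordance = (tp + tn) / (tp + fn + tn + fp)   # overall agreement with the assay
    return sensitivity, specificity, concordance

# From the text: 46 genotoxic polycyclic aromatic compounds, 33 of them missed.
positives, missed = 46, 33
negatives = 277 - positives

# ASSUMPTION: TN is back-calculated from the reported 87% specificity (0.87 x 231 ~ 201).
tn = round(0.87 * negatives)
fp = negatives - tn

sens, spec, conc = classification_summary(positives - missed, missed, tn, fp)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} concordance={conc:.0%}")
```

With 13 true positives and the assumed 201 true negatives, the three reported percentages (28%, 87%, 77%) are mutually consistent. This is a check on the arithmetic, not additional data.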

Interestingly, most of the biophores identified by this approach contain a nitrogen atom (see Table 7 and Figure 3), either as part of a heterocycle or as a primary amine. Aromatic amines can undergo P450 (1A2)-mediated N-hydroxylation, and the N-hydroxylated metabolite can go on to form the reactive nitrenium ion, the ultimate mutagenic species (Colvin, Hatch et al. 1998). Polycyclic aromatic amines generally exhibit greater mutagenic potency than other aromatic amines, provided that the polycyclic system does not comprise six or more rings (Trieff, Biagi et al. 1989; Benigni, Passerini et al. 1998). The extended aromatic system offers greater stabilization of the nitrenium ion and, because of the planarity of the structure, may allow more efficient activation (Lai, Woo et al. 1996; Benigni, Passerini et al. 1998). The aromatic and secondary amines are considered at length in the following section.

| KLN Fragment | Average Frequency in TS | Average Activity (CU) | Models Deriving Fragment | Frequency in PS | Positive Accuracy |
|---|---|---|---|---|---|
| NH -c =c - | 2.8 | 35 | 6 | 32 | 69% |
| NH2-c =cH -c. = | 6.3 | 42 | 4 | 26 | 73% |
| cH =cH -n =c. -cH = | 4.0 | 27 | 2 | 17 | 24% |
| n =n - | 4.5 | 33 | 2 | 12 | 50% |
| cH =c. -c <=cH - | 7.0 | 39 | 1 | 10 | 30% |
| cH =cH -c-NH -c = | 6.0 | 28 | 1 | 5 | 80% |
| cH =n -c = | 5.0 | 39 | 1 | 5 | 0% |
| N -c. =cH -cH =cH - | 4.0 | 30 | 1 | 5 | 60% |
| cH =cH -c =c. -cH = <3-OH > | 3.0 | 30 | 1 | 10 | 50% |

Table 7. Analysis of Statistically Significant Polycyclic Aromatic Fragments (nine models). CU is the CASE Unit of activity.
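The summary figures quoted for Table 7 can be checked against its columns. The Python sketch below transcribes the table and reports the mean and minimum TS support and the number of fragments derived by more than one model. It also rests on an interpretation that the source does not state outright: positive accuracy is taken to be the fraction of PS compounds containing a fragment that are genotoxic. Under that reading, frequency in PS multiplied by positive accuracy gives an implied count of genotoxic PS compounds.

```python
from statistics import mean

# Rows transcribed from Table 7: (KLN fragment, average frequency in TS,
# average activity in CU, models deriving fragment, frequency in PS, positive accuracy)
TABLE_7 = [
    ("NH -c =c -",                  2.8, 35, 6, 32, 0.69),
    ("NH2-c =cH -c. =",             6.3, 42, 4, 26, 0.73),
    ("cH =cH -n =c. -cH =",         4.0, 27, 2, 17, 0.24),
    ("n =n -",                      4.5, 33, 2, 12, 0.50),
    ("cH =c. -c <=cH -",            7.0, 39, 1, 10, 0.30),
    ("cH =cH -c-NH -c =",           6.0, 28, 1,  5, 0.80),
    ("cH =n -c =",                  5.0, 39, 1,  5, 0.00),
    ("N -c. =cH -cH =cH -",         4.0, 30, 1,  5, 0.60),
    ("cH =cH -c =c. -cH = <3-OH >", 3.0, 30, 1, 10, 0.50),
]

ts_freq = [row[1] for row in TABLE_7]
mean_ts = mean(ts_freq)                                  # mean TS molecules per fragment
min_ts = min(ts_freq)                                    # smallest TS support
multi_model = sum(1 for row in TABLE_7 if row[3] > 1)    # fragments found by more than one model

# ASSUMPTION: positive accuracy = genotoxic PS compounds with the fragment / all PS compounds
# with it, so the implied numerator is frequency in PS x positive accuracy.
implied_hits = {frag: round(ps * acc) for frag, _, _, _, ps, acc in TABLE_7}

print(f"mean TS frequency = {mean_ts:.1f}, minimum = {min_ts}, found by >1 model = {multi_model}")
for frag, _, _, _, ps, acc in TABLE_7:
    print(f"{frag:30s} {implied_hits[frag]}/{ps} = {acc:.0%}")
```

Every implied count comes out as a whole number of compounds that reproduces the tabulated percentage (for example, 4/5 = 80%). This is consistent with that reading of the column.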

Aromatic and Secondary Amines

The aromatic and secondary amines were chosen to be fairly diverse. Using the data subsetting approach employed to develop the ADAPT QSAR models for this data set (Mattioni, Kauffman et al. 2003), nine individual models were generated, each containing an equivalent distribution of active and inactive compounds. The biophores and statistics for the consensus model are presented in Table 8 and depicted in Figure 4. Of the nine fragments found to be statistically significant, six were derived by more than one model. The fragments were based upon an average of approximately 9.5 TS molecules/fragment, with a minimum of 2.8 TS molecules/fragment. The positive accuracy of these biophores ranged from a high of 80% (biophore cH =cH -c. =c <-cH = represented by molecule 118-44-5 in Figure 4) to 0% (biophore cH =cH -c <=cH -c <= represented by molecule 102-56-7 in Figure 4). The MultiCASE consensus model had the second-highest specificity (72%) and concordance (69%) of all of the models reported for this data set. The low positive accuracy of the fragments is reflected in the overall sensitivity of the consensus model (58%).

Figure 3. Statistically significant biophores derived from the polycyclic aromatic compounds data set highlighted within compounds from the TS in which they are contained. Numbers above the molecules are the CAS identifiers. Atoms represented explicitly in bold are part of the biophore.

Of the three data sets, only the biophores generated for the aromatic and secondary amines show a correlation between positive accuracy and the average activity of the underlying molecules upon which the fragments are based. The Pearson correlation coefficient is 0.79, indicating a statistically significant correlation between these two quantities. For the polycyclic aromatic compounds this correlation coefficient is negative. For the thiophenes it is 0.45, indicating no significant correlation. The correlation is likely a product of several features of this data set. The amine set is larger than the other data sets (334 compounds, versus 277 polycyclic aromatic compounds and 140 thiophenes), with a correspondingly larger number of active compounds. This leads to several benefits in model creation. First, because the pool of TS molecules is larger, more two- to ten-atom fragments and associated modulating fragments are generated, allowing a more robust fragment selection process. Second, the mean activity of the significant fragments generated is approximately ten percent higher than that of the other data sets, with more fragments having activities in the moderate range (forty to sixty CASE units).

| KLN Fragment | Average Frequency in TS | Average Activity (CU) | Models Deriving Fragment | Frequency in PS | Positive Accuracy |
|---|---|---|---|---|---|
| NH2-c =cH -c. = | 12.0 | 42 | 6 | 28 | 57% |
| NH2-c =cH -cH =cH - | 29.4 | 31 | 5 | 66 | 42% |
| S -c <= | 8.8 | 29 | 4 | 25 | 28% |
| N" -N - | 2.8 | 37 | 4 | 5 | 40% |
| cH =cH -c. =c <-cH = | 6.5 | 48 | 2 | 5 | 80% |
| NH2-c =cH -cH =c <- | 4.5 | 30 | 2 | 7 | 14% |
| cH =c -cH =c. -<2-NH2> | 8.0 | 47 | 1 | 9 | 44% |
| cH =cH -cH =c -cH =c - | 7.0 | 33 | 1 | 8 | 25% |
| cH =cH -c <=cH -c <= | 7.0 | 31 | 1 | 6 | 0% |

Table 8. Analysis of Statistically Significant Aromatic Amine Fragments (nine models). CU is the CASE Unit of activity.
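The correlation coefficients discussed above can be recomputed from Tables 7 and 8. The Python sketch below computes the Pearson coefficient between average activity and positive accuracy for the nine fragments in each table. It then computes the usual test statistic t = r*sqrt(n-2)/sqrt(1-r^2) and compares it with the two-tailed 5% critical value of Student's t for 7 degrees of freedom (about 2.365). Because it uses the rounded table values, the results only approximate those from the underlying data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for the null hypothesis rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Average activity (CU) and positive accuracy (%) transcribed from Table 8 (amines)
# and Table 7 (polycyclic aromatic compounds), in table order.
amine_activity = [42, 31, 29, 37, 48, 30, 47, 33, 31]
amine_accuracy = [57, 42, 28, 40, 80, 14, 44, 25, 0]
pah_activity = [35, 42, 27, 33, 39, 28, 39, 30, 30]
pah_accuracy = [69, 73, 24, 50, 30, 80, 0, 60, 50]

T_CRIT_DF7 = 2.365  # two-tailed critical value, alpha = 0.05, 7 degrees of freedom

r_amines = pearson_r(amine_activity, amine_accuracy)
r_pah = pearson_r(pah_activity, pah_accuracy)
t_amines = t_statistic(r_amines, len(amine_activity))

print(f"amines: r = {r_amines:.2f}, t = {t_amines:.2f}, significant = {abs(t_amines) > T_CRIT_DF7}")
print(f"polycyclic aromatics: r = {r_pah:.2f}")
```

The amine coefficient reproduces the reported 0.79. The polycyclic coefficient is negative, as stated in the text.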

Discussion

The evaluation of independent vendor solutions for predicting the genotoxic potential of drug-like compounds is presented here. Genotoxicity data were available from a single laboratory on several series of congeneric compounds that had previously been assessed by DEREK (a well-curated expert system), ADAPT (one of the more advanced neural-network-based QSAR tools) and MultiCASE (a widely used hybrid QSAR/expert-system modeling tool), which made these data sets attractive starting points. We used the fragment-based approach implemented in the MultiCASE algorithm to develop a library of genotoxic substructures for three congeneric series of compounds: thiophenes, polycyclic aromatic compounds, and a set of secondary and aromatic amines.

The substructural fragments generated for these data sets were found to vary widely in their predictive ability. For fragments with PS coverage of five or more compounds, the positive accuracy ranged from a high of 80% to a low of 0%. Except for fragments generated from the secondary and aromatic amines, predictive ability was not correlated with representation in the PS, with the number of training set compounds giving rise to a specific fragment, or with the mean activity of the fragment. In the one case with a correlation (the secondary and aromatic amines), the positive accuracy of the fragments was significantly correlated with the mean activity of the underlying TS compounds (Pearson correlation coefficient of 0.79).

Figure 4. Statistically significant biophores derived from the secondary and aromatic amines dataset highlighted within compounds from the TS in which they are contained. Numbers above the molecules are the CAS identifiers. Atoms represented explicitly in bold are part of the biophore.

In this study, QSAR models built solely from substructural fragments (as is the procedure with MultiCASE) were never as predictive as models that added other descriptors. This is evidenced by the difference in predictive ability between ADAPT and MultiCASE, since ADAPT uses not only path-based substructural fragments but also a variety of additional descriptors. We would nevertheless like to offer a cautionary note to researchers using ensemble-based modeling methods (such as many of the ADAPT models presented here, machine-learning algorithms, neural networks and the Random Forest algorithm): it is very easy to overtrain such a model, making it no more than a mapping of the training set. The ADAPT ensemble models for the secondary and aromatic amines use over sixty descriptors for the set of 334 compounds. A much simpler MultiCASE model, with nine substructural fragments as descriptors, has nearly identical predictive ability.
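A deliberately extreme toy example shows what it means for a model to be "no more than a mapping of the training set." The Python sketch below is an illustration only and does not reproduce ADAPT or MultiCASE. It gives hypothetical compounds random descriptor values and random genotoxicity labels, then classifies them with a 1-nearest-neighbour rule, which simply memorizes the training set. Training accuracy is perfect while accuracy on new compounds stays near chance, which is why performance is judged on a separate prediction set rather than on the TS.

```python
import math
import random

def nearest_neighbour_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN, Euclidean distance)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of (x, y) pairs whose 1-NN prediction matches the label."""
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)

rng = random.Random(0)
N_DESCRIPTORS = 60  # echoes "over sixty descriptors"; here the values are pure noise

def random_compounds(n):
    """Hypothetical compounds: random descriptor vectors with random 0/1 genotoxicity labels."""
    xs = [[rng.gauss(0, 1) for _ in range(N_DESCRIPTORS)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys

train_x, train_y = random_compounds(100)   # stand-in training set (TS)
test_x, test_y = random_compounds(200)     # stand-in prediction set (PS)

train_acc = accuracy(train_x, train_y, train_x, train_y)  # each point's nearest neighbour is itself
test_acc = accuracy(train_x, train_y, test_x, test_y)     # labels are unrelated to descriptors
print(f"TS accuracy = {train_acc:.0%}, PS accuracy = {test_acc:.0%}")
```

The gap between the two numbers is the signature of overtraining. A model with genuine structure-activity content should show PS accuracy well above chance.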

[Figure 5 panels: Explore Validation & Related Data; Input Structure(s); Drill-down on Assessments]

Figure 5. Screenshots of the Toxicology Assessment web portal, which has been implemented on the corporate intranet at Bristol-Myers Squibb. It allows the user to import SD-formatted files, draw novel structures or reference samples in the corporate database. Once input, compounds are run through a battery of safety assessment models and a report is generated. All possibly toxic substructures are hyperlinked to all institutional information regarding this activity.

The system described here was implemented in an internal toxicology assessment web portal at Bristol-Myers Squibb (see Figure 5). It allows any authorized user on a discovery team or in safety assessment to analyze both synthesized and virtual compounds at any phase of the development process. Users can then prioritize testing in a second-tier genotoxicity assay (the SOS Chromotest) or provide supporting information for other genotoxicity assays (e.g. the in vitro micronucleus or bacterial reverse mutation assay). Other toxicological endpoints can also be added, such as hepatotoxicity, non-genotoxic and genotoxic carcinogenicity, and environmental toxicity (respiratory, skin and eye irritation; sensitization; and lachrymation). The web tool can also query IVS models (such as the standard MultiCASE carcinogenicity and mutagenicity models) as well as internally developed models. In addition, the system allows the user to generate reports based on the assessment and delivers all supporting documentation to the users' desktops, including minireviews on each of the mutagenic endpoints with model statistics and literature references. No computational toxicology model will be as accurate as an in vitro model system. However, a model developed with a clear goal in mind and used within its scope can intelligently triage compounds from the testing queue, thereby increasing the effective capacity of the safety assessment assays. At Bristol-Myers Squibb, using this multi-tiered approach, we have virtually eliminated mutagenic liability as a cause of attrition during the early development phase.
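The tiered triage described above can be sketched as a small routing function. Everything in this Python sketch is hypothetical and is not the Bristol-Myers Squibb implementation. That includes the model names, the `Prediction` record, and the routing rule, under which any compound flagged by at least one model goes first to the SOS Chromotest and all others go to the standard queue.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    model: str        # hypothetical model name (internal or vendor model)
    genotoxic: bool   # the model's call for this compound
    alerts: list      # substructures (e.g. KLN fragments) that triggered the call; may be empty

def triage(compound_id, predictions):
    """Build a minimal report and route one compound (illustrative rule, not a validated policy)."""
    flagged = [p for p in predictions if p.genotoxic]
    route = "prioritize for SOS Chromotest" if flagged else "standard testing queue"
    return {
        "compound": compound_id,
        "route": route,
        "flagging_models": [p.model for p in flagged],
        # de-duplicated, sorted list of alerts from every model that flagged the compound
        "alerts": sorted({a for p in flagged for a in p.alerts}),
    }

report = triage("CMPD-001", [
    Prediction("internal_fragment_model", True, ["NH2-c =cH -c. ="]),
    Prediction("vendor_mutagenicity_model", False, []),
])
print(report)
```

A real portal would attach each alert to its supporting documentation. The sketch only shows how calls from several models could be aggregated into one report and one routing decision.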

References

Agresti, A. (1996). An Introduction to Categorical Data Analysis. New York, Chichester, Brisbane, Toronto and Singapore, John Wiley & Sons, Inc.: 246-249.

Benigni, R., L. Passerini, et al. (1998). "QSAR models for discriminating between mutagenic and nonmutagenic aromatic and heteroaromatic amines." Environ Mol Mutagen 32(1): 75-83.

Colvin, M. E., F. T. Hatch, et al. (1998). "Chemical and biological factors affecting mutagen potency." Mutat Res 400(1-2): 479-92.

Corey, E. J., A. K. Long, et al. (1985). "Computer-assisted analysis in organic synthesis." Science 228(4698): 408-18.

Greene, N. (2002). "Computer systems for the prediction of toxicity: an update."

Greene, N., P. N. Judson, et al. (1999). "Knowledge-based expert systems for toxicity and metabolism prediction: DEREK, StAR and METEOR." SAR QSAR

He, L., P. C. Jurs, et al. (2003). "Predicting the genotoxicity of polycyclic aromatic compounds from molecular structure with different classifiers." Chem Res Toxicol 16(12): 1567-80.

Hofnung, M. and P. Quillardet (1988). "The SOS Chromotest, a colorimetric assay based on the primary cellular responses to genotoxic agents." Ann N Y Acad Sci 534: 817-25.

Judson, P. N., C. A. Marchant, et al. (2003). "Using argumentation for absolute reasoning about the potential toxicity of chemicals." J Chem Inf Comput Sci 43(5): 1364-70.

Jurs, P. C., M. N. Hasan, et al. (1983). "Computer-assisted studies of molecular structure and carcinogenic activity." Fundam Appl Toxicol 3(5): 343-9.

Kier, L. B. and L. H. Hall (1990). "An electrotopological-state index for atoms in molecules." Pharm Res 7(8): 801-7.

Klopman, G. and O. T. Macina (1985). "Use of the Computer Automated Structure Evaluation program in determining quantitative structure-activity relationships within hallucinogenic phenylalkylamines." J Theor Biol 113(4): 637-48.

Lai, D. Y., Y.-t. Woo, et al. (1996). "Carcinogenic Potential of Organic Peroxides: Prediction Based on Structure-Activity Relationships (SAR) and Mechanism-Based Short Term Tests." Journal of Environmental Science and Health, Part C— Environmental Carcinogenesis & Ecotoxicology Reviews.

Lipinski, C. A., F. Lombardo, et al. (2001). "Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings." Adv Drug Deliv Rev 46(1-3): 3-26.

Mattioni, B. E., G. W. Kauffman, et al. (2003). "Predicting the genotoxicity of secondary and aromatic amines using data subsetting to generate a model ensemble." J Chem Inf Comput Sci 43(3): 949-63.

Mosier, P. D., P. C. Jurs, et al. (2003). "Predicting the genotoxicity of thiophene derivatives from molecular structure." Chem Res Toxicol 16(6): 721-32.

Pearl, G. M., S. Livingston-Carr, et al. (2001). "Integration of computational analysis as a sentinel tool in toxicological assessments." Curr Top Med Chem 1(4): 247-55.

Quillardet, P. and M. Hofnung (1993). "The SOS chromotest: a review." Mutat Res 297(3): 235-79.

Snyder, R. D., G. S. Pearl, et al. (2004). "Assessment of the sensitivity of the computational programs DEREK, TOPKAT, and MCASE in the prediction of the genotoxicity of pharmaceutical molecules." Environ Mol Mutagen 43(3): 143-58.

Sutton, M. D., B. T. Smith, et al. (2000). "The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance." Annu Rev Genet 34: 479-497.

Trieff, N. M., G. L. Biagi, et al. (1989). "Aromatic amines and acetamides in Salmonella typhimurium TA98 and TA100: a quantitative structure-activity relation study." Mol Toxicol 2(1): 53-65.
