## Trial Design Types

### Multistage Designs

Because it is ethically undesirable to expose patients unnecessarily to an ineffective agent, Phase II studies are usually designed to allow early stopping if the drug appears inactive. This idea goes back to Gehan,27 and one of the most popular designs incorporating early stopping is Simon's optimal two-stage design.28 For specified $p_0$, $p_A$, $\alpha$, and $\beta$, this design calls for enrollment of $n_1$ patients in the first stage. If $r_1$ or fewer responses are observed, the trial is terminated for lack of activity. Otherwise, an additional $n_2$ patients are entered, for a total of $n = n_1 + n_2$. Then, if $r$ or fewer responses are observed among the $n$ patients, the regimen is rejected, whereas if more than $r$ responses are observed, the regimen is declared sufficiently active to warrant further study. The optimal design is the one that minimizes the expected sample size under the null hypothesis. For example, if $p_0 = 0.10$, $p_A = 0.25$, and $\alpha = \beta = 0.10$ (90% power), then 21 patients would be enrolled in the first stage, and if 2 or fewer responses were observed, the trial would be terminated. Otherwise, an additional 29 patients would be accrued, for a total of 50; if 7 or fewer responses were observed, the drug would be rejected, whereas 8 or more responses (an observed response rate of 16% or more) would be sufficient to deem it worthy of further investigation. If the true response rate is only 10%, the probability of stopping after the first stage is fairly high (0.65); hence the attraction of the two-stage design.
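The figures in this example can be checked directly from binomial probabilities. The sketch below is written for illustration (the function names are ours, not Simon's): it computes the probability of early termination (PET), the expected sample size, and the probability of declaring the regimen active for the 2/21, 7/50 design quoted above. At a true rate of 0.10 it should give a PET of about 0.65 and an expected sample size of about 31.2.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k); equals 1 when k <= 0 and 0 when k > n."""
    if k <= 0:
        return 1.0
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def simon_oc(r1: int, n1: int, r: int, n: int, p: float) -> dict:
    """Operating characteristics of a two-stage design at true response rate p.

    Stop after stage 1 if X1 <= r1; otherwise enrol n - n1 more patients and
    declare the regimen active only if the total number of responses exceeds r.
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))  # early-stop prob.
    # Declared active: X1 > r1 after stage 1 and X1 + X2 > r overall.
    p_active = sum(
        binom_pmf(x1, n1, p) * binom_sf(r + 1 - x1, n2, p)
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * n2  # stage 2 is run only with prob. 1 - PET
    return {"PET": pet, "EN": expected_n, "P_active": p_active}


# Simon's optimal design from the text: r1/n1 = 2/21, r/n = 7/50.
null = simon_oc(2, 21, 7, 50, p=0.10)  # P_active here is the type I error
alt = simon_oc(2, 21, 7, 50, p=0.25)   # P_active here is the power
print(f"PET(p0) = {null['PET']:.2f}, EN(p0) = {null['EN']:.1f}")  # PET(p0) = 0.65, EN(p0) = 31.2
print(f"alpha = {null['P_active']:.3f}, power = {alt['P_active']:.3f}")
```

Evaluating `P_active` at $p_0$ gives the type I error and at $p_A$ the power, so the same function can be used to confirm that a candidate design meets the $\alpha$ and $\beta$ constraints.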

Simon also tabulates a minimax design, the design with the smallest total sample size that satisfies the design constraints, although it often requires a relatively large first-stage sample size. Jung et al.29 note that many designs typically satisfy the design constraints and present graphical software for searching among them for a design that offers a good compromise between competing criteria, for example, between expected and maximum sample size. Green et al.,2 on the other hand, prefer a more flexible two-stage approach in which approximately equal numbers of patients are recruited in each stage and the power is set to 90%. In a typical example, if after the first stage the alternative hypothesis (that the true response rate is at least the target value $p_A$) is rejected at significance level $\alpha = 0.02$, the trial is discontinued for lack of activity. Otherwise, the study proceeds through the second stage, where the null hypothesis is tested at $\alpha = 0.055$. Roughly, this design will stop early if the observed response rate after the first stage is less than $p_0$ and will declare the agent sufficiently active to warrant further study if, after the second stage, the observed response rate exceeds $(p_0 + p_A)/2$.

Three-stage designs have also been proposed.30,31 The logistical difficulties of suspending recruitment between stages, as well as the resulting increase in study duration, lead most investigators to opt for two-stage designs. However, when it is highly desirable to minimize the number of patients exposed to an inactive therapy, a three-stage design may be attractive.

Some have argued that complete and partial responses should not necessarily be combined, because complete responses are far rarer and much more likely to confer a survival advantage.32,33 These authors propose two-stage designs that distinguish complete from partial responses, giving more weight to the former.
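The Green et al. rule described above can be sketched with exact one-sided binomial tests. This is a simplified, hypothetical reading rather than their published procedure: the stage-1 test of $H_A$ uses only the first $n_1$ patients, the final test of $H_0$ treats all $n$ patients as a single sample, and the stage sizes and response rates in the usage line are illustrative assumptions.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Bin(n, p); returns 0 for k < 0."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def green_style_boundaries(n1, n, p0, pA, alpha1=0.02, alpha2=0.055):
    """Hypothetical decision boundaries for a two-stage rule in the spirit of Green et al.

    Stage 1: stop for inactivity if X1 <= a, where a is the largest count at which
    an exact one-sided binomial test rejects H_A: p = pA at level alpha1.
    Stage 2: declare activity if X >= b, where b is the smallest count at which
    an exact one-sided test of H0: p = p0 (all n patients) rejects at level alpha2.
    """
    # Largest a with P(X1 <= a | pA) <= alpha1; a = -1 means "never stop early".
    a = -1
    while a + 1 <= n1 and binom_cdf(a + 1, n1, pA) <= alpha1:
        a += 1
    # Smallest b with P(X >= b | p0) = 1 - P(X <= b - 1 | p0) <= alpha2.
    b = 0
    while b <= n and 1 - binom_cdf(b - 1, n, p0) > alpha2:
        b += 1
    return a, b


# Illustrative inputs: 25 patients per stage, p0 = 0.10, pA = 0.25.
print(green_style_boundaries(25, 50, 0.10, 0.25))  # (1, 10)
```

With these inputs the sketch stops after stage 1 with 1 or fewer responses and declares activity with 10 or more responses out of 50. Because the final test ignores the interim look, the overall error rates of such a rule should be checked separately, for example by summing binomial probabilities over the stage-1 outcomes that continue to stage 2.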

As mentioned previously, a secondary objective of Phase II trials is the collection and reporting of toxicity data. Toxicity analyses are usually carried out separately from the analysis of response rates, but designs have been proposed that incorporate toxicity and response simultaneously. Conaway and Petroni34 and Bryant and Day,35 for example, consider trials to establish whether a new drug is "sufficiently promising" in the sense that it has both a higher response rate than standard treatment and a toxicity rate that is no worse. If we let $p_R$ denote the true response rate of the new treatment and $p_T$ its true rate of DLT, the null and alternative hypotheses can be written as

$$H_0: p_R \le p_{R0} \ \text{or}\ p_T > p_{T0} \qquad \text{versus} \qquad H_a: p_R > p_{R0} \ \text{and}\ p_T \le p_{T0}$$

where $p_{R0}$ and $p_{T0}$ are the response and toxicity rates associated with standard therapy. Thus, the null hypothesis is rejected only if the response rate is sufficiently high and the toxicity rate is not unacceptably high. The association between response and toxicity must also be modeled, by introducing another parameter, $\theta$, the odds ratio for toxicity among responders relative to nonresponders. Fortunately, the design characteristics are fairly insensitive to the assumed value of $\theta$. Finally, because one might be willing to accept greater toxicity in exchange for a higher response rate, and vice versa, Conaway and Petroni36 propose a related design that incorporates such trade-offs.
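To make the role of $\theta$ concrete, the sketch below computes the four cell probabilities of the response-by-toxicity table implied by margins $p_R$ and $p_T$ and odds ratio $\theta$, by solving the quadratic that the odds-ratio constraint imposes on the probability of response with toxicity. This is a standard construction for a 2 × 2 table with fixed margins, shown as an illustration; it is not necessarily how Conaway and Petroni or Bryant and Day parameterize their designs, and the function names are hypothetical.

```python
import math


def joint_cells(p_r: float, p_t: float, theta: float) -> dict:
    """Cell probabilities for (response, toxicity) with given margins and odds ratio.

    theta is the odds ratio for toxicity among responders relative to
    nonresponders. p11 = P(response and toxicity) solves
    theta = p11 * p00 / (p10 * p01) with both margins held fixed.
    """
    if abs(theta - 1.0) < 1e-12:
        p11 = p_r * p_t  # theta = 1 means independence
    else:
        s = 1 + (p_r + p_t) * (theta - 1)
        disc = s * s - 4 * theta * (theta - 1) * p_r * p_t
        p11 = (s - math.sqrt(disc)) / (2 * (theta - 1))  # root lying in [0, 1]
    return {
        "resp_tox": p11,
        "resp_notox": p_r - p11,
        "noresp_tox": p_t - p11,
        "noresp_notox": 1 - p_r - p_t + p11,
    }


def in_alternative(p_r, p_t, p_r0, p_t0):
    """True when (p_r, p_t) lies in H_a: p_r > p_r0 and p_t <= p_t0."""
    return p_r > p_r0 and p_t <= p_t0


cells = joint_cells(0.30, 0.20, 3.0)
print({k: round(v, 4) for k, v in cells.items()})
# {'resp_tox': 0.1, 'resp_notox': 0.2, 'noresp_tox': 0.1, 'noresp_notox': 0.6}
```

Setting $\theta = 1$ gives independence, while values above 1 mean toxicity is more common among responders. Repeating a design calculation over a range of $\theta$ values is one way to check the insensitivity noted above.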

### Bayesian Trial Designs

The designs discussed so far are frequentist in nature, in that power and significance levels refer to the probabilities of events under given hypotheses about the parameters of interest. A Bayesian approach offers an alternative inferential framework, which its proponents argue is particularly suited to situations involving accumulating data.37 For example, Thall and Simon38 present a Bayesian design for Phase II trials in which a "moderately informative" prior distribution is assigned to $p_S$, the response rate associated with standard therapy. A flat or "weakly informative" prior is assigned to $p_E$, the response rate for the experimental treatment, to reflect the limited knowledge about $p_E$ before the study begins. A maximum sample size, $n_{\max}$, is specified; patients are then enrolled, and the trial continues until the new drug is shown with high posterior probability to be either promising or not promising, or until $n_{\max}$ is reached, in which case the study is deemed inconclusive. Specifically, if $X_n$ denotes the number of responders among the first $n$ patients enrolled, $n = 1, 2, \ldots, n_{\max}$, then after each patient the posterior probability that $p_E$ exceeds $p_S$ by some minimally interesting amount $\delta$, $\Pr(p_E > p_S + \delta \mid X_n)$, is computed. If this probability is very high (say, greater than 0.95) or very low (say, less than 0.05), the trial is terminated and the drug is declared promising or not promising, respectively. Otherwise, the study continues, provided $n_{\max}$ has not been reached.
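The monitoring rule can be sketched by computing $\Pr(p_E > p_S + \delta \mid X_n)$ numerically on a grid. The priors here are hypothetical choices, not Thall and Simon's: Beta(20, 80) for $p_S$ (mean 0.20) and Beta(0.4, 1.6) for $p_E$ (same mean, weight of about two patients). Only the prior for $p_E$ is updated, by the conjugate beta-binomial rule, because a single-arm trial provides no new data on standard therapy. The thresholds 0.95 and 0.05, $\delta = 0.15$, $n_{\min} = 10$, and $n_{\max} = 65$ follow the values mentioned in the text; the grid size and function names are illustrative.

```python
import bisect
import math

N_GRID = 1000  # grid resolution for numerical integration (illustrative)
GRID = [(i + 0.5) / N_GRID for i in range(N_GRID)]  # midpoints in (0, 1)


def beta_weights(a: float, b: float) -> list:
    """Beta(a, b) density evaluated on GRID and normalized to sum to 1."""
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in GRID]
    top = max(logs)  # subtract the maximum to avoid underflow
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]


# Hypothetical priors: Beta(20, 80) for pS (mean 0.20) and a weak
# Beta(0.4, 1.6) for pE (same mean, worth about two patients).
PRIOR_S = beta_weights(20.0, 80.0)
PRIOR_E = (0.4, 1.6)


def prob_promising(x: int, n: int, delta: float = 0.15) -> float:
    """Posterior P(pE > pS + delta) after x responders among n patients."""
    # Conjugate update of pE only; single-arm data say nothing new about pS.
    we = beta_weights(PRIOR_E[0] + x, PRIOR_E[1] + n - x)
    tail = [0.0] * (N_GRID + 1)  # tail[j] = P(pE >= GRID[j]) on the grid
    for j in range(N_GRID - 1, -1, -1):
        tail[j] = tail[j + 1] + we[j]
    # Average P(pE > s + delta) over the prior for pS.
    return sum(w * tail[bisect.bisect_right(GRID, s + delta)]
               for s, w in zip(GRID, PRIOR_S))


def decide(x: int, n: int, n_min: int = 10, n_max: int = 65,
           upper: float = 0.95, lower: float = 0.05) -> str:
    """Apply the stopping rule after the n-th patient has been evaluated."""
    if n < n_min:
        return "continue"  # monitoring starts at n_min
    p = prob_promising(x, n)
    if p > upper:
        return "promising"
    if p < lower:
        return "not promising"
    return "inconclusive" if n >= n_max else "continue"


print(decide(15, 20))  # promising
print(decide(0, 20))   # not promising
```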

Thall and Simon38 evaluate the frequentist operating characteristics of this design under continuous monitoring with a maximum sample size of $n_{\max} = 65$. They also suggest a minimum sample size, $n_{\min}$, of 10 patients, so that, in effect, monitoring does not begin until the 10th patient is enrolled. For example, under a fairly informative prior for $p_S$ centered at 0.20, a weak prior for $p_E$, and a minimally interesting difference of $\delta = 0.15$, their results show that if $p_E = p_S$, there is a 7.1% chance that the trial will erroneously conclude that the experimental drug is promising, an 83.5% chance that it will correctly be declared not promising, and a 9.4% chance that the results will be inconclusive. If the true effect is positive ($p_E = 0.40$), there is an 87.5% chance that the drug will be declared promising, a 7.7% chance that it will be declared not promising, and a 4.8% chance that the trial will be inconclusive. Subsequent work describes a Bayesian sequential design for more complicated situations involving multiple outcomes, such as response and toxicity.39
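Operating characteristics like these can also be computed exactly rather than by simulation, because the rule depends only on the number of responders after each patient. The sketch below first finds, for each $n$ from $n_{\min}$ to $n_{\max}$, the response counts at which the posterior rule stops, then propagates the distribution of responders patient by patient at a fixed true $p_E$. The priors (Beta(20, 80) for $p_S$, Beta(0.4, 1.6) for $p_E$), the grid size, and the function names are illustrative assumptions, so the output will not reproduce the published percentages; evaluating at $p_E = 0.20$ and $p_E = 0.40$ mimics the two scenarios in the text.

```python
import bisect
import math

N_GRID = 400  # grid resolution for numerical integration (illustrative)
GRID = [(i + 0.5) / N_GRID for i in range(N_GRID)]


def beta_weights(a, b):
    """Beta(a, b) density on GRID, normalized to sum to 1."""
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in GRID]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]


def stopping_bounds(prior_s=(20.0, 80.0), prior_e=(0.4, 1.6), delta=0.15,
                    n_min=10, n_max=65, upper=0.95, lower=0.05):
    """For each n in [n_min, n_max], the response counts that stop the trial."""
    ws = beta_weights(*prior_s)
    # For each grid value s of pS, first grid index of pE lying above s + delta.
    idx = [bisect.bisect_right(GRID, s + delta) for s in GRID]
    bounds = {}
    for n in range(n_min, n_max + 1):
        stop_low, stop_high = set(), set()
        for x in range(n + 1):
            we = beta_weights(prior_e[0] + x, prior_e[1] + n - x)
            tail = [0.0] * (N_GRID + 1)  # tail[j] = P(pE >= GRID[j])
            for j in range(N_GRID - 1, -1, -1):
                tail[j] = tail[j + 1] + we[j]
            prob = sum(w * tail[i] for w, i in zip(ws, idx))
            if prob > upper:
                stop_high.add(x)
            elif prob < lower:
                stop_low.add(x)
        bounds[n] = (stop_low, stop_high)
    return bounds


def operating_characteristics(p_true, bounds, n_max=65):
    """Exact outcome probabilities when the true response rate is p_true."""
    dist = {0: 1.0}  # P(x responders so far and trial still running)
    result = {"promising": 0.0, "not promising": 0.0, "inconclusive": 0.0}
    for n in range(1, n_max + 1):
        new = {}
        for x, pr in dist.items():  # enrol patient n: respond or not
            new[x + 1] = new.get(x + 1, 0.0) + pr * p_true
            new[x] = new.get(x, 0.0) + pr * (1 - p_true)
        stop_low, stop_high = bounds.get(n, (set(), set()))
        for x in list(new):
            if x in stop_high:
                result["promising"] += new.pop(x)
            elif x in stop_low:
                result["not promising"] += new.pop(x)
        dist = new
    result["inconclusive"] = sum(dist.values())  # reached n_max undecided
    return result


bounds = stopping_bounds()
for p in (0.20, 0.40):
    oc = operating_characteristics(p, bounds)
    print(p, {k: round(v, 3) for k, v in oc.items()})
```

The recursion is exact for a given set of stopping boundaries, so any remaining approximation comes only from the grid used to evaluate the posterior probabilities.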

Another proponent of the Bayesian approach is Heitjan,40 who points out that in multistage frequentist designs the evidence required to terminate the trial is not the same at all analysis times, and that a drug can be rejected as inactive even though there is no strong evidence that its response rate is any lower than that of the standard. He describes a Bayesian approach designed either to convince a skeptic that the drug is beneficial or to convince an enthusiast that it is not. This is accomplished by using different prior distributions corresponding to these two states of belief and, after outcomes have been observed, calculating the posterior probability that (a) the new drug is better than the standard, given the skeptic's prior (the "persuade-the-pessimist probability"), and (b) the standard is better than the new drug, given the enthusiast's prior (the "persuade-the-optimist probability"). Thus, this method requires that the evidence be sufficient "to choose between hypotheses [favorable or unfavorable] to the satisfaction of all interested parties"; if it is not, the results are regarded as inconclusive.40
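The two probabilities can be sketched for a binary response with beta priors. For simplicity, the sketch treats the standard response rate as a known constant `p_std`; the skeptic's Beta(2, 8) and enthusiast's Beta(4, 6) priors and the 0.95 threshold are hypothetical. With integer beta parameters the posterior CDF reduces to a binomial tail, so only the standard library is needed.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """P(Beta(a, b) <= x) for integers a, b >= 1, via P(Bin(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def heitjan_probs(x, n, p_std, skeptic=(2, 8), enthusiast=(4, 6)):
    """Return (persuade-the-pessimist, persuade-the-optimist) probabilities.

    x responders among n patients; p_std is treated as a known standard rate.
    """
    # Skeptic's posterior probability that the new drug beats the standard.
    ppp = 1 - beta_cdf_int(p_std, skeptic[0] + x, skeptic[1] + n - x)
    # Enthusiast's posterior probability that the standard beats the new drug.
    pop = beta_cdf_int(p_std, enthusiast[0] + x, enthusiast[1] + n - x)
    return ppp, pop


def heitjan_verdict(x, n, p_std, threshold=0.95, **priors):
    """Favorable only if the skeptic is persuaded; unfavorable only if the enthusiast is."""
    ppp, pop = heitjan_probs(x, n, p_std, **priors)
    if ppp > threshold:
        return "favorable"
    if pop > threshold:
        return "unfavorable"
    return "inconclusive"


print(heitjan_verdict(15, 25, 0.20))  # favorable
print(heitjan_verdict(0, 40, 0.20))   # unfavorable
```

With these priors the enthusiast's posterior is always shifted upward relative to the skeptic's, so the two probabilities cannot both exceed the threshold and the verdict is never contradictory.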

### Randomized Phase II Trial

When there are multiple candidate agents to consider for further development, randomized Phase II trials, sometimes called selection designs, provide a means of choosing among them.41-43 These trials, which assign patients at random to the treatments under consideration and compare outcomes between groups, have several advantages with respect to patient selection and other biases present in studies without parallel comparison groups (discussed in detail in the next section). However, as a means of determining efficacy in any absolute sense, these trials lack statistical power and type I ($\alpha$) error control at the sample sizes typically envisaged for Phase II trials. Nonetheless, the originally intended use of randomized Phase II trials as "selection designs" has been broadened to encompass small-scale randomized trials with a standard-therapy comparison group. Although this approach has some advantages, positive results from these trials cannot be considered conclusive enough to preclude Phase III investigation (see Liu in Reference 1 for discussion). Table 8.3 summarizes the advantages and disadvantages of single-arm and randomized comparative Phase II trials.
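In its simplest form, a selection design carries forward the arm with the highest observed response rate, and its key operating characteristic is the probability that the truly better arm is the one selected. The sketch below computes that probability exactly for two arms of equal size; the tie-breaking rule (a fair coin), the function names, and the example rates are illustrative assumptions, and the designs of Simon et al. and Liu et al. may handle ties and more than two arms differently.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_correct_selection(n: int, p_worse: float, p_better: float) -> float:
    """P(the truly better arm is selected) with n patients per arm.

    The arm with more responses is selected; ties are broken by a fair coin.
    """
    pmf_w = [binom_pmf(k, n, p_worse) for k in range(n + 1)]
    pmf_b = [binom_pmf(k, n, p_better) for k in range(n + 1)]
    total = 0.0
    for kb, pb in enumerate(pmf_b):
        below = sum(pmf_w[:kb])  # worse arm has strictly fewer responses
        total += pb * (below + 0.5 * pmf_w[kb])  # half credit for a tie
    return total


# Illustrative rates: 20% versus 35% response, 50 patients per arm.
print(round(prob_correct_selection(50, 0.20, 0.35), 3))
```

For a given sample size, a high probability of correct selection is typically far easier to achieve than high power for a formal comparison of the same two rates, consistent with the point above that these trials can rank agents but lack the power to establish efficacy.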

### Recent Design Concepts

For cytostatic agents, where frank tumor shrinkage is not anticipated, alternative Phase II designs based on endpoints other than response rate may be needed. For trials enrolling patients who have failed prior therapy, Mick et al.44 propose a method that uses each patient as his or her own control, comparing the time to progression under the new agent with the time to progression under prior therapy. Rosner et al.45 propose a randomized discontinuation design, in which all patients are initially treated with the experimental agent. After a specified interval, responders remain on treatment and those who progress discontinue, while patients with stable disease are randomized to either continued treatment with the drug or placebo. The idea behind this design is that the randomized comparison allows one to assess whether the drug is truly slowing tumor growth, as opposed to having simply selected patients with slow-growing tumors for study. Because the patients with stable disease form a more homogeneous subgroup, this design also generally requires a smaller sample size than would a trial that randomized all patients at entry. As the authors themselves emphasize, the purpose of this design is mainly to determine whether the drug is active in an explanatory sense. The proportion of patients with stable disease after the initial interval matters a great deal: if it is low, the total sample size required may be quite large, and a demonstration of activity in the randomized component would show benefit only in a small subset of the population. Korn et al.46 point out other caveats with this design; for example, patients may be reluctant to risk discontinuing a treatment that appears to be working. They describe a number of other approaches, including single-arm trials with time to progression as an endpoint and trials with appropriately validated biologic response markers as surrogates for tumor response.

TABLE 8.3. Advantages and disadvantages of single-arm versus randomized Phase II trials.

| | Single-arm trial | Pilot randomized trial |
|---|---|---|
| Advantages | Maximum adverse event information for new agent; can offer new agent to all participants; simple endpoint that is rapidly ascertained | Concurrent control group; randomization provides for rigorous ancillary studies of tumor response markers; time to event endpoints can be used more readily in this comparative setting |
| Disadvantages | Historical control group response rate must be used; tumor response endpoint may be a poor surrogate for survival extension; time to event endpoints may be difficult to define and do not fit into the multistage framework | Low power and high α for feasible sample sizes; necessity to randomize patients in terminal disease situations; quantity of adverse event information for experimental agent is reduced; positive findings may interfere with conduct of an appropriately powered Phase III trial |

Here, we consider a randomized Phase II trial to be a relatively small study (100 patients or fewer) intended to serve as a pilot trial for potential efficacy. Note that randomized Phase II trials were originally proposed for selecting potentially superior candidates from among multiple new agents (see Simon et al.41 and Liu et al.42), not for comparing new therapies with existing standards. More recently, Bayesian designs for randomized Phase II selection trials have been proposed (see Estey and Thall43).
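The allocation step of the randomized discontinuation design and the sample-size arithmetic behind the stable-disease caveat can be sketched as follows. The status labels, the 1:1 randomization, the function names, and the target of 60 randomized patients in the usage line are hypothetical; the enrollment figure is a simple expectation that ignores sampling variability in the stable-disease fraction.

```python
import random


def triage_after_run_in(status: str, rng: random.Random) -> str:
    """Assignment at the end of the initial treatment interval.

    status is one of "response", "progression", or "stable" (hypothetical labels).
    """
    if status == "response":
        return "continue drug"
    if status == "progression":
        return "discontinue"
    if status == "stable":
        return rng.choice(["drug", "placebo"])  # 1:1 randomization
    raise ValueError(f"unknown status: {status!r}")


def enrollment_needed(n_randomized: int, stable_pct: int) -> int:
    """Expected enrollment needed to randomize about n_randomized stable patients.

    stable_pct is the percentage of enrolled patients with stable disease after
    the initial interval; integer arithmetic gives ceil(100 * n / pct) exactly.
    """
    return -(-100 * n_randomized // stable_pct)


rng = random.Random(1)
print([triage_after_run_in(s, rng) for s in ("response", "progression", "stable")])
print(enrollment_needed(60, 50), enrollment_needed(60, 20))  # 120 300
```

Halving the stable-disease fraction doubles the expected enrollment needed to randomize the same number of patients, which is the practical force of the caveat about a low proportion of stable disease.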
