## Analysis

Definitive analysis commences when either the requisite number of events indicated in the trial design has been observed, or the DSMC has deemed that the trial results should be disclosed due to early stopping conditions being met.

An important aspect of Phase III trial analysis is the definition of the analysis cohort. The concept of analysis by intention-to-treat is often discussed, but the definition of this term can sometimes be unclear, so it is best to explicitly describe which patients are included.62 In the strictest sense, the intention-to-treat cohort includes all patients randomized, irrespective of eligibility, acceptance of and adherence to assigned treatment, and any and all other postrandomization deviations from protocol. However, it is often the case that patients found ineligible for the trial after randomization due to having been incorrectly staged or for other reasons are excluded from the primary analysis, and this practice (used with caution) is sometimes advocated, as it allows for evaluation of the therapy in the population for which it was intended.2 A more controversial and rarely acceptable form of patient exclusion involves removal of patients who did not or could not comply with assigned therapy regimens or received nonprotocol therapy. Such exclusions can easily lead to biased comparisons, and in general, any post hoc analysis of treatment benefit in relation to dose received is fraught with interpretational difficulties and should be avoided in primary analysis.63

For time-to-event data, the principal summaries are the survival distribution (or survival curve) and the estimated hazard ratio (HR). The survival curve S(t) represents the probability of remaining event free until at least time t, or S(t) = Pr(T > t). This function is plotted from randomization (i.e., from t = 0, where S(0) = 1.0) to some follow-up time that a remaining fraction of patients has reached. This points to an important feature of time-to-event data, that is, censoring, where it is known that a patient is event free as of some follow-up time, but the (potential) failure time is not yet observed. Censoring is a natural consequence of staggered enrollment into the trial, so that at any given time, some patients have less follow-up than others (administrative censoring), but it may also occur because patients withdraw from or are lost to follow-up before experiencing the event of interest. Administrative censoring can be reasonably assumed to be independent of failure risk except in cases where the characteristics of participants enrolled change over time (which is why well-defined entry criteria are necessary). Any censoring associated with propensity for failure (e.g., sicker patients more often withdrawing) that results in different rates of loss per treatment arm can bias treatment comparisons.

When censoring is assumed independent of the probability of failure, estimating S(t) using the available information per patient, including follow-up to censoring, is straightforward. For a set of ordered observed times at which one or more patients had an event, $t_1 < t_2 < t_3 < \dots < t_J$, define $d_j$ as the number of events at time $t_j$ and $Y_j$ as the number of patients available to possibly fail (all those who have not yet failed and were not censored before $t_j$; by convention, those censored exactly at $t_j$ are considered at risk to fail). The Kaplan-Meier (KM) estimator64 is the product of the quantities $(1 - d_j/Y_j)$ over those of the J failure times that fall at or before t:

$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{Y_j}\right)$$
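The product described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a validated implementation: the function name `kaplan_meier` and the six-patient data set are hypothetical, and the code assumes censoring is independent of failure risk.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] at each distinct event time.

    times  : follow-up time for each patient (to event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    # d_j: number of events at each distinct event time t_j
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t_j in sorted(deaths):
        d_j = deaths[t_j]
        # Y_j: patients still at risk just before t_j; by convention,
        # patients censored exactly at t_j are counted as at risk
        y_j = sum(1 for t in times if t >= t_j)
        surv *= 1.0 - d_j / y_j
        curve.append((t_j, surv))
    return curve


# Hypothetical data: six patients, two of them censored (event = 0)
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t_j, s in kaplan_meier(times, events):
    print(f"t = {t_j}: S(t) = {s:.4f}")
```

The estimate changes only at event times. Censored patients contribute by remaining in the risk set $Y_j$ until their censoring time, which is how partial follow-up is used.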

Although the KM curve is the typical graphical summary, the relative hazard of failure between groups is the principal measure of efficacy. The HR (and associated statistical tests) pertains to the entire span of follow-up, as opposed to a test of difference in the survival curves at a specific time point, in which case the result would depend on which time was chosen. The log-rank test, which frequently accompanies the KM curve, in fact compares underlying failure probabilities between groups over the J failure times.65 Issues related to this and other tests for comparing survival time distributions are discussed here.

The HR can be estimated by computing the incidence density or average failure rate in each treatment group. For two treatment arms with $n_A$ and $n_B$ patients, respectively, the average failure rate for treatment arm A is $I_A = D_A/T_A$, where $T_A = \sum_i T_i$ is the sum of times to event or censoring over the patients in arm A, and $D_A$ is the total number of events in arm A. For $I_B$ similarly computed, $\mathrm{HR} = I_A/I_B$. More commonly, the HR is estimated via the Cox proportional hazards model,66 which relates the hazard of failure to covariates through the equation
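As a small illustration of the incidence-density calculation, the following Python sketch computes $I_A$, $I_B$, and their ratio. The function names and the follow-up data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def incidence_rate(times, events):
    """Average failure rate: total events divided by total follow-up time."""
    return sum(events) / sum(times)


def rate_ratio(times_a, events_a, times_b, events_b):
    """Crude HR estimate I_A / I_B from per-arm incidence densities."""
    return incidence_rate(times_a, events_a) / incidence_rate(times_b, events_b)


# Hypothetical follow-up times (years) and event indicators (1 = event)
times_a, events_a = [2, 4, 6, 8], [1, 0, 1, 0]   # D_A = 2, T_A = 20
times_b, events_b = [1, 2, 3, 4], [1, 1, 1, 0]   # D_B = 3, T_B = 10

hr = rate_ratio(times_a, events_a, times_b, events_b)
print(f"I_A = {incidence_rate(times_a, events_a):.2f} per year")
print(f"I_B = {incidence_rate(times_b, events_b):.2f} per year")
print(f"HR  = {hr:.3f}")
```

This estimate is only a rough summary, since it treats the failure rate within each arm as constant over follow-up.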

$$\lambda(t, x) = \lambda_0(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p) \tag{4}$$

where $\lambda_0(t)$ is an unspecified "baseline hazard" and the x's represent covariates, which may include indicators for treatment group and other factors. For example, for a single covariate, $x_1$, representing treatment group with $x_1 = 0$ for the standard treatment and $x_1 = 1$ for the new treatment, $\lambda_0(t) \exp(\beta_1)/\lambda_0(t) = \exp(\beta_1)$ equals the HR. From this model, a significance test for HR = 1.0 and confidence interval for the HR are obtained. With additional prognostic factors included in the model, $\exp(\beta_1)$ gives the HR adjusted or controlling for these factors. Prognostic factor analysis using this model or other techniques often follows primary analysis of Phase III trials. The modeling process, which entails deciding which factors to include, determining the correct way to represent a given factor (i.e., in categories, on a continuous scale, and so forth), consideration of interrelationships (e.g., interactions) among factors, and many other issues, can be complex, as can using model results for prediction of individual patient outcomes or classification into prognostic risk classes. A comprehensive review of current modeling methods applied to oncology data is provided by Schumacher et al. in Reference 1.
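To make the single-covariate case concrete, the sketch below estimates $\beta_1$ by maximizing the Cox partial likelihood with Newton-Raphson steps. It is a teaching sketch under stated assumptions, not a replacement for validated statistical software: the function name `cox_single_covariate`, the Breslow-style handling of tied event times, the fixed iteration count, and the ten-patient data set are all choices made here for illustration.

```python
import math


def cox_single_covariate(times, events, x, n_iter=25):
    """Estimate beta for one covariate by Newton-Raphson on the
    Cox partial log-likelihood (Breslow-style handling of ties).

    Returns (beta, se), where se = 1 / sqrt(observed information).
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients contribute only via risk sets
            # Risk set: everyone still under observation at t_i
            risk = [x_k for t_k, x_k in zip(times, x) if t_k >= t_i]
            w = [math.exp(beta * x_k) for x_k in risk]
            s0 = sum(w)
            s1 = sum(w_k * x_k for w_k, x_k in zip(w, risk))
            s2 = sum(w_k * x_k * x_k for w_k, x_k in zip(w, risk))
            score += x_i - s1 / s0              # first derivative term
            info += s2 / s0 - (s1 / s0) ** 2    # minus second derivative term
        beta += score / info                    # Newton-Raphson update
    # info was evaluated just before the last update; at convergence
    # this is the information at the estimate
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: x = 0 standard treatment, x = 1 new treatment
times = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
events = [1] * 10
x = [0] * 5 + [1] * 5

beta, se = cox_single_covariate(times, events, x)
lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"HR = {math.exp(beta):.3f}, approximate 95% CI ({lo:.3f}, {hi:.3f})")
```

Note that the baseline hazard $\lambda_0(t)$ never appears in the code: it cancels within each risk set, which is why it can be left unspecified.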

One important issue concerning the HR and tests used to compare hazards pertains to how failure events occur over time in the groups being compared. The proportional hazards condition, whereby the HR is constant over time, is implicit in the previously described model (hence the name). Under this condition, the quantity $\log_e S_A(t) / \log_e S_B(t)$, where $S_A$ and $S_B$ are the KM estimates at time t in the two treatment groups, will be approximately the same at different time points on the survival curves. Note, however, that under this condition the absolute difference in proportions event free from the KM curve will not be constant, but in fact the curves will diverge over time. The log-rank test65 gives equal weight to failure events across the time span and is the optimal test under proportional hazards. For failure patterns that deviate from proportional hazards, there are a number of alternatives to the log-rank test that are more sensitive to differences between survival curves. The Wilcoxon67 test places more weight on failures occurring early, and so is more sensitive to the case where survival curves separate early but may later converge. Other tests also tend to weight earlier failures,68,69 and a generalized class of tests exists that encompasses the standard log-rank and Wilcoxon tests as well as tests with other weighting schemes.70 Choice of test should be determined by whether there is specific interest in or expectation that differences will emerge under some pattern other than proportional hazards. In any case, when the tests give different results, it is usually the case that the treatment effect is changing over time; a single HR may then be an inadequate summary, and separate HR estimates for specific time intervals may be more appropriate. One of several formal tests for proportionality can be used when there is empirical evidence of nonproportionality from the KM curve plot. Heuristically, a single HR under strongly nonproportional hazards is akin to the mean of a highly skewed distribution, in that it may be computed but does not serve as a readily interpretable summary of the data. Figure 8.2 illustrates how statistical tests may differ under some different patterns of failure among the survival curves being compared.
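The difference between the log-rank and Wilcoxon weighting schemes can be sketched as a single weighted test statistic. In the Python sketch below, the function name `weighted_logrank`, the group labels, and the data are hypothetical, and the Wilcoxon variant is assumed to be the Gehan form, which weights each failure time by the number of patients at risk.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank", group_a="A"):
    """Two-group weighted log-rank test.

    weight="logrank": every failure time gets weight 1
    weight="gehan"  : weight = number at risk (emphasizes early failures)
    Returns (chi-square statistic on 1 df, p-value).
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    rows = list(zip(times, events, groups))
    u = 0.0  # weighted sum of observed minus expected events in group A
    v = 0.0  # variance of u under the null hypothesis
    for t_j in sorted({t for t, e, _ in rows if e}):
        y = sum(1 for t, _, _ in rows if t >= t_j)
        y_a = sum(1 for t, _, g in rows if t >= t_j and g == group_a)
        d = sum(1 for t, e, _ in rows if e and t == t_j)
        d_a = sum(1 for t, e, g in rows if e and t == t_j and g == group_a)
        w = 1.0 if weight == "logrank" else float(y)
        u += w * (d_a - d * y_a / y)
        if y > 1:  # hypergeometric variance; zero when one patient remains
            v += w * w * d * (y_a / y) * (1 - y_a / y) * (y - d) / (y - 1)
    chi2 = u * u / v
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square 1 df
    return chi2, p_value


# Hypothetical data: arm B fails somewhat earlier than arm A
times = [2, 4, 6, 8, 10, 1, 3, 5, 7, 9]
events = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

for scheme in ("logrank", "gehan"):
    chi2, p = weighted_logrank(times, events, groups, weight=scheme)
    print(f"{scheme}: chi-square = {chi2:.3f}, p = {p:.3f}")
```

When the two weightings give noticeably different results on the same data, that is a hint that the hazards may not be proportional.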

Another issue with KM survival curve displays relates to the follow-up period shown. Often the curves are plotted until a time point at which very few patients remain under observation. This practice can create the misleading illusion of a large expanse between the curves, when in fact the variability in the estimated proportion event free is very large when few patients remain, and if a few patients or even one patient were to fail, the estimate would change substantially. It is more

## A Disquisition On The Evils Of Using Tobacco

Among the evils which a vitiated appetite has fastened upon mankind, those that arise from the use of Tobacco hold a prominent place, and call loudly for reform. We pity the poor Chinese, who stupifies body and mind with opium, and the wretched Hindoo, who is under a similar slavery to his favorite plant, the Betel but we present the humiliating spectacle of an enlightened and christian nation, wasting annually more than twenty-five millions of dollars, and destroying the health and the lives of thousands, by a practice not at all less degrading than that of the Chinese or Hindoo.
