Advances in Clinical Trial Designs for Predictive Biomarker Discovery and Validation

Richard Simon, DSc

Corresponding author
Richard Simon, DSc
Biometric Research Branch, National Cancer Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA.
E-mail: [email protected]

Current Breast Cancer Reports 2009, 1:216–221
Current Medicine Group LLC ISSN 1943-4588
Copyright 2009 by Current Medicine Group LLC

Cancers of the same primary site are in many cases heterogeneous in molecular pathogenesis, clinical course, and treatment responsiveness. Current approaches for treatment development, evaluation, and use result in treatment of many patients with ineffective drugs and lead to the conduct of large clinical trials to identify small, average treatment benefits for heterogeneous groups of patients. New genomic and proteomic technologies provide powerful tools for the identification of patients who require systemic or aggressive treatment and the selection of those likely or unlikely to benefit from a specific regimen. In spite of the large literature on developing prognostic and predictive biomarkers and on statistical methodology for analysis of high dimensional data, there is considerable uncertainty about proper approaches for the validation of biomarker-based diagnostic tests. This article attempts to clarify these issues and provide a guide to recent publications on the design of clinical trials for evaluating the clinical utility and robustness of prognostic and predictive biomarkers.

Introduction

This article reviews the developments in clinical trial designs for development of predictive biomarkers. First, however, it is important to clarify terminology. Biomarkers are biological measurements that are used for diverse purposes.
Validation has meaning only in the sense of “fit for purpose” and so general definitions of the term biomarker sometimes create confusion by mixing the standards for validation of one type of biomarker with those of others.

Biomarkers can be subdivided into measurements that are made once at baseline and those that are made repeatedly. Here, the focus is on baseline biomarkers. They are usually divided into prognostic biomarkers and predictive biomarkers.

Prognostic biomarkers are often defined as measurements made at diagnosis that provide information about patient prognosis in the absence of treatment or in the presence of standard treatment. In many prognostic marker studies, heterogeneous groups of patients are included without regard to stage or treatment. Such studies rarely result in the development of markers that are used in clinical practice. Markers are not usually measured in practice unless they have utility in the sense of informing physicians to make improved treatment decisions. Perhaps the most basic problem with the design of many prognostic marker studies is their lack of focus on a medical indication. This lack of focus results in heterogeneous patient selection and an exploratory approach to data analysis.

Predictive biomarkers are measured at baseline to identify patients who are likely or unlikely to benefit from a specific treatment. Estrogen receptor (ER) overexpression is probably both a prognostic and a predictive biomarker. Patients with ER-positive tumors have longer survival in the absence of systemic therapy, making ER a prognostic marker. ER positivity is a predictive marker for benefit from anti-estrogens such as tamoxifen. ER negativity is also a predictive marker for benefit from several cytotoxic chemotherapy regimens. Human epidermal growth factor receptor 2 (HER-2) amplification is a predictive marker for benefit from trastuzumab and perhaps also from doxorubicin [2,3] and taxanes.
A predictive biomarker can also be used to identify patients who are poor candidates for a particular drug. For example, advanced colorectal cancer patients whose tumors have KRAS mutations appear to be poor candidates for treatment with epidermal growth factor receptor (EGFR) antibodies.

Developing Predictive Biomarkers

Predictive biomarkers based on single gene/protein measurements are attractive because they are often closely linked to the target of the drug and are thus biologically interpretable. In some cases, the target of the drug is known but it is not clear how to best measure target
inhibition or whether the target is driving tumor growth and invasion for an individual patient. In other cases, the drug may have several targets and the options for measurement will be more numerous. Sawyers [6] has stated, “One of the main barriers to further progress is identifying the biological indicators, or biomarkers, of cancer that predict who will benefit from a particular targeted therapy.” If a predictive biomarker is to be co-developed with the drug, then the phase 1 and phase 2 studies should be designed to evaluate the candidate markers and assays available, to select one, and then to perform analytical validation of the assay prior to launching the phase 3 trial. Accomplishing all of this prior to initiation of a phase 3 trial can be very challenging.

The term classifier refers to a test that translates biomarker measurements to a set of predicted categories. For a predictive classifier, the categories often refer to patients most likely to respond to the new regimen and those less likely to respond. A biomarker based on a measurement involving a single gene or protein can be converted to a classifier by introducing one or more cut-points, depending on how many categories are desired. Classifiers can also be defined by introducing cut-points to the summary score, which combines the expression levels of many genes.

Gene expression profiling

Many algorithms have been used effectively with DNA microarray data for predicting a binary outcome. Dudoit et al. compared algorithms using several publicly available data sets.
The simplest methods, such as diagonal linear discriminant analysis and nearest neighbor classification, generally performed as well as or better than the more complex methods.

A gene expression–based classifier may involve measurement of the expression of many genes, but it is a discrete indicator of two or more classes and can be used for selecting or stratifying patients in a clinical trial just like a classifier based on a single gene or protein. A gene expression classifier is not just a set of genes, however, and investigators who develop prognostic or predictive gene expression–based signatures should publish how the genes were weighted and what cut-points they used to develop risk groups, not just the list of their genes [9].

Potti et al. and Bonnefoi et al. reported the development of predictive biomarkers of response to standard chemotherapeutic agents using human tumor cell lines. Coombes et al. and Baggerly et al. were unable to confirm those findings. There is substantial interest in using cancer cell lines to develop single gene predictive biomarkers for molecularly targeted drugs.

van't Veer and Bernards indicated that gene expression–profiling studies have been more successful for developing prognostic markers than for predicting responses to particular therapies. They offer several reasons for this, including that the latter is a more difficult challenge and that sufficient numbers of tumor samples are rarely available for patients with metastatic disease who have received a specific therapy. They point out that in developing a predictive signature of benefit from a specific adjuvant regimen, samples from patients on a randomized clinical trial comparing the regimen with a control are necessary.
Correlating gene expression levels to disease-free survivals for patients who have received a specific regimen does not ensure that the marker is predictive and not just prognostic.

Sample size for marker development

The problem of having a sufficient number of responders is most severe in attempting to develop a de novo gene expression–based predictive classifier based on whole genome profiling. Pusztai et al. described a simulation experiment attempting to discover HER-2 overexpression as a predictive biomarker of response to trastuzumab based on data from a phase 2 trial and concluded that the likelihood of successful discovery was small. They recommend using candidate predictors based on the mechanism of action of the drug. They propose a tandem two-step phase 2 trial design for use with a single prespecified candidate predictor. During the first stage, unselected patients are treated. If insufficient responses are seen, the trial remains open to marker-positive patients only until there are sufficient numbers of them for separate analysis. This approach could be generalized for use with several candidate predictive biomarkers. If enough responses were not observed during the initial unselected stage, then accrual would remain open to patients who were positive for any of the candidate predictors.

The simulations by Pusztai et al. were based on synthesizing phase 2 trials containing 60 patients (45 patients with normal HER-2 [including no responders] and 15 patients with HER-2 amplifications [including 5 responders]). Dobbin and Simon studied sample size requirements for development of binary predictors based on de novo gene expression profiling. They recommended that at least 20 patients per group (responders and nonresponders) be included [17,18].

Phase 3 Clinical Trial Designs for Evaluating New Treatments and Predictive Biomarkers

A phase 3 therapeutic clinical trial should evaluate a new treatment with regard to a measure of patient benefit for a defined target population.
The role of a predictive biomarker is in the definition of the target patient population for whom the treatment is evaluated. For a defined population, the evaluation involves comparing outcomes for the patients treated with the new regimen to outcomes of patients in the control group. A predictive biomarker is a marker for which the treatment versus control difference (ie, the treatment effect) differs between marker-positive and marker-negative patients. Comparing disease-free survival for marker-positive and marker-negative patients treated with the new regimen is not part of the evaluation of a predictive biomarker.
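
The distinction between a treatment effect that differs by marker status and a simple outcome difference within the treated arm can be illustrated with a small numerical sketch. The counts below are invented for illustration and do not come from any trial, the function names are hypothetical, and response rate is used as a simplified stand-in for a time-to-event end point such as disease-free survival.

```python
# Illustrative sketch only: all counts are invented, not taken from any trial.
# Each cell holds (responders, patients) for one arm and one marker group.


def response_rate(responders, patients):
    """Proportion of patients who responded."""
    return responders / patients


def treatment_effect(new_cell, control_cell):
    """Response rate on the new regimen minus response rate on control."""
    return response_rate(*new_cell) - response_rate(*control_cell)


def marker_summary(cells):
    """Summarize a randomized trial stratified by a binary marker."""
    effect_pos = treatment_effect(cells[("new", "pos")], cells[("control", "pos")])
    effect_neg = treatment_effect(cells[("new", "neg")], cells[("control", "neg")])
    # Gap between marker groups within the new-regimen arm only. This is the
    # comparison that does NOT establish a marker as predictive.
    treated_arm_gap = (response_rate(*cells[("new", "pos")])
                       - response_rate(*cells[("new", "neg")]))
    return {
        "effect_pos": effect_pos,
        "effect_neg": effect_neg,
        # A predictive marker shows a nonzero difference between the effects.
        "interaction": effect_pos - effect_neg,
        "treated_arm_gap": treated_arm_gap,
    }


# Marker-positive patients do better on both arms; the treatment effect is
# the same in both marker groups (prognostic, not predictive).
prognostic_only = {
    ("new", "pos"): (60, 100), ("control", "pos"): (50, 100),
    ("new", "neg"): (30, 100), ("control", "neg"): (20, 100),
}

# Only marker-positive patients benefit from the new regimen (predictive).
predictive = {
    ("new", "pos"): (60, 100), ("control", "pos"): (20, 100),
    ("new", "neg"): (20, 100), ("control", "neg"): (20, 100),
}

for name, cells in [("prognostic only", prognostic_only),
                    ("predictive", predictive)]:
    summary = {key: round(value, 2) for key, value in marker_summary(cells).items()}
    print(name, summary)
```

In both invented scenarios, marker-positive patients on the new regimen respond more often than marker-negative patients on the new regimen, yet the treatment effect differs by marker status only in the second. The control arm is what separates the two cases.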
There are many challenges in using candidate predictive biomarkers in the design of phase 3 trials for evaluating treatments. In some cases, there may be so much biological and phase 2 evidence that marker-negative patients do not benefit from the new treatment that it would not be appropriate to include such patients in the phase 3 trial. In other cases, it will be appropriate to include marker-negative patients, but the challenge is to limit the number of such patients or to design the clinical trial in a manner that supports claims for the overall population if it turns out that the candidate marker is not useful but that the treatment is effective. It is also important to be able to design the phase 3 trial without including vastly more patients than would have been necessary without using the biomarker. As Jorgensen points out: “If personalized medicine is to have a real breakthrough there needs to be incentives for those who are going to do the research and development work – the pharmaceutical and diagnostic companies.” The following sections review a variety of clinical trial designs that have been proposed for phase 3 trials of new drugs and predictive biomarkers.

Marker strategy design

The marker strategy design is sometimes considered for evaluating the medical utility of a predictive marker for informing the use of approved chemotherapy [21,22]. With this design, patients are randomized to be tested or not. For those who are not tested, their treatment is determined based on stage and standard clinical prognostic factors and practice standards. For those patients randomized to be tested, the results of the test can be used in conjunction with stage and standard prognostic factors to inform treatment decisions. Although the marker strategy design is regarded by some as a gold standard, it is often inefficient because many patients may receive the same treatment regardless of the group to which they are randomized.
In order to have reasonable statistical power to detect differences in outcome between the two randomization groups as a whole, a very large number of patients may have to be randomized. This inefficiency is particularly problematic for prognostic markers for identifying low-risk patients for whom chemotherapy may be withheld because the prospective study is a therapeutic equivalence trial involving detection of a small treatment effect.

The defects in the marker strategy design can be avoided by performing the test in all patients and only randomizing patients for whom the treatment assignment is influenced by the marker result. This was the approach used in the Microarray in Node-Negative Disease May Avoid Chemotherapy (MINDACT) trial for evaluating a 70-gene signature for guiding the use of standard chemotherapy in women with node-negative breast cancer.

Enrichment designs

With an enrichment design, a diagnostic test is used to restrict eligibility for a randomized clinical trial comparing a regimen containing a new drug with a control regimen. This approach played a crucial role in the development of trastuzumab. Patients with metastatic breast cancer whose tumors expressed HER-2 in an immunohistochemistry test were eligible for randomization. Because the drug has little effect for test-negative patients and because about 75% of patients are negative, a standard clinical trial randomizing all comers would require an enormous sample size to detect the diluted treatment effect. Pusztai et al. describe simulations that illustrate this dilution effect.

Simon and Maitournam [26–28] studied the efficiency of this approach relative to the standard approach of randomizing all patients without using the test at all. They found that the efficiency of the enrichment design depended on