
Concepts of Experimental Design

Design Institute for Six Sigma

A SAS White Paper

Table of Contents

Introduction
Basic Concepts
Designing an Experiment
  Write Down Research Problem and Questions
  Define Population
  Determine the Need for Sampling
  Define the Experimental Design
    Experimental (or Sampling) Unit
    Types of Variables
    Treatment Structure
    Design Structure
Collecting Data
Analyzing Data
  Types of Effects
  Assumptions
  Inference Space
Experimental Design Examples
  Example 1: Completely Randomized Design
    Determining Power and Sample Size and Generating a Completely Randomized Design
    Generating a Completely Randomized Design
    Analyzing Data from a Completely Randomized Design
  Example 2: Randomized Complete Block Design
    Determining Power and Sample Size and Generating a Randomized Complete Block Design
    Generating a Randomized Complete Block Design
    Analyzing a Randomized Complete Block Design
Conclusion
References

Introduction

An experiment is a process or study that results in the collection of data. The results of experiments are not known in advance. Usually, statistical experiments are conducted in situations in which researchers can manipulate the conditions of the experiment and can control the factors that are irrelevant to the research objectives. For example, a rental car company compares the tread wear of four brands of tires, while also controlling for the type of car, speed, road surface, weather, and driver.

Experimental design is the process of planning a study to meet specified objectives. Planning an experiment properly is very important in order to ensure that the right type of data and a sufficient sample size and power are available to answer the research questions of interest as clearly and efficiently as possible.

Six Sigma is a philosophy that teaches methodologies and techniques that provide the framework to define a business strategy that focuses on eliminating mistakes, waste, and rework. Six Sigma establishes a measurable status for achieving a strategic problem-solving methodology in order to increase customer satisfaction and dramatically enhance financial performance. For more information, see the Design Institute for Six Sigma at SAS.

Six Sigma training programs include some information about experimental design. However, the amount of training in these programs can vary from nothing about experimental design to one week of instruction about this subject. The purpose of this paper is to summarize the basic concepts of traditional experimental design that would apply to a Six Sigma project. These basic concepts also apply to a general experimental setting.
In addition, this paper shows how to apply some of these basic concepts by using examples of common experimental design and analysis. This paper is written for people who have a basic understanding of experimental design.

Basic Concepts

This section discusses the basic concepts of experimental design, data collection, and data analysis. The following steps summarize the many decisions that need to be made at each stage of the planning process for the experiment. These steps are not independent, and it might be necessary to revise some earlier decisions. A brief explanation of each step, which will help clarify the decisions that should be made during each stage, is given in the section that follows this list.

Designing an Experiment

Perform the following steps when designing an experiment:

1. Define the problem and the questions to be addressed.
2. Define the population of interest.
3. Determine the need for sampling.
4. Define the experimental design.

Write Down Research Problem and Questions

Before data collection begins, specific questions that the researcher plans to examine must be clearly identified. In addition, a researcher should identify the sources of variability in the experimental conditions. One of the main goals of a designed experiment is to partition the effects of the sources of variability into distinct components in order to examine specific questions of interest. The objective of designed experiments is to improve the precision of the results in order to examine the research hypotheses.

Define Population

A population is a collective whole of the people, animals, plants, or other items that researchers collect data from. Before collecting any data, it is important that researchers clearly define the population, including a description of its members. The designed experiment should designate the population for which the problem will be examined. The entire population for which the researcher wants to draw conclusions is the focus of the experiment.

Determine the Need for Sampling

A sample is one of many possible subsets of units that are selected from the population of interest. In many data collection studies, the population of interest is assumed to be much larger than the sample, so there are potentially a very large (usually considered infinite) number of possible samples. The results from a sample are then used to draw valid inferences about the population.

A random sample is a subset of units that are selected randomly from a population. A random sample represents the general population or the conditions that are selected for the experiment because the population of interest is too large to study in its entirety.
Using techniques such as random selection after stratification or blocking is often preferred.
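The random selection described above can be sketched in a few lines. This is a minimal illustration, not part of the original paper (whose examples use JMP); the function name and the numbered population are hypothetical:

```python
import random

def draw_random_sample(population, n, seed=None):
    """Select n units at random, without replacement, from the population."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(list(population), n)

# Hypothetical population of 500 numbered units; draw a random sample of 30.
population = range(1, 501)
sample = draw_random_sample(population, 30, seed=42)
```

Stratified or blocked selection would apply the same draw separately within each stratum or block.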

An often-asked question about sampling is: how large should the sample be? Determining the sample size requires some knowledge of the observed or expected variance among sample members, in addition to how large a difference among treatments you want to be able to detect. Another way to describe this aspect of the design stage is to conduct a prospective power analysis, which is a brief statement about the capability of an analysis to detect a practical difference. A power analysis is essential so that the data collection plan will support the statistical tests, primarily by reducing residual variation, which is one of the key components of a power analysis study.

Define the Experimental Design

A clear definition of the details of the experiment makes the desired statistical analyses possible, and almost always improves the usefulness of the results. The overall data collection and analysis plan considers how the experimental factors, both controlled and uncontrolled, fit together into a model that will meet the specific objectives of the experiment and satisfy the practical constraints of time and money.

The data collection and analysis plan provides the maximum amount of information that is relevant to a problem by using the available resources most efficiently. Understanding how the relevant variables fit into the design structure indicates whether the appropriate data will be collected in a way that permits an objective analysis that leads to valid inferences with respect to the stated problem.
The desired result is to produce a layout of the design along with an explanation of its structure and the necessary statistical analyses.

The data collection protocol documents the details of the experiment, such as the data definition, the structure of the design, the method of data collection, and the types of analyses to be applied to the data.

Defining the experimental design consists of the following steps:

1. Identify the experimental unit.
2. Identify the types of variables.
3. Define the treatment structure.
4. Define the design structure.

In our experience in the design and implementation of previous studies, a number of extenuating circumstances often arise that require last-minute adjustments to the data collection plan. Therefore, contingency plans should be available to keep the structure of the design intact in order to meet the stated objectives.

The following discussion provides further insight into the decisions that need to be made in each of the steps for defining the experimental design.
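The prospective power analysis raised in the sampling discussion can be sketched with the standard normal-approximation sample-size formula for comparing two means. This is a rough planning sketch, not the JMP-based procedure used later in this paper; the function name and the numbers are illustrative:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means: detect a true difference delta given a common
    standard deviation sigma, at significance level alpha with the
    stated power (normal approximation to the t-test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# To detect a difference of 5 units when sigma is about 8:
n = n_per_group(delta=5, sigma=8)
```

As expected, demanding higher power or a smaller detectable difference increases the required sample size per group.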

Experimental (or Sampling) Unit

The first step in detailing the data collection protocol is to define the experimental unit. An experimental or sampling unit is the person or object that will be studied by the researcher. This is the smallest unit of analysis in the experiment from which data will be collected. For example, depending on the objectives, experimental or sampling units can be individual persons, students in a classroom, the classroom itself, an animal or a litter of animals, a plot of land, patients from a doctor's office, and so on.

Types of Variables

A data collection plan considers how four important types of variables (background, constant, uncontrollable, and primary) fit into the study. Inconclusive results are likely if any of these classifications are not adequately defined. It is important to consider all the relevant variables (even those that might, at first, appear to be unnecessary) before the final data collection plan is approved in order to maximize confidence in the final results.

Background variables can be identified and measured, yet cannot be controlled; they will influence the outcome of an experiment. Background variables are treated as covariates in the model rather than as primary variables. Primary variables are the variables of interest to the researcher. When background variables are used in an analysis, better estimates of the primary variables should result because the sources of variation that are supplied by the covariates have been removed. Occasionally, primary variables must be treated as covariates in order to keep the size of the experiment at a manageable level. Detailed measurements of all relevant variables should be made, preferably at the time the actual measurements are collected.

Constant variables can be controlled or measured but, for some reason, will be held constant over the duration of the study.
This increases the validity of the results by preventing extraneous sources of variation from entering the data. For this data collection plan, some of the variables that will be held constant include:

- the use of standard operating procedures
- the use of one operator for each measuring device
- all measurements taken at specific times and locations

The standard operating procedure of a measuring device should be used in the configuration and manner that the developer and the technical representative consider most appropriate. Operator error might also add to the variability of the results. To reduce this source of variation, one operator is often used for each measuring device, or specific training is given with the intent of achieving uniform results.

Uncontrollable (hard-to-change) variables are those variables that are known to exist, but conditions prevent them from being manipulated, or it is very difficult (due to cost or physical constraints) to measure them. The experimental error is due to the influential effects of uncontrollable variables, which result in less precise evaluations of the effects of the

primary and background variables. The design of the experiment should eliminate or control these types of variables as much as possible in order to increase confidence in the final results.

Primary variables are independent variables that are possible sources of variation in the response. These variables comprise the treatment and design structures and are referred to as factors in the rest of this paper.

Treatment Structure

The treatment structure consists of the factors that the researcher wants to study and about which the researcher will make inferences. The primary factors are controlled by the researcher and are expected to show the effects of greatest interest on the response variable(s). For this reason, they are called primary factors.

The levels of greatest interest should be clearly defined for each primary factor. The levels of the primary factors represent the range of the inference space relative to this study. The levels of the primary factors can represent the entire range of possibilities or a random subset. It is also important to recognize and define when combinations of levels of two or more treatment factors are illogical or unlikely to exist.

Factorial designs vary several factors simultaneously within a single experiment, with or without replication. One-way, two-way, three-way, 2^n, 3^n, D-optimal, central composite, and two-way with some controls are examples of treatment structures that are used to define how data are collected.
The treatment structure relates to the objectives of the experiment and the type of data that is available.

Drawing a design template is a good way to view the structure of the design factors. Understanding the layout of the design through a visual representation of its primary factors will greatly help you later to construct an appropriate statistical model.

Fixed effects treatment factors are usually considered to be "fixed" in the sense that all levels of interest are included in the study because they are selected by some non-random process, they consist of the whole population of possible levels, or other levels were not feasible to consider as part of the study. The fixed effects represent the levels of a set of precise hypotheses of interest in the research. A fixed factor can have only a small number of inherent levels; for example, the only relevant levels for gender are male and female. A factor should also be considered fixed when only certain values of it are of interest, even though other levels might exist (for example, the types of evaluation tests given to students). Treatment factors can also be considered "fixed" as opposed to "random" when they are the only levels about which you would want to make inferences.

Because of resource limitations or missing data, all combinations of treatment factors might not be present. This is known as the missing or empty cell problem.

Certain designs, known as fractional factorials, enable you to study a large number of factors with a relatively small number of observations. Analyses of such data assume that specific interactions among factors are negligible.
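As an illustration of the fractional factorial idea (a generic sketch, not tied to any particular software), a half-fraction of a 2^3 factorial can be built by keeping only the runs in which the product of the three coded factor levels is +1, which confounds the three-way interaction with the mean:

```python
from itertools import product

# Full 2^3 factorial: all combinations of three two-level factors,
# coded -1 (low) and +1 (high).
full_factorial = list(product([-1, 1], repeat=3))

# Half fraction with defining relation I = ABC: keep the runs whose
# coded levels multiply to +1, so the ABC interaction cannot be
# estimated separately from the overall mean.
half_fraction = [run for run in full_factorial
                 if run[0] * run[1] * run[2] == 1]
```

The fraction studies three factors in four runs instead of eight, at the cost of assuming that the confounded interactions are negligible.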

If a treatment condition appears more than one time, it is defined to be replicated. True replication refers to responses that are treated in the same way. Misconceptions about the number of replications often occur in experiments where sub-samples or repeated observations on a unit have been mistaken for additional experimental units.

Notice that replication does not refer to the initial similarity among experimental units; the important issue is not the similarity of the experimental units but how many will be needed per treatment, given their current differences and the differences to be expected as the experiment proceeds.

Replication is essential for estimating experimental error. The type of replication that is possible for a data collection plan determines how the error terms should be estimated. Two or more measurements should be taken from each experimental unit at each combination of conditions. In addition, it is desirable to have measurements taken at a later period in order to test for repeatability over time. The first method of replication gives an estimate of pure error, that is, the ability of the experimental units to provide similar results under identical experimental conditions. The second method provides an estimate of how closely the devices can reproduce measurements over time.

Design Structure

Most experimental designs require experimental units to be allocated to treatments either randomly or randomly with constraints, as in blocked designs (Montgomery 1997).

Blocks are groups of experimental units that are formed to be as homogeneous as possible with respect to the block characteristics.
The term block comes from the agricultural heritage of experimental design, where a large block of land that had uniform soil, drainage, sunlight, and other important physical characteristics was selected for the various treatments. Homogeneous clusters improve the comparison of treatments by randomly allocating levels of the treatments within each block.

The design structure consists of those factors that define the blocking of the experimental units into clusters. The types of commonly used design structures are described next.

Completely Randomized Design. Subjects are assigned to treatments completely at random. For example, in an education study, students from several classrooms are randomly assigned to one of four treatment groups (three new types of a test and the standard). The total number of students in the 4 classrooms is 96. Randomly assign 1/4 of them, or 24 students, to each of the 4 types of tests.

Note: Numbers that are evenly divisible by 4 are used here; equal sample size in every cell, although desirable, is not absolutely necessary.

                         Test Method
   ---------------------------------------------------------
   Standard      New Test 1     New Test 2     New Test 3
   ---------------------------------------------------------
   24 students   24 students    24 students    24 students
   ---------------------------------------------------------
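The random assignment in this example can be sketched as follows. This is an illustrative sketch (the paper's own examples use JMP); the student labels and the function name are hypothetical:

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Shuffle the experimental units and deal them out in equal-sized
    groups, one group per treatment."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}

# 96 students assigned completely at random, 24 to each test method.
students = ["student_%02d" % i for i in range(1, 97)]
tests = ["Standard", "New Test 1", "New Test 2", "New Test 3"]
assignment = completely_randomized_assignment(students, tests, seed=7)
```

Shuffling once and slicing guarantees equal group sizes while keeping every assignment equally likely.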

Randomized Complete Block Design. Subjects are divided into b blocks (see the description of blocks above) according to demographic characteristics. Subjects in each block are then randomly assigned to treatments so that all treatment levels appear in each block. For example, in the education study that involved several classrooms, the classrooms might differ due to different teaching methods. Students are randomly assigned to one of the four types of tests within each classroom. There might be significant variability between the subjects in each classroom, each of which contains 24 students. Randomly assign 6 students to each of the three types of tests and the standard. The classroom is now the block. The primary interest is in the main effect of the test method.

                              Test Method
   ----------------------------------------------------------------
   Classroom   Standard     Test 1       Test 2       Test 3
   ----------------------------------------------------------------
   1           6 students   6 students   6 students   6 students
   2           6 students   6 students   6 students   6 students
   3           6 students   6 students   6 students   6 students
   4           6 students   6 students   6 students   6 students
   ----------------------------------------------------------------

The improvement of this design over a completely randomized design is that it enables you to make comparisons among treatments after removing the effects of a confounding variable, in this case, the different classrooms.

Collecting Data

It is important to follow the data collection protocol exactly as it is written when the data are collected. Prior to collecting the data, it is important to double-check that all the instruments are valid, reliable, and calibrated. After that is confirmed, take time to explain the data collection procedures to the person who will be doing the actual data collection.
It might seem counter-intuitive to technicians and machine operators to execute a randomized design. They might re-organize the data collection scheme in an effort to be more efficient, without realizing the impact that this might have on the experiment.

Analyzing Data

The basis for the analysis of data from designed experiments is discussed in this section. There are many thousands of experimental designs. Each design can be analyzed by using a specific analysis of variance (ANOVA) that is designed for that experimental design. One of the jobs of a statistician is to recognize the various experimental designs and to help clients create the designs and analyze the experiments by using appropriate methods and software. The examples at the end of this paper generate the experimental designs and respective analyses by using JMP software.
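To make the computation behind such an analysis concrete, the one-way ANOVA F statistic for a completely randomized design can be computed directly from its definition. This is a generic from-scratch sketch, not a JMP feature, and the data are made up:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: the between-group mean square
    divided by the within-group mean square."""
    observations = [y for g in groups for y in g]
    n, k = len(observations), len(groups)
    grand_mean = sum(observations) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical responses for three treatment groups.
f_stat = one_way_anova_f([[18, 20, 19], [22, 24, 23], [30, 29, 31]])
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom indicates significant differences among the treatment means.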

Types of Effects

An effect is a change in the response due to a change in a factor level. There are different types of effects. One objective of an experiment is to determine whether there are significant differences in the responses across levels of a treatment (a fixed effect) or any interaction between the treatment levels. If this is the case, the analysis is usually easily manageable, given that the anomalies in the data are minimal (outliers, missing data, nonhomogeneous variances, unbalanced sample sizes, and so on).

A random effect exists when the levels that are chosen represent a random selection from a much larger population of equally usable levels. This is often thought of as a sample of interchangeable individuals or conditions. The chosen levels represent arbitrary realizations from a much larger set of other equally acceptable levels.

Elements of the design structure (for example, the blocks) are usually treated as random effects. Blocks are a subset of the larger set of blocks over which inference is to be made. If blocks are considered a fixed effect, it is helpful to assume that there is no interaction between elements of the design structure and elements of the treatment structure. If blocks are treated as random effects, you can examine interaction between elements of the treatment structure and the design structure.

The number of potential human subjects that are available is often very large compared to the actual number of subjects that are used.
Subjects who are chosen are likely to be just as reasonable to collect data from as potential subjects who were not chosen, and inferences about how individual subjects respond are usually not of primary importance, whereas a measure of the variation in responses is important to know.

One additional consideration that is essential in the evaluation of the treatment and design structure with two or more treatment/design factors is to determine whether the levels of the factors are crossed or nested with each other.

Two factors are crossed with one another when all levels of the first factor appear in combination with all levels of the second factor, which produces all possible combinations. For example, in an education program, male and female students receive the same educational tests; thus, gender is crossed with test.

One factor is nested in a second factor when a given level of the nested factor appears in only one level of the nesting factor, implying a hierarchy. For example, in a study of educational programs, teachers are usually nested within schools because teachers usually teach at only one school.

Assumptions

Data from experiments are analyzed by using linear regression and analysis of variance. The standard assumptions for data analysis that apply to linear regression and analysis of variance are summarized next. To the degree that these assumptions are not satisfied, the results are merely numbers on an output listing. However, when the assumptions are

satisfied, the estimates from sample data inform us of the structure of relationships in the real world. The following list briefly describes the assumptions behind linear regression:

I. No model specification error.
   - The response Y is the dependent variable.
   - The independent variables, x1, ..., xp, influence Y.
   - The form of the relationship between Y and (x1, ..., xp) is linear (not nonlinear) in the parameters.

II. No measurement error.
   - The dependent variable(s) are interval or ratio data (not ordinal or nominal).
   - The independent variables are measured accurately.

III. No collinearity (a high correlation between any two independent variables is not allowed).

IV. The error term (the residuals) is well-behaved when the following conditions hold:
   - a zero mean
   - homoscedasticity
   - no autocorrelation (usually of most concern with time series or spatial data)
   - no large correlations between any of the independent variables
   - a normal distribution

When any of these assumptions are not met, statistical significance tests lose their meaning. What if some of them are not met? In practice, assumptions I, II, and III are usually reasonably satisfied. It is the behavior of the errors in IV that can cause the most severe problems with estimation and hypothesis testing.

The majority of these assumptions also apply to the analysis of variance because ANOVA is a special case of regression.

Although a few of the assumptions can be relaxed (such as measurement error, which usually does not apply with categorical data), you should not categorize continuous data, because doing so creates an unnecessary source of measurement error in the model; just use regression. Additional assumptions or desirable conditions for data analysis with analysis of variance include:

- Homogeneous variances across levels of data categories.
- Elements of the design structure are random effects (blocks).

- Nearly equal sample sizes, with outliers absent, and missing data that can be considered missing at random (MAR).
- No interaction exists between elements of the design structure and elements of the treatment structure.

Inference Space

The inference space of experimental results defines the range of conditions for which the subjects form a representative sample and are assumed to be randomly selected from this well-defined population. The first question to ask before any experimental design is administered is, "What population do I want to apply the final results to?" By strict definition, the results of the experiment are applicable only to the population from which the experimental units were selected in the beginning. This means that the relevant range of the independent variables that are covered by the experiment is limited by the types of experimental units in the sample. Relaxing this assumption requires considerable caution because it assumes that the model remains valid outside the inference space, which in many cases might not be true.

Factors that are defined as fixed are limited to those factors that are specified in the experimental design. Therefore, extrapolation to other values, if any, is not justified. Inferences from a study should be confined to the type and range of independent variables that are included in the design.

Don't allow generalizations that are made for a group to extend far beyond the legitimate scope of the research; that is, do not use a sampling method that is inconsistent with the larger group to which inferences are made (for example, generalizing to high school students when, in fact, only 6th graders were studied).

Experimental Design Examples

JMP software provides the capabilities to design experiments and perform the statistical analyses. JMP provides capabilities to create screening, mixed-level, response surface, mixture, Taguchi, and custom designs.
This section of the paper demonstrates the two designs that are discussed in the design structure portion of this paper and the respective analyses. These designs are among the most commonly used designs. The first design is a completely randomized design that begins with a power analysis. The second design is a randomized complete block design. The examples come from Montgomery (1997).

Example 1: Completely Randomized Design

A