When you want to compare the mean of a population to a particular value, you can use PROC UNIVARIATE to perform a one-sample t-test. But sometimes you want to compare the means of more than one group, so you need to use a different kind of test.
In this lesson, you learn to use PROC TTEST to perform a two-sample t-test, and determine whether the means of two populations are statistically different from each other.
A way to test for differences in means when you have two or more groups is analysis of variance, or ANOVA. With ANOVA, you have a continuous response variable and at least one categorical predictor variable, which can have multiple levels, and you can use PROC GLM to analyze your data.
Your goal is to determine whether the differences in means are significant, and if they are significant, which specific groups differ from each other. You also learn to use PROC GLM to analyze group means in other situations, such as when you have a continuous response variable and two categorical predictor variables.
In this lesson, you learn to explore the difference between groups of one or more variables. You learn to graphically represent the difference between groups in order to explore associations between a categorical predictor variable and a continuous response variable. To do this, you use PROC SGPLOT to produce box plots. You learn to explore the difference between the means of only two groups by performing two-sample t-tests with PROC TTEST.
For two or more groups, you learn to use ANOVA with PROC GLM. You might want to store your PROC GLM results for future analysis. You learn to do this by adding a STORE statement and then performing additional analysis with PROC PLM.
Finally, you learn how to test the assumptions associated with your test, and how to make conclusions about the significance of the differences in group means.
Before you analyze your data, you need to know it well. Part of knowing your data is to get a general idea of any associations between predictor variables and the response variable. One method for doing this is to conduct a graphical analysis of your data. To graphically explore associations between a categorical predictor variable and a continuous response variable, you can use the SGPLOT procedure to produce box plots.
In this topic, you learn how to
explain what an association is
graphically explore associations by using the SGPLOT procedure
Suppose you are a data analyst who is studying data about the sale of residential properties in Ames, Iowa from 2006 to 2010.
The variable that you are interested in is SalePrice. So, SalePrice is the response variable. The categorical predictor variables in the data set include house style, overall quality, overall condition, central air, and many more.
The original Ames housing data set contains 2,930 observations. You work primarily with a sample of 300 houses in the data set AmesHousing3, which is a subset of the full data set, AmesHousing.
An association exists between two variables when the expected value of one variable differs at different levels of the other variable. For instance, suppose the average sale price of homes with central air conditioning is markedly different from that of homes without central air. This would suggest an association, or relationship, between sale price and central air.
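A quick numeric check of this idea is to compare the group means directly. The following is a minimal sketch, assuming the statdata.ameshousing3 data set and the Central_Air variable used later in this lesson; the title text is only illustrative.

```sas
/* Compare the average sale price at each level of Central_Air. */
proc means data=statdata.ameshousing3 n mean median std maxdec=0;
   class Central_Air;   /* one row of statistics per level of the predictor */
   var SalePrice;       /* continuous response variable */
   title "Sale Price by Central Air";
run;
title;
```

If the means differ noticeably between the levels, that hints at a possible association, which the box plots then let you examine visually.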
A simple way to look for possible associations in your data is to create box plots. In a box plot, the response variable is typically on the Y axis and the categorical predictor variable is on the X axis.
The diamond within each box is the mean of Y. (The horizontal line in each box represents the median.) There is a box for each value of X. So, if there are two X values, the plot contains two boxes. Extending from the top and bottom of each box are whiskers that represent the spread of the data points. For this reason, some people refer to box plots as box-and-whisker plots.
You can include a regression line on your box plot to connect the means of Y at each value of X. If the regression line is not horizontal, then there might be an association between X and Y. In other words, the value of Y differs at different levels of X.
A horizontal regression line indicates that there is no association between X and Y. In other words, knowing the value of X does not tell you anything about the value of Y. So for each value of X, your best guess as to the value of Y would simply be the mean of Y, or Y-bar.
Before you begin analyzing the Ames housing data, you want to perform a graphical analysis of the associations between each categorical predictor variable and the continuous response variable, SalePrice. To do this, you create box plots with the SGPLOT procedure.
Let’s use PROC SGPLOT to explore the statdata.ameshousing3 data set for associations between categorical predictor variables and the continuous response variable. To do this, we’ll create comparative box plots.
proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Central_Air connect=mean;
   title "Sale Price Differences across Central Air";
run;

proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Fireplaces connect=mean;
   title "Sale Price Differences across Fireplaces";
run;

proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Heating_QC connect=mean;
   title "Sale Price Differences across Heating Quality";
run;
Our program contains three SGPLOT steps. Let’s look at the first one in detail.
In the SGPLOT statement, we specify the statdata.ameshousing3 data set.
Next, we include a VBOX statement. The VBOX statement creates a vertical box plot that shows the distribution of your data. After the keyword VBOX, we specify the variable for the Y axis. The response variable is typically on the Y axis, so we specify SalePrice here, followed by a forward slash. The CATEGORY= option creates different box-and-whisker plots for each level of the category variable. We want to see if there is an association between sale price and whether the home has central air conditioning, so we specify Central_Air.
We also want SAS to include a straight regression line that joins the means of Y from group to group. So, we include the CONNECT= option and specify mean. Finally, we include a TITLE statement and a RUN statement.
The second SGPLOT step specifies Fireplaces as the category variable and the third specifies Heating_QC.
Let’s run the code.
[Output: three box plots from the SGPLOT procedure, one for each category variable]
In the box plot for Central_Air, the regression line is definitely not horizontal. It looks like houses with central air sold for higher prices than those without, so there appears to be an association between Central_Air and SalePrice. Let’s look at the plot for Fireplaces. It looks like there isn’t much difference between the sale price of houses with one fireplace and those with two.
In the heating quality box plots, we see that there are four levels for the variable Heating_QC: excellent, fair, good, and TA, which stands for typical/average. We see marked differences in sale price among homes with excellent, fair, and good heating quality. But the average sale price for homes with good heating quality is about the same as that of homes in the typical/average category.
Exploring associations with box plots helps prepare you for what you might see as you analyze your data. But don’t use these plots to determine which variables to include in your model. They represent only simple relationships between one predictor variable and the response variable. When you start putting multiple variables in the model, the picture of associations can become very different.
To get an idea of the simple relationships in your data, you can run PROC SGPLOT for each predictor variable. If you are comfortable using macros, you can use a single macro program to create box plots for all of your predictor variables. You can learn about macros in the SAS Macro Language 1: Essentials course.
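One way such a macro could look is sketched below. This is not the course's macro: the macro name boxplots and its parameters data=, response=, and predictors= are hypothetical, chosen only for illustration. It loops over a space-separated list of predictor names and runs one PROC SGPLOT step for each.

```sas
/* Hypothetical helper: one box plot per categorical predictor. */
%macro boxplots(data=, response=, predictors=);
   %local i var;
   %let i = 1;
   %let var = %scan(&predictors, &i, %str( ));
   %do %while (%length(&var) > 0);
      proc sgplot data=&data;
         vbox &response / category=&var connect=mean;
         title "&response Differences across &var";
      run;
      /* move to the next name in the list */
      %let i = %eval(&i + 1);
      %let var = %scan(&predictors, &i, %str( ));
   %end;
   title;
%mend boxplots;

/* Example call with the three predictors used in this topic */
%boxplots(data=statdata.ameshousing3,
          response=SalePrice,
          predictors=Central_Air Fireplaces Heating_QC)
```

Passing the predictor names as a list keeps the call short; adding a variable to the exploration only requires adding its name to predictors=.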
Sometimes you want to make comparisons between two different populations or groups. For example, do males, on average, have higher salaries than females? Do females, on average, have lower blood pressure than males? Do patients who receive a new medication have higher T-cell counts than patients who receive a placebo during a drug trial? When you compare two different groups, you usually want to know if the means of the two groups are different. You can use the two-sample t-test to determine the answer. In this topic, you learn how to
analyze differences between two population means using the TTEST procedure
verify the assumptions of and perform a two-sample t-test
perform a one-sided t-test
You can use a one-sample t-test to determine if the mean of a population is equal to a particular value or not. When you collect a random sample of independent observations from two different populations, you can perform a two-sample t-test. The two-sample t-test is a hypothesis test for answering questions about the means of two populations. This test enables you to examine the differences between populations for one or more continuous variables. You can assess whether the means of the two populations are statistically different from each other. As you know, in statistics, the null hypothesis is your initial assumption and is usually one of equality.
The null hypothesis for the two-sample t-test is that the means for the two groups are equal, or that μ1 - μ2 equals 0. The alternative hypothesis is the logical opposite of the null hypothesis and is typically what you suspect or are trying to show. It is usually a hypothesis of inequality. The alternative hypothesis for the two-sample t-test is that the means for the two groups are not equal, or μ1 - μ2 does not equal 0.
When you compare the means of two populations using a two-sample t-test, you make three assumptions:
independent observations
normally distributed data for each group
equal variances for each group
You need to examine your data and verify these assumptions before you run any statistical analyses. If any one of these assumptions is not valid, then the probability of drawing incorrect conclusions from the analyses could increase. Let’s examine these assumptions further.
The first assumption is one of independent observations. What does this mean? This means that one observation doesn’t affect another observation, that is, no observation provides information about any other observation. For example, if your data contains several observations on each subject or if your data contains observations on sets of twins, then the assumption of independent observations is not valid and you shouldn’t use the two-sample t-test. You verify this assumption in the design phase of the experiment: if you have a random, representative sample and you collect the data correctly, this assumption should be true.
The next assumption is one of normality. Do you have normally distributed data for each group? If the populations from which you obtained your samples are normally distributed, then your sample data will most likely look normal too; it will be approximately symmetric and have close to a bell shape. You can examine plots of the data to verify this assumption.
The last assumption is homogeneity of variance. As you know, the variance is a measure of spread in your population. In the two-sample t-test, you assume that the variances in the two populations are equal. To verify this assumption, you can check to see if the variances in your two samples are approximately equal. If your sample variances are not too different, then you are safe to assume that the population variances are equal. The F-test is a formal way to verify this assumption.
To evaluate the assumption of equal variances in the two populations, you can use the F-test for equality of variances. The null hypothesis for this test is that the population variances are equal, or σ1² = σ2², where σ² is the population parameter for the variance. The alternative hypothesis is that the population variances are not equal, or σ1² ≠ σ2².
To test the hypothesis, you calculate the F statistic, which is the ratio of the maximum sample variance of the two groups to the minimum sample variance of the two groups.
F = \frac{\max(s_1^2,\; s_2^2)}{\min(s_1^2,\; s_2^2)}
By construction, the F statistic is always greater than or equal to 1. If the variances in the populations really are equal, then you expect the variances in the samples to be nearly equal too.
Here’s a question. When the null hypothesis is true, what value will the F statistic be close to? If the variances in the two populations are equal, then the F statistic tends to be close to 1. Consequently, a large value for the F statistic is evidence against the assumption of equality.
You know the three assumptions for the two-sample t-test, so now let’s look at an example. Suppose you want to compare female and male test scores. For example, the students in Ms. Chao’s statistics course want to determine whether girls or boys in Carver County magnet schools scored higher on the SAT.
What do you think the null hypothesis is for this test? The null hypothesis is that the mean SAT score for girls is equal to the mean SAT score for boys. So, you’re not concerned with which group scored higher, but whether the difference in population average scores equals 0 or does not equal 0.
Do you think this is an example of a one-sided, two-sample t-test or a two-sided, two-sample t-test? This is an example of a two-sided, two-sample t-test because you’re testing to see whether two group means are significantly different from each other. This would be a one-sided test if your alternative hypothesis were that the mean SAT score for girls is higher than the mean SAT score for boys, or that the mean SAT score for boys is higher than the mean SAT score for girls.
The SAS data set TestScores contains the SAT score information and the variables Gender, SATScore, and IDNumber.
Gender   SATScore   IDNumber
Male     1170       61469897
Female   1090       33081197
Male     1240       68137597
Female   1000       37070397
Male     1210       64608797
Female    970       60714297
Male     1020       16907997
Female   1490        9589297
Male     1200       93891897
Female   1260       85859397

Which of these variables is the categorical grouping variable, or classification variable? Gender is the categorical grouping variable with two values, male and female. Which of these variables is the continuous variable whose mean you want to analyze? You want to analyze and compare the mean values of the variable SATScore for each gender.
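If you want something to practice the syntax on, the ten sample rows shown above can be typed into a small work data set. This is only a sketch: the name work.testscores_sample is hypothetical, and these ten rows are a small extract, not the full TestScores data set that the demonstrations analyze.

```sas
/* Recreate only the ten displayed rows; the full data set has more observations. */
data work.testscores_sample;
   input Gender $ SATScore IDNumber;
   datalines;
Male 1170 61469897
Female 1090 33081197
Male 1240 68137597
Female 1000 37070397
Male 1210 64608797
Female 970 60714297
Male 1020 16907997
Female 1490 9589297
Male 1200 93891897
Female 1260 85859397
;
run;

/* Confirm that the classification and analysis variables were read correctly. */
proc print data=work.testscores_sample noobs;
run;
```

Results computed from this extract will not match the output discussed in this lesson, because that output is based on all of the observations.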
You can use PROC TTEST in SAS for the two-sample t-test. PROC TTEST performs the two-sample t-test by default. It also computes confidence limits and uses ODS graphics to create graphs as part of its output. It automatically tests the assumption of equal variances, and provides an exact two-sample t-test when the assumption is met, and an approximate t-test when it is not met. Here’s the syntax.
PROC TTEST DATA=SAS-data-set;
   CLASS variable;
   VAR variable(s);
RUN;
In the PROC TTEST statement, you specify your input SAS data set. In the CLASS statement, you specify the classification variable for the analysis, and in the VAR statement, you specify the continuous response variables.
In the following PROC TTEST step, the SAS data set is TestScores.
proc ttest data=statdata.testscores plots(shownull)=interval;
   class Gender;
   var SATScore;
   title "Two-Sample t-Test Comparing Girls to Boys";
run;
You add the PLOTS= option to the PROC TTEST statement to control the plots that ODS graphics produces. You want to examine the default plots, which are the histogram and the Q-Q plot, as well as the plot of confidence intervals. When you add the SHOWNULL option, SAS places a vertical reference line at the mean value of the null hypothesis, which is 0 by default, in the interval plot. Remember that you’re testing to see if μ1 - μ2 = 0. In the CLASS statement, you specify the variable Gender, and in the VAR statement, you specify the continuous variable SATScore. When you run this code in SAS, you’ll have all the information you need to determine if the group means are statistically different.
This program uses PROC TTEST to calculate a two-sample t-test. Let’s submit this program and take a look at the output SAS produces.
proc ttest data=statdata.testscores plots(shownull)=interval;
   class Gender;
   var SATScore;
   title "Two-Sample t-Test Comparing Girls to Boys";
run;
title;
Two-Sample t-Test Comparing Girls to Boys
The TTEST Procedure
Variable: SATScore

Gender       N    Mean      Std Dev   Std Err   Minimum   Maximum
Female       40   1221.0    157.4     24.8864   910.0     1590.0
Male         40   1160.3    130.9     20.7008   890.0     1600.0
Diff (1-2)        60.7500   144.8     32.3706

Gender       Method          Mean      95% CL Mean         Std Dev   95% CL Std Dev
Female                       1221.0    1170.7    1271.3    157.4     128.9    202.1
Male                         1160.3    1118.4    1202.1    130.9     107.2    168.1
Diff (1-2)   Pooled          60.7500   -3.6950   125.2     144.8     125.2    171.7
Diff (1-2)   Satterthwaite   60.7500   -3.7286   125.2

Method          Variances   DF       t Value   Pr > |t|
Pooled          Equal       78       1.88      0.0643
Satterthwaite   Unequal     75.497   1.88      0.0644

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F   39       39       1.45      0.2545

[Plots: Summary Panel for SATScore; Difference Interval Plot for SATScore; Q-Q Plots for SATScore]

You can ensure the code ran successfully by checking the log for warnings and errors. This code ran without any issues. Now let’s take a moment to examine the output. As you can see, the TTEST procedure produces summary statistics, confidence limits, standard deviations, and hypothesis tests. It also includes the graphical output we specified in the program. There’s quite a bit here to analyze, and there’s actually an order to how you should interpret these results. But before you look at the t-test results to compare the group means, you need to learn how to interpret them.
Let’s learn how to interpret the results of PROC TTEST by examining some generic data. We’ll focus on the order of your analysis. First, let’s verify the assumptions for the test. For the purposes of this example, let’s assume that you have independent observations and normally distributed data. Next, you need to verify the assumption of equal variances, so you analyze the F-Test for Equal Variances results. The p-value of the F-test is 0.7446, and this probability is greater than 0.05, your alpha, so you fail to reject the null hypothesis and can proceed as if the variances are equal between the groups.
Here’s a question. Because you are assuming the population variances are equal, which t-test should you use to determine if the means are equal? You should use the equal variance t-test, or the Pooled t-test. By default, SAS shows the 95% intervals for both the Pooled method, assuming equal variances for group 1 and group 2, and the Satterthwaite method, assuming unequal variances. SAS calculates a Pooled t-test that uses a weighted average of the two sample variances. You use this p-value to test whether the means of the two groups are significantly different under the assumption that the variances are equal.
Here’s a question. Are the means of the two groups significantly different? The p-value of 0.0003 is less than 0.05, so you reject the null hypothesis and can conclude that the means between the two groups are significantly different.
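For reference, the standard textbook form of the pooled statistic can be written as follows. The notation is an assumption of this sketch rather than the course's own: n_1 and n_2 are the group sizes, \bar{x}_1 and \bar{x}_2 the sample means, and s_1^2 and s_2^2 the sample variances.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

The pooled variance s_p^2 is the weighted average of the two sample variances, with each variance weighted by its degrees of freedom.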
Now consider this. What if you can’t verify the assumption of equal variances? For example, the p-value of this F-test is 0.0185. Is this value greater than your alpha? No, it’s not, so you have enough evidence to reject the null hypothesis of equal variances. Knowing this, which t-test should you use to determine if the means are equal? You should use the unequal variance t-test, or Satterthwaite test, to test the group means. SAS calculates a Satterthwaite t-test that compensates for unequal variances and allows you to move forward with the equality of means test when the variances are not equal. You use this p-value to test that the means for the two groups are significantly different without assuming that the variances are equal. Here’s a question. Are the means of the two groups significantly different? The p-value of 0.032 is less than 0.05, so you reject the null hypothesis and can conclude that the means between the two groups are significantly different.
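The unequal-variance statistic is usually written as shown below, with the Satterthwaite approximation for the degrees of freedom. As before, the notation (n_1, n_2, \bar{x}_1, \bar{x}_2, s_1^2, s_2^2) is assumed here for illustration.

```latex
t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2 / n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^{2}}{n_2 - 1}}
```

Because each group keeps its own variance in the standard error, the test does not rely on the equal-variance assumption. The price is a degrees-of-freedom value that is generally not an integer.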
Now you’re ready to interpret the two-sample t-test results and determine if the mean SAT score for females is equal to the mean SAT score for males. Let’s begin by verifying our assumptions. Remember that you satisfy the assumption of independent observations with good data collection. To assess the normality assumption for each gender, you examine the histograms and normal probability plots. The top histogram is of the SAT scores for females and the bottom histogram is of the SAT scores for males. Here’s a question. Do the data appear to have come from normal populations? Both histograms have a blue normal reference curve superimposed on the plots to help you determine if the distributions are normal, and these look approximately normal.
You can additionally examine the Q-Q plot, or quantile-quantile plot, at the bottom of the results to assess normality. If the data in a Q-Q plot comes from a normal distribution, the points cluster tightly around the reference line. Here you can see that the females are on the left and the males are on the right. Do the data appear to have come from normal distributions? Yes. The sample histograms and the Q-Q-plots show that the samples have an approximate normal shape, so it’s probably safe to assume that the populations are normal.
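If you want the normality plots on their own, outside of PROC TTEST, one option is PROC UNIVARIATE. The following is a minimal sketch that assumes the same statdata.testscores data set; the title text is illustrative.

```sas
ods graphics on;

/* Histogram with a fitted normal curve and a Q-Q plot for each gender */
proc univariate data=statdata.testscores;
   class Gender;
   var SATScore;
   histogram SATScore / normal;
   qqplot SATScore / normal(mu=est sigma=est);   /* reference line from sample estimates */
   title "Normality Check for SATScore by Gender";
run;
title;
```

The NORMAL option on the HISTOGRAM statement also requests goodness-of-fit tests, which you can read alongside the plots rather than instead of them.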
At the top of the results, you can see the statistical tables for the TTEST procedure. Let’s look at the Equality of Variances table. The F-test for equal variances has a p-value of 0.2545. Do you reject or fail to reject the null hypothesis? This p-value is greater than 0.05, so you fail to reject the null hypothesis. The variances are not statistically different at the 0.05 significance level.
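To see where the F value comes from, you can recompute it by hand from the reported standard deviations (157.4 for females, 130.9 for males, 40 students in each group). The DATA step below is a sketch only. It assumes that the reported p-value is the two-sided value, that is, twice the upper-tail probability, and the inputs are rounded, so small differences from the PROC TTEST table are expected.

```sas
data _null_;
   /* summary statistics copied from the PROC TTEST output */
   s1 = 157.4;  n1 = 40;   /* Female */
   s2 = 130.9;  n2 = 40;   /* Male   */

   var1 = s1**2;
   var2 = s2**2;

   /* folded F: larger sample variance over smaller sample variance */
   f = max(var1, var2) / min(var1, var2);

   /* numerator df belongs to the group with the larger variance */
   if var1 >= var2 then do; ndf = n1 - 1; ddf = n2 - 1; end;
   else do; ndf = n2 - 1; ddf = n1 - 1; end;

   /* two-sided p-value (assumed convention) */
   p = min(1, 2 * (1 - probf(f, ndf, ddf)));

   put f= 8.2 ndf= ddf= p= 8.4;
run;
```

The F value written to the log should be about 1.45, in line with the Equality of Variances table.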
Now you use the p-value for the t-test where the variances are equal, or the Pooled method t-test. This p-value shows the results of the t-test where the null hypothesis is that the average SAT scores of females and males are equal. The p-value of 0.0643 is greater than 0.05, so what can you conclude? You don’t have enough evidence to conclude that the average SAT scores for females and males are significantly different at the 0.05 significance level. Here’s a question. Do you notice anything interesting about the p-value in the Satterthwaite test? It’s almost equal to the Pooled p-value. Also, notice the t statistic values for both tests, 1.88. What can you generalize from these equalities? The Pooled and Satterthwaite t-tests give nearly the same results when the sample variances are similar. With equal group sizes, as here, the two t statistics are identical and only the degrees of freedom differ. You can also see the 95% confidence intervals of the means and standard deviations for females and males.
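The two test statistics can be recomputed from the summary statistics in the output, which makes it easier to see why they agree here. This DATA step is a sketch using the reported, rounded values (difference 60.75, standard deviations 157.4 and 130.9, 40 students per group), so the last digits may differ slightly from the PROC TTEST tables.

```sas
data _null_;
   /* values copied from the PROC TTEST output */
   diff = 60.75;            /* Female mean minus Male mean */
   s1 = 157.4;  n1 = 40;    /* Female */
   s2 = 130.9;  n2 = 40;    /* Male   */

   /* Pooled (equal variance) test */
   df_pool = n1 + n2 - 2;
   sp2     = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / df_pool;
   se_pool = sqrt(sp2 * (1/n1 + 1/n2));
   t_pool  = diff / se_pool;
   p_pool  = 2 * (1 - probt(abs(t_pool), df_pool));

   /* Satterthwaite (unequal variance) test */
   v1     = s1**2 / n1;
   v2     = s2**2 / n2;
   t_sat  = diff / sqrt(v1 + v2);
   df_sat = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));
   p_sat  = 2 * (1 - probt(abs(t_sat), df_sat));

   put t_pool= 8.2 df_pool= p_pool= 8.4;
   put t_sat=  8.2 df_sat= 8.3 p_sat= 8.4;
run;
```

Both t values should come out near 1.88, with degrees of freedom of 78 for the Pooled test and roughly 75.5 for the Satterthwaite test.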
For the difference between the means, you can see both Pooled and Satterthwaite 95% intervals. The confidence interval for the difference using the Pooled method, -3.695 to 125.2, includes 0, so you don’t have enough evidence to say that the difference of the means is significantly different from 0 at the 95% confidence level. This is equivalent to the p-value being greater than 0.05. In this scenario, the estimated difference between the mean SAT scores for females and males, 60.75, is the same whether you use the Pooled method or the Satterthwaite method. Only the confidence intervals and p-values differ, and just barely, because the sample variances for males and females are so similar.
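The Pooled interval can be rebuilt from three numbers in the output: the difference (60.75), its standard error (32.3706), and the degrees of freedom (78). The sketch below uses the TINV function for the critical value.

```sas
data _null_;
   diff    = 60.75;     /* Diff (1-2) mean from the output        */
   se_pool = 32.3706;   /* Diff (1-2) standard error              */
   df      = 78;        /* Pooled degrees of freedom              */

   tcrit = tinv(0.975, df);          /* two-sided 95% critical value */
   lower = diff - tcrit * se_pool;
   upper = diff + tcrit * se_pool;

   put tcrit= 8.4 lower= 8.2 upper= 8.2;
run;
```

The limits should land close to the reported -3.695 and 125.2. Because the lower limit is below 0, the interval and the p-value lead to the same conclusion.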
At the top of the TTEST output, you see descriptive statistics separated by the levels of the CLASS variable Gender. What does the mean Diff value represent? It’s the result of subtracting the sample mean of group 2, the males, from the sample mean of group 1, the females. So the actual sample mean difference in SAT scores is 60.75 points. Here’s another question. Why does SAS label the females group 1? Females are group 1 because they come first alphabetically.
Let’s finally look at the confidence interval plot. Because the variances here are so similar between males and females, the Pooled and Satterthwaite intervals, and p-values, are very similar. Notice that the lower bound of the Pooled interval extends past 0, so you don’t have enough evidence to say that the difference of the mean SAT score for females and the mean SAT score for males is significantly different from zero.
With a two-sided two-sample t-test, the alternative hypothesis is that one population mean is not equal to the other population mean. In many situations, however, you might be more interested in one particular direction of the inequality, say, that the mean of the first population is greater than the mean of the second population. For instance, if you work for a drug company, you might only want to test for positive differences between your new drug and placebo and not the negative differences. You can use one-sided tests to determine this.
With one-sided tests, you look for a difference in one direction. So the null hypothesis could be that the mean of population 1 is less than or equal to the mean of population 2, or that the mean of population 1 is greater than or equal to the mean of population 2. Remember that the null hypothesis is usually one of equality, so it’s actually easier to determine the alternative hypothesis first because the alternative is what you want to test. If you’re testing to see if the mean of population 1 is less than the mean of population 2, you use the less than sign. The direction of the sign in the alternative hypothesis is the same as the direction you are testing, so you perform a lower-tailed test. The critical region is on the lower side. An advantage of a one-sided test is that it can increase the power of a statistical test, meaning that if you are right about the direction of the true difference, you will more likely detect a significant difference with a one-sided test than with a two-sided test. Here’s a question. What is power? Power is the probability that your test will reject the null hypothesis when the null hypothesis is false, or the probability that you will detect a difference when a difference actually exists.
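To get a feel for the power advantage, you could compare the two kinds of test with PROC POWER. The sketch below is hypothetical: it treats a mean difference of 60.75 and a common standard deviation of 144.8 (the sample estimates from the SAT example in this lesson) as if they were the true population values, which you would not know in practice.

```sas
/* Compare power for a two-sided test and an upper one-sided test. */
proc power;
   twosamplemeans test=diff
      sides     = 2 U       /* two-sided, then upper one-sided     */
      meandiff  = 60.75     /* assumed true difference             */
      stddev    = 144.8     /* assumed common standard deviation   */
      npergroup = 40
      power     = .;        /* solve for power                     */
run;
```

The output should list a higher power for the one-sided scenario than for the two-sided one, which is the advantage described above.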
In the previous scenario, you tested to see if there was a statistical difference between the mean SAT scores for males and females. Because you wanted to determine if the means were equal or not, you performed a two-sided, two-sample t-test. But what if you suspect that the mean of one group is less than or greater than the other group? You can perform a one-sided t-test.
For example, the students in Ms. Chao’s statistics course collected gender information during their analysis. They read a study published in the 1980s about girls who, on average, scored lower on standardized tests than boys. They don’t believe that this is still the case, particularly in their school. In fact, from their experience, they hypothesize the opposite, that the girls’ average scores now exceed the boys’ average scores. To find out if the average SAT score for females is greater than the average SAT score for males in Carver County magnet schools, what type of test can you use? This is a case for using a one-sided upper-tailed t-test. It’s an upper-tailed test because the critical region is on the upper side. Because the females are group 1 and the males are group 2, the null hypothesis is that the female mean minus the male mean is less than or equal to 0. The alternative hypothesis, or what you believe to be true, is that the female mean minus the male mean is greater than 0, and greater than points to the direction of your tail. So the shaded region in the graph is the rejection region; you will reject the null if the t statistic falls in the shaded region.
You can use PROC TTEST to perform a one-sided upper-tailed t-test. You add the SIDES=U option to the syntax to specify an upper one-sided test. This option produces a one-sided test and one-sided confidence intervals. In an upper-tailed test, the intervals will always have an upper bound of positive infinity. Here’s a question. What do you think the SIDES=L option produces? This option specifies a lower one-sided test and produces confidence intervals with a lower bound of negative infinity.
Here’s the program for calculating a one-sided upper-tailed t-test. In this PROC TTEST step, the SAS data set is TestScores. You add the PLOTS= option to the PROC TTEST statement to control the plots that ODS graphics produces. You suppress the default plots with the keyword ONLY. You add the SHOWNULL option and specify a vertical reference line at the mean value of the null hypothesis, which is 0 by default, in the interval plot. You also see the H0=0 option, which is the default. This requests tests against the null value of 0, meaning that there is no difference between the means. The SIDES=U option declares this to be an upper one-sided t-test. Because female appears first alphabetically, the difference score in PROC TTEST will be for female minus male by default. In the CLASS statement, you specify the variable Gender, and in the VAR statement, you specify the continuous variable SATScore.
To calculate a one-sided upper-tailed t-test and determine if the mean SAT score for females is greater than the mean SAT score for males, let’s submit this PROC TTEST step.
proc ttest data=statdata.testscores plots(only shownull)=interval h0=0 sides=U;
   class Gender;
   var SATScore;
   title "One-Sided t-Test Comparing Girls to Boys";
run;
title;
The log shows that SAS processed the code without errors. Here are the statistical tables. Let’s start by looking at the mean Diff value of 60.75. This is the point estimate for the difference between female and male SAT scores, and it confirms that it makes sense to perform an upper tailed test. Great!
Next, because you failed to reject the assumption of equal variances in the previous demonstration, which p-value should you examine? You use the Pooled method p-value, which is 0.0321. Assuming your alpha is 0.05, do you reject or fail to reject the null hypothesis? The p-value of 0.0321 is less than 0.05, so you reject the null. This means that statistically speaking, the average SAT score for females is greater than the average SAT score for males.
If you look at the confidence interval for the differences between the means, 6.8651 to positive infinity, it does not include zero, so the difference of the means is significantly different from zero, with females scoring higher than males on average at the 95% confidence level. You learned the same thing from the p-value.
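The one-sided results follow from the same summary numbers as the two-sided test: only the tail area and the critical value change. The DATA step below is a sketch using the reported difference (60.75), standard error (32.3706), and Pooled degrees of freedom (78).

```sas
data _null_;
   diff    = 60.75;     /* Female mean minus Male mean      */
   se_pool = 32.3706;   /* Pooled standard error            */
   df      = 78;        /* Pooled degrees of freedom        */

   t       = diff / se_pool;
   p_upper = 1 - probt(t, df);                   /* upper-tail area only    */
   lower   = diff - tinv(0.95, df) * se_pool;    /* one-sided 95% lower limit */

   put t= 8.2 p_upper= 8.4 lower= 8.2;
run;
```

The upper-tailed p-value is half of the two-sided value of 0.0643, which gives the 0.0321 reported above, and the lower limit should come out close to 6.87. The upper limit is positive infinity by construction.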
Lastly, the Difference Interval Plot reflects the one-sided nature of our analysis. This time, the confidence interval does not cross over zero, and this also tells you that there is a difference in male and female SAT scores, specifically that females scored higher on average than males.
You use the SIDES=L option because this is a lower-tailed, one-sided t-test. Control comes before Treatment alphabetically, so the test for differences is for Control minus Treatment. A negative value for the difference score indicates treatment improvement.
proc ttest data=statdata.german plots(shownull)=interval h0=0 sides=L;
   class Group;
   var Change;
   title 'German Training, Comparing Treatment to Control';
   title2 'One-Sided t-Test';
run;
title;
Sometimes you want to test hypotheses about the means in more than two groups. For example, suppose you want to look at average test scores for students exposed to one of three different background sounds during an examination: constant sound, random sound, and no sound. You collect data by recording test scores from different groups of students whom you randomly assign to take a test under one of the three sound conditions. You then calculate the mean test score for students within each of the different sound conditions. The sample means for students taking the test under different sound conditions would almost certainly be different. But are these differences large enough to indicate that the different sound conditions actually have an effect on test performance? This is a framework for one-way ANOVA. Here’s what you learn in this topic.
When you want to determine whether there are significant differences between the means of two populations, you can use the two-sample t-test. When you want to determine whether there are significant differences between the means of two or more populations, you can use ANOVA, or analysis of variance.
Let’s begin with the one-way ANOVA model. In this model, you have a continuous dependent, or response variable, and a categorical independent, or predictor variable. You can have many levels of the predictor variable, but with the one-way ANOVA model, you can have only one predictor variable. Let’s look at some research examples.
Suppose you want to know if accountants, on average, earn more than teachers. Think for a moment about the parts of this example. What is the response variable? It’s salary or earnings. What is the predictor variable? It’s profession, or some name that represents job title. In other words, salary is dependent upon the profession. In this example, how many job titles are you comparing? You’re comparing two job titles, accountants and teachers, so you have two levels of the predictor variable, profession. Now, you might be wondering, “isn’t this a case for a two-sample t-test?” And actually, it is. You can think of a two-sample t-test as a special case of one-way ANOVA. The squared value of the t statistic for a pooled-variance two-sample t-test is equal to the F statistic of a one-way ANOVA with two populations.
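One way to see the relationship between t and F for yourself is to run both procedures on the same two-group data. The following is a minimal sketch, not part of the course data: the data set work.salary_demo and its salary values are made up for illustration. After you submit it, compare the square of the t Value in the Pooled row of the PROC TTEST output with the F Value in the PROC GLM ANOVA table.

```sas
/* Hypothetical salaries (in thousands); made-up values for illustration only. */
data work.salary_demo;
   length Profession $ 12;
   input Profession $ Salary @@;
   datalines;
Accountant 62 Accountant 58 Accountant 71 Accountant 66 Accountant 60
Teacher 55 Teacher 61 Teacher 52 Teacher 57 Teacher 59
;
run;

/* Two-sample t-test: read the t Value in the Pooled row. */
proc ttest data=work.salary_demo;
   class Profession;
   var Salary;
run;

/* One-way ANOVA with a two-level predictor: read the F Value for the model. */
proc glm data=work.salary_demo plots=none;
   class Profession;
   model Salary=Profession;
run;
quit;
```

The equality applies to the pooled-variance t statistic only; the Satterthwaite row uses a different standard error, so its squared t value generally does not match the ANOVA F value.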
Let’s look at another example. Suppose you’re interested in the T-cell counts of patients taking one of three medications, including a placebo. The response variable is the T-cell count. What is the predictor variable? It’s the medication, which includes three levels. Now, you could simply run a t-test for each pair of medications, but a more powerful approach is to analyze all of the data at once. This approach is called a one-way ANOVA.
When you test if differences are statistically significant, you want to know if the differences among the means in your sample are larger than what would occur by chance if the population means really are equal. Small differences between sample means are usually present.
The ANOVA test helps you determine if the differences are large enough to indicate that the population means are different. With ANOVA, the null hypothesis is that all of the population means are equal. So in the case of comparing the T-cell counts of patients taking one of three medications, the predictor variable Medication has three levels. The null hypothesis in this example is that the mean T-cell counts for patients taking any one of three medications are all equal. The alternative hypothesis with ANOVA is that not all of the population means are equal. In other words, at least one mean is different from the rest. In the T-cell count example, the alternative hypothesis is that for at least one medication, the mean T-cell count of patients is different than the others. If any one of the three means is different, you reject the null hypothesis.
The ANOVA Model
You want to use a model that explains as much of the variability in the T-cell counts as possible. This mathematical model is a way to represent the relationship between the response and predictor variables in ANOVA, that is, the dependent and independent variables. Let’s go over each part of this model. Yik is the response, where k stands for the observation number and i indexes the treatments, or in this case, the three types of medications. μ is the overall population mean of the response variable T-cell count, regardless of the medication used. τi is the effect of each medication. Because you have three medications, i can be 1, 2, or 3. τ1 is the difference between medication 1’s mean and the overall mean. τ2 is the difference between medication 2’s mean and the overall mean. τ3 is the difference between medication 3’s mean and the overall mean. εik is the error term in the model, also known as the unaccounted-for or within-group variation. This is a way to represent all variation in the T-cell counts that is not accounted for by the different types of medications. Remember that this is just one way to set up the ANOVA model with a mathematical equation.
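Written out, the parts described above fit together as follows. This is a sketch reconstructed from the definitions in the text, using the same symbols; the range shown for i reflects the three medications in the example.

```latex
\[
  Y_{ik} = \mu + \tau_i + \varepsilon_{ik}, \qquad i = 1, 2, 3
\]
```

Here each observation is the overall mean, plus the effect of the medication it belongs to, plus whatever the medication does not account for.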
As its name implies, analysis of variance analyzes the variances of the data to determine whether there is a difference between the group means. Think back to the T-cell count example. This line represents the average T-cell count for medications 1, 2, and 3 combined. With ANOVA, you can determine if the variation of the means is large enough relative to the variation of observations within the groups. To do this, you calculate the variability between the means and the variability of observations within each group, and then calculate a ratio between these two measurements. If the between-group variability is significantly larger than the within-group variability, you reject the null that all of the group means are equal. So, you partition out the variability using sums of squares. For ANOVA, you calculate three types of sums of squares: Between Group Variation, Within Group Variation, and Total Variation. Let’s examine each of these.
Between-Group Variation is also called the Model Sum of Squares, or SSM. Here’s the formula. To calculate a measure of variability between the means, you take the difference between the mean for one group and the mean of all the observations, square it, and then multiply it by the number of observations for the group. You do the same thing for the other groups and then add each summation to get the model sum of squares.
Within-Group Variation is also called the Error Sum of Squares, or SSE. To calculate a measure of within group variation, meaning within each medication, you take the difference between each observation for one group and the mean for the group, square each of these values and add them together. You do the same thing for the other groups and then add each summation to get the error sum of squares.
Total Variation is also called the Total Sum of Squares, or SST. To calculate the total sum of squares, which is a measure of the total variability in the response variable, you take each data point and calculate the difference between the observed value and the overall mean. You then square each of these differences and add them together. Because this sum runs over every observation regardless of its group, SST does not depend on how the observations are grouped.
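The three verbal descriptions above can be summarized in formulas. This is a sketch in which the notation is an assumption introduced here for illustration: n_i is the number of observations in group i, \bar{y}_{i\cdot} is the mean of group i, and \bar{y}_{\cdot\cdot} is the mean of all observations.

```latex
\begin{align*}
  SS_M &= \sum_{i} n_i \,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
       && \text{(between groups)} \\
  SS_E &= \sum_{i} \sum_{k} (y_{ik} - \bar{y}_{i\cdot})^2
       && \text{(within groups)} \\
  SS_T &= \sum_{i} \sum_{k} (y_{ik} - \bar{y}_{\cdot\cdot})^2
       && \text{(total)} \\
  SS_T &= SS_M + SS_E
\end{align*}
```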
This diagram shows a slightly easier way to understand the breakdown of variability in ANOVA. As you can see, SSM and SSE represent pieces of the total variability. Remember that the SSM is the variability explained by the type of medication, and the SSE is the variability not explained by the type of medication. You want the larger piece of the total to be better represented by what you can explain versus what you can’t explain, which means you want the SSM to be the larger of the two pieces. The SST represents the overall variability in the response variable, T-Cell Count.
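If you want to see the partition of variability with numbers, you can compute the three sums of squares directly. The following is a minimal sketch under stated assumptions: the data set work.tcell_demo, the medication labels M1 through M3, and the T-cell values are all made up for illustration. The query attaches the group mean and the overall mean to each observation and then sums the squared differences.

```sas
/* Hypothetical T-cell counts for three medications; values are made up. */
data work.tcell_demo;
   input Medication $ TCell @@;
   datalines;
M1 870 M1 880 M1 875 M2 900 M2 915 M2 906 M3 860 M3 850 M3 858
;
run;

proc sql;
   /* Attach each group's mean and the overall mean to every observation. */
   create table work.ss_parts as
   select a.Medication, a.TCell, g.GroupMean, o.OverallMean
   from work.tcell_demo as a,
        (select Medication, mean(TCell) as GroupMean
         from work.tcell_demo
         group by Medication) as g,
        (select mean(TCell) as OverallMean
         from work.tcell_demo) as o
   where a.Medication = g.Medication;

   /* Summing over observations weights each group by its size. */
   select sum((GroupMean - OverallMean)**2) as SSM,
          sum((TCell - GroupMean)**2)       as SSE,
          sum((TCell - OverallMean)**2)     as SST
   from work.ss_parts;
quit;
```

In the printed result, SSM plus SSE should equal SST, apart from rounding in the display.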
Let’s now look at the assumptions for ANOVA that you need to verify before you perform the hypothesis test. These assumptions are just slightly different from the t-test assumptions.
What about the residuals? Residuals are estimates of the error term in the model. You calculate the residuals from ANOVA by taking each observation and subtracting its group mean. You do this to verify the two assumptions involving the normality and equal variances of the errors. If your sample sizes are reasonably large and approximately equal across groups, then only severe departures from normality are considered a problem. Take a look at this table. Here’s a question. What is the predicted value for medication 2 in observations 2 and 5? The predicted value in both observations is 905 because this value is the group mean. Notice also that residuals can be positive or negative.
It’s time to learn about growing garlic. The farmers at the Montana Gourmet Garlic ranch want to know if the type of fertilizer they use affects the average bulb weight of their organic garlic, so they design an experiment. They test three different organic fertilizers and one chemical fertilizer, which is the control. They blind themselves to the fertilizers by numbering the fertilizer containers 1 through 4. They use 32 beds of garlic within the total one acre farm, and then randomly assign fertilizers to the beds. Then they calculate the average weight of garlic bulbs in each of the beds. Here’s a question. What do you think the response and predictor variables are in this scenario? Bulb weight is the response variable and fertilizer, which includes four levels to represent the four fertilizers, is the predictor variable. You want to set up a one-way ANOVA to test the effects of the four fertilizers on the average garlic bulb weight. Can you identify the null hypothesis? The null hypothesis is that the mean bulb weights for each of the fertilizers are equal: μ1=μ2=μ3=μ4. The alternative hypothesis is that the mean bulb weight for at least one fertilizer is different than the others.
It’s always a good idea to get to know your data before you set up a one-way ANOVA. Let’s begin with a simple PROC PRINT step of the data set MGGarlic, which contains the data from the Montana Gourmet Garlic ranch experiment. Let’s submit this step and take a look at the first 10 observations.
proc print data=statdata.mggarlic (obs=10);
   title "Partial Listing of Garlic Data";
run;
You can see the Fertilizer variable, which represents the type of fertilizer, the BulbWt variable, which represents the average garlic bulb weight in pounds in the bed, the Cloves variable, which we will not use, and the BedID variable, which is a randomly assigned bed identification number.
Now let’s go back to the editor and add a PROC MEANS step to calculate descriptive statistics for BulbWt for each type of Fertilizer. Let’s specify PRINTALLTYPES to display the means for the overall bulb weight and the means of bulb weight by fertilizer.
We also want to create box plots for each Fertilizer, so we’ll add a PROC SGPLOT step to the program. The CATEGORY= option in the VBOX statement produces separate box and whisker plots for each level of Fertilizer. The DATALABEL= option identifies potential outliers with the variable BedID.
Let’s submit these two steps.
proc means data=statdata.mggarlic printalltypes maxdec=3;
   var BulbWt;
   class Fertilizer;
   title "Descriptive Statistics of Garlic Weight";
run;
ods graphics on / width=700;
proc sgplot data=statdata.mggarlic;
   vbox BulbWt / category=Fertilizer datalabel=BedID;
   format BedID 5.;
   title "Box Plots of Garlic Weight";
run;
title;
The log verifies that the code ran successfully, so let’s first examine the PROC MEANS output. The mean bulb weight for all 32 beds of garlic is 0.219 pounds with a standard deviation of 0.029. Look at the breakdown of the means for the different fertilizers. Which fertilizer has the highest mean? Fertilizer 3 has the highest mean at 0.23, though its mean is fairly close to those of fertilizers 1 and 2. When you look at the number of observations for each fertilizer, what stands out to you? You should recognize that this design is not balanced. In other words, the groups are not equally sized. Fertilizer 4 has the fewest observations. It also appears just by looking at these statistics that fertilizer 4 has the lowest mean and the largest variability.
Let’s move on to the box and whisker plot now. These box plots show us graphically what we saw in the table of statistics. You can see how the means, represented by the diamonds, for the four fertilizer types compare to one another. Notice that the plot for fertilizer 4 reinforces what we saw with PROC MEANS: it has the lowest mean and the most variability. What was our original question? Are the bulb weight means for the four different fertilizers statistically different from one another? You might have a hunch at this point, but let’s use ANOVA to find the answer.
You can use PROC GLM to verify the ANOVA assumptions and perform the ANOVA test. PROC GLM fits a general linear model, of which ANOVA is a special case. PROC GLM also displays the sums of squares associated with each hypothesis it tests.
Here’s the PROC GLM syntax and the program for your analysis. In the PROC GLM statement, you specify your SAS data set, MGGarlic. You add the PLOTS= option and specify ONLY to suppress the default plots in PROC GLM. You request diagnostic plots and specify UNPACK. This means that SAS puts each plot on a separate page; otherwise, you will see all of the plots in a grid or panel display. In the CLASS statement, you specify the classification variable for the analysis. This is your predictor variable, Fertilizer. In the MODEL statement, you specify the variables as indicated in the ANOVA model: BulbWt = Fertilizer. The MEANS statement computes unadjusted means, or arithmetic means, of the dependent variable BulbWt for each value of the specified effect. So the MEANS statement computes the average bulb weight for each type of fertilizer. You can also use the MEANS statement to test the assumption of equal variances. To do that, you add the HOVTEST option, which is the homogeneity of variance test option. This option performs Levene’s test for homogeneity of variances by default. If the resulting p-value of Levene’s test is greater than your significance level, typically 0.05, you fail to reject the null hypothesis of equal variances.
Here’s a question. What is the purpose of the QUIT statement in the code? PROC GLM supports RUN-group processing, which means that the procedure stays active until SAS encounters a PROC, DATA, or QUIT statement. RUN-group processing enables you to submit additional statements, followed by another RUN statement without resubmitting the PROC statement.
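To make RUN-group processing concrete, here is a minimal sketch that uses the same data set and variables as this demonstration. The first RUN statement fits the model, the MEANS statement is then submitted as a second RUN group without repeating the PROC statement, and QUIT ends the procedure. Suppressing the plots with PLOTS=NONE is a choice made here only to keep the output short.

```sas
proc glm data=statdata.mggarlic plots=none;
   class Fertilizer;
   model BulbWt=Fertilizer;
run;                              /* fits the model; PROC GLM stays active */

   means Fertilizer / hovtest;    /* second RUN group, no new PROC statement */
run;

quit;                             /* ends the procedure */
```

If you omit the QUIT statement, the procedure remains active until you submit another PROC or DATA step.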
How will you verify your analysis of variance assumptions? The first ANOVA assumption, independent observations, is satisfied because fertilizers were randomly assigned to beds in an appropriate manner. You can use diagnostic plots of the residuals in PROC GLM to verify the assumption that the errors are normally distributed. And you can use Levene’s test for homogeneity in PROC GLM to verify that the variances are equal across the fertilizers. The GLM procedure also produces a plot of residuals versus their predicted values, the group means, to visually verify the equal variance assumption.
Let’s use this PROC GLM statement to first verify the ANOVA assumptions. Then let’s test our ANOVA model to answer the question of whether the type of fertilizer the farmers use affects garlic bulb weight. Let’s submit this code.
ods graphics on / width=700;
proc glm data=statdata.mggarlic plots(only)=diagnostics(unpack);
   class Fertilizer;
   model BulbWt=Fertilizer;
   means Fertilizer / hovtest;
   title "Testing for Equality of Means with PROC GLM";
run;
quit;
title;
Let’s first check the log and verify that the code ran successfully. It looks good. Now let’s move on to our results. SAS provides everything we need to verify our ANOVA assumptions and the ANOVA test results. It’s a good idea to check our assumptions first, but that means we have to examine the output in a slightly different order.
The Class Level Information table specifies the number of levels, the values of the class variable, and the number of observations SAS read. If any row has missing data for a predictor or response variable, SAS drops that row from the analysis.
You assume that the farmers did a good job at sampling garlic bulbs to weigh by randomly selecting them, so you assume that the observations are independent and check that off your list.
To verify if the variances are equal across fertilizers, you can first examine the Residuals by Predicted plot. This plot helps you to see graphically if the equal variance assumption has been met. You don’t want to see any patterns or trends, but rather, you want to see a random scatter of residuals above and below 0 for the four fertilizer groups. The plot looks good.
You can examine Levene’s test for homogeneity to more formally test the equal variance assumption. You don’t want to reject the null because that would be rejecting one of your assumptions, so you want a large p-value for this test. Because the p-value of 0.4173 is greater than 0.05, you fail to reject the null, which means you have no evidence that the variances differ across fertilizers. This is good. The equal variance assumption appears reasonable.
To verify the assumption of the errors being normally distributed, you check the normal probability plot and histogram of the residuals. Because the residuals follow the diagonal reference line fairly closely, you can say that they are approximately normal.
The histogram of residuals looks approximately normal as well. It has no unique peak and it has short tails, but it’s approximately symmetric, so you verify the assumption that the error terms are normally distributed. Now you can look at the ANOVA table and feel comfortable interpreting your p-value.
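If you want to examine the residuals yourself rather than rely only on the diagnostic plots, one option is to save them with the OUTPUT statement in PROC GLM and pass them to PROC UNIVARIATE. The following is a sketch, not the course’s own program: the output data set work.garlic_resid and the variable names Pred and Resid are hypothetical names chosen for illustration.

```sas
/* Save predicted values (group means) and residuals from the one-way model. */
proc glm data=statdata.mggarlic plots=none;
   class Fertilizer;
   model BulbWt=Fertilizer;
   output out=work.garlic_resid predicted=Pred residual=Resid;
run;
quit;

/* Request normality tests and a Q-Q plot for the saved residuals. */
proc univariate data=work.garlic_resid normal;
   var Resid;
   qqplot Resid / normal(mu=est sigma=est);
run;
```

Each value of Resid is the observed bulb weight minus the mean for that bed’s fertilizer, which matches the definition of a residual given earlier.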
This is the ANOVA output from PROC GLM. First you see the overall ANOVA table. The first column is information about the Degrees of Freedom. You can think of Degrees of Freedom as the number of independent pieces of information, or the number of values in the final calculation of a statistic that are free to vary. The next column is information about the sum of squares. The model sum of squares is 0.00458, the error sum of squares is 0.0218, and the total sum of squares is 0.0264. The mean square model is 0.0015. SAS calculates this by dividing the model sum of squares by the model Degrees of Freedom, which gives you the average sum of squares for the model. The mean square error is 0.00078, which is an estimate of the population variance. SAS calculates this by dividing the error sum of squares by the error Degrees of Freedom, which gives you the average sum of squares for the error. SAS calculates the F statistic by dividing the MSM by the MSE. The F statistic is 1.96. Because the corresponding p-value of 0.1432 is greater than 0.05, you can conclude that there is not a statistically significant difference between the mean bulb weights for the four fertilizers. Remember, you are testing if the means for the four fertilizer types are equal, so you fail to reject the null. At this point, it’s important for you to realize that the one-way ANOVA F test is an omnibus test. Even when it is significant, it tells you only that at least one group mean differs from the others, not which specific groups are significantly different from each other. To determine which specific groups differ from each other, you need to use a post-hoc test.
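You can check the arithmetic in the ANOVA table by hand. The sketch below assumes that all 32 beds were used in the analysis, so with four fertilizers the model has 4 − 1 = 3 degrees of freedom and the error has 32 − 4 = 28. The sums of squares are the rounded values quoted above, so the results are approximate.

```latex
\begin{align*}
  MS_M &= \frac{SS_M}{df_M} = \frac{0.00458}{3} \approx 0.0015 \\
  MS_E &= \frac{SS_E}{df_E} = \frac{0.0218}{28} \approx 0.00078 \\
  F    &= \frac{MS_M}{MS_E} = \frac{0.00458 / 3}{0.0218 / 28} \approx 1.96
\end{align*}
```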
The next table contains the R-Square, which is the proportion of variance in the response accounted for by the model. The R-square is between 0 and 1. It’s close to 0 if the independent variables do not explain much variability in the data, and it’s close to 1 if the independent variables explain a relatively large proportion of the variability in the data. This R-Square is 0.1734, so approximately 17% of the variation in bulb weight can be explained by fertilizer. Fertilizer doesn’t explain much of our variability. Interesting.
The coefficient of variation expresses the root MSE as a percentage of the mean bulb weight. It is a unit-less measure that is useful in comparing the variability of two sets of data with different units of measure. The Root MSE is the estimate of the standard deviation of bulb weights for all fertilizers. The BulbWt Mean is the mean of all of the data values in the variable BulbWt without regard to Fertilizer.
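The fit statistics can also be reproduced approximately from values already reported in this demonstration. This is a sketch that uses the rounded sums of squares, the rounded mean square error, and the overall mean of 0.219 from PROC MEANS, so the last digits can differ slightly from the SAS output.

```latex
\begin{align*}
  R^2 &= \frac{SS_M}{SS_T} = \frac{0.00458}{0.0264} \approx 0.173 \\
  \text{Root MSE} &= \sqrt{MS_E} \approx \sqrt{0.00078} \approx 0.0279 \\
  \text{CV} &= 100 \times \frac{\text{Root MSE}}{\text{BulbWt Mean}}
             \approx 100 \times \frac{0.0279}{0.219} \approx 12.7
\end{align*}
```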
Now let’s look at information about our class variable in the model, Fertilizer. When you have one predictor variable in an ANOVA model, the breakdown of the variable in this table is the same as the model line in the overall ANOVA table and the information for Type I and Type III sums of squares is the same. All in all, the PROC GLM output supports your conclusion that there’s not a statistically significant difference between the mean bulb weights for the four fertilizers.
Suppose you’re interested in the T-cell counts of patients taking one of three medications, including a placebo, and you conduct an experiment. In this design, your primary variable, or factor of interest, is the type of medication. But you later realize that other factors, nuisance factors, play a role in the results of your experiment. For example, you believe that the ages of the patients greatly affect the responses to the medications. In an effort to control the factors that contribute to the outcome you want to measure, you redesign your experiment to group the patients by age groups: under 30, 30 to 50, and over 50. You’re not interested in the effect of age on T-cell counts, but you’re interested in grouping, or blocking by age to minimize variability, thus leading to greater precision. Here’s what you learn in this topic.
An observational, or retrospective, study is when you want to draw inferences about the effects of a treatment on subjects, but the assignment of the subjects into a treated group versus a control group is outside of your control. For example, in an observational study, the gender or ethnicity of the subjects naturally occurs, so you simply record this data as you observe it because it is what it is. Oftentimes in an observational study, you look back at data that’s already been collected because it’s the best that you can do based on resource issues or ethical issues. For example, it’s unethical to assign someone to a smoking or a non-smoking group, so you attempt to answer questions like “Does smoking cause lung cancer?” by looking at data where subjects have not been randomly assigned to smoke, but where you merely observe whether someone smokes. In an observational study, you sometimes have very little control over other factors that contribute to the outcome you’re measuring.
In a controlled experiment, you have the flexibility to design the analysis prospectively and control for other factors that contribute to the outcome that you’re measuring. You can do this by blocking, or grouping, to minimize variability. You might also design a controlled experiment with the intention of reducing selection bias. For example, you randomly assign each subject to a treatment group or a control group before the start of the experiment. Randomization lessens the effects of things you can’t control for in your experiment.
Nuisance factors are factors that can affect the outcome of your experiment but are not of interest in your study. For example, in the T-cell count scenario, a factor that might affect the measured results, that is, patient responses to the medications, is patient age, but the effect of age isn’t your primary interest. If you do not account for it, the variation due to Age becomes part of the random variation. In a randomized block design, you can use a blocking variable to control for the nuisance factors and reduce or eliminate their contribution to the experimental error. You typically need to spend some time deciding which nuisance factors are important enough to keep track of or, if possible, to control during the experiment.
This mathematical model is a way to represent the relationship between the response and predictor variables in ANOVA. This model includes a blocking variable. Including a blocking variable in the model is in essence like adding a second treatment variable to the model in terms of the way you write it. The way you set up your experiment and the way you collect the data is what defines it as a blocking factor. Let’s look at the parts of this model.
Yijk is the response, where i indexes the different age groups, j indexes the different levels of the predictor, and k represents the observation within a given age group and treatment. μ is the overall population mean or base level of the response, regardless of the age or medication. αi represents the effect of the block. τj represents the effect of the predictor. εijk is the unaccounted-for variation, or error term in your model.
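Putting these parts together gives the model with a blocking variable. This is a sketch reconstructed from the definitions above, using the same symbols and indices.

```latex
\[
  Y_{ijk} = \mu + \alpha_i + \tau_j + \varepsilon_{ijk}
\]
```

Notice that the model contains no term that combines the block and the treatment, which corresponds to having no interaction between the two.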
In a model that does not include Age Group, the Age Group effects are lumped into the error term of the model, and it becomes harder to detect the effect of Medication. When you include Age Group in the model, you have now explained some of the variability in the error term that was previously unaccounted for. Although you’re not specifically interested in the effect of age, controlling for Age Group has made it easier to detect an effect of the medication.
Along with the three original ANOVA assumptions of independent observations, normally distributed errors, and equal variances across treatments, you make two more assumptions when you include a blocking factor in the model. First, you assume that the treatments are randomly assigned within each block. In the T-cell count example, this means that you assume the three medications are randomly assigned to patients within each of the three age groups. Next, you assume that the effects of the treatment factor are constant across the levels of the blocking factor, meaning that the effects of the treatment factor don’t depend on the block they are in. When the effects of the treatment factor are not constant across the levels of another variable, it’s called interaction. But when you use a randomized block design, you assume that the effects are the same within each block. In other words, you assume that there are no interactions with the blocking variable.
In their original study, the Montana Gourmet Garlic farmers randomly assigned their fertilizers to their 32 beds. Given the negative results of their study, meaning that there was no statistically significant difference between the mean bulb weights for the four fertilizers, the farmers consulted a statistician before planning their next study. They decided to rigorously control the influences on the growth of garlic. Here’s a question. Can you think of some possible nuisance factors in the growth of garlic? Sun exposure, the pH level of the soil, and rain are examples of possible nuisance factors. They likely affect the weight of the garlic bulbs, but they are not the primary concern. The statistician suggests ways to account for these nuisance variables in their experimental design. Although they can’t actually apply the nuisance factors randomly, in other words, they can’t change the weather or the soil pH or the sun exposure, they can control for these factors by blocking. He suggests that whatever the effects of the external influences are, the magnitudes of the nuisance factors should be approximately the same within sectors of the farm land. Therefore, instead of randomizing the fertilizer treatment across all 32 beds, he suggests that they randomize the application of the four fertilizer treatments within each of eight sectors. Based on this recommendation, the farmers divide the farm into eight sectors, each of which has four beds, and within each sector they randomly assign the four fertilizers to the four beds, one fertilizer per bed. An experimental design like this is often referred to as a randomized block design. As you can see in this ANOVA model, Sector is the blocking variable.
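To illustrate what randomizing within each sector means, here is a minimal sketch of how such an assignment could be generated. It is not how the farmers produced their design: the data set work.fert_plan, the helper variable u, and the seed value are hypothetical choices made for illustration. Each sector receives all four fertilizers, and a random number determines which position each fertilizer occupies.

```sas
/* One row per fertilizer within each of the eight sectors. */
data work.fert_plan;
   call streaminit(20240);        /* arbitrary seed so the plan is repeatable */
   do Sector = 1 to 8;
      do Fertilizer = 1 to 4;
         u = rand('uniform');     /* random sort key */
         output;
      end;
   end;
run;

/* Shuffle the fertilizers separately within each sector. */
proc sort data=work.fert_plan;
   by Sector u;
run;

/* Number the shuffled rows 1 to 4 to get the bed position. */
data work.fert_plan;
   set work.fert_plan;
   by Sector;
   if first.Sector then Position = 0;
   Position + 1;
   drop u;
run;

proc print data=work.fert_plan noobs;
run;
```

Because the shuffle happens inside each sector, every sector contains each fertilizer exactly once, which is what distinguishes this design from randomizing across all 32 beds.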
Let’s submit this PROC PRINT step of the data set MGGarlic_Block, which contains the data from the redesigned Montana Gourmet Garlic ranch experiment, and get to know the data.
proc print data=statdata.mggarlic_block (obs=10); run;
Remember that the farmers divided the farm into eight sectors, each of which has four beds, and within each sector they randomly assigned the four fertilizers to the four beds. Here’s a question. Which variable in this data set represents the beds in the experiment? Each sector is divided into 4 positions, and the fertilizers are randomly assigned to those positions. So the variable Position is a number from 1 to 4, which identifies those positions or beds. What does the variable BedID represent? It is a 5-digit randomly assigned ID number given to each of the 32 beds in the experiment. Now let’s use PROC GLM to analyze the randomized block design of the garlic data. With this design, you’re testing for statistically significant differences between the mean bulb weights for the four fertilizers across sectors of land. Let’s start with the PROC GLM statement, followed by the CLASS statement. Here’s a question. How will SAS know what your blocking variable is? Where do you indicate this? You must list the blocking variable, Sector, in the CLASS statement. How should you set up the MODEL statement? In the MODEL statement, you specify the variables as indicated in the ANOVA model, which are BulbWt = Fertilizer Sector. You must also list the blocking variable in the model. Let’s add the TITLE, RUN, and QUIT statements and submit this program.
proc glm data=statdata.mggarlic_block plots(only)=diagnostics(unpack);
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   title "ANOVA for Randomized Block Design";
run;
quit;
title;
Let’s first check the log to ensure the code ran successfully. Everything looks good here, so let’s move on to the output. As you can probably guess, you should examine the output in an order that makes sense for your purposes. Let’s start with verifying the ANOVA assumptions. Remember, you assume the farmers did a good job at sampling garlic bulbs to weigh by randomly selecting them, so you assume the observations are independent. Here’s the Q-Q Plot of Residuals for BulbWt. Do the errors appear to be normally distributed? Yes. Because the residuals follow the diagonal reference line fairly closely, you can see that they’re approximately normal.
You can examine the Residuals by Predicted plot to check if the variances are equal across each treatment and block combination. This plot helps you to see graphically if the equal variance assumption has been met. The plot looks good. Here’s a question. Can you also test the equal variance assumption more formally, such as with Levene’s Test for Homogeneity? Levene’s test is only available for one-way ANOVA models, so in this case, you have to use the Residuals by Predicted plot.
Now that you’ve verified the assumptions, you can look at the output and feel confident about interpreting the p-value. The overall F test, where F=5.86 and p=0.0003, indicates that there are significant differences among the mean garlic bulb weights. However, because both Fertilizer and Sector are in the model, you can’t tell if the differences are due to differences among the fertilizers or just differences across sectors. To determine this, you generally use the Type III SS, and we’ll look at this in a moment. What have you gained by including Sector in the model? If you compare this MSE, which is 0.00039, to the MSE in the model that included Fertilizer only, 0.00077966, you see that it decreased. The drop in the MSE indicates that by adding the blocking factor, you were able to account for more of the previously unexplained variability due to the nuisance factors. Also notice that the R-square for this model is much greater than that in the previous model without the blocking factor: 0.736 versus 0.173. To some degree, this is a function of just having more model degrees of freedom, but it’s unlikely that this is the only reason for this magnitude of difference. Most important to the Montana Gourmet Garlic farmers is that the effect of Fertilizer in this model is now significant. Its F-value is 4.31 and the corresponding p-value is 0.0162, and this is significant at the 0.05 level. So what can you conclude? You can conclude that at least one of the fertilizers is different from the others. This is great news.
The Type III SS test at the bottom of the output tests for differences due to each variable, controlling for or adjusting for the other variable. How about the blocking variable? Again you might ask: did it help the model? The rule of thumb that most statisticians use is that if the F-value is greater than 1, then it helped to add the blocking factor to your model. Because this F-value of 6.53 is greater than 1, adding Sector as a blocking factor helped to decrease the unexplained variability of the response, bulb weight. So adding Sector helps you have more precise estimates of the effect of Fertilizer. If the blocking factor was found not to be useful, would you still need to keep it in the model? Yes you would because this experiment, and therefore this analysis, was based on data the farmers collected using Sector as a blocking variable. You can, however, exclude it from future studies.
Let’s tie up the loose ends now. You determined from the randomized block design that one of the fertilizer types is different from the rest because your p-value for Fertilizer was significant. If you were to report this to the garlic farmers, they might say, “Well, which fertilizer is different from the rest?” Or “Which fertilizer is the best?” You could go back and conduct more t-tests to find the answer, but there are better techniques at your disposal.
The results of one-way ANOVA and randomized block design serve only to indicate whether at least one mean differs significantly from the others. However, they don’t tell you which means are statistically different. To isolate the differences, you could conduct a series of pairwise t-tests. For example, if you have three different treatments or groups, you could conduct three different statistical tests and compute the corresponding p-values. If you increase the number of times that you conduct a statistical test, you increase the likelihood that you will make a Type I error. Remember that when you make a Type I error, you reject the null hypothesis when the null hypothesis is actually true. The α level, or significance level, is the probability of a Type I error. To determine which means differ from other means, your next step is to conduct ANOVA post hoc tests and control the error rate using a multiple comparison method. Here’s what you learn in this topic.
When you have a fair coin, your probability of flipping the coin once and landing on heads is 0.5. Given this probability, what do you think the probability is of landing on heads on the second flip? Consider these choices. The correct probability is 0.5. The flips are independent of one another, so the probability doesn’t change. Now consider this. If you flip a coin twice, what is the probability of landing on heads at least once? The correct probability is 0.75. Landing on heads at least once means getting one or two heads. The only way to not get at least one head is to get two tails. The probability of landing on tails twice is 0.25, which is 0.5*0.5, so the probability of getting anything else, or at least 1 head, is 1-0.25, which is 0.75. This means that the probability of getting at least one head in two coin flips is greater than landing on heads in a single coin flip.
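The arithmetic in the coin example can be written as a one-line complement calculation. It is shown here only as a sketch of the reasoning above, and the event notation is introduced for illustration.

```latex
P(\text{at least one head in two flips})
  = 1 - P(\text{two tails})
  = 1 - (0.5)^2
  = 0.75
```

The same "one minus the probability that nothing happens" pattern reappears when you count Type I errors across several tests.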
What does this have to do with ANOVA post hoc tests, you might ask? It’s the backbone and background for multiple comparison techniques. These same principles apply when performing multiple tests for differences between means in an ANOVA test. For example, suppose your ANOVA results suggest that you reject the null hypothesis that the means are the same across groups. You decide to conduct multiple pairwise comparisons in a post hoc analysis to learn which means differ. But consider this. When you perform a statistical test at the α level of 0.05, you have only a 5% chance of incorrectly rejecting the null hypothesis if the null hypothesis is true. Assuming the null hypothesis is true for all of your comparisons, the probability that you conclude a difference exists at least one time when there really isn’t a difference increases with each additional test you perform. So, the chance that you make at least one Type I error increases each time you conduct another statistical test.
Take a look at this table. The comparisonwise error rate, or CER, is the probability of a Type I error on a single pairwise test. If you make no adjustments to your analysis procedure and continue to use the alpha equals 0.05 criterion for each pairwise comparison, then your probability of making a Type I error on at least one of your pairwise tests goes up dramatically as the number of comparisons increases.
What is the experimentwise error rate? The EER is the probability of making at least one Type I error when performing the whole set of comparisons. The EER takes into consideration the number of pairwise comparisons you make, so it increases as the number of tests increases. Presuming no differences exist, the chance that you falsely conclude at least one difference exists is much higher when you consider all possible comparisons. Let’s examine this concept further.
You calculate the EER as 1-(1-α)^nc, where nc is the number of comparisons, assuming that the tests are independent. If you are testing one hypothesis at a significance level of .05, meaning the CER is .05, then the overall EER is .05 as well. If you are testing three hypotheses, each at a significance level of .05, then the overall EER is .14, meaning that you have a 14% chance of rejecting at least one of your three null hypotheses just by chance, even if all three null hypotheses are true. You can see how the EER increases even more, to about .26 and .40, if you conduct 6 and 10 pairwise comparisons. If you want to make sure that the error rate is 0.05 for all of the comparisons, you need to use a method that controls the EER at a level like 0.05. Let’s learn about two of these methods.
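If you want to reproduce the EER values for 1, 3, 6, and 10 comparisons yourself, a short DATA step is enough. This is only a sketch that assumes independent tests at α = 0.05. The data set name eer and the variable names alpha, nc, and EER are chosen here for illustration and are not part of the course data.

```sas
/* Sketch: experimentwise error rate for nc independent tests,      */
/* each performed at alpha = 0.05. Names are illustrative only.     */
data eer;
   alpha = 0.05;
   do nc = 1, 3, 6, 10;
      EER = 1 - (1 - alpha)**nc;   /* EER = 1-(1-alpha)^nc */
      output;
   end;
   format EER 6.4;
run;

proc print data=eer noobs;
   var nc EER;
run;
```

Six and ten comparisons correspond to all pairwise comparisons among four and five groups, which is why the EER grows so quickly as you add groups.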
The Tukey Method, which is also known as the Honestly Significant Difference test, is a popular multiple comparison test that controls the EER. This test compares all possible pairs of means, so it can only be used when you make pairwise comparisons. Remember that a pairwise comparison examines the difference between two treatment means. Tukey’s method controls the EER to equal the alpha level you specify when all possible pairwise comparisons are considered, and controls the EER to be less than the alpha you specify when fewer than all pairwise comparisons are considered. With the Tukey method, you can ensure that the EER is at most 0.05, regardless of how many pairwise comparisons you make.
EER = 1-(1-alpha)^nc, where alpha = 0.05 is the probability of a Type I error on a single comparison and nc is the number of comparisons.
Dunnett’s Method is a specialized multiple comparison test that allows you to compare a single control group, such as a placebo in a drug trial, to all other groups or treatments. If you are interested in comparing all categories to a control group, then a test like Dunnett’s is more powerful than a test like Tukey’s that calculates all possible comparisons. Dunnett’s method controls the EER to be no greater than α when all treatments are compared to the control group, it accounts for the correlation that exists between the comparisons, and you can conduct one-sided tests of hypothesis against the control group.
Let’s look at an example. The Montana Gourmet Garlic farmers are trying to determine which of their fertilizers has the best effect on garlic bulb weight. They have three organic fertilizers and one chemical fertilizer, which is the control. In their last experiment, they performed a randomized block design to control for the nuisance factors, and after analyzing the results, they rejected the null hypothesis that all groups are the same. At least one of the fertilizer types is different from the others because the p-value for Fertilizer was significant. Here’s the ANOVA mathematical model. You add Sector to the model as your blocking variable, which means you’ll also need to modify your SAS program to determine which fertilizer is different. Let’s first look at the output SAS produces with this model.
You request all of the multiple comparison methods with options in the LSMEANS statement in PROC GLM. The PDIFF=ALL option requests p-values for the differences between ALL the means. The ADJUST= option specifies the adjustment method for multiple comparisons. If you don’t specify an option, SAS uses the Tukey method by default. When you specify the PDIFF=ALL option, SAS produces a diffogram automatically.
A diffogram displays all pairwise least squares means differences and indicates which are significant. You can use diffograms to visually assess whether two group means are statistically different. You can think of a diffogram as a least squares mean by least squares mean plot because SAS plots the least squares means on the vertical and horizontal axes. The point estimates for differences between the means for each pairwise comparison can be found at the intersections of the gray grid lines. The red and blue diagonal lines show the confidence intervals for the true differences of the means for each pairwise comparison, and the gray 45-degree reference line represents equality of the means. If the confidence interval for the two groups crosses over the reference line, then there is no significant difference between the two groups. In that case, the diagonal line for the pair will be broken and colored red. If the confidence interval does not cross the reference line, then there is a significant difference between the two groups, and the diagonal line for the pair will be solid and colored blue.
Here’s a question. In this diffogram, is there a significant difference between the means of treatments 1 and 2? Yes there is. This line represents the pairwise comparison of treatments 1 and 2. Because this line does not cross over the reference line, and because SAS made it a solid blue line, you know there’s a significant difference between these two treatments. Can you identify the pairwise comparisons that do not have significantly different means? As indicated by the red broken lines, the differences between treatments 3 and 4 and treatments 4 and 2 are not significant.
When you specify an LSMEANS statement with the ADJUST=Dunnett option, the GLM procedure produces multiple comparisons using Dunnett’s method and a control plot. A control plot displays the least squares mean and confidence limits of each treatment compared to the control group using Dunnett’s method. In this scenario, group 1 is the control group and the middle horizontal line represents its least squares mean value. You can see the arithmetic mean value in the upper right corner of the graph. SAS bounds the shaded area with the LDL, or lower decision limit, and the UDL, or upper decision limit. Notice that there is a vertical line for each treatment that you’re comparing to the control group. If a vertical line extends past the shaded area, then the group represented by the line is significantly different than the control group. In this case, which treatments are significantly different than the control? Treatments 2, 3, and 4 are all significantly different than the control. If a vertical line is longer, or further away from the shaded area, it represents a greater significance, that is, a smaller p-value.
Recall that to calculate multiple comparison tests and to produce the diffogram and control plot for the garlic bulb weight analysis in SAS, you use PROC GLM with the LSMEANS statement. Here’s the code. In the PROC GLM statement, you specify the SAS data set MGGarlic_Block. In the CLASS statement, you specify the classification and blocking variables for the analysis. In the MODEL statement, you specify the variables as indicated in the ANOVA model.
Now let’s look at the LSMEANS statements. You use these statements to run several multiple comparison tests on the means. In practice, you will typically use one method. The three shown here demonstrate how they compare to one another. In the first LSMEANS statement, you specify the classification, or predictor variable, Fertilizer. The PDIFF option requests p-values for the differences. You specify PDIFF=ALL, which is the default, to request all pairwise differences. You add the ADJUST= option to specify the adjustment method for multiple comparisons. If you don’t specify an adjustment method, SAS uses the Tukey method by default. Remember with the Tukey method, you can examine all pairwise differences.
In the second LSMEANS statement, you specify the ADJUST=Dunnett option to calculate multiple comparisons using Dunnett’s method. In the PDIFF=control option, you specify fertilizer 4 as the control group. In this analysis, the garlic growers are not blind to the fertilizers and know that number 4 is the chemical fertilizer. You specify controlu here, but control and controll are valid PDIFF options as well. When you use controlu, you are testing if the non-control levels, that is, fertilizers 1, 2, and 3, are greater than the control. You want to see if fertilizer 1 is statistically greater than fertilizer 4, and if fertilizer 2 is statistically greater than 4, and so on. What do you know about the direction of the sign in a comparison? The direction of the sign in the alternative hypothesis is the same as the direction you are testing, so this is a one-sided upper-tailed t-test.
The third LSMEANS statement requests all pairwise t-tests on the differences and requests that SAS make no adjustments for multiple comparisons.
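The other two PDIFF= forms mentioned for Dunnett’s method can be sketched the same way. The example below reuses the garlic data set and variable names from this lesson and shows the two-sided and lower-tailed requests. Treat it as an illustration of the syntax rather than part of the course program.

```sas
proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   /* Two-sided: is each fertilizer different from control level 4? */
   lsmeans Fertilizer / pdiff=control('4') adjust=dunnett;
   /* Lower-tailed: is each fertilizer less than control level 4?   */
   lsmeans Fertilizer / pdiff=controll('4') adjust=dunnett;
run;
quit;
```

You would normally pick one direction before looking at the data, based on the question the growers are asking.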
Let’s use the LSMEANS statement in PROC GLM to run several multiple comparison tests on the means of fertilizer treatments and determine which fertilizer is significantly different from the others. Before we submit this code, what does the ODS SELECT statement do in this program? You can use the ODS SELECT and ODS EXCLUDE statements along with graph and table names to specify which ODS output SAS displays.
ods graphics on / width=700;
ods select lsmeans diff meanplot diffplot controlplot;

proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   /* Tukey: all pairwise comparisons */
   lsmeans Fertilizer / pdiff=all adjust=tukey;
   /* Dunnett: upper-tailed comparisons against control level 4 */
   lsmeans Fertilizer / pdiff=controlu('4') adjust=dunnett;
   /* Unadjusted pairwise t-tests */
   lsmeans Fertilizer / pdiff=all adjust=t;
   title "Garlic Data: Multiple Comparisons";
run;
quit;
title;
How do you know what output objects your SAS program produces? You can use the ODS TRACE statement in your program. When you add the ODS TRACE statement, SAS writes a trace record to the log that includes information about each output object, such as the path for each object and the label for each object. Let’s try it.
ods graphics on / width=700;
ods trace on;

proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   lsmeans Fertilizer / pdiff=all adjust=tukey;
   lsmeans Fertilizer / pdiff=controlu('4') adjust=dunnett;
   lsmeans Fertilizer / pdiff=all adjust=t;
   title "Garlic Data: Multiple Comparisons";
run;
quit;
title;
When you check the log, you can see a list of the ODS output objects and information about each one. Now that you know the names, you can decide what output to specify in the ODS SELECT statement. Let’s turn tracing off, and then specify only the output we want to see. Now let’s submit this program.
ods graphics on / width=700;
ods trace off;
ods select lsmeans diff meanplot diffplot controlplot;

proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   lsmeans Fertilizer / pdiff=all adjust=tukey;
   lsmeans Fertilizer / pdiff=controlu('4') adjust=dunnett;
   lsmeans Fertilizer / pdiff=all adjust=t;
   title "Garlic Data: Multiple Comparisons";
run;
quit;
title;
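ODS EXCLUDE works the other way around from ODS SELECT: you name the objects you want to suppress and SAS displays everything else. Here is a minimal sketch. It assumes that the class-level and observation-count tables appear in your trace log under the names ClassLevels and NObs, so check your own log before relying on those names.

```sas
ods graphics on / width=700;
/* Assumed object names: confirm them in the ODS TRACE output */
ods exclude ClassLevels NObs;

proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   lsmeans Fertilizer / pdiff=all adjust=tukey;
run;
quit;
```

Use ODS SELECT when you want only a few objects, and ODS EXCLUDE when you want nearly everything except a few.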
The log shows that the PROC GLM statement ran without errors, so now let’s look at the output. Let’s start with the Tukey LSMEANS Comparisons, which correspond to your first LSMEANS statement. The first table shows the mean bulb weight for each type of fertilizer. The LSMEAN Number is a legend, or key, to read the table of p-values in the second table. In this case, they are the same as the numbers assigned to your fertilizers. The second table shows the p-values from pairwise comparisons of all possible combinations of means. Notice that row 2 column 4 has the same p-value as row 4 column 2. What does this mean? You see the same values because SAS compares the same two means in each case and displays them as a convenience to you. Why are some of the spaces blank? It doesn’t make sense to compare a mean to itself, so in each of these cases, you see blanks. Recall that the null hypothesis for each test is that the means for the two fertilizers are equal. Now for the important question: Do you see a significant pairwise comparison difference in this table? The only significant pairwise difference is between fertilizer 1 and fertilizer 4. The p-value of 0.0144 is less than your alpha, meaning that the bulb weights of these two fertilizers are significantly different from one another.
This plot shows the least squares mean graphically. You can see the mean bulb weight for each type of fertilizer. Fertilizer 1 has the highest mean, but all three organic fertilizers have higher mean weights than the chemical fertilizer, fertilizer 4. You can use the diffogram to visually assess if there is a significant difference between any of the pairwise comparisons using Tukey’s method. The blue diagonal line indicates that the means of fertilizers 1 and 4 are significantly different from one another.
Let’s now move onto the Dunnett method output. These results correspond to your second LSMEANS statement. In this table, you can see that SAS compared the first three fertilizers to fertilizer 4, the control, or chemical fertilizer. Even though the mean weights of garlic bulbs using any of the three organic fertilizers are all greater than the mean weight of garlic bulbs grown using the chemical fertilizer, you can say that only fertilizer 1 is statistically greater than the control. Its p-value is the only one less than your alpha.
Here’s the fertilizer control plot, which serves to reinforce what you just learned. Because you performed one-sided, upper-tailed hypothesis tests of each of the organic fertilizers versus the chemical fertilizer, you only see the upper shaded region with the UDL in your plot. Remember that the bottom horizontal line is the least squares mean of your control group. The vertical line for fertilizer 1 is the only line that extends past the UDL.
Finally, let’s examine the output that corresponds to your third LSMEANS statement. These t-tests do not adjust for multiple comparisons, and are therefore more liberal than tests that do control for the EER. Take a moment to look at the p-values. You might notice that the p-values in this table are smaller than those in the Tukey table. In fact, which additional significant pairwise difference does this method show? It shows that fertilizer 1 is significantly different from fertilizer 2 with a p-value of 0.0195. This is in addition to fertilizers 1 and 4 being statistically different with a p-value of 0.0029. Notice also that the comparison between fertilizers 3 and 4 is nearly significant. So with this test, there’s a tendency to find more significant pairwise differences than might actually exist.
Lastly, let’s take a look at the diffogram. Again, this reinforces what you know about fertilizers 1 and 4 and fertilizers 1 and 2. Using these multiple comparison techniques gives you options. If you feel strongly about controlling the EER, you shouldn’t use the pairwise t-test results and should instead use the Tukey or Dunnett results. You knew before you performed these multiple comparison techniques that fertilizer 1 produced the garlic with the heaviest overall mean bulb weight, so that would be your first choice if you are not considering other factors like cost or availability. But what if fertilizer 1 is very expensive or hard to obtain? With these multiple comparison techniques, you now know which fertilizers are not statistically different from fertilizer 1, so the Montana Gourmet Garlic farmers have options for the fertilizer to use that will produce equally heavy garlic bulbs.
You can use one-way ANOVA to determine whether there are significant differences between the means of two or more populations. With one-way ANOVA, you measure significant effects of one independent factor, or predictor variable, on your response variable. But suppose your experiment involves two or more factors, each with multiple levels, and you have multiple observations at each level. For example, as a wire manufacturer, you’re interested in knowing if the strength of the wire you produce is affected by different alloys and different heat settings used in the production process. You have two types of alloys: high alloy and low alloy, and four heat settings: levels 1, 2, 3, and 4, and these are your categorical predictor variables. You test each heat setting with each alloy type and record the breaking strength of each wire, which is your continuous response variable. How will you design the ANOVA model for this scenario? When you have a continuous response variable and two categorical predictor variables, you use the two-way ANOVA model. Here’s what you learn in this topic.
Let’s take a moment to review some terms related to linear modeling.
ANOVA and regression are used to estimate parameters in statistical models. Statistical models are simply the mathematical relationships relating predictor variables with response variables. The same model can be expressed in a variety of ways, depending on the way you want to communicate the results.
In this course, you will encounter the term effect, meaning the magnitude of the expected change in the response variable presumably caused by the change in value of a predictor variable in the model.
In addition, the variables in a model can be referred to as effects or terms. A main effect is the effect of a single predictor variable, such as X1, X2, or X3. Sometimes the relationship of the response variable with a predictor changes with the changing of another predictor variable. In models, these are coded as X1*X2 or X1*X2*X3. These effects are called interaction effects. You learn more about interaction effects later in this lesson.
When you have two categorical predictor variables and a continuous response variable in your experiment, you analyze your data using two-way ANOVA. Typically, anytime you consider an ANOVA with more than one predictor variable, it’s called an n-way ANOVA, where n represents the number of predictor variables. In the wire example, you’re interested in how both the type of alloy and the level of heat setting affect the breaking strength of the wire. You can use the two-way ANOVA model, where Alloy and HeatSetting are your two predictor variables in the model, to find the answer. You might be wondering if two-way ANOVA is the same as or similar to a randomized block design because in that design, you add the blocking variable as a factor in the analysis. The analysis in a randomized block design is actually a special type of two-way ANOVA in which you have one factor of interest and one blocking factor.
When you use two-way ANOVA, you examine the effects of your two predictor variables concurrently. You can also determine if the two predictor variables interact with respect to their effect on your response variable. This possible “interaction” means that the effect of one variable depends on the value of the other variable.
For example, this Interaction Plot for Strength shows the average breaking strength of wires over different levels of heat settings for high and low alloys. The high alloys are denoted with the solid blue line and the low alloys with the dashed red line. You can see that the average breaking strength of the wire for high and low alloys shows the same change across different levels of heat setting. When you see this parallelism, you can say that there’s not an interaction between alloy and heat setting. In this next plot, however, you can see that the breaking strength for high alloys decreases as the heat setting increases, but the breaking strength for low alloys increases as the heat setting increases. This indicates an interaction between the variables Alloy and HeatSetting.
When you analyze a two-way ANOVA with interactions, you first look at any tests for interaction among the factors. If there is no interaction between the factors, you can interpret the tests for the individual factor effects to determine their significance or non-significance. If an interaction exists between any factors, the tests for the individual factor effects might be misleading due to masking of these effects by the interaction. This is especially true for unbalanced data.
Let’s look at how having more than one predictor variable and interactions affect the ANOVA model. Here’s the model. Y_ijk is your response variable, which is the breaking strength of each wire. μ is the overall population mean or base level of the response. This is the average breaking strength of all the wires, regardless of the type of alloy or heat setting. α_i represents the effect of alloy type, which is the difference between the population mean of the ith type of alloy and the overall mean, μ. β_j represents the effect of heat setting, which is the difference between the population mean of the jth level of heat setting and the overall mean, μ. (αβ)_ij represents the interaction between the ith alloy type and the jth heat setting level. ε_ijk is the unaccounted-for variation or error term in your model.
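Putting the terms just defined together gives the two-way ANOVA model with interaction. The equation below is assembled from those definitions. The index ranges are an assumption based on the wire example (two alloy types and four heat settings), with k indexing the individual wires within each combination.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1, 2; \quad j = 1, \dots, 4
```

If there is no interaction, every (αβ)_ij term is zero and the model reduces to the two main effects plus error.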
What should you do if you determine that the interaction is not significant? When the interaction is not statistically significant, you can analyze the main effects with the model in its current form. This is generally the method you use when you analyze designed experiments. However, even when you analyze designed experiments, some statisticians might suggest that if the interaction is not significant, you can delete the interaction effect from your model, rerun the model, and then just analyze the main effects. This increases the power of the main effects test. The approach you choose might depend on your subject matter knowledge of the data and whether you think you should include the non-significant interaction term or not. If the interaction term is significant, it is good practice to keep the main effect terms that make up the interaction in the model, whether they are significant or not. This preserves model hierarchy.
Suppose you’re interested in conducting a study to determine whether different dosage levels of a particular drug have an effect on the blood pressure of people with three different types of heart disease. In this scenario, what is your response variable? Your response variable is blood pressure, which represents a change in the diastolic blood pressure of participants after two weeks of treatment. Is this a case for a two-way ANOVA? Yes it is. You have two categorical predictor variables. The first is Disease, which represents one of three categories of heart disease, A, B, or C. You don’t know what the specific diseases are. The second predictor variable is DrugDose, which represents the following four dosage levels of the drug: 100mg, 200mg, 500mg, and a placebo, which is your control.
The SAS data set Drug contains the experiment data. You can see the four drug dose levels and the three categories of disease. Here’s a question. What do the negative values for BloodP mean? The negative values indicate a reduction in the diastolic blood pressure after two weeks of treatment. The positive values, therefore, indicate an increase in the diastolic blood pressure.
Now let’s see how your experiment fits the two-way ANOVA model. Y_ijk is the observed BloodP for each patient. μ is the overall population mean of the response, BloodP. This is the average blood pressure change of all patients regardless of the disease category or drug dose level. α_i is the effect of the ith Disease category, which is the difference between the population mean of the ith disease category and the overall mean, μ. β_j is the effect of the jth DrugDose, which is the difference between the population mean of the jth drug level and the overall mean, μ. (αβ)_ij is the effect of the interaction between the ith Disease and the jth DrugDose. ε_ijk is the error term, or residual, in your model.
As with the one-way ANOVA, in this model you also assume that the observations are independent, that the data are normally distributed within each treatment, and that the population variances are equal for each treatment. Here’s a question. Can you identify the null hypothesis for your two-way ANOVA? The null hypothesis is that none of the effects in the model are statistically different, that is, no differences exist among the 12 group means. Where did the 12 means come from? Your experiment includes four drug dose levels and three types of heart disease, so you have 12 different combinations of dosage level and heart disease types.
Let’s use this PROC MEANS step to examine the means of blood pressure overall, the means of blood pressure for each type of disease, the means of blood pressure for each level of drug dose, and the means of blood pressure for each disease by drug dose combination. You do this by specifying PRINTALLTYPES in the PROC MEANS statement. The OUTPUT statement creates a new data set, Means, to include all of the types of means. The Means data set contains the variable _TYPE_, with values ranging from 0 to 3 to represent the four tables this PROC MEANS program generates. Type 0 gives you the mean blood pressure change of all observations, regardless of disease type or drug dose. Type 1 gives you the mean blood pressure for each drug dose, regardless of disease type. Type 2 gives you the mean blood pressure for each disease type, regardless of drug dose. And Type 3 gives you the mean blood pressure for each disease type and drug dose combination. You’ll use the _TYPE_ value in the next demonstration. The variable Mean will be named BloodP_Mean in the new output data set.
What does this FORMAT statement do? This line of code simply applies the dosef format to the variable DrugDose so that you can see the actual four drug dose levels in the output rather than the numbers 1 through 4. Let’s submit this code.
proc format;
   value dosef 1="Placebo"
               2="100 mg"
               3="200mg"
               4="500mg";
run;

proc means data=statdata.drug mean var std printalltypes;
   class Disease DrugDose;
   var BloodP;
   output out=means mean=BloodP_Mean;
   format DrugDose dosef.;
   title "Selected Descriptive Statistics for Drug Data Set";
run;
title;
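To see how the _TYPE_ values line up with the four tables, you can print the Means data set that the OUTPUT statement creates. This is a small sketch, not part of the original demonstration. It assumes that the PROC MEANS step above has already run, so that the Means data set exists with the automatic variables _TYPE_ and _FREQ_.

```sas
/* Sketch: inspect the output data set created by PROC MEANS */
proc print data=means noobs;
   var _TYPE_ Disease DrugDose _FREQ_ BloodP_Mean;
   title "Contents of the Means Data Set by _TYPE_";
run;
title;
```

Rows where _TYPE_ is 3 are the disease by drug dose combinations, and these are the rows that the means plot uses.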
The log shows that SAS read and processed the code without errors. Let’s move on to your output. First you can see the mean, variance, and standard deviation for all of the observations, regardless of disease type or drug dose. Here’s a question. What conclusion can you make about the mean blood pressure for the two-week experiment? Overall, there was a drop in diastolic blood pressure. The mean change was about -2.3.
The next table shows the mean, variance, and standard deviation for each level of drug dose, regardless of disease type. What do you observe about the mean blood pressure in this table? You should notice that the reduction in mean blood pressure is greater when the drug dose is lower. Remember that this isn’t accounting for the type of heart disease each patient has.
Here’s the mean, variance, and standard deviation for each disease type, regardless of drug dose. How did the patients with disease A seem to respond to the experiment? The patients with disease A have the greatest reduction in mean blood pressure at -15. On the other hand, the patients with disease B showed an increase in blood pressure, but this is without accounting for the drug dose.
The last table shows the mean, variance, and standard deviation of blood pressure for each disease type and drug dose combination. SAS orders the table based on the order that you list the variables in the CLASS statement, so you see the disease type and the drug doses within. Here’s a question. In which disease type does the drug dose appear to be most effective? The drug treatment appears to be the most effective in patients with disease A because there is a mean blood pressure reduction at each active dose level. Note that the first drug dosage level is the placebo, and patients with disease type A and placebo saw a slight increase in blood pressure, on average. Patients with disease B have an increase in blood pressure on average, so the treatment doesn’t appear to be effective for them. And the average change in blood pressure for patients with disease C is relatively constant across each level of drug dose.
Let’s now examine a means plot to graphically explore the relationship of blood pressure for each disease type and drug dose combination. In the PROC SGPLOT statement you specify the data set Means, which SAS created in the previous demo. You want to plot only the mean blood pressure change for each disease type and drug dose combination, so you specify WHERE _TYPE_=3. The SCATTER statement creates a scatter plot with DrugDose on the x axis and BloodP_Mean on the y axis, grouped by Disease. You specify the appearance of the markers in the plot with the MARKERATTRS option. The SERIES statement adds lines to connect the dots in the scatter plot. The XAXIS statement forces the x axis to have tick marks only at integer values. By default, SAS assumes that X is a continuous variable. You must explicitly indicate that DrugDose is categorical. Let’s submit this code and continue to examine the data.
ods graphics on / width=800;
proc sgplot data=means;
   where _TYPE_=3;
   scatter x=DrugDose y=BloodP_Mean / group=Disease markerattrs=(size=10);
   series x=DrugDose y=BloodP_Mean / group=Disease lineattrs=(thickness=2);
   xaxis integer;
   format DrugDose dosef.;
   title "Plot of Stratified Means in Drug Data Set";
run;
title;
The log shows that SAS processed the code without errors, so let’s check the output. From the graph, the relationship is clearer. For disease type A, blood pressure decreases as the drug level increases. For disease type B, blood pressure increases as the drug level increases. For disease type C, blood pressure stays relatively the same for different drug levels. This plot is exploratory, and it helps you plan your analysis. You can generate similar plots in PROC GLM.
Specifying an interaction term in the MODEL statement of PROC GLM is easy. You can simply place an asterisk between the terms. Interaction terms are also called product terms or crossed effects.
Alternatively, you can use the bar operator to specify a full factorial model.
For example, here are two ways of writing the model for a full three-way factorial model:
model Y=A B A*B C A*C B*C A*B*C;
and
model Y=A|B|C;
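As a further sketch of the bar operator, you can cap the order of the generated interactions with the @ operator. In the step below, A|B|C@2 should expand to the main effects and the two-way interactions only, omitting A*B*C. The names Y, A, B, and C are placeholders, as above, and mydata is a hypothetical data set.

proc glm data=mydata;      /* mydata is a hypothetical data set */
   class A B C;
   model Y=A|B|C@2;        /* same as: model Y=A B A*B C A*C B*C; */
run;
quit;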
You can use PROC GLM to determine if the effects of Disease and DrugDose, and the interaction between the two, are statistically significant. In the PROC GLM statement, you specify the data set Drug. In the CLASS statement, you specify the classification variables for the analysis. In the MODEL statement, you specify the variables as they exist in the two-way ANOVA model. SAS enables you to easily define the interaction. You simply separate the two main effects variables by an asterisk. Let’s submit this program.
ods graphics on / width=800;
proc glm data=statdata.drug;
   class DrugDose Disease;
   model Bloodp=DrugDose Disease DrugDose*Disease;
   format DrugDose dosef.;
   title "Analyze the Effects of DrugDose and Disease";
   title2 "Including Interaction";
run;
quit;
title;
When you check the log, you see that it looks fine. No errors or warnings are present. Now you can examine the output. The p-value for the overall model is very small, so what does this tell you? You can reject the null hypothesis and conclude that at least one of the effects in the model is significant, in other words, there is at least one difference among the 12 group means, one for each drug dose and disease combination. Which factors explain this difference? You’ll see in just a few moments.
The R square is 0.3479, so approximately 35% of the variation in blood pressure change can be explained by the model. The average blood pressure change of all the observations is -2.294, which is exactly what the PROC MEANS output showed.
The next tables show the breakdown of the main effects and interaction term in the model. Look at the Type I and Type III Sums of Squares values. Do you know why their values are not exactly the same? You don’t have a balanced design in this experiment. In other words, you have a different number of observations in each drug dose and disease combination group. In most situations, you will want to use the Type III SS. The Type I, or sequential SS, are the sums of squares you obtain from fitting the effects in the order you specify in the model. The Type III, or marginal SS, are the sums of squares you obtain from fitting each effect after all the other terms in the model, that is, the sums of squares for each effect corrected for the other terms in the model. Type III SS does not depend upon the order you specify effects in the model.
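To see the order dependence of the Type I SS for yourself, one option is to refit the same model with the main effects listed in the opposite order. This is only a sketch: it assumes the statdata.drug data set and the dosef. format used above, and it uses the SS1 and SS3 options of the MODEL statement to limit the output to those two tables.

proc glm data=statdata.drug;
   class DrugDose Disease;
   /* Disease is listed first here, so its Type I SS is no longer adjusted for DrugDose */
   model BloodP=Disease DrugDose Disease*DrugDose / ss1 ss3;
   format DrugDose dosef.;
run;
quit;

In an unbalanced design such as this one, you would expect the Type I SS for the main effects to change with the order, while the Type III SS should stay the same.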
You want to look at the interaction term first. If it’s significant, the main effects don’t tell you the whole story. The p-value for DrugDose*Disease is 0.0001. Presuming an alpha of 0.05, you reject the null hypothesis. You have sufficient evidence to conclude that there is an interaction between the two factors, meaning that the effect of the level of drug dose on blood pressure changes for the different disease types. You don’t need to worry all that much about the significance of the main effects at this point for two reasons: 1) Because the interaction term is significant, you know that the effect of the drug level changes for the different disease types. 2) Because the interaction term is significant, you want to include the main effects in the model, whether they are significant or not, to preserve model hierarchy.
Let’s finally take a look at the interaction plot for blood pressure. SAS produces this plot by default when you have an interaction term in the model. This plot looks similar to the one you produced with PROC SGPLOT, except that this one plots each of the blood pressure change measurements, as well as the means for each drug dose and disease type combination. Well, you might be thinking, “Okay. I know the interaction is significant. What I really want to know is the effect of drug dose at each particular level of disease.” You have to add the LSMEANS statement to your program to find the answer.
You do not yet know the significance of the DrugDose effect at any particular level of Disease because of the interaction, so let’s see how you can analyze and interpret the effect. Here’s the new program. You add the LSMEANS statement to request the least squares mean for each unique DrugDose and Disease combination. You add the SLICE option to test the effect of DrugDose within each Disease. Let’s submit the step.
ods graphics on / width=800;
ods select meanplot lsmeans slicedanova;
proc glm data=statdata.drug;
   class DrugDose Disease;
   model Bloodp=DrugDose Disease DrugDose*Disease;
   lsmeans DrugDose*Disease / slice=Disease;
   format DrugDose dosef.;
   title "Analyze the Effects of DrugDose";
   title2 "at Each Level of Disease";
run;
quit;
title;
Again, you check the log to make sure the code ran without errors. This looks fine, so we can move on to the output. Here’s the sliced ANOVA table where you’re testing the significance of drug dosage level on blood pressure within each level of disease. Take a moment and determine which, if any, of these p-values is significant. As you’ve seen in previous plots, the drug dose effect is significant when used in patients with either disease A or disease B, but not in patients with disease C.
SAS creates two types of mean plots when you use the LSMEANS statement with an interaction term. The first plot simply displays the least squares mean for every effect level. SAS plots each effect level on the horizontal axis and the LSMean of blood pressure on the vertical axis.
In this second plot, you can basically see what you’ve seen earlier. You can look a little closer at the combination levels if you want. You can see that the greatest increase in blood pressure change is at the drug dosage level of 200mg for patients with disease B, and that the greatest decrease in blood pressure change is at the drug dosage level of 200mg for patients with disease A.
Based on these results, what treatment plan would you recommend to patients? It seems that you would want to aggressively treat blood pressure in patients with disease A with high dosages of the drug to decrease blood pressure. For those patients with disease B, perhaps a disease caused by a traumatic event, you might not want to use the drug at all because it appears to increase blood pressure. For those patients with disease C, you might want to look into an alternative drug because this drug doesn’t appear to have any effect on blood pressure.
proc means data=statdata.concrete mean var std printalltypes;
   class Brand Additive;
   var Strength;
   output out=means mean=Strength_Mean;
   title 'Selected Descriptive Statistics for Concrete Data Set';
run;
proc sgplot data=means;
   where _TYPE_=3;
   scatter x=Additive y=Strength_Mean / group=Brand markerattrs=(size=10);
   xaxis integer;
   title 'Plot of Stratified Means in Concrete Data Set';
run;
title;
When running PROC GLM, you can add a STORE statement to save your analysis results. By using the STORE statement, you can run postprocessing analyses on the stored results, even if you no longer have access to the original data set.
The STORE statement requests that the procedure save the context and results of the statistical analysis into an item store. An item store is a binary file format that cannot be modified. You can process the contents of an item store with the PLM procedure.
For example, if you need to perform a time-consuming analysis, you can store the results by using the STORE statement.
At a later time, you can use PROC PLM to perform specific statistical analysis tasks based on the saved results of the previous analysis without having to fit the model again. This can be a great time saver!
Here is the syntax of the STORE statement. Following the keyword STORE and OUT= you specify the item store name and an optional label.
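In outline, the statement has the form STORE <OUT=>item-store-name </ LABEL='label'>;. As a hedged illustration, the following sketch adds a STORE statement to a two-way ANOVA of the drug data. The item store name drugmodel and the label text are made up for this example.

proc glm data=statdata.drug;
   class DrugDose Disease;
   model BloodP=DrugDose Disease DrugDose*Disease;
   /* drugmodel and the label are hypothetical, chosen only for illustration */
   store out=work.drugmodel / label='Two-way ANOVA of BloodP';
run;
quit;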
You can use the STORE statement in a number of SAS/STAT procedures.
You know that you can add a STORE statement to PROC GLM, or a number of other procedures, to save the analysis results to an item store.
To perform post-fitting statistical analyses and plotting for the contents of the item store, you use the PLM procedure.
Here is the PROC PLM syntax. The statements and options that are available vary depending upon which procedure you used to produce the item store.
PROC PLM is followed by RESTORE= and then the specified item store to be processed. You can include the EFFECTPLOT statement to produce additional plots of the fitted model. The SLICE statement is helpful when your model contains an interaction. You use the SLICE statement to look at the effect of one variable at different slices, or value ranges, of another variable.
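As a minimal sketch of the general pattern, the step below restores an item store and lists some of its contents with the SHOW statement before any further analysis. The item store name work.drugmodel is an assumption for illustration; substitute the name that you used in your own STORE statement.

proc plm restore=work.drugmodel;   /* hypothetical item store name */
   show effects parameters;        /* list the model effects and the parameter estimates */
run;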
During your analysis of residential property sales in Ames, Iowa, you identify a possible interaction between the variables Heating_QC and Season_Sold. You decide to perform a two-way ANOVA of SalePrice with Heating_QC and Season_Sold as the predictor variables. You also include the interaction term Heating_QC*Season_Sold.
You add a STORE statement to save the analysis results as an item store. Later, you run PROC PLM to create some additional plots of the fitted model.
In this demonstration, we use PROC GLM to perform a two-way ANOVA with an interaction using the statdata.ameshousing3 data set. We include a STORE statement to save the results in an item store.
Here is our program.
ods graphics on;
proc glm data=statdata.ameshousing3 order=internal plots(only)=intplot;
   class Season_Sold Heating_QC;
   model SalePrice=Heating_QC Season_Sold Heating_QC*Season_Sold;
   lsmeans Heating_QC*Season_Sold / diff slice=Heating_QC;
   format Season_Sold Season.;
   store out=interact;
   title "Model with Heating Quality and Season as Interacting Predictors";
run;
title;
It starts with ODS Graphics on to ensure that we get all of the ODS plots we request.
Next is the PROC GLM step. The PROC GLM statement specifies the data set statdata.ameshousing3.
The option order=internal tells SAS to use the order of the variable values stored internally, rather than the order of the formatted values. The internal values for Season_Sold are 1 (formatted as Winter), 2 (formatted as Spring), 3 (for Summer), and 4 (for Fall). So, including order=internal tells SAS to display the seasons in the order Winter, Spring, Summer, and Fall, rather than in alphabetical order.
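The Season. format itself is not defined in this program. Based on the value mapping just described, a definition along the following lines would produce it. Treat this as an assumed reconstruction rather than the actual code that created the format.

proc format;
   /* assumed mapping, taken from the description of Season_Sold above */
   value Season 1='Winter'
                2='Spring'
                3='Summer'
                4='Fall';
run;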
We specify plots(only)=intplot to request an interaction plot. You can request an interaction plot even if there isn’t an interaction in the model. In this case, there is an interaction, so we’ll be able to visualize the interaction in the plot. Because Season_Sold and Heating_QC are categorical variables, we need to include them in a CLASS statement.
In the MODEL statement, we specify the response variable SalePrice, an equal sign, the main effects Heating_QC and Season_Sold, and then the interaction effect. The interaction effect is represented as the two main effects separated by an asterisk. Here’s another way to represent the effects. Instead of listing the main effects and the interaction effect separately, we could simply place a vertical bar between the main effects.
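For instance, under the same assumptions as the program above, the step could be written with the bar operator as follows. It should fit the same three effects; the LSMEANS, STORE, and TITLE statements are left out here only to keep the sketch short.

proc glm data=statdata.ameshousing3 order=internal;
   class Season_Sold Heating_QC;
   /* expands to: Heating_QC Season_Sold Heating_QC*Season_Sold */
   model SalePrice=Heating_QC|Season_Sold;
   format Season_Sold Season.;
run;
quit;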
Recall that the LSMEANS statement computes and compares least squares means of fixed effects. This LSMEANS statement specifies the interaction term Heating_QC by Season_Sold. By specifying slice=Heating_QC, we tell SAS to slice the interaction effect by the different levels of Heating_QC.
This means that each slice will have one Heating_QC level and will show the Season_Sold effect across that slice. We’ll format Season_Sold with a FORMAT statement. The statement store out=interact saves the analysis results as a SAS item store named interact in the Work library.
The program ends with a TITLE statement, a RUN statement, and a null TITLE statement that clears the title. Let’s run the program and check the log. No errors here. Now let’s look at the results.
The first two tables tell us that there are four levels of season sold (winter, spring, summer, and fall), four levels of heating quality and condition (excellent, fair, good, and typical/average), 300 observations, and no missing values. The next table tells us that the overall model is statistically significant.
Now let’s look at the p-value of the interaction term to determine whether to remove it. In this case, it is significant. This tells us that the relationship between Heating_QC and SalePrice differs across levels of Season_Sold. Also the relationship between Season_Sold and SalePrice differs across levels of Heating_QC. With the significance of this interaction, it should stay in the model. Note that due to model hierarchy, all effects contained within significant interactions should also remain in the model, regardless of their p-value.
The interaction plot is a line plot overlaid with all of the observations of the data set. This plot shows that for most categories of heating quality, the season when the property was sold had little effect on the sale price. However, when the heating quality value was fair, the price sold was low in winter, went up until summer and then went down again in fall.
The next table displays the least squares means for every possible combination of Season_Sold and Heating_QC, and the following matrix displays p-values for every possible pairwise comparison of those combinations.
These tables were produced by the DIFF option in the LSMEANS statement, but they really aren’t very informative. So let’s focus instead on the slice analysis of this model, which is produced by the SLICE option we specified in the code.
In the next table we have Season_Sold by Heating_QC sliced by Heating_QC for SalePrice. We have three degrees of freedom for each level of Heating_QC (excellent, fair, good, and typical/average). These are the three degrees of freedom for the Season_Sold variable, which has four levels.
The first p-value looks at the homogeneity of means within the Heating_QC group excellent across all of the levels of Season_Sold. This p-value shows that there is no significant difference across seasons sold when heating quality and condition is excellent.
There is a statistically significant season sold effect within the fair group as well as the good group, but not within the typical/average group.
Previously, we used PROC GLM to perform a two-way ANOVA with an interaction using the statdata.ameshousing3 data set. We included a STORE statement to save the results in an item store named interact in the temporary Work library. In this demonstration, we are in the same SAS session, so we can access the interact item store. We produce additional plots by running PROC PLM on the item store.
Here is our program.
ods graphics on;
/* previous program must be run in the same SAS session */
/*
proc glm data=statdata.ameshousing3 order=internal plots(only)=intplot;
   class Season_Sold Heating_QC;
   model SalePrice=Heating_QC Season_Sold Heating_QC*Season_Sold;
   lsmeans Heating_QC*Season_Sold / diff slice=Heating_QC;
   format Season_Sold Season.;
   store out=interact;
   title "Model with Heating Quality and Season as Interacting Predictors";
run;
title;
*/
proc plm restore=interact plots=all;
   slice Heating_QC*Season_Sold / sliceby=Heating_QC adjust=tukey;
   effectplot interaction(sliceby=Heating_QC) / clm;
run;
It starts with ODS Graphics on to ensure that we get all of the plots we request.
Previously, we ran PROC GLM with a STORE statement and saved the results in an item store named interact. So, restore=interact tells SAS to run PROC PLM on the interact item store. We specify plots=all, which tells SAS that we want all of the available ODS plots for the statements that we include in PROC PLM.
We want to produce more tables on Season_Sold by different levels of Heating_QC, so we add the SLICE statement slice Heating_QC by Season_Sold, sliceby=Heating_QC. Notice that the syntax sliceby= in the SLICE statement is different from slice= in the LSMEANS statement.
We include adjust=tukey in order to get the Tukey adjustment for multiple comparison tests.
By specifying plots=all, the output will include an effect plot. However, by adding an EFFECTPLOT statement, we can specify more options. We specify the interaction sliceby=Heating_QC, and we specify clm, which gives us confidence limits for the means.
Let’s run our program.
In the results, we see information about the item store we created, including its name and location, the data set from which it was created, the procedure we used to create it, the response variable, the class variables, and the model effects.
We get class level information again, and then we get a table of the overall F test for Season_Sold for the interaction term. This table focuses on just one Heating_QC level, excellent. The p-value is the same as the one shown in the PROC GLM results.
The next table shows all of the pairwise comparisons of Season_Sold within the Heating_QC level excellent. Here we get the unadjusted p-values as well as the Tukey adjusted p-values.
Because we specified plots=all and a SLICE statement, we get a diffogram. This diffogram is for just Season_Sold by Heating_QC at the excellent level. As you can see, there are no significant differences among pairs when we hold Heating_QC constant at excellent.
Next, let’s look at the diffogram for Heating_QC at the level fair. In the plot, notice this blue line. It tells us that there is a significant difference in the mean SalePrice of properties sold in the summer versus winter when the value of Heating_QC is fair.
Now let’s look at the diffogram for Heating_QC at the level good. Spring versus summer is statistically different. The last diffogram holds Heating_QC constant at typical/average and shows no significant differences among the pairs of seasons.
The last plot was produced by the EFFECTPLOT statement. This plot includes the confidence intervals that we requested.
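As a variation on this demonstration, and only as a sketch, you could reuse the same item store to slice the interaction the other way, testing the Heating_QC effect within each season. The step assumes that the interact item store still exists in the Work library of the current session.

proc plm restore=interact;
   /* test the Heating_QC effect separately within each level of Season_Sold */
   slice Heating_QC*Season_Sold / sliceby=Season_Sold adjust=tukey;
   effectplot interaction(sliceby=Season_Sold) / clm;
run;

Because PROC PLM works from the stored results, this step should not need to refit the model.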
proc sgplot data=statdata.drug;
   vline DrugDose / group=Disease stat=mean response=BloodP markers;
   format DrugDose dosefmt.;
run;
ods graphics on;
proc glm data=statdata.drug plots(only)=intplot;
   class DrugDose Disease;
   model BloodP=DrugDose|Disease;
   lsmeans DrugDose*Disease / slice=Disease;
run;
quit;
This summary contains topic summaries, syntax, and sample programs.
Graphical Analysis
Part of knowing your data is to get a general idea of any associations between predictor variables and the response variable. You can do this by conducting a graphical analysis of your data using box plots.
Two-Sample t-Tests
The two-sample t-test is a hypothesis test for answering questions about the means of two populations. You can examine the differences between populations for one or more continuous variables and assess whether the means of the two populations are statistically different from each other.
The null hypothesis for the two-sample t-test is that the means for the two groups are equal. The alternative hypothesis is the logical opposite of the null and is typically what you suspect or are trying to show. It is usually a hypothesis of inequality. The alternative hypothesis for the two-sample t-test is that the means for the two groups are not equal.
The three assumptions for the two-sample t-test are independence, normality, and equal variances.
You use the F-test for equality of variances to evaluate the assumption of equal variances in the two populations. You calculate the F statistic, which is the ratio of the maximum sample variance of the two groups to the minimum sample variance of the two groups. If the p-value of the F-test is greater than your alpha, you fail to reject the null hypothesis and can proceed as if the variances are equal between the groups. If the p-value of the F-test is less than your alpha, you reject the null hypothesis and can proceed as if the variances are not equal.
With one-sided tests, you look for a difference in one direction. For instance, you can test to determine whether the mean of one population is greater than or less than the mean of another population. An advantage of one-sided tests is that they can increase the power of a statistical test.
To perform the two-sample t-test and the one-sided test, you can use PROC TTEST. You add the PLOTS option to the PROC TTEST statement to control the plots that ODS produces. You add the SIDES=U or SIDES=L option to specify an upper or lower one-sided test.
One-Way ANOVA
You can use ANOVA to determine whether there are significant differences between the means of two or more populations. In this model, you have a continuous dependent, or response, variable and a categorical independent, or predictor, variable. With ANOVA, the null hypothesis is that all of the population means are equal. The alternative hypothesis is that not all of the population means are equal. In other words, at least one mean is different from the rest.
One way to represent the relationship between the response and predictor variables in ANOVA is with a mathematical ANOVA model.
ANOVA analyzes the variances of the data to determine whether there is a difference between the group means. You can determine whether the variation of the means is large enough relative to the variation of observations within the group. To do this, you calculate three types of sums of squares: between group variation (SSM), within group variation (SSE), and total variation (SST). The SSM and SSE represent pieces of the total variability. If the SSM is large enough relative to the SSE, after each is divided by its degrees of freedom, you reject the null hypothesis that all of the group means are equal.
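As a rough numerical sketch of this comparison, the DATA step below forms the F statistic from the sums of squares and looks up its p-value with the PROBF function. All of the input numbers are invented for illustration and do not come from any data set in this lesson.

data _null_;
   /* hypothetical values: 4 groups, 32 observations */
   SSM=90;   dfM=3;            /* between-group SS and its degrees of freedom (groups - 1) */
   SSE=280;  dfE=28;           /* within-group SS and its degrees of freedom (n - groups) */
   SST=SSM+SSE;                /* total variation is the sum of the two pieces */
   F=(SSM/dfM)/(SSE/dfE);      /* ratio of the two mean squares */
   p=1-probf(F, dfM, dfE);     /* upper-tail probability of the F distribution */
   put SST= F= p=;             /* write the results to the log */
run;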
Before you perform the hypothesis test, you need to verify the three ANOVA assumptions: the observations are independent, the error terms are normally distributed, and the error terms have equal variances across groups.
The residuals that come from your data are estimates of the error term in the model. You calculate the residuals from ANOVA by taking each observation and subtracting its group mean. Then you verify the two assumptions regarding normality and equal variances of the errors.
To verify the ANOVA assumptions and perform the ANOVA test, you use PROC GLM. In the MODEL statement, you specify the dependent and independent variables for the analysis. The MEANS statement computes unadjusted means of the dependent variable for each value of the specified effect. You can add the HOVTEST option to the MEANS statement to perform Levene’s test for homogeneity of variances. If the resulting p-value of Levene’s test is greater than 0.05 (typically), then you fail to reject the null hypothesis of equal variances.
ANOVA with Data from a Randomized Block Design
In a controlled experiment, you can design the analysis prospectively and control for other factors, nuisance factors, that affect the outcome you’re measuring. Nuisance factors can affect the outcome of your experiment but are not of interest in the experiment. In a randomized block design, you can use a blocking variable to control for the nuisance factors and reduce or eliminate their contribution to the experimental error.
One way to represent the relationship between the response and predictor variables in ANOVA is with a mathematical ANOVA model. You can also include a blocking variable in the model.
Along with the three original ANOVA assumptions of independent observations, normally distributed errors, and equal variances across treatments, you make two more assumptions when you include a blocking factor in the model. You assume that the treatments are randomly assigned within each block, and you assume that the effects of the treatment factor are constant across levels of the blocking factor.
You use PROC GLM to perform ANOVA with a blocking variable. You list the blocking variable in the CLASS statement and in the MODEL statement.
ANOVA Post Hoc Tests
A pairwise comparison examines the difference between two treatment means. If your ANOVA results suggest that you reject the null hypothesis that the means are equal across groups, you can conduct multiple pairwise comparisons in a post hoc analysis to learn which means differ.
The chance that you make a Type I error increases each time you conduct a statistical test. The comparisonwise error rate, or CER, is the probability of a Type I error on a single pairwise test. The experimentwise error rate, or EER, is the probability of making at least one Type I error when performing all of the pairwise comparisons. The EER increases as the number of pairwise comparisons increases.
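To make the growth of the EER concrete, the sketch below tabulates the number of pairwise comparisons and an approximate EER for a few group counts. It relies on the simplifying assumption that the comparisons are independent, in which case the EER is 1 - (1 - CER)**c for c comparisons. Real pairwise comparisons share data, so treat the result as an approximation only.

data _null_;
   cer=0.05;                      /* comparisonwise error rate (alpha) */
   do groups=2 to 5;
      c=groups*(groups-1)/2;      /* number of pairwise comparisons */
      eer=1-(1-cer)**c;           /* approximate EER, assuming independent tests */
      put groups= c= eer=6.4;     /* write one line per group count to the log */
   end;
run;

With four groups, for example, there are six comparisons, and the approximate EER is about 0.26.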
You can use the Tukey method to control the EER. This test compares all possible pairs of means, so it can only be used when you make pairwise comparisons. Dunnett’s method is a specialized multiple comparison test that enables you to compare a single control group to all other groups.
You request all of the multiple comparison methods with options in the LSMEANS statement in PROC GLM. You use the PDIFF=ALL option to request p-values for the differences between all of the means. With this option, SAS produces a diffogram. You use the ADJUST= option to specify the adjustment method for multiple comparisons. When you specify the ADJUST=Dunnett option, SAS produces multiple comparisons using Dunnett’s method and a control plot.
Two-Way ANOVA with Interactions
When you have two categorical predictor variables and a continuous response variable, you can analyze your data using two-way ANOVA. With two-way ANOVA, you can examine the effects of the two predictor variables concurrently. You can also determine whether they interact with respect to their effect on the response variable. An interaction means that the effect of one variable depends on the value of another variable. If there is no interaction, you can interpret the test for the individual factor effects to determine their significance. If an interaction exists between any factors, the test for the individual factor effects might be misleading due to the masking of these effects by the interaction.
You can include interactions and more than one predictor variable in the ANOVA model.
You can graphically explore the relationship between the response variable and the effect of the interaction between the two predictor variables using PROC SGPLOT.
You can use PROC GLM to determine whether the effects of the predictor variables and the interaction between them are statistically significant.
When running PROC GLM, you can add a STORE statement to save your analysis results. By using the STORE statement, you can run postprocessing analyses on the stored results, even if you no longer have access to the original data set. The STORE statement requests that the procedure save the context and results of the statistical analysis into an item store. To perform post-fitting statistical analyses and plotting for the contents of the item store, you use the PLM procedure.
Syntax
PROC TTEST DATA=SAS-data-set
Selected Options in PROC TTEST
Statement     Option
PROC TTEST    PLOTS(SHOWNULL)=INTERVAL
              SIDES=U
              SIDES=L
PROC GLM DATA=SAS-data-set
Selected Options in PROC GLM
Statement     Option
PROC GLM      PLOTS(ONLY)
              DIAGNOSTICS(UNPACK)
MEANS         HOVTEST
LSMEANS       PDIFF=ALL
              ADJUST=
PROC PLM RESTORE=item-store-specification
EFFECTPLOT <plot-type <(plot-definition options)>> </ options>;
LSMEANS <model-effects> </ options>;
LSMESTIMATE model-effect <'label'> values <divisor=n><,...<'label'> values <divisor=n>> </ options>;
SHOW options;
SLICE model-effect </ options>;
WHERE expression ;
RUN;
Selected Option in PROC PLM
Statement     Option
PROC PLM      RESTORE
Sample Programs
Exploring Associations with Box Plots
proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Central_Air connect=mean;
   title "Sale Price Differences across Central Air";
run;
proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Fireplaces connect=mean;
   title "Sale Price Differences across Fireplaces";
run;
proc sgplot data=statdata.ameshousing3;
   vbox SalePrice / category=Heating_QC connect=mean;
   title "Sale Price Differences across Heating Quality";
run;
Running PROC TTEST in SAS
proc ttest data=statdata.testscores plots(shownull)=interval;
   class Gender;
   var SATScore;
   title 'Two-Sample t-Test Comparing Girls to Boys';
run;
title;
Performing a One-Sided t-Test
proc ttest data=statdata.testscores plots(shownull)=interval h0=0 sides=U;
   class Gender;
   var SATScore;
   title 'One-Sided t-Test Comparing Girls to Boys';
run;
title;
Examining Descriptive Statistics across Groups
proc means data=statdata.mggarlic printalltypes maxdec=3;
   var BulbWt;
   class Fertilizer;
   title 'Descriptive Statistics of Garlic Weight';
run;
proc sgplot data=statdata.mggarlic;
   vbox BulbWt / category=Fertilizer datalabel=BedID;
   format BedID 5.;
   title 'Box Plots of Garlic Weight';
run;
title;
Using the GLM Procedure
proc glm data=statdata.mggarlic plots(only)=diagnostics(unpack);
   class Fertilizer;
   model BulbWt=Fertilizer;
   means Fertilizer / hovtest;
   title 'Testing for Equality of Means with PROC GLM';
run;
quit;
title;
Performing ANOVA with Blocking
proc glm data=statdata.mggarlic_block plots(only)=diagnostics(unpack);
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   title 'ANOVA for Randomized Block Design';
run;
quit;
title;
Performing a Post Hoc Pairwise Comparison
ods select lsmeans diff meanplot diffplot controlplot;
proc glm data=statdata.mggarlic_block;
   class Fertilizer Sector;
   model BulbWt=Fertilizer Sector;
   lsmeans Fertilizer / pdiff=all adjust=tukey;
   lsmeans Fertilizer / pdiff=controlu('4') adjust=dunnett;
   lsmeans Fertilizer / pdiff=all adjust=t;
   title 'Garlic Data: Multiple Comparisons';
run;
quit;
title;
Examining Your Data with PROC MEANS
proc format;
   value dosef 1="Placebo" 2="100mg" 3="200mg" 4="500mg";
run;
proc means data=statdata.drug mean var std printalltypes;
   class Disease DrugDose;
   var BloodP;
   output out=means mean=BloodP_Mean;
   format DrugDose dosef.;
   title 'Selected Descriptive Statistics for Drug Data Set';
run;
title;
Examining Your Data with PROC SGPLOT
proc sgplot data=means;
   where _TYPE_=3;
   scatter x=DrugDose y=BloodP_Mean / group=Disease markerattrs=(size=10);
   series x=DrugDose y=BloodP_Mean / group=Disease lineattrs=(thickness=2);
   xaxis integer;
   format DrugDose dosef.;
   title 'Plot of Stratified Means in Drug Data Set';
run;
title;
Performing Two-Way ANOVA with Interactions
proc glm data=statdata.drug;
   class DrugDose Disease;
   model Bloodp=DrugDose Disease DrugDose*Disease;
   format DrugDose dosef.;
   title1 'Analyze the Effects of DrugDose and Disease';
   title2 'including Interactions';
run;
quit;
title;
Performing a Post Hoc Pairwise Comparison
proc format;
   value dosef 1="Placebo" 2="100mg" 3="200mg" 4="500mg";
run;
ods select meanplot lsmeans slicedanova;
proc glm data=statdata.drug;
   class DrugDose Disease;
   model Bloodp=DrugDose Disease DrugDose*Disease;
   lsmeans DrugDose*Disease / slice=Disease;
   format DrugDose dosef.;
   title 'Analyze the Effects of DrugDose at Each Level of Disease';
run;
quit;
title;
Performing Two-Way ANOVA with an Interaction by Using PROC GLM
ods graphics on;
proc glm data=statdata.ameshousing3 order=internal plots(only)=intplot;
   class Season_Sold Heating_QC;
   model SalePrice=Heating_QC Season_Sold Heating_QC*Season_Sold;
   lsmeans Heating_QC*Season_Sold / diff slice=Heating_QC;
   format Season_Sold Season.;
   store out=interact;
   title "Model with Heating Quality and Season as Interacting" "Predictors";
run;
Performing Postprocessing Analysis by Using PROC PLM
ods graphics on;
/* previous program must be run in the same SAS session */
/*
proc glm data=statdata.ameshousing3 order=internal plots(only)=intplot;
   class Season_Sold Heating_QC;
   model SalePrice=Heating_QC Season_Sold Heating_QC*Season_Sold;
   lsmeans Heating_QC*Season_Sold / diff slice=Heating_QC;
   format Season_Sold Season.;
   store out=interact;
   title "Model with Heating Quality and Season as Interacting" "Predictors";
run;
*/
proc plm restore=interact plots=all;
   slice Heating_QC*Season_Sold / sliceby=Heating_QC adjust=tukey;
   effectplot interaction(sliceby=Heating_QC) / clm;
run;
Copyright © 2017 SAS Institute Inc., Cary, NC, USA. All rights reserved.