class: center, middle, inverse, title-slide # ANOVA in R Cookbook ### Chris Ives --- # Contents * Foundations + Importing Data + Descriptives + Plots * ANOVA + One-way ANOVA + Pairwise Comparisons + Planned Contrasts --- ## Installing the `needs` Package - Can install packages and load them - No longer need `install.packages()` and `library()` - It will ask if you would like to load it every time RStudio opens. Select yes. ```r install.packages("needs") ``` --- ## Importing the Data Normally we would use the `import()` function to import our data. However, with the SPSS `.sav` files we don't get factor labels. ```r data_import <- import(here("Lab 2/Lab2_Vocab.sav")) head(data_import) ``` ``` ## idnum vocab instruct ## 1 1 53 1 ## 2 2 49 1 ## 3 3 47 1 ## 4 4 42 1 ## 5 5 51 1 ## 6 6 34 1 ``` Notice how we have 1s down the instruct column. --- ## The Fix For now, when working with `.sav` files we will use the `read.sav` function from the `misty` package. `use.value.labels = TRUE` tells it to use the labels as the cell values. ```r needs(misty) l2_data <- misty::read.sav(here("Lab 2/Lab2_Vocab.sav"), use.value.labels = TRUE) head(l2_data) ``` ``` ## idnum vocab instruct ## 1 1 53 physical science ## 2 2 49 physical science ## 3 3 47 physical science ## 4 4 42 physical science ## 5 5 51 physical science ## 6 6 34 physical science ``` --- ## Checking The Assumptions of ANOVA - Independence of observations - Normality - Homogeneity of variance --- ## Descriptive Statistics Make sure the `psych` package is installed and loaded: ```r needs(psych) ``` `describe` from the `psych` package is one of the more popular functions for descriptive statistics. ```r describe(l2_data) ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["vars"],"name":[1],"type":["int"],"align":["right"]},{"label":["n"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["mean"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["sd"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["median"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["trimmed"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["mad"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["min"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["max"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["range"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["skew"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["kurtosis"],"name":[12],"type":["dbl"],"align":["right"]},{"label":["se"],"name":[13],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"36","3":"18.5","4":"10.5356538","5":"18.5","6":"18.5","7":"13.3434","8":"1","9":"36","10":"35","11":"0.0000000","12":"-1.3003629","13":"1.7559423","_rn_":"idnum"},{"1":"2","2":"36","3":"33.5","4":"12.8652355","5":"35.5","6":"34.2","7":"12.6021","8":"6","9":"53","10":"47","11":"-0.5359932","12":"-0.8726857","13":"2.1442059","_rn_":"vocab"},{"1":"3","2":"36","3":"2.0","4":"0.8280787","5":"2.0","6":"2.0","7":"1.4826","8":"1","9":"3","10":"2","11":"0.0000000","12":"-1.5821759","13":"0.1380131","_rn_":"instruct*"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Independence of Observations - Do participants cross groups? - We know this if we know the study design - In this study, each student experienced one and only one lecture --- ## Normality - We can check the skew and kurtosis values from our `describe()` output. - 0 ± 2 is a good rule of thumb for a tenable assumption of normality. <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["vars"],"name":[1],"type":["int"],"align":["right"]},{"label":["n"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["skew"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["kurtosis"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"36","3":"0.0000000","4":"-1.3003629","_rn_":"idnum"},{"1":"2","2":"36","3":"-0.5359932","4":"-0.8726857","_rn_":"vocab"},{"1":"3","2":"36","3":"0.0000000","4":"-1.5821759","_rn_":"instruct*"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> - Instruct is a categorical variable, so we can ignore the skew and kurtosis on that. --- ## Visual Inspection for Normality - For the most part, plots will be wrapped using the `ggplot()` function. ### Histogram ```r ggplot(data = l2_data, aes(x = vocab)) + geom_histogram() ``` - `aes()` refers to aesthetics. What are the variables we want represented in our plots? Since we just want counts of a single continuous variable, we just need to specify our `x` (i.e., `x = vocab`). --- ## Histograms ```r ggplot(data = l2_data, aes(x = vocab)) + geom_histogram() ``` <img src="Anova_Ckbk_wk1-3_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- ## Boxplot - To get boxplots, we just substitute `geom_histogram()` for `geom_boxplot()` and modify our aesthetics. ```r ggplot(data = l2_data, aes(x = instruct, y = vocab)) + geom_boxplot() ``` <img src="Anova_Ckbk_wk1-3_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- ## Homogeneity of Variance Homogeneity test is a separate analysis. We'll just use the `leveneTest` function for the `car` package. Formula is the same as for the ANOVA we want to run. Specify `center = "mean"` (function's default is median) to match SPSS results. ```r needs(car) car::leveneTest(vocab ~ instruct, data = l2_data, center = "mean") ``` ``` ## Levene's Test for Homogeneity of Variance (center = "mean") ## Df F value Pr(>F) ## group 2 7.5054 0.002058 ** ## 33 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` Our significant result shows error variance around the *mean* is not equal across groups. --- class: inverse-blue middle # Running ANOVAs --- Example here is from Lab 1 data. ANOVAs will be run using the `anova_test()` function. Formula is specified as `DV ~ IV`. My convention is to number my model objects (e.g., m1, m2, etc.) ```r needs(rstatix) m1 <- anova_test(data = l1_data, formula = vocab ~ instruct, detailed = TRUE) ``` ``` ## Coefficient covariances computed by hccm() ``` ```r m1 ``` ``` ## ANOVA Table (type II tests) ## ## Effect SSn SSd DFn DFd F p p<.05 ges ## 1 instruct 1176 3824 1 22 6.766 0.016 * 0.235 ``` --- ## Effect Sizes Effect sizes can be specified using `effect.size = ____` in the `anova.test()` function. Use `effect.size = "ges"` for generalized eta squared or `effect.size = "pes"` for partial eta squared. Default is "`ges"`. ```r anova_test(data = l1_data, formula = vocab ~ instruct, effect.size = "ges") ``` --- ```r anova_test(data = l1_data, formula = vocab ~ instruct, effect.size = "ges") ``` ``` ## Coefficient covariances computed by hccm() ``` ``` ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 instruct 1 22 6.766 0.016 * 0.235 ``` ```r anova_test(data = l1_data, formula = vocab ~ instruct, effect.size = "pes") ``` ``` ## Coefficient covariances computed by hccm() ``` ``` ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 pes ## 1 instruct 1 22 6.766 0.016 * 0.235 ``` Notice they are the same for a one-way between subjects ANOVA. --- ## Posthoc Comparisons | Tukey If you write out `"data = l2data"` the function will produce an error since it designed for multiple types of objects. Here we just type "l2data". ```r rstatix::tukey_hsd(l2_data, formula = vocab ~ instruct) ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["null.value"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["estimate"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["conf.low"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["conf.high"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["p.adj"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[9],"type":["chr"],"align":["left"]}],"data":[{"1":"instruct","2":"physical science","3":"social science","4":"0","5":"-14.0","6":"-25.825984","7":"-2.174016","8":"0.0174","9":"*","_rn_":"1"},{"1":"instruct","2":"physical science","3":"history","4":"0","5":"-5.5","6":"-17.325984","7":"6.325984","8":"0.4960","9":"ns","_rn_":"2"},{"1":"instruct","2":"social science","3":"history","4":"0","5":"8.5","6":"-3.325984","7":"20.325984","8":"0.1970","9":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Posthoc Comparisons | Games-Howell Remember, Games-Howell is used if the assumption of variance homogeneity is violated. ```r rstatix::games_howell_test(vocab ~ instruct, data = l2_data, conf.level = 0.95, detailed = FALSE) ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":[".y."],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["conf.low"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["conf.high"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["p.adj"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[8],"type":["chr"],"align":["left"]}],"data":[{"1":"vocab","2":"physical science","3":"social science","4":"-14.0","5":"-27.625565","6":"-0.3744354","7":"0.043","8":"*","_rn_":"1"},{"1":"vocab","2":"physical science","3":"history","4":"-5.5","5":"-15.459428","6":"4.4594280","7":"0.363","8":"ns","_rn_":"2"},{"1":"vocab","2":"social science","3":"history","4":"8.5","5":"-4.350238","6":"21.3502380","7":"0.235","8":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Posthoc Comparisons | Bonferroni ```r pairwise_t_test(data = l2_data, vocab ~ instruct, p.adjust.method = "bonferroni") ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":[".y."],"name":[1],"type":["chr"],"align":["left"]},{"label":["group1"],"name":[2],"type":["chr"],"align":["left"]},{"label":["group2"],"name":[3],"type":["chr"],"align":["left"]},{"label":["n1"],"name":[4],"type":["int"],"align":["right"]},{"label":["n2"],"name":[5],"type":["int"],"align":["right"]},{"label":["p"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["p.signif"],"name":[7],"type":["chr"],"align":["left"]},{"label":["p.adj"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["p.adj.signif"],"name":[9],"type":["chr"],"align":["left"]}],"data":[{"1":"vocab","2":"physical science","3":"social science","4":"12","5":"12","6":"0.00651","7":"**","8":"0.0195","9":"*","_rn_":"1"},{"1":"vocab","2":"physical science","3":"history","4":"12","5":"12","6":"0.26200","7":"ns","8":"0.7860","9":"ns","_rn_":"2"},{"1":"vocab","2":"social science","3":"history","4":"12","5":"12","6":"0.08700","7":"ns","8":"0.2610","9":"ns","_rn_":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- class: inverse-green middle # Specified Contrasts * They are not automatic in R and are a bit difficult. It is not important that you understand the reasoning behind all of the following steps. --- ## Specified Contrasts Check the order of your levels. ```r levels(l3_data$instruct) ``` ``` ## [1] "physical science" "social science" "history" ``` Make sure they are in the order that you would like and code your contrasts accordingly. As an example, If they need to be reordered use: ```r l3_data$instruct <- factor(l3_data$instruct, levels = c("social science", "history", "physical science")) ``` Note: We will be using the original order in the following contrasts. --- **Step 1**: Set your contrasts. We'll do Helmert here. ```r contrast1 <- c(1, -.5, -.5) # physical science vs others contrast2 <- c(0, 1, -1) #social science vs history ``` **Step 2**: Bind the vectors into a temporary matrix. Constant should be equal to 1/(length of your vectors). ```r mat.temp <- rbind(constant=1/3, contrast1, contrast2) mat.temp ``` ``` ## [,1] [,2] [,3] ## constant 0.3333333 0.3333333 0.3333333 ## contrast1 1.0000000 -0.5000000 -0.5000000 ## contrast2 0.0000000 1.0000000 -1.0000000 ``` --- **Step 3**: Take the inverse of the matrix using the `solve()` function. Then we are dropping the first column with the constants. ```r mat <- solve(mat.temp) mat ``` ``` ## constant contrast1 contrast2 ## [1,] 1 0.6666667 0.0 ## [2,] 1 -0.3333333 0.5 ## [3,] 1 -0.3333333 -0.5 ``` ```r mat<- mat[ , -1] mat ``` ``` ## contrast1 contrast2 ## [1,] 0.6666667 0.0 ## [2,] -0.3333333 0.5 ## [3,] -0.3333333 -0.5 ``` --- **Step 4**: Run your model formula using `lm()` and set contrasts. Here we are linking our "instruct" variable with our contrast matrix. Remember: * contrast1 = physical science vs others * contrast2 = social science vs history ```r m_contrasts <- lm(vocab ~ instruct, data=l3_data, contrasts = list(instruct = mat)) summary(m_contrasts) ``` ``` ## ## Call: ## lm(formula = vocab ~ instruct, data = l3_data, contrasts = list(instruct = mat)) ## ## Residuals: ## Min 1Q Median 3Q Max ## -22.000 -10.000 1.750 9.375 21.000 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 33.500 1.968 17.026 <2e-16 *** ## instructcontrast1 9.750 4.174 2.336 0.0257 * ## instructcontrast2 -8.500 4.819 -1.764 0.0870 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 11.81 on 33 degrees of freedom ## Multiple R-squared: 0.2061, Adjusted R-squared: 0.158 ## F-statistic: 4.284 on 2 and 33 DF, p-value: 0.02218 ``` --- ## What about my ANOVA results? To get the anova output, just run `rstatix::anova_test()` on your `m_contrasts` object. SPSS uses Type III SS by default, so I am matching it here. ```r m4 <- anova_test(m_contrasts, type="III", effect.size = "ges", detailed = TRUE) ``` ``` ## Coefficient covariances computed by hccm() ``` ```r m4 ``` ``` ## ANOVA Table (type tests) ## ## Effect SSn SSd DFn DFd F p p<.05 ges ## 1 (Intercept) 40401 4599 1 33 289.896 6.57e-18 * 0.898 ## 2 instruct 1194 4599 2 33 4.284 2.20e-02 * 0.206 ```