This is part of an ongoing project to increase skills in data analysis, data visualization, and R coding in my lab. I started this in the summer of 2018 by sending an Excel spreadsheet to my 5 graduate students with a brief description of what it was and how we might break down the data, organize things, visualize the data, and run basic statistical tests. Every few weeks, we work on a new problem, add it to the notebook, and then compare and discuss the solutions in a lab meeting.
The end goal is that each of my students will have their own personal R notebook that they can add to as a repository for their own code chunks, functions, and solutions to common problems.
The full R Notebook and data file can be found on GitHub.
The data we are using were collected in our lab over a number of years. The task itself was a classification learning task in which subjects learned to classify exemplars that were separated into two categories by a rule (RD for Rule Described) or into categories that were not separated by a verbalizable rule (II for Information Integration). See this paper for an example and more detail about this technique. We also collected surveys that asked about participants' behavioural habits, and we recorded things like the time of year and time of day that the experiment was run.
It does not matter too much for the purposes of this notebook, however; I'm just using this as a data set that is typical of what we use in my lab. We are using it as a way to learn how to unpack a large, semi-structured Excel spreadsheet and turn it into usable, reproducible data.
The first R challenge was just to read in the data from the Excel file and create a data frame with Total as the DV (the total proportion correct for each subject) and Category (II/RD), Month (when they were tested), and Time (what time of day they were tested) as independent variables. From this data frame, we should be able to obtain the summary stats broken down by the independent variables, along with data visualizations.
## Challenge 2

Include an additional visualization that you did not use before, and then do means testing with ANOVA for the Category X Month breakdown and the Category X Time breakdown.
Challenge three will be to create an entirely new data frame in long format. This will use the numbers in the “by block” columns as the DV. But before we can analyze by block, we need a single column (“Performance”) with each subject's performance at a given block, and another column (“Block”) with the numbers 1, 2, 3, and 4 for each block. This means that the data file will be 4 times taller than the original set. The resulting file should have a column for Subject, Category, Month, Time, Block, and Performance.
variableX<-2+3
variableX
## [1] 5
The first step to this analysis is to load the necessary libraries. Just uncomment these if they are not already installed.
# install.packages('ggplot2')
# install.packages('readxl') #for reading
# install.packages('ez') #for ANOVA
# install.packages('apaTables') #for formatting anova in APA format
# install.packages('RColorBrewer') #for accessible plots
# install.packages('schoRsch') #for formatting anova in APA format
#install.packages('Rmisc') #quick and easy summarizing
#install.packages('summarytools')
library(ggplot2) #for plotting
library(readxl) #reading in excel docs
library(ez) #this package calls the car package and runs basic ANOVAs and other stats
library(apaTables)
library(RColorBrewer)
library(schoRsch)
library(Rmisc)
library(summarytools)
library(knitr)
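If you'd rather not manage the install comments by hand, a small helper can install anything that's missing before loading. A minimal sketch, using the same package list as above:

pkgs <- c("ggplot2", "readxl", "ez", "apaTables", "RColorBrewer",
          "schoRsch", "Rmisc", "summarytools", "knitr")
need <- pkgs[!pkgs %in% rownames(installed.packages())] #which are missing?
if (length(need) > 0) install.packages(need)
invisible(lapply(pkgs, library, character.only = TRUE)) #load everything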
The second step is to load the data file into a data frame. At the same time, we want to create a data frame with the means, SD, and SE for summarizing things and for barplots. I dropped the data from the month of May because I know that no one learned the rule-described category set. This wasn't made clear in the original instructions for the R challenge, so I'm probably the only one who did this.

There are a number of ways to read in data files. The simplest is to read in a pure text file, but my preferred method is to use the 'readxl' package to import the data file directly from Excel. I then create a subset of the original data without the data from the month of May.
ModifiedFullData <- read_excel("ModifiedFullData.xlsx")
#we drop the May data, because there are not many people, no one learned the
#RD set, and it's a different sample, recruited in a different way.
#the full data that will be reported in the paper will not include these data
dm<-subset(ModifiedFullData,Month!="05_May")
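For comparison, the plain-text route mentioned above would look something like this (the .csv file name is hypothetical, since our data live in Excel):

# hypothetical plain-text equivalent of the readxl import
ModifiedFullData <- read.csv("ModifiedFullData.csv", stringsAsFactors = FALSE)
dm <- subset(ModifiedFullData, Month != "05_May")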
You can use summarytools to generate a quick and nice-looking handout. This is one way to look at all the data at one time. It does not leave you with an object to work with, though. One caveat: this won't actually show up in the notebook, so I'm going to keep it commented for now.
#view(dfSummary(dm))
#dfSummary(dm)
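If you do want the summary to appear inside the knitted document, summarytools can print through its render method (this assumes a reasonably recent version of summarytools and a chunk set to results='asis'):

# renders the data frame summary as HTML in the knitted output
print(dfSummary(dm), method = "render")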
The first R challenge was about reading in the data, visualizing it, and creating a table of means. In this case, we're going to be looking at the column of data that corresponds to total performance, broken down by category set.
The first analysis is an examination of total performance by Month. The following tables and box plots show the mean/median performance on each kind of category set (RD = Rule Described, II = Information Integration). The general trend, clearly evident in the data, is an effect of category set on performance: participants perform much better on the RD categories relative to the II categories. Also evident in the data is some variability by month. A visual inspection of the data reveals several outliers, but no clear interaction between category set and month.
I create a list of the factors that I want to average across, in this case category and month. I use the aggregate function from core R to create a table of means, a table of standard deviations, and a count of observations, and then I calculate the standard error. I glue these together into an overall data frame called means. I can then call the means data frame to see all of the numbers. I should probably format this better, but for now it gives me everything I need.

This was how I carried it out the first time: the core R aggregate function to calculate the means, sd, n, and se, glued together into a single frame. Not very elegant, but it works.
f<-list(dm$Cat,dm$Month) #the factors to aggregate over
means<-aggregate(dm$Total, f, FUN="mean") #table of means
sd<-aggregate(dm$Total, f, FUN="sd") #standard deviations
n<-aggregate(dm$Total, f, FUN="length") #number of observations
SE<-sd$x/sqrt(n$x) #standard error
colnames(means)<-c("Cat", "Month", "Mean")
means$sd<-sd$x #glue sd, n, and SE onto the means frame
means$n<-n$x
means$SE<-SE
means
## Cat Month Mean sd n SE
## 1 II 01_Jan 0.6736429 0.08647259 39 0.013846696
## 2 RD 01_Jan 0.7771440 0.10990142 25 0.021980284
## 3 II 02_Feb 0.6351750 0.06351079 24 0.012964085
## 4 RD 02_Feb 0.7252933 0.13957327 15 0.036037662
## 5 II 03_Mar 0.6097871 0.07909173 31 0.014205295
## 6 RD 03_Mar 0.7215545 0.12718169 33 0.022139490
## 7 II 04_Apr 0.6534290 0.08082444 31 0.014516498
## 8 RD 04_Apr 0.7533683 0.11394962 41 0.017795941
## 9 II 09_Sep 0.7001296 0.05571264 27 0.010721903
## 10 RD 09_Sep 0.7875217 0.08504029 68 0.010312650
## 11 II 10_Oct 0.6659231 0.08071443 45 0.012032196
## 12 RD 10_Oct 0.7410893 0.07214378 63 0.009089262
## 13 II 11_Nov 0.6784375 0.07653876 10 0.024203680
## 14 RD 11_Nov 0.7417468 0.03186927 4 0.015934637
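Since the raw print isn't pretty, the same kable() trick used later in this notebook cleans this table up:

# round everything to 3 digits and render a clean markdown table
kable(means, digits = 3)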
It's probably easier to use the summarySE function. This function is part of the Rmisc library and is well suited to generating quick tables when the data are properly structured in long format.
First a table that just calculates the means for each category set.
dt<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dt, digits=3)#simple way to round
| Cat | N | Total | sd | se | ci |
|---|---|---|---|---|---|
| II | 207 | 0.659 | 0.080 | 0.006 | 0.011 |
| RD | 249 | 0.756 | 0.102 | 0.006 | 0.013 |
This table breaks the data into category and month
dtM<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat","Month"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dtM, digits=3)
| Cat | Month | N | Total | sd | se | ci |
|---|---|---|---|---|---|---|
| II | 01_Jan | 39 | 0.674 | 0.086 | 0.014 | 0.028 |
| II | 02_Feb | 24 | 0.635 | 0.064 | 0.013 | 0.027 |
| II | 03_Mar | 31 | 0.610 | 0.079 | 0.014 | 0.029 |
| II | 04_Apr | 31 | 0.653 | 0.081 | 0.015 | 0.030 |
| II | 09_Sep | 27 | 0.700 | 0.056 | 0.011 | 0.022 |
| II | 10_Oct | 45 | 0.666 | 0.081 | 0.012 | 0.024 |
| II | 11_Nov | 10 | 0.678 | 0.077 | 0.024 | 0.055 |
| RD | 01_Jan | 25 | 0.777 | 0.110 | 0.022 | 0.045 |
| RD | 02_Feb | 15 | 0.725 | 0.140 | 0.036 | 0.077 |
| RD | 03_Mar | 33 | 0.722 | 0.127 | 0.022 | 0.045 |
| RD | 04_Apr | 41 | 0.753 | 0.114 | 0.018 | 0.036 |
| RD | 09_Sep | 68 | 0.788 | 0.085 | 0.010 | 0.021 |
| RD | 10_Oct | 63 | 0.741 | 0.072 | 0.009 | 0.018 |
| RD | 11_Nov | 4 | 0.742 | 0.032 | 0.016 | 0.051 |
The box plot is pretty straightforward: I use the data frame that I created with the subset and plot performance by month. I prefer to use ColorBrewer to create interesting palettes. This one is called Paired; the more groups you have, the more paired hues it gives you (with two groups, you get two shades of blue). I also like the classic theme.
ggplot(dm,aes(x=Month, y=Total, fill=Cat))+
geom_boxplot() +
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The box plot above is a very good visualization of the overall data, but it doesn't always work if you want to do means testing with t-tests or ANOVAs later. So I'm going to create a bar plot. This is where the aggregate data frame that I created above comes in handy: it's much easier to do the bar plot on a table of means. In particular, this makes it really easy to create the error bars.

The first plot was created with my 'old-fashioned' way of calculating the table of means. The second uses the summarySE function. They are the same, so we'll just use the summarySE version from now on.
ggplot(means,aes(x=Month, y=Mean, fill=Cat))+
geom_bar(position = "dodge",color="black", stat="identity")+
geom_errorbar(aes(ymin = Mean-SE, ymax = Mean+SE), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
ggplot(dtM,aes(x=Month, y=Total, fill=Cat))+
geom_bar(position = "dodge",color="black", stat="identity")+
geom_errorbar(aes(ymin = Total-se, ymax = Total+se), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The second set of analyses looks at total performance on the two category sets by time of day. The data file already groups testing times into Morning, Early Afternoon, and Late Afternoon, so we just need to use that column to create the data frame.

The old way: again using the subset without the May data, I create a factor list, this time with category (Cat) and time (Time2). Time2 is used here because it bins the actual testing time into the three larger time frames (this was done by the original experiment program). Once I have the factor list, I aggregate to create a data frame called meansT (for time), along with the standard deviation and the number of observations, and I calculate the standard error. I then glue them together into one data frame that I can plot and analyse.
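Here is a sketch of that old-way recipe, mirroring the aggregate code used for Month above:

fT<-list(dm$Cat,dm$Time2) #factors: category and binned time of day
meansT<-aggregate(dm$Total, fT, FUN="mean")
sdT<-aggregate(dm$Total, fT, FUN="sd")
nT<-aggregate(dm$Total, fT, FUN="length")
colnames(meansT)<-c("Cat", "Time2", "Mean")
meansT$sd<-sdT$x
meansT$n<-nT$x
meansT$SE<-sdT$x/sqrt(nT$x) #standard error

The new way: use summarySE to generate the same data frame.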
dtT<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat","Time2"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dtT, digits=3)
The box plot shows the median and the range of performance at each of the three time points, divided by category set. Not surprisingly, performance is always better on the rule-described category sets. And because we're looking at the same data, just grouped by time instead of by month, there are outliers in this data as well.
ggplot(dm,aes(x=Time2, y=Total, fill=Cat))+
geom_boxplot() +
ggtitle("Performance by Time of Day") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The bar plot visualizes the data by means and standard error, rather than by median and quartile. Performance is always higher on the rule-described category sets, and there doesn't seem to be much variation with time of day.
ggplot(dtT,aes(x=Time2, y=Total, fill=Cat))+
geom_bar(position = "dodge", color="black", stat="identity")+
geom_errorbar(aes(ymin = Total-se, ymax = Total+se), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Time") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The main part of the second challenge was to do some means testing. Let's test the means for Total as the DV with Category and Month as the IVs. We'll just use the ez package for now, though aov is OK here too (we'll find out why and why not).
To set up a between subjects ANOVA using ez, specify the data file, the DV, the error term (wid) and the factors. We are using type 3 SS to align with SPSS.
a<-ezANOVA(dm, #the data file
dv=.(Total), #DV
wid=.(UniqueSubjNum), #subject as within ID means subject is the error term
between=.(Cat,Month), #Cat and Month as IVs
detailed=TRUE,
return_aov = FALSE,
type=3)
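Since the text above says aov is OK here too, it's worth seeing why the choice matters: plain aov() gives sequential (type 1) sums of squares, so to truly match the type 3 output you have to go through car::Anova with sum-to-zero contrasts. A sketch, assuming the same column names in dm:

# aov() route: simple, but the SS are sequential (type 1)
summary(aov(Total ~ Cat * Month, data = dm))

# type 3 route via car (the package ez calls internally); type 3 SS are
# only meaningful with sum-to-zero contrasts
library(car)
dm$Cat <- factor(dm$Cat)
dm$Month <- factor(dm$Month)
m <- lm(Total ~ Cat * Month, data = dm,
        contrasts = list(Cat = contr.sum, Month = contr.sum))
Anova(m, type = 3)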
The ANOVA can be displayed in a few ways; the first is the simplest. Just create a table using kable for the ANOVA portion of the object (a) that you put the ANOVA into. ezANOVA gives you both the ANOVA and the Levene's test, and each one can be accessed separately.
a$ANOVA
## Effect DFn DFd SSn SSd F p
## 1 (Intercept) 1 442 132.74265483 3.590508 1.634093e+04 0.000000e+00
## 2 Cat 1 442 0.54351095 3.590508 6.690747e+01 3.051176e-15
## 3 Month 6 442 0.26327746 3.590508 5.401679e+00 2.109659e-05
## 4 Cat:Month 6 442 0.01960727 3.590508 4.022834e-01 8.775290e-01
## p<.05 ges
## 1 * 0.973663721
## 2 * 0.131472767
## 3 * 0.068316579
## 4 0.005431202
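The Levene's test mentioned above sits in its own list element; assuming the (long) name ez gives it for between-subjects designs, it can be pulled out the same way:

# homogeneity-of-variance check returned alongside the ANOVA table
a$`Levene's Test for Homogeneity of Variance`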
kable(a$ANOVA, digits = 4)
| Effect | DFn | DFd | SSn | SSd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 1 | 442 | 132.7427 | 3.5905 | 16340.9325 | 0.0000 | * | 0.9737 |
| Cat | 1 | 442 | 0.5435 | 3.5905 | 66.9075 | 0.0000 | * | 0.1315 |
| Month | 6 | 442 | 0.2633 | 3.5905 | 5.4017 | 0.0000 | * | 0.0683 |
| Cat:Month | 6 | 442 | 0.0196 | 3.5905 | 0.4023 | 0.8775 |  | 0.0054 |
Here's one way to get the full table in APA style.
apa.ezANOVA.table(a)
##
##
## ANOVA results
##
##
## Predictor df_num df_den SS_num SS_den F p ges
## (Intercept) 1 442 132.74 3.59 16340.93 .000 .97
## Cat 1 442 0.54 3.59 66.91 .000 .13
## Month 6 442 0.26 3.59 5.40 .000 .07
## Cat x Month 6 442 0.02 3.59 0.40 .878 .01
##
## Note. df_num indicates degrees of freedom numerator. df_den indicates degrees of freedom denominator.
## SS_num indicates sum of squares numerator. SS_den indicates sum of squares denominator.
## ges indicates generalized eta-squared.
##
Here's another way, with a bit more information in the output.
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.008123322 1 442 16340.93 0.000 0.97 0.97
## 2 Cat 0.008123322 1 442 66.91 0.000 0.13 0.13
## 3 Month 0.008123322 6 442 5.40 0.000 0.07 0.07
## 4 Cat:Month 0.008123322 6 442 0.40 0.878 0.01 0.01
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,442) = 16340.93, p < .001, np2 = .97
## 2 Cat F(1,442) = 66.91, p < .001, np2 = .13
## 3 Month F(6,442) = 5.40, p < .001, np2 = .07
## 4 Cat:Month F(6,442) = 0.40, p = .878, np2 = .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
Create a subset and do the one-way ANOVA. This isn't actually needed, BTW, since there is no interaction term; we're just looking at how to do the ANOVA…
dmRD<-subset(dm,dm$Cat=="RD")
a<-ezANOVA(dmRD,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Month),
detailed=TRUE,
return_aov = FALSE,
type=3)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00995814 1 242 6257.50 0.000 0.96 0.96
## 2 Month 0.00995814 6 242 2.46 0.025 0.06 0.06
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,242) = 6257.50, p < .001, np2 = .96
## 2 Month F(6,242) = 2.46, p = .025, np2 = .06
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
Same as above, but for the II category set.
dmII<-subset(dm,dm$Cat=="II")
a<-ezANOVA(dmII,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Month),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.005903192 1 200 12402.99 0.000 0.98 0.98
## 2 Month 0.005903192 6 200 4.22 0.001 0.11 0.11
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,200) = 12402.99, p < .001, np2 = .98
## 2 Month F(6,200) = 4.22, p = .001, np2 = .11
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
## ANOVA on Total with Cat and Time

This is an ANOVA on Total with Cat and Time2 as the factors. This goes along with the bar plot above.
a<-ezANOVA(dm,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Cat,Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00861368 1 450 25502.38 0.000 0.98 0.98
## 2 Cat 0.00861368 1 450 120.09 0.000 0.21 0.21
## 3 Time2 0.00861368 2 450 0.31 0.734 0.00 0.00
## 4 Cat:Time2 0.00861368 2 450 0.34 0.714 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,450) = 25502.38, p < .001, np2 = .98
## 2 Cat F(1,450) = 120.09, p < .001, np2 = .21
## 3 Time2 F(2,450) = 0.31, p = .734, np2 < .01
## 4 Cat:Time2 F(2,450) = 0.34, p = .714, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
The same one-way breakdown by Time2, first for the RD subset.
a<-ezANOVA(dmRD,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.01035358 1 246 13701.98 0.000 0.98 0.98
## 2 Time2 0.01035358 2 246 0.48 0.617 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,246) = 13701.98, p < .001, np2 = .98
## 2 Time2 F(2,246) = 0.48, p = .617, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
And the same for the II subset.
a<-ezANOVA(dmII,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00651557 1 204 13104.59 0.000 0.98 0.98
## 2 Time2 0.00651557 2 204 0.08 0.922 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,204) = 13104.59, p < .001, np2 = .98
## 2 Time2 F(2,204) = 0.08, p = .922, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
## Challenge 3

As described at the top, Challenge 3 is to create an entirely new data frame in long format: a single column (“Performance”) with each subject's performance at each block, a column (“Block”) with the numbers 1 through 4, and columns for Subject, Category, Month, and Time. The resulting data file will be 4 times taller than the original set.
Create the new data frame from the existing data file. We do that by creating a variable that has the name we want for each column and putting the appropriate values into it. We'll do Block separately.
Subj<-dm$UniqueSubjNum
Cat<-dm$Cat
Month<-dm$Month
Time<-dm$Time2
Block1<-dm$`1_Block`
Block2<-dm$`2_Block`
Block3<-dm$`3_Block`
Block4<-dm$`4_Block`
dm_wide <-data.frame(Subj, Cat, Month, Time, Block1, Block2, Block3, Block4)
Next we need to make this four times taller. There are a few ways to do this; here I use gather from the tidyr package.
library(tidyr) #for easy data manipulations
dm_wide<- gather(dm_wide, Block, Perf, Block1:Block4, factor_key=TRUE)
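As an aside, newer versions of tidyr (1.0.0+) do the same reshape with pivot_longer(). A sketch, assuming dm_wide still holds the wide data from before the gather() call above:

# equivalent reshape with the newer tidyr verb; names_to makes the Block
# column, values_to makes the Perf column
dm_long <- pivot_longer(dm_wide, cols = Block1:Block4,
                        names_to = "Block", values_to = "Perf")
dm_long$Block <- factor(dm_long$Block) #match gather's factor_key = TRUE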
Now let's plot the learning curves. First, make a table.
dt_B<-summarySE(data = dm_wide, measurevar="Perf", groupvars = c("Cat", "Block"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dt_B, digits=3)#simple way to round
| Cat | Block | N | Perf | sd | se | ci |
|---|---|---|---|---|---|---|
| II | Block1 | 207 | 0.615 | 0.074 | 0.005 | 0.010 |
| II | Block2 | 207 | 0.659 | 0.092 | 0.006 | 0.013 |
| II | Block3 | 207 | 0.678 | 0.103 | 0.007 | 0.014 |
| II | Block4 | 207 | 0.682 | 0.114 | 0.008 | 0.016 |
| RD | Block1 | 249 | 0.667 | 0.120 | 0.008 | 0.015 |
| RD | Block2 | 249 | 0.775 | 0.129 | 0.008 | 0.016 |
| RD | Block3 | 249 | 0.788 | 0.125 | 0.008 | 0.016 |
| RD | Block4 | 249 | 0.794 | 0.131 | 0.008 | 0.016 |
ggplot(dt_B,aes(x=Block, y=Perf, group=Cat, colour=Cat))+
geom_line(aes(linetype=Cat))+
geom_point(aes(shape=Cat))+
geom_errorbar(aes(ymin = Perf-se, ymax = Perf+se), width = 0.1)+
ggtitle("Performance by Block") +
scale_colour_brewer(palette="Paired")+
labs(colour = "Category", linetype = "Category", shape = "Category")+
theme_classic()
To set up a mixed ANOVA using ez, specify the data file, the DV, the error term (wid), the between-subjects factor (Cat), and the within-subjects factor (Block). We are still using type 3 SS to align with SPSS.
a<-ezANOVA(dm_wide, #the data file
dv=.(Perf), #DV
wid=.(Subj), #subject as within ID means subject is the error term
between=.(Cat), #Cat as the between-subjects IV
within=.(Block),
detailed=TRUE,
return_aov = FALSE,
type=3)
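Because Block is a within-subjects factor, ezANOVA also returns sphericity diagnostics. Assuming the element names ez uses, they can be pulled out of the result directly:

# Mauchly's test plus the Greenhouse-Geisser / Huynh-Feldt corrections
a$`Mauchly's Test for Sphericity`
a$`Sphericity Corrections`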
apa.ezANOVA.table(a)
##
##
## ANOVA results
##
##
## Predictor df_num df_den Epsilon SS_num SS_den F p ges
## (Intercept) 1.00 454.00 904.57 17.42 23573.74 .000 .97
## Cat 1.00 454.00 4.28 17.42 111.47 .000 .15
## Block 2.49 1132.66 0.83 2.72 6.10 202.24 .000 .10
## Cat x Block 2.49 1132.66 0.83 0.31 6.10 23.19 .000 .01
##
## Note. df_num indicates degrees of freedom numerator. df_den indicates degrees of freedom denominator.
## Epsilon indicates Greenhouse-Geisser multiplier for degrees of freedom,
## p-values and degrees of freedom in the table incorporate this correction.
## SS_num indicates sum of squares numerator. SS_den indicates sum of squares denominator.
## ges indicates generalized eta-squared.
##