This is part of an ongoing project to increase skills in data analysis, data visualization, and R coding in my lab. I started this in the summer of 2018 by sending an Excel spreadsheet to my 5 graduate students with a brief description of what it was and how we might break down the data, organize things, visualize the data, and run basic statistical tests. Every few weeks, we work on a new problem, add it to the notebook, and then compare and discuss the solutions in a lab meeting.
The end goal is that each of my students will have their own personal R notebook that they can add to as a repository for their own code chunks, functions, and solutions to common problems.
The full R Notebook and data file can be found on GitHub.
The data we are using were collected in our lab over a number of years. The task itself was a classification learning task in which subjects learned to classify exemplars that were separated into two categories by a rule (RD for Rule Described) or into categories that were not separated by a verbalizable rule (II for Information Integration). See this paper for an example and more detail about this technique. We also collected surveys that asked about participants' behavioural habits, and we recorded things like the time of year and time of day that the experiment was run.
It does not matter too much for the purposes of this notebook, however; I'm just using this as a data set that is typical of what we use in my lab. We are using it as a way to learn how to unpack a large, semi-structured Excel spreadsheet and turn it into usable, reproducible data.
The first R challenge was just to read in the data from the Excel file and create a data frame with Total as the DV (the total proportion correct for each subject) and Category (II/RD), Month (when they were tested), and Time (what time of day they were tested) as independent variables. From this data frame, we should be able to obtain the summary stats broken down by the independent variables, along with data visualizations.
## Challenge 2

Include an additional visualization that you did not use before, and then do means testing with ANOVA for the Category X Month breakdown and the Category X Time breakdown.
Challenge three will be to create an entirely new data frame in long format. This will use the numbers in the “by block” columns as the DV. But before we can analyze by block, we need a single column (“Performance”) with each subject's performance at a given block, and another column (“Block”) with the numbers 1, 2, 3, and 4 for each block. This means that the data file will be 4 times taller than the original set. The resulting file should have a column for Subject, Category, Month, Time, Block, and Performance.
variableX<-2+3
variableX
## [1] 5
The first step to this analysis is to load the necessary libraries. Just uncomment these if they are not already installed.
# install.packages('ggplot2')
# install.packages('readxl') #for reading
# install.packages('ez') #for ANOVA
# install.packages('apaTables') #for formatting anova in APA format
# install.packages('RColorBrewer') #for accessible plots
# install.packages('schoRsch') #for formatting anova in APA format
#install.packages('Rmisc') #quick and easy summarizing
#install.packages('summarytools')
library(ggplot2) #for plotting
library(readxl) #reading in excel docs
library(ez) #this package calls the car package and runs basic ANOVAs and other stats
library(apaTables)
library(RColorBrewer)
library(schoRsch)
library(Rmisc)
library(summarytools)
library(knitr)
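If you'd rather not manage the install comments by hand, a small helper can install anything that's missing before loading. A minimal sketch, using the same package list as above:

pkgs <- c("ggplot2", "readxl", "ez", "apaTables", "RColorBrewer",
          "schoRsch", "Rmisc", "summarytools", "knitr")
need <- pkgs[!pkgs %in% rownames(installed.packages())] #which are missing?
if (length(need) > 0) install.packages(need)
invisible(lapply(pkgs, library, character.only = TRUE)) #load everything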
The second step is to load the data file into a data frame. At the same time, we want to create a data frame with the means, SD, and SE for summarizing things and for barplots. I dropped the data from the month of May because I know that no one learned the rule-described category set. This wasn't made clear in the original instructions for the R challenge, so I'm probably the only one who did this.

There are a number of ways to read in data files. The simplest is to read in a pure text file, but my preferred method is to use the 'readxl' package to import the data file directly from Excel. I then create a subset of the original data without the data from the month of May.
ModifiedFullData <- read_excel("ModifiedFullData.xlsx")
#we drop the May data, because there are not many people, no one learned the
#RD set, and it's a different sample, recruited in a different way.
#the full data that will be reported in the paper will not include these data
dm<-subset(ModifiedFullData,Month!="05_May")
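For comparison, the plain-text route mentioned above would look something like this (the .csv file name is hypothetical, since our data live in Excel):

# hypothetical plain-text equivalent of the readxl import
ModifiedFullData <- read.csv("ModifiedFullData.csv", stringsAsFactors = FALSE)
dm <- subset(ModifiedFullData, Month != "05_May")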
You can use summarytools to generate a quick and nice-looking handout. This is one way to look at all the data at one time. It does not leave you with an object to work with, though. One caveat: this won't actually show up in the notebook, so I'm going to keep it commented for now.
#view(dfSummary(dm))
#dfSummary(dm)
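If you do want the summary to appear inside the knitted document, summarytools can print through its render method (this assumes a reasonably recent version of summarytools and a chunk set to results='asis'):

# renders the data frame summary as HTML in the knitted output
print(dfSummary(dm), method = "render")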
The first R challenge was about reading in the data, visualizing it, and creating a table of means. In this case, we're going to be looking at the column of data that corresponds to total performance, broken down by category set.
The first analysis is an examination of total performance by Month. The following tables and box plots show the mean/median performance on each kind of category set (RD = Rule Described, II = Information Integration). The general trend, clearly evident in the data, is an effect of category set on performance: participants perform much better on the RD categories relative to the II categories. Also evident in the data is some variability by month. A visual inspection of the data reveals several outliers, but no clear interaction between category set and month.
I create a list of the factors that I want to average across, in this case category and month. I use the aggregate function from core R to create a table of means, a table of standard deviations, and a count of observations, and then I calculate the standard error. I glue these together into an overall data frame called means. I can then call the means data frame to see all of the numbers. I should probably format this better, but for now it gives me everything I need.

This was how I carried it out the first time: the core R aggregate function to calculate the means, sd, n, and se, glued together into a single frame. Not very elegant, but it works.
f<-list(dm$Cat,dm$Month) #the factors to aggregate over
means<-aggregate(dm$Total, f, FUN="mean") #table of means
sd<-aggregate(dm$Total, f, FUN="sd") #standard deviations
n<-aggregate(dm$Total, f, FUN="length") #number of observations
SE<-sd$x/sqrt(n$x) #standard error
colnames(means)<-c("Cat", "Month", "Mean")
means$sd<-sd$x #glue sd, n, and SE onto the means frame
means$n<-n$x
means$SE<-SE
means
## Cat Month Mean sd n SE
## 1 II 01_Jan 0.6736429 0.08647259 39 0.013846696
## 2 RD 01_Jan 0.7771440 0.10990142 25 0.021980284
## 3 II 02_Feb 0.6351750 0.06351079 24 0.012964085
## 4 RD 02_Feb 0.7252933 0.13957327 15 0.036037662
## 5 II 03_Mar 0.6097871 0.07909173 31 0.014205295
## 6 RD 03_Mar 0.7215545 0.12718169 33 0.022139490
## 7 II 04_Apr 0.6534290 0.08082444 31 0.014516498
## 8 RD 04_Apr 0.7533683 0.11394962 41 0.017795941
## 9 II 09_Sep 0.7001296 0.05571264 27 0.010721903
## 10 RD 09_Sep 0.7875217 0.08504029 68 0.010312650
## 11 II 10_Oct 0.6659231 0.08071443 45 0.012032196
## 12 RD 10_Oct 0.7410893 0.07214378 63 0.009089262
## 13 II 11_Nov 0.6784375 0.07653876 10 0.024203680
## 14 RD 11_Nov 0.7417468 0.03186927 4 0.015934637
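Since the raw print isn't pretty, the same kable() trick used later in this notebook cleans this table up:

# round everything to 3 digits and render a clean markdown table
kable(means, digits = 3)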
It's probably easier to use the summarySE function. This function is part of the Rmisc library and is well suited to generating quick tables when the data are properly structured in long format.
First a table that just calculates the means for each category set.
dt<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dt, digits=3)#simple way to round
| Cat | N | Total | sd | se | ci |
|---|---|---|---|---|---|
| II | 207 | 0.659 | 0.080 | 0.006 | 0.011 |
| RD | 249 | 0.756 | 0.102 | 0.006 | 0.013 |
This table breaks the data into category and month
dtM<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat","Month"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dtM, digits=3)
| Cat | Month | N | Total | sd | se | ci |
|---|---|---|---|---|---|---|
| II | 01_Jan | 39 | 0.674 | 0.086 | 0.014 | 0.028 |
| II | 02_Feb | 24 | 0.635 | 0.064 | 0.013 | 0.027 |
| II | 03_Mar | 31 | 0.610 | 0.079 | 0.014 | 0.029 |
| II | 04_Apr | 31 | 0.653 | 0.081 | 0.015 | 0.030 |
| II | 09_Sep | 27 | 0.700 | 0.056 | 0.011 | 0.022 |
| II | 10_Oct | 45 | 0.666 | 0.081 | 0.012 | 0.024 |
| II | 11_Nov | 10 | 0.678 | 0.077 | 0.024 | 0.055 |
| RD | 01_Jan | 25 | 0.777 | 0.110 | 0.022 | 0.045 |
| RD | 02_Feb | 15 | 0.725 | 0.140 | 0.036 | 0.077 |
| RD | 03_Mar | 33 | 0.722 | 0.127 | 0.022 | 0.045 |
| RD | 04_Apr | 41 | 0.753 | 0.114 | 0.018 | 0.036 |
| RD | 09_Sep | 68 | 0.788 | 0.085 | 0.010 | 0.021 |
| RD | 10_Oct | 63 | 0.741 | 0.072 | 0.009 | 0.018 |
| RD | 11_Nov | 4 | 0.742 | 0.032 | 0.016 | 0.051 |
The box plot is pretty straightforward: I use the data frame that I created with the subset and plot performance by month. I prefer to use ColorBrewer to create interesting palettes. This one is called Paired; the more groups you have, the more paired hues it gives you (with two groups, you get two shades of blue). I also like the classic theme.
ggplot(dm,aes(x=Month, y=Total, fill=Cat))+
geom_boxplot() +
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The box plot above is a very good visualization of the overall data, but it doesn't always work if you want to do means testing with t-tests or ANOVAs later. So I'm going to create a bar plot. This is where the aggregate data frame that I created above comes in handy: it's much easier to do the bar plot on a table of means. In particular, this makes it really easy to create the error bars.

The first plot was created with my 'old-fashioned' way of calculating the table of means. The second uses the summarySE function. They are the same, so we'll just use the summarySE version from now on.
ggplot(means,aes(x=Month, y=Mean, fill=Cat))+
geom_bar(position = "dodge",color="black", stat="identity")+
geom_errorbar(aes(ymin = Mean-SE, ymax = Mean+SE), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
ggplot(dtM,aes(x=Month, y=Total, fill=Cat))+
geom_bar(position = "dodge",color="black", stat="identity")+
geom_errorbar(aes(ymin = Total-se, ymax = Total+se), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Month") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The second set of analyses looks at total performance on the two category sets by time of day. The data file already groups testing times into Morning, Early Afternoon, and Late Afternoon, so we just need to use that column to create the data frame.

The old way: again using the subset without the May data, I create a factor list, this time with category (Cat) and time (Time2). Time2 is used here because it bins the actual testing time into the three larger time frames (this was done by the original experiment program). Once I have the factor list, I aggregate to create a data frame called meansT (for time), along with the standard deviation and the number of observations, and I calculate the standard error. I then glue them together into one data frame that I can plot and analyse.
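Here is a sketch of that old-way recipe, mirroring the aggregate code used for Month above:

fT<-list(dm$Cat,dm$Time2) #factors: category and binned time of day
meansT<-aggregate(dm$Total, fT, FUN="mean")
sdT<-aggregate(dm$Total, fT, FUN="sd")
nT<-aggregate(dm$Total, fT, FUN="length")
colnames(meansT)<-c("Cat", "Time2", "Mean")
meansT$sd<-sdT$x
meansT$n<-nT$x
meansT$SE<-sdT$x/sqrt(nT$x) #standard error

The new way: use summarySE to generate the same data frame.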
dtT<-summarySE(data = dm, measurevar="Total", groupvars = c("Cat","Time2"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dtT, digits=3)
The box plot shows the median and the range of performance at each of the three time points, divided by category set. Not surprisingly, performance is always better on the rule-described category sets. And because we're looking at the same data, just grouped by time instead of by month, there are outliers in this data as well.
ggplot(dm,aes(x=Time2, y=Total, fill=Cat))+
geom_boxplot() +
ggtitle("Performance by Time of Day") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The bar plot visualizes the data by means and standard error, rather than by median and quartile. Performance is always higher on the rule-described category sets, and there doesn't seem to be much variation with time of day.
ggplot(dtT,aes(x=Time2, y=Total, fill=Cat))+
geom_bar(position = "dodge", color="black", stat="identity")+
geom_errorbar(aes(ymin = Total-se, ymax = Total+se), width = 0.2,
position =position_dodge(.9))+
ggtitle("Performance by Time") +
scale_fill_brewer(palette="Paired", name="Category")+
theme_classic()
The main part of the second challenge was to do some means testing. Let's test the means for Total as the DV with Category and Month as the IVs. We'll just use the ez package for now, though aov is OK here too (we'll find out why and why not).
To set up a between subjects ANOVA using ez, specify the data file, the DV, the error term (wid) and the factors. We are using type 3 SS to align with SPSS.
a<-ezANOVA(dm, #the data file
dv=.(Total), #DV
wid=.(UniqueSubjNum), #subject as within ID means subject is the error term
between=.(Cat,Month), #Cat and Month as IVs
detailed=TRUE,
return_aov = FALSE,
type=3)
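Since the text above says aov is OK here too, it's worth seeing why the choice matters: plain aov() gives sequential (type 1) sums of squares, so to truly match the type 3 output you have to go through car::Anova with sum-to-zero contrasts. A sketch, assuming the same column names in dm:

# aov() route: simple, but the SS are sequential (type 1)
summary(aov(Total ~ Cat * Month, data = dm))

# type 3 route via car (the package ez calls internally); type 3 SS are
# only meaningful with sum-to-zero contrasts
library(car)
dm$Cat <- factor(dm$Cat)
dm$Month <- factor(dm$Month)
m <- lm(Total ~ Cat * Month, data = dm,
        contrasts = list(Cat = contr.sum, Month = contr.sum))
Anova(m, type = 3)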
The ANOVA can be displayed in a few ways; the first is the simplest. Just create a table using kable for the ANOVA portion of the object (a) that you put the ANOVA into. ezANOVA gives you both the ANOVA and the Levene's test, and each one can be accessed separately.
a$ANOVA
## Effect DFn DFd SSn SSd F p
## 1 (Intercept) 1 442 132.74265483 3.590508 1.634093e+04 0.000000e+00
## 2 Cat 1 442 0.54351095 3.590508 6.690747e+01 3.051176e-15
## 3 Month 6 442 0.26327746 3.590508 5.401679e+00 2.109659e-05
## 4 Cat:Month 6 442 0.01960727 3.590508 4.022834e-01 8.775290e-01
## p<.05 ges
## 1 * 0.973663721
## 2 * 0.131472767
## 3 * 0.068316579
## 4 0.005431202
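The Levene's test mentioned above sits in its own list element; assuming the (long) name ez gives it for between-subjects designs, it can be pulled out the same way:

# homogeneity-of-variance check returned alongside the ANOVA table
a$`Levene's Test for Homogeneity of Variance`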
kable(a$ANOVA, digits = 4)
| Effect | DFn | DFd | SSn | SSd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 1 | 442 | 132.7427 | 3.5905 | 16340.9325 | 0.0000 | * | 0.9737 |
| Cat | 1 | 442 | 0.5435 | 3.5905 | 66.9075 | 0.0000 | * | 0.1315 |
| Month | 6 | 442 | 0.2633 | 3.5905 | 5.4017 | 0.0000 | * | 0.0683 |
| Cat:Month | 6 | 442 | 0.0196 | 3.5905 | 0.4023 | 0.8775 |  | 0.0054 |
Here's one way to get the full table in APA style.
apa.ezANOVA.table(a)
##
##
## ANOVA results
##
##
## Predictor df_num df_den SS_num SS_den F p ges
## (Intercept) 1 442 132.74 3.59 16340.93 .000 .97
## Cat 1 442 0.54 3.59 66.91 .000 .13
## Month 6 442 0.26 3.59 5.40 .000 .07
## Cat x Month 6 442 0.02 3.59 0.40 .878 .01
##
## Note. df_num indicates degrees of freedom numerator. df_den indicates degrees of freedom denominator.
## SS_num indicates sum of squares numerator. SS_den indicates sum of squares denominator.
## ges indicates generalized eta-squared.
##
Here's another way, with a bit more information in the output.
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.008123322 1 442 16340.93 0.000 0.97 0.97
## 2 Cat 0.008123322 1 442 66.91 0.000 0.13 0.13
## 3 Month 0.008123322 6 442 5.40 0.000 0.07 0.07
## 4 Cat:Month 0.008123322 6 442 0.40 0.878 0.01 0.01
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,442) = 16340.93, p < .001, np2 = .97
## 2 Cat F(1,442) = 66.91, p < .001, np2 = .13
## 3 Month F(6,442) = 5.40, p < .001, np2 = .07
## 4 Cat:Month F(6,442) = 0.40, p = .878, np2 = .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
Create a subset and do the one-way ANOVA. This isn't actually needed, BTW, since there is no interaction term; we're just looking at how to do the ANOVA…
dmRD<-subset(dm,dm$Cat=="RD")
a<-ezANOVA(dmRD,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Month),
detailed=TRUE,
return_aov = FALSE,
type=3)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00995814 1 242 6257.50 0.000 0.96 0.96
## 2 Month 0.00995814 6 242 2.46 0.025 0.06 0.06
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,242) = 6257.50, p < .001, np2 = .96
## 2 Month F(6,242) = 2.46, p = .025, np2 = .06
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
Same as above, but for the II category set.
dmII<-subset(dm,dm$Cat=="II")
a<-ezANOVA(dmII,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Month),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.005903192 1 200 12402.99 0.000 0.98 0.98
## 2 Month 0.005903192 6 200 4.22 0.001 0.11 0.11
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,200) = 12402.99, p < .001, np2 = .98
## 2 Month F(6,200) = 4.22, p = .001, np2 = .11
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
## ANOVA on Total with Cat and Time

This is an ANOVA on Total with Cat and Time2 as the factors. This goes along with the bar plot above.
a<-ezANOVA(dm,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Cat,Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00861368 1 450 25502.38 0.000 0.98 0.98
## 2 Cat 0.00861368 1 450 120.09 0.000 0.21 0.21
## 3 Time2 0.00861368 2 450 0.31 0.734 0.00 0.00
## 4 Cat:Time2 0.00861368 2 450 0.34 0.714 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,450) = 25502.38, p < .001, np2 = .98
## 2 Cat F(1,450) = 120.09, p < .001, np2 = .21
## 3 Time2 F(2,450) = 0.31, p = .734, np2 < .01
## 4 Cat:Time2 F(2,450) = 0.34, p = .714, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
The same one-way breakdown by Time2, first for the RD subset.
a<-ezANOVA(dmRD,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.01035358 1 246 13701.98 0.000 0.98 0.98
## 2 Time2 0.01035358 2 246 0.48 0.617 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,246) = 13701.98, p < .001, np2 = .98
## 2 Time2 F(2,246) = 0.48, p = .617, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
And the same for the II subset.
a<-ezANOVA(dmII,
dv=.(Total),
wid=.(UniqueSubjNum),
between=.(Time2),
detailed=TRUE,
return_aov = FALSE,
type=3)
#apa.ezANOVA.table(a)
anova_out(a, print = TRUE, sph.cor = "no", mau.p = 0.05,
etasq = "partial", dfsep = ", ")
## $`--- ANOVA RESULTS ------------------------------------`
## Effect MSE df1 df2 F p petasq getasq
## 1 (Intercept) 0.00651557 1 204 13104.59 0.000 0.98 0.98
## 2 Time2 0.00651557 2 204 0.08 0.922 0.00 0.00
##
## $`--- SPHERICITY TESTS ------------------------------------`
## [1] "N/A"
##
## $`--- FORMATTED RESULTS ------------------------------------`
## Effect Text
## 1 (Intercept) F(1,204) = 13104.59, p < .001, np2 = .98
## 2 Time2 F(2,204) = 0.08, p = .922, np2 < .01
##
## $`NOTE:`
## [1] "Reporting unadjusted p-values."
## Challenge 3

As described at the top, Challenge 3 is to create an entirely new data frame in long format: a single column (“Performance”) with each subject's performance at each block, a column (“Block”) with the numbers 1 through 4, and columns for Subject, Category, Month, and Time. The resulting data file will be 4 times taller than the original set.
Create the new data frame from the existing data file. We do that by creating a variable that has the name we want for each column and putting the appropriate values into it. We'll do Block separately.
Subj<-dm$UniqueSubjNum
Cat<-dm$Cat
Month<-dm$Month
Time<-dm$Time2
Block1<-dm$`1_Block`
Block2<-dm$`2_Block`
Block3<-dm$`3_Block`
Block4<-dm$`4_Block`
dm_wide <-data.frame(Subj, Cat, Month, Time, Block1, Block2, Block3, Block4)
Next we need to make this four times taller. There are a few ways to do this; here I use gather from the tidyr package.
library(tidyr) #for easy data manipulations
dm_wide<- gather(dm_wide, Block, Perf, Block1:Block4, factor_key=TRUE)
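As an aside, newer versions of tidyr (1.0.0+) do the same reshape with pivot_longer(). A sketch, assuming dm_wide still holds the wide data from before the gather() call above:

# equivalent reshape with the newer tidyr verb; names_to makes the Block
# column, values_to makes the Perf column
dm_long <- pivot_longer(dm_wide, cols = Block1:Block4,
                        names_to = "Block", values_to = "Perf")
dm_long$Block <- factor(dm_long$Block) #match gather's factor_key = TRUE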
Now let's plot the learning curves. First, make a table.
dt_B<-summarySE(data = dm_wide, measurevar="Perf", groupvars = c("Cat", "Block"),
na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
kable(dt_B, digits=3)#simple way to round
| Cat | Block | N | Perf | sd | se | ci |
|---|---|---|---|---|---|---|
| II | Block1 | 207 | 0.615 | 0.074 | 0.005 | 0.010 |
| II | Block2 | 207 | 0.659 | 0.092 | 0.006 | 0.013 |
| II | Block3 | 207 | 0.678 | 0.103 | 0.007 | 0.014 |
| II | Block4 | 207 | 0.682 | 0.114 | 0.008 | 0.016 |
| RD | Block1 | 249 | 0.667 | 0.120 | 0.008 | 0.015 |
| RD | Block2 | 249 | 0.775 | 0.129 | 0.008 | 0.016 |
| RD | Block3 | 249 | 0.788 | 0.125 | 0.008 | 0.016 |
| RD | Block4 | 249 | 0.794 | 0.131 | 0.008 | 0.016 |
ggplot(dt_B,aes(x=Block, y=Perf, group=Cat, colour=Cat))+
geom_line(aes(linetype=Cat))+
geom_point(aes(shape=Cat))+
geom_errorbar(aes(ymin = Perf-se, ymax = Perf+se), width = 0.1)+
ggtitle("Performance by Block") +
scale_colour_brewer(palette="Paired")+
labs(colour = "Category", linetype = "Category", shape = "Category")+
theme_classic()
To set up a mixed ANOVA using ez, specify the data file, the DV, the error term (wid), the between-subjects factor (Cat), and the within-subjects factor (Block). We are still using type 3 SS to align with SPSS.
a<-ezANOVA(dm_wide, #the data file
dv=.(Perf), #DV
wid=.(Subj), #subject as within ID means subject is the error term
between=.(Cat), #Cat as the between-subjects IV
within=.(Block),
detailed=TRUE,
return_aov = FALSE,
type=3)
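Because Block is a within-subjects factor, ezANOVA also returns sphericity diagnostics. Assuming the element names ez uses, they can be pulled out of the result directly:

# Mauchly's test plus the Greenhouse-Geisser / Huynh-Feldt corrections
a$`Mauchly's Test for Sphericity`
a$`Sphericity Corrections`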
apa.ezANOVA.table(a)
##
##
## ANOVA results
##
##
## Predictor df_num df_den Epsilon SS_num SS_den F p ges
## (Intercept) 1.00 454.00 904.57 17.42 23573.74 .000 .97
## Cat 1.00 454.00 4.28 17.42 111.47 .000 .15
## Block 2.49 1132.66 0.83 2.72 6.10 202.24 .000 .10
## Cat x Block 2.49 1132.66 0.83 0.31 6.10 23.19 .000 .01
##
## Note. df_num indicates degrees of freedom numerator. df_den indicates degrees of freedom denominator.
## Epsilon indicates Greenhouse-Geisser multiplier for degrees of freedom,
## p-values and degrees of freedom in the table incorporate this correction.
## SS_num indicates sum of squares numerator. SS_den indicates sum of squares denominator.
## ges indicates generalized eta-squared.
##