You may need to install these packages before loading them:
library(ggplot2)
library(dplyr)
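If you are not sure whether the packages are already installed, a minimal sketch like the one below can check first. The helper name `missing_packages` is made up for this illustration; `requireNamespace` and `install.packages` are standard R functions.

```r
# Hypothetical helper: returns the names of packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install them (needs an internet connection)
  # install.packages(to_install)
}
```

Once nothing is reported as missing, the two `library()` calls above should load without error.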
Download the data file, woodward_data.csv, from this week’s Learn resource. Then create an R script in the same directory as the data file and read the data in using the code below. Process and graph the data by following along with this guide.
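# stringsAsFactors = TRUE makes summary() count the levels of the text columns, as in the output below (R >= 4.0 reads them as plain character by default)
woodward_data = read.csv("woodward_data.csv", stringsAsFactors = TRUE)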
summary(woodward_data)
## n TestEvent Familiarization LT
## Min. : 1.00 NewGoal :96 Hand:96 Min. : 0.03849
## 1st Qu.: 8.75 NewLocation:96 Rod :96 1st Qu.: 5.23771
## Median :16.50 Median : 9.07272
## Mean :16.50 Mean :11.52830
## 3rd Qu.:24.25 3rd Qu.:15.83204
## Max. :32.00 Max. :30.00000
## TrialNumber
## Min. :1
## 1st Qu.:1
## Median :2
## Mean :2
## 3rd Qu.:3
## Max. :3
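The summary above counts the levels of TestEvent and Familiarization, which only happens when those columns are factors. As a rough sketch of the difference, the example below reads a tiny made-up CSV (not the real data) both ways. The file contents and the variable names `as_text` and `as_factor` are invented for the illustration.

```r
# Write a tiny made-up CSV to a temporary file (not the real data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("n,Familiarization,LT",
             "1,Hand,5.2",
             "1,Rod,9.1",
             "2,Hand,30"), tmp)

as_text   <- read.csv(tmp)                           # default behaviour
as_factor <- read.csv(tmp, stringsAsFactors = TRUE)  # text columns become factors

class(as_text$Familiarization)    # "character" in R >= 4.0
class(as_factor$Familiarization)  # "factor"
summary(as_factor)                # counts each level of Familiarization
```

If your own summary shows "Class :character" instead of level counts, this is the likely reason.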
Here, the different columns mean:
n = subject number (i.e., which participant is which)
Create a histogram of looking times:
hist(woodward_data$PutTheRightColumnNameHere)
You can see that the data are positively skewed: most looking times are fairly short, with a tail reaching up to the maximum of 30.
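One quick numeric check for this kind of skew is to compare the mean with the median: a long right tail pulls the mean above the median. The sketch below uses made-up numbers, and the vector name `toy_lt` is invented for the example.

```r
# Made-up looking times: mostly short, with one very long look
toy_lt <- c(1, 2, 2, 3, 3, 3, 4, 5, 8, 30)

hist(toy_lt)     # the single long look forms the right tail

mean(toy_lt)     # 6.1
median(toy_lt)   # 3
```

The same comparison can be read off the summary() output for the real data, where the mean of LT is larger than its median.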
The graphs that you see in papers typically display by-subject averages. That means you first compute each subject's mean in each condition, and then compute the overall mean in each condition across subjects. Remember that each subject here did 3 trials per condition, so the first step is to average across those three trials for each subject in each condition.
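As a small illustration of this two-step averaging, here is a sketch on made-up numbers. It uses base R's `aggregate` so that it runs without any packages, and the column names `subject`, `condition` and `score` are invented for the example.

```r
# Made-up data: 2 subjects, 2 conditions, 3 trials per subject per condition
toy <- data.frame(
  subject   = rep(c(1, 2), each = 6),
  condition = rep(rep(c("A", "B"), each = 3), times = 2),
  score     = c(2, 4, 6,     # subject 1, condition A
                1, 2, 3,     # subject 1, condition B
                10, 20, 30,  # subject 2, condition A
                5, 5, 5)     # subject 2, condition B
)

# Step 1: one mean per subject per condition (averaging over the 3 trials)
subject_means <- aggregate(score ~ subject + condition, data = toy, FUN = mean)

# Step 2: one mean per condition (averaging over subjects)
condition_means <- aggregate(score ~ condition, data = subject_means, FUN = mean)
condition_means
```

Here the subject means are 4 and 20 in condition A and 2 and 5 in condition B, so the condition means come out as 12 and 3.5.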
To do that, we will use some dplyr functions that make grouping and averaging easy.
subject_average = woodward_data %>%
  group_by(COLUMN_FOR_SUBJECT_NUMBER, TEST_EVENT_COLUMN, FAMILIARIZATION_COLUMN) %>% #<-- FILL THIS IN
  summarise(LT.mean = FUNCTION_FOR_MEAN(LT))
summary(subject_average)
hist(subject_average$COLUMN_FOR_LOOKING_TIME_MEAN)
You can see three interesting things here.
The %>% operator takes the woodward_data variable and passes it to the function on the next line, which is called group_by.
Your result should look like this:
## n TestEvent Familiarization LT.mean
## Min. : 1.00 NewGoal :32 Hand:32 Min. : 1.057
## 1st Qu.: 8.75 NewLocation:32 Rod :32 1st Qu.: 5.222
## Median :16.50 Median : 8.590
## Mean :16.50 Mean :11.528
## 3rd Qu.:24.25 3rd Qu.:16.147
## Max. :32.00 Max. :30.000
## # A tibble: 64 x 4
## # Groups: n, TestEvent [?]
## n TestEvent Familiarization LT.mean
## <int> <fct> <fct> <dbl>
## 1 1 NewGoal Rod 6.03
## 2 1 NewLocation Rod 4.33
## 3 2 NewGoal Rod 30
## 4 2 NewLocation Rod 30
## 5 3 NewGoal Rod 8.09
## 6 3 NewLocation Rod 6.38
## 7 4 NewGoal Rod 11.2
## 8 4 NewLocation Rod 7.01
## 9 5 NewGoal Rod 13.7
## 10 5 NewLocation Rod 12.8
## # ... with 54 more rows
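To make the role of the %>% operator more concrete, the sketch below runs the same grouping and averaging twice on a made-up data frame: once with pipes and once as nested function calls. The data frame `toy` and its columns are invented for the example, and it assumes dplyr is installed.

```r
library(dplyr)

# Made-up data: 2 subjects with 2 trials each
toy <- data.frame(subject = c(1, 1, 2, 2),
                  LT      = c(2, 4, 10, 20))

# Piped: read top to bottom, each result is passed to the next function
piped <- toy %>%
  group_by(subject) %>%
  summarise(LT.mean = mean(LT))

# Nested: the same computation, read from the inside out
nested <- summarise(group_by(toy, subject), LT.mean = mean(LT))

piped$LT.mean    # 3 15
nested$LT.mean   # 3 15
```

Both versions give the same result; the piped form simply keeps the steps in the order they happen.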
Now, let’s get the average for each condition:
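condition_average = subject_average %>%
  group_by(FAMILIARIZATION_COLUMN, TEST_EVENT_COLUMN) %>% #<-- FILL THIS IN
  # Store the mean under a new name: if it overwrote LT.mean, the standard deviation on the next line would be computed from that single averaged value and come out as NA
  summarise(LT.Grand.Mean = FUNCTION_FOR_MEAN(LT.mean),
            LT.sd = FUNCTION_FOR_STANDARD_DEVIATION(LT.mean))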
condition_average
## # A tibble: 4 x 4
## # Groups: Familiarization [?]
## Familiarization TestEvent LT.Grand.Mean LT.sd
## <fct> <fct> <dbl> <dbl>
## 1 Hand NewGoal 9.57 8.79
## 2 Hand NewLocation 12.4 8.41
## 3 Rod NewGoal 13.0 8.03
## 4 Rod NewLocation 11.1 8.37
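Note that the mean column in this output is called LT.Grand.Mean rather than LT.mean. One reason to give it its own name is that summarise() evaluates its arguments in order, so a later argument sees any column an earlier one has just overwritten. The sketch below shows the effect on made-up numbers. The data frame `toy` and the result names `reused` and `renamed` are invented for the example, and it assumes dplyr is installed.

```r
library(dplyr)

# Made-up by-subject means: 3 subjects in each of 2 conditions
toy <- data.frame(condition = rep(c("A", "B"), each = 3),
                  LT.mean   = c(4, 6, 8, 1, 2, 3))

# Reusing the name: sd() sees the single averaged value, not the 3 subject means
reused <- toy %>%
  group_by(condition) %>%
  summarise(LT.mean = mean(LT.mean),
            LT.sd   = sd(LT.mean))

# Using a new name: sd() still sees the original 3 subject means
renamed <- toy %>%
  group_by(condition) %>%
  summarise(LT.Grand.Mean = mean(LT.mean),
            LT.sd         = sd(LT.mean))

reused$LT.sd    # NA NA
renamed$LT.sd   # 2 1
```

If your LT.sd column is full of NA values, check whether the mean was saved under the same name as the column it was computed from.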
Now let’s graph the data using ggplot2, as last time. Look back to the last exercise to see how to add error bars to your chart.
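ggplot(condition_average, aes(x = WHAT_VARIABLE_ON_X_AXIS,
                              y = WHAT_VARIABLE_ON_Y_AXIS,
                              fill = WHAT_VARIABLE_DECIDES_COLORS_OF_BARS)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # use the same dodge width as the bars so each error bar sits on its own bar
  geom_errorbar(aes(ymin = WHAT_VALUE, ymax = WHAT_VALUE),
                width = 0.2, position = position_dodge(width = 0.9))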
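For reference, here is a self-contained sketch of a dodged bar chart with error bars, built on made-up summary values rather than the real data. The data frame `toy_means` and its columns are invented for the example, and drawing the bars as mean plus or minus one standard deviation is an assumption here; use whichever measure the last exercise asked for. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Made-up condition summaries (not the real results)
toy_means <- data.frame(
  group      = c("G1", "G1", "G2", "G2"),
  event      = c("E1", "E2", "E1", "E2"),
  mean_value = c(10, 12, 13, 11),
  sd_value   = c(2, 3, 2.5, 1.5)
)

# One shared dodge so bars and error bars are shifted by the same amount
dodge <- position_dodge(width = 0.9)

p <- ggplot(toy_means, aes(x = group, y = mean_value, fill = event)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = mean_value - sd_value,
                    ymax = mean_value + sd_value),
                width = 0.2, position = dodge)

print(p)
```

Sharing one `position_dodge()` object between the two layers is a simple way to keep each error bar centred on its bar.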