Cast your mind back to the end of Lecture 2, where we talked about Amanda Woodward’s experiments on infants’ perception of goals. Remember that she familiarized infants to a hand or a rod reaching for one of two objects. Then she showed them test trials in which the positions of the objects had been switched. When infants had viewed a hand, they looked longer when that hand went to a new object, even though it traced the same path as before. However, when infants had viewed a rod, they looked longer when that rod went to the old object, tracing a different path than before.
From this, she inferred that infants perceive the hand as having a goal.
You can see examples of the test trials below.
Reaching Hand
Reaching Rod
In this exercise, we will redo Woodward’s analysis, processing the data and then graphing it. Note, though, that we are not using her original data – rather, I simulated some data to match the basic results of her study.
You can download the datafile, called woodward_data.csv, from this week’s Learn resource. Then create an R script in the same directory as your datafile, and read the data in using the code below. Process and graph the data by following along with this guide.
You may need to install these packages before loading them:
library(ggplot2)
library(dplyr)
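# File name as given in the download instructions above.
# stringsAsFactors = TRUE keeps text columns as factors (not the default since R 4.0), so summary() shows counts per level.
woodward_data = read.csv("woodward_data.csv", stringsAsFactors = TRUE)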
summary(woodward_data)
## n TestEvent Familiarization LT
## Min. : 1.00 NewGoal :96 Hand:96 Min. : 1.430
## 1st Qu.: 8.75 NewLocation:96 Rod :96 1st Qu.: 7.149
## Median :16.50 Median :10.671
## Mean :16.50 Mean :11.971
## 3rd Qu.:24.25 3rd Qu.:15.703
## Max. :32.00 Max. :30.000
## TrialNumber
## Min. :1
## 1st Qu.:1
## Median :2
## Mean :2
## 3rd Qu.:3
## Max. :3
Here, the different columns mean:
n = subject number (i.e., which participant is which)
Create a histogram of looking times:
hist(woodward_data$PutTheRightColumnNameHere)
You can see that the data is skewed.
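A quick numerical check for skew is to compare the mean and the median: in the summary above, the mean of LT (11.971) sits above the median (10.671), which is what a long right tail produces. The sketch below shows the same pattern on a small made-up vector; `toy_times` is a hypothetical name and these numbers are not from the Woodward datafile.

```r
# Hypothetical looking times with a few very long looks (not the real data)
toy_times <- c(2, 3, 3, 4, 5, 6, 8, 12, 20, 30)

toy_mean   <- mean(toy_times)    # pulled upwards by the long looks
toy_median <- median(toy_times)  # much less affected by them

# A mean well above the median suggests a right (positive) skew
toy_mean > toy_median

# The histogram shows most values bunched on the left with a tail to the right
hist(toy_times)
```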
The graphs that you see in papers typically display by-subject averages. That means that you first compute each subject’s mean result in each condition, and then compute the overall mean in each condition across subjects. Remember that each subject here did 3 trials per condition, so the first step is to average across those three trials, separately for each subject and condition.
To do that, we will use some dplyr functions that make grouping and averaging easy.
subject_average = woodward_data %>%
  group_by(COLUMN_FOR_SUBJECT_NUMBER, TEST_EVENT_COLUMN, FAMILIARIZATION_COLUMN) %>% #<-- FILL THIS IN
  summarise(LT.mean = FUNCTION_FOR_MEAN(LT))
summary(subject_average)
hist(subject_average$COLUMN_FOR_LOOKING_TIME_MEAN)
You can see three interesting things here.
The %>% operator takes the woodward_data variable and passes it to the function on the next line, which is called group_by.
Your result should look like this:
## n TestEvent Familiarization LT.mean
## Min. : 1.00 NewGoal :32 Hand:32 Min. : 2.008
## 1st Qu.: 8.75 NewLocation:32 Rod :32 1st Qu.: 6.919
## Median :16.50 Median :10.537
## Mean :16.50 Mean :11.971
## 3rd Qu.:24.25 3rd Qu.:15.784
## Max. :32.00 Max. :30.000
## # A tibble: 64 x 4
## # Groups: n, TestEvent [?]
## n TestEvent Familiarization LT.mean
## <int> <fct> <fct> <dbl>
## 1 1 NewGoal Rod 9.48
## 2 1 NewLocation Rod 14.1
## 3 2 NewGoal Rod 11.7
## 4 2 NewLocation Rod 11.4
## 5 3 NewGoal Rod 20.9
## 6 3 NewLocation Rod 23.8
## 7 4 NewGoal Rod 11.1
## 8 4 NewLocation Rod 14.9
## 9 5 NewGoal Rod 2.90
## 10 5 NewLocation Rod 8.52
## # ... with 54 more rows
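To see what the group_by and summarise steps are doing, here is a minimal sketch on made-up data. The data frame `toy` and its columns `subject`, `condition` and `score` are hypothetical and are not part of the Woodward datafile; the sketch assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical data: 2 subjects x 2 conditions x 2 trials each
toy <- data.frame(
  subject   = rep(c(1, 2), each = 4),
  condition = rep(rep(c("A", "B"), each = 2), times = 2),
  score     = c(1, 3, 10, 20, 5, 7, 30, 50)
)

toy_means <- toy %>%
  group_by(subject, condition) %>%        # one group per subject and condition
  summarise(score.mean = mean(score)) %>% # collapse each group's trials to one mean
  ungroup()                               # drop the leftover grouping

toy_means
```

Each subject ends up with one row per condition, which is the same shape as subject_average above.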
Now, let’s get the average for each condition:
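condition_average = subject_average %>%
  group_by(FAMILIARIZATION_COLUMN, TEST_EVENT_COLUMN) %>% #<-- FILL THIS IN
  # Give the grand mean its own name: reusing LT.mean would overwrite the subject
  # means before the standard deviation is computed on the next line.
  summarise(LT.Grand.Mean = FUNCTION_FOR_MEAN(LT.mean),
            LT.sd = FUNCTION_FOR_STANDARD_DEVIATION(LT.mean))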
condition_average
## # A tibble: 4 x 4
## # Groups: Familiarization [?]
## Familiarization TestEvent LT.Grand.Mean LT.sd
## <fct> <fct> <dbl> <dbl>
## 1 Hand NewGoal 15.9 6.71
## 2 Hand NewLocation 12.8 7.24
## 3 Rod NewGoal 8.22 4.92
## 4 Rod NewLocation 10.9 5.02
Now let’s graph the data, using ggplot2 as last time. Look back to the last exercise to see how to add the error bars to your chart.
ggplot(condition_average, aes(x = WHAT_VARIABLE_ON_X_AXIS,
                              y = WHAT_VARIABLE_ON_Y_AXIS,
                              fill = WHAT_VARIABLE_DECIDES_COLORS_OF_BARS)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = WHAT_VALUE, ymax = WHAT_VALUE), position = "dodge")
You can see that infants differentially look to the test trials, depending on what they were familiarized with.
Note also that the error bars here are very wide. That’s because we are plotting the standard deviation of the by-subject means in each condition, rather than the standard error of the mean.
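If the general pattern of a dodged bar chart with error bars is hard to recall, the following is a rough sketch on made-up numbers. The data frame `toy_summary` and its columns `group`, `condition`, `avg` and `spread` are hypothetical, and the use of `position_dodge(width = 0.9)` with a narrower error-bar `width` is one common way to centre the bars, not necessarily what the last exercise used.

```r
library(ggplot2)

# Hypothetical summary table: one row per bar (not the Woodward results)
toy_summary <- data.frame(
  group     = c("G1", "G1", "G2", "G2"),
  condition = c("A", "B", "A", "B"),
  avg       = c(10, 14, 8, 6),
  spread    = c(2, 3, 1.5, 1)
)

# Use the same dodge for bars and error bars so that they line up
dodge <- position_dodge(width = 0.9)

toy_plot <- ggplot(toy_summary, aes(x = group, y = avg, fill = condition)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = avg - spread, ymax = avg + spread),
                position = dodge, width = 0.2)

print(toy_plot)
```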
If you like, you can try to figure out how to edit the code above to plot the standard error instead. Remember that the standard error of the mean in a condition is the standard deviation divided by the square root of the number of participants in that condition (here, 16). You can always email me, or come to office hours, if you have any questions.
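As a hint, the arithmetic itself can be sketched on a small made-up vector. The names `toy_subject_means`, `toy_sd` and `toy_se` are hypothetical, and the numbers are not from the Woodward datafile; fitting this into the summarise call is left to you.

```r
# Hypothetical by-subject means for one condition
toy_subject_means <- c(12, 15, 9, 20)

n_subjects <- length(toy_subject_means)  # number of participants in the condition
toy_sd <- sd(toy_subject_means)          # standard deviation across participants
toy_se <- toy_sd / sqrt(n_subjects)      # standard error of the mean

toy_se
```

With 16 participants per condition, the standard error is a quarter of the standard deviation, so the error bars shrink accordingly.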