In this assignment we’ll learn about dplyr and tidyr, two packages from the tidyverse that allow elegant and easily understandable data tidying and manipulation. We’ll do this by working through the steps of loading an actual dataset, tidying it up, and carrying out some basic analyses.
The dataset we’re using comes from the OSF Reproducibility Project replication of a study by Maya Tamir, Christopher Mitchell, and our very own James Gross (“Hedonic and Instrumental Motives in Anger Regulation,” Tamir, Mitchell, and Gross, Psychological Science, 2008). You can find the replication report here, and the original paper here. The replication tests two hypotheses from the original paper:
Rating hypothesis: Participants will prefer listening to angry music (or recalling an anger-inducing experience) before playing a confrontational (violent) game, but will prefer listening to exciting or neutral music (or recalling a calm experience) before playing a neutral game. This is assessed with preference ratings: participants read a description of a game and then give their preferences on a Likert scale.
Performance hypothesis: Participants will perform better on a confrontational game after listening to angry music (this game is not one of those described in the materials for the first hypothesis, to avoid contamination), but will perform better on a non-confrontational game (again, not one described in the materials for the first hypothesis) after listening to non-angry music. This is measured by having participants play each game for 5 minutes without music and then for 5 minutes with music, and comparing the change scores across music types.
First, let’s load the libraries we’re going to use.
```r
library(foreign) # for reading SPSS-formatted data
library(knitr)
library(tidyr)
library(dplyr)
library(stringr) # useful for some string manipulation
library(ggplot2)
```
Load Data
```r
d = read.spss("data/Tamiretal2008ReplicationData.sav", to.data.frame = TRUE)
```
Take a look at the data structure:
```r
head(d) %>% kable()
```
(Output: the first six rows of d, for subjects 1 to 6, shown as a table with 185 columns. The columns fall roughly into these groups, in order:
- Part 1 session information: Subject, Cond, Exper, Inifile, Date, Time.
- Part 1 preference ratings for four games, Game1 to Game4. Each game has Angry1 to Angry3, Exciting1 to Exciting3, Neutral1 to Neutral3, AngryFriends, AngryStrangers, CalmFriends, CalmStrangers, ExcitedFriends, ExcitedStrangers and an Intro column.
- Part 2 session information: Subject2, Cond2, Exper_A, Inifile_A, Date_A, Time_A.
- Music ratings: HowActive, HowAngry, HowExcited and HowPleasant for the Angry, Exciting and Neutral pieces 1 to 3.
- Demographic and debriefing questions: age, sex, race, ethnicity, year, and others.
- Game performance: Subject3, DDNoMusicLevel, DDNoMusicScore, DDMusicLevel, DDMusicScore, and the SOF enemy, friendly and time counts with and without music.
- Notes and exclusion flags: GameComments, DoNotUseVideoGamePerformanceData, Usable, DoNotUse, ProblemDetails.
- Scores the researchers derived from these, such as ConfrontationalAngerScore, DinerDashDifferenceScore, SOFDifferenceScore, and the z-scored ZDinerDash and ZSOF columns.

Subject 2's ProblemDetails entry reads “Female participant (this is a males only study)”.)
This data is in what we call wide form: each subject is a single row, and the columns represent different observations. This is an inconvenient way of representing the data. For example, if we wanted to apply the same operation to every Likert rating (say, normalizing it to the range 0 to 1), we’d have to do it separately on each of the dozens of rating columns. To avoid this, our eventual goal is to convert the data into long form, where each row is a single observation.
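As a preview of why long form helps, here is a small sketch on a made-up three-subject data frame. The name toy_wide and its values are hypothetical, not taken from the study, and the rescaling assumes a 1-to-7 Likert scale. Once each rating is its own row, one mutate rescales all of them, however many rating columns the wide data had.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per subject, one column per rating
toy_wide = data.frame(Subject = 1:3,
                      Game1Angry1 = c(6, 1, 7),
                      Game1Neutral1 = c(2, 4, 4))

# Long form: one row per subject-measurement pair, so a single mutate
# rescales every rating (assuming a 1-to-7 Likert scale)
toy_long = toy_wide %>%
  pivot_longer(cols = -Subject, names_to = "Measurement", values_to = "Rating") %>%
  mutate(RatingScaled = (Rating - 1) / 6)

toy_long
```

In the wide version, the same rescaling would need one line per rating column.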
For now, take a look at the column names (for example with names(d)) to get a better idea of what is in the dataset.
Also see if you can figure out the range of the Likert scores. What’s the highest number on the Likert scale, and what’s the lowest? (Hint: d$Game1Angry1 is one of the Likert rating columns, and you may want to use unique, range, or hist.)
```r
## your code here
min(d$Game1Angry1, na.rm = TRUE)
## [1] 1
max(d$Game1Angry1, na.rm = TRUE)
## [1] 7
```
Highest number: 7. Lowest number: 1.
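Checking one column only tells you about that column. Below is a sketch of checking all game-rating columns at once. It runs on a hypothetical stand-in for d: toy_d and its values are made up. It also assumes, as the min and max calls above suggest, that the rating columns are numeric. The steps are to select the rating columns, pool their values with unlist, and take the range.

```r
library(dplyr)

# Hypothetical stand-in for d: two rating columns plus a text "Intro" column
toy_d = data.frame(Subject = 1:3,
                   Game1Angry1 = c(6, 1, 7),
                   Game1Neutral1 = c(2, NA, 4),
                   Game1Intro = c("ok", "ok", "ok"))

# Keep only the numeric rating columns, then pool their values into one vector
rating_values = toy_d %>%
  select(starts_with("Game"), -contains("Intro")) %>%
  unlist()

range(rating_values, na.rm = TRUE)  # smallest and largest rating seen
unique(rating_values)               # every distinct value, NA included
```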
Cleaning up a bit
First, we’ll get rid of rows and columns of the data that we don’t need.
Filter out excluded rows
We start by filtering out any rows that should be excluded. According to the report, there are two exclusions:
“exclude data from participant 2 and participant 23: participant 2 is female, and this is a males only study; participant 23 was set up on part 2 of the study (the music ratings) twice and never did part 1”
You can see participant 23’s data and the fact that they did not do part 1 by looking at the last rows of the dataframe:
```r
tail(d) %>% kable()
```
(Output: the last six rows of d, rows 86 to 91, with the same 185 columns. The first four rows are subjects 87 to 90. The last two rows both belong to subject 23. Apart from Subject, every part 1 column in those rows is missing, including all the Game1 to Game4 ratings. Their ProblemDetails entry reads “Participant 23 was set up on part 2 of the survey when he was supposed to be set up on part 1; he did part 2 twice; data should be excluded entirely”.)
Notice that participant 23 has missing values for part 1.
The researchers have made a column called DoNotUse based on their exclusion criteria. Use this column to filter the data frame! (Hint: this is a little trickier than it might seem because of how R treats NA values. You may want to use ?unique to check the values in the column, and check out ?is.na.)
```r
filtered_d = d %>%
  filter(is.na(DoNotUse)) # your code here: exclude subjects that are marked as "DoNotUse"
```
It’s good practice to assign a new variable name (in this case filtered_d) to a data frame when you change it in an important way, or when you apply a code chunk that shouldn’t be run twice. This helps prevent you from seeing different results when you run your code chunk by chunk (where you might run one chunk several times, or skip one) versus when you knit the document.
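Why is.na() rather than a comparison? Here is a small sketch with a hypothetical toy data frame. Its DoNotUse column is assumed to be 1 for excluded subjects and NA otherwise, modeled on the hint above. Any comparison with NA yields NA, and filter() keeps only the rows where the condition is TRUE.

```r
library(dplyr)

# Hypothetical exclusion flag: 1 = exclude, NA = keep
toy = data.frame(Subject = c(1, 2, 3), DoNotUse = c(NA, 1, NA))

toy$DoNotUse != 1                        # NA FALSE NA: comparing with NA gives NA

wrong = toy %>% filter(DoNotUse != 1)    # NA conditions are dropped, so no rows survive
right = toy %>% filter(is.na(DoNotUse))  # keeps subjects 1 and 3
```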
Get rid of unnecessary columns
The dataset contains a bunch of columns we don’t care about:
The dataset contains three subject columns, which are identical except for a single NA which is not mentioned in the protocol, and so is likely an error.
Columns telling us the path to the executable run for each part of the experiment, which we don’t really care about.
Etc.
To get rid of these, we’ll use the select function to take only the columns we need.
```r
filtered_d = filtered_d %>%
  select(c("Subject", "Cond"), # generally important columns for both hypotheses
         contains("Game"), # we want all the game columns for hypothesis 1
         -contains("Intro"), -c("WhichGames", "GameComments"), # except these
         starts_with("DinerDashWith"), c("SOFMusicEnemies", "SOFNoMusicEnemies")) # these columns are for hypothesis 2
```
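This sketch shows how a sequence of positive and negative selections combines. It uses a one-row data frame whose column names are borrowed from the real dataset, with made-up values. Positive helpers such as contains() and starts_with() add columns in order. A minus sign removes columns from whatever has been selected so far.

```r
library(dplyr)

# Hypothetical one-row data frame with column names modeled on the real dataset
toy = data.frame(Subject = 1, Cond = 2, Game1Angry1 = 6, Game1Intro = "ok",
                 WhichGames = 18, DinerDashWithMusicScore = 600,
                 DinerDashWithoutMusicScore = 500, ZSOFMusicEnemies = 0.3)

kept = toy %>%
  select(Subject, Cond,
         contains("Game"),             # adds Game1Angry1, Game1Intro, WhichGames
         -contains("Intro"),           # removes Game1Intro from the selection
         -WhichGames,                  # removes WhichGames
         starts_with("DinerDashWith")) # adds both DinerDash score columns

names(kept)
```

Note that contains() ignores case by default, which is worth remembering when a name like “Game” could appear inside other column names.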
Even better, let’s split this into separate data frames for hypothesis 1 and hypothesis 2, since they are different types of experiments with different measurements, and therefore different analyses that will need to be performed. Now that we’ve cleaned up the data, this is pretty easy to do! We’ll just drop the columns that are for the other hypothesis. The select function lets us choose which columns to remove (instead of which to keep) by putting a minus sign in front of them. First, let’s create a dataset for the rating hypothesis by getting rid of the game performance columns:
```r
rating_hyp_d = filtered_d %>%
  filter(is.na(DoNotUseVideoGamePerformanceData)) %>% # first, let's get rid of the subjects who did so poorly on one game that their data is unusable
  select(-DoNotUseVideoGamePerformanceData, # now get rid of that column
         -starts_with("DinerDash"), # and the other columns we don't need
         -starts_with("SOF"))
```
Now you try! Fill in the selection criteria to get rid of the “Game” columns, which we don’t need for the performance hypothesis. (It’s simpler than the code block above, because you don’t need to do a filter first, only a select.)
```r
performance_hyp_d = filtered_d %>%
  select(-contains("Game")) # your code here: remove the columns containing "Game" in the name
```
Converting to long form
Now we want to convert the data to long form, to make the rest of our manipulations easier. To do this, we can use pivot_longer on the target columns. This takes many columns and turns their names into entries in a single names column (named with names_to). The values that were in the original columns become entries in a single values column (named with values_to). It’s easiest to see with an example:
```r
tiny_demo_d = head(performance_hyp_d, 2) # get just the first two subjects' performance data, for a demo
```
First, take a look at the original wide-form data by printing tiny_demo_d. Then convert it to long form:
```r
tiny_demo_d %>%
  pivot_longer(cols = -c("Subject", "Cond"), # this tells it to transform all columns *except* these ones
               names_to = 'Measurement',
               values_to = 'Value')
```
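Since the printed output is not shown here, this sketch uses a hypothetical two-subject data frame with the same column names as performance_hyp_d. The scores are invented. It makes the reshaping easy to check: 2 subjects times 4 measurement columns gives 8 rows, and each old column name becomes a value of Measurement.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-subject data using the same column names as performance_hyp_d
toy_perf = data.frame(Subject = c(1, 2), Cond = c(2, 4),
                      DinerDashWithMusicScore = c(600, 500),
                      DinerDashWithoutMusicScore = c(400, 450),
                      SOFMusicEnemies = c(20, 25),
                      SOFNoMusicEnemies = c(18, 15))

toy_long = toy_perf %>%
  pivot_longer(cols = -c("Subject", "Cond"), # every column except the identifiers
               names_to = "Measurement",      # old column names go here
               values_to = "Value")           # old cell values go here

toy_long
```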
Once the data are in long form, the Measurement column contains a bunch of different types of information. Really, we would like these to be separate columns: for example, one column telling you which video game it is, and one telling you whether there was music. The tidyverse contains some handy functions for splitting columns (such as tidyr’s separate), but unfortunately the measurement names here are not well suited to them. If the different types of information were always the same length, or were separated by a symbol like “.” or “_”, it would be easy. Thus we’ll have to do a bit of manual testing. We can use the mutate function in dplyr to create new columns as functions of old ones (or to alter existing columns). We’ll also use the grepl function, which lets us test whether a regular expression (a fancy type of search pattern) matches part of a string such as a measurement name. For most of your purposes, you can probably just use grepl to search for plain strings, but regular expressions offer some other quite useful features, like the “or” operator (|) we use below.
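Here is a small sketch of both approaches on hypothetical inputs. The delimited names (SOF_NoMusic and so on) are invented for illustration, since the real names have no separator. The second half runs grepl on the four real performance measurement names.

```r
library(dplyr)
library(tidyr)

# If the names had a consistent separator (hypothetical "Game_Music" labels),
# tidyr::separate() could split them directly
delimited = data.frame(Measurement = c("SOF_NoMusic", "SOF_Music", "DinerDash_NoMusic"))
split_d = delimited %>%
  separate(Measurement, into = c("Game", "Music"), sep = "_")

# The real names have no separator, so test for patterns with grepl() instead;
# "|" means "or" inside a regular expression
measurements = c("DinerDashWithMusicScore", "DinerDashWithoutMusicScore",
                 "SOFMusicEnemies", "SOFNoMusicEnemies")
is_sof = grepl("SOF", measurements)
with_music = !grepl("NoMusic|WithoutMusic", measurements)
```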
```r
performance_hyp_long_d = performance_hyp_d %>%
  # convert to long form: one row per subject and measurement
  pivot_longer(cols = -c("Subject", "Cond"), names_to = "Measurement", values_to = "Score") %>%
  mutate(
    # whether the measurement was of the game Soldier of Fortune (SOF)
    ConfrontationalGame = grepl("SOF", Measurement),
    # FALSE if the measurement contains *either* "NoMusic" or "WithoutMusic"
    WithMusic = !grepl("NoMusic|WithoutMusic", Measurement),
    # get rid of uninterpretable condition labels
    MusicCondition = factor(ifelse(Cond > 3, Cond - 3, Cond), levels = 1:3,
                            labels = c("Anger", "Exciting", "Neutral")))
```
Now you can help! For the rating dataset, write a test on the measurement name, using grepl or %in%, to figure out whether it’s a recall or a music rating. Your new IsRecall column should be TRUE if the measurement name contains either “Friends” or “Strangers”.
```r
rating_hyp_long_d = rating_hyp_d %>%
  # convert to long form: one row per subject and rating
  pivot_longer(cols = -c("Subject", "Cond"), names_to = "Measurement", values_to = "Rating") %>%
  mutate(IsRecall = grepl("Friends|Strangers", Measurement)) ## your code here
```
Here are a couple of other useful ways of manipulating columns. (You won’t remember all the functions you see here now, but that’s okay. You can always reference this tutorial later if there’s something you need to figure out how to do.)
```r
rating_hyp_long_d = rating_hyp_long_d %>%
  mutate(GameNumber = as.numeric(substr(Measurement, 5, 5)), # the 5th character is the game number
         # in a mutate, we can use a column we created (or changed) right away.
         # Games 1 and 2 are confrontational, games 3 and 4 are not.
         ConfrontationalGame = GameNumber <= 2,
         Emotion = str_extract(Measurement, "Angry|Neutral|Excited|Exciting|Calm"),
         # this just gets rid of some annoying labeling choices
         Emotion = ifelse(Emotion == "Excited", "Exciting",
                          ifelse(Emotion == "Calm", "Neutral", Emotion)))
```
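To check what each of these steps produces, here is a sketch on four hypothetical measurement names in the same format as the real ones. It uses dplyr’s case_when as an alternative to the nested ifelse for relabeling; both give the same result here.

```r
library(dplyr)
library(stringr)

# Hypothetical subset of rating measurement names
toy = data.frame(Measurement = c("Game1Angry2", "Game2CalmFriends",
                                 "Game3ExcitedStrangers", "Game4Exciting1"))

toy = toy %>%
  mutate(GameNumber = as.numeric(substr(Measurement, 5, 5)), # 5th character is the game digit
         ConfrontationalGame = GameNumber <= 2,
         IsRecall = grepl("Friends|Strangers", Measurement),
         Emotion = str_extract(Measurement, "Angry|Neutral|Excited|Exciting|Calm"),
         # case_when() as an alternative to nested ifelse() for relabeling
         Emotion = case_when(Emotion == "Excited" ~ "Exciting",
                             Emotion == "Calm" ~ "Neutral",
                             TRUE ~ Emotion))

toy
```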
Groups, Summaries, and Results
Performance Hypothesis
For the performance data, we need to do a little bit of manipulation of the columns in order to get to the performance measures the experimenters actually used. Because they want to compare changes in performance across games that have very different scoring systems, the easiest solution is to compare z-scores. The way they did this was to z-score performance before music, z-score performance after music, and then create a difference measure which is a difference of z-scores. (To my mind, this is actually not quite the correct way to analyze this data, but like the replication we will follow the original authors.)
We’ll add a new z-scored value column. However, we have to be careful! We want to z-score within groups of rows that all share the same type of measurement. For example, we want to z-score the “DinerDashWithMusic” scores with respect to each other, but not with respect to the scores from the other game. We can use the group_by function to set groups, and then all the changes we apply will only occur within those groups until we ungroup the dataset.
To make this more concrete, let’s see how the group_by function can let us compute means within different groups, for example mean scores on the two different games.
```r
performance_hyp_long_d %>%
  group_by(ConfrontationalGame) %>%
  summarize(AvgScore = mean(Score, na.rm = TRUE)) # na.rm = TRUE tells R to ignore NA values
```
This makes it clear why we can’t just z-score the games together! The scores are very different between games. So let’s z-score within groups (using the scale function):
```r
performance_hyp_long_d = performance_hyp_long_d %>%
  # we're going to compute four sets of z-scores: one for the confrontational game without music,
  # one for the confrontational game with music, and the same two for the nonconfrontational game
  group_by(ConfrontationalGame, WithMusic) %>%
  mutate(z_scored_performance = as.numeric(scale(Score))) %>% # scale() returns a one-column matrix; as.numeric() makes it a plain vector
  ungroup()
```
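Within each group, scale() computes z = (x - mean) / sd using that group’s own mean and standard deviation. Here is a sketch on made-up scores for two games whose scales differ by orders of magnitude. The game labels and values are hypothetical. After grouping, both games end up on the same z scale. Without grouping, the high-scoring game would dominate.

```r
library(dplyr)

# Hypothetical scores: two games on very different scales
toy = data.frame(Game = c("DD", "DD", "DD", "SOF", "SOF", "SOF"),
                 Score = c(4000, 5000, 6000, 10, 20, 30))

# Grouped: each game is centered and scaled by its own mean and SD
toy_z = toy %>%
  group_by(Game) %>%
  mutate(z = as.numeric(scale(Score))) %>%
  ungroup()

# Ungrouped: one shared mean and SD, so every DD score comes out positive
# and every SOF score negative
toy_z_all = toy %>% mutate(z = as.numeric(scale(Score)))
```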
Rating Hypothesis
The rating hypothesis analysis also requires some grouped manipulation. The experimenters collected repeated measures of ratings in each emotion category and each music/recall category for each game. For this analysis, they averaged the ratings within each combination of two variables, the emotion and the game type (confrontational or not), to produce a compact summary. Your job is to implement this, calling the new variable MeanRating, and save the summarized data in a new data frame called rating_summary_d. (Hint: use a group_by and a summarize.)
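One way to approach this is sketched below on a small hypothetical data frame (toy_ratings, with invented ratings) that has the same relevant column names as rating_hyp_long_d. On the real data you would start from rating_hyp_long_d instead. The .groups = "drop" argument, available in dplyr 1.0 and later, just returns an ungrouped result.

```r
library(dplyr)

# Hypothetical long-form ratings with the columns created above
toy_ratings = data.frame(
  Subject = c(1, 1, 1, 1, 2, 2, 2, 2),
  ConfrontationalGame = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  Emotion = c("Angry", "Neutral", "Angry", "Neutral", "Angry", "Neutral", "Angry", "Neutral"),
  Rating = c(6, 3, 2, 5, 7, 4, 1, NA))

# One mean per emotion x game-type combination
toy_summary = toy_ratings %>%
  group_by(Emotion, ConfrontationalGame) %>%
  summarize(MeanRating = mean(Rating, na.rm = TRUE), .groups = "drop")

toy_summary
```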
Up to reordering (and the fact that we didn’t compute error bars), this is a pretty decent replication of Fig. 1 from the original Tamir et al. paper. The ratings were highest for Angry in the confrontational game, and lowest for Angry in the non-confrontational game.
The long-form dataset also makes it easy to fit a linear model (don’t worry too much about this; we’ll talk more about it in 252).
```r
model = lm(Rating ~ ConfrontationalGame * Emotion, data = rating_hyp_long_d)
summary(model)
```
There are still a few more steps to go for the performance hypothesis. We need to take a difference score to see how people improved from before hearing the music to after, and then see if the improvement is larger if they heard music congruent with the type of game.
To compute the difference score, we have to make our data a bit wider. We now want to subtract the pre-music scores from the post-music scores, which is easiest to do if they are in two different columns. To do this we’ll use the pivot_wider function (which is more or less the opposite of pivot_longer).
```r
performance_diff_d = performance_hyp_long_d %>%
  mutate(WithMusic = factor(WithMusic, levels = c(FALSE, TRUE), labels = c("PreMusic", "PostMusic"))) %>% # first, tweak the variable so our code is easier to read
  select(-c("Score", "Measurement")) %>% # now we remove columns we don't need (bonus: leave them in and see if you can understand what goes wrong!)
  pivot_wider(names_from = WithMusic, values_from = z_scored_performance) %>%
  mutate(ImprovementScore = PostMusic - PreMusic)
```
If you don’t understand every step of that code (or any other dplyr code), it can be helpful to look at the result of running just the first line, then just the first two lines, and so on.
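Here is a sketch of what pivot_wider does in isolation, on a hypothetical four-row long data frame with invented z-scores for two subjects. The remaining identifier columns match up each subject’s PreMusic and PostMusic rows, which then become one row with two value columns. The improvement is then a simple subtraction.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one z-score per subject and music phase
toy_long = data.frame(Subject = c(1, 1, 2, 2),
                      ConfrontationalGame = TRUE,
                      WithMusic = c(FALSE, TRUE, FALSE, TRUE),
                      z_scored_performance = c(-0.5, 0.5, 1.0, 0.2))

toy_diff = toy_long %>%
  mutate(WithMusic = factor(WithMusic, levels = c(FALSE, TRUE),
                            labels = c("PreMusic", "PostMusic"))) %>%
  # one row per Subject x ConfrontationalGame, with PreMusic and PostMusic columns
  pivot_wider(names_from = WithMusic, values_from = z_scored_performance) %>%
  mutate(ImprovementScore = PostMusic - PreMusic)

toy_diff
```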
Now we’re finally ready to reproduce Fig. 2 from Tamir et al. We just need to get the mean improvement within each game and each kind of music, and save it to a variable called MeanImprovementScore:
(Bonus: also calculate the SEM in the summary data, and then add error bars to the plot with geom_errorbar!)
This is not as close a replication of the effect as Fig. 1 was, which agrees with the replication report: the hypothesis 1 effect replicated, but the hypothesis 2 effect did not. Here’s a model just for thoroughness (again, don’t worry too much about it):
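Below is a possible approach to the summary and the bonus. It runs on invented improvement scores for four game-by-music combinations; only two of the three music conditions are included, to keep it short. It assumes the SEM is the standard deviation divided by the square root of the number of non-missing scores in each group. The dodged bars with error bars are one layout choice among many.

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-subject improvement scores
toy_diff = data.frame(
  ConfrontationalGame = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  MusicCondition = c("Anger", "Anger", "Neutral", "Neutral", "Anger", "Anger", "Neutral", "Neutral"),
  ImprovementScore = c(0.4, 0.8, -0.2, 0.0, -0.1, -0.3, 0.3, 0.5))

toy_improvement = toy_diff %>%
  group_by(ConfrontationalGame, MusicCondition) %>%
  summarize(MeanImprovementScore = mean(ImprovementScore, na.rm = TRUE),
            # SEM = SD / sqrt(n), counting only non-missing scores
            SEM = sd(ImprovementScore, na.rm = TRUE) / sqrt(sum(!is.na(ImprovementScore))),
            .groups = "drop")

# Bars for the means, error bars spanning one SEM above and below
p = ggplot(toy_improvement,
           aes(x = MusicCondition, y = MeanImprovementScore, fill = ConfrontationalGame)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = MeanImprovementScore - SEM, ymax = MeanImprovementScore + SEM),
                position = position_dodge(width = 0.9), width = 0.2)
```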