Psych 251 PS2: Tidying Data

Author

Jane Stephenson

Published

October 10, 2025

In this assignment we’ll learn about dplyr and tidyr, two packages from the tidyverse that allow elegant and easily understandable data tidying and manipulation. We’ll do this by working through the steps of loading an actual dataset, tidying it up, and carrying out some basic analyses.

The dataset we’re using comes from the OSF Reproducibility Project replication of a study by Maya Tamir, Christopher Mitchell, and our very own James Gross (“Hedonic and Instrumental Motives in Anger Regulation,” Tamir, Mitchell, and Gross, Psychological Science, 2008). You can find the replication report here, and the original paper here. The replication tests two hypotheses from the original paper:

Rating hypothesis: Participants will prefer listening to angry music (or recalling an anger-inducing experience) before playing a confrontational (violent) game, but will prefer listening to exciting or neutral music (or recalling a calm experience) before a neutral game. This is assessed with preference ratings: participants read a description of a game and then rate their preference on a Likert scale.
Performance hypothesis: Participants will perform better on a confrontational game after listening to angry music (this game is not one of those described in the materials for the rating hypothesis, to avoid contamination), but will perform better on a non-confrontational game (again, not one described in the materials for the rating hypothesis) after listening to non-angry music. This is assessed by having participants play for 5 minutes without music and then for 5 minutes after/with music, and comparing change scores across music types.
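The change scores can be made concrete with subject 1’s row in the data shown below. Our reading of the columns (an assumption, not something stated in this document) is that each difference score is the z-scored with-music score minus the z-scored without-music score. A minimal check of that arithmetic, with the values copied from subject 1’s row:

```r
# Values copied from subject 1's row of the replication data (head(d) output).
# Assumption: difference score = z(with music) - z(without music).
z_dd_music    <- -0.0733328  # ZDinerDashWithMusicScore
z_dd_nomusic  <-  0.2692740  # ZDinerDashWithoutMusicScore
z_sof_nomusic <-  0.7501199  # ZSOFNoMusicEnemies
z_sof_music   <- -0.2020329  # ZSOFMusicEnemies

dd_change  <- z_dd_music - z_dd_nomusic    # compare with DinerDashDifferenceScore = -0.3426068
sof_change <- z_sof_music - z_sof_nomusic  # compare with SOFDifferenceScore = -0.9521528

round(c(dd_change, sof_change), 7)
```

Subject 3’s row matches the same way (-0.7334425 minus -2.8616517 gives the stored 2.1282092), which supports, but does not prove, this reading.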

First, let’s load the libraries we’re going to use.

library(foreign) # for reading spss formatted data
library(knitr)
library(tidyr)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(stringr) # useful for some string manipulation
library(ggplot2)

Load Data

d = read.spss("data/Tamiretal2008ReplicationData.sav", to.data.frame=T)

Take a look at the data structure:

head(d) %>% kable()

Subject	Cond	Exper	Inifile	Date	Time	Game1Angry1	Game1Angry2	Game1Angry3	Game1AngryFriends	Game1AngryStrangers	Game1CalmFriends	Game1CalmStrangers	Game1ExcitedFriends	Game1ExcitedStrangers	Game1Exciting1	Game1Exciting2	Game1Exciting3	Game1Intro	Game1Neutral1	Game1Neutral2	Game1Neutral3	Game2Angry1	Game2Angry2	Game2Angry3	Game2AngryFriends	Game2AngryStrangers	Game2CalmFriends	Game2CalmStrangers	Game2ExcitedFriends	Game2ExcitedStrangers	Game2Exciting1	Game2Exciting2	Game2Exciting3	Game2Intro	Game2Neutral1	Game2Neutral2	Game2Neutral3	Game3Angry1	Game3Angry2	Game3Angry3	Game3AngryFriends	Game3AngryStrangers	Game3CalmFriends	Game3CalmStrangers	Game3ExcitedFriends	Game3ExcitedStrangers	Game3Exciting1	Game3Exciting2	Game3Exciting3	Game3Intro	Game3Neutral1	Game3Neutral2	Game3Neutral3	Game4Angry1	Game4Angry2	Game4Angry3	Game4AngryFriends	Game4AngryStrangers	Game4CalmFriends	Game4CalmStrangers	Game4ExcitedFriends	Game4ExcitedStrangers	Game4Exciting1	Game4Exciting2	Game4Exciting3	Game4Intro	Game4Neutral1	Game4Neutral2	Game4Neutral3	MusicSelectionEnd	MusicSelectionInstrx	RecallSelectionEnd	RecallSelectionInstrx	Subject2	Cond2	Exper_A	Inifile_A	Date_A	Time_A	DescribeMusic	HowActiveAngry1	HowActiveAngry2	HowActiveAngry3	HowActiveExciting1	HowActiveExciting2	HowActiveExciting3	HowActiveNeutral1	HowActiveNeutral2	HowActiveNeutral3	HowAngryAngry1	HowAngryAngry2	HowAngryAngry3	HowAngryExciting1	HowAngryExciting2	HowAngryExciting3	HowAngryNeutral1	HowAngryNeutral2	HowAngryNeutral3	HowExcitedAngry1	HowExcitedAngry2	HowExcitedAngry3	HowExcitedExciting1	HowExcitedExciting2	HowExcitedExciting3	HowExcitedNeutral1	HowExcitedNeutral2	HowExcitedNeutral3	HowPleasantAngry1	HowPleasantAngry2	HowPleasantAngry3	HowPleasantExciting1	HowPleasantExciting2	HowPleasantExciting3	HowPleasantNeutral1	HowPleasantNeutral2	HowPleasantNeutral3	MusicRatingEnd	MusicRatingInstrx	WhichGames	aboutyou	age	distractions	endinstructions	ethnicity	overlooking	race	sex	whatabout	year	Subject3	DDNoMusicLevel	
DDNoMusicScore	DDMusicLevel	DDMusicScore	SOFNoMusicEnemies	SOFNoMusicFriendlies	SOFNoMusicTime	SOFMusicEnemies	SOFMusicFriendlies	SOFMusicTime	GameComments	DoNotUseVideoGamePerformanceData	ConfrontationalAngryMusicScore	ConfrontationalExcitingMusicScore	ConfrontationalNeutralMusicScore	ConfrontationalAngryRecallScore	ConfrontationalExcitingRecallScore	ConfrontationalNeutralRecallScore	NonconfrontationalAngryMusicScore	NonconfrontationalExcitingMusicScore	NonconfrontationalNeutralMusicScore	NonconfrontationalAngryRecallScore	NonconfrontationalExcitingRecallScore	NonconfrontationalNeutralRecallScore	ConfrontationalAngerScore	ConfrontationalExcitingScore	ConfrontationalNeutralScore	NonconfrontationalAngerScore	NonconfrontationalExcitingScore	NonconfrontationalNeutralScore	Usable	DoNotUse	ProblemDetails	DinerDashWithMusicScore	DinerDashWithoutMusicScore	MusicCondition	ZDinerDashWithMusicScore	ZDinerDashWithoutMusicScore	ZSOFNoMusicEnemies	ZSOFMusicEnemies	DinerDashDifferenceScore	SOFDifferenceScore	PleasantScoreForAngryMusic	PleasantScoreForExcitingMusic	PleasantScoreForNeutralMusic	AngryScoreForAngryMusic	AngryScoreForExcitingMusic	AngryScoreForNeutralMusic	ExcitedScoreForExcitingMusic	ExcitedScoreForNeutralMusic	ActiveScoreForExcitingMusic	ActiveScoreForNeutralMusic	ExcitedScoreForAngryMusic	ActiveScoreForAngryMusic
1	2	C:151Part1.exp	default.mlp	13642819200	40781	6	6	5	2	5	2	2	1	2	3	2	6	ok	2	4	4	6	4	6	3	6	1	2	1	1	3	2	4	ok	1	3	1	2	2	3	3	2	7	6	6	5	2	2	3	ok	5	6	5	2	2	2	2	2	5	5	7	4	5	5	2	ok	1	5	2	ok	ok	ok	ok	1	2	C:151Part2.exp	default.mlp	13642819200	43151	2	4	4	4	5	4	5	2	2	2	5	4	4	3	4	3	2	2	1	4	3	3	4	4	4	2	2	2	1	2	1	2	2	1	5	4	5	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	1	3	0	3	830	22	2	24360	19	0	23340		NA	5.500000	3.333333	2.500000	3.75	1.25	2.00	2.166667	3.166667	4.000000	2.50	5.25	6.25	4.8	2.5	2.3	2.3	4.0	4.9	1	NA		5830	5000	Exciting	-0.0733328	0.2692740	0.7501199	-0.2020329	-0.3426068	-0.9521528	1.333333	1.666667	4.666667	4.333333	3.333333	1.666667	4.000000	2.000000	4.666667	2.000000	3.333333	4.000000
2	3	C:151Part1.exp	default.mlp	13642819200	50753	7	7	7	7	7	6	6	6	6	5	3	2	ok	1	1	1	7	6	7	6	7	2	3	5	5	5	2	1	ok	1	1	2	6	3	5	3	2	6	5	6	5	4	3	3	ok	2	1	5	2	5	2	4	4	2	4	3	4	1	2	6	ok	5	5	2	ok	ok	ok	ok	2	3	C:151Part2.exp	default.mlp	13642819200	53012	3	5	5	5	5	2	4	2	2	1	5	5	5	4	3	3	2	1	1	5	5	5	4	2	4	3	2	1	1	2	1	1	4	3	4	4	4	ok	ok	ok	ok	20	ok	ok	2	ok	2	2	ok	2	2	3	20	3	2930	18	1	23580	18	2	22500		NA	6.833333	3.000000	1.166667	7.00	5.75	5.25	3.833333	3.166667	3.333333	3.00	5.25	5.25	6.9	4.1	2.8	3.5	4.0	4.1	0	1	Female participant (this is a males only study)	7930	5020	Neutral	NA	NA	NA	NA	NA	NA	1.333333	2.666667	4.000000	5.000000	3.333333	1.333333	3.333333	2.000000	3.666667	1.666667	5.000000	5.000000
3	1	C:151Part1.exp	default.mlp	13642819200	54540	6	5	7	2	2	2	2	2	2	2	3	4	ok	1	2	3	5	3	6	3	3	3	3	3	3	2	5	2	ok	4	3	1	2	2	3	4	4	3	3	4	4	3	6	2	ok	2	3	3	5	2	2	4	5	2	4	4	5	7	4	5	ok	3	2	4	ok	ok	ok	ok	3	1	C:151Part2.exp	default.mlp	13642819200	57041	2	4	4	4	2	1	3	1	2	1	4	4	4	3	1	3	1	1	2	3	3	2	2	2	3	2	1	2	2	2	4	2	2	2	2	2	1	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	3	2	1250	3	370	15	0	15300	23	1	24300		NA	5.333333	3.000000	2.333333	2.25	2.25	2.25	2.666667	4.500000	2.833333	4.25	4.25	3.25	4.1	2.7	2.3	3.3	4.4	3.0	1	NA		5370	1250	Anger	-0.7334425	-2.8616517	-0.1401958	0.3183548	2.1282092	0.4585506	2.666667	2.000000	1.666667	4.000000	2.333333	1.333333	2.333333	1.666667	2.000000	1.333333	2.666667	4.000000
4	4	C:151Part1.exp	default.mlp	13642905600	34952	4	1	1	6	6	2	1	3	4	5	4	5	ok	1	2	2	6	2	6	3	6	1	1	2	4	3	2	2	ok	1	1	3	2	1	6	5	4	2	2	5	6	3	1	3	ok	2	2	6	1	1	2	1	1	2	2	4	4	6	6	6	ok	4	5	2	ok	ok	ok	ok	4	4	C:151Part2.exp	default.mlp	13642905600	37630	3	5	3	3	5	5	5	2	2	1	3	2	3	1	1	1	2	1	1	4	1	3	4	3	5	2	2	1	1	1	3	4	4	3	2	4	5	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	4	3	1742	3	1921	3	0	5280	19	0	16860	Participant died, restart	1	3.333333	3.500000	1.666667	6.00	3.50	1.50	2.166667	4.166667	3.500000	3.75	5.00	2.00	4.4	3.5	1.6	2.8	4.5	2.9	1	NA		6921	6742	Anger	1.4922750	1.7236934	-1.6664514	-0.2020329	-0.2314183	1.4644185	1.666667	3.666667	3.666667	2.666667	1.000000	1.333333	4.000000	1.666667	5.000000	1.666667	2.666667	3.666667
5	5	C:151Part1.exp	default.mlp	13642905600	49095	6	6	7	6	6	2	2	5	5	1	3	2	ok	3	2	4	5	6	6	5	6	1	1	4	4	1	2	2	ok	4	4	5	3	5	6	1	3	5	5	6	5	3	1	3	ok	2	4	5	3	4	3	2	3	5	5	5	6	1	5	5	ok	4	2	5	ok	ok	ok	ok	5	5	C:151Part2.exp	default.mlp	13642905600	51434	2	5	4	5	3	3	3	2	1	1	2	2	3	2	2	1	1	1	1	4	4	5	3	3	3	2	1	3	4	3	2	1	1	2	1	1	5	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	5	3	60	3	1750	18	2	19140	23	3	20820	Error in game towards the end of time	1	6.000000	1.833333	3.666667	6.00	4.75	1.75	4.000000	3.000000	3.666667	2.00	5.75	5.00	6.0	3.0	2.9	3.2	4.1	4.2	1	NA		6750	5060	Exciting	1.2468865	0.3193688	0.2413681	0.3183548	0.9275176	0.0769867	3.000000	1.333333	2.333333	2.333333	1.666667	1.000000	3.000000	2.000000	3.000000	1.333333	4.333333	4.666667
6	6	C:151Part1.exp	default.mlp	13642905600	59714	5	5	6	3	4	5	4	6	4	3	2	4	ok	2	2	4	6	5	6	3	5	3	2	5	4	2	2	3	ok	2	3	4	2	2	5	1	1	4	3	4	2	1	2	2	ok	5	4	4	2	3	3	1	2	4	4	5	4	2	4	3	ok	3	5	5	ok	ok	ok	ok	6	6	C:151Part2.exp	default.mlp	13642905600	62320	3	3	3	2	3	3	4	1	2	1	2	2	2	2	1	1	1	1	1	5	2	3	3	2	4	1	1	2	2	2	3	3	3	4	3	3	4	ok	ok	ok	ok	19	ok	ok	2	ok	2	1	ok	1	6	3	840	3	1380	23	1	23220	24	0	23400		NA	5.500000	2.666667	2.833333	3.75	5.00	4.00	2.833333	2.333333	4.333333	1.25	3.50	3.75	4.8	3.6	3.3	2.2	2.8	4.1	1	NA		6380	5840	Neutral	0.7159287	0.9706014	0.8773079	0.4484517	-0.2546727	-0.4288562	2.333333	3.333333	3.333333	2.000000	1.333333	1.000000	3.000000	1.333333	3.333333	1.333333	3.333333	2.666667

This data is in what we call wide form: each subject is a single row, and the columns hold different observations. This is an inconvenient way of representing the data. For example, if we wanted to apply the same operation to every Likert rating (say, normalizing it to the range 0-1), we’d have to do it separately on each of the dozens of rating columns. To avoid this, our eventual goal is to convert the data into long form, where each row is a single observation.
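As a minimal sketch of the payoff, here are the Game1Angry1 to Game1Angry3 ratings for subjects 1 and 3, copied from the table above. Once they are in long form, a single mutate() rescales every rating. (dplyr’s across() can also apply one function to many wide columns, but in long form the ratings simply share one column.) The rescaling (x - 1) / 6 assumes the 1 to 7 scale identified below.

```r
library(dplyr)
library(tidyr)

# Game1Angry1-3 ratings for subjects 1 and 3, copied from the head(d) output
wide <- tibble(
  Subject     = c(1, 3),
  Game1Angry1 = c(6, 6),
  Game1Angry2 = c(6, 5),
  Game1Angry3 = c(5, 7)
)

# In long form every rating lives in one column, so one mutate() rescales them all.
# Assumes a 1-7 Likert scale: (x - 1) / 6 maps 1 -> 0 and 7 -> 1.
long <- wide %>%
  pivot_longer(cols = -Subject, names_to = "Measurement", values_to = "Rating") %>%
  mutate(Rating01 = (Rating - 1) / 6)

long
```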

For now, take a look at the column names to get a better idea of what the dataset contains.

colnames(d)

  [1] "Subject"                              
  [2] "Cond"                                 
  [3] "Exper"                                
  [4] "Inifile"                              
  [5] "Date"                                 
  [6] "Time"                                 
  [7] "Game1Angry1"                          
  [8] "Game1Angry2"                          
  [9] "Game1Angry3"                          
 [10] "Game1AngryFriends"                    
 [11] "Game1AngryStrangers"                  
 [12] "Game1CalmFriends"                     
 [13] "Game1CalmStrangers"                   
 [14] "Game1ExcitedFriends"                  
 [15] "Game1ExcitedStrangers"                
 [16] "Game1Exciting1"                       
 [17] "Game1Exciting2"                       
 [18] "Game1Exciting3"                       
 [19] "Game1Intro"                           
 [20] "Game1Neutral1"                        
 [21] "Game1Neutral2"                        
 [22] "Game1Neutral3"                        
 [23] "Game2Angry1"                          
 [24] "Game2Angry2"                          
 [25] "Game2Angry3"                          
 [26] "Game2AngryFriends"                    
 [27] "Game2AngryStrangers"                  
 [28] "Game2CalmFriends"                     
 [29] "Game2CalmStrangers"                   
 [30] "Game2ExcitedFriends"                  
 [31] "Game2ExcitedStrangers"                
 [32] "Game2Exciting1"                       
 [33] "Game2Exciting2"                       
 [34] "Game2Exciting3"                       
 [35] "Game2Intro"                           
 [36] "Game2Neutral1"                        
 [37] "Game2Neutral2"                        
 [38] "Game2Neutral3"                        
 [39] "Game3Angry1"                          
 [40] "Game3Angry2"                          
 [41] "Game3Angry3"                          
 [42] "Game3AngryFriends"                    
 [43] "Game3AngryStrangers"                  
 [44] "Game3CalmFriends"                     
 [45] "Game3CalmStrangers"                   
 [46] "Game3ExcitedFriends"                  
 [47] "Game3ExcitedStrangers"                
 [48] "Game3Exciting1"                       
 [49] "Game3Exciting2"                       
 [50] "Game3Exciting3"                       
 [51] "Game3Intro"                           
 [52] "Game3Neutral1"                        
 [53] "Game3Neutral2"                        
 [54] "Game3Neutral3"                        
 [55] "Game4Angry1"                          
 [56] "Game4Angry2"                          
 [57] "Game4Angry3"                          
 [58] "Game4AngryFriends"                    
 [59] "Game4AngryStrangers"                  
 [60] "Game4CalmFriends"                     
 [61] "Game4CalmStrangers"                   
 [62] "Game4ExcitedFriends"                  
 [63] "Game4ExcitedStrangers"                
 [64] "Game4Exciting1"                       
 [65] "Game4Exciting2"                       
 [66] "Game4Exciting3"                       
 [67] "Game4Intro"                           
 [68] "Game4Neutral1"                        
 [69] "Game4Neutral2"                        
 [70] "Game4Neutral3"                        
 [71] "MusicSelectionEnd"                    
 [72] "MusicSelectionInstrx"                 
 [73] "RecallSelectionEnd"                   
 [74] "RecallSelectionInstrx"                
 [75] "Subject2"                             
 [76] "Cond2"                                
 [77] "Exper_A"                              
 [78] "Inifile_A"                            
 [79] "Date_A"                               
 [80] "Time_A"                               
 [81] "DescribeMusic"                        
 [82] "HowActiveAngry1"                      
 [83] "HowActiveAngry2"                      
 [84] "HowActiveAngry3"                      
 [85] "HowActiveExciting1"                   
 [86] "HowActiveExciting2"                   
 [87] "HowActiveExciting3"                   
 [88] "HowActiveNeutral1"                    
 [89] "HowActiveNeutral2"                    
 [90] "HowActiveNeutral3"                    
 [91] "HowAngryAngry1"                       
 [92] "HowAngryAngry2"                       
 [93] "HowAngryAngry3"                       
 [94] "HowAngryExciting1"                    
 [95] "HowAngryExciting2"                    
 [96] "HowAngryExciting3"                    
 [97] "HowAngryNeutral1"                     
 [98] "HowAngryNeutral2"                     
 [99] "HowAngryNeutral3"                     
[100] "HowExcitedAngry1"                     
[101] "HowExcitedAngry2"                     
[102] "HowExcitedAngry3"                     
[103] "HowExcitedExciting1"                  
[104] "HowExcitedExciting2"                  
[105] "HowExcitedExciting3"                  
[106] "HowExcitedNeutral1"                   
[107] "HowExcitedNeutral2"                   
[108] "HowExcitedNeutral3"                   
[109] "HowPleasantAngry1"                    
[110] "HowPleasantAngry2"                    
[111] "HowPleasantAngry3"                    
[112] "HowPleasantExciting1"                 
[113] "HowPleasantExciting2"                 
[114] "HowPleasantExciting3"                 
[115] "HowPleasantNeutral1"                  
[116] "HowPleasantNeutral2"                  
[117] "HowPleasantNeutral3"                  
[118] "MusicRatingEnd"                       
[119] "MusicRatingInstrx"                    
[120] "WhichGames"                           
[121] "aboutyou"                             
[122] "age"                                  
[123] "distractions"                         
[124] "endinstructions"                      
[125] "ethnicity"                            
[126] "overlooking"                          
[127] "race"                                 
[128] "sex"                                  
[129] "whatabout"                            
[130] "year"                                 
[131] "Subject3"                             
[132] "DDNoMusicLevel"                       
[133] "DDNoMusicScore"                       
[134] "DDMusicLevel"                         
[135] "DDMusicScore"                         
[136] "SOFNoMusicEnemies"                    
[137] "SOFNoMusicFriendlies"                 
[138] "SOFNoMusicTime"                       
[139] "SOFMusicEnemies"                      
[140] "SOFMusicFriendlies"                   
[141] "SOFMusicTime"                         
[142] "GameComments"                         
[143] "DoNotUseVideoGamePerformanceData"     
[144] "ConfrontationalAngryMusicScore"       
[145] "ConfrontationalExcitingMusicScore"    
[146] "ConfrontationalNeutralMusicScore"     
[147] "ConfrontationalAngryRecallScore"      
[148] "ConfrontationalExcitingRecallScore"   
[149] "ConfrontationalNeutralRecallScore"    
[150] "NonconfrontationalAngryMusicScore"    
[151] "NonconfrontationalExcitingMusicScore" 
[152] "NonconfrontationalNeutralMusicScore"  
[153] "NonconfrontationalAngryRecallScore"   
[154] "NonconfrontationalExcitingRecallScore"
[155] "NonconfrontationalNeutralRecallScore" 
[156] "ConfrontationalAngerScore"            
[157] "ConfrontationalExcitingScore"         
[158] "ConfrontationalNeutralScore"          
[159] "NonconfrontationalAngerScore"         
[160] "NonconfrontationalExcitingScore"      
[161] "NonconfrontationalNeutralScore"       
[162] "Usable"                               
[163] "DoNotUse"                             
[164] "ProblemDetails"                       
[165] "DinerDashWithMusicScore"              
[166] "DinerDashWithoutMusicScore"           
[167] "MusicCondition"                       
[168] "ZDinerDashWithMusicScore"             
[169] "ZDinerDashWithoutMusicScore"          
[170] "ZSOFNoMusicEnemies"                   
[171] "ZSOFMusicEnemies"                     
[172] "DinerDashDifferenceScore"             
[173] "SOFDifferenceScore"                   
[174] "PleasantScoreForAngryMusic"           
[175] "PleasantScoreForExcitingMusic"        
[176] "PleasantScoreForNeutralMusic"         
[177] "AngryScoreForAngryMusic"              
[178] "AngryScoreForExcitingMusic"           
[179] "AngryScoreForNeutralMusic"            
[180] "ExcitedScoreForExcitingMusic"         
[181] "ExcitedScoreForNeutralMusic"          
[182] "ActiveScoreForExcitingMusic"          
[183] "ActiveScoreForNeutralMusic"           
[184] "ExcitedScoreForAngryMusic"            
[185] "ActiveScoreForAngryMusic"

See if you can figure out what range the Likert scores are in. What’s the highest number on the Likert scale, and what’s the lowest? (Hint: d$Game1Angry1 is one of the Likert rating columns, and you may want to use unique, range, or hist.)

## your code here
min(d$Game1Angry1, na.rm = T)

[1] 1

max(d$Game1Angry1, na.rm = T)

[1] 7

Highest number: 7
Lowest number: 1
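The code above only checks Game1Angry1. Here is a sketch for checking all of the game rating columns at once, using a hypothetical helper, likert_range(). It assumes that every column whose name starts with “Game” plus a digit is a rating, except the Game*Intro columns, which hold text such as “ok”. The toy frame reuses a few values from subject 1’s row so the block runs on its own; on the real data you would call likert_range(d).

```r
library(dplyr)

# Hypothetical helper: overall range of the Game<N>... rating columns,
# dropping the Game<N>Intro columns, which contain text rather than ratings
likert_range <- function(df) {
  ratings <- df %>%
    select(matches("^Game[0-9]"), -ends_with("Intro"))
  range(unlist(ratings), na.rm = TRUE)
}

# Toy input: a few values copied from subject 1's row of head(d)
toy <- data.frame(
  Subject             = 1,
  Game1Angry1         = 6,
  Game1ExcitedFriends = 1,
  Game1Intro          = "ok",
  Game2Angry1         = 6
)

likert_range(toy)
```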

cleaning up a bit

First, we’ll get rid of rows and columns of the data that we don’t need.

filter out excluded rows

First, we need to filter out any rows that should be excluded. According to the report, there are two exclusions:

“exclude data from participant 2 and participant 23
participant 2 is female, and this is a males only study
participant 23 was set up on part 2 of the study (the music ratings) twice and never did part 1”

You can see participant 23’s data and the fact that they did not do part 1 by looking at the last rows of the dataframe:

tail(d) %>% kable()

	Subject	Cond	Exper	Inifile	Date	Time	Game1Angry1	Game1Angry2	Game1Angry3	Game1AngryFriends	Game1AngryStrangers	Game1CalmFriends	Game1CalmStrangers	Game1ExcitedFriends	Game1ExcitedStrangers	Game1Exciting1	Game1Exciting2	Game1Exciting3	Game1Intro	Game1Neutral1	Game1Neutral2	Game1Neutral3	Game2Angry1	Game2Angry2	Game2Angry3	Game2AngryFriends	Game2AngryStrangers	Game2CalmFriends	Game2CalmStrangers	Game2ExcitedFriends	Game2ExcitedStrangers	Game2Exciting1	Game2Exciting2	Game2Exciting3	Game2Intro	Game2Neutral1	Game2Neutral2	Game2Neutral3	Game3Angry1	Game3Angry2	Game3Angry3	Game3AngryFriends	Game3AngryStrangers	Game3CalmFriends	Game3CalmStrangers	Game3ExcitedFriends	Game3ExcitedStrangers	Game3Exciting1	Game3Exciting2	Game3Exciting3	Game3Intro	Game3Neutral1	Game3Neutral2	Game3Neutral3	Game4Angry1	Game4Angry2	Game4Angry3	Game4AngryFriends	Game4AngryStrangers	Game4CalmFriends	Game4CalmStrangers	Game4ExcitedFriends	Game4ExcitedStrangers	Game4Exciting1	Game4Exciting2	Game4Exciting3	Game4Intro	Game4Neutral1	Game4Neutral2	Game4Neutral3	MusicSelectionEnd	MusicSelectionInstrx	RecallSelectionEnd	RecallSelectionInstrx	Subject2	Cond2	Exper_A	Inifile_A	Date_A	Time_A	DescribeMusic	HowActiveAngry1	HowActiveAngry2	HowActiveAngry3	HowActiveExciting1	HowActiveExciting2	HowActiveExciting3	HowActiveNeutral1	HowActiveNeutral2	HowActiveNeutral3	HowAngryAngry1	HowAngryAngry2	HowAngryAngry3	HowAngryExciting1	HowAngryExciting2	HowAngryExciting3	HowAngryNeutral1	HowAngryNeutral2	HowAngryNeutral3	HowExcitedAngry1	HowExcitedAngry2	HowExcitedAngry3	HowExcitedExciting1	HowExcitedExciting2	HowExcitedExciting3	HowExcitedNeutral1	HowExcitedNeutral2	HowExcitedNeutral3	HowPleasantAngry1	HowPleasantAngry2	HowPleasantAngry3	HowPleasantExciting1	HowPleasantExciting2	HowPleasantExciting3	HowPleasantNeutral1	HowPleasantNeutral2	HowPleasantNeutral3	MusicRatingEnd	MusicRatingInstrx	WhichGames	aboutyou	age	distractions	endinstructions	ethnicity	overlooking	race	sex	whatabout	year	Subject3	DDNoMusicLevel	
DDNoMusicScore	DDMusicLevel	DDMusicScore	SOFNoMusicEnemies	SOFNoMusicFriendlies	SOFNoMusicTime	SOFMusicEnemies	SOFMusicFriendlies	SOFMusicTime	GameComments	DoNotUseVideoGamePerformanceData	ConfrontationalAngryMusicScore	ConfrontationalExcitingMusicScore	ConfrontationalNeutralMusicScore	ConfrontationalAngryRecallScore	ConfrontationalExcitingRecallScore	ConfrontationalNeutralRecallScore	NonconfrontationalAngryMusicScore	NonconfrontationalExcitingMusicScore	NonconfrontationalNeutralMusicScore	NonconfrontationalAngryRecallScore	NonconfrontationalExcitingRecallScore	NonconfrontationalNeutralRecallScore	ConfrontationalAngerScore	ConfrontationalExcitingScore	ConfrontationalNeutralScore	NonconfrontationalAngerScore	NonconfrontationalExcitingScore	NonconfrontationalNeutralScore	Usable	DoNotUse	ProblemDetails	DinerDashWithMusicScore	DinerDashWithoutMusicScore	MusicCondition	ZDinerDashWithMusicScore	ZDinerDashWithoutMusicScore	ZSOFNoMusicEnemies	ZSOFMusicEnemies	DinerDashDifferenceScore	SOFDifferenceScore	PleasantScoreForAngryMusic	PleasantScoreForExcitingMusic	PleasantScoreForNeutralMusic	AngryScoreForAngryMusic	AngryScoreForExcitingMusic	AngryScoreForNeutralMusic	ExcitedScoreForExcitingMusic	ExcitedScoreForNeutralMusic	ActiveScoreForExcitingMusic	ActiveScoreForNeutralMusic	ExcitedScoreForAngryMusic	ActiveScoreForAngryMusic
86	87	1	C:151Part1.exp	default.mlp	13644633600	40065	1	3	4	6	7	1	1	1	1	1	1	1	ok	2	2	3	5	5	7	1	7	4	4	2	2	5	1	1	ok	1	1	1	5	3	6	1	2	5	6	4	2	1	1	1	ok	5	1	2	3	1	4	1	1	7	7	7	7	2	5	5	ok	5	5	4	ok	ok	ok	ok	87	1	C:151Part2.exp	default.mlp	13644633600	42314	2	5	5	4	5	5	5	1	1	1	3	5	1	1	1	1	1	5	1	4	4	4	3	4	4	1	2	1	3	3	4	2	4	3	3	3	2	ok	ok	ok	ok	20	ok	ok	2	ok	2	1	ok	2	87	3	0	3	170	15	0	13140	25	1	23160	Participant died, restart	1	4.166667	1.666667	1.666667	6.50	1.25	1.75	3.666667	2.500000	3.666667	1.25	4.25	5.75	5.1	1.5	1.7	2.7	3.2	4.5	1	NA		5170	5000	Anger	-1.0204467	0.2692740	-0.1401958	0.5785486	-1.2897207	0.7187444	3.333333	3.000000	2.666667	3.000000	1.000000	2.333333	3.666667	1.333333	5.000000	1.000000	4.000000	4.666667
87	88	6	C:151Part1.exp	default.mlp	13644633600	51237	7	7	5	4	1	4	4	7	4	7	7	6	ok	2	1	1	7	7	4	1	1	5	6	7	4	7	1	1	ok	1	1	1	2	1	7	1	1	7	2	7	3	2	1	1	ok	4	6	2	2	1	7	3	4	2	6	7	7	4	1	2	ok	5	3	1	ok	ok	ok	ok	88	6	C:151Part2.exp	default.mlp	13644633600	53402	2	5	5	5	5	5	5	2	2	1	5	5	1	3	1	2	1	1	1	5	5	5	5	5	5	1	5	5	1	1	5	5	5	2	5	5	5	ok	ok	ok	ok	18	ok	ok	2	ok	1	1	ok	1	88	3	0	3	866	24	0	23460	27	0	22380		NA	6.166667	4.833333	1.166667	2.50	5.50	4.50	3.333333	1.833333	3.500000	1.75	6.00	5.50	4.7	5.1	2.5	2.7	3.5	4.3	1	NA		5866	5000	Neutral	-0.0216721	0.2692740	1.0044959	0.8387424	-0.2909461	-0.1657534	2.333333	4.000000	5.000000	3.666667	2.000000	1.000000	5.000000	3.666667	5.000000	1.666667	5.000000	5.000000
88	89	2	C:151Part1.exp	default.mlp	13644633600	54293	7	6	6	7	5	3	2	7	6	3	5	2	ok	1	2	1	6	4	6	7	2	3	1	7	5	1	3	1	ok	1	2	2	2	4	4	1	1	6	4	3	6	5	5	6	ok	4	1	6	1	1	1	1	1	7	5	7	5	5	4	7	ok	5	5	3	ok	ok	ok	ok	89	2	C:151Part2.exp	default.mlp	13644633600	56552	2	5	3	4	4	5	5	1	2	1	5	5	4	2	3	1	1	1	1	5	5	4	3	4	5	2	2	1	3	3	2	3	3	5	4	4	5	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	89	2	3280	3	820	7	0	8880	31	0	23100		1	5.833333	2.500000	1.500000	5.25	6.25	2.25	2.166667	5.333333	4.000000	1.00	4.25	5.25	5.6	4.0	1.8	1.7	4.9	4.5	1	NA		5820	3280	Exciting	-0.0876830	-1.1667773	-1.1576995	1.3591301	1.0790942	2.5168296	2.666667	3.666667	4.333333	4.666667	2.000000	1.000000	4.000000	1.666667	4.666667	1.333333	4.666667	4.000000
89	90	3	C:151Part1.exp	default.mlp	13644633600	58190	5	5	5	7	7	1	1	4	1	1	1	1	ok	1	1	6	5	1	7	7	7	1	1	1	4	3	2	2	ok	1	3	1	1	1	5	2	2	7	6	7	7	2	1	1	ok	4	4	7	1	3	1	3	3	5	4	7	7	2	4	5	ok	1	2	5	ok	ok	ok	ok	90	3	C:151Part2.exp	default.mlp	13644633600	60558	2	5	5	3	5	5	5	1	1	1	5	5	3	3	1	1	1	1	1	5	5	5	4	5	4	1	1	2	2	1	3	1	5	2	4	4	5	ok	ok	ok	ok	18	ok	ok	2	ok	2	1	ok	1	90	2	3040	3	0	22	2	28440	26	0	25500		NA	4.666667	1.666667	2.166667	7.00	3.25	1.00	2.000000	2.500000	3.833333	2.25	7.00	6.00	5.6	2.3	1.7	2.1	4.3	4.7	1	NA		5000	3040	Neutral	-1.2644002	-1.3671565	0.7501199	0.7086455	0.1027563	-0.0414744	2.000000	2.666667	4.333333	4.333333	1.666667	1.000000	4.333333	1.333333	5.000000	1.000000	5.000000	4.333333
90	23	NA			NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA					23	1	C:151Part2.exp	default.mlp	13643078400	61329	2	4	5	5	3	3	3	3	4	3	3	3	2	3	2	2	2	2	2	4	4	5	5	5	3	3	4	4	1	1	1	1	2	1	3	3	3	ok	ok	ok	ok	20	ok	ok	2	ok	1	1	ok	2	23	2	3990	3	750	9	2	19260	18	2	24120		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	1	Participant 23 was set up on part 2 of the survey when he was supposed to be set up on part 1; he did part 2 twice; data should be excluded entirely	5750	3990	NA	-0.1881345	-0.5739887	-0.9033236	-0.3321298	0.3858541	0.5711938	1.000000	1.333333	3.000000	2.666667	2.333333	2.000000	4.333333	3.666667	3.000000	3.333333	4.333333	4.666667
91	23	NA			NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA					23	1	C:151Part2.exp	default.mlp	13643078400	63502	2	4	3	5	4	3	5	4	4	2	2	3	2	3	3	1	2	2	1	3	3	3	3	5	4	3	4	3	2	2	1	2	5	3	5	5	1	ok	ok	ok	ok	20	ok	ok	2	ok	1	1	ok	2	23	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA		NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	1	Participant 23 was set up on part 2 of the survey when he was supposed to be set up on part 1; he did part 2 twice; data should be excluded entirely	NA	NA	NA	NA	NA	NA	NA	NA	NA	1.666667	3.333333	3.666667	2.333333	2.333333	1.666667	4.000000	3.333333	4.000000	3.333333	3.000000	4.000000

Notice that participant 23 has missing values for part 1.

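The researchers have made a column called DoNotUse based on their exclusion criteria. Use this column to filter the dataframe! (Hint: this is trickier than it looks because of how R treats NA values. DoNotUse is 1 for excluded subjects and NA, not 0, for everyone else, and a comparison such as DoNotUse != 1 returns NA for those rows, which filter() then drops. You may want to use ?unique to check the values in the column, and check out ?is.na.)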

filtered_d = d %>% 
  filter(is.na(DoNotUse)) # your code here: exclude subjects that are marked as "DoNotUse"
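To see why the is.na() version is needed, here is a small sketch on a toy data frame whose DoNotUse values are copied from subjects 1, 2, 3, and 23 in the tables above:

```r
library(dplyr)

# DoNotUse values copied from the tables above: NA = usable, 1 = excluded
toy <- data.frame(Subject = c(1, 2, 3, 23), DoNotUse = c(NA, 1, NA, 1))

# NA != 1 evaluates to NA, and filter() drops rows whose condition is NA,
# so this comparison silently discards the usable subjects as well
wrong <- toy %>% filter(DoNotUse != 1)

# is.na() always returns TRUE or FALSE, so it keeps exactly the usable rows
right <- toy %>% filter(is.na(DoNotUse))

nrow(wrong)    # 0
right$Subject  # 1 3
```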

It’s good practice to assign a new variable name (in this case filtered_d) to a data frame when you change it in an important way, or when you apply a code chunk that shouldn’t be run twice. This helps prevent you from seeing different results when you run your code chunk by chunk (where you might run one chunk multiple times, or skip one) versus when you knit the document.

get rid of unnecessary columns

The dataset contains a bunch of columns we don’t care about:

The dataset contains three subject columns (Subject, Subject2, and Subject3), which are identical except for a single NA that is not mentioned in the protocol, and so is likely an error.
Columns recording the path to the experiment file run for each part of the experiment (Exper and Exper_A), which we don’t need.
Etc.

To get rid of these, we’ll use the select function to take only the columns we need.

filtered_d = filtered_d %>% 
  select(c("Subject", "Cond"), # Generally important columns for both hypotheses
         contains("Game"), # we want all the game columns for hypothesis 1
         -contains("Intro"), -c("WhichGames", "GameComments"), # except these
         starts_with("DinerDashWith"), c("SOFMusicEnemies", "SOFNoMusicEnemies")) # These columns are for hypothesis 2
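Because select() evaluates its arguments in order, each minus sign removes matching columns from whatever has been selected so far, and later positive selectors add more columns. Note also that contains("Game") picks up DoNotUseVideoGamePerformanceData, which is why that column is still available for the filter in the next step. A small sketch using a handful of real column names with placeholder values:

```r
library(dplyr)

# One-row toy frame: column names copied from colnames(d), values are placeholders
cols <- c("Subject", "Cond", "Exper", "Game1Angry1", "Game1Intro", "WhichGames",
          "GameComments", "DoNotUseVideoGamePerformanceData",
          "DinerDashWithMusicScore", "SOFMusicEnemies")
toy <- as.data.frame(setNames(as.list(rep(0, length(cols))), cols))

# Same pattern as the select() call above
kept <- toy %>%
  select(c("Subject", "Cond"),
         contains("Game"),                                   # also matches ...VideoGame...
         -contains("Intro"), -c("WhichGames", "GameComments"),  # removed from the selection
         starts_with("DinerDashWith"), c("SOFMusicEnemies"))    # added afterwards

names(kept)
```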

Even better, let’s split this into separate data frames for hypothesis 1 and hypothesis 2, since they come from different types of experiments with different measurements, and therefore need different analyses. Now that we’ve cleaned up the data, this is pretty easy to do: we’ll just drop the columns that belong to the other hypothesis. The select function lets us choose columns to remove (instead of columns to keep) by putting a minus sign in front of them. First, let’s create a dataset for the rating hypothesis by getting rid of the game performance columns:

rating_hyp_d = filtered_d %>% 
  filter(is.na(DoNotUseVideoGamePerformanceData)) %>% # first, let's get rid of the subjects who did so poorly on one game that their data is unusable
  select(-DoNotUseVideoGamePerformanceData, # now get rid of that column
         -starts_with("DinerDash"), # and the other columns we don't need
         -starts_with("SOF"))

Now you try! Fill in the selection criteria to get rid of the “Game” columns, which we don’t need for the performance hypothesis. (It’s simpler than the code block above, because you don’t need to do a filter first, only a select.)

performance_hyp_d = filtered_d %>% 
  select(-contains("Game")) # your code here: remove the columns containing "Game" in the name
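A quick sanity check after splitting, sketched with a hypothetical helper, shared_measures(): apart from the ID columns, the two data frames should share no measurement columns. The toy stand-ins use column names and values shown elsewhere in this document; on the real data you would call shared_measures(rating_hyp_d, performance_hyp_d).

```r
# Hypothetical helper: non-ID columns that appear in both data frames
shared_measures <- function(a, b, ids = c("Subject", "Cond")) {
  setdiff(intersect(names(a), names(b)), ids)
}

# Toy stand-ins built from subject 1's values shown in this document
rating_toy      <- data.frame(Subject = 1, Cond = 2, Game1Angry1 = 6, Game1AngryFriends = 2)
performance_toy <- data.frame(Subject = 1, Cond = 2,
                              DinerDashWithMusicScore = 5830, SOFMusicEnemies = 19)

shared_measures(rating_toy, performance_toy)  # character(0): nothing overlaps
```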

Converting to long form

Now we want to convert the data to long form, to make the rest of our manipulations easier. To do this, we can use pivot_longer on the target columns. It takes many columns and turns their names into entries of a single “key” column (named with names_to), while the values that were in those columns become entries of a single “value” column (named with values_to). It’s easiest to see with an example:

tiny_demo_d = head(performance_hyp_d, 2) # get just the first two subjects' performance data, for a demo

First, take a look at the original wide-form data:

tiny_demo_d

  Subject Cond DinerDashWithMusicScore DinerDashWithoutMusicScore
1       1    2                    5830                       5000
2       3    1                    5370                       1250
  SOFMusicEnemies SOFNoMusicEnemies
1              19                22
2              23                15

Now, take a look at the long-form version:

tiny_demo_d %>% pivot_longer(cols=-c("Subject", "Cond"), # this tells it to transform all columns *except* these ones
                             names_to='Measurement', 
                             values_to='Value')

# A tibble: 8 × 4
  Subject  Cond Measurement                Value
    <dbl> <dbl> <chr>                      <dbl>
1       1     2 DinerDashWithMusicScore     5830
2       1     2 DinerDashWithoutMusicScore  5000
3       1     2 SOFMusicEnemies               19
4       1     2 SOFNoMusicEnemies             22
5       3     1 DinerDashWithMusicScore     5370
6       3     1 DinerDashWithoutMusicScore  1250
7       3     1 SOFMusicEnemies               23
8       3     1 SOFNoMusicEnemies             15

See how the columns have been converted into rows (except for the two we excluded), and the dataset has gone from wide to long?

Now let's actually convert the performance dataset:

performance_hyp_long_d = performance_hyp_d %>%
  pivot_longer(cols=-c("Subject", "Cond"),
               names_to='Measurement', 
               values_to='Score')

head(performance_hyp_long_d)

# A tibble: 6 × 4
  Subject  Cond Measurement                Score
    <dbl> <dbl> <chr>                      <dbl>
1       1     2 DinerDashWithMusicScore     5830
2       1     2 DinerDashWithoutMusicScore  5000
3       1     2 SOFMusicEnemies               19
4       1     2 SOFNoMusicEnemies             22
5       3     1 DinerDashWithMusicScore     5370
6       3     1 DinerDashWithoutMusicScore  1250

And now you can convert the rating dataset! (Call the "key" column "Measurement" and the "value" column "Rating", so that the code below will work.)

rating_hyp_long_d = rating_hyp_d %>%
  pivot_longer(cols=-c("Subject", "Cond"), 
               names_to = "Measurement",
               values_to = "Rating")

head(rating_hyp_long_d)

# A tibble: 6 × 4
  Subject  Cond Measurement         Rating
    <dbl> <dbl> <chr>                <dbl>
1       1     2 Game1Angry1              6
2       1     2 Game1Angry2              6
3       1     2 Game1Angry3              5
4       1     2 Game1AngryFriends        2
5       1     2 Game1AngryStrangers      5
6       1     2 Game1CalmFriends         2

Splitting columns

The Measurement column in each dataset now packs several different kinds of information into one string. Really, we would like these to be separate columns. For example, we could have one column telling you which video game it is, and one telling you whether there was music. The tidyverse contains some handy functions for splitting columns, but unfortunately the measurement names here are not well suited to them. If the different pieces of information always had the same length, or were separated by a symbol like "." or "_", it would be easy. Thus we'll have to do a bit of manual testing.

We can use the mutate function in dplyr to create new columns as functions of old ones (or to alter existing columns). We'll also use the grepl function, which tests whether a regular expression (a fancy type of search pattern) matches somewhere in each measurement name. For most of your purposes, you can probably just use grepl to search for plain strings. However, regular expressions have some other quite useful features, like the "or" operator (|) we use below.
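To see why consistent separators would make this easy, here is a minimal sketch using tidyr's separate_wider_delim() (available in tidyr 1.3.0 and later). The underscore-delimited names such as "SOF_NoMusic_Enemies" are hypothetical. The real measurement names in this dataset don't follow this pattern, which is why we fall back on grepl. The values are borrowed from subject 1 in the output above.

```r
library(tibble)
library(tidyr)

# Hypothetical measurement names with "_" between the pieces of information
toy_d <- tibble(
  Subject = c(1, 1, 1, 1),
  Measurement = c("SOF_NoMusic_Enemies", "SOF_Music_Enemies",
                  "DinerDash_NoMusic_Score", "DinerDash_Music_Score"),
  Value = c(22, 19, 5000, 5830)
)

# Split Measurement into three new character columns at each "_"
split_d <- separate_wider_delim(toy_d, cols = Measurement, delim = "_",
                                names = c("Game", "Music", "Measure"))
split_d
```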

performance_hyp_long_d = performance_hyp_long_d %>% 
  mutate(ConfrontationalGame = grepl("SOF", Measurement), # TRUE if the measurement comes from the confrontational game, Soldier of Fortune (SOF)
         WithMusic = !grepl("NoMusic|WithoutMusic", Measurement), # FALSE if the measurement name contains *either* "NoMusic" or "WithoutMusic"
         MusicCondition = factor(ifelse(Cond > 3, Cond-3, Cond), levels = 1:3, labels = c("Anger", "Exciting", "Neutral"))) # map conditions 4-6 onto 1-3 and replace the uninterpretable numeric codes with readable labels
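The MusicCondition line assumes that conditions 4 to 6 used the same music as conditions 1 to 3, in that order. This assumption comes from how the code treats them. Running the same expression on the six condition codes in isolation makes the mapping visible. The arithmetic version at the end is just an equivalent alternative, not something the original analysis uses.

```r
cond <- 1:6

# Conditions above 3 are shifted down by 3, then the codes get readable labels
music_condition <- factor(ifelse(cond > 3, cond - 3, cond),
                          levels = 1:3,
                          labels = c("Anger", "Exciting", "Neutral"))
music_condition

# Equivalent arithmetic: wrap 1..6 around onto 1..3
wrapped <- (cond - 1) %% 3 + 1
wrapped
```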

Now you can help! For the rating dataset, write a test on the measurement name, using grepl or %in%, to figure out whether it's a recall rating or a music rating. Your new IsRecall column should be TRUE if the measurement name contains either "Friends" or "Strangers".

rating_hyp_long_d = rating_hyp_long_d %>%
  mutate(
    IsRecall = grepl("Friends|Strangers", Measurement) # your code here: TRUE for recall ratings
  )
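grepl and %in% behave differently. grepl() searches for a pattern anywhere inside each string, while %in% only reports exact, whole-string matches. A quick base-R check on three measurement names taken from the output above:

```r
measurements <- c("Game1Angry1", "Game1AngryFriends", "Game1CalmFriends")

# Pattern search: "|" means "or", so this is TRUE if either word appears anywhere
is_recall <- grepl("Friends|Strangers", measurements)
is_recall

# Exact matching: no full name equals "Friends" or "Strangers", so all FALSE
exact <- measurements %in% c("Friends", "Strangers")
exact
```

So %in% only helps once the relevant piece of the name has been pulled out into its own column.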

Here are a couple of other useful ways of manipulating columns. (You won't remember all the functions you see here now, but that's okay. You can always refer back to this tutorial later if there's something you need to figure out how to do.)

rating_hyp_long_d = rating_hyp_long_d %>%
  mutate(
    GameNumber = as.numeric(substr(Measurement, 5, 5)), # the 5th character of e.g. "Game1Angry1" is the game number
    ConfrontationalGame = GameNumber <= 2, # in a mutate, we can use a column we created (or changed) right away. Games 1 and 2 are confrontational, games 3 and 4 are not.
    Emotion = stringr::str_extract(Measurement, "Angry|Neutral|Excited|Exciting|Calm"), # str_extract comes from the stringr package
    Emotion = ifelse(Emotion == "Excited", "Exciting", # this just gets rid of some annoying labeling choices
              ifelse(Emotion == "Calm", "Neutral", Emotion))
    )
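To see what each step of that mutate produces, here is the same logic applied to a plain character vector. It uses base R's regmatches() and regexpr() as a stand-in for stringr's str_extract(). "Game1Angry1" and "Game1CalmFriends" appear in the output above. "Game3Excited2" is a made-up name for illustration.

```r
measurements <- c("Game1Angry1", "Game3Excited2", "Game1CalmFriends")

# The 5th character is the game number; games 1 and 2 are confrontational
game_number <- as.numeric(substr(measurements, 5, 5))
confrontational <- game_number <= 2

# Pull out the first emotion word (like stringr::str_extract with the same pattern)
emotion <- regmatches(measurements,
                      regexpr("Angry|Neutral|Excited|Exciting|Calm", measurements))

# Collapse the synonymous labels
emotion <- ifelse(emotion == "Excited", "Exciting",
                  ifelse(emotion == "Calm", "Neutral", emotion))

data.frame(measurements, game_number, confrontational, emotion)
```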

Groups, Summaries, and Results

Performance Hypothesis

For the performance data, we need to do a little bit of manipulation of the columns in order to get to the performance measures the experimenters actually used. Because they want to compare changes in performance across games that have very different scoring systems, the easiest solution is to compare z-scores. The way they did this was to z-score performance before music, z-score performance after music, and then create a difference measure which is a difference of z-scores. (To my mind, this is actually not quite the correct way to analyze this data, but like the replication we will follow the original authors.)

We'll add a new z-scored value column. However, we have to be careful! We want to z-score within groups of rows that all share the same type of measurement. For example, we want to z-score the "DinerDashWithMusic" scores with respect to each other, but not with respect to the scores from the other game. We can use the group_by function to set groups, and then all the changes we apply will only occur within those groups until we ungroup the dataset.

To make this more concrete, let’s see how the group_by function can let us compute means within different groups, for example mean scores on the two different games.

performance_hyp_long_d %>% 
  group_by(ConfrontationalGame) %>%
  summarize(AvgScore = mean(Score, na.rm=T)) # the na.rm tells R to ignore NA values

# A tibble: 2 × 2
  ConfrontationalGame AvgScore
  <lgl>                  <dbl>
1 FALSE                 5288. 
2 TRUE                    18.3

This makes it clear why we can’t just z-score the games together! The scores are very different between games. So let’s z-score within groups (using the scale function):

performance_hyp_long_d = performance_hyp_long_d %>%
  group_by(ConfrontationalGame, WithMusic) %>% # we're going to compute four sets of z-scores, one for the confrontational game without music, one for the confrontational game with, one for the nonconfrontational game without music, and one for the nonconfrontational game with
  mutate(z_scored_performance = scale(Score)) %>%
  ungroup()
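A z-score is just (x - mean(x)) / sd(x). Here is a minimal sketch with made-up scores for two games on very different scales, showing that grouping makes each game's z-scores comparable. It also shows that scale() returns a one-column matrix rather than a plain vector. This is why columns derived from z_scored_performance print with a [,1] suffix later in this tutorial. Wrapping scale() in as.numeric() gives an ordinary numeric column.

```r
library(dplyr)

# Made-up scores: game A is scored in tens, game B in thousands
toy_d <- tibble(
  Game  = c("A", "A", "A", "B", "B", "B"),
  Score = c(10, 20, 30, 1000, 2000, 3000)
)

z_d <- toy_d %>%
  group_by(Game) %>%                                 # z-score separately within each game
  mutate(z_manual = (Score - mean(Score)) / sd(Score),
         z_scale  = as.numeric(scale(Score))) %>%    # as.numeric drops the matrix shape
  ungroup()
z_d

is.matrix(scale(c(10, 20, 30)))                      # scale() on its own returns a matrix
```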

Rating Hypothesis

The rating hypothesis analysis also requires some grouped manipulation. The experimenters collected repeated ratings in each emotion category and each music/recall category for each game. For this analysis, they averaged the ratings within each combination of two variables, the emotion and the game type, to produce a compact summary. Your job is to implement this, calling the new variable MeanRating, and save the summarized data in a new data frame called rating_summary_d. (Hint: use a group_by and a summarize.)

rating_summary_d = rating_hyp_long_d %>% 
  group_by(ConfrontationalGame, Emotion) %>% 
  summarise(MeanRating = mean(Rating, na.rm=T))

`summarise()` has grouped output by 'ConfrontationalGame'. You can override
using the `.groups` argument.
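The message above is dplyr telling you that the result is still grouped by ConfrontationalGame: summarise() drops only the last grouping variable. Passing .groups = "drop" returns an ungrouped tibble and silences the message. Here is a sketch on made-up ratings:

```r
library(dplyr)

# Made-up ratings: two game types x two emotions, two ratings each
toy_d <- tibble(
  ConfrontationalGame = rep(c(TRUE, FALSE), each = 4),
  Emotion = rep(c("Angry", "Angry", "Neutral", "Neutral"), times = 2),
  Rating  = c(5, 4, 2, 1, 2, 3, 4, 4)
)

toy_summary_d <- toy_d %>%
  group_by(ConfrontationalGame, Emotion) %>%
  summarise(MeanRating = mean(Rating, na.rm = TRUE),
            .groups = "drop")            # return an ungrouped result, no message
toy_summary_d
```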

Let’s take a look at the result:

rating_summary_d

# A tibble: 6 × 3
# Groups:   ConfrontationalGame [2]
  ConfrontationalGame Emotion  MeanRating
  <lgl>               <chr>         <dbl>
1 FALSE               Angry          2.72
2 FALSE               Exciting       3.97
3 FALSE               Neutral        3.68
4 TRUE                Angry          4.68
5 TRUE                Exciting       3.05
6 TRUE                Neutral        2.16

And a simple bar plot (don’t worry too much about what exactly this code is doing):

ggplot(rating_summary_d, aes(x=ConfrontationalGame, y=MeanRating, fill=Emotion)) +
  geom_bar(position="dodge", stat="identity") + 
  scale_fill_brewer(palette="Set1")

Up to reordering (and the fact that we didn't compute error bars), this is a pretty decent replication of Fig. 1 from the original Tamir et al. paper. Angry music or recall was rated highest of the three emotions for the confrontational game, and lowest of the three for the non-confrontational game.

And the long form dataset makes it easy to run a linear model (don’t worry too much about this, we’ll talk more about it in 252).

model = lm(Rating ~ ConfrontationalGame * Emotion, rating_hyp_long_d)
summary(model)


Call:
lm(formula = Rating ~ ConfrontationalGame * Emotion, data = rating_hyp_long_d)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6787 -1.1553 -0.0468  1.3170  4.8447 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                              2.71702    0.07965  34.114   <2e-16
ConfrontationalGameTRUE                  1.96170    0.11264  17.416   <2e-16
EmotionExciting                          1.25319    0.11264  11.126   <2e-16
EmotionNeutral                           0.96596    0.11264   8.576   <2e-16
ConfrontationalGameTRUE:EmotionExciting -2.88511    0.15929 -18.112   <2e-16
ConfrontationalGameTRUE:EmotionNeutral  -3.48936    0.15929 -21.905   <2e-16
                                           
(Intercept)                             ***
ConfrontationalGameTRUE                 ***
EmotionExciting                         ***
EmotionNeutral                          ***
ConfrontationalGameTRUE:EmotionExciting ***
ConfrontationalGameTRUE:EmotionNeutral  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.727 on 2814 degrees of freedom
Multiple R-squared:  0.1896,    Adjusted R-squared:  0.1882 
F-statistic: 131.7 on 5 and 2814 DF,  p-value: < 2.2e-16

Performance Hypothesis (Continued)

There are still a few more steps to go for the performance hypothesis. We need to take a difference score to see how people improved from before hearing the music to after, and then see if the improvement is larger if they heard music congruent with the type of game.

To compute the difference score, we have to make our data a bit wider. We now want to subtract the pre-music scores from the post-music scores, which is easiest to do if they are in two different columns. To do this we'll use the pivot_wider function (which is more or less the opposite of pivot_longer):

performance_diff_d = performance_hyp_long_d %>%
  mutate(WithMusic = factor(WithMusic, levels=c(F, T), labels=c("PreMusic", "PostMusic"))) %>% # first, tweak the variable so our code is easier to read.
  select(-c("Score", "Measurement")) %>% # now we remove columns we don't need (bonus: leave them in and see if you can understand what goes wrong!)
  pivot_wider(names_from=WithMusic, 
              values_from=z_scored_performance) %>%
  mutate(ImprovementScore=PostMusic-PreMusic)
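Here is a tiny sketch of the same widening step on made-up z-scores, assuming tidyr and dplyr. It also illustrates what can go wrong. A column that differs from row to row (here a made-up raw Score) is treated by pivot_wider() as part of each row's identity. As a result, the pre- and post-music values never land in the same row, and you get NAs.

```r
library(dplyr)
library(tidyr)

# Made-up long data: one z-score per subject per phase
long_d <- tibble(
  Subject = c(1, 1, 2, 2),
  Phase   = c("PreMusic", "PostMusic", "PreMusic", "PostMusic"),
  z       = c(0.5, 1.0, -0.2, -0.6)
)

# One row per subject, with a column for each phase
wide_d <- long_d %>%
  pivot_wider(names_from = Phase, values_from = z) %>%
  mutate(Improvement = PostMusic - PreMusic)
wide_d

# Leaving a row-specific column in the data prevents the phases from matching up
wide_bad_d <- long_d %>%
  mutate(Score = c(10, 20, 30, 40)) %>%
  pivot_wider(names_from = Phase, values_from = z)
wide_bad_d   # 4 rows, each with one NA
```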

Let’s take a look at the end result:

performance_diff_d

# A tibble: 176 × 7
   Subject  Cond ConfrontationalGame MusicCondition PostMusic[,1] PreMusic[,1]
     <dbl> <dbl> <lgl>               <fct>                  <dbl>        <dbl>
 1       1     2 FALSE               Exciting             -0.0751        0.262
 2       1     2 TRUE                Exciting             -0.205         0.739
 3       3     1 FALSE               Anger                -0.732        -2.86 
 4       3     1 TRUE                Anger                 0.313        -0.150
 5       4     4 FALSE               Anger                 1.48          1.71 
 6       4     4 TRUE                Anger                -0.205        -1.68 
 7       5     5 FALSE               Exciting              1.24          0.311
 8       5     5 TRUE                Exciting              0.313         0.231
 9       6     6 FALSE               Neutral               0.710         0.960
10       6     6 TRUE                Neutral               0.442         0.866
# ℹ 166 more rows
# ℹ 1 more variable: ImprovementScore <dbl[,1]>

If you don’t understand every step of that code (or any other dplyr code), it can be helpful to look at the result of running just the first line, then just the first two lines, and so on.

Now we're finally ready to reproduce Fig. 2 from Tamir et al. We just need to get the mean improvement within each game and each kind of music, and save it to a variable called MeanImprovementScore:

performance_diff_summary_d = performance_diff_d %>%
  group_by(ConfrontationalGame, MusicCondition) %>% 
  summarise(MeanImprovementScore = mean(ImprovementScore, na.rm = T),
            SEMImprovementScore = sd(ImprovementScore, na.rm = T) / sqrt(sum(!is.na(ImprovementScore)))) # SEM = SD / sqrt(number of non-missing scores)
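The standard error of the mean (SEM) is the standard deviation divided by the square root of the number of observations. With missing data, that count should include only the non-missing values, whereas dplyr's n() counts every row in a group, including rows where the score is NA. Here is a base-R check on made-up numbers:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)   # made-up scores with one missing value

n_obs <- sum(!is.na(x))               # 8 non-missing values (length(x) is 9)
sem   <- sd(x, na.rm = TRUE) / sqrt(n_obs)
sem                                   # about 0.756

# Dividing by the count itself, rather than its square root, makes the SEM far too small
sd(x, na.rm = TRUE) / n_obs
```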

`summarise()` has grouped output by 'ConfrontationalGame'. You can override
using the `.groups` argument.

Let’s take a look at your result (if it has NA values, how can you fix it?):

performance_diff_summary_d

# A tibble: 6 × 4
# Groups:   ConfrontationalGame [2]
  ConfrontationalGame MusicCondition MeanImprovementScore SEMImprovementScore
  <lgl>               <fct>                         <dbl>               <dbl>
1 FALSE               Anger                       -0.179               0.0478
2 FALSE               Exciting                    -0.0182              0.0256
3 FALSE               Neutral                      0.114               0.0290
4 TRUE                Anger                        0.0612              0.0319
5 TRUE                Exciting                     0.169               0.0332
6 TRUE                Neutral                     -0.225               0.0350

and plot it!

ggplot(performance_diff_summary_d, aes(x=ConfrontationalGame, y=MeanImprovementScore, fill=MusicCondition)) +
  geom_bar(position="dodge", stat="identity") + 
  geom_errorbar(aes(ymin = MeanImprovementScore - SEMImprovementScore, 
                    ymax = MeanImprovementScore + SEMImprovementScore),
                position = position_dodge(width = 0.9), # 0.9 matches the default bar width, so each error bar sits on its bar
                width = 0.25) +
  scale_fill_brewer(palette="Set1")

(Bonus: also calculate the SEM in the summary data, and then add errorbars to the plot with geom_errorbar!)

This is not as close a replication of the original effect as Fig. 1 was. That agrees with the replication report, which says that the hypothesis 1 (rating) effect replicated but the hypothesis 2 (performance) effect did not. Here's a model just for thoroughness (again, don't worry too much about it):

performance_model = lm(ImprovementScore ~ ConfrontationalGame * MusicCondition, performance_diff_d)
summary(performance_model)


Call:
lm(formula = ImprovementScore ~ ConfrontationalGame * MusicCondition, 
    data = performance_diff_d)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5402 -0.6284 -0.0744  0.6253  2.8550 

Coefficients:
                                               Estimate Std. Error t value
(Intercept)                                     -0.1786     0.1895  -0.942
ConfrontationalGameTRUE                          0.2398     0.2760   0.869
MusicConditionExciting                           0.1603     0.2657   0.603
MusicConditionNeutral                            0.2926     0.2657   1.101
ConfrontationalGameTRUE:MusicConditionExciting  -0.0530     0.3815  -0.139
ConfrontationalGameTRUE:MusicConditionNeutral   -0.5786     0.3800  -1.523
                                               Pr(>|t|)
(Intercept)                                       0.347
ConfrontationalGameTRUE                           0.386
MusicConditionExciting                            0.547
MusicConditionNeutral                             0.272
ConfrontationalGameTRUE:MusicConditionExciting    0.890
ConfrontationalGameTRUE:MusicConditionNeutral     0.130

Residual standard error: 1.003 on 164 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.02179,   Adjusted R-squared:  -0.008033 
F-statistic: 0.7306 on 5 and 164 DF,  p-value: 0.6014