Part 1: Summary and Reaction

The paper my group and I were assigned to replicate was:

Carstensen, L. L., Shavit, Y. Z., & Barnes, J. T. (2020). Age Advantages in Emotional Experience Persist Even Under Threat From the COVID-19 Pandemic. Psychological Science, 31(11), 1374–1385. https://doi.org/10.1177/0956797620967261

Summary

Literature has shown evidence for emotional well-being improving with age, however the mechanisms to explain this phenomenon were not quite well known. The socioemotional selective theory (SST) argues that because older individuals have shorter time horizons, their goals shift to those which derive more emotional meaning, whereby younger individuals focus more on longer-term goals. In contrast, the strength and vulnerability integration (SAVI) model posits that age-related improvements are a result of avoidance of stressful situations, whereby older individuals avoid stressful situations more. To explore which theory better supports age-related changes in emotional experience, it was hypothesized that researchers would need to expose younger and older individuals to prolonged high-stress conditions. However due to this being highly unethical, this question could not be explored in test conditions.

However, Carstensen and colleagues sought to answer this question during the COVID-19 pandemic, the global pandemic providing an adequate environment when individuals were exposed to prolonged high-stress conditions. Extending on previous research, they aimed to explore the age-related differences in emotional well-being during the pandemic. They hypothesized that if avoidance was the reason for these age-related changes, then the environment of the pandemic would eliminate age differences, and the SAVI model would be supported. However, if they did find age related changes, they hypothesised that this would support the SST theory.

This study involved recruting a sample from Prolific during the 2020 COVID-19 period, the final sample size being 945 American adults, ranging from 18 to 76 year olds. Carstensen and colleagues provided them with a Qualtrics survey collecting their age, and frequency and intensity of positive and negative emotions experienced. They also collected other general demographic statistics (race, income, employment, education, occupants per household), personality traits, and Likert-scale data regarding their personality traits, health, perception of risks during the pandemic, and future time perspectives. Upon completion of the survey, participants received a $4 compensation, and the authors generated a range of descriptive statistics, inferential analyses and figures on the different variables collected.

The main finding of this study was that older age was associated with greater frequency and intensity of positive emotions, in comparison to negative emotions. Furthermore, they found that older ages were positively correlated with a greater perceived risk to COVID-19, however when this factor was controlled for, the effect of better emotional well-being in older ages was still observed. This thus supported the SST theory over the SAVI model. Additionally, when accounting for other characteristics like personality and demographic details, the association between age and well-being was still significant (except for age and intensity of positive emotions). Moreover, participants reported their positive emotions to be more frequent and intense than negative emotions.

Reactions

The logic of the rationale was compelling, as the COVID-19 pandemic allowed the researchers to explore the age-old question on which theory best explains the underlying cause of greater emotional well-being in older people. As mentioned in the summary above, testing this question before COVID-19 would be unethical, as inducing individuals to inescapable and high-stress situations has a high risk of causing psychological damage in individuals. However, the recent pandemic provided an environment where this could be investigated, as it globally affected people and it was not possible to escape the risks posed by COVID-19. This gave the researchers a rare opportunity to explore questions of the relationship between emotional well-being and age, that otherwise may not have ever been answered. Based on the time frame, and the opportunity given, I think the authors designed a well thought out experiment, and it was interesting to see what they were able to find.

I was surprised that the experimenters recorded so many different variables, but did not explore the relationship between each of them deeper. Whilst the experimenters did measure variables that met the aim of their experiment and reported results that accordingly answered their question of the effect between age and emotional well-being, I feel that it might have been worthwhile to dive deeper and explore more of the variables that were recorded. For instance, the frequency and intensity of 16 positive and 13 negative emotions were collected, yet only the average of these positive and negative emotions were explored with age. I think an exploration of each emotion would have provided good insight on how much each emotion contributed to overall well-being. Furthermore, exploring relationships with the other variables collected to be controlled for could uncover more details about the research question.

I can see that this paper relates to improving the reproducibility crisis prevalent in psychology, by taking steps to contribute to open science. In this course, we have learnt a lot about the reproducibility crisis, and how in recent decades steps are being taken to promote open science, which encompasses being open and transparent with experimental and analysis procedures used in experiments, overall contributing to better reproducibility and more credible articles. It was thus really interesting to see that open science practices I learnt in the course were adopted in Carstensen and colleague’s study. They pre-registered their participants and exclusion criteria, ensuring that they do not modify their sample size later to skew their results to confirm their hypothesis. Additionally, they also used R to do their analyses, which is an open-source program and can be accessed by everyone, removing barriers to access. Furthermore, they also provided their data, and a codebook to accompany this, this data sharing ensuring their results can be reproduced and replicated by others. Overall, it was really good to see that open science practices are being implemented, and steps are being taken to improve the reproducibility crisis.

Part 2: Verification

Verification Goals

Our reproducibility goals for Carstensen and Colleagues paper was to reproduce the sample characteristics (i.e. demographic statistics), any statistics reported in text including the mean, standard deviation, and confidence intervals reported in the text, and certain tables and figures.

The sample characteristics to reproduce were (which were all found on page 1376):

“…final sample size of 945 adults”
“49.2% female and ranged from 18 to 76 years (M = 45.15, SD = 16.79)”
“seventy-five percent of partipants identified as White”
“The median household income for the sample was between $50,000 and $60,000”
“Fifty-six percent of participants were currently working for pay”
“88% of participants reported attending at least some college”
“Twenty-three percent of participants reported living alone”

The statistics reported in-text to reproduce were (which were all found on chronologically across pages 1377 and 1378):

Time Horizons
- “Time horizons were measured with Future Time Perspective (FTP) scale… overall FTP score… M = 4.09, SD = 1.35”
- “calculated scores for each subscale: Extension (… M = 4.02, SD = 1.91), Opportunity (… M = 4.73, SD = 1.62), Constraint (… M = 3.74, SD = 1.76)”
Perceived Risk
- “participants rated their own risk of complications from coronovirus (M = 2.36, SD = 1.11) … risk of complications from coronavirus given their current health status (M = 2.45, SD = 1.34) … risk of contracting coronavirus in the general population (M = 2.97, SD = 0.94)”
Effect on Employment
- “extent to which [participants’] employment or retirement status had been affected by the coronavirus pandemic… (M = 1.41, SD = 1.46)”
Subjective Health
- “Subjective health was measured… higher scores indicate better health (M = 3.36, SD = 0.99)”
Personality
- “measured personality traits… means ranged from 3.44 (SD = 1.65) for extraversion to 5.23 (SD = 1.33) for conscientiousness”
Preliminary analyses
- “… participants reported positive emotions (M = 19.7, SD = 0.56) more frequently than negative emotions (M = 1.42, SD = 0.06), $t$(944) = 15.41, … 95% confidence interval (CI) for the mean difference = [0.48, 0.63]”
- “Participants also rated positive emotions (M = 1.93, SD = 0.58) as more intense than negative emotions (M = 1.77, SD = 0.70), though the difference was small, $t$(941) = 4.48, … 95% CI for the mean difference = [0.09, 0.22]”

The table to reproduce was Table 1 from the paper, found on page 1378, which is:

The figure to reproduce was Figure 1 from the paper, found on page 1379, which is:

Locate, Load and Read Packages and Data

Our first step to tackling our verification goals was locating all the files we needed, which we found in the OSF Repository. We then downloaded all the files required and kept them in our working directory.

We then created an R Markdown (RMD) file, and loaded the packages we would need to complete our verification goals by using the library() function from the base package. The main package we found that we needed to load was tidyverse.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

The reason for this is that tidyverse itself includes many different packages, which are the main packages for general data analyses. Some of these include:

ggplot2: provides functions for plotting graphs
dplyr: provides functions for data wrangling
tidyr: provides functions for tidying data
readr: provides functions to read different files and data

Next, we wanted to load the data set provided in the OSF. Since the raw data was stored as a csv file, we used the read_csv() function from the readr package to read the file and load it into our RMD. We also used the “<-” assignment operator to save the file under the name “aaec”.

aaec <- read_csv(file = "AgeAdvantagesEmotionCovid_Data.csv")

## Rows: 945 Columns: 107
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): ResponseId, gender_all, race_all, educ, income, emp, livealone, ge...
## dbl (97): f_aff, f_ener, f_acc, f_ang, f_int, f_calm, f_app, f_cont, f_dis, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Reproducing Sample Characteristics

Confirming Final Sample Size

We first began our verification goals by confirming the sample size of the experiment being 945 participants (after exclusion criteria was placed). To confirm this, we used the count() function from the dplyr package to count the number of rows in the aaec file, the number of rows corresponding to the number of participants.

However, an issue we came across however is that the count() function did not work on other laptops we used, even though it worked on the laptop we were initially working on. We are not too sure why this error occurred, hence we sought for another similar function. Through Google we found the nrow() function from the base package, which did the same thing as count() and counted the number of rows in a data frame.

nrow(aaec)

## [1] 945

The value retrieved confirmed there was 945 participants, confirming that the final sample size reported in the paper. We also found that nrow() worked for every group member and their laptops.

Creating Sample Characterstic Data Frame

Since the aaec dataset had over 100 different columns, we decided to create a new data frame to only include variables we need for sample characteristics, using the knowledge we learnt from Danielle’s video tutorials. We did this by piping the aaec file, and using the select() function from the dplyr package to select our required variables by including the column names between the parentheses. This included age, gender, employment, if participants lived alone, income, and race. We specifically selected the “_bin” variable columns for gender, employment and race, as these columns binarised the values in the variable, which was useful for the future calculations we would need to do.

aaec_samplechar <- aaec %>% 
  select(age, gender_bin_f,educ, emp_bin, livealone, income, race_bin)

We then used the head() function from the utils package, to display the first few rows from the data frame, and confirm we created it correctly.

head(aaec_samplechar)

## # A tibble: 6 × 7
##     age gender_bin_f educ                      emp_bin livealone income race_bin
##   <dbl> <chr>        <chr>                     <chr>   <chr>     <chr>  <chr>   
## 1    18 Non-female   Graduated high school (o… Not_wo… No        Less … White   
## 2    18 Non-female   Graduated high school (o… Not_wo… No        Less … Non-whi…
## 3    18 Non-female   Some college or technica… Not_wo… No        $220,… White   
## 4    18 Non-female   Less than high school     Not_wo… No        Decli… White   
## 5    18 Non-female   Graduated high school (o… Working No        $20,0… Non-whi…
## 6    18 Female       Graduated high school (o… Not_wo… No        $100,… Non-whi…

We also generated a descriptive summary of aaec_samplechar using the summary() function from the base package. We learnt this through Danielle’s video tutorials, and used it to also confirm we collected the right data, and confirm the descriptive statistics for age.

summary(aaec_samplechar)

##       age        gender_bin_f           educ             emp_bin         
##  Min.   :18.00   Length:945         Length:945         Length:945        
##  1st Qu.:30.00   Class :character   Class :character   Class :character  
##  Median :46.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :45.15                                                           
##  3rd Qu.:60.00                                                           
##  Max.   :76.00                                                           
##   livealone            income            race_bin        
##  Length:945         Length:945         Length:945        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##

This confirmed that the mean, standard deviation and range for age reported in the Carstensen and colleague’s paper was correct.

Calculating Gender Statistics

We then moved on to confirming the percentages reported under the sample characteristics, starting off with calculating the percentage of females in the sample.

Our first initial challenge, and what ended up being one of our biggest challenges, was not knowing how to efficiently code, as we were all not familiar with R and did know how to do the required calculations. This was further exacerbated by the researchers not sharing their code. We initially thought we could tackle calculating percentage of females by taking a more mathematical approach and counting how many participants identified as female, and then dividing this by the total sample size (i.e. 945). However as a team we wanted to see if we could find a method by which we could use codeinstead to calculate this.

Thus, our first method involved creating a new data frame “aaec_femalecount” that only included participants who identified as female by piping the aaec_samplechar and then using the filter() function from the dplyr package. This allowed us to filter only the participants that reported themselves as “Female”, by using equality operator “==” in the parentheses of the filter() function.

aaec_femalecount <- aaec_samplechar %>%
  filter(gender_bin_f == "Female")

We then divided the number of rows in the aaec_femalecount data frame by aaec_samplechar data frame, to calculate the proportion of females. We did this by using the nrow() function to count the number of rows in each data frame, using the “/” symbol to divide the number of rows against each other, and then multiplying this with 100 to get a percentage.

nrow(aaec_femalecount) / nrow(aaec_samplechar) *100

## [1] 49.20635

This gave us the required value of 49.2%, and confirmed the original reported value.

However, as a team we agreed that this method was not intuitive, and hence we tried another method, involving using the ifelse() function from the base package. We first used the $ operator to create a new column called “female” in aaec_samplechar, whihc was then assigned to the ifelse() function. What this function did, based on what is contained in the parentheses, is scan through the gender_bin_f column in the aaec_samplechar data frame, and assign all “Female” values with 1, and anything else in the column with “0”.

aaec_samplechar$female <- ifelse(aaec_samplechar$gender_bin_f == "Female", 1,0)

We then used the sum() function from the base package to count the all the values assigned “1” in the female column, and divided this against the sum of of females and non-females. Lastly we multiplied this by 100 to find the percentage of females.

sum(aaec_samplechar$female == "1") / (sum(aaec_samplechar$female == "1") + sum(aaec_samplechar$female == "0")) *100

## [1] 49.20635

This method also gave us the same value, and we found this method to be more intuitive and simpler than our previous method.

Calculating Education Statistics

We then needed to calculate participants who attended college. We did this using the first method we used to calculate percentage of females, as we found that method easier for this instance. Hence we created a new data frame called ‘aaec_filteredcollegecount’, and then piped the aaec_samplechar, filtering the education variable called “educ” using filter(). We then used the %in% operator to identify the different values we wanted in the educ variable column, which were the values indicating a participant had attended college, then using the c() function from the base package to combine these different values.

aaec_filteredcollegecount <- aaec_samplechar %>% 
  filter(educ %in% 
           c("Completed graduate or professional degree", "Completed 4-year college (BA, BS)", "Some college or technical school"
))

After, we used the same nrow() method we used to calculate percentage of females, to calculate percentage of participants who attended college.

nrow(aaec_filteredcollegecount) / nrow(aaec_samplechar) *100

## [1] 87.72487

This code gave us the value 87.7%, which matched the value reported in the paper (88%), allowing us to verify the value.

Calculating Race Statistics

Using the second method we used to calculate gender, which involved the ifelse() function, we calculated the percentage of “White” people in the sample. In this case, we assigned “1” to participants who were “White”, and 0 to all other participants. We then used sum() to calculate White participants and total participants, dividing these amongst each other and multiplying by 100 to get a percentage.

aaec_samplechar$white <- 
  ifelse(aaec_samplechar$race_bin == "White", 1,0)

sum(aaec_samplechar$white == "1") / (sum(aaec_samplechar$white == "1") + sum(aaec_samplechar$white == "0")) * 100

## [1] NA

This however gave us a value of NA, which we were confused by. After some Google searches and trouble shooting, we found that this calculation included participants who did not disclose their race (i.e. N/A values), and hence why our calculation could not yield a numeric value. We thus had to include include the parameter “na.rm” and equate this to “TRUE” in the sum() function, which removed any N/A values.

sum(aaec_samplechar$white == "1", na.rm = TRUE) / (sum(aaec_samplechar$white == "1", na.rm = TRUE) + sum(aaec_samplechar$white == "0", na.rm = TRUE)) * 100

## [1] 75.69002

This gave us the result of 75.69, indicating that 75.69% of participants were White. Rounding this value, it would be 76%, however the article reported that it was 75%. Unfortunately because the authors did not provide their code, we were unable to verify why this discrepancy between their value reported and the value we calculated occurred, but we believe it is either a rounding error or their code was different.

Calculating Employment Statistics

We employed the same method as before, involving the ifelse() function, to calculate the percentage of employed people, assigning “1” to those who work, and “0” to other values.

aaec_samplechar$working <- 
  ifelse(aaec_samplechar$emp_bin == "Working", 1,0)

sum(aaec_samplechar$working == "1") / (sum(aaec_samplechar$working == "1") + sum(aaec_samplechar$working == "0")) * 100

## [1] 56.19048

This number calculated was the same as the one reported in the paper, hence verifying the authors’ value.

Calculating Statistics of those who Live Alone

Again, we employed the same method as before, involving the ifelse() function, to calculate the percentage of those who live alone. assigning “1” to those who do live alone, and “0” to other values.

aaec_samplechar$alone <- 
  ifelse(aaec_samplechar$livealone == "Yes", 1,0)

sum(aaec_samplechar$alone == "1") / (sum(aaec_samplechar$alone == "1") + sum(aaec_samplechar$alone == "0")) * 100

## [1] 22.96296

This number calculated was the same as the one reported in the paper, hence verifying the authors’ value.

Calculating Median Income

We then moved on to calculating the median income. When we first started, we were not sure how to start, because the values for income were reported in income brackets, rather than specific numeric values.

We first started off by doing something we knew how to do, which was filtering out participants who chose the “Decline to answer” option. We did this by piping aaec_samplechar, using the filter() function coupled with the not equals to operator “!=” to remove the “Decline to answer” participants. We also used the assignment operator to assign our new data frame to the name “aaec_filteredincome”.

aaec_filteredincome <- aaec_samplechar %>%
  filter(income != "Decline to answer")

We then used the nrow() function check how many participants were included in the filtered amount, and found this to be 923.

nrow(aaec_filteredincome)

## [1] 923

We then thought we should arrange our income brackets into numeric values, and also into ascending order, to somehow use this to calculate median. So we did that, using our knowledge from Danielle’s video, by piping aaec_filteredincome, and using the mutate() function from the dplyr package to create a new column in our data frame. Within mutate() function, we used the recode() function also from dplyr to re-code the income column, which is the income brackets, to numeric values, re-coding the lowest income bracket to 1, and the highest income bracket to 16. We then piped this, and used the arrange() function to arrange the income brackets in ascending order.

aaec_newfilteredincome <- aaec_filteredincome %>%
  mutate(income = recode(income, "Less than $10,000" = "01", "$10,000 to $20,000" = "02", "$20,000 to $30,000" = "03", "$30,000 to $40,000" = "04", "$40,000 to $50,000" = "05", "$50,000 to $60,000" = "06", "$60,000 to $80,000" = "07", "$80,000 to $100,000" = "08", "$100,000 to $120,000" = "09", "$120,000 to $140,000" = "10", "$140,000 to $160,000" = "11", "$160,000 to $180,000" = "12", "$180,000 to $200,000" = "13", "$200,000 to $220,000" = "14", "$220,000 to $250,000" = "15", "Greater than $250,000" = "16")) %>% 
  arrange(income)

We were then unsure what to do, because when we tried to use the median() function from the stats package it did not work. We then found on Google that the class() function could be used to verify what class the values are.

class(aaec_newfilteredincome$income)

## [1] "character"

We found through this our values were considered characters, and not numeric, hence why errors were occurring when we tried to calculate the median.

By trouble shooting and doing more Google searches, we found that the suppressWarnings() function from the base package could be used to suppress the error we received, and using as.numeric() also from the base package to convert the values to numeric.

suppressWarnings(as.numeric(aaec_filteredincome$income))

##   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##  [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [176] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [201] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [226] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [251] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [276] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [301] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [326] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [351] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [376] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [401] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [426] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [451] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [476] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [501] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [526] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [551] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [576] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [601] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [626] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [651] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [676] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [701] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [726] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [751] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [776] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [801] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [826] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [851] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [876] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [901] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

This however did not work, and changed all our values to N/A, which confused us even more.

Another method we tried was changing our column into a vector, by combining a set of values, using the c() function, which indicated how many participants there were in each income bracket, in asending order. We then changed this to a numeric value using as.numeric(), and confirmed the class was numeric using class(). We then tried to run the income_vector through the median() function, but we did not understand what the outputted value meant.

income_vector <- c('1','2','3','4','79','82','143', '77', '67', '34', '32', '20', '6', '9', '7', '15')

income_numeric <- as.numeric(income_vector)

print(income_numeric)

##  [1]   1   2   3   4  79  82 143  77  67  34  32  20   6   9   7  15

class(income_numeric)

## [1] "numeric"

median(income_numeric)

## [1] 17.5

Since we were not able to find a code that worked to give us our median income bracket, we resorted to using simple mathematics. We used the formula to find the position of the median in our sample, which is total sample + 1, divided by 2.

(923+1)/2

## [1] 462

This gave us the value 462, indicating the median was in the 462nd row of our data frame. We then decided to add the number of rows in each age bracket (and also doing this in ascending order), until we found the value that contained the 462nd row.

53+90+107+102+79+82 #The row which adds 82 to the total is "06"

## [1] 513

We found this to correspond to the “06” value, which is the $50,000 to $60,000 income bracket. This thus verified that the value reported in the original article.

Whilst we were able to verify the value, as a team we wanted to try and see if there was any other way. We thus broke off and tried researching individually. After many Google searches and YouTube videos, we did a method.

We used some of the first code we did, of mutating our data to create a new column and then arranging this, which can be seen below. We then used the median function(), and included the ‘:’ operator in between 1 and 923, to help us find which row contained the median value.

aaec_newfilteredincome <- aaec_filteredincome %>%
  mutate(income = recode(income, "Less than $10,000" = "01", "$10,000 to $20,000" = "02", "$20,000 to $30,000" = "03", "$30,000 to $40,000" = "04", "$40,000 to $50,000" = "05", "$50,000 to $60,000" = "06", "$60,000 to $80,000" = "07", "$80,000 to $100,000" = "08", "$100,000 to $120,000" = "09", "$120,000 to $140,000" = "10", "$140,000 to $160,000" = "11", "$160,000 to $180,000" = "12", "$180,000 to $200,000" = "13", "$200,000 to $220,000" = "14", "$220,000 to $250,000" = "15", "Greater than $250,000" = "16")) %>% 
  arrange(income)

median(1:923)

## [1] 462

This value showed that the median was in the 462nd row of the filtered income data frame, which matched our simple mathematics method.

We then used the square brackets “[]” to find which income bracket the median was in our data frame, whereby this found the value that corresponds to row 462 (i.e. median) and column 6 (i.e. income column).

aaec_newfilteredincome[462, 6]

## # A tibble: 1 × 1
##   income
##   <chr> 
## 1 06

This also gave us the value “06”, verifying that the median income is the $50,000 to $60,000 income bracket.

Statistics Reported In-text

Our next part to verify was the statistics reported in-text. Overall, we used the mean() function from the base package to calculate mean, and sd() function also from the base package to calculate standard deviation.

Time Horizon Statistics

First we calculated the mean of the different time horizon statistics, which included the overall FTP score (FTP_av), extension FTP score (FTP_Ext), opportunity FTP score (FTP_Opp), and constraint FTP score (FTP_Con). We used the $ operator to indicate which column we needed from aaec data frame, and used “na.rm = TRUE” to remove any N/A Values. We repeated this similar process for standard deviation calculation.

mean(aaec$FTP_av, na.rm = TRUE) #Mean for overall FTP score

## [1] 4.094168

mean(aaec$FTP_Ext, na.rm = TRUE) #Mean for extension FTP score

## [1] 4.018028

mean(aaec$FTP_Opp, na.rm = TRUE) #Mean for opportunity FTP score

## [1] 4.727466

mean(aaec$FTP_Con, na.rm = TRUE) #Mean for constraint FTP score

## [1] 3.740191

sd(aaec$FTP_av, na.rm = TRUE) #SD for overall FTP score

## [1] 1.352868

sd(aaec$FTP_Ext, na.rm = TRUE) #SD for extension FTP score

## [1] 1.906157

sd(aaec$FTP_Opp, na.rm = TRUE) #SD for opportunity FTP score

## [1] 1.618943

sd(aaec$FTP_Con, na.rm = TRUE) #SD for constraint FTP score

## [1] 1.755594

The values retrieved matched those reported in text, verifying the reported values.

Perceived Risk Statistics

Using the same method for time horizon statistics, we calculated the mean and standard deviation for perceived risks. This included their own risk of contracting the virus (risk_self), risk of complications from the virus (risk_comp), and risk of contracting the virus from the general population (risk_pop).

mean(aaec$risk_self)

## [1] 2.362963

mean(aaec$risk_comp)

## [1] 2.447619

mean(aaec$risk_pop)

## [1] 2.965079

sd(aaec$risk_self)

## [1] 1.113526

sd(aaec$risk_comp)

## [1] 1.343399

sd(aaec$risk_pop)

## [1] 0.9387196

The values also retrieved matched those reported in text, verifying the reported values.

Effect on Employment Statistics

Using the same method again, we calculated the mean and standard deviation for effect on employment, selecting the emp_affect column from our aaec data frame.

mean(aaec$emp_affect)

## [1] 1.412698

sd(aaec$emp_affect)

## [1] 1.45521

Once again, the values retrieved matched those reported in text, verifying the reported values.

Subjective Health Statistics

Using the same method again, we calculated the mean and standard deviation for the participants’ subjective health, selecting the health column from our aaec data frame.

mean(aaec$health)

## [1] 3.357672

sd(aaec$health)

## [1] 0.9856528

Once again, the values retrieved matched those reported in text, verifying the reported values.

Personality Statistics

Lastly, using the same method again, we calculated the mean and standard deviation for the the participants’ extraversion and conscientiousness, selecting these variables from our aaec data frame.

mean(aaec$Extraversion)

## [1] 3.440212

mean(aaec$Conscientiousness)

## [1] 5.224868

sd(aaec$Extraversion)

## [1] 1.64625

sd(aaec$Conscientiousness)

## [1] 1.327658

The values retrieved matched those reported in the paper, except the mean of conscientiousness, which was one decimal place off (as it was reported at 5.23, but in our calculations is 5.22 when rounded).

Frequency and Intensity of Emotions Statistics

To calculate statistics for emotions, we first created a new data frame, to include the average of positive frequency (avg_f_pos), negative frequency (avg_f_neg), positive intensity (avg_i_pos) and negative intensity (avg_i_neg). We did this by piping the aaec data frame and using the select() function to select our required variables.

# New table with emotion variables
aaec_summarystatistics <- aaec %>%
  select(avg_f_pos, avg_f_neg, avg_i_pos, avg_i_neg)

To calculate the mean of all these variables, we used the colMeans() function from the analytics package, to calculate the mean for each column in the data frame we just created, whereby each column was our different variables. We also included “na.rm = TRUE” to remove participants that reported N/A for any value.

colMeans(aaec_summarystatistics, na.rm = TRUE)

## avg_f_pos avg_f_neg avg_i_pos avg_i_neg 
##  1.972079  1.417620  1.929739  1.771569

The values retrieved all matched the original paper, except for the average intensity of positive emotions, as we calculated 1.93 when rounded but the paper reported 1.92.

To then calculate the standard deviation for our emotion variables, we used the summarise_if() function from the dplyr package. We included “is.numeric” and “sd” in the parentheses to indicate we wanted to calculate standard deviation, and “na.rm = TRUE” again to remove N/A values.

aaec_summarystatistics %>% 
  summarise_if(is.numeric, sd, na.rm = TRUE)

## # A tibble: 1 × 4
##   avg_f_pos avg_f_neg avg_i_pos avg_i_neg
##       <dbl>     <dbl>     <dbl>     <dbl>
## 1     0.563     0.661     0.588     0.700

#NOTE: avg_i_pos should is reported 0.58 in paper, but in ours rounds to 0.59

Similar to the mean, the values retrieved all matched the original paper, except for the average intensity of positive emotions, as we calculated 0.59 when rounded but the paper reported 0.58. This indicates that they either did not round properly, or manipulated the average intensity of positive emotions differently, but we are unsure to the real reason.

We then neded to calculate the mean difference confidence interval (CI) for frequency of positive and negative emotions. We were unsure what function would do this, so we tried doing it using the mathematical formula for mean difference CI to calculate the value. The formula is:

We thus attempted to use this formula and calculate the value using code, by assigning values to the different values in the formula above, and carrying out the calculations. Our attempt can be seen below:

#First Group 
n1 <- 945 #no. of participants 
sd1 <- 0.56 #standard deviation  
m1 <- 1.97 #mean 

#Second Group
n2 <- 945 #no. of participants 
sd2 <- 0.66 #standard deviation 
m2 <- 1.42 #mean 

#Pooled variance calculation
sp = ((n1-1)*sd1^2+(n2-1)*sd2^2)/(n1+n2-2)

#margin of error calculation
marg <- qt(0.975,df=n1+n2-1)*sqrt(sp/n1 + sp/n2) 

#Calculating the CI
ll <- (m1-m2) - marg
ul <- (m1-m2) + marg

#Printing CI values
print(ll)

## [1] 0.4947783

print(ul)

## [1] 0.6052217

This method however gave the wrong value, and hence again as a group we were confused with what we did. We then decided to leave this, move on to the next calculations and come back to it later.

We then moved on to conducted a t-test, and found this could be done from the t.test() function from the stats package. We thus used it to calculate it between frequency of positive and negative emotions, selecting those required columns. We also inputted “paired = TRUE” in the t.test parentheses to indicate we wanted to conduct a paired t-test.

t.test(aaec$avg_f_pos, aaec$avg_f_neg, paired = TRUE)

## 
##  Paired t-test
## 
## data:  aaec$avg_f_pos and aaec$avg_f_neg
## t = 15.409, df = 944, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.4838417 0.6250760
## sample estimates:
## mean difference 
##       0.5544588

This t-value retrieved (-15.41) matched the one reported in the paper, verifying it. Additionally, we found this function gave the mean difference confidence interval, solving our previous issue. We found this also matched the reported values in the paper.

We then repeated the paired t-test for the intensity of positive and negative emotions, using the same method as above.

t.test(aaec$avg_i_neg, aaec$avg_i_pos, paired = TRUE) #t-test result and confidence interval limits for intensity

## 
##  Paired t-test
## 
## data:  aaec$avg_i_neg and aaec$avg_i_pos
## t = -4.4788, df = 941, p-value = 8.43e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.21942630 -0.08571862
## sample estimates:
## mean difference 
##      -0.1525725

These values also matched the original paper, verifying the results.

Recreating Table 1

We then moved onto creating Table 1 from the paper, which involved calculating the mean, standard deviation and CI for the frequency of positive and negative emotions.

Calculating Mean and Standard Deviation

To calculate the mean and standard deviation, we used the mean() function and sd() function respectively. In the parentheses of each one, we included which variable we were calculating, and also “na.rm = TRUE” to remove all N/A values. We assigned each calculation, using the assignment operator, to a name which indicated if it was for mean or standard deviation, and for which emotion. The calculations for all of this can be seen below:

#Mean of frequency positive emotions
mean_calm <- mean(aaec$f_calm, na.rm = TRUE)
mean_quiet <- mean(aaec$f_qui, na.rm = TRUE)
mean_appreciative <- mean(aaec$f_app, na.rm = TRUE)
mean_interested <- mean(aaec$f_int, na.rm = TRUE)
mean_content <- mean(aaec$f_cont, na.rm = TRUE)
mean_happy <- mean(aaec$f_hap, na.rm = TRUE)
mean_relaxed <- mean(aaec$f_rela, na.rm = TRUE)
mean_peaceful <- mean(aaec$f_pea, na.rm = TRUE)
mean_energetic <- mean(aaec$f_ener, na.rm = TRUE)
mean_affectionate <- mean(aaec$f_aff, na.rm = TRUE)
mean_amused <- mean(aaec$f_amu, na.rm = TRUE)
mean_accomplished <- mean(aaec$f_acc, na.rm = TRUE)
mean_joyful <- mean(aaec$f_joy, na.rm = TRUE)
mean_proud <- mean(aaec$f_pro, na.rm = TRUE)
mean_relieved <- mean(aaec$f_reli, na.rm = TRUE)
mean_excited <- mean(aaec$f_exc, na.rm = TRUE)

#SD of frequency of positive emotions
sd_calm <- sd(aaec$f_calm, na.rm = TRUE)
sd_quiet <- sd(aaec$f_qui, na.rm = TRUE)
sd_appreciative <- sd(aaec$f_app, na.rm = TRUE)
sd_interested <- sd(aaec$f_int, na.rm = TRUE)
sd_content <- sd(aaec$f_cont, na.rm = TRUE)
sd_happy <- sd(aaec$f_hap, na.rm = TRUE)
sd_relaxed <- sd(aaec$f_rela, na.rm = TRUE)
sd_peaceful <- sd(aaec$f_pea, na.rm = TRUE)
sd_energetic <- sd(aaec$f_ener, na.rm = TRUE)
sd_affectionate <- sd(aaec$f_aff, na.rm = TRUE)
sd_amused <- sd(aaec$f_amu, na.rm = TRUE)
sd_accomplished <- sd(aaec$f_acc, na.rm = TRUE)
sd_joyful <- sd(aaec$f_joy, na.rm = TRUE)
sd_proud <- sd(aaec$f_pro, na.rm = TRUE)
sd_relieved <- sd(aaec$f_reli, na.rm = TRUE)
sd_excited <- sd(aaec$f_exc, na.rm = TRUE)

#Mean of frequency of negative emotions
mean_concerned <- mean(aaec$f_conc, na.rm = TRUE)
mean_anxious <- mean(aaec$f_anx, na.rm = TRUE)
mean_bored <- mean(aaec$f_bor, na.rm = TRUE)
mean_frustrated <- mean(aaec$f_fru, na.rm = TRUE)
mean_irritated <- mean(aaec$f_irr, na.rm = TRUE)
mean_sad <- mean(aaec$f_sad, na.rm = TRUE)
mean_lonely <- mean(aaec$f_lon, na.rm = TRUE)
mean_fearful <- mean(aaec$f_fear, na.rm = TRUE)
mean_angry <- mean(aaec$f_ang, na.rm = TRUE)
mean_disgusted <- mean(aaec$f_dis, na.rm = TRUE)
mean_guilty <- mean(aaec$f_gui, na.rm = TRUE)
mean_embarassed <- mean(aaec$f_emb, na.rm = TRUE)
mean_ashamed <- mean(aaec$f_sha, na.rm = TRUE)

#SD of frequency of negative emotions
sd_concerned <- sd(aaec$f_conc, na.rm = TRUE)
sd_anxious <- sd(aaec$f_anx, na.rm = TRUE)
sd_bored <- sd(aaec$f_bor, na.rm = TRUE)
sd_frustrated <- sd(aaec$f_fru, na.rm = TRUE)
sd_irritated <- sd(aaec$f_irr, na.rm = TRUE)
sd_sad <- sd(aaec$f_sad, na.rm = TRUE)
sd_lonely <- sd(aaec$f_lon, na.rm = TRUE)
sd_fearful <- sd(aaec$f_fear, na.rm = TRUE)
sd_angry <- sd(aaec$f_ang, na.rm = TRUE)
sd_disgusted <- sd(aaec$f_dis, na.rm = TRUE)
sd_guilty <- sd(aaec$f_gui, na.rm = TRUE)
sd_embarassed <- sd(aaec$f_emb, na.rm = TRUE)
sd_ashamed <- sd(aaec$f_sha, na.rm = TRUE)

Calculating Standard Error

We then needed to calculate the standard error, which is given by the standard deviation divided by the square root of the sample size. We thus assigned the value 945 to “n”, to represent sample size. And the calculated the standard error for each emotion using the standard deviation value we created above, and dividing this by “n”. We assigned each calculation, using the assignment operator, to a name which indicated which emotion the standard error was for. The calculations for all of this can be seen below:

#Create sample size value
n <- 945

#Standard error for positive emotions
se_calm <- sd_calm/sqrt(n)
se_quiet <- sd_quiet/sqrt(n)
se_appreciative <- sd_appreciative/sqrt(n) 
se_interested <- sd_interested/sqrt(n)
se_content <- sd_content/sqrt(n)
se_happy <- sd_happy/sqrt(n)
se_relaxed <- sd_relaxed/sqrt(n)
se_peaceful <- sd_peaceful/sqrt(n)
se_energetic <- sd_energetic/sqrt(n)
se_affectionate <- sd_affectionate/sqrt(n)
se_amused <- sd_amused/sqrt(n)
se_accomplished <- sd_accomplished/sqrt(n)
se_joyful <- sd_joyful/sqrt(n)
se_proud <- sd_proud/sqrt(n)
se_relieved <- sd_relieved/sqrt(n)
se_excited <- sd_excited/sqrt(n)
  
#Standard error for negative emotions
se_concerned <- sd_concerned/sqrt(n)  
se_anxious <- sd_anxious/sqrt(n) 
se_bored <- sd_bored/sqrt(n)  
se_frustrated <- sd_frustrated/sqrt(n)  
se_irritated <- sd_irritated/sqrt(n)
se_sad <- sd_sad/sqrt(n)
se_lonely <- sd_lonely/sqrt(n)
se_fearful <- sd_fearful/sqrt(n)  
se_angry <- sd_angry/sqrt(n)  
se_disgusted <- sd_disgusted/sqrt(n)  
se_guilty <- sd_guilty/sqrt(n)  
se_embarassed <- sd_embarassed/sqrt(n) 
se_ashamed <- sd_ashamed/sqrt(n)

Calculating Confidence Intervals

We then moved to calculating the confidence intervals for each emotion. We used the basic mathematics formula to do so, which is:

Our calculations followed the formula above, where we created the z-value using the qnorm() function from the stats package, to get the value for a 95% CI. We then calculated lower limits of the CI by subtracting the z-value multiplied with the standard error from the mean, and the upper limit of the CI by adding instead of substacting. We used the mean and standard error values calculated previously for each emotion. We did this for each emotion, and we assigned each calculation to a name which indicating which limit and emotion. The calculations for all of this can be seen below:

#Creating z-value
z_value <- qnorm(0.975)

#Lower imits of positive emotions
ll_calm <- mean_calm - (z_value*se_calm)
ll_quiet <- mean_quiet - (z_value*se_quiet)
ll_appreciative <- mean_appreciative - (z_value*se_appreciative)
ll_interested <- mean_interested - (z_value*se_interested)
ll_content <- mean_content - (z_value*se_content)
ll_happy <- mean_happy - (z_value*se_happy)
ll_relaxed <- mean_relaxed - (z_value*se_relaxed)
ll_peaceful <- mean_peaceful - (z_value*se_peaceful)
ll_energetic <- mean_energetic - (z_value*se_energetic)
ll_affectionate <- mean_affectionate - (z_value*se_affectionate)
ll_amused <- mean_amused - (z_value*se_amused)
ll_accomplished <- mean_accomplished - (z_value*se_accomplished)
ll_joyful <- mean_joyful - (z_value*se_joyful)
ll_proud <- mean_proud - (z_value*se_proud)
ll_relieved <- mean_relieved - (z_value*se_relieved)
ll_excited <- mean_excited - (z_value*se_excited)

#Upper limits of positive emotions
ul_calm <- mean_calm + (z_value*se_calm)
ul_quiet <- mean_quiet + (z_value*se_quiet)
ul_appreciative <- mean_appreciative + (z_value*se_appreciative)
ul_interested <- mean_interested + (z_value*se_interested)
ul_content <- mean_content + (z_value*se_content)
ul_happy <- mean_happy + (z_value*se_happy)
ul_relaxed <- mean_relaxed + (z_value*se_relaxed)
ul_peaceful <- mean_peaceful + (z_value*se_peaceful)
ul_energetic <- mean_energetic + (z_value*se_energetic)
ul_affectionate <- mean_affectionate + (z_value*se_affectionate)
ul_amused <- mean_amused + (z_value*se_amused)
ul_accomplished <- mean_accomplished + (z_value*se_accomplished)
ul_joyful <- mean_joyful + (z_value*se_joyful)
ul_proud <- mean_proud + (z_value*se_proud)
ul_relieved <- mean_relieved + (z_value*se_relieved)
ul_excited <- mean_excited + (z_value*se_excited)
  
#Lower limits of negative emotions
ll_concerned <- mean_concerned - (z_value*se_concerned)
ll_anxious <- mean_anxious - (z_value*se_anxious)
ll_bored <- mean_bored - (z_value*se_bored)
ll_frustrated <- mean_frustrated - (z_value*se_frustrated)
ll_irritated <- mean_irritated - (z_value*se_irritated)
ll_sad <- mean_sad - (z_value*se_sad)
ll_lonely <- mean_lonely - (z_value*se_lonely)
ll_fearful <- mean_fearful - (z_value*se_calm)
ll_angry <- mean_angry - (z_value*se_angry)
ll_disgusted <- mean_disgusted - (z_value*se_disgusted)
ll_guilty <- mean_guilty - (z_value*se_guilty)
ll_embarassed <- mean_embarassed - (z_value*se_embarassed)
ll_ashamed <- mean_ashamed - (z_value*se_ashamed)
  
#Upper limits of negative emotions
ul_concerned <- mean_concerned + (z_value*se_concerned)
ul_anxious <- mean_anxious + (z_value*se_anxious)
ul_bored <- mean_bored + (z_value*se_bored)
ul_frustrated <- mean_frustrated + (z_value*se_frustrated)
ul_irritated <- mean_irritated + (z_value*se_irritated)
ul_sad <- mean_sad + (z_value*se_sad)
ul_lonely <- mean_lonely + (z_value*se_lonely)
ul_fearful <- mean_fearful + (z_value*se_calm)
ul_angry <- mean_angry + (z_value*se_angry)
ul_disgusted <- mean_disgusted + (z_value*se_disgusted)
ul_guilty <- mean_guilty + (z_value*se_guilty)
ul_embarassed <- mean_embarassed + (z_value*se_embarassed)
ul_ashamed <- mean_ashamed + (z_value*se_ashamed)

Creating the Table Columns

We then had to start constructing our table to contain all this information we had calculated above. We first created the first column, which we assigned to the value “Valence_and_emotion”. We used the c() function to combine all the names of the different emotions, using the ” ” to indicate the names. We also included “Positive Emotions” and “Negative Emotions” before listing each individual respective emotion, because the original table separated these two valences of emotions.

Valence_and_emotion <- c("Positive emotions", "Calm", "Quiet", "Appreciative", "Interested", "Content", "Happy", "Relaxed", "Peaceful", "Energetic", "Affectionate", "Amused", "Accomplished", "Joyful", "Proud", "Relieved", "Excited", "Negative emotions", "Concerned", "Anxious/worried", "Bored", "Frustrated", "Irritated", "Sad", "Lonely", "Fearful", "Angry", "Disgusted", "Guilty", "Embarassed", "Ashamed")

We then created the second column of the table, which was for the means. We used the same method as above, assigning our second column to variable “Mean”, and using the c() function to combine all the means we calculated for all the emotions above. We had to include ” ” in this section to leave a gap in our values, as these particular values were in the row where the “Positive Emotion” and “Negative Emotion” section headings were, and there was no value that corresponds to them.

Mean <- c(" ", mean_calm, mean_quiet, mean_appreciative, mean_interested, mean_content, mean_happy, mean_relaxed, mean_peaceful, mean_energetic, mean_affectionate, mean_amused, mean_accomplished, mean_joyful, mean_proud, mean_relieved, mean_excited, " ", mean_concerned, mean_anxious, mean_bored, mean_frustrated, mean_irritated, mean_sad, mean_lonely, mean_fearful, mean_angry, mean_disgusted, mean_guilty, mean_embarassed, mean_ashamed)

We then needed to round our mean values to 2 decimal places. When googling how to do this, we found that we had to ensure our values were numeric, and then could use the function round() from the base package. However, when we tried this it did not work, even when we ensured our values were numeric. We thus were very lost on how to round our numbers, and sought help from our tutor Anita. Anita suggested a method to us which was first converting the Mean column into numeric values using the as.numeric(). She then suggested we converting this column back to a character using as.character() function from the base package, and in the parentheses use the round() function round our values. Hence we did this, including the number 2 in the round() function to indicate how many decimal places we wanted. We then used the head() function to view the first values to see if it worked.

Mean <- as.numeric(Mean)
Mean <- as.character(round(Mean,2)) 

head(Mean)

## [1] NA     "2.44" "2.43" "2.4"  "2.28" "2.15"

This method proposed by Anita ended up working, and solved our issue. We understood the method she proposed to us, but were confused by why exactly we had to take these steps for the round() function to work. We believe that we might have had to go through this method to account for the blank values, i.e. the ” ” we included. This instance was quite interesting for me, but as a team we were relieved we finally had a method that worked.

We then used the method above, and applied it to create the third column - the standard deviation column. This can be seen in the code below:

#Create Column
Standard_Deviation <- c(" ", sd_calm, sd_quiet, sd_appreciative, sd_interested, sd_content, sd_happy, sd_relaxed, sd_peaceful, sd_energetic, sd_affectionate, sd_amused, sd_accomplished, sd_joyful, sd_proud, sd_relieved, sd_excited, " ", sd_concerned, sd_anxious, sd_bored, sd_frustrated, sd_irritated, sd_sad, sd_lonely, sd_fearful, sd_angry, sd_disgusted, sd_guilty, sd_embarassed, sd_ashamed)

#Round Values
Standard_Deviation <- as.numeric(Standard_Deviation)
Standard_Deviation <- as.character(round(Standard_Deviation,2))

And again, we used the same method and applied it to create the fourth column - the lower limit column. This can be seen in the code below:

#Create Column
Lower_Limit <- c(" ", ll_calm, ll_quiet, ll_appreciative, ll_interested, ll_content, ll_happy, ll_relaxed, ll_peaceful, ll_energetic, ll_affectionate, ll_amused, ll_accomplished, ll_joyful, ll_proud, ll_relieved, ll_excited, " ", ll_concerned, ll_anxious, ll_bored, ll_frustrated, ll_irritated, ll_sad, ll_lonely, ll_fearful, ll_angry, ll_disgusted, ll_guilty, ll_embarassed, ll_ashamed)

#Round values
Lower_Limit <- as.numeric(Lower_Limit)
Lower_Limit <- as.character(round(Lower_Limit,2))

And again, we used the same method and applied it to create the fifth column - the upper limit column. This can be seen in the code below:

#Create column
Upper_Limit <- c(" ", ul_calm, ul_quiet, ul_appreciative, ul_interested, ul_content, ul_happy, ul_relaxed, ul_peaceful, ul_energetic, ul_affectionate, ul_amused, ul_accomplished, ul_joyful, ul_proud, ul_relieved, ul_excited, " ", ul_concerned, ul_anxious, ul_bored, ul_frustrated, ul_irritated, ul_sad, ul_lonely, ul_fearful, ul_angry, ul_disgusted, ul_guilty, ul_embarassed, ul_ashamed)

#Round values
Upper_Limit <- as.numeric(Upper_Limit)
Upper_Limit <- as.character(round(Upper_Limit,2))

Constructing the Table

Our last step to create the table was combine all the columns together. We did this by using the using data.frame() function from the base package, which turned all our columns into a data frame. We assigned this to the name “Table1_MeanFreq_Emotions”, and used print() function to print our table.

Table1_MeanFreq_Emotions <- data.frame(Valence_and_emotion, Mean, Standard_Deviation, Lower_Limit, Upper_Limit)
print(Table1_MeanFreq_Emotions)

##    Valence_and_emotion Mean Standard_Deviation Lower_Limit Upper_Limit
## 1    Positive emotions <NA>               <NA>        <NA>        <NA>
## 2                 Calm 2.44               0.87        2.39         2.5
## 3                Quiet 2.43               0.87        2.38        2.49
## 4         Appreciative  2.4               0.93        2.35        2.46
## 5           Interested 2.28               0.83        2.23        2.33
## 6              Content 2.15               0.94        2.09        2.21
## 7                Happy 2.13                0.8        2.08        2.19
## 8              Relaxed 2.13               0.89        2.07        2.19
## 9             Peaceful 2.05               0.95        1.99        2.11
## 10           Energetic  1.9                0.8        1.85        1.95
## 11        Affectionate 1.89               0.86        1.83        1.94
## 12              Amused 1.87               0.72        1.83        1.92
## 13        Accomplished 1.84               0.87        1.78        1.89
## 14              Joyful 1.71                0.9        1.65        1.76
## 15               Proud 1.67               0.97        1.61        1.73
## 16            Relieved 1.48               0.88        1.42        1.53
## 17             Excited 1.46               0.79        1.41        1.51
## 18   Negative emotions <NA>               <NA>        <NA>        <NA>
## 19           Concerned 2.23               0.91        2.17        2.29
## 20     Anxious/worried    2               1.06        1.94        2.07
## 21               Bored 1.88               1.12         1.8        1.95
## 22          Frustrated 1.85               0.93        1.79         1.9
## 23           Irritated 1.75               0.89         1.7        1.81
## 24                 Sad 1.69               0.97        1.63        1.75
## 25              Lonely 1.55               1.25        1.47        1.63
## 26             Fearful 1.38               1.06        1.32        1.43
## 27               Angry 1.35               0.89        1.29         1.4
## 28           Disgusted 1.16               0.99         1.1        1.22
## 29              Guilty 0.63               0.87        0.58        0.69
## 30          Embarassed 0.51               0.76        0.47        0.56
## 31             Ashamed 0.44               0.76         0.4        0.49

We thus able to successfully recreate Table 1 from the original paper.

Recreating Figure 1

The last thing we had to recreate as a group is Figure 1, which is s scatteplot that shows the relationship between age and either frequency or intensity of positive or negative age. This involved creating 4 scatter plots, and combining them to make one figure.

Method 1

When we initially created our scatter plots, we used ggplot() to do so, which involved using multiple functions from the ggplot2 package. We did this because we learnt this method through Danielle’s video tutorials. Through out the process of creating our ggplot() we trial-and-errored a lot of the different aesthetics, to get the exact format we wanted.

We created four different scatter plots, which were:

Age vs Frequency of Positive Emotions
Age vs Frequency of Negative Emotions
Age vs Intensity of Positive Emotions
Age vs Intensity of Negative Emotions

We used a similar code for each scattplot for each one, but the aesthetics for each one was mildly tweaked, such that it matched the ones of the original paper.

Overall, what we did for each graph is first use the ggplot() function to create the ggplot, the information in these parentheses editing the global aesthetics of the graph. The “data = aaec” indicated that our values were coming from the aaec data frame, and the aes() function indicated our x and y values. We also assigned the ggplot() to a name that indicated which scatter plot we were creating.

We then used the + sign to add other aspects to our plots. We then used geom_point() to include all the different data points from the x and y variables, using “alpha” in the parentheses to indicate the opacity of our data points, making this number low so all the data points could be seen on the graph. We then used geom_smooth(), with “method = lm” in the parentheses to create a line of best fit, altering the colour to black using “colour = black”.

To then create the labels for our graph, we used the labs() function, including the title, x-axis title and y-axis title in the parentheses. If we required one of these labels to be blank, we used equated them to element_blank().

To alter the theme of the graph, we used theme_bw() to mimic the black and white aesthetic of the figure in the original paper. To futher alter the theme, we used theme() and included different functions and values within the parentheses. This included using plot.title and equating it to the function element_text() function to bold the titles, change the font size, and using “hjust = 0.5” to centre the title. We additionally used axis.title and element_text() to fix the size and bolding of the x and y axis. Furthermore, we used axis.line to make the colour of the axis black. We also used axis.text.x and axis.text.y, equating them to element_blank() when we did not want any values there. Lastly we used plot.background, panel.grid.minor and panel.gridmajor to fix the aesthetics by assigning all of these to element_blank() function to remove all grid lines and background.

The final thing we did is used the coord_cartesian() function to contain arguments for our y-axis limit, whereby we used ylim = c(0, 2.5) to indicate we wanted our graph cut off at 2.5, mimicking the original figure. We also used the scale_y_continuous() function to set which values we wanted on the y-axis.

The exact code for each graph can be seen below, whereby different aesthetic elements were altered to match that of the original paper.

#Age vs Frequency of Negative Emotions Scatterplot 
Freq_Neg_Plot <- ggplot(data = aaec, aes(x = aaec$age, y = aaec$avg_f_neg)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", color = "black") +
  labs(title = "Negative Emotions",
       x = element_blank(),
       y = "Frequency of \nEmotional Experience") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.5),
        axis.title = element_text(face = "bold", size = 10), 
        axis.line = element_line(color = "black"), 
        axis.text.x = element_blank(), 
        axis.ticks.x = element_blank(), 
        plot.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) +
  coord_cartesian(ylim = c(0, 2.5)) +
  scale_y_continuous(breaks = seq(from = 0, to = 2.5, by = 0.5))


#Age vs Frequency of Positive Emotions Scatterplot 
Freq_Pos_Plot <- ggplot(data = aaec, aes(x = aaec$age, y = aaec$avg_f_pos)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", color = "black") +
  labs(title = "Positive Emotions",
       x = element_blank(),
       y = element_blank()) +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.5), 
        axis.line = element_line(color = "black"), 
        axis.text.x = element_blank(), axis.ticks.x = element_blank(), 
        axis.text.y = element_blank(), axis.ticks.y = element_blank(), 
        plot.background = element_blank(), panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) +
  coord_cartesian(ylim = c(0, 2.5))

#Age vs Intensity of Negative Emotions Scatterplot 
Int_Neg_Plot <- ggplot(data = aaec, aes(x = aaec$age, y = aaec$avg_i_neg)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", color = "black") +
  labs(title = element_blank(),
       x = "                                                                           Age (Years)",
       y = "Intensity of \nEmotional Experience") +
  theme_bw() +
  theme(axis.title = element_text(face = "bold", size = 10), 
        axis.line = element_line(color = "black"),
        plot.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) +
  coord_cartesian(ylim = c(0, 2.5)) +
  scale_y_continuous(breaks = seq(from = 0, to = 2.5, by = 0.5))


#Age vs Intensity of Positive Emotions Scatterplot 
Int_Pos_Plot <- ggplot(data = aaec, aes(x = aaec$age, y = aaec$avg_i_pos)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", color = "black") +
  labs(title = element_blank(),
       x = element_blank(),
       y = element_blank()) +
  theme_bw() +
  theme(axis.line = element_line(color = "black"), 
        axis.text.y = element_blank(), 
        axis.ticks.y = element_blank(), 
        plot.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) +
  coord_cartesian(ylim = c(0, 2.5))

After creating all our scatter plots, we needed to combine them all together. Our course co-ordinator Kate informed us a package called patchworks, which can be used to combine graphs together to create one image. We thus loaded the package using the library() function, and combined the graphs together using basic arithmetic.

#Load Package
library(patchwork)

#Combine scatter plots
(Freq_Neg_Plot+Freq_Pos_Plot) / (Int_Neg_Plot+Int_Pos_Plot)

## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (stat_smooth).

## Warning: Removed 3 rows containing missing values (geom_point).

## `geom_smooth()` using formula 'y ~ x'

Hence we were able to successfully recreate Figure 1 from the original paper, our graphs mimicking the ones included in the original paper.

However one thing we did not like is how we got the x-axis title “Age(Years)” centered between the two graphs. The only method we found on how to do this was adding this title to the Age vs. Intensity of Emotional Experience graph, and then use spaces to push the title across such that it is centers the two bottom graphs when they are combined. We thought whilst it did work, there might be a cleaner code to do this. We thus consulted Anita about this, and whilst she did not know a method for us to do this with our current code, she proposed us another method, which involved creating a new set of code.

Method 2

We decided to thus create a new code for our plot, using the method Anita suggested. What we first did was pipe our aaec data frame to create a new data frame which we called “aaec_figure1_variables”. We then selected variables we needed using select(), which were age and columns starting with avg, which included the average aof positive and negative emotions for frequency and intensity (i.e. 4 columns in total).

We then piped this, and used the pivot_longer() function from the tidyr package to put all our emotion variables into one column, using cols = c() to indicate which columns. We then changed the names of our columns, using names_to() function to change the emotion column to “Values”, and values_to() function to change the column of scores of the emotions to “Scores”, both these functions also from the tidyr package. We then used “values_drop_na = TRUE” to remove any N/A values. All of this allowed us to create a data frame that had the columns of emotions and their scores side by side.

We then piped all of this, and used the mutate() function to create a new column for the measures (i.e. a column for if we are measuring frequency or intensity). We equated measure_type to case_when() function from the dplyr package, and used the grepl() function from the base package within it. We used this function to first search any “f” values within our Values column of our data frame, and then assign this to “Frequency” in a new column called measure_type. We used grepl again, but this time to search for “i” values within our Values column, and then assign this to “Intensity” in the measure_type column. We then repeated this case_when() function method to create a new column called valence_type. This involved using case_when() and grepl() to search for anything that contains “pos” in the Score column and assign it to “Positive”, and also search for anything that contains “neg” in the Score column and assign it to “Negative”.

aaec_figure1_variables <- aaec %>%
  select(age, starts_with("avg")) %>%
  pivot_longer(
    cols = c(avg_f_pos, avg_f_neg, avg_i_pos, avg_i_neg),
    names_to = "Values",
    values_to = "Score",
    values_drop_na = TRUE
    ) %>%
  mutate(
    measure_type = case_when(
      grepl("f", Values) ~ "Frequency",
      grepl("i", Values) ~ "Intensity"
      ), 
    valence_type = case_when( 
      grepl("pos", Values) ~ "Positive",
      grepl("neg", Values) ~ "Negative"
      )
  )

We then printed out our aaec_figure1_variables using the head() function, to view the first few columns to ensure our code worked properly, which it did. The whole point of creating the measure_type and valence_types columns is to extract our information into two more columns to show what a typical response to the frequency or intensity of the emotions would be. For example, avg_f_pos would be extracted to measure type “Frequency” and valence type “Positive”, this formation helping us for the plotting we would do next.

head(aaec_figure1_variables)

## # A tibble: 6 × 5
##     age Values    Score measure_type valence_type
##   <dbl> <chr>     <dbl> <chr>        <chr>       
## 1    18 avg_f_pos  1.81 Frequency    Positive    
## 2    18 avg_f_neg  1.33 Frequency    Negative    
## 3    18 avg_i_pos  2.25 Intensity    Positive    
## 4    18 avg_i_neg  2.33 Intensity    Negative    
## 5    18 avg_f_pos  1.45 Frequency    Positive    
## 6    18 avg_f_neg  2.31 Frequency    Negative

We then began creating the labels for our plots. We first created a new vector called labelsM to contain a string of text that assigns “Frequency” as “Frequency of Emotional Experince” the /n code used for moving the text into a new line (which further helps with the placement of text in the figure). We also contained a string of text that assigns “intensity” as “Intensity of Emotional Experience”. This was done as it would be later used for the Y axis labels.

labelsM <- c(Frequency = "Frequency of\nEmotional Experience", Intensity = "Intensity of\nEmotional Experience")

We repeated the same process as above, but this time created a new vector called labelsV, containing a string of text that assigns “Negative” as “Negative Emotions” and “Positive” as “Positive Emotions”. This was done as it would later be used for the title above the top two plots.

labelsV <- c(Negative = "Negative Emotions", Positive = "Positive Emotions")

We then finally started creating the actual figure, using ggplot(). A lot of the functions used were the same as the fist method we used to create the scatter plot. The only difference is that this one used the facet_grid() function from the ggplot2 package. The facet_grid() function created a matrix of plots, defining them by the row and column variables. Within the function we used rows = vars(measure_type) to make our row variables the previously assigned “measure_type” column we mutated above, which refers to Frequency and Intensity. We also used cols = vars(valence_type) to make our column variables the previously assigned “valence_type” column we mutated, which refers to “positive” or “Negative emotion”. Additionally we used the labeller() function to add the LabelsV and LabelsM labels we created before. lastly we used “switch =”y”“, to move the y axis labels from the right side (where they are by default) to the left.

The rest of the functions to create the plot were the same as Method 1. The whole code for this new method to create the plot can be seen below:

aaec_figure1 <- ggplot(data = aaec_figure1_variables, aes(x = age, y = Score)) +
  geom_point(alpha = 0.1) + 
  geom_smooth(method = "lm", 
              color = "black") + 
  facet_grid(rows = vars(measure_type), 
             cols = vars(valence_type), 
             labeller = labeller(measure_type = labelsM, valence_type = labelsV), 
             switch = "y") +
  theme_bw() + 
  theme(strip.placement = "outside", 
        strip.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) + 
  coord_cartesian(ylim = c(0, 2.5)) + 
  labs(title = element_blank(), 
       x = "Age (Years)", 
       y = element_blank())

We then wrote out in the figure name, aaec_figure1, to print out our recreate Figure 1.

aaec_figure1

## `geom_smooth()` using formula 'y ~ x'

This created essentially the same thing we did using the first method, but did solve our issue of centering the x-axis title.

Part 3: Exploratory Analysis

Question 1

For my first exploratory analysis, I wanted to explore each of the individual emotions reported in the paper in more depth. In Carstensen and colleagues experiment, participant’s were asked to give a rating of the frequency and intensity of 16 positive and 13 negative emotions they felt, yet in the results section only the effect of the average positive and negative emotions with age were recorded. One of my first reactions to the results of this paper was that I found it surprising that there was not a more in-depth analysis of the individual emotions, Hence I wanted to see which specific emotions had a significant correlation with age. Since the paper reported that the frequency and intensity of positive emotions were associated with older age, I specifically wanted to investigate the relationship between positive emotions and age. I hypothesised that emotions that had higher scores of frequency and intensities, would have a significant relationship.

Descriptives

I first began my exploration by finding some descriptive values of the frequency and intensity of positive emotions. This included findings the mean and standard deviation of each one.

I first started by creating a new object for each emotion, and using the assignment operator to select which data would be used from the aaec data frame. I then piped the data frame, and used the summarise() function to select the mean and standard deviation of the emotion’s frequency and intensity through the mean() and sd() function. An example of this can be seen below for the positive emotion ‘calm’ below:

calm <- aaec %>%
  summarise(
    mean_frequency = mean(f_calm, na.rm = TRUE),
    sd_frequency = sd(f_calm, na.rm = TRUE),
    mean_intensity = mean(i_calm, na.rm = TRUE),
    sd_intensity = sd(i_calm, na.rm = TRUE)
  )

I was going to do this for each of the 16 positive emotions, but I felt that doing this was very repetitive, and the code would be very long and cumbersome. Since through this R journey I have found that code can be used to shorten calculation processes, I tried to Google and research a method to do this, and came across creating your own functions in R. This particular website had a comprehensive guide on how to create functions, which I used to create my own function.

I thus created a function that could calculate the mean and standard deviation of each emotion. I started by using the assign operator to call my function “emotionsummary.fun”. I used function() to create the function, with 2 arguments within the parentheses. The argument is what differs every time a function is used, such that I can choose which variables I want to calculate the descriptive statistics for. I had the “f_emotion” argument for the frequency values, and “i_emotion” argument for intensity values. This is then followed by the {} brackets, which contains the code the function will run. The code for this section is exactly the same as the one used above to calculate descriptives for the emotion ‘calm’, except it contains the arguments within the sd() and mean() function. It also contains ‘na.rm = TRUE’, to indicate that NA values for the variable are not included.

emotionsummary.fun <- function(f_emotion, i_emotion) {aaec %>%
  summarise(
    mean_frequency = mean(f_emotion, na.rm = TRUE),
    sd_frequency = sd(f_emotion, na.rm = TRUE),
    mean_intensity = mean(i_emotion, na.rm = TRUE),
    sd_intensity = sd(i_emotion, na.rm = TRUE)
  )
}

Now that the function has been made, I can use it to calculate the mean and standard deviation of each positive emotions’ frequency and intensity. This is done by creating a name (e.g ‘calm’), and then assigning our emotionsummary.fun function to it. In the parentheses I included the variables I wanted to calculate. An example is for the calm emotion, we wanted to calculate the frequency by assigning the value aaec$f_calm to argument f_emotion, and calculate the intensity by assigning the aaec $i_calm to the argument i_emotion. This was then done for each positive emotion, and can be seen in the code below:

calm <- emotionsummary.fun(f_emotion = aaec$f_calm, i_emotion = aaec$i_calm)
quiet <- emotionsummary.fun(f_emotion = aaec$f_qui, i_emotion = aaec$i_qui)
appreciative <- emotionsummary.fun(f_emotion = aaec$f_app, i_emotion = aaec$i_app)
interested <- emotionsummary.fun(f_emotion = aaec$f_int, i_emotion = aaec$i_int)
content <- emotionsummary.fun(f_emotion = aaec$f_cont, i_emotion = aaec$i_cont)
happy <- emotionsummary.fun(f_emotion = aaec$f_hap, i_emotion = aaec$i_hap)
relaxed <- emotionsummary.fun(f_emotion = aaec$f_rela, i_emotion = aaec$i_rela)
peaceful <- emotionsummary.fun(f_emotion = aaec$f_pea, i_emotion = aaec$i_pea)
energetic <- emotionsummary.fun(f_emotion = aaec$f_ener, i_emotion = aaec$i_ener)
affectionate <- emotionsummary.fun(f_emotion = aaec$f_aff, i_emotion = aaec$i_aff)
amused <- emotionsummary.fun(f_emotion = aaec$f_amu, i_emotion = aaec$i_amu)
accomplished <- emotionsummary.fun(f_emotion = aaec$f_acc, i_emotion = aaec$i_acc)
joyful <- emotionsummary.fun(f_emotion = aaec$f_joy, i_emotion = aaec$i_joy)
proud <- emotionsummary.fun(f_emotion = aaec$f_pro, i_emotion = aaec$i_pro)
relieved <- emotionsummary.fun(f_emotion = aaec$f_reli, i_emotion = aaec$i_reli)
excited <- emotionsummary.fun(f_emotion = aaec$f_exc, i_emotion = aaec$i_exc)

After creating each of these, I used the rbind() function from the base package to bind all the descriptive rows I created, to make a table. I also used the rownames() function from the base package to name each row after the respective emotion. Finally I used the print() function to view what my table looked like.

#bind rows together
table_pos_emotions <- rbind(calm, quiet, appreciative, interested, content, happy, relaxed, peaceful, energetic, affectionate, amused, accomplished, joyful, proud, relieved, excited)

rownames(table_pos_emotions) <- c("calm", "quiet", "appreciative", "interested", "content", "happy", "relaxed", "peaceful", "energetic", "affectionate", "amused", "accomplished", "joyful", "proud", "relieved", "excited")

## Warning: Setting row names on a tibble is deprecated.

print(table_pos_emotions)

## # A tibble: 16 × 4
##    mean_frequency sd_frequency mean_intensity sd_intensity
##  *          <dbl>        <dbl>          <dbl>        <dbl>
##  1           2.44        0.866           2.21        0.926
##  2           2.43        0.875           2.28        0.955
##  3           2.40        0.933           2.36        1.01 
##  4           2.28        0.827           2.17        0.873
##  5           2.15        0.939           2.03        0.926
##  6           2.13        0.804           2.00        0.860
##  7           2.13        0.886           2.03        0.918
##  8           2.05        0.946           1.97        0.921
##  9           1.90        0.804           1.73        0.862
## 10           1.89        0.861           1.87        0.946
## 11           1.87        0.719           1.97        0.908
## 12           1.84        0.867           1.80        0.891
## 13           1.71        0.898           1.69        0.918
## 14           1.67        0.972           1.75        0.993
## 15           1.48        0.882           1.70        0.883
## 16           1.46        0.785           1.61        0.848

This table thus shows the mean and standard deviation of how frequent and intense each positive emotion was felt by participants, a higher mean indicating that the emotion was more frequent or intense. This table shows that the emotions calm, quiet and appreciative were the most frequent and most intense emotions.

Visualisation

Now I wanted to visualise the relationship each of these emotions had with age. I started off by creating a data frame of each emotions frequency by assigning the aaec table to the name ‘f_pos’. I then piped the aaec data frame, and used the select() function to select the required variable columns. I then piped these, and used the gather() function from the tidyr package to gather all the different columns and collapse them into key-value pairs. I found gather() to be an alterntive code to the pivot() function we learnt in Danielle’s video tutorials, but I found the gather() function easier to use. What the gather() function did was collapse all the columns starting with f_ into one column called “key”, and paired them with their value called “value”. This then made it easy to create a graph after.

f_pos <- aaec %>% 
  select(age, f_calm, f_qui, f_app, f_int, f_cont, f_hap, f_rela, f_pea, f_ener, f_aff, f_amu, f_acc, f_joy, f_pro, f_reli, f_exc, avg_f_pos) %>% 
  gather(key = "variable", value = "value", -age)

Next I wanted to create a graph, and did this using ggplot. Initially I thought of creating a scatterplot for each emotion, but then decided to create one graph that had all the emotions together, such that I could see the differences in relationships beween the emotions. The way i created my plot mimicked the same functions I used with my group in Part 2 of this verification report. The only difference is the data used was from the f_pos data frame, which I created above. Additionally, I only used geom_smooth() because I wanted to only have the line of best fits for each emotion. Within the function I also used the aes() function, equating colour to variable such that each emotion’s line of best fit was a different colour. I also included “se = FALSE”, to not include 95% confidence around the plotted line, as including it made the graph really messy and unreadable.

All other functions included were the same as Part 2 of this verification report, but were tweeked to fit this aesthetics and information of this specific plot. I then used print() to see my graph. The code for my plot can be seen below:

f_pos_plot <- ggplot (f_pos, aes(x = age, y = value)) +
  geom_smooth(method = "lm",aes(colour = variable),  se = FALSE) +
  labs(title = "Positive Emotions",
       x = "Age",
       y = "Frequency of \nPositive Emotions") +
  theme(axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        plot.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank(),
        plot.title = element_text(hjust = 0.5))

# Print graphs
print(f_pos_plot)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 714 rows containing non-finite values (stat_smooth).

I repeated this same process of creating key-pair values and constructing a ggplot for the intensity of positive emotions, finally using print() to see my graph.

#Positive Intensity Plot
i_pos <- aaec %>% 
  select(age, i_calm, i_qui, i_app, f_int, i_cont, i_hap, i_rela, i_pea, i_ener, i_aff, i_amu, i_acc, i_joy, i_pro, i_reli, i_exc, avg_i_pos) %>% 
  gather(key = "variable", value = "value", -age)
head(i_pos)

## # A tibble: 6 × 3
##     age variable value
##   <dbl> <chr>    <dbl>
## 1    18 i_calm       3
## 2    18 i_calm       2
## 3    18 i_calm       2
## 4    18 i_calm       3
## 5    18 i_calm       2
## 6    18 i_calm       2

i_pos_plot <- ggplot (i_pos, aes(x = age, y = value)) +
  geom_smooth(method = "lm", aes(colour = variable), se = FALSE) +
  labs(title = "Positive Emotions",
       x = "Age",
       y = "Intensity of \nPositive Emotions") +
  theme(axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        plot.background = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) + 
  theme(plot.title = element_text(hjust = 0.5))

# Print graphs
print(i_pos_plot)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 779 rows containing non-finite values (stat_smooth).

Hence the two graphs allowed me to see the relationship each emotion had with age.

Statistics

After creating all the graphs, I wanted to see what the actual correlations between each emotion and age were. I thus conducted correlation test for each emotion using the cor.test() function from the stats package, and in the parentheses included the two variables I wanted to test, which included age and the emotion for each one. I used this link for guidance on how to conduct and read the test. Hence for the tests, if the p-value reported was less than 0.05, the relationship was considered significant. Further, the strength of the correlations were indicated by the correlation coefficient value, whereby:

r < 0.3 = none or very week correlation
0.3 < r < 0.5 = weak correlation
0.5 < r < 0.7 = moderate correlation
r > 0.7 = strong correlation

I first started off by calculating the correlation between the emotion calm and age, for both its frequency and intensity, whereby cor.test() was used to carry out the correlation test, and print() was used to print the results from the test.

#frequency
print(cor.test(aaec$f_calm, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_calm and aaec$age
## t = 6.0554, df = 915, p-value = 2.043e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1332449 0.2577545
## sample estimates:
##       cor 
## 0.1962908

#intensity
print(cor.test(aaec$i_calm, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_calm and aaec$age
## t = 3.0173, df = 929, p-value = 0.00262
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03448219 0.16174086
## sample estimates:
##        cor 
## 0.09851427

These results shows that the correlation for both frequency and intensity of calm with age are statistically significant, with a weak positive correlation for both.

This correlation process was repeated with the emotion quiet and age.

#frequency
print(cor.test(aaec$f_qui, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_qui and aaec$age
## t = -0.50556, df = 919, p-value = 0.6133
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08118559  0.04797538
## sample estimates:
##         cor 
## -0.01667467

#intensity
print(cor.test(aaec$i_qui, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_qui and aaec$age
## t = -2.0847, df = 917, p-value = 0.03738
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.132757729 -0.004027929
## sample estimates:
##        cor 
## -0.0686787

These results shows that the correlation for intensity of quiet with age is statistically significant, but not statistically significant for its frequency, with a weak negative correlation for both.

Again, this correlation process was repeated with the emotion appreciative and age.

#frequency
print(cor.test(aaec$f_app, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_app and aaec$age
## t = 5.5304, df = 917, p-value = 4.167e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1163419 0.2415215
## sample estimates:
##       cor 
## 0.1796589

#intensity
print(cor.test(aaec$i_app, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_app and aaec$age
## t = 4.178, df = 923, p-value = 3.22e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07241511 0.19894904
## sample estimates:
##       cor 
## 0.1362377

These results shows that the correlation for both frequency and intensity of appreciative with age are statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion interested and age.

#frequency
print(cor.test(aaec$f_int, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_int and aaec$age
## t = 9.3773, df = 919, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2354094 0.3533666
## sample estimates:
##       cor 
## 0.2955139

#intensity
print(cor.test(aaec$i_int, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_int and aaec$age
## t = 6.6524, df = 930, p-value = 4.916e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1509788 0.2735996
## sample estimates:
##       cor 
## 0.2131283

These results shows that the correlation for both frequency and intensity of interested with age are statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion content and age.

#frequency
print(cor.test(aaec$f_cont, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_cont and aaec$age
## t = 5.1721, df = 900, p-value = 2.854e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1057942 0.2325925
## sample estimates:
##       cor 
## 0.1698964

#intensity
print(cor.test(aaec$i_cont, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_cont and aaec$age
## t = 2.1477, df = 905, p-value = 0.032
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.006144461 0.135677550
## sample estimates:
##        cor 
## 0.07121123

These results shows that the correlation for both frequency and intensity of content with age are statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion happy and age.

#frequency
print(cor.test(aaec$f_hap, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_hap and aaec$age
## t = 2.4803, df = 915, p-value = 0.01331
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01707367 0.14569089
## sample estimates:
##        cor 
## 0.08172252

#intensity
print(cor.test(aaec$i_hap, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_hap and aaec$age
## t = 0.45773, df = 926, p-value = 0.6473
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04936164  0.07931768
## sample estimates:
##       cor 
## 0.0150403

These results shows that the correlation for frequency of happy with age is statistically significant, but not statistically significant for its intensity, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion relaxed and age.

#frequency
print(cor.test(aaec$f_rela, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_rela and aaec$age
## t = 6.4281, df = 912, p-value = 2.081e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1453069 0.2693991
## sample estimates:
##       cor 
## 0.2081906

#intensity
print(cor.test(aaec$i_rela, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_rela and aaec$age
## t = 3.7365, df = 917, p-value = 0.0001981
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05825554 0.18566115
## sample estimates:
##       cor 
## 0.1224628

These results shows that the correlation for both frequency and intensity of relaxed with age are statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion peaceful and age.

#frequency
print(cor.test(aaec$f_pea, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_pea and aaec$age
## t = 6.222, df = 910, p-value = 7.477e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1389111 0.2634675
## sample estimates:
##      cor 
## 0.202006

#intensity
print(cor.test(aaec$i_pea, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_pea and aaec$age
## t = 3.5506, df = 906, p-value = 0.0004041
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05249081 0.18083126
## sample estimates:
##       cor 
## 0.1171501

These results shows that the correlation for both frequency and intensity of peaceful with age are statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion energetic and age.

#frequency
print(cor.test(aaec$f_ener, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_ener and aaec$age
## t = 5.1525, df = 882, p-value = 3.172e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.106200 0.234237
## sample estimates:
##       cor 
## 0.1709401

#intensity
print(cor.test(aaec$i_ener, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_ener and aaec$age
## t = 1.0534, df = 915, p-value = 0.2924
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03000235  0.09931981
## sample estimates:
##        cor 
## 0.03480442

These results shows that the correlation for frequency of energetic with age is statistically significant, but not statistically significant for its intensity, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion affectionate and age.

#frequency
print(cor.test(aaec$f_aff, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_aff and aaec$age
## t = 0.61384, df = 891, p-value = 0.5395
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04510471  0.08604763
## sample estimates:
##        cor 
## 0.02055991

#intensity
print(cor.test(aaec$i_aff, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_aff and aaec$age
## t = -1.2873, df = 888, p-value = 0.1983
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.10856633  0.02261853
## sample estimates:
##         cor 
## -0.04315993

These results shows that the correlation for both frequency and intensity of affectionate with age are not statistically significant, with a weak positive correlation and a weak negative correlation respectively.

Again, this correlation process was repeated with the emotion amused and age.

#frequency
print(cor.test(aaec$f_amu, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_amu and aaec$age
## t = 0.6683, df = 907, p-value = 0.5041
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04290033  0.08708307
## sample estimates:
##        cor 
## 0.02218512

#intensity
print(cor.test(aaec$i_amu, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_amu and aaec$age
## t = 0.16163, df = 907, p-value = 0.8716
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05967770  0.07036571
## sample estimates:
##         cor 
## 0.005366696

These results shows that the correlation for both frequency and intensity of amused with age are not statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion accomplished and age.

#frequency
print(cor.test(aaec$f_acc, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_acc and aaec$age
## t = 6.155, df = 892, p-value = 1.134e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1381023 0.2639161
## sample estimates:
##       cor 
## 0.2018417

#intensity
print(cor.test(aaec$i_acc, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_acc and aaec$age
## t = 1.9399, df = 891, p-value = 0.05271
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.000754348  0.129903739
## sample estimates:
##        cor 
## 0.06485264

These results shows that the correlation for frequency of accomplished with age is statistically significant, but not statistically significant for its intensity, with a weak negative correlation for both.

Again, this correlation process was repeated with the emotion joyful and age.

#frequency
print(cor.test(aaec$f_joy, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_joy and aaec$age
## t = 0.46892, df = 866, p-value = 0.6392
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05066357  0.08238740
## sample estimates:
##        cor 
## 0.01593244

#intensity
print(cor.test(aaec$i_joy, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_joy and aaec$age
## t = 0.75806, df = 861, p-value = 0.4486
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04097949  0.09240165
## sample estimates:
##        cor 
## 0.02582602

These results shows that the correlation for both frequency and intensity of joyful with age are not statistically significant, with a weak positive correlation for both.

Again, this correlation process was repeated with the emotion proud and age.

#frequency
print(cor.test(aaec$f_pro, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_pro and aaec$age
## t = 5.4281, df = 860, p-value = 7.41e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1166505 0.2457928
## sample estimates:
##       cor 
## 0.1820064

#intensity
print(cor.test(aaec$i_pro, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_pro and aaec$age
## t = 0.9934, df = 826, p-value = 0.3208
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03366666  0.10243443
## sample estimates:
##        cor 
## 0.03454404

These results shows that the correlation for frequency of proud with age is statistically significant, but not statistically significant for its intensity, with a weak negative correlation for both.

Again, this correlation process was repeated with the emotion relieved and age.

#frequency
print(cor.test(aaec$f_reli, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_reli and aaec$age
## t = 1.6269, df = 889, p-value = 0.1041
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.01123463  0.11973195
## sample estimates:
##        cor 
## 0.05448298

#intensity
print(cor.test(aaec$i_reli, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_reli and aaec$age
## t = 1.6691, df = 823, p-value = 0.09548
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.01021334  0.12583887
## sample estimates:
##        cor 
## 0.05808245

These results shows that the correlation for both frequency and intensity of relieved with age are not statistically significant, with a weak positive correlation for both.

And lastly, this correlation process was repeated with the emotion excited and age.

#frequency
print(cor.test(aaec$f_exc, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$f_exc and aaec$age
## t = -1.2688, df = 880, p-value = 0.2048
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.10843903  0.02334454
## sample estimates:
##         cor 
## -0.04273312

#intensity
print(cor.test(aaec$i_exc, aaec$age))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$i_exc and aaec$age
## t = -2.9601, df = 856, p-value = 0.003161
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.16646654 -0.03395851
## sample estimates:
##        cor 
## -0.1006589

These results shows that the correlation for intensity of excited with age is statistically significant, but not statistically significant for its frequency, with a weak negative correlation for both.

Results

The results from the correlation tests indicate which emotions have a significant relationship with age, and if its on their frequency and intensity. For the frequencies and intensities of the emotions calm, appreciated, interested, content, relaxed, and peaceful, a significant correlation was found. For the frequencies of the emotions, energetic, happy, accomplished and proud, a significant correlation was also found. Lastly for the intensities of the emotions quiet and excited, a significant correlation was also found. The results also further show that whilst there are significant relationships, none of these had a strong correlation, as they were all weak correlations calculated. Overall, this provides greater insight into the relationship each positive emotion had with age. Thus my hypothesis was partially correct, by those emotions with higher scores of frequencies and intensities were usually the ones with significant correlations.

Question 2

In a similar notion to my first exploratory analysis, I wanted to explore other variables that were recorded in the paper that were not explored. One of these variables included if participants lived alone or not. It was well known that during the pandemic, social interactions were limited due to individuals isolating, or staying at home to avoid the risks of the pandemic. Thus I wanted to explore if living alone, which would have resulted in a lack of physical socialisation during the pandemic, has any effect on an individuals emotional well being. I hypothesised that there would be a significant difference between those who did live alone and those who did not, whereby those who lived alone would experience more negative emotions, and those who did live alone would experience more positive emotions.

Descriptive statistics

I first started my exploration by creating a data frame that just contained the live alone variable, and average of frequencies and intensities of emotions variables. This was done by piping the aaec data frame, selectig the required variables through select(), and assigning it to the name “aaec_livealone_emotions”

aaec_livealone_emotions <- aaec %>% 
  select(livealone, avg_f_pos, avg_i_pos, avg_f_neg, avg_i_neg)

I then created individual data frames who those who did live alone, and those who did not live alone, using the filter() function.

#Those who live alone data frame
aaec_livealonecount <- aaec_livealone_emotions %>%
  filter(livealone == "Yes") 
  
#Those who do not live alone data frame
aaec_dontlivealonecount <- aaec_livealone_emotions %>%
  filter(livealone == "No")

I then created a function, just like I did in my first exploratory question, to calculate descriptives. The process of creating the function was exactly the same, except in this function I included the median(), min() and max() functions cause I wanted to calculate median and range of scores as well. I also piped the dataframe I just made for those who do live alone.

#Create function
summary.livealone.fun <- function(variable) {aaec_livealonecount %>%
  summarise(
    mean = mean(variable, na.rm = TRUE),
    sd = sd(variable, na.rm = TRUE),
    median = median(variable, na.rm = TRUE),
    min = min(variable, na.rm = TRUE),
    max = max(variable, na.rm = TRUE)
  )
}

After this I ran the function for the four different variables of average emotions, creating a row for each one. I then used rbind() function to bind all the rows together and make a dataframe. I then used the rownames() function to change the row names to their corresponding variables. Lastly I used the print() function to see the table I just created.

#Run function for different variables
age_livealone_pos_f <- summary.livealone.fun(variable = aaec_livealonecount$avg_f_pos)
age_livealone_pos_i <- summary.livealone.fun(variable = aaec_livealonecount$avg_i_pos)
age_livealone_neg_f <- summary.livealone.fun(variable = aaec_livealonecount$avg_f_neg)
age_livealone_neg_i <- summary.livealone.fun(variable = aaec_livealonecount$avg_i_neg)

#Create table 
age_livealone <- rbind(age_livealone_pos_f, age_livealone_pos_i, age_livealone_neg_f, age_livealone_neg_i)

rownames(age_livealone) <- c("Avg Frequency of Pos Emotions", "Avg Intensity of Pos Emotions", "Avg Frequency of Neg Emotions", "Avg Intensity of Neg Emotions")

## Warning: Setting row names on a tibble is deprecated.

#Print table
print(age_livealone)

## # A tibble: 4 × 5
##    mean    sd median   min   max
## * <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1  1.97 0.615   2    0.357  3.75
## 2  1.97 0.574   2    0.583  4   
## 3  1.33 0.658   1.33 0      4   
## 4  1.82 0.673   1.8  0.143  4

I then repeated the same procedure of making a function to calculate descriptives, run the function, and create a table of descriptives, but adapted it for those who did not live alone. The code mimics that of above, and can be seen here:

#Create function
summary.dontlivealone.fun <- function(variable) {aaec_dontlivealonecount %>%
  summarise(
    mean = mean(variable, na.rm = TRUE),
    sd = sd(variable, na.rm = TRUE),
    median = median(variable, na.rm = TRUE),
    min = min(variable, na.rm = TRUE),
    max = max(variable, na.rm = TRUE)
  )
}

#Run function for different variables
age_dontlivealone_pos_f <- summary.livealone.fun(variable = aaec_dontlivealonecount$avg_f_pos)
age_dontlivealone_pos_i <- summary.livealone.fun(variable = aaec_dontlivealonecount$avg_i_pos)
age_dontlivealone_neg_f <- summary.livealone.fun(variable = aaec_dontlivealonecount$avg_f_neg)
age_dontlivealone_neg_i <- summary.livealone.fun(variable = aaec_dontlivealonecount$avg_i_neg)

#Create table 
age_dontlivealone <- rbind(age_dontlivealone_pos_f, age_dontlivealone_pos_i, age_dontlivealone_neg_f, age_dontlivealone_neg_i)

rownames(age_dontlivealone) <- c("Avg Frequency of Pos Emotions", "Avg Intensity of Pos Emotions", "Avg Frequency of Neg Emotions", "Avg Intensity of Neg Emotions")

## Warning: Setting row names on a tibble is deprecated.

#Print table
print(age_dontlivealone)

## # A tibble: 4 × 5
##    mean    sd median   min   max
## * <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1  1.97 0.547   1.94     0  3.75
## 2  1.92 0.592   1.94     0  3.81
## 3  1.44 0.660   1.42     0  3.54
## 4  1.76 0.707   1.7      0  3.82

Visualisation

I then wanted to visualise this data, in a way where I could see the difference between those who did live alone and those who did not side by side. I thus decided to create a boxplot to represent this data. I began by creating a function, just like I did above for my exploratory questions. The actual code for the ggplot() mimicked the code I wrote in my first exploratory question, except I used the geom_boxplot() to create a boxplot, and geom_jitter() to plot the different data points on top of the boxplot. All other functions and code mimicked the previous ggplots I have made in this verification report.

I then ran the function, to create 4 different boxplots, and then used print() to see the actual plots. The code for all this can be seen below:

# Create a function
livealone.plot.fun <- function(y_var, plot_title, y_title) {
  aaec_livealone_emotions %>% 
  group_by(livealone) %>% 
  ggplot(aes(x = livealone, y = y_var, fill = livealone)) +
  theme_minimal() +
  geom_boxplot(alpha = .4) +
  geom_jitter(aes(colour = livealone), alpha = .15) +
  labs(title = plot_title,
       x = "Live Alone",
       y = y_title) +
  theme(axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        title = element_text(size = 8),
        axis.title.x = element_text(size = 8),
        axis.title.y = element_text(size = 8),
        plot.background = element_blank(), 
        plot.title = element_text(hjust = 0.5),
        legend.position="none")
}

# Use function create plots
pos_f_boxplot <- livealone.plot.fun(y_var = aaec$avg_f_pos, 
                               plot_title = "Effect of Living Alone on \nFrequency of Positive Emotions", 
                                y_title = "Frequency of \nPositive Emotions"
  )

pos_i_boxplot <- livealone.plot.fun(y_var = aaec$avg_i_pos, 
                                plot_title = "Effect of Living Alone on \nIntensity of Positive Emotions", 
                                y_title = "Intensity of \nPositive Emotions"
  )

neg_f_boxplot <- livealone.plot.fun(y_var = aaec$avg_f_neg, 
                                plot_title = "Effect of Living Alone on \nFrequency of Negative Emotions", 
                                y_title = "Frequency of \nNegative Emotions"
  )

neg_i_boxplot <- livealone.plot.fun(y_var = aaec$avg_i_neg, 
                                plot_title = "Effect of Living Alone on \nIntensity of Negative Emotions", 
                                y_title = "Intensity of \nNegative Emotions"
  )

# Print graphs
boxplot1 <- (pos_f_boxplot + neg_f_boxplot)
boxplot2 <- (pos_i_boxplot + neg_i_boxplot)

print(boxplot1)

print(boxplot2)

## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

## Warning: Removed 3 rows containing missing values (geom_point).

Statistics

I then ran some statistically analyses to test significance. I used the ANOVA test to test significance between group means, whereby a p value < 0.05 is considered significant.

I found the code for an ANOVA test to be aov() function from the stats package. In the parentheses of the function I entered the two variables I wanted to test for, seperating them with a “~”, and then specified which data frame these variables were being retrieved from. I then used the summary() function to retrieve the results for the ANOVA test.

The ANOVA test for the frequency of positive emotions between those who live alone and those who did not was found to not be significant.

aov.fpos.livealone <- aov(avg_f_pos ~ livealone, data = aaec_livealone_emotions)
summary(aov.fpos.livealone)

##              Df Sum Sq Mean Sq F value Pr(>F)
## livealone     1    0.0  0.0011   0.004  0.952
## Residuals   943  298.9  0.3170

The ANOVA test for the intensity of positive emotions between those who live alone and those who did not was found to not be significant.

aov.ipos.livealone <- aov(avg_i_pos ~ livealone, data = aaec_livealone_emotions)
summary(aov.ipos.livealone)

##              Df Sum Sq Mean Sq F value Pr(>F)
## livealone     1    0.4  0.3789   1.096  0.295
## Residuals   943  326.0  0.3457

The ANOVA test for the frequency of negative emotions between those who live alone and those who did not was found to be significant.

aov.fneg.livealone <- aov(avg_f_neg ~ livealone, data = aaec_livealone_emotions)
summary(aov.fneg.livealone)

##              Df Sum Sq Mean Sq F value Pr(>F)  
## livealone     1    2.1  2.1087   4.841  0.028 *
## Residuals   943  410.8  0.4356                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA test for the intensity of negative emotions between those who live alone and those who did not was found to not be significant.

aov.ineg.livealone <- aov(avg_f_pos ~ livealone, data = aaec_livealone_emotions)
summary(aov.ineg.livealone)

##              Df Sum Sq Mean Sq F value Pr(>F)
## livealone     1    0.0  0.0011   0.004  0.952
## Residuals   943  298.9  0.3170

Results

This exploratory question showed that significance was only found for the frequency of negative emotions between those who live alone and those who do not. This indicates that living alone during the pandemic and lacking physical socialisation could have had an impact on how frequently an individual felt negative emotions. My hypothesis again was thus only partially supported, as only one of my ANOVA tests came out significant.

Question 3

My last exploratory question was again in similar notion to my previous two questions, where I wanted to see if there was any relationship between perceived risks individuals had, and their subjective rating of their own health. Specifically for the perceived risks, the experiments recorded an individual’s perceived risk to COVID-19 in term of risk to themselves, risk to others, and risk of contracting in the general population. For all variables higher scores indicated higher perceptions of either risk or health. Thus I wanted to see if this perception of their own health, impacted their perceptions on risk with COVID-19. I hypothesized that the different risks would be positively correlated, such that all the perception of each risk would affect the perception of other types of risks. Additionally I hypothesised that the correlation between the different risks and subjective health would be negative, such that if an individual perceived themselves as more healthy, they perceive COVID-19 as less of a risk.

Descriptive statistics

Like all my other exploratory questions, I selected the columns I needed, and then created my own function, running it, and creating a table of descriptives after. This function and and the overall process of making a table mimicked that in my second exploratory question, whereby the code can be seen below:

#Select required variables
health_and_risk <- aaec %>% 
  select(risk_self, risk_comp, risk_pop, health)

#Create function
healthsummary.fun <- function(var1) {health_and_risk %>% 
    summarise(
      mean = mean(var1),
      sd = sd(var1),
      median = median(var1),
      min = min(var1),
      max = max(var1)
    )
  }

#Run function for different variables
risk_self <- healthsummary.fun(var1 =health_and_risk$risk_self)
risk_comp <- healthsummary.fun(var1 =health_and_risk$risk_comp)
risk_pop <- healthsummary.fun(var1 =health_and_risk$risk_pop)
health <- healthsummary.fun(var1 =health_and_risk$health)

#Create table
table_health_and_risk <- rbind(risk_self, risk_comp, risk_pop, health)

rownames(table_health_and_risk) <- c("risk_self", "risk_comp", "risk_pop", "health")

## Warning: Setting row names on a tibble is deprecated.

#Print table
print(table_health_and_risk)

## # A tibble: 4 × 5
##    mean    sd median   min   max
## * <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1  2.36 1.11       2     0     5
## 2  2.45 1.34       2     0     5
## 3  2.97 0.939      3     0     5
## 4  3.36 0.986      3     1     5

Visualisation and Statistics

I then wanted to generate a correlation matrix showing the varying degrees of correlations between all the selected variables. I found online that there was a function that could do this, and was from the corrplot package. I thus used the library() function to load the package first.

library(corrplot)

## corrplot 0.92 loaded

I then piped the health_and_risk data frame I created previously to find the correlations between all the variables using the cor() function. I then used the then used corrplot() function to plot the correlation matrix.

correlation_health <- health_and_risk %>% 
  cor()

corrplot(correlation_health)

The matrix shows that there is a positive correlation between:

risk to self and risk of complication
risk to self and risk from general population
risk of complication and from general population

The matrix also shows that there is a negative correlation between:

risk to self and subjective health
risk of complication and subjective health
risk from general population and subjective health

I then wanted to graph this out into scatter plots. Similar to my second exploratory analysis, I created a function to create a ggplot(). My code however was similar to the code I created in my verification report to create Figure 1. This code to make a function for my scatterplot can thus be seen below, whereby most functions mimick those I have used in previous questions:

# Create function
health.fun <- function(x_var, y_var, plot_title, x_title, y_title) {
  health_and_risk %>% 
  ggplot(aes(x = x_var, y = y_var)) +
  theme_minimal() +
  geom_jitter(alpha = .2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = plot_title,
       x = x_title,
       y = y_title) +
  theme(axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        title = element_text(size = 8),
        axis.title.x = element_text(size = 8),
        axis.title.y = element_text(size = 8),
        plot.background = element_blank(), 
        plot.title = element_text(hjust = 0.5))
}

I then ran the function to to graph the 6 correlations as viewed in the correlation matrix produced above. The method of running the code, and printing it mimicked what I did in my second exploratory question. I also ran a correlation test after creating each graph, using the cor.test() function, my method mimicking what I did in my first exploratory analysis. The code for these procedures can be seen below, the first correlation I explored being risk to self and risk of complication:

# Relationship between risk_self and risk_comp
risk_comp.self <- health.fun(x_var = health_and_risk$risk_self, y_var = health_and_risk$risk_comp, plot_title = " ", x_title = "Risk to Self", y_title = "Risk to Comp")

print(risk_comp.self)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_self, aaec$risk_comp))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_self and aaec$risk_comp
## t = 20.365, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5067733 0.5954685
## sample estimates:
##      cor 
## 0.552684

From the graph and correlation test we can see that there is a moderate correlation between risk to self and risk of complication, and this is statistically significant.

I repeated this procedure to run my function to make a scatterplot, and do a correlation test to see the relationship between risk to self and risk from general population.

# Relationship between risk_self and risk_pop
risk_pop.self <- health.fun(x_var = health_and_risk$risk_self, y_var = health_and_risk$risk_pop, plot_title = " ", x_title = "Risk to Self", y_title = "Risk to Pop")

print(risk_pop.self)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_self, aaec$risk_pop))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_self and aaec$risk_pop
## t = 12.303, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3156177 0.4255833
## sample estimates:
##       cor 
## 0.3719045

From the graph and correlation test we can see that there is a moderate correlation between risk to self and risk from general population, and this is statistically significant

Again, I repeated this procedure to run my function to make a scatter plot, and do a correlation test to see the relationship between risk to self and subjective health.

# Relationship between risk_self and health
health.self <- health.fun(x_var = health_and_risk$risk_self, y_var = health_and_risk$health, plot_title = " ", x_title = "Risk to Self", y_title = "Subjective Health")

print(health.self)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_self, aaec$health))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_self and aaec$health
## t = -9.2113, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3447685 -0.2277130
## sample estimates:
##        cor 
## -0.2873131

From the graph and correlation test we can see that there is a weak negative correlation between risk to self and subjective health, and this is statistically significant.

Again, I repeated this procedure to run my function to make a scatter plot, and do a correlation test to see the relationship between risk of complication and risk from general population.

# Relationship between risk_comp and risk_pop
risk_comp.pop <- health.fun(x_var = health_and_risk$risk_comp, y_var = health_and_risk$risk_pop, plot_title = " ", x_title = "Risk to Comp", y_title = "Risk to Pop")

print(risk_comp.pop)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_comp, aaec$risk_pop))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_comp and aaec$risk_pop
## t = 10.367, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2614135 0.3759574
## sample estimates:
##       cor 
## 0.3198537

From the graph and correlation test we can see that there is a moderate correlation between risk of complication and risk from general population, and this is statistically significant

Again, I repeated this procedure to run my function to make a scatter plot, and do a correlation test to see the relationship between risk of complication and subjective health.

# Relationship between risk_comp and health
risk_comp.health <- health.fun(x_var = health_and_risk$risk_comp, y_var = health_and_risk$health, plot_title = " ", x_title = "Risk to Comp", y_title = "Subjective Health")

print(risk_comp.health)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_comp, aaec$health))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_comp and aaec$health
## t = -17.59, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5435883 -0.4474574
## sample estimates:
##        cor 
## -0.4970464

From the graph and correlation test we can see that there is a moderate negative correlation between risk of complication and sujective health, and this is statistically significant

Lastly, I repeated this procedure to run my function to make a scatter plot, and do a correlation test to see the relationship between risk from general population and subjective health.

# Relationship between risk_pop and health
risk_pop.health <- health.fun(x_var = health_and_risk$risk_pop, y_var = health_and_risk$health, plot_title = " ", x_title = "Risk to Pop", y_title = "Subjective Health")

print(risk_pop.health)

## `geom_smooth()` using formula 'y ~ x'

print(cor.test(aaec$risk_pop, aaec$health))

## 
##  Pearson's product-moment correlation
## 
## data:  aaec$risk_pop and aaec$health
## t = -3.5103, df = 943, p-value = 0.0004688
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.17606781 -0.05016149
## sample estimates:
##        cor 
## -0.1135706

From the graph and correlation test we can see that there is a weak negative correlation between risk from general population and subjective health, and this is statistically significant

Results

The scatter plots and correlation tests show that there is varying levels of correlation between all the variables, and how they are all statistically significant. They also indicate that the different risks are positively correlated with each other, such that the higher the perception of one type of risk is, the higher the perception of the other type of risk is. It also indicates that the different risks are negatively correlated with subjective health, such that a higher rating of subjective health would mean a lower perception of risk to COVID-19. Overall, my hypothesis for this exploratory question was supported.

Part 4: Reccomendations

1. Publish code script

My first recommendation is that Carstensen and colleagues publish the code they used to produce their statistics, figures, and tables. This would include a script of code that shows which packages, functions, and variables were specifically used in their calculations, such that the readers can know exactly how the authors retrieved to their reported values. My group and I faced great difficulty when starting off our verification process, as we did not know where to start and what code was most appropriate to use. We had to trial and error many different functions and coding methods to try and reach the reported values in the paper, which at times was a very long and cumbersome process. In future, to make the replication process easier, it would be beneficial for the authors to publish their code and include it in their OSF. This could be done by uploading their code to open-source programs like R or GitHub, which can be accessed by everyone, and then attaching it to their OSF. Some resources that can help with this is this link which shows how to publish code from R, and this link for GitHub.

2. Include a README or RMD File

My next recommendation would be including a file that clearly states what analysis methods were conducted by the author. One challenge my group and I faced was when calculating the statistics regarding race, we were not sure if we were meant to include participants that indicated N/A. We only figured this out through trial-and-error, and furtmore faced this issue with other variables. Thus I think it would be good for the authors to share a file that documents their analysis method, and also their exclusion criteria, so readers can have a better understanding on how they manipulated and calculated their data. A way to go about doing this is creating either a README file or an RMD file. A README file is a file that contains key information on the project, and how to run various analyses. Alternatively, an RMD file is an R Markdown file that can be created via RStudio, allowing users to create a file that incorporates plain text and R code. Both a README and RMD files would provide sufficient information for readers to understand how the analyses were run. More information on how to create a README file can be found here, and for RMD file can be found here.

3. Improve consistency of information across files in OSF

My final recommendation would be ensuring consistency and clear organisation across all files in the OSF. My group and I came across variable names named differently across files in the OSF, an example being that in one document, the income variable was “income”, and in another was”income_num”, which confused us at first because we were not sure if they were the same thing. The discrepancy between variable names, as well there being 107 columns in the data set, made our process of sorting through the data confusing as we were unsure what variable we needed. For consistency and clarity, the authors should thus ensure consistent variable names to make it easier for readers, and should check their files thoroughly before publishing. A website that could help is this website which has information on the best practices for coding in R by detailing naming conventions for variables.

Verification Report

Shreya Bhatt

07/08/2022

Part 1: Summary and Reaction

Summary

Reactions

Part 2: Verification

Verification Goals

Locate, Load and Read Packages and Data

Reproducing Sample Characteristics

Statistics Reported In-text

Recreating Table 1

Recreating Figure 1

Part 3: Exploratory Analysis

Question 1

Question 2

Question 3

Part 4: Reccomendations

1. Publish code script

2. Include a README or RMD File

3. Improve consistency of information across files in OSF