Goals for week 10

My goals for week 10 are to finish off my exploratory analyses.

3b. Age and social distancing.

I was interested to see how age of participants influenced their compliance with social distancing regulations during Covid-19. Canning, Karra, Dayula, Guo & Bloom (2020) found in their study on social distancing in the United States that people over the age of 50 had less than half the predicted number of close contacts than those who were younger than 30. Thus I want to see if those results are consistent in the Folk et al. paper. Study 2 was used in this analysis as it measured both age and social distancing. I assessed the age range of participants in the data set and found that there were a lot more participants that were under 40 than there were over 40. Social distancing was measured using a yes or no question that asked if they were practicising social distancing or not. A score of 1 = they were social distancing, 0 = they were not social distancing.

Load libraries

library(dplyr) # used for data frame manipulation functions
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse) # used for dplyr
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gt) # used for table making and to convert decimal to percentage
## Warning: package 'gt' was built under R version 4.0.5

Read in the data.

study_2 <- read.csv("Study 2.csv")

I chose age ranges 18 - 22 for the younger demographic, and 40 - 72 for the older demographic. This was mainly because those ranges of age both contained equal number of participants at n = 75. Here is how I calculated the number of participants by age (individually and total).

Calculating number of participants by age, Young group: 18 -> 22 year olds

Created a Younger_age dataframe and selected Age, filtered it to show me only participants 18 or older, but no older than 22. I then grouped by age to group individuals of each age into 1 number. I then added a sample size column that calculated the number of participants per age. Below that is code to sum the total sample size of participants aged 18 to 22, and the total was n = 75.

# Calculates number of participants in each age from 18 - 22
Younger_age <- study_2 %>% 
  select(Age) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size_young = n())

Younger_age
## # A tibble: 5 x 2
##     Age Sample_size_young
## * <int>             <int>
## 1    18                12
## 2    19                13
## 3    20                17
## 4    21                12
## 5    22                21
# Calculates the total number of participants aged 18 - 22
Younger_age$Sample_size_young %>% 
  sum()
## [1] 75

Calculating number of participants by age, Old group: 40 -> 72 year olds

Here I calculate the sample size of participants that were aged from 40 to 72. I basically did the same thing as I did for the younger group, except changed the values when filtering.

Older_age <- study_2 %>% 
  select(Age) %>% 
  filter(Age >= 40, Age <= 73) %>% 
  group_by(Age) %>% 
  summarise(Sample_size_old = n())

Older_age
## # A tibble: 28 x 2
##      Age Sample_size_old
##  * <int>           <int>
##  1    40               5
##  2    41               4
##  3    42               5
##  4    43               3
##  5    44               1
##  6    45               5
##  7    46               7
##  8    47               3
##  9    48               2
## 10    49               5
## # ... with 18 more rows
# Calculates the total number of participants aged 40 - 72
Older_age$Sample_size_old %>% 
  sum()
## [1] 75

(Young group) Calculating social distancing compliance for individual age groups

This shows me the number of participants and the percentage that practical social distancing per age. I reused the old code from when I was just counting the participants and added another variable to my summarise function called “Social_Distancing” which calculates the mean of social distancing for each individual age group.

# Calculates social distancing compliance of each age from 18 - 22
Dist_by_young <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing" = mean(SocialDistancing))
  

Dist_by_young
## # A tibble: 5 x 3
##     Age Sample_size Social_Distancing
## * <int>       <int>             <dbl>
## 1    18          12             0.917
## 2    19          13             1    
## 3    20          17             0.882
## 4    21          12             0.917
## 5    22          21             0.857

(Young group) Total mean social distancing compliance for ages (18 - 22)

I then wanted to find the total mean of social distancing across all ages in the young group. To do this, i reused the code from above and piped in a second summarise function to summarise the means we had just calculated into a new “Social distancing” variable that is the mean of those means. I also added in an “Age_range” variable that displays the young age range group we’re calculating. A sample_size varible was added again that sums the total of sample sizes we got in the previous code chunk (above).

# Calculates social distancing compliance of each age from 18 - 22
Dist_by_young <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing" = mean(SocialDistancing)) %>% 
  summarise(Age_range = "18 - 22",
            Sample_size = sum(Sample_size),
            "Social distancing" = mean(Social_Distancing))
#  fmt_percent(columns = vars("Social Distancing")) # fmt_percent from the gt() pack

Dist_by_young
## # A tibble: 1 x 3
##   Age_range Sample_size `Social distancing`
##   <chr>           <int>               <dbl>
## 1 18 - 22            75               0.915

So roughly 91.5% of our 75 young participants, aged 18 - 22, practice social distancing.

(Old group) Calculating social distancing compliance for individual age groups

Here we are selecting Age and SocialDistancing, filtering for Ages between 40 and 72, grouping them together and calculating the sample size AND the percentage that social distance in that age group.

# Calculates social distancing compliance of each age from 40 - 72
Dist_by_old <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 40, Age <= 72) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing_old" = mean(SocialDistancing))

Dist_by_old
## # A tibble: 28 x 3
##      Age Sample_size Social_Distancing_old
##  * <int>       <int>                 <dbl>
##  1    40           5                   1  
##  2    41           4                   1  
##  3    42           5                   1  
##  4    43           3                   1  
##  5    44           1                   1  
##  6    45           5                   1  
##  7    46           7                   1  
##  8    47           3                   1  
##  9    48           2                   0.5
## 10    49           5                   0.8
## # ... with 18 more rows

(Old group) Total mean social distancing compliance for ages (40 - 72)

We then add onto the code above to find the total mean social distancing compliance for those aged 40 to 72 by summarising a new “Social distancing” variable that takes the mean of all the social distancing percentages. Sample size was also summed for all ages (40 - 72), and an Age_range variable for 40 - 72 was included.

# Calculates social distancing compliance of each age from 40 - 72
Dist_by_old <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 40, Age <= 72) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing_Old" = mean(SocialDistancing)) %>% 
  summarise(Age_range = "40 - 72",
            Sample_size = sum(Sample_size),
            "Social distancing" = mean(Social_Distancing_Old))


Dist_by_old
## # A tibble: 1 x 3
##   Age_range Sample_size `Social distancing`
##   <chr>           <int>               <dbl>
## 1 40 - 72            75               0.907

Merging the old and young dataframes together!!

Finally, I get to merge the two together into a nice gt plot and use fmt_percent (from the gt package) to convert the decimal for Social Distancing into a percentage.

Dist_by_age <- rbind(Dist_by_young, Dist_by_old)

gt(Dist_by_age) %>% 

fmt_percent(columns = vars("Social distancing")) # fmt_percent from the gt() pack
Age_range Sample_size Social distancing
18 - 22 75 91.46%
40 - 72 75 90.65%

The results in the final table show no significant difference in social distancing compliance between younger and older age groups. With compliance only 0.81% higher for younger participants.

3c. How does loneliness differ between genders during Covid-19

In my first reaction to the paper, I was surprised that the isolation of Covid-19 had no real impact on social connectedness and loneliness for the most introverted and extraverted participants. As a result, I wanted to explore to see if participant loneliness differed between genders during Covid-19 compared to before Covid-19. In study 2, the researchers measured loneliness pre-Covid using a 20-item loneliness measure. Participants loneliness scores were re-measured during Covid-19 (in early April). A score of 1 (never lonely) was the minimum score, while a score of 4 (often lonely) was the maximum score.

Read in libraries

library(ggplot2) # for creating the graph
library(tidyverse) # for ggplot
library(dplyr) # for dataframe manipulation

Read in the data

Study_2 <- read.csv("Study 2.csv")

Determining gender ratio

Calling from Study_2, grouping by gender and using summarise to generate a column that specifies the percentage of genders in study 2. This is the same code that was used in the gender demographics for the verification section.

#55% males overall in study 2
Study_2 %>% 
  group_by (Gender) %>% 
  summarise (percent = 100 * n() / nrow (Study_2)) 
## # A tibble: 3 x 2
##   Gender     percent
## * <chr>        <dbl>
## 1 Female      44.9  
## 2 Male        54.8  
## 3 Non-Binary   0.298

Because the sample size is too small for the ‘Non-Binary’ category (n = 1), I will just use male and female because there’s almost an even 50/50 split between them.

Making data long

Because I’m choosing to compare the two visually, I’m going to make my data frame long to make life easier. I select the variables I need and then pipe into a pivot_longer function. cols lets me select the columns I want to pivot. names_to lets me send the “T1Lonely” and “T2Lonely” variable names into the cells, and values_to sends the values of the loneliness scores to a new column called “values”, which aligns each value to its class.

GenderLonely_long <- Study_2 %>% 
  select(Gender, T1Lonely, T2Lonely) %>% 
  pivot_longer(cols = T1Lonely:T2Lonely, 
               names_to = "time",
               values_to = "values")

GenderLonely_long
## # A tibble: 672 x 3
##    Gender time     values
##    <chr>  <chr>     <dbl>
##  1 Male   T1Lonely   1.42
##  2 Male   T2Lonely   1.47
##  3 Female T1Lonely   3.16
##  4 Female T2Lonely   2.89
##  5 Female T1Lonely   2.68
##  6 Female T2Lonely   2.74
##  7 Male   T1Lonely   1.42
##  8 Male   T2Lonely   2.21
##  9 Male   T1Lonely   1.42
## 10 Male   T2Lonely   1.11
## # ... with 662 more rows

Creating the dataframe

I spent a lot of time fiddling around with this section. Originally I had split up my dataframes into male and female groups and merged them like I did in my 2nd exploratory analysis. But I knew with the data being long, that there would be a way to do it all in 1 code chunk. It turned out to be a lot simpler than I had thought. I plugged in my long dataset, grouped by gender and class, and then I created a new column that takes the means of the loneliness scores, grouped by gender and time. Finally, I filtered out the n = 1 non-binary participant due to the small sample size.

Gender_lonely <- GenderLonely_long %>% 
  group_by(Gender, time) %>% 
  summarise(Loneliness = mean(values)) %>% 
  filter(Gender != "Non-Binary")
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
Gender_lonely
## # A tibble: 4 x 3
## # Groups:   Gender [2]
##   Gender time     Loneliness
##   <chr>  <chr>         <dbl>
## 1 Female T1Lonely       2.08
## 2 Female T2Lonely       2.00
## 3 Male   T1Lonely       2.16
## 4 Male   T2Lonely       2.12

Graphing

I tried to simplify the graph as much as I could. I set my aesthetics to group the lines by Gender, and set my axes; linetype was set to Gender as well so that the lines would look different. I added geom_line() to add the line to graph, and geom_point to insert the points of means. scale_y_continuous lets me set the y axis limits with the limits argument, and bring the graph onto the x-axis with the expand argument.

Gender_lonelyline <- ggplot(Gender_lonely,
                            aes(group = Gender, 
                                x = time, 
                                y = Loneliness, 
                                linetype = Gender)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(1, 4), expand = c(0, 0)) +
  labs(x = "Time") +
  scale_x_discrete(labels = c("Before Pandemic", "During Pandemic")) +
  theme_gray()

Gender_lonelyline

Interestingly, the loneliness scores do not differ much between pre-covid and during covid, there is only a very slight decrease for both genders. On average, females seem to experience slightly less loneliness compared to males, both pre and during Covid-19.

Next steps

My next steps are to finish off my recommendations and polish up my report before I send it in for submission!