Week 8 Learning Log

Antonia Boulton

11/04/2021

My goals for this week:

Now that our group presentations are done (and all went amazingly!!), my main goal for this week was to start thinking about the questions I am going to ask for my exploratory analysis. My other goal was to try and apply what I learned during the second part of the Intro to stats in R to the same memory test data that I made up last week.

Successes and challenges:

For simplicity, the questions that I asked were related to study 4. Some of the questions that I have come up with include:

  • Gender and motivation to wear a mask

  • Age and motivation to wear a mask

  • Household size and motivation to wear a mask

The focus of study 4 was whether a higher amount of empathy increases motivation to wear a face mask. However, I wanted to explore whether other factors, such as gender, age, or the number of people in the household, could predict the level of motivation to wear a face mask.

So I was able to successfully produce the descriptive statistics for my three data questions, as well as bar graphs for each question. However, this all came with A LOT of challenges. I won’t be able to list them all because then my learning log would be several pages long!

One of my main challenges was that, for some reason, I could not import my study 4 data, and it took me hours and hours to figure out what the problem was. The solution I ended up using was copying the data for the specific variables I was analysing, pasting it into another SPSS document (which I titled “explore1”) and then importing that into RStudio Cloud. I have no doubt that there is a much better solution to this, but I went with it because my brain was incredibly fried at that point.
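For reference, here is a minimal sketch of one alternative to copying variables into a second SPSS file: reading only the needed columns straight from the .sav file with haven's read_sav() and its col_select argument. I don't know what actually caused the import problem, so this is not a confirmed fix. The file below is a tiny made-up stand-in written to a temporary path, and the column "Other" is hypothetical.

```r
library(haven)

# Stand-in for the real study 4 file: a tiny made-up dataset saved as .sav.
# (The real file name and contents are not shown in this log.)
toy_path <- tempfile(fileext = ".sav")
write_sav(
  data.frame(
    Gender = c(1, 2),
    Age = c(25, 31),
    Household_size = c(2, 4),
    Q22_1 = c(4, 5),
    Other = c(0, 1)  # hypothetical extra column that is not needed
  ),
  toy_path
)

# Read only the variables needed for the exploratory questions
explore_toy <- read_sav(
  toy_path,
  col_select = c(Gender, Age, Household_size, Q22_1)
)

names(explore_toy)
```

With the real data, toy_path would be replaced by the path to the uploaded SPSS file.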

Other errors I got were:

Error in UseMethod(“rescale”) : no applicable method for ‘rescale’ applied to an object of class “c(‘haven_labelled’, ‘vctrs_vctr’, ‘double’)”

and

Error in match.arg(alternative) : ‘arg’ must be NULL or a character vector

I had no idea what they meant though, so I had to seek a lot of help from Google!
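Here is a small sketch of what can trigger these two errors and one way around each. The calls that produced the errors are not recorded in this log, so the causes shown are assumptions, and all the data and variable names (group_a, group_b, gender) are made up.

```r
# Sketch only: made-up data, not the study 4 data.

# 1. match.arg(alternative): t.test() takes x, y, then `alternative`,
#    so a third unnamed, non-character argument ends up in `alternative`.
set.seed(1)
group_a <- rnorm(20, mean = 4)
group_b <- rnorm(20, mean = 3.5)

bad_call <- tryCatch(
  t.test(group_a, group_b, 0.95),        # 0.95 is matched to `alternative`
  error = function(e) conditionMessage(e)
)
print(bad_call)

# Naming the argument avoids the positional mix-up.
good_call <- t.test(group_a, group_b, conf.level = 0.95)

# 2. rescale: columns imported from SPSS can carry the haven_labelled
#    class, which a continuous ggplot2 scale may not know how to rescale.
#    Converting to a plain type first is one way round it.
if (requireNamespace("haven", quietly = TRUE)) {
  gender <- haven::labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
  gender_num <- as.numeric(gender)        # plain numeric codes
  gender_fct <- haven::as_factor(gender)  # factor using the value labels
}
```

Converting with as.numeric(as.character(...)), as done for Age and Household_size later in this log, is another route to a plain numeric column.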

## Descriptive statistics

explore1 <- explore1 %>%
  na.omit

exploreQ1 <- explore1 %>%
  group_by(Gender) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n=n(),
            se = sd/sqrt(n)) %>%
   mutate(Gender = case_when(Gender == 1 ~ "Male", 
         Gender == 2 ~ "Female",
         Gender == 3 ~ "No information"))
## Adding missing grouping variables: `Gender`
gt(exploreQ1)
Gender mean sd n se
Male 3.663317 1.251801 796 0.04436890
Female 4.025000 1.099722 720 0.04098421
## bar plot
exploreQ1_plot <- exploreQ1 %>%
  ggplot(aes(x=Gender, y=mean, fill=Gender)) + 
  geom_col() + labs(title = "Motivation to wear a face mask across genders", x = "Gender", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0), 
                     limits = c(0, 5))
print(exploreQ1_plot)
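The summary table above includes a standard error (se) for each group, but the bar plot does not use it. As a minimal sketch, error bars could be added with geom_errorbar(). The values are copied from the gt(exploreQ1) output above; the object names gender_summary and gender_plot, and the bar width, are my own choices.

```r
library(ggplot2)

# Values copied from the gt(exploreQ1) output
gender_summary <- data.frame(
  Gender = c("Male", "Female"),
  mean = c(3.663317, 4.025000),
  se = c(0.04436890, 0.04098421)
)

gender_plot <- ggplot(gender_summary, aes(x = Gender, y = mean, fill = Gender)) +
  geom_col() +
  # Error bars spanning one standard error either side of each mean
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  labs(
    title = "Motivation to wear a face mask across genders",
    x = "Gender",
    y = "Motivation to wear a face mask"
  ) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 5))

print(gender_plot)
```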

## Descriptive statistics

explore1$Age <- as.numeric(as.character(explore1$Age))

exploreQ2 <- explore1 %>%
  group_by(Age) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n=n(),
            se = sd/sqrt(n)) 
## Adding missing grouping variables: `Age`
gt(exploreQ2)
Age mean sd n se
18 3.830189 1.1220834 53 0.1541300
19 3.769231 1.2239182 39 0.1959838
20 3.860465 1.3016728 43 0.1985032
21 3.755556 1.2089732 45 0.1802231
22 3.959184 1.1357667 49 0.1622524
23 4.068966 1.0738040 58 0.1409974
24 4.000000 1.0127394 40 0.1601282
25 3.733333 1.1769922 60 0.1519490
26 3.827586 1.2159457 58 0.1596615
27 4.018519 1.0368810 54 0.1411016
28 3.655172 1.2502873 58 0.1641708
29 3.819672 1.1903528 61 0.1524091
30 3.508475 1.4065473 59 0.1831169
31 3.723077 1.2563301 65 0.1558286
32 3.500000 1.3084805 34 0.2244026
33 3.918367 1.2047948 49 0.1721135
34 3.959184 1.1357667 49 0.1622524
35 3.500000 1.3206764 44 0.1990995
36 4.166667 1.0854312 30 0.1981717
37 3.487805 1.3439113 41 0.2098837
38 3.973684 1.2408986 38 0.2013003
39 3.727273 1.2825055 44 0.1933450
40 3.655172 1.1425492 29 0.2121661
41 3.608696 1.4377739 23 0.2997966
42 3.960000 0.9780934 25 0.1956187
43 3.863636 1.1668213 22 0.2487671
44 3.695652 1.0632191 23 0.2216965
45 3.928571 1.2744954 28 0.2408570
46 3.947368 1.1772701 19 0.2700843
47 3.846154 1.3445045 13 0.3728985
48 3.952381 1.1608700 21 0.2533226
49 4.052632 1.1290942 19 0.2590320
50 3.846154 0.8987170 13 0.2492593
51 4.384615 1.1208971 13 0.3108809
52 3.687500 0.7932003 16 0.1983001
53 3.647059 1.2718675 17 0.3084732
54 3.666667 1.3904436 21 0.3034197
55 3.863636 1.3902879 22 0.2964104
56 4.000000 0.8819171 19 0.2023257
57 4.277778 0.8947925 18 0.2109046
58 4.500000 0.6741999 12 0.1946247
59 4.444444 0.7264832 9 0.2421611
60 3.647059 1.2718675 17 0.3084732
61 4.272727 0.7862454 11 0.2370619
62 3.400000 1.6733201 5 0.7483315
63 3.750000 1.8929694 4 0.9464847
64 3.833333 1.4719601 6 0.6009252
65 4.714286 0.4879500 7 0.1844278
67 5.000000 0.0000000 2 0.0000000
68 4.000000 1.0000000 5 0.4472136
69 5.000000 0.0000000 2 0.0000000
71 4.000000 1.4142136 2 1.0000000
72 4.000000 NA 1 NA
76 1.000000 NA 1 NA
## bar plot
exploreQ2_plot <- exploreQ2 %>%
  ggplot(aes(x=Age, y=mean, fill=Age)) + 
  geom_col() + labs(title = "Motivation to wear a face mask across age", x = "Age", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0), 
                     limits = c(0, 5))
print(exploreQ2_plot)
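Grouping by every single age leaves some groups with only one or two people, which is why sd and se come out as NA or 0 for the oldest ages in the table above. One option, not used in the analysis above, is to group ages into bands first. This is only a sketch: the data are made up, and the cut points and the names toy, Age_band and band_summary are arbitrary choices of mine.

```r
library(dplyr)

# Made-up data standing in for explore1 (Age and Q22_1 follow the log's names)
set.seed(42)
toy <- data.frame(
  Age = sample(18:76, 300, replace = TRUE),
  Q22_1 = sample(1:5, 300, replace = TRUE)
)

band_summary <- toy %>%
  # Hypothetical age bands; intervals are (17,29], (29,39], and so on
  mutate(Age_band = cut(
    Age,
    breaks = c(17, 29, 39, 49, 59, Inf),
    labels = c("18-29", "30-39", "40-49", "50-59", "60+")
  )) %>%
  group_by(Age_band) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n = n(),
            se = sd / sqrt(n))

print(band_summary)
```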

## Descriptive statistics

explore1$Household_size <- as.numeric(as.character(explore1$Household_size))

exploreQ3 <- explore1 %>%
  group_by(Household_size) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n=n(),
            se = sd/sqrt(n)) 
## Adding missing grouping variables: `Household_size`
gt(exploreQ3)
Household_size mean sd n se
1 3.710526 1.189325 342 0.06431131
2 3.886051 1.201534 509 0.05325704
3 3.959752 1.164267 323 0.06478156
4 3.709402 1.219018 234 0.07968972
5 3.974359 1.104587 78 0.12506989
6 3.809524 1.289149 21 0.28131534
7 3.200000 1.483240 5 0.66332496
8 5.000000 NA 1 NA
11 3.000000 NA 1 NA
12 5.000000 NA 1 NA
20 1.000000 NA 1 NA
## bar plot
exploreQ3_plot <- exploreQ3 %>%
  ggplot(aes(x=Household_size, y=mean, fill=Household_size)) + 
  geom_col() + labs(title = "Motivation to wear a face mask across household size", x = "Number of people in the household", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0), 
                     limits = c(0, 6))
print(exploreQ3_plot)

Next steps in my coding journey:

Judging by the descriptive statistics and plots, neither gender, age, nor household size looks like a great predictor of motivation to wear a face mask. That being said, I will come back to them and run a significance test to check whether this is the case. I will also apply more of the Intro to stats in R tutorial content for this, as the amount of time I spent on my initial exploratory analysis meant that I didn’t have time to achieve my other learning goal.
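As a rough sketch of what those significance tests might look like, here are a two-sample t-test for gender and a correlation test for age, both from base R. The choice of tests is my assumption and may not be what the tutorial recommends for a 1 to 5 rating scale. The data are made up, so the p-values say nothing about the real study 4 data.

```r
# Made-up data standing in for explore1; column names follow the log.
set.seed(123)
toy <- data.frame(
  Gender = rep(c("Male", "Female"), each = 50),
  Age = sample(18:70, 100, replace = TRUE),
  Q22_1 = sample(1:5, 100, replace = TRUE)
)

# Gender (two groups): Welch two-sample t-test on motivation scores
gender_test <- t.test(Q22_1 ~ Gender, data = toy)

# Age (numeric): Pearson correlation with motivation scores
age_test <- cor.test(toy$Age, toy$Q22_1)

gender_test$p.value
age_test$p.value
```

Household size is also numeric, so it could be tested the same way as age.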

I will also make sure to go to the coding Q and A sessions next week to ask more specific questions about the exploratory analysis and expectations for the verification report.

Finally, I plan to come up with more possible questions for my exploratory analysis that are more related to the research area, which is empathy.