My goals for this week:
Now that our group presentations are done (and all went amazingly!!), my main goal for this week was to start thinking about the questions I am going to ask for my exploratory analysis. My other goal was to apply what I learned in the second part of the Intro to Stats in R tutorial to the memory test data that I made up last week.
Successes and challenges:
For simplicity, the questions that I asked were related to study 4. Some of the questions that I have come up with include:
Gender and motivation to wear a mask
Age and motivation to wear a mask
Household size and motivation to wear a mask
The focus of study 4 was whether higher empathy increases motivation to wear a face mask. However, I wanted to explore whether other factors, such as gender, age, or the number of people in the household, could predict the level of motivation to wear a face mask.
So I was able to successfully produce the descriptive statistics for my three data questions, as well as bar graphs for each question. However, this all came with A LOT of challenges. I won’t be able to list them all because then my learning log would be several pages long!
One of my main challenges was that for some reason, I could not import my study 4 data and it took me hours and hours to figure out what the problem was. One of the solutions was copying the data of the specific variables I was analysing, pasting it into another SPSS document (which I titled “explore1”) and then importing that into RStudio cloud. I have no doubt that there is a much better solution to this, but I went with it because my brain was incredibly fried at that point.
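A possibly tidier route than copying and pasting is sketched below, under assumptions: `read_sav()` from the haven package can read an SPSS file and keep only chosen columns through its `col_select` argument. The file here is a tiny made-up stand-in written to a temporary path, because the real study 4 file and the actual cause of the import failure are not known. Only the column names are borrowed from the analysis in this log, and `Other_item` is a hypothetical extra column.

```r
library(haven)

# Stand-in for the real SPSS file: a tiny made-up data set written to a temp path.
# (Hypothetical values; only the column names match the ones used in this log.)
toy <- data.frame(
  Gender = c(1, 2, 2, 1),
  Age = c(23, 31, 45, 19),
  Household_size = c(1, 3, 2, 4),
  Q22_1 = c(4, 5, 3, 4),
  Other_item = c(2, 2, 1, 5)
)
sav_path <- tempfile(fileext = ".sav")
write_sav(toy, sav_path)

# Read only the variables needed for the exploratory questions.
explore_toy <- read_sav(
  sav_path,
  col_select = c(Gender, Age, Household_size, Q22_1)
)

names(explore_toy)
```

If the real file imports this way, the separate "explore1" SPSS document would not be needed, though whether it does depends on what was wrong with the original import.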
Other errors I got were:
Error in UseMethod(“rescale”) : no applicable method for ‘rescale’ applied to an object of class “c(‘haven_labelled’, ‘vctrs_vctr’, ‘double’)”
and
Error in match.arg(alternative) : ‘arg’ must be NULL or a character vector
I had no idea what they meant though, so I had to seek a lot of help from Google!
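For reference, here is a small sketch of how these two messages can be reproduced and avoided. It rests on two assumptions. The first is that the `rescale` error came from a column imported from SPSS that still carried the `haven_labelled` class when ggplot2 tried to scale it. The second is that the `match.arg` error came from a non-character value landing in the `alternative` argument of a test such as `t.test()`. All vectors are made up for illustration.

```r
library(haven)

# A made-up labelled column, similar in class to what read_sav() returns.
gender <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
class(gender)  # includes "haven_labelled", the class named in the first error

# Two ways to get an ordinary vector that plotting scales can handle:
gender_num <- zap_labels(gender)  # plain numbers, labels dropped
gender_fct <- as_factor(gender)   # factor that uses the value labels

# The second message: the third positional argument of t.test() is
# `alternative`, which must be a character string such as "two.sided".
x <- c(4, 5, 3, 4, 5)
y <- c(3, 4, 2, 4, 3)
z <- c(1, 2, 1, 2, 1)
msg <- tryCatch(t.test(x, y, z), error = function(e) conditionMessage(e))
msg

# Naming the argument avoids the mix-up.
res <- t.test(x, y, alternative = "two.sided")
```

Converting labelled columns right after import, and naming arguments in test functions rather than relying on their position, should keep both messages away in cases like these.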
## Descriptive statistics
explore1 <- explore1 %>%
  na.omit()
exploreQ1 <- explore1 %>%
  group_by(Gender) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n = n(),
            se = sd/sqrt(n)) %>%
  mutate(Gender = case_when(Gender == 1 ~ "Male",
                            Gender == 2 ~ "Female",
                            Gender == 3 ~ "No information"))
## Adding missing grouping variables: `Gender`
gt(exploreQ1)

| Gender | mean | sd | n | se |
|---|---|---|---|---|
| Male | 3.663317 | 1.251801 | 796 | 0.04436890 |
| Female | 4.025000 | 1.099722 | 720 | 0.04098421 |
## bar plot
exploreQ1_plot <- exploreQ1 %>%
  ggplot(aes(x = Gender, y = mean, fill = Gender)) +
  geom_col() +
  labs(title = "Motivation to wear a face mask across genders", x = "Gender", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0),
                     limits = c(0, 5))
print(exploreQ1_plot)
## Descriptive statistics
explore1$Age <- as.numeric(as.character(explore1$Age))
exploreQ2 <- explore1 %>%
  group_by(Age) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n = n(),
            se = sd/sqrt(n))
## Adding missing grouping variables: `Age`
gt(exploreQ2)

| Age | mean | sd | n | se |
|---|---|---|---|---|
| 18 | 3.830189 | 1.1220834 | 53 | 0.1541300 |
| 19 | 3.769231 | 1.2239182 | 39 | 0.1959838 |
| 20 | 3.860465 | 1.3016728 | 43 | 0.1985032 |
| 21 | 3.755556 | 1.2089732 | 45 | 0.1802231 |
| 22 | 3.959184 | 1.1357667 | 49 | 0.1622524 |
| 23 | 4.068966 | 1.0738040 | 58 | 0.1409974 |
| 24 | 4.000000 | 1.0127394 | 40 | 0.1601282 |
| 25 | 3.733333 | 1.1769922 | 60 | 0.1519490 |
| 26 | 3.827586 | 1.2159457 | 58 | 0.1596615 |
| 27 | 4.018519 | 1.0368810 | 54 | 0.1411016 |
| 28 | 3.655172 | 1.2502873 | 58 | 0.1641708 |
| 29 | 3.819672 | 1.1903528 | 61 | 0.1524091 |
| 30 | 3.508475 | 1.4065473 | 59 | 0.1831169 |
| 31 | 3.723077 | 1.2563301 | 65 | 0.1558286 |
| 32 | 3.500000 | 1.3084805 | 34 | 0.2244026 |
| 33 | 3.918367 | 1.2047948 | 49 | 0.1721135 |
| 34 | 3.959184 | 1.1357667 | 49 | 0.1622524 |
| 35 | 3.500000 | 1.3206764 | 44 | 0.1990995 |
| 36 | 4.166667 | 1.0854312 | 30 | 0.1981717 |
| 37 | 3.487805 | 1.3439113 | 41 | 0.2098837 |
| 38 | 3.973684 | 1.2408986 | 38 | 0.2013003 |
| 39 | 3.727273 | 1.2825055 | 44 | 0.1933450 |
| 40 | 3.655172 | 1.1425492 | 29 | 0.2121661 |
| 41 | 3.608696 | 1.4377739 | 23 | 0.2997966 |
| 42 | 3.960000 | 0.9780934 | 25 | 0.1956187 |
| 43 | 3.863636 | 1.1668213 | 22 | 0.2487671 |
| 44 | 3.695652 | 1.0632191 | 23 | 0.2216965 |
| 45 | 3.928571 | 1.2744954 | 28 | 0.2408570 |
| 46 | 3.947368 | 1.1772701 | 19 | 0.2700843 |
| 47 | 3.846154 | 1.3445045 | 13 | 0.3728985 |
| 48 | 3.952381 | 1.1608700 | 21 | 0.2533226 |
| 49 | 4.052632 | 1.1290942 | 19 | 0.2590320 |
| 50 | 3.846154 | 0.8987170 | 13 | 0.2492593 |
| 51 | 4.384615 | 1.1208971 | 13 | 0.3108809 |
| 52 | 3.687500 | 0.7932003 | 16 | 0.1983001 |
| 53 | 3.647059 | 1.2718675 | 17 | 0.3084732 |
| 54 | 3.666667 | 1.3904436 | 21 | 0.3034197 |
| 55 | 3.863636 | 1.3902879 | 22 | 0.2964104 |
| 56 | 4.000000 | 0.8819171 | 19 | 0.2023257 |
| 57 | 4.277778 | 0.8947925 | 18 | 0.2109046 |
| 58 | 4.500000 | 0.6741999 | 12 | 0.1946247 |
| 59 | 4.444444 | 0.7264832 | 9 | 0.2421611 |
| 60 | 3.647059 | 1.2718675 | 17 | 0.3084732 |
| 61 | 4.272727 | 0.7862454 | 11 | 0.2370619 |
| 62 | 3.400000 | 1.6733201 | 5 | 0.7483315 |
| 63 | 3.750000 | 1.8929694 | 4 | 0.9464847 |
| 64 | 3.833333 | 1.4719601 | 6 | 0.6009252 |
| 65 | 4.714286 | 0.4879500 | 7 | 0.1844278 |
| 67 | 5.000000 | 0.0000000 | 2 | 0.0000000 |
| 68 | 4.000000 | 1.0000000 | 5 | 0.4472136 |
| 69 | 5.000000 | 0.0000000 | 2 | 0.0000000 |
| 71 | 4.000000 | 1.4142136 | 2 | 1.0000000 |
| 72 | 4.000000 | NA | 1 | NA |
| 76 | 1.000000 | NA | 1 | NA |
## bar plot
exploreQ2_plot <- exploreQ2 %>%
  ggplot(aes(x = Age, y = mean, fill = Age)) +
  geom_col() +
  labs(title = "Motivation to wear a face mask across age", x = "Age", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0),
                     limits = c(0, 5))
print(exploreQ2_plot)
## Descriptive statistics
explore1$Household_size <- as.numeric(as.character(explore1$Household_size))
exploreQ3 <- explore1 %>%
  group_by(Household_size) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n = n(),
            se = sd/sqrt(n))
## Adding missing grouping variables: `Household_size`
gt(exploreQ3)

| Household_size | mean | sd | n | se |
|---|---|---|---|---|
| 1 | 3.710526 | 1.189325 | 342 | 0.06431131 |
| 2 | 3.886051 | 1.201534 | 509 | 0.05325704 |
| 3 | 3.959752 | 1.164267 | 323 | 0.06478156 |
| 4 | 3.709402 | 1.219018 | 234 | 0.07968972 |
| 5 | 3.974359 | 1.104587 | 78 | 0.12506989 |
| 6 | 3.809524 | 1.289149 | 21 | 0.28131534 |
| 7 | 3.200000 | 1.483240 | 5 | 0.66332496 |
| 8 | 5.000000 | NA | 1 | NA |
| 11 | 3.000000 | NA | 1 | NA |
| 12 | 5.000000 | NA | 1 | NA |
| 20 | 1.000000 | NA | 1 | NA |
## bar plot
exploreQ3_plot <- exploreQ3 %>%
  ggplot(aes(x = Household_size, y = mean, fill = Household_size)) +
  geom_col() +
  labs(title = "Motivation to wear a face mask across household size", x = "Number of people in the household", y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0,0),
                     limits = c(0, 6))
print(exploreQ3_plot)

Next steps in my coding journey:
From the descriptive statistics and plots, it looks like neither gender, age, nor household size is a strong predictor of motivation to wear a face mask. That said, I will come back to them and run a significance test to check whether this is the case. I will use more of the Intro to Stats in R tutorial content for this, because my initial exploratory analysis took so long that I didn’t have time to achieve my other learning goal.
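A rough sketch of what such follow-up tests could look like is below. It runs on simulated data rather than the study 4 responses, so the numbers it prints say nothing about the real question. The choice of a Welch t-test for gender and a Pearson correlation for age is an assumption made for illustration, as are the names `toy`, `gender`, `age` and `motivation`.

```r
set.seed(1)

# Simulated stand-in data: NOT the study 4 responses.
n <- 200
toy <- data.frame(
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  age = sample(18:70, n, replace = TRUE),
  motivation = sample(1:5, n, replace = TRUE)
)

# Two groups, one numeric outcome: Welch two-sample t-test.
gender_test <- t.test(motivation ~ gender, data = toy)

# Two numeric variables: Pearson correlation test.
age_test <- cor.test(toy$age, toy$motivation)

gender_test$p.value
age_test$p.value
```

Household size could be handled the same way as age, since it is also numeric. Working from the raw rows like this, rather than from the per-age summary table, uses every participant's response in the test.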
I will also make sure to go to the coding Q and A sessions next week to ask more specific questions about the exploratory analysis and expectations for the verification report.
Finally, I plan to come up with more possible questions for my exploratory analysis that are more related to the research area, which is empathy.