Here is some code I’m proud of. In it, I’m trying to plot, for each age category, the proportion of responses that said ‘yes’ to going on different types of outings frequently.
I start off by tabulating the counts.
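```r
library(data.table)  # rbindlist()
library(dplyr)       # rename(), %>%
library(kableExtra)  # kbl(), kable_styling()

# tibblist: the list of per-outing count tables built earlier (not shown here)
tableG <- rbindlist(tibblist, use.names = FALSE) %>%
  rename(Type = GroceryPharmacy) %>%  # first column is named after the first table
  kbl() %>%
  kable_styling()
tableG
```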
| Type | 18 to 24 years | 25 to 34 years | 35 to 44 years | 45 to 54 years | 55+ |
|---|---|---|---|---|---|
| GroceryPharmacy | 828 | 706 | 618 | 417 | 1191 |
| ShopOther | 791 | 764 | 735 | 558 | 1468 |
| FriendsFamily | 1092 | 966 | 918 | 661 | 1634 |
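For comparison, a count table like this can also be built straight from the raw indicator columns, without first assembling a list of tables. This is a swapped-in approach rather than what I did: it assumes a data frame laid out like `data` (one row per respondent, each `Adhere_*` column coded 1 for ‘yes’), uses a tiny hypothetical `toy` data set in its place, and relies on dplyr’s `across()` plus tidyr’s `pivot_longer()`/`pivot_wider()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `data`: one row per respondent,
# 1 = "yes, frequently" (column names taken from the post)
toy <- data.frame(
  age_categories        = c(1, 1, 1, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 1, 0),
  Adhere_shop_other     = c(0, 1, 0, 0, 0),
  Adhere_meet_friends   = c(1, 1, 1, 1, 0)
)

count_table <- toy %>%
  group_by(age_categories) %>%
  # count the 1s in every outing column, separately for each age category
  summarise(across(starts_with("Adhere_"), ~ sum(.x == 1, na.rm = TRUE))) %>%
  # long form: one row per age category x outing type
  pivot_longer(-age_categories, names_to = "Type", values_to = "n") %>%
  # wide form with one column per age category, like the table above
  pivot_wider(names_from = age_categories, values_from = n)

count_table
```

Piping `count_table` into `kbl() %>% kable_styling()` would style it as before; the `Type` values and age-category column names would still need relabelling to match the table above.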
Next, I attempted to create a graph. ggplot2 needs the counts in long
form (one row per age category and outing type), so I had to use
count(), which meant a long and arduous process: creating a data frame
for each outing type within each age group, then combining them using
list() and rbindlist() again. Here is an example of the process I went
through for the 18-24 year-old group:
```r
# age category 1 refers to 18-24 year-olds
countag <- data %>%
  filter(age_categories == 1, Adhere_shop_groceries == 1) %>%
  # relabel the 1s so the three outing types can be told apart once combined
  mutate(Adhere_shop_groceries = case_when(Adhere_shop_groceries == 1 ~ "Groceries")) %>%
  group_by(age_categories, Adhere_shop_groceries) %>%
  count() %>%
  as.data.frame()

countao <- data %>%
  filter(age_categories == 1, Adhere_shop_other == 1) %>%
  mutate(Adhere_shop_other = case_when(Adhere_shop_other == 1 ~ "Shop Other")) %>%
  group_by(age_categories, Adhere_shop_other) %>%
  count() %>%
  as.data.frame()

countaf <- data %>%
  filter(age_categories == 1, Adhere_meet_friends == 1) %>%
  mutate(Adhere_meet_friends = case_when(Adhere_meet_friends == 1 ~ "Meet Friends")) %>%
  group_by(age_categories, Adhere_meet_friends) %>%
  count() %>%
  as.data.frame()

combine <- list(countag, countao, countaf)
combines <- rbindlist(combine, use.names = FALSE) %>%
  rename(Type = Adhere_shop_groceries) %>%  # give the outing column a clearer heading
  # label the age group so it can be told apart once other groups are added
  mutate(age_categories = case_when(age_categories == 1 ~ "18-24y/o")) %>%
  # percentage column: 2711 was entered manually from the earlier 'agecount'
  # table; it equals 828 + 1092 + 791, the sum of this group's three counts
  mutate(perc = (n / 2711) * 100)
```
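Looking back, the per-group filtering could be collapsed into one pipeline: reshape all three `Adhere_*` columns into long form with tidyr’s `pivot_longer()`, keep the ‘yes’ answers, and `count()` by age category and type in one go. This is a sketch of that alternative, not the code I ran; the `toy` data and the label lookups `age_labels` and `type_labels` are hypothetical stand-ins. The percentage is computed within each age group as `n / sum(n)`, which reproduces the `perc` values printed below: the hard-coded 2711 equals 828 + 1092 + 791, and likewise 706 / (706 + 966 + 764) = 706 / 2436 gives the 28.98% shown for 25-34 year-olds. So each group’s three percentages sum to 100%; if the intended denominator is the number of respondents, `sum(n)` would need replacing with that count.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same layout as `data` in the post
toy <- data.frame(
  age_categories        = c(1, 1, 1, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 1, 0),
  Adhere_shop_other     = c(0, 1, 0, 0, 0),
  Adhere_meet_friends   = c(1, 1, 1, 1, 0)
)

# Hypothetical lookups from codes/column names to display labels
age_labels  <- c("1" = "18-24y/o", "2" = "25-34y/o")
type_labels <- c(Adhere_shop_groceries = "Groceries",
                 Adhere_shop_other     = "Shop Other",
                 Adhere_meet_friends   = "Meet Friends")

long_counts <- toy %>%
  pivot_longer(starts_with("Adhere_"), names_to = "Type", values_to = "answer") %>%
  filter(answer == 1) %>%              # keep only the "yes" answers
  count(age_categories, Type) %>%      # n per age group x outing type
  mutate(age_categories = unname(age_labels[as.character(age_categories)]),
         Type           = unname(type_labels[Type])) %>%
  group_by(age_categories) %>%
  mutate(perc = n / sum(n) * 100) %>%  # share of this age group's "yes" answers
  ungroup()

long_counts
```

One caveat: a type with no ‘yes’ answers in an age group simply gets no row (there is no Shop Other row for 25-34y/o in the toy data); tidyr’s `complete(age_categories, Type, fill = list(n = 0))` can add those back as zeros if the plot should show empty bars.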
After this, I tried using merge() to combine the data frames from
every age category, before realising that merge() can only join two
data frames at a time. So I merged them in pairs, and then merged the
pairs together.
```r
# combines2 ... combines5: the same steps repeated for the other age groups
mulbine  <- merge(combines, combines2, all = TRUE)   # first two
mulbine2 <- merge(combines3, combines4, all = TRUE)  # next two
mulbine3 <- merge(mulbine, mulbine2, all = TRUE)     # all four so far
mulbine4 <- merge(mulbine3, combines5, all = TRUE)   # all five!
```
Thankfully, it seemed to work.
```r
print(mulbine4)
```

```
##     age_categories         Type    n     perc
##  1:       18-24y/o    Groceries  828 30.54224
##  2:       18-24y/o Meet Friends 1092 40.28034
##  3:       18-24y/o   Shop Other  791 29.17743
##  4:       25-34y/o    Groceries  706 28.98194
##  5:       25-34y/o Meet Friends  966 39.65517
##  6:       25-34y/o   Shop Other  764 31.36289
##  7:       35-44y/o    Groceries  618 27.21268
##  8:       35-44y/o Meet Friends  918 40.42272
##  9:       35-44y/o   Shop Other  735 32.36460
## 10:       45-54y/o    Groceries  417 25.48900
## 11:       45-54y/o Meet Friends  661 40.40342
## 12:       45-54y/o   Shop Other  558 34.10758
## 13:         55+y/o    Groceries 1191 27.74284
## 14:         55+y/o Meet Friends 1634 38.06196
## 15:         55+y/o   Shop Other 1468 34.19520
```
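In hindsight, the pairwise merging can be done in a single call. Base R’s `Reduce()` folds a two-argument function down a list, so `merge()` is applied pair by pair automatically. And because the age groups never share rows, `merge(..., all = TRUE)` here amounts to stacking the tables, which `rbind()` (or `rbindlist()`) does directly. The small data frames `g1` to `g3` below are hypothetical stand-ins for `combines` to `combines5`.

```r
# Hypothetical per-age-group tables with the same columns as `combines`
g1 <- data.frame(age_categories = "18-24y/o", Type = c("Groceries", "Shop Other"), n = c(2, 1))
g2 <- data.frame(age_categories = "25-34y/o", Type = c("Groceries", "Shop Other"), n = c(4, 3))
g3 <- data.frame(age_categories = "55+y/o",   Type = c("Groceries", "Shop Other"), n = c(6, 5))

tables <- list(g1, g2, g3)

# Reduce() chains the two-table merge() down the list:
# merge(merge(g1, g2, all = TRUE), g3, all = TRUE)
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)

# The rows never overlap, so stacking the tables yields the same set of rows
stacked <- do.call(rbind, tables)

merged
```

With the real tables, `rbindlist(list(combines, combines2, combines3, combines4, combines5))` should give the same rows as `mulbine4`, though possibly in a different order, since `merge()` sorts its result by the shared columns while stacking keeps the input order.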
I finally had a data frame that was ready to be plotted. I kept it
simple with a geom_bar():
```r
library(ggplot2)

mulbinep <- mulbine4 %>%
  ggplot(aes(age_categories, perc, fill = Type)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5) +  # bar heights taken from perc
  ggtitle("Differences in Types of Outings across Age") +
  labs(x = "Age Categories", y = "% of people engaging in frequent outings", fill = "Type of Outing") +
  theme_bw()
print(mulbinep)
```
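As a side note, `geom_bar(stat = "identity")` is exactly what ggplot2’s `geom_col()` does, so the plot can be written a little more compactly. This sketch rebuilds the data from the counts printed above, recomputes `perc` as each type’s share within its age group (which matches the printed `perc` column), and draws the same dodged bars with `geom_col()`; the name `plot_data` is just for illustration.

```r
library(dplyr)
library(ggplot2)

# Counts copied from the printed mulbine4 output
plot_data <- data.frame(
  age_categories = rep(c("18-24y/o", "25-34y/o", "35-44y/o", "45-54y/o", "55+y/o"), each = 3),
  Type = rep(c("Groceries", "Meet Friends", "Shop Other"), times = 5),
  n = c(828, 1092, 791, 706, 966, 764, 618, 918, 735,
        417, 661, 558, 1191, 1634, 1468)
) %>%
  group_by(age_categories) %>%
  mutate(perc = n / sum(n) * 100) %>%  # same values as the perc column above
  ungroup()

p <- ggplot(plot_data, aes(age_categories, perc, fill = Type)) +
  geom_col(position = "dodge", width = 0.5) +  # geom_col() == geom_bar(stat = "identity")
  labs(title = "Differences in Types of Outings across Age",
       x = "Age Categories", y = "% of people engaging in frequent outings",
       fill = "Type of Outing") +
  theme_bw()

print(p)
```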