Learning Log W10

Here is some code I’m proud of. I’m trying to plot the proportion of respondents in each age category who said ‘yes’ to frequently going on different types of outings.

Type	18 to 24 years	25 to 34 years	35 to 44 years	45 to 54 years	55+
GroceryPharmacy	828	706	618	417	1191
ShopOther	791	764	735	558	1468
FriendsFamily	1092	966	918	661	1634

Visualising the data

Next, I attempted to create a graph. Since the graph needs counts in long form, I had to use count(), which meant a long and arduous process of creating a data frame for each outing type within each age group, then combining them using list() and rbindlist() again. Here is an example of the process I went through for the 18-24 year-old group:

library(dplyr)      # filter(), group_by(), mutate(), count(), %>%
library(data.table) # rbindlist()
library(ggplot2)    # ggplot(), geom_bar()

countag <- data %>%
  filter(age_categories == 1, Adhere_shop_groceries == 1) %>% # age category 1 refers to 18-24y/os
  group_by(age_categories, Adhere_shop_groceries) %>%
  mutate(Adhere_shop_groceries = case_when(Adhere_shop_groceries == 1 ~ "Groceries")) %>% # relabel so the 3 outing types can be told apart, since they would otherwise all be labelled '1'
  count() %>%
  as.data.frame()

countao <- data %>% 
  filter(age_categories == 1, Adhere_shop_other == 1) %>% 
  group_by(age_categories, Adhere_shop_other) %>% 
  mutate(Adhere_shop_other = case_when(Adhere_shop_other == 1 ~ "Shop Other")) %>% 
  count() %>% 
  as.data.frame()

countaf <- data %>%
  filter(age_categories == 1, Adhere_meet_friends == 1) %>% 
  group_by(age_categories, Adhere_meet_friends) %>% 
  mutate(Adhere_meet_friends = case_when(Adhere_meet_friends == 1 ~ "Meet Friends")) %>% 
  count() %>% 
  as.data.frame()

combine <- list(countag, countao, countaf)
combines <- rbindlist(combine, use.names = FALSE) %>% # stack by position, since the outing-type column has a different name in each data frame
  rename(Type = Adhere_shop_groceries) %>% # creating a heading
  mutate(age_categories = case_when(age_categories == 1 ~ "18-24y/o")) %>% # to be able to differentiate between the age categories later, when multiple would come into play
  mutate(perc = (n/2711)*100) # adding a percentage column - 2711 is the total for this age category, which I manually added using the previous 'agecount' table (it equals 828 + 791 + 1092, the sum of the three counts)
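
The per-group, per-outing data frames above could probably be collapsed into a single pipeline by reshaping the three outing columns into long form first. This is only a sketch on made-up toy data: the `toy` tibble and its 0/1 coding are assumptions for illustration, and only the column names are taken from the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 1 is assumed to mean "yes, frequently", 0 anything else
toy <- tibble(
  age_categories        = c(1, 1, 1, 2, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 1, 0, 0),
  Adhere_shop_other     = c(0, 1, 1, 0, 0, 1),
  Adhere_meet_friends   = c(1, 1, 1, 0, 1, 0)
)

long_counts <- toy %>%
  # one row per respondent per outing type
  pivot_longer(starts_with("Adhere_"), names_to = "Type", values_to = "answer") %>%
  # keep only the 'yes' answers
  filter(answer == 1) %>%
  # count every age group and outing type in one go
  count(age_categories, Type) %>%
  # relabel the outing types, as case_when() does above
  mutate(Type = recode(Type,
    Adhere_shop_groceries = "Groceries",
    Adhere_shop_other     = "Shop Other",
    Adhere_meet_friends   = "Meet Friends"
  ))

print(long_counts)
```

Because count() handles every age group at once, there is nothing to bind together afterwards.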

After this, I tried using merge() to combine the combined data frames from all the age categories, before realising that merge() can only join 2 data frames at a time. So I merged them in pairs, then merged the results together.

mulbine <- merge(combines, combines2, all = TRUE) #first 2
mulbine2 <- merge(combines3, combines4, all = TRUE) #second 2
mulbine3 <- merge(mulbine, mulbine2, all = TRUE) #4
mulbine4 <- merge(mulbine3, combines5, all = TRUE) #all!
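
Since merge() only takes two tables, the pairwise merging can also be written as a fold over a list, or skipped by stacking the tables with rbindlist() as in the earlier step. A minimal sketch, assuming every per-age table has the same three columns; the one-row tables here are cut-down stand-ins that reuse the Groceries counts from the table at the top.

```r
library(data.table)

# Cut-down stand-ins for the per-age tables (one row each, for illustration)
combines  <- data.table(age_categories = "18-24y/o", Type = "Groceries", n = 828)
combines2 <- data.table(age_categories = "25-34y/o", Type = "Groceries", n = 706)
combines3 <- data.table(age_categories = "35-44y/o", Type = "Groceries", n = 618)

all_groups <- list(combines, combines2, combines3)

# Option 1: apply merge() pairwise along the list
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), all_groups)

# Option 2: stack the rows, matching columns by name
stacked <- rbindlist(all_groups, use.names = TRUE)

print(merged)
print(stacked)
```

Both options give one row per original row here; stacking is the more direct fit, because the tables share no rows that need joining.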

Thankfully, it seemed to work.

print(mulbine4)

##     age_categories         Type    n     perc
##  1:       18-24y/o    Groceries  828 30.54224
##  2:       18-24y/o Meet Friends 1092 40.28034
##  3:       18-24y/o   Shop Other  791 29.17743
##  4:       25-34y/o    Groceries  706 28.98194
##  5:       25-34y/o Meet Friends  966 39.65517
##  6:       25-34y/o   Shop Other  764 31.36289
##  7:       35-44y/o    Groceries  618 27.21268
##  8:       35-44y/o Meet Friends  918 40.42272
##  9:       35-44y/o   Shop Other  735 32.36460
## 10:       45-54y/o    Groceries  417 25.48900
## 11:       45-54y/o Meet Friends  661 40.40342
## 12:       45-54y/o   Shop Other  558 34.10758
## 13:         55+y/o    Groceries 1191 27.74284
## 14:         55+y/o Meet Friends 1634 38.06196
## 15:         55+y/o   Shop Other 1468 34.19520
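
The perc column can be reproduced from the counts table at the top without typing in each total by hand. In this sketch the counts are copied from that table and relabelled to match the printout; the grouped `sum(n)` is a swapped-in replacement for the hardcoded totals. Each denominator is then the sum of the three counts in an age group (for 18-24, 828 + 791 + 1092 = 2711), so perc adds up to 100 within every age group.

```r
library(dplyr)
library(tidyr)

# Counts copied from the table at the top of this log
counts <- tibble(
  Type       = c("Groceries", "Shop Other", "Meet Friends"),
  `18-24y/o` = c(828, 791, 1092),
  `25-34y/o` = c(706, 764, 966),
  `35-44y/o` = c(618, 735, 918),
  `45-54y/o` = c(417, 558, 661),
  `55+y/o`   = c(1191, 1468, 1634)
)

perc_table <- counts %>%
  # wide to long: one row per age group and outing type
  pivot_longer(-Type, names_to = "age_categories", values_to = "n") %>%
  group_by(age_categories) %>%
  # share of each outing type within its age group
  mutate(perc = n / sum(n) * 100) %>%
  ungroup() %>%
  arrange(age_categories, Type)

print(perc_table)
```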

I finally had a data frame that was ready to be plotted. I kept it simple with a geom_bar():

mulbinep <- mulbine4 %>% 
  ggplot(aes(age_categories, perc, fill = Type)) +
  geom_bar(position = 'dodge', stat='identity', width=0.5) +
  ggtitle("Differences in Types of Outings across Age") +
  labs(x="Age Categories", y="% of people engaging in frequent outings", fill="Type of Outing") +
  theme_bw()

print(mulbinep)
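
As a side note, geom_col() is ggplot2's shorthand for geom_bar(stat = 'identity'), so the same kind of chart can be sketched a little more briefly. The `plot_data` frame below is a stand-in that holds only the first six rows of the printed table.

```r
library(ggplot2)

# First six rows of the printed table, used as stand-in data
plot_data <- data.frame(
  age_categories = rep(c("18-24y/o", "25-34y/o"), each = 3),
  Type = rep(c("Groceries", "Meet Friends", "Shop Other"), times = 2),
  perc = c(30.54224, 40.28034, 29.17743, 28.98194, 39.65517, 31.36289)
)

p <- ggplot(plot_data, aes(age_categories, perc, fill = Type)) +
  # geom_col() plots the values as given, so no stat argument is needed
  geom_col(position = "dodge", width = 0.5) +
  ggtitle("Differences in Types of Outings across Age") +
  labs(x = "Age Categories", y = "% of people engaging in frequent outings", fill = "Type of Outing") +
  theme_bw()

print(p)
```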


June Kam

2022-08-07
