Objectives

To submit this homework, you will create the document in RStudio using the knitr package (button included in RStudio) and then publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see the visualization and the code required to create it.

Look at the data

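# Packages used below: MASS (housing data), ggplot2 and dplyr (plots and pipes),
# vcd (mosaic plot), ggpubr (ggarrange, ggballoonplot, ggpar)
library(MASS)
library(ggplot2)
library(dplyr)
library(vcd)
library(ggpubr)
data("housing")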
str(housing)
## 'data.frame':    72 obs. of  5 variables:
##  $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
##  $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
##  $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Freq: int  21 21 28 34 22 36 10 11 36 61 ...
summary(housing)
##      Sat         Infl           Type      Cont         Freq      
##  Low   :24   Low   :24   Tower    :18   Low :36   Min.   : 3.00  
##  Medium:24   Medium:24   Apartment:18   High:36   1st Qu.:10.00  
##  High  :24   High  :24   Atrium   :18             Median :19.50  
##                          Terrace  :18             Mean   :23.35  
##                                                   3rd Qu.:31.75  
##                                                   Max.   :86.00

1. First plot - Satisfaction by house type

# Stacked counts per satisfaction level, one panel per house type
# (no fill aesthetic is mapped, so the fill label and fill scale are dropped)
plot1 <- ggplot(data = housing, aes(x = Sat, y = Freq)) +
  geom_col() + facet_wrap(~ Type) +
  labs(x = "Satisfaction", y = "Frequency") + theme_bw()
plot1

Looking at the chart above, residents of towers appear to be the most satisfied, whereas residents of terrace houses tend to be the least satisfied of the four house types (in terms of proportion).
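Because the chart shows raw frequencies, the "in terms of proportion" reading can be checked directly. The following is a minimal sketch, assuming the housing data comes from the MASS package; the names sat_by_type and sat_share are my own and not part of the assignment.

```r
library(MASS)  # assumed source of the housing data set

data("housing", package = "MASS")

# Total respondents in each Type x Sat cell, summed over Infl and Cont
sat_by_type <- xtabs(Freq ~ Type + Sat, data = housing)

# Row proportions: share of each satisfaction level within a house type
sat_share <- prop.table(sat_by_type, margin = 1)
round(sat_share, 2)
```

Reading across a row gives the satisfaction mix for one house type, so the High column can be compared between types without the differences in group size getting in the way.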

2. Contact vs Influence - what affects Satisfaction more?

# Proportion of each satisfaction level by contact,
# faceted by influence (rows) and house type (columns)
ggplot(data = housing, aes(x = Cont, y = Freq, fill = Sat)) +
  geom_col(position = "fill") + facet_grid(vars(Infl), vars(Type)) +
  scale_fill_viridis_d(option = "D") + labs(x = "Contact", y = "Proportion", fill = "Satisfaction") +
  theme_bw() + theme(axis.text.x = element_text(angle = 45, vjust = 0.4))

I created the above chart to see which of the two scores, Influence or Contact, contributes more to overall satisfaction. Influence appears to have the greater impact. Contact also affects overall satisfaction, but its effect looks smaller, and in a couple of cases satisfaction actually decreases as Contact increases (Type - Terrace, Influence - Medium and Low), suggesting that residents of terrace houses value influence more than contact. Overall, satisfaction rises more quickly with an increase in Influence than with an increase in Contact.
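One rough way to put numbers on this comparison is to look at the share of High satisfaction within each level of Influence and of Contact separately, ignoring house type. This is only a sketch under that simplification; high_share is a hypothetical helper I introduce here, and the data is assumed to come from MASS.

```r
library(MASS)  # assumed source of the housing data set

data("housing", package = "MASS")

# Hypothetical helper: share of "High" satisfaction within each level of one factor
high_share <- function(factor_name) {
  f <- reformulate(c(factor_name, "Sat"), response = "Freq")  # e.g. Freq ~ Infl + Sat
  tab <- xtabs(f, data = housing)
  prop.table(tab, margin = 1)[, "High"]
}

infl_share <- high_share("Infl")
cont_share <- high_share("Cont")

round(infl_share, 2)
round(cont_share, 2)

# Gap between the highest and lowest level of each factor
diff(range(infl_share))
diff(range(cont_share))
```

A larger gap for Influence than for Contact is consistent with the reading of the faceted chart, although a marginal comparison like this does not account for interactions with house type.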

3. Mosaic Plot

tab1 <- as.table(xtabs(Freq ~ Cont + Infl + Sat, housing))
mosaic(tab1, shade = TRUE, legend = TRUE, labeling = labeling_values())

The mosaic plot confirms that influence has a greater impact than contact on satisfaction scores.

1. 55% of the respondents with Low Contact but High Influence reported High Satisfaction. Only 24% of respondents in this group reported Low Satisfaction.

2. However, only 29% of the respondents with High Contact but Low Influence reported High Satisfaction, and 43% of respondents in this group reported Low Satisfaction.
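The percentages in the two points above can be reproduced from the same contingency table that feeds the mosaic plot. The sketch below assumes the MASS housing data; sat_within is a name I chose for illustration.

```r
library(MASS)  # assumed source of the housing data set

data("housing", package = "MASS")

# Counts by contact, influence, and satisfaction, summed over house type
tab <- xtabs(Freq ~ Cont + Infl + Sat, data = housing)

# Proportions of each satisfaction level within every Cont x Infl combination
sat_within <- prop.table(tab, margin = c(1, 2))

# The two groups compared in the text
round(sat_within["Low", "High", ], 2)   # Low contact, High influence
round(sat_within["High", "Low", ], 2)   # High contact, Low influence
```

Indexing is in the order Cont, Infl, Sat, matching the formula passed to xtabs.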

4. Fourth plot - Terrace vs Tower Scores

# The two house types to compare
typet <- c("Tower", "Terrace")

# Left panel: frequency by type, one row per influence level
plot4.1 <- housing %>%
  filter(Type %in% typet) %>%
  ggplot(aes(x = Type, y = Freq, fill = Infl)) + geom_col() + facet_grid(vars(Infl)) +
  labs(y = "Frequency", title = "Influence Scores by Type") +
  scale_fill_viridis_d(option = "D") + theme_bw() + theme(legend.position = "none")

# Right panel: frequency by type, one row per contact level
plot4.2 <- housing %>%
  filter(Type %in% typet) %>%
  ggplot(aes(x = Type, y = Freq, fill = Cont)) + geom_col() + facet_grid(vars(Cont)) +
  labs(y = "Frequency", title = "Contact Scores by Type") +
  scale_fill_viridis_d(option = "D") + theme_bw() + theme(legend.position = "none")

# Show both panels side by side
ggarrange(plot4.1, plot4.2)

Now that we know (1) influence has a greater impact than contact on overall satisfaction and (2) residents in towers are the most satisfied and those in terraces the least, I wanted to check whether residents in towers have more influence on management than residents in terrace houses. The plot on the left helps answer this question: towers have more residents with a High Influence score than terraces do. The chart on the right shows that the number of residents with a High Contact score is about the same for Tower and Terrace residents. There are also many more residents with a Low Contact score in towers than in terraces, but this low score does not translate into a similar percentage of low overall satisfaction scores.
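The bar heights in both panels can be read off as plain counts. Here is a small sketch, again assuming the MASS housing data; two_types, infl_counts, and cont_counts are illustrative names of my own.

```r
library(MASS)  # assumed source of the housing data set

data("housing", package = "MASS")

# Keep only the two house types being compared and drop the unused levels
two_types <- droplevels(subset(housing, Type %in% c("Tower", "Terrace")))

# Respondent counts per influence level and per contact level
infl_counts <- xtabs(Freq ~ Type + Infl, data = two_types)
cont_counts <- xtabs(Freq ~ Type + Cont, data = two_types)

infl_counts
cont_counts

# Shares within each house type, since the two groups differ in size
round(prop.table(infl_counts, margin = 1), 2)
round(prop.table(cont_counts, margin = 1), 2)
```

Because towers have more respondents overall, the within-type shares are worth checking alongside the raw counts before drawing conclusions.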

5. Fifth plot

# Reverse the Sat levels so the facet rows run from High (top) to Low (bottom)
plot5 <- housing %>% mutate(Sat = factor(Sat, levels = rev(levels(Sat)))) %>%
  ggballoonplot(x = "Cont", y = "Infl",
                size = "Freq", fill = "Freq",
                facet.by = c("Sat", "Type"),
                ggtheme = theme_bw()) + scale_fill_viridis_c(option = "D")
# x is Cont and y is Infl, so the axis labels must follow that order
ggpar(plot5, xlab = "Contact", ylab = "Influence", title = "Frequency of responses")

Observation from the bubble chart -

1. There are quite a few residents who have a High Contact score and a High Influence score but still report Low Satisfaction. This suggests that factors other than Contact and Influence are also important to the residents.