The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data is and what is interesting about it (patterns).
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and identify interesting patterns and trends within the data that may be indicative of large-scale trends. For each visualization, you need to write a few sentences about the trends and patterns that you discover from it.
To submit this homework, you will create the document in RStudio using the knitr package (button included in RStudio) and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
str(housing)
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
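The `str()` output above has the same shape as the `housing` data frame that ships with the MASS package (72 rows, 5 columns, counts in `Freq`). The original setup chunk is not shown, so the following is only a sketch under the assumption that MASS is the source. The MASS help page describes `Sat` as householders' satisfaction, `Infl` as perceived influence on management, `Type` as the rental type, and `Cont` as the contact residents are afforded with other residents.

```r
# Assumption: the data come from MASS::housing; the original setup chunk is not shown.
library(MASS)      # provides the housing data frame
library(ggplot2)   # ggplot(), facet_wrap(), facet_grid()
library(dplyr)     # %>%, group_by(), summarise()

data(housing, package = "MASS")

# Quick sanity checks against the structure printed above
dim(housing)            # number of rows and columns
levels(housing$Type)    # the rental types
head(housing$Freq, 10)  # first few counts
```

Loading dplyr after MASS keeps `dplyr::select()` as the default `select()`, which matters only if later chunks use it.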
# housing has one row per Infl level, so Freq is summed within each Sat/Cont/Type bar
# (dodged identity bars would be drawn on top of each other instead of being added up)
ggplot(housing, aes(x = Sat, y = Freq)) +
  geom_bar(aes(fill = Cont), stat = "summary", fun = sum, color = "white", position = position_dodge(0.9)) +
  facet_wrap(~Type) +
  labs(title = "Contact with Other Residents & Householders' Satisfaction", x = "Householders' Satisfaction", y = "The Number of Residents", fill = "Contact")
When contact with other residents is high, tenants of the tower and terrace property types show opposite patterns of satisfaction: most tower residents are content with their current housing circumstances, while most terrace residents are dissatisfied.
When contact with other residents is low, residents of all property types show a similar pattern of satisfaction, with most residents reporting either low or high satisfaction. Only a small number of residents report medium satisfaction.
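Because `housing` stores one row per `Infl` level, every `Sat`/`Cont`/`Type` bar in a plot like the one above is backed by several rows, and dodged bars do not add those rows up on their own. A minimal sketch of making the aggregation explicit with dplyr before plotting, assuming `housing` is the MASS data frame (the names `rows_per_bar`, `housing_bars`, and `p_bars` are illustrative, not from the original):

```r
library(MASS)
library(dplyr)
library(ggplot2)

# How many rows of housing feed each Type/Cont/Sat bar?
rows_per_bar <- housing %>% count(Type, Cont, Sat)
range(rows_per_bar$n)

# Sum over Infl so that every bar is backed by exactly one row
housing_bars <- housing %>%
  group_by(Type, Cont, Sat) %>%
  summarise(Freq = sum(Freq), .groups = "drop")

# With pre-aggregated data, geom_col() draws each total directly
p_bars <- ggplot(housing_bars, aes(x = Sat, y = Freq, fill = Cont)) +
  geom_col(color = "white", position = position_dodge(0.9)) +
  facet_wrap(~Type)
p_bars
```

Aggregating first also makes the plotted totals easy to inspect as a table, which helps when writing up the patterns.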
# Freq is summed over the three Sat levels within each Infl/Cont/Type bar
ggplot(housing, aes(x = Infl, y = Freq)) +
  geom_bar(aes(fill = Cont), stat = "summary", fun = sum, color = "white", position = position_dodge(0.9)) +
  facet_wrap(~Type) +
  labs(title = "Contact with Other Residents & Householders' Influence on Management", x = "Householders' Influence", y = "The Number of Residents", fill = "Contact")
When contact with other residents is high, fewer residents report a high perceived degree of influence on the management of the property, and this holds across all rental property types.
Looking more closely, the terrace property type shows a much larger gap between the numbers of residents reporting low, medium, and high perceived influence on the management of the property. Most terrace residents feel they have little influence on the management, followed by medium and then high.
When contact with other residents is low, there is no clear trend in perceived influence across property types.
# Total number of residents per rental type and satisfaction level
housing_new <- housing %>%
  group_by(Type, Sat) %>%
  summarise(Freq = sum(Freq), .groups = "drop")
ggplot(housing_new, aes(Sat, Freq)) +
  geom_point(aes(color = Type)) +
  facet_grid(Type ~ ., scales = "free", space = "free") +
  theme_light() +
  theme(strip.text.y = element_text(angle = 0), legend.position = "none") +
  labs(y = "The Number of Residents", x = "Householders' Satisfaction", title = "Residents' Satisfaction by Different Rental Types")
From the graph above, we can conclude that high satisfaction is the most common response among residents of both the tower and atrium property types, while low satisfaction is the least common.
On the other hand, low satisfaction is the most common response among terrace householders, and fewer feel satisfied or neutral about their current housing circumstances.
Furthermore, most apartment residents are either very satisfied or unsatisfied. This property type also appears to have the largest sample size, judging by how much of the plot area its panel occupies.
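Panel height in a `facet_grid()` with `scales = "free"` and `space = "free"` follows the range of the plotted counts, so it is only an indirect cue for sample size. A quick way to check the totals directly, sketched here under the assumption that `housing` is the MASS data frame (`type_totals` is an illustrative name):

```r
library(MASS)

# Total respondents per rental type, summed over all other variables
type_totals <- xtabs(Freq ~ Type, data = housing)
sort(type_totals, decreasing = TRUE)

# Share of the whole sample in each rental type
round(prop.table(type_totals), 2)
```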
# Total number of residents per influence level and contact level
housing2 <- housing %>%
  group_by(Infl, Cont) %>%
  summarise(Freq = sum(Freq), .groups = "drop")
ggplot(housing2, aes(Cont, Freq)) +
  geom_point(aes(color = Infl)) +
  facet_grid(Infl ~ ., scales = "free", space = "free") +
  theme_light() +
  theme(strip.text.y = element_text(angle = 0), legend.position = "none") +
  labs(y = "The Number of Residents", x = "Contact with Other Residents", title = "Contact with Other Residents & Householders' Perceived Influence on Management")
When contact with other residents is high, most householders feel they have either low or medium influence on property management. However, when contact is low, residents are comparatively more likely to report a high perceived influence over property management. Overall, the majority of the residents in this dataset report low or medium perceived influence.
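Comparative statements such as "more likely" are easier to support with proportions than with raw counts, because the two levels of `Cont` do not contain the same number of residents. A minimal sketch of the row proportions, assuming `housing` is the MASS data frame (`infl_by_cont` and `infl_share` are illustrative names):

```r
library(MASS)

# Counts of perceived influence within each level of Cont
infl_by_cont <- xtabs(Freq ~ Cont + Infl, data = housing)

# Row proportions: the influence shares within each Cont level sum to 1
infl_share <- prop.table(infl_by_cont, margin = 1)
round(infl_share, 2)
```

Reading across a row gives the distribution of perceived influence for that level of `Cont`, so the two rows can be compared directly.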
# Total number of residents per influence level and satisfaction level
housing3 <- housing %>%
  group_by(Infl, Sat) %>%
  summarise(Freq = sum(Freq), .groups = "drop")
ggplot(housing3, aes(Sat, Freq)) +
  geom_point(aes(color = Infl)) +
  facet_grid(Infl ~ ., scales = "free", space = "free") +
  theme_light() +
  theme(strip.text.y = element_text(angle = 0), legend.position = "none") +
  labs(y = "The Number of Residents", x = "Householders' Satisfaction", title = "Residents' Satisfaction & Perceived Influence on Management")
From the graph, we can conclude that when householders' satisfaction is high, the majority of residents report either medium or high perceived influence on the management. However, when satisfaction is low, most residents feel they have very little influence on the management.
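The conclusion above conditions on satisfaction, whereas the plot is faceted by influence. One way to look at the same question from the satisfaction side is to compute the share of each influence level within each satisfaction level and facet by `Sat`. This is only a sketch, assuming `housing` is the MASS data frame (`sat_share` and `p_share` are illustrative names):

```r
library(MASS)
library(dplyr)
library(ggplot2)

# Share of each influence level within each satisfaction level
sat_share <- housing %>%
  group_by(Sat, Infl) %>%
  summarise(Freq = sum(Freq), .groups = "drop_last") %>%  # still grouped by Sat
  mutate(share = Freq / sum(Freq)) %>%                    # shares sum to 1 per Sat
  ungroup()

p_share <- ggplot(sat_share, aes(x = Infl, y = share)) +
  geom_col() +
  facet_wrap(~Sat) +
  labs(x = "Householders' Influence", y = "Share of Residents",
       title = "Perceived Influence Within Each Satisfaction Level")
p_share
```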