The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data is and which patterns in it are interesting.
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and to identify interesting patterns and trends within the data that may be indicative of larger-scale trends. For each visualization, write a few sentences about the trends and patterns that you discover from it.
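As a rough illustration of the two faceting functions named in the task, the sketch below builds one plot with each. It assumes the survey is the `housing` data frame shipped with the MASS package; the object names (`by_type`, `by_type_cont`, `wrapped`, `gridded`) are made up for this example.

```r
library(ggplot2)
data(housing, package = "MASS")  # assumption: the survey data comes from MASS

# Collapse to one row per bar so that no bars are drawn on top of each other.
by_type      <- aggregate(Freq ~ Sat + Type, data = housing, FUN = sum)
by_type_cont <- aggregate(Freq ~ Sat + Type + Cont, data = housing, FUN = sum)

# facet_wrap: panels for ONE variable, wrapped into rows and columns.
wrapped <- ggplot(by_type, aes(x = Sat, y = Freq)) +
  geom_col() +
  facet_wrap(~ Type, ncol = 2)

# facet_grid: panels for TWO variables, one in rows and one in columns.
gridded <- ggplot(by_type_cont, aes(x = Sat, y = Freq)) +
  geom_col() +
  facet_grid(Cont ~ Type, labeller = label_both)

wrapped
gridded
```

facet_wrap is usually the simpler choice when splitting by a single factor, while facet_grid makes it easier to compare panels along a shared row or column.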
To submit this homework, create the document in RStudio using the knitr package (the Knit button is included in RStudio) and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see each visualization and the code required to create it.
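library(ggplot2)
library(dplyr)
data(housing, package = "MASS")  # the survey data ships with the MASS package
str(housing)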
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
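The structure above suggests that each row is a count for one combination of the four factors rather than a single respondent. A minimal check of that reading, assuming the data is `housing` from the MASS package (the names `factors`, `n_cells`, `total_residents`, and `marginals` are introduced here only for illustration):

```r
data(housing, package = "MASS")  # assumption: same data as in str(housing)

factors <- c("Sat", "Infl", "Type", "Cont")

# Number of distinct factor combinations: 3 * 3 * 4 * 2.
n_cells <- prod(sapply(housing[factors], nlevels))

# Freq holds the number of residents in each combination.
total_residents <- sum(housing$Freq)

# Residents per level of each factor, summing over the other three.
marginals <- lapply(factors, function(f) tapply(housing$Freq, housing[[f]], sum))
names(marginals) <- factors

n_cells
total_residents
marginals
```

Because every plot axis or facet leaves some factors out, `Freq` generally has to be summed over the omitted factors before it is drawn.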
# place code for vis here
# x is Infl so that the axis matches its "Management Influence" label
p1 <- ggplot(housing, aes(x = Infl, y = Freq)) +
  geom_bar(
    aes(fill = Cont), stat = "identity", color = "white",
    position = position_dodge(0.9)
  ) +
  ggtitle("Number of Residents by Influence and Contact") +
  xlab("Management Influence") +
  ylab("Number of Residents") +
  labs(fill = "Contact")
p1 + facet_wrap(~Type)
# This graph shows the relation between householders' perceived influence on the management of the property and their contact with other residents. In general, the more contact householders have with other residents, the higher their perceived degree of influence on the management of the property.
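One caveat about the bar chart above: in `housing`, each bar position (an x level, a contact level, and a housing type) corresponds to three rows, one per level of the factor that is not plotted. With stat = "identity" and dodging, those rows are drawn at the same position, so only the tallest one is visible. The sketch below sums the counts first. It puts Infl on the x axis to match the plot's axis label, and `infl_totals` and `p1_summed` are illustrative names.

```r
library(ggplot2)
data(housing, package = "MASS")  # assumption: same data as above

# Sum over satisfaction so each Infl x Cont x Type cell is a single row.
infl_totals <- aggregate(Freq ~ Infl + Cont + Type, data = housing, FUN = sum)

p1_summed <- ggplot(infl_totals, aes(x = Infl, y = Freq, fill = Cont)) +
  geom_col(position = position_dodge(0.9), color = "white") +
  facet_wrap(~ Type) +
  labs(
    title = "Number of Residents by Influence and Contact",
    x = "Management Influence",
    y = "Number of Residents",
    fill = "Contact"
  )
p1_summed
```

With one row per bar, the bar heights are totals rather than the largest single cell, so the trend can be read directly from the plot.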
# place code for vis here
p2 <- ggplot(housing) +
  geom_bar(aes(x = Cont, y = Freq, fill = Sat),
           stat = "identity", width = 0.4, position = "dodge") +
  labs(title = "Number of Residents by Contact and Satisfaction, by Rental Type") +
  labs(fill = "Satisfaction") +
  xlab("Contact") +
  ylab("Number of Residents") +
  facet_wrap(~Type)
p2
# This graph shows the pattern of householders' satisfaction with their present housing circumstances. Atrium residents report relatively higher satisfaction. In contrast, terrace residents report relatively lower satisfaction. In most types of accommodation, residents with high contact are more satisfied than those with low contact.
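Since the housing types have different numbers of residents, comparing raw counts across panels can be misleading. One possible alternative, sketched here under the assumption that the data is `housing` from MASS, shows the share of each satisfaction level within every contact group. The names `sat_totals` and `p2_share` are hypothetical.

```r
library(ggplot2)
data(housing, package = "MASS")  # assumption: same data as above

# Sum over influence so each Sat x Cont x Type cell is a single row.
sat_totals <- aggregate(Freq ~ Sat + Cont + Type, data = housing, FUN = sum)

# position = "fill" rescales every bar to 1, so segments are proportions.
p2_share <- ggplot(sat_totals, aes(x = Cont, y = Freq, fill = Sat)) +
  geom_col(position = "fill", width = 0.6) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ Type) +
  labs(
    title = "Share of Satisfaction Levels by Contact and Rental Type",
    x = "Contact",
    y = "Share of Residents",
    fill = "Satisfaction"
  )
p2_share
```

Proportions make statements such as "relatively higher satisfaction" easier to verify, at the cost of hiding how many residents each bar represents.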
# place code for vis here
# Total residents for each influence x satisfaction combination
data3 <- housing %>%
  group_by(Infl, Sat) %>%
  summarise(Freq = sum(Freq), .groups = "drop")
p3 <- ggplot(data3, aes(Sat, Freq)) +
  geom_point(aes(color = Infl), size = 3) +
  facet_grid(Infl ~ ., scales = "free", space = "free") +
  labs(y = "Number of Residents",
       x = "Satisfaction",
       title = "Number of Residents by Satisfaction and Influence",
       color = "Influence")
p3
# This graph shows the relation between satisfaction of householders with their present housing circumstances and perceived degree of influence householders have on the management of the property. When the influence on the management is low, residents are less satisfied. In contrast, when the influence on the management is high, more residents are satisfied.
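A pattern like this one can also be checked numerically. The sketch below, which assumes the data is `housing` from MASS and uses the illustrative names `infl_sat` and `infl_sat_share`, tabulates satisfaction within each influence level and converts the counts to row-wise shares.

```r
data(housing, package = "MASS")  # assumption: same data as above

# Cross-tabulate residents: rows are influence levels, columns are satisfaction.
infl_sat <- xtabs(Freq ~ Infl + Sat, data = housing)

# margin = 1 divides each row by its total, giving shares within influence.
infl_sat_share <- prop.table(infl_sat, margin = 1)

round(infl_sat_share, 2)
```

Reading across a row shows how satisfaction is distributed for that influence level, which complements the free-scaled panels in the plot.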
# place code for vis here
p4 <- ggplot(housing, aes(x = Sat, y = Freq)) +
  geom_point(aes(color = Cont), size = 2) +
  facet_grid(Cont ~ .) +
  labs(y = "Number of Residents",
       x = "Satisfaction",
       title = "Number of Residents by Satisfaction and Contact",
       color = "Contact")
p4
# This graph shows the relation between contact and satisfaction. Residents who are afforded high contact with other residents report higher satisfaction. At both low and high levels of contact, residents report medium satisfaction.
# place code for vis here
p5 <- ggplot(housing) +
  geom_bar(aes(x = Sat, y = Freq, fill = Cont),
           stat = "identity", width = 0.4, position = "dodge") +
  labs(title = "Number of Residents by Satisfaction and Contact, by Influence") +
  labs(fill = "Contact") +
  xlab("Satisfaction") +
  ylab("Number of Residents") +
  facet_wrap(~Infl)
p5
# This graph shows the relation between contact and satisfaction at each level of influence. At each level of satisfaction, more residents are afforded high contact with other residents than low contact. This trend is most pronounced among residents with medium to high satisfaction and low to medium influence.