The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data is and which patterns in it are interesting.
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and to identify interesting patterns and trends within the data that may be indicative of large-scale trends. For each visualization, write a few sentences about the trends and patterns that you discover from it.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio) and then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
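# Load the packages used below; the housing data frame ships with MASS
library(MASS)
library(ggplot2)
library(dplyr)
str(housing)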
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
head(housing)
##      Sat   Infl  Type Cont Freq
## 1    Low    Low Tower  Low   21
## 2 Medium    Low Tower  Low   21
## 3   High    Low Tower  Low   28
## 4    Low Medium Tower  Low   34
## 5 Medium Medium Tower  Low   22
## 6   High Medium Tower  Low   36
# Visualization 1: residents by afforded contact, one panel per housing type
ggplot(housing, aes(x = Cont, y = Freq)) +
  geom_point() +
  facet_wrap(~Type) +
  theme_light() +
  labs(x = 'Afforded Contact',
       y = 'Number of Residents',
       title = 'Afforded Contact vs Number of Residents')
The graph above shows the relationship between afforded contact and the number of residents for each housing type. Apartments have the highest number of residents at each contact level, whereas atrium houses have the lowest.
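Because each point in the plot is a single combination of satisfaction and influence, totals per panel are hard to read by eye. A minimal sketch of how the comparison could be checked numerically, assuming the housing data frame from the MASS package (the object names type_by_contact and type_totals are illustrative only):

```r
# Sketch: total residents per housing type and contact level.
# Assumes the housing data frame from the MASS package.
library(MASS)

# xtabs() sums Freq over the variables left out of the formula (Sat and Infl)
type_by_contact <- xtabs(Freq ~ Type + Cont, data = housing)
print(type_by_contact)

# Row totals give the overall number of residents per housing type
type_totals <- margin.table(type_by_contact, margin = 1)
print(type_totals)
```

Comparing the rows of type_by_contact shows whether the ordering of housing types is the same at both contact levels.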
# Visualization 2: residents by satisfaction, one row per contact level
ggplot(housing, aes(x = Sat, y = Freq)) +
  geom_point() +
  facet_grid(Cont ~ .) +
  theme_light() +
  labs(x = 'Satisfaction',
       y = 'Number of Residents',
       title = 'Satisfaction vs Number of Residents')
The graph above shows the number of residents at each satisfaction level, faceted by afforded contact (Cont has only two levels, Low and High). Each point is one combination of influence and housing type, so the panels show the spread of counts rather than totals. Reading the plot, residents with low afforded contact appear to lean toward low satisfaction and residents with high afforded contact toward high satisfaction, although the overlapping points make this hard to judge without aggregating the counts.
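One way to make the comparison between contact levels more direct is to convert the counts into shares within each contact level. The following is a sketch under the assumption that housing is the data frame from MASS; sat_by_contact and sat_share are illustrative names, not part of the assignment:

```r
# Sketch: share of residents at each satisfaction level, within each contact level.
# Assumes the housing data frame from the MASS package.
library(MASS)

# Sum Freq over Type and Infl to get a Cont x Sat table of counts
sat_by_contact <- xtabs(Freq ~ Cont + Sat, data = housing)

# margin = 1 divides each row by its row total, so every row sums to 1
sat_share <- prop.table(sat_by_contact, margin = 1)
print(round(sat_share, 2))
```

If the High column is larger in the High contact row than in the Low contact row, that would support the reading that more contact goes with higher satisfaction.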
# Visualization 3: sum residents over satisfaction, then plot contact against influence
graph_3 <- housing %>%
  group_by(Cont, Type, Infl) %>%
  summarise(Freq = sum(Freq), .groups = 'drop')
ggplot(graph_3, aes(x = Infl, y = Cont)) +
  geom_point(aes(col = Infl, size = Freq)) +
  theme_light() +
  facet_grid(Type ~ .) +
  labs(x = 'Influence on Management',
       y = 'Contact',
       title = 'Contact vs Influence on Management by Housing Type')
The graph above shows afforded contact against perceived influence on management for each housing type, with point size giving the number of residents. Households with higher contact tend to be more numerous and to report higher influence on management. Atrium and terraced housing show a similar trend.
# Visualization 4: sum residents over contact, then plot counts by influence
graph_2 <- housing %>%
  group_by(Sat, Type, Infl) %>%
  summarise(Freq = sum(Freq), .groups = 'drop')
ggplot(graph_2, aes(x = Infl, y = Freq)) +
  geom_point(aes(col = Infl)) +
  theme_light() +
  facet_wrap(~Type) +
  labs(x = 'Influence on Management',
       y = 'Number of Residents',
       title = 'Residents by Management Influence and Accommodation Type')
The graph above shows the number of residents at each level of perceived influence on management, by accommodation type; the three points at each influence level are the three satisfaction levels. Based on the graph, more apartment residents than atrium residents report a high degree of influence on management.
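# Visualization 5: sum over Infl first so each dodged bar is a total,
# not several bars drawn on top of each other
graph_5 <- housing %>%
  group_by(Sat, Cont, Type) %>%
  summarise(Freq = sum(Freq), .groups = 'drop')
ggplot(graph_5) +
  geom_col(aes(x = Sat, y = Freq, fill = Cont), width = 0.4, position = 'dodge') +
  labs(title = 'Number of Residents by Satisfaction and Accommodation Type',
       x = 'Satisfaction',
       y = 'Number of Residents',
       fill = 'Contact') +
  theme_light() +
  facet_wrap(~Type)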
The graph above shows the number of residents at each satisfaction level across the different types of accommodation. From the graph we can see that more apartment residents report being satisfied than residents of any other accommodation type, while atrium residents appear lowest in satisfaction. Because the bars are counts, these differences partly reflect how many residents live in each type. When contact with other residents is low, we see a similar pattern across the accommodation types.
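Since the accommodation types differ in size, a plot of shares rather than counts may separate "more residents" from "more satisfied residents". The sketch below is one possible way to do this with position = "fill"; it assumes the housing data frame from MASS, and share_plot is an illustrative name:

```r
# Sketch: satisfaction shares within each accommodation type, by contact level.
# Assumes the housing data frame from the MASS package.
library(MASS)
library(ggplot2)

share_plot <- ggplot(housing, aes(x = Type, y = Freq, fill = Sat)) +
  # position = "fill" stacks the counts and rescales every bar to a height of 1
  geom_col(position = "fill") +
  # one panel per contact level, labelled as "Cont: Low" and "Cont: High"
  facet_wrap(~Cont, labeller = label_both) +
  theme_light() +
  labs(x = "Accommodation Type",
       y = "Share of Residents",
       fill = "Satisfaction",
       title = "Satisfaction Shares by Accommodation Type and Contact")

print(share_plot)
```

In this view every bar has the same height, so a larger High segment means a larger share of satisfied residents in that type regardless of how many residents it has.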