The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data is and which patterns in it are interesting.
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and identify interesting patterns and trends within the data that may be indicative of large-scale trends. For each visualization, write a few sentences about the trends and patterns that you discover from it.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio) and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.
library(MASS)  # provides the housing data set (Copenhagen Housing Conditions Survey)
str(housing)
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
housing
## Sat Infl Type Cont Freq
## 1 Low Low Tower Low 21
## 2 Medium Low Tower Low 21
## 3 High Low Tower Low 28
## 4 Low Medium Tower Low 34
## 5 Medium Medium Tower Low 22
## 6 High Medium Tower Low 36
## 7 Low High Tower Low 10
## 8 Medium High Tower Low 11
## 9 High High Tower Low 36
## 10 Low Low Apartment Low 61
## 11 Medium Low Apartment Low 23
## 12 High Low Apartment Low 17
## 13 Low Medium Apartment Low 43
## 14 Medium Medium Apartment Low 35
## 15 High Medium Apartment Low 40
## 16 Low High Apartment Low 26
## 17 Medium High Apartment Low 18
## 18 High High Apartment Low 54
## 19 Low Low Atrium Low 13
## 20 Medium Low Atrium Low 9
## 21 High Low Atrium Low 10
## 22 Low Medium Atrium Low 8
## 23 Medium Medium Atrium Low 8
## 24 High Medium Atrium Low 12
## 25 Low High Atrium Low 6
## 26 Medium High Atrium Low 7
## 27 High High Atrium Low 9
## 28 Low Low Terrace Low 18
## 29 Medium Low Terrace Low 6
## 30 High Low Terrace Low 7
## 31 Low Medium Terrace Low 15
## 32 Medium Medium Terrace Low 13
## 33 High Medium Terrace Low 13
## 34 Low High Terrace Low 7
## 35 Medium High Terrace Low 5
## 36 High High Terrace Low 11
## 37 Low Low Tower High 14
## 38 Medium Low Tower High 19
## 39 High Low Tower High 37
## 40 Low Medium Tower High 17
## 41 Medium Medium Tower High 23
## 42 High Medium Tower High 40
## 43 Low High Tower High 3
## 44 Medium High Tower High 5
## 45 High High Tower High 23
## 46 Low Low Apartment High 78
## 47 Medium Low Apartment High 46
## 48 High Low Apartment High 43
## 49 Low Medium Apartment High 48
## 50 Medium Medium Apartment High 45
## 51 High Medium Apartment High 86
## 52 Low High Apartment High 15
## 53 Medium High Apartment High 25
## 54 High High Apartment High 62
## 55 Low Low Atrium High 20
## 56 Medium Low Atrium High 23
## 57 High Low Atrium High 20
## 58 Low Medium Atrium High 10
## 59 Medium Medium Atrium High 22
## 60 High Medium Atrium High 24
## 61 Low High Atrium High 7
## 62 Medium High Atrium High 10
## 63 High High Atrium High 21
## 64 Low Low Terrace High 57
## 65 Medium Low Terrace High 23
## 66 High Low Terrace High 13
## 67 Low Medium Terrace High 31
## 68 Medium Medium Terrace High 21
## 69 High Medium Terrace High 13
## 70 Low High Terrace High 5
## 71 Medium High Terrace High 6
## 72 High High Terrace High 13
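Because `housing` is a frequency table, each of the 72 rows printed above is one combination of the four factors, and `Freq` holds the number of residents in that combination. The sketch below illustrates the difference between counting rows and counting residents. It assumes that `housing` is the data set shipped with the MASS package; the names `n_cells` and `n_residents` are illustrative only.

```r
library(MASS)  # assumption: `housing` is the data set shipped with MASS

n_cells     <- nrow(housing)      # number of Sat x Infl x Type x Cont combinations
n_residents <- sum(housing$Freq)  # number of surveyed residents

# An unweighted table counts rows, so every housing type looks equally common
table(housing$Type)

# Weighting by Freq recovers the number of residents per housing type
xtabs(Freq ~ Type, data = housing)
```

Any summary of this data set should therefore be weighted by `Freq` rather than computed from row counts.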
library(ggplot2)
library(dplyr)  # provides %>%, group_by(), and summarise() used below
# Visualization 1: residents by afforded contact, one panel per housing type
ggplot(housing, aes(x = Cont, y = Freq)) +
  geom_point() +
  facet_wrap(~Type) +
  theme_light() +
  labs(x = 'Afforded Contact',
       y = 'Number of Residents',
       title = 'Afforded Contact vs. Number of Residents')
## As the graph above shows, Apartment housing has the most residents at both contact levels, while Atrium housing has the fewest residents of the four types.
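The comparison of housing types can also be checked numerically. The sketch below sums `Freq` for every combination of housing type and contact level and then looks up the largest and smallest type within each contact level. It assumes `housing` comes from the MASS package; `type_by_cont`, `most`, and `fewest` are illustrative names.

```r
library(MASS)  # assumption: `housing` is the data set shipped with MASS

# Total residents for each housing type (rows) at each contact level (columns)
type_by_cont <- xtabs(Freq ~ Type + Cont, data = housing)
type_by_cont

# Housing type with the most and the fewest residents within each contact level
most   <- apply(type_by_cont, 2, function(col) names(which.max(col)))
fewest <- apply(type_by_cont, 2, function(col) names(which.min(col)))
most
fewest
```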
# Visualization 2: residents by satisfaction, one row per contact level
ggplot(housing, aes(x = Sat, y = Freq)) +
  geom_point() +
  facet_grid(Cont ~ .) +
  theme_light() +
  labs(x = 'Satisfaction',
       y = 'Number of Residents',
       title = 'Satisfaction vs. Number of Residents')
## In the chart above, the largest count in the low-contact panel falls under Low satisfaction, while the largest count in the high-contact panel falls under High satisfaction.
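Individual points can be misleading because each one is a single combination of housing type and influence. A minimal sketch of a complementary check is to total the residents at each satisfaction level within each contact level and convert the totals to row shares. It assumes `housing` comes from the MASS package; `sat_by_cont` and `sat_share` are illustrative names.

```r
library(MASS)  # assumption: `housing` is the data set shipped with MASS

# Residents at each satisfaction level (columns) within each contact level (rows)
sat_by_cont <- xtabs(Freq ~ Cont + Sat, data = housing)

# margin = 1 divides every row by its own total, so each row sums to 1
sat_share <- prop.table(sat_by_cont, margin = 1)
round(sat_share, 3)
```

Comparing the two rows of `sat_share` shows whether the satisfaction mix really differs between low-contact and high-contact residents, independent of how many residents are in each group.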
# Visualization 3: group sizes by contact and management influence, one row per housing type
graph_3 <- housing %>%
  group_by(Cont, Type, Infl) %>%
  summarise(Freq = sum(Freq), .groups = 'drop')
ggplot(graph_3, aes(x = Infl, y = Cont)) +
  geom_point(aes(col = Infl, size = Freq)) +
  theme_light() +
  facet_grid(Type ~ .) +
  labs(x = 'Influence on Management',
       y = 'Contact',
       title = 'Contact vs. Influence on Management, by Number of Residents')
## In the chart above, point size shows that high-contact groups are larger than low-contact groups at every influence level for Apartment, Atrium, and Terrace housing. Tower housing is the exception.
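Point sizes are hard to compare by eye, so it can help to print the counts that the points encode. The sketch below builds the same three-way totals as a flat table, with one row per housing type and contact level and one column per influence level. It assumes `housing` comes from the MASS package; `tab3` is an illustrative name.

```r
library(MASS)  # assumption: `housing` is the data set shipped with MASS

# Three-way totals: housing type x contact level x influence level
tab3 <- xtabs(Freq ~ Type + Cont + Infl, data = housing)

# Flatten to two dimensions: (Type, Cont) pairs as rows, Infl as columns
ftable(tab3, row.vars = c("Type", "Cont"))
```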
# Visualization 4: residents by management influence, one panel per housing type
# (each point is one satisfaction level)
graph_2 <- housing %>%
  group_by(Sat, Type, Infl) %>%
  summarise(Freq = sum(Freq), .groups = 'drop')
ggplot(graph_2, aes(x = Infl, y = Freq)) +
  geom_point(aes(col = Infl)) +
  theme_light() +
  facet_wrap(~Type) +
  labs(x = 'Influence on Management',
       y = 'Number of Residents',
       title = 'Residents vs. Satisfaction, Rental Accommodation, and Management Influence')
## Based on the chart, more Apartment residents than Atrium residents report a high degree of influence on management, which largely reflects the much larger number of Apartment residents overall.
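Raw counts favor the housing type with the most residents, so a fairer comparison uses the share of each influence level within a housing type. The following is a minimal sketch with dplyr, assuming `housing` comes from the MASS package; `infl_share` and the `share` column are illustrative names.

```r
library(MASS)   # assumption: `housing` is the data set shipped with MASS
library(dplyr)

infl_share <- housing %>%
  group_by(Type, Infl) %>%
  summarise(Freq = sum(Freq), .groups = "drop_last") %>%  # result stays grouped by Type
  mutate(share = Freq / sum(Freq)) %>%                    # share within each housing type
  ungroup()

infl_share
```

If the `share` values for high influence are similar across housing types, the difference seen in the chart comes mostly from the number of residents rather than from the housing type itself.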
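# Visualization 5: residents by satisfaction and contact, one panel per housing type
ggplot(housing) +
  # Sum Freq over influence levels so each bar shows a total instead of
  # overlapping rows (the `fun` argument needs ggplot2 3.3.0 or later)
  geom_bar(aes(x = Sat, y = Freq, fill = Cont), stat = 'summary', fun = 'sum', width = .4, position = 'dodge') +
  labs(title = 'Number of Residents by Satisfaction and Rental Type') +
  xlab('Satisfaction') +
  ylab('Number of Residents') +
  theme_light() +
  facet_wrap(~Type)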
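## As the four panels above show, Apartment housing has the largest number of residents at every satisfaction level, including High, while Atrium and Terrace housing have the smallest counts. Because the panels show counts rather than shares, this mainly reflects how many residents live in each housing type.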
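To compare satisfaction between housing types of very different sizes, one option is to plot shares instead of counts. The sketch below is one possible variant rather than part of the original analysis: it stacks the satisfaction levels within each housing type and rescales every bar to a total of 1 with `position = "fill"`. It assumes `housing` comes from the MASS package; `share_plot` is an illustrative name.

```r
library(MASS)     # assumption: `housing` is the data set shipped with MASS
library(ggplot2)

# Stack satisfaction levels within each housing type and rescale each bar to 1,
# so the segments show shares of residents instead of raw counts
share_plot <- ggplot(housing, aes(x = Type, y = Freq, fill = Sat)) +
  geom_col(position = "fill") +
  facet_wrap(~Cont, labeller = label_both) +
  theme_light() +
  labs(x = "Housing type",
       y = "Share of residents",
       fill = "Satisfaction",
       title = "Satisfaction shares by housing type and contact level")

share_plot
```

In this form, a taller High segment means a larger proportion of satisfied residents in that housing type, regardless of how many residents it has.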