The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data is and which patterns in it are interesting.
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and identify interesting patterns and trends within the data that may be indicative of large-scale trends. For each visualization, you need to write a few sentences about the trends and patterns that you discover from it.
To submit this homework, you will create the document in RStudio using the knitr package (button included in RStudio) and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
library(MASS)  # the housing data set ships with the MASS package
str(housing)
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
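The `str()` output shows that `housing` is stored in frequency form: each of the 72 rows is one combination of `Sat`, `Infl`, `Type`, and `Cont`, and `Freq` is the number of householders in that cell. Reading totals off the data therefore means summing `Freq`. A minimal sketch of one way to do that, assuming `housing` is the data set from the MASS package (the names `by_sat_cont` and `totals_by_cont` are illustrative, not part of the assignment):

```r
library(MASS)  # assumption: housing is the data set provided by MASS

# Sum Freq over Infl and Type to get a Sat-by-Cont table of householders
by_sat_cont <- xtabs(Freq ~ Sat + Cont, data = housing)
by_sat_cont

# Column totals: number of householders in each contact group
totals_by_cont <- colSums(by_sat_cont)
totals_by_cont
```

A table like this can serve as a cross-check on what the faceted plots appear to show.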
# Plot 1: cell counts by satisfaction, one panel row per contact level
library(ggplot2)
p <- ggplot(housing, aes(x = Sat, y = Freq)) + geom_point(shape = 1)
p + facet_grid(Cont ~ .)
Residents with low contact with other residents most often report low satisfaction with their present housing circumstances. Residents with high contact with other residents most often report high satisfaction. In both contact groups, medium satisfaction is the least common response.

## 2. Second plot
# Plot 2: cell counts by perceived influence, one panel row per contact level
p1 <- ggplot(housing, aes(x = Infl, y = Freq)) + geom_point(shape = 1)
p1 + facet_grid(Cont ~ .)
Residents with low contact with other residents most often report low influence on management. Residents with high contact with other residents most often report medium influence on management.
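Each point in the plot above is a single cell count, so several points share every x position and the group totals have to be judged by eye. One possible alternative is to sum `Freq` before plotting. The following is only a sketch, assuming `housing` comes from the MASS package; `infl_totals` and `p_totals` are illustrative names:

```r
library(MASS)     # assumption: source of the housing data set
library(ggplot2)

# Total householders for each Infl-by-Cont combination
infl_totals <- aggregate(Freq ~ Infl + Cont, data = housing, FUN = sum)

# One bar per influence level, one panel row per contact level
p_totals <- ggplot(infl_totals, aes(x = Infl, y = Freq)) +
  geom_col() +
  facet_grid(Cont ~ .) +
  labs(x = "Perceived influence on management", y = "Householders")
p_totals
```

Bars of summed counts make the ranking of the influence groups within each contact level easier to read than overlapping points.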
# Plot 3: cell counts by contact level, one panel per housing type
p2 <- ggplot(housing, aes(x = Cont, y = Freq)) + geom_point(shape = 1)
p2 + facet_wrap(~ Type, ncol = 2)
The atrium housing type has the fewest residents in each class, while the apartment housing type has the most.
# Plot 4: cell counts by satisfaction, one panel column per housing type
p3 <- ggplot(housing, aes(x = Sat, y = Freq)) + geom_point(shape = 1)
p3 + facet_grid(~ Type)
In tower, apartment, and atrium housing, residents with high satisfaction with their present housing circumstances form the largest group. In terrace housing, residents with low satisfaction form the largest group.
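Since the housing types do not contain the same number of residents, raw counts can make the satisfaction mix hard to compare across types. A sketch of one way to put the types on a common scale is to compute the share of each satisfaction level within a type. This assumes `housing` is the MASS data set; `type_sat` and `sat_share` are illustrative names:

```r
library(MASS)  # assumption: source of the housing data set

# Householders by housing type (rows) and satisfaction level (columns)
type_sat <- xtabs(Freq ~ Type + Sat, data = housing)

# Share of each satisfaction level within a housing type; each row sums to 1
sat_share <- prop.table(type_sat, margin = 1)
round(sat_share, 2)
```

Comparing rows of `sat_share` shows whether a type leans toward low or high satisfaction regardless of how many residents it has.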
# Plot 5: cell counts by satisfaction, housing type in rows and influence in columns
p4 <- ggplot(housing, aes(x = Sat, y = Freq)) + geom_point(shape = 1)
p4 + facet_grid(Type ~ Infl)
In tower, apartment, and atrium housing, the largest group is residents with high satisfaction with their present housing circumstances and medium influence on management. In terrace housing, the largest group is residents with low satisfaction and low influence on management. Across all four housing types, residents with low satisfaction and high influence on management form the smallest group.
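In this grid, both the x axis (`Sat`) and the column strips (`Infl`) use the levels Low, Medium, and High, which can make the panels easy to misread. A possible refinement, sketched here under the assumption that `housing` is the MASS data set, is to have the strips show the variable name as well as the level by using ggplot2's `label_both` labeller (`p_labeled` is an illustrative name):

```r
library(MASS)     # assumption: source of the housing data set
library(ggplot2)

# Same grid as above, but strips read "Infl: Low", "Type: Tower", and so on
p_labeled <- ggplot(housing, aes(x = Sat, y = Freq)) +
  geom_point(shape = 1) +
  facet_grid(Type ~ Infl, labeller = label_both) +
  labs(x = "Satisfaction", y = "Householders")
p_labeled
```

Explicit strip labels and axis titles are a small step toward the report-quality figures the assignment asks for.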