knitr::opts_chunk$set(echo = TRUE)
library(MASS)
library(tidyverse)
## -- Attaching packages ----------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.4
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts -------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x dplyr::select() masks MASS::select()
The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week’s lecture, we discussed a number of visualization approaches for exploring a data set with categorical variables. This assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what the data contain and which patterns in the data are interesting.
For this week, we will be exploring the Copenhagen Housing Conditions Survey:
Your task for this assignment is to use ggplot and the facet_grid and facet_wrap functions to explore the Copenhagen Housing Conditions Survey. Your objective is to develop five report-quality visualizations (at least four of them should use the facet_wrap or facet_grid functions) and identify interesting patterns and trends within the data that may be indicative of large-scale trends. For each visualization, you need to write a few sentences about the trends and patterns that you discover from it.
To submit this homework, you will create the document in RStudio using the knitr package (button included in RStudio) and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
library(ggplot2)
str(housing)
## 'data.frame': 72 obs. of 5 variables:
## $ Sat : Ord.factor w/ 3 levels "Low"<"Medium"<..: 1 2 3 1 2 3 1 2 3 1 ...
## $ Infl: Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 2 2 3 3 3 1 ...
## $ Type: Factor w/ 4 levels "Tower","Apartment",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ Cont: Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: int 21 21 28 34 22 36 10 11 36 61 ...
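The `str()` output shows that `housing` is stored in aggregated form: each of the 72 rows is one combination of `Sat`, `Infl`, `Type` and `Cont`, and `Freq` is the number of respondents in that cell. A minimal sketch of how the counts behind the plots can be tabulated, assuming only the `MASS` package (the name `type_by_sat` is introduced here for illustration and is not part of the original analysis):

```r
library(MASS)  # provides the housing data frame

# Each row of housing is one Sat x Infl x Type x Cont cell, and Freq is the
# number of respondents in that cell, so counts have to be weighted by Freq.
type_by_sat <- xtabs(Freq ~ Type + Sat, data = housing)

type_by_sat        # respondents per rental type and satisfaction level
sum(type_by_sat)   # total number of respondents in the survey
```

A table like this is a quick way to confirm that a bar in a plot represents a total over all cells rather than a single row.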
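# Figure 1: residents per satisfaction level, one panel per rental type.
# geom_col() stacks the rows that share a Sat level, so each bar shows the
# total Freq; dodging within a single group would overplot the rows instead.
ggplot(housing, aes(Sat, Freq)) +
  geom_col(aes(fill = Type)) +
  labs(x = 'Satisfaction Level',
       y = '# of Residents',
       title = 'Figure 1: Satisfaction by Rental Types') +
  facet_wrap(~Type)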
Figure 1 shows the satisfaction level for each type of rental accommodation. Terrace renters have the lowest level of satisfaction among all types, while apartment renters are quite divided: more of them report low or high satisfaction than medium.
# Figure 2: total residents per influence level, one panel per rental type.
# Sum Freq over Sat and Cont first so that each bar is a single total.
housing_1 <- housing %>%
  group_by(Infl, Type) %>%
  summarise(Freq = sum(Freq)) %>%
  ungroup()
ggplot(housing_1, aes(Infl, Freq)) +
  geom_col(aes(fill = Infl)) +
  labs(x = 'Perceived Degree of Influence',
       y = '# of Residents',
       title = 'Figure 2: Degree of Influence by Rental Types') +
  scale_fill_discrete(name = 'Influence') +
  facet_wrap(~Type)
Figure 2 shows the perceived degree of influence for each type of rental accommodation. Medium influence is the most common response among tower and apartment residents, while atrium and terrace residents report lower influence than the other two types. From my point of view, towers and apartments can accommodate more residents than other building types, which is why their residents have more influence and more negotiating power.
# Figure 3: one point per row of housing, satisfaction against cell count,
# with one panel per influence level.
ggplot(housing, aes(Freq, Sat)) +
  geom_point(aes(size = Freq, color = Freq)) +
  facet_grid(Infl ~ .) +
  guides(
    size = guide_legend(title = '# of Residents'),
    color = guide_legend(title = '# of Residents')
  ) +
  labs(x = '# of Residents',
       y = 'Satisfaction Level',
       title = 'Figure 3: Satisfaction Level by Influence Degree')
Figure 3 shows that residents with low influence over property management tend to report lower satisfaction, and vice versa. This makes sense: if you can make suggestions to the property management and have them accepted, you will tend to be happier living in the community. Otherwise, you have to make do with what you have and keep complaining.
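Because the three influence groups differ in size, a comparison of shares can complement the raw counts in Figure 3. The following is a possible cross-check rather than part of the original analysis; the names `sat_by_infl` and `p` are introduced here for illustration, and base `aggregate()` is used so the sketch depends only on `MASS` and `ggplot2`:

```r
library(MASS)     # provides the housing data frame
library(ggplot2)

# Sum Freq over Type and Cont so there is one row per Infl x Sat pair.
sat_by_infl <- aggregate(Freq ~ Infl + Sat, data = housing, FUN = sum)

# position = "fill" rescales every bar to a height of 1, so the bars show the
# share of each satisfaction level within an influence group, not raw counts.
p <- ggplot(sat_by_infl, aes(x = Infl, y = Freq, fill = Sat)) +
  geom_col(position = "fill") +
  labs(x = "Perceived Degree of Influence",
       y = "Share of Residents",
       fill = "Satisfaction")
p
```

If the pattern described above holds, the share of the `High` satisfaction segment should grow from the low-influence bar to the high-influence bar.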
# Figure 4: one point per row of housing, satisfaction against cell count,
# with one panel per rental type.
ggplot(housing, aes(Freq, Sat)) +
  geom_point(aes(size = Freq, color = Freq)) +
  facet_grid(Type ~ .) +
  guides(
    size = guide_legend(title = '# of Residents'),
    color = guide_legend(title = '# of Residents')
  ) +
  labs(x = '# of Residents',
       y = 'Satisfaction Level',
       title = 'Figure 4: Satisfaction Level by Rental Type')
Figure 4 shows that apartment residents have the highest numbers of both highly satisfied and highly dissatisfied respondents among all rental types, and that terrace residents have the highest level of dissatisfaction.
# Figure 5: one point per row of housing, influence against cell count,
# with one panel per level of Cont.
ggplot(housing, aes(Freq, Infl)) +
  geom_point(aes(size = Freq, color = Freq)) +
  facet_grid(Cont ~ .) +
  guides(
    size = guide_legend(title = '# of Residents'),
    color = guide_legend(title = '# of Residents')
  ) +
  labs(x = '# of Residents',
       y = 'Degree of Influence',
       title = 'Figure 5: Degree of Influence by Affordability')
Figure 5 shows that residents with lower affordability usually have a low degree of influence over the property management, and vice versa. Generally speaking, the more money you make, the better the location you will live in, and the more power you have when negotiating with the property management company.
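A reading like this can be checked numerically by tabulating the share of each influence level within each level of `Cont`. The sketch below is an illustrative cross-check, not part of the original analysis, and the names `cont_by_infl` and `infl_share` are introduced here. One point worth keeping in mind, taken from the `MASS` help page for `housing` rather than from the text above: `Cont` is documented there as the contact residents are afforded with other residents (Low, High), which may affect how the figure should be interpreted.

```r
library(MASS)  # provides the housing data frame

# Per the MASS documentation, Cont is the contact residents are afforded
# with other residents (Low, High).
cont_by_infl <- xtabs(Freq ~ Cont + Infl, data = housing)

# margin = 1 divides each row by its total, giving the share of each
# influence level within a Cont level.
infl_share <- prop.table(cont_by_infl, margin = 1)
round(infl_share, 2)
```

Comparing the two rows of `infl_share` shows directly whether low influence is more common at the `Low` or the `High` level of `Cont`.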