In this exercise you will learn to plot data using the ggplot2 package. To this end, you will make your own notes on Section 4.1, Categorical vs. Categorical, of Data Visualization with R.

# Load package
library(tidyverse)

# Load data
data(SaratogaHouses, package="mosaicData")
glimpse(SaratogaHouses)
## Observations: 1,728
## Variables: 16
## $ price           <int> 132500, 181115, 109000, 155000, 86060, 120000, 1…
## $ lotSize         <dbl> 0.09, 0.92, 0.19, 0.41, 0.11, 0.68, 0.40, 1.21, …
## $ age             <int> 42, 0, 133, 13, 0, 31, 33, 23, 36, 4, 123, 1, 13…
## $ landValue       <int> 50000, 22300, 7300, 18700, 15000, 14000, 23300, …
## $ livingArea      <int> 906, 1953, 1944, 1944, 840, 1152, 2752, 1662, 16…
## $ pctCollege      <int> 35, 51, 51, 51, 51, 22, 51, 35, 51, 44, 51, 51, …
## $ bedrooms        <int> 2, 3, 4, 3, 2, 4, 4, 4, 3, 3, 7, 3, 2, 3, 3, 3, …
## $ fireplaces      <int> 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ bathrooms       <dbl> 1.0, 2.5, 1.0, 1.5, 1.0, 1.0, 1.5, 1.5, 1.5, 1.5…
## $ rooms           <int> 5, 6, 8, 5, 3, 8, 8, 9, 8, 6, 12, 6, 4, 5, 8, 4,…
## $ heating         <fct> electric, hot water/steam, hot water/steam, hot …
## $ fuel            <fct> electric, gas, gas, gas, gas, gas, oil, oil, ele…
## $ sewer           <fct> septic, septic, public/commercial, septic, publi…
## $ waterfront      <fct> No, No, No, No, No, No, No, No, No, No, No, No, …
## $ newConstruction <fct> No, No, No, No, Yes, No, No, No, No, No, No, No,…
## $ centralAir      <fct> No, No, No, No, Yes, No, No, No, No, No, No, No,…
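Later questions relabel `newConstruction` and read `heating` segments. It helps to check the level order of these factors first, because `factor(..., labels = )` assigns labels by level position and not by value. Below is a minimal check, assuming the mosaicData package is installed.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# Level order decides which label factor(..., labels = ) gives to which value
levels(SaratogaHouses$newConstruction)
levels(SaratogaHouses$heating)
```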

Q1 Stacked bar chart. Plot the relationship between newConstruction and heating type.

Hint: See the code in 4.1.1 Stacked bar chart.

ggplot(SaratogaHouses, 
       aes(x = newConstruction, 
           fill = heating)) + 
  geom_bar(position = "stack")
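`geom_bar()` counts rows for each combination of `x` and `fill`. A small sketch using dplyr's `count()` returns the same numbers as a table, which makes the segment heights easy to read. The object name `heating_by_construction` is only illustrative.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# One row per newConstruction x heating combination, with its count n
heating_by_construction <- SaratogaHouses %>%
  count(newConstruction, heating)
heating_by_construction
```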

Q2 What is the most common heating system overall? Discuss your reason.

Hint: See the stacked bar chart you created in the previous question.

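Hot air is the most common heating system. In the stacked bar chart, the hot air segment is the largest in both the old and the new construction bars, so its total count is far higher than that of electric or hot water/steam heating.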
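Reading exact totals off a stacked bar chart is imprecise. The sketch below shows one way to get the overall counts and shares per heating type. It assumes the dplyr verbs loaded with tidyverse, and the name `heating_totals` is only illustrative.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# Total houses per heating type, largest first, plus each type's share
heating_totals <- SaratogaHouses %>%
  count(heating, sort = TRUE) %>%
  mutate(share = n / sum(n))
heating_totals
```

The counts must add up to the 1,728 observations reported by `glimpse()`.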

Q3 Grouped bar chart. Plot the relationship between newConstruction and heating type.

Hint: See the code in 4.1.2 Grouped bar chart.

ggplot(SaratogaHouses, 
       aes(x = newConstruction, 
           fill = heating)) + 
  geom_bar(position = "dodge")
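With `position = "dodge"`, if one construction group lacks some heating type, the remaining bars in that group are drawn wider. ggplot2's `position_dodge()` accepts `preserve = "single"` to keep every bar the same width. Below is a variant of the plot above; the name `p_dodge` is only illustrative.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# preserve = "single" gives each bar the same width in every group
p_dodge <- ggplot(SaratogaHouses,
                  aes(x = newConstruction, fill = heating)) +
  geom_bar(position = position_dodge(preserve = "single"))
p_dodge
```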

Q4 Segmented bar chart. Plot the relationship between newConstruction and heating type.

Hint: See the code in 4.1.3 Segmented bar chart.

ggplot(SaratogaHouses, 
       aes(x = newConstruction, 
           fill = heating)) + 
  geom_bar(position = "fill") +
  labs(y = "Proportion")

Q5 In which type of house (new or old) is the proportion of hot air heating systems higher? Discuss your reason.

Hint: See the segmented bar chart you created in the previous question.

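In new houses. In the segmented bar chart, the hot air segment fills almost the entire bar (close to 100%) for new construction, while it covers a visibly smaller share of the bar for older houses.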
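The segmented chart shows proportions within each bar. The sketch below computes the same within-group proportions numerically so that the comparison can be checked. It assumes dplyr from the tidyverse, and `heating_shares` is an illustrative name.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# Share of each heating type within old (No) and new (Yes) construction
heating_shares <- SaratogaHouses %>%
  count(newConstruction, heating) %>%
  group_by(newConstruction) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Compare the hot air share between the two groups
heating_shares %>% filter(heating == "hot air")
```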

Q6 Rename the construction type as new and old.

Hint: See the code in 4.1.4 Improving the color and labeling.

library(ggplot2)

ggplot(SaratogaHouses, 
       aes(x = factor(newConstruction,
                      # map values explicitly: Yes = new, No = old
                      levels = c("Yes", "No"),
                      labels = c("new", "old")), 
           fill = heating)) + 
  geom_bar(position = "fill") +
  labs(y = "Proportion")
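An alternative to `factor(..., labels = )` is `forcats::fct_recode()` (forcats ships with the tidyverse). It matches each new label to an existing value by name rather than by level position. Below is a sketch; the column name `construction` is a hypothetical choice.

```r
library(tidyverse)
data(SaratogaHouses, package = "mosaicData")

# fct_recode(f, new_label = "existing value"): renames by value, not by position
houses <- SaratogaHouses %>%
  mutate(construction = fct_recode(newConstruction, new = "Yes", old = "No"))

ggplot(houses, aes(x = construction, fill = heating)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
```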

Q7 Add labels to the axes.

Hint: See the code in 4.1.4 Improving the color and labeling.

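library(ggplot2)

ggplot(SaratogaHouses, 
       aes(x = factor(newConstruction,
                      # map values explicitly: Yes = new, No = old
                      levels = c("Yes", "No"),
                      labels = c("new", "old")), 
           fill = heating)) + 
  geom_bar(position = "fill") +
  # show the 0-1 proportions as percentages to match the "Percent" label
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent", 
       fill = "Heating Types",
       x = "Construction Type",
       title = "Old vs. New Houses") +
  theme_minimal()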

Q8 Hide the messages and the code, but display results of the code from the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the R Markdown Reference Guide.
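One possible approach, sketched here, is to set these options once for the whole document in an early setup chunk using knitr's `opts_chunk$set()`. The same options can instead go in a single chunk header, e.g. `{r, echo = FALSE, message = FALSE, results = "markup"}`. Note that `results = "markup"` is knitr's default, so it keeps printed output visible.

```r
# Global defaults for every later chunk in the .Rmd file
knitr::opts_chunk$set(
  echo = FALSE,       # hide the code
  message = FALSE,    # hide messages such as package start-up notes
  results = "markup"  # keep printed results visible (knitr's default)
)
```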

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.