In this exercise you will follow through our textbook, History of Teacup Giraffes, Module: Introduction to the Normal Distribution. Make sure to read the module before you do this exercise.

The code chunk below loads the Sacramento home sales dataset from the caret package, which stands in for the teacup giraffes dataset in the textbook. You will work with this dataset for the quiz.

# Load the packages
library(tidyverse)
library(caret)

# Import data
data(Sacramento, package = "caret")

# Print the first 6 rows
head(Sacramento)
##         city    zip beds baths sqft        type price latitude longitude
## 1 SACRAMENTO z95838    2     1  836 Residential 59222 38.63191 -121.4349
## 2 SACRAMENTO z95823    3     1 1167 Residential 68212 38.47890 -121.4310
## 3 SACRAMENTO z95815    2     1  796 Residential 68880 38.61830 -121.4438
## 4 SACRAMENTO z95815    2     1  852 Residential 69307 38.61684 -121.4391
## 5 SACRAMENTO z95824    2     1  797 Residential 81900 38.51947 -121.4358
## 6 SACRAMENTO z95841    3     1 1122       Condo 89921 38.66260 -121.3278
# Get a sense of the dataset
glimpse(Sacramento)
## Rows: 932
## Columns: 9
## $ city      <fct> SACRAMENTO, SACRAMENTO, SACRAMENTO, SACRAMENTO, SACRAMENT...
## $ zip       <fct> z95838, z95823, z95815, z95815, z95824, z95841, z95842, z...
## $ beds      <int> 2, 3, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 1, 3, 2, 2, 2, 2, 2, ...
## $ baths     <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, ...
## $ sqft      <int> 836, 1167, 796, 852, 797, 1122, 1104, 1177, 941, 1146, 90...
## $ type      <fct> Residential, Residential, Residential, Residential, Resid...
## $ price     <int> 59222, 68212, 68880, 69307, 81900, 89921, 90895, 91002, 9...
## $ latitude  <dbl> 38.63191, 38.47890, 38.61830, 38.61684, 38.51947, 38.6626...
## $ longitude <dbl> -121.4349, -121.4310, -121.4438, -121.4391, -121.4358, -1...
summary(Sacramento)
##              city          zip           beds           baths      
##  SACRAMENTO    :438   z95823 : 61   Min.   :1.000   Min.   :1.000  
##  ELK_GROVE     :114   z95828 : 45   1st Qu.:3.000   1st Qu.:2.000  
##  ROSEVILLE     : 48   z95758 : 44   Median :3.000   Median :2.000  
##  CITRUS_HEIGHTS: 35   z95835 : 37   Mean   :3.276   Mean   :2.053  
##  ANTELOPE      : 33   z95838 : 37   3rd Qu.:4.000   3rd Qu.:2.000  
##  RANCHO_CORDOVA: 28   z95757 : 36   Max.   :8.000   Max.   :5.000  
##  (Other)       :236   (Other):672                                  
##       sqft                type         price           latitude    
##  Min.   : 484   Condo       : 53   Min.   : 30000   Min.   :38.24  
##  1st Qu.:1167   Multi_Family: 13   1st Qu.:156000   1st Qu.:38.48  
##  Median :1470   Residential :866   Median :220000   Median :38.62  
##  Mean   :1680                      Mean   :246662   Mean   :38.59  
##  3rd Qu.:1954                      3rd Qu.:305000   3rd Qu.:38.69  
##  Max.   :4878                      Max.   :884790   Max.   :39.02  
##                                                                    
##    longitude     
##  Min.   :-121.6  
##  1st Qu.:-121.4  
##  Median :-121.4  
##  Mean   :-121.4  
##  3rd Qu.:-121.3  
##  Max.   :-120.6  
## 

Q1 How many homes are there in the Sacramento dataset?

Hint: See the result of glimpse(Sacramento).

932 homes
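
As a quick cross-check of the count read off `glimpse()`, the number of rows can also be computed directly. This is a minimal sketch that assumes the caret package is installed; the name `n_homes` is introduced here only for illustration.

```r
# Load only the dataset, without attaching the whole caret package
data(Sacramento, package = "caret")

# Each row is one home, so the row count answers Q1
n_homes <- nrow(Sacramento)
n_homes
```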

Q2 What is the median home price?

Hint: See the result of summary(Sacramento).

$220,000
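
The median can also be pulled out on its own rather than read from the `summary()` table. A minimal sketch, assuming caret is installed; `median_price` is an illustrative name.

```r
# Load only the dataset from caret
data(Sacramento, package = "caret")

# Median of the price column (in dollars)
median_price <- median(Sacramento$price)
median_price
```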

Q3 What types of homes are there?

Hint: See the type variable.

condo, multi-family, residential
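
Because `type` is a factor, its levels and the number of homes in each can be listed directly. A minimal sketch, assuming caret is installed; `home_types` and `type_counts` are illustrative names.

```r
# Load only the dataset from caret
data(Sacramento, package = "caret")

# The factor levels are the distinct home types
home_types <- levels(Sacramento$type)
home_types

# How many homes fall under each type
type_counts <- table(Sacramento$type)
type_counts
```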

Q4 Graph the distribution of home prices.

Hint: Create a histogram.

p <- ggplot(data = Sacramento, aes(x = price)) +
  geom_histogram()

p

Q5 Are home prices normally distributed? Why or why not?

Hint: Discuss in terms of the characteristics of the normal distribution you learned in Quiz2-a.

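No. The distribution is not symmetrical: it is right-skewed, with a long tail of expensive homes. The mean ($246,662) is well above the median ($220,000), whereas in a normal distribution the two would be about equal.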
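
One way to back up this answer, sketched here as a suggestion rather than a required step, is to compare the mean with the median and to draw a normal Q-Q plot. It assumes ggplot2 and caret are installed; `mean_price`, `median_price`, and `qq_plot` are illustrative names.

```r
library(ggplot2)

# Load only the dataset from caret
data(Sacramento, package = "caret")

# In a symmetric distribution the mean and median are close;
# a mean well above the median suggests a right skew
mean_price   <- mean(Sacramento$price)
median_price <- median(Sacramento$price)
c(mean = mean_price, median = median_price)

# Normal Q-Q plot: points bending away from the reference line
# indicate a departure from normality
qq_plot <- ggplot(Sacramento, aes(sample = price)) +
  stat_qq() +
  stat_qq_line()
qq_plot
```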

Q6 Check if the distribution looks different among different types of homes.

Hint: Add different colors for each group in the histogram. In addition, try other types of plots, such as density plot. See Data Visualization with R: Ch4.3.

# alpha makes the overlapping densities semi-transparent
p <- ggplot(data = Sacramento, aes(x = price, fill = type)) +
  geom_density(alpha = .4)
p
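
The hint also mentions coloring a histogram by group. The following is one possible sketch of that, assuming ggplot2 and caret are installed; the bin count, transparency, and the facetted layout are choices made here for illustration, not part of the assignment.

```r
library(ggplot2)

# Load only the dataset from caret
data(Sacramento, package = "caret")

# One panel per home type; free y-axes keep the small groups
# (condos, multi-family) readable next to the large residential group
p_hist <- ggplot(Sacramento, aes(x = price, fill = type)) +
  geom_histogram(bins = 30, alpha = .6) +
  facet_wrap(~ type, ncol = 1, scales = "free_y")
p_hist
```

Separate panels avoid the stacking that a single filled histogram applies by default, which can make the shapes of the smaller groups hard to judge.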

Q7 Describe the distributions of home prices among different types. How are they similar and different?

Hint: Discuss price ranges, typical values, and shapes of the distribution.

The distribution for multi-family homes looks roughly normal, but the distributions for condos and residential homes are not symmetrical.
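
To discuss price ranges and typical values, as the hint asks, a per-type summary table may help. A minimal sketch, assuming dplyr and caret are installed; `price_by_type` and its column names are illustrative.

```r
library(dplyr)

# Load only the dataset from caret
data(Sacramento, package = "caret")

# Count, range, and typical values of price for each home type
price_by_type <- Sacramento %>%
  group_by(type) %>%
  summarise(
    n      = n(),
    min    = min(price),
    median = median(price),
    mean   = mean(price),
    max    = max(price)
  )
price_by_type
```

Note that the multi-family group is small (13 homes in the summary above), so its apparent shape should be read with caution.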

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
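
One way to apply these options to every chunk at once is to set them in a setup chunk, sketched below under the assumption that knitr is installed. The same three options can instead be written in an individual chunk's header.

```r
# Hide messages, but keep the code (echo) and its printed results
knitr::opts_chunk$set(
  message = FALSE,    # suppress messages such as package start-up notes
  echo    = TRUE,     # show the source code
  results = "markup"  # show printed results (knitr's default)
)
```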

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.