In this exercise you will follow through our textbook, History of Teacup Giraffes, Module: Introduction to the Normal Distribution. Make sure to read the module before you do this exercise.
The code chunk below loads the Sacramento housing dataset from the caret package, used here in place of the teacup giraffes dataset in the textbook. You will work with this dataset for the quiz.
# Load the package
library(tidyverse)
library(caret)
# Import data
data(Sacramento, package = "caret")
# Print the first 6 rows
head(Sacramento)
##         city    zip beds baths sqft        type price latitude longitude
## 1 SACRAMENTO z95838    2     1  836 Residential 59222 38.63191 -121.4349
## 2 SACRAMENTO z95823    3     1 1167 Residential 68212 38.47890 -121.4310
## 3 SACRAMENTO z95815    2     1  796 Residential 68880 38.61830 -121.4438
## 4 SACRAMENTO z95815    2     1  852 Residential 69307 38.61684 -121.4391
## 5 SACRAMENTO z95824    2     1  797 Residential 81900 38.51947 -121.4358
## 6 SACRAMENTO z95841    3     1 1122       Condo 89921 38.66260 -121.3278
# Get a sense of the dataset
glimpse(Sacramento)
## Rows: 932
## Columns: 9
## $ city <fct> SACRAMENTO, SACRAMENTO, SACRAMENTO, SACRAMENTO, SACRAMENT...
## $ zip <fct> z95838, z95823, z95815, z95815, z95824, z95841, z95842, z...
## $ beds <int> 2, 3, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 1, 3, 2, 2, 2, 2, 2, ...
## $ baths <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, ...
## $ sqft <int> 836, 1167, 796, 852, 797, 1122, 1104, 1177, 941, 1146, 90...
## $ type <fct> Residential, Residential, Residential, Residential, Resid...
## $ price <int> 59222, 68212, 68880, 69307, 81900, 89921, 90895, 91002, 9...
## $ latitude <dbl> 38.63191, 38.47890, 38.61830, 38.61684, 38.51947, 38.6626...
## $ longitude <dbl> -121.4349, -121.4310, -121.4438, -121.4391, -121.4358, -1...
summary(Sacramento)
##              city          zip           beds           baths
##  SACRAMENTO    :438   z95823 : 61   Min.   :1.000   Min.   :1.000
##  ELK_GROVE     :114   z95828 : 45   1st Qu.:3.000   1st Qu.:2.000
##  ROSEVILLE     : 48   z95758 : 44   Median :3.000   Median :2.000
##  CITRUS_HEIGHTS: 35   z95835 : 37   Mean   :3.276   Mean   :2.053
##  ANTELOPE      : 33   z95838 : 37   3rd Qu.:4.000   3rd Qu.:2.000
##  RANCHO_CORDOVA: 28   z95757 : 36   Max.   :8.000   Max.   :5.000
##  (Other)       :236   (Other):672
##       sqft                type         price           latitude
##  Min.   : 484   Condo       : 53   Min.   : 30000   Min.   :38.24
##  1st Qu.:1167   Multi_Family: 13   1st Qu.:156000   1st Qu.:38.48
##  Median :1470   Residential :866   Median :220000   Median :38.62
##  Mean   :1680                      Mean   :246662   Mean   :38.59
##  3rd Qu.:1954                      3rd Qu.:305000   3rd Qu.:38.69
##  Max.   :4878                      Max.   :884790   Max.   :39.02
##
##    longitude
##  Min.   :-121.6
##  1st Qu.:-121.4
##  Median :-121.4
##  Mean   :-121.4
##  3rd Qu.:-121.3
##  Max.   :-120.6
##
Hint: See the result of glimpse(Sacramento).
932 homes
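The row count can also be checked directly instead of being read off the glimpse() output. This is a minimal sketch, assuming the caret package is installed so that Sacramento can be loaded; the name n_homes is introduced here only for illustration.

```r
# Load the data the same way as in the chunk above
data(Sacramento, package = "caret")

# Each row is one home, so the row count is the number of homes
n_homes <- nrow(Sacramento)
n_homes
```

The printed value should agree with the "Rows: 932" line reported by glimpse(Sacramento).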
Hint: See the result of summary(Sacramento).
$220,000
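As a cross-check on the value read from summary(Sacramento), the median can be computed on the price column alone. A short sketch, again assuming caret is installed; median_price is an illustrative name.

```r
# Load the data the same way as in the chunk above
data(Sacramento, package = "caret")

# Median sale price across all homes
median_price <- median(Sacramento$price)
median_price
```

This should match the "Median :220000" entry in the price column of the summary.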
Hint: See the type variable.
condo, multi-family, residential
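Because type is stored as a factor, its categories and their counts can be listed directly. The sketch below assumes caret is installed; type_counts is an illustrative name.

```r
# Load the data the same way as in the chunk above
data(Sacramento, package = "caret")

# The distinct home types recorded in the factor
levels(Sacramento$type)

# How many homes fall into each type
type_counts <- table(Sacramento$type)
type_counts
```

The counts should line up with the type column in summary(Sacramento): 53 condos, 13 multi-family homes, and 866 residential homes.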
Hint: Create a histogram.
p <- ggplot(data = Sacramento , aes(x = price)) +
geom_histogram()
p
Hint: Discuss in terms of the characteristics of the normal distribution you learned in Quiz2-a.
No, the data is not symmetrical. The distribution is right-skewed: the mean price ($246,662) is above the median ($220,000).
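One way to back up a judgment about symmetry is to compare the mean with the median and to look at a normal Q-Q plot. This is only a rough check under the assumption that caret is installed, not a formal test of normality; price_mean and price_median are illustrative names.

```r
# Load the data the same way as in the chunk above
data(Sacramento, package = "caret")

price_mean <- mean(Sacramento$price)
price_median <- median(Sacramento$price)

# In a symmetric distribution the mean and median are close together;
# a clearly positive gap points to a longer right tail
price_mean - price_median

# Normal Q-Q plot: points bending away from the reference line
# indicate a departure from the normal distribution
qqnorm(Sacramento$price)
qqline(Sacramento$price)
```

In summary(Sacramento) the mean price (246662) sits above the median (220000), which is consistent with a right-skewed distribution.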
Hint: Add different colors for each group in the histogram. In addition, try other types of plots, such as density plot. See Data Visualization with R: Ch4.3.
p <- ggplot(data = Sacramento, aes(x = price, fill = type)) +
geom_density(alpha = .4)
p
Hint: Discuss price ranges, typical values, and shapes of the distribution.
The distribution for multi-family homes looks fairly normal, but for condos and residential homes it is not symmetrical.
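Price ranges and typical values for each group can be tabulated alongside the density plot. The sketch below assumes dplyr (part of the tidyverse loaded earlier) and caret are installed; price_by_type and its column names are illustrative.

```r
library(dplyr)

# Load the data the same way as in the chunk above
data(Sacramento, package = "caret")

# One row per home type with its size, price range, and typical values
price_by_type <- Sacramento %>%
  group_by(type) %>%
  summarise(
    n = n(),
    min_price = min(price),
    median_price = median(price),
    mean_price = mean(price),
    max_price = max(price)
  )
price_by_type
```

The n column is worth reading first: with only 13 multi-family homes in the data, the shape of that group's density curve rests on few observations and should be interpreted with caution.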
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
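The three options named in the hint can be set in an individual chunk header or, as sketched here, as document-wide defaults in a setup chunk through knitr::opts_chunk$set(). The values below are assumptions chosen for illustration, not the settings the quiz requires.

```r
# Illustrative defaults; change the values to fit what the quiz asks for
knitr::opts_chunk$set(
  echo = TRUE,        # show the R source code in the rendered document
  message = FALSE,    # hide messages such as package start-up notes
  results = "markup"  # print results as usual; "hide" would suppress them
)
```

Options given in a single chunk header override these defaults for that chunk only.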