Homework 3

Part 1

library(haven)
library(xts)

col_data <- read.csv("/Users/iancopeland/Documents/R/stats_1/datasets/col_us_2016.csv")

1.1 1 bedroom apartments

m <- mean(col_data$X1BR_apt)
sd0 <- sd(col_data$X1BR_apt) # named sd0 so the sd() function is not masked
(1475.00 - m) / sd0 # Charleston
[1] 0.6213872
(806.25 - m) / sd0 # Chattanooga
[1] -0.7378202
(894.30 - m) / sd0 # San Antonio
[1] -0.5588621
Variable 1: 1 bedroom apartment (X1BR_apt)
City         z-score  Summary
Charleston     0.62   Charleston’s mean rent for a 1 bedroom apartment is higher than in 73.2% of US cities.
Chattanooga   -0.74   Chattanooga’s mean rent for a 1 bedroom apartment is lower than in 77% of US cities.
San Antonio   -0.56   San Antonio’s mean rent for a 1 bedroom apartment is lower than in 71.2% of US cities.
Note: the percentages here and in the tables below come from the standard normal distribution, so they assume each cost is roughly normally distributed across cities.
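
The percentages in the summary column can be reproduced from the z-scores with the standard normal CDF. The sketch below assumes rents are approximately normally distributed across cities, which this homework does not verify. The `z` vector simply re-enters the values printed above, and `pct_below` and `pct_above` are names chosen for illustration.

```r
# z-scores copied from the output above
z <- c(Charleston = 0.6213872, Chattanooga = -0.7378202, "San Antonio" = -0.5588621)

# Share of cities expected to have LOWER rent than each city,
# ASSUMING an approximately normal distribution of rents
pct_below <- 100 * pnorm(z)

# Share of cities expected to have HIGHER rent (useful for negative z-scores)
pct_above <- 100 - pct_below

round(pct_below, 1)
round(pct_above, 1)
```

For a positive z-score, read the city's standing from `pct_below`; for a negative one, `pct_above` gives the "cheaper than" figure used in the table.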

1.2 Utility cost

m1 <- mean(col_data$Utilities)

sd1 <- sd(col_data$Utilities)

(189.06 - m1) / sd1 # Charleston
[1] 0.9358781
(122.11 - m1) / sd1 # Chattanooga
[1] -0.8163078
(135.46 - m1) / sd1 # San Antonio
[1] -0.4669175
Variable 2: Basic utility cost (Utilities)
City         z-score  Summary
Charleston     0.94   Charleston’s mean utility cost is higher than in 82.6% of US cities.
Chattanooga   -0.82   Chattanooga’s mean utility cost is lower than in 79.3% of US cities.
San Antonio   -0.47   San Antonio’s mean utility cost is lower than in 68% of US cities.

1.3 Beer

m2 <- mean(col_data$Beer)

sd2 <- sd(col_data$Beer)

(1.65 - m2) / sd2 # Charleston
[1] -0.5420667
(2.67 - m2) / sd2 # Chattanooga
[1] 1.328618
(1.98 - m2) / sd2 # San Antonio
[1] 0.06315483
Variable 3: Beer (Beer)
City         z-score  Summary
Charleston    -0.54   Charleston’s beer prices are on average cheaper than 70.5% of US cities.
Chattanooga    1.33   Chattanooga’s beer prices are on average more expensive than 90.8% of US cities.
San Antonio    0.06   San Antonio’s beer prices are on average more expensive than 52.3% of US cities.
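
The three parts above repeat the same calculation with city values typed in by hand. As a sketch only, the same z-score can be computed for every row of a column at once. The `toy` data frame and the `z_score` helper are hypothetical, with made-up values so the block runs without the original CSV.

```r
# Hypothetical stand-in for col_data; the values are made up for illustration
toy <- data.frame(
  city = c("A", "B", "C", "D"),
  Beer = c(1.50, 2.00, 2.50, 3.00)
)

# z-score of each value relative to the column mean and sample SD
z_score <- function(x) (x - mean(x)) / sd(x)

# One z-score per city, stored next to the raw price
toy$Beer_z <- z_score(toy$Beer)
toy
```

With the real data, the same helper could be applied to `col_data$X1BR_apt`, `col_data$Utilities`, and `col_data$Beer`, assuming each city of interest appears as a row.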

Part 2

grad_data <- read_sav("/Users/iancopeland/Documents/R/stats_1/datasets/grad_rates_sav.sav")

2.1 Mean graduation rate

m_x <- mean(grad_data$gradrate)

sd_x <- sd(grad_data$gradrate)

print(m_x)
[1] 48.91702
print(sd_x)
[1] 19.94411

2.2 20 samples

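set.seed(0)

n <- 20

# Preallocate one slot per sample mean
sample_means <- numeric(n)

# Each sample is 30 values simulated from a normal distribution using the
# rounded population mean and SD from 2.1, not drawn from grad_data itself
for (i in 1:n) {
  sample_means[i] <- mean(rnorm(30, mean = 48.9, sd = 19.9))
}

first(sample_means, n = 20)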
 [1] 49.33682 48.38715 48.70933 49.53925 43.03658 50.92297 52.27180 50.59589
 [9] 48.55560 50.58571 48.56179 50.28900 48.77118 51.61966 44.80247 47.64498
[17] 50.05546 46.45337 49.14148 42.94104
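
The loop above simulates each sample from a normal distribution that shares the population's mean and SD. If the intent is to draw the samples from the graduation-rate data itself, a sketch along the following lines may be closer. Here `pop` is a hypothetical stand-in for `grad_data$gradrate`, simulated only so the block runs on its own, and `sample_means_alt`, `n_samples`, and `n_obs` are illustrative names.

```r
set.seed(0)

# Hypothetical stand-in for grad_data$gradrate (NOT the real data),
# clipped to the 0-100 range of a graduation rate
pop <- pmin(pmax(rnorm(1000, mean = 48.9, sd = 19.9), 0), 100)

n_samples <- 20  # number of samples
n_obs     <- 30  # observations per sample

# Draw 20 samples of 30 observations (without replacement) and keep each mean
sample_means_alt <- replicate(n_samples, mean(sample(pop, n_obs)))

round(sample_means_alt, 2)
```

Replacing `pop` with `grad_data$gradrate` would sample from the actual population, so the resulting means would differ from the ones reported in this section.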

2.3 Sample means

Sample means
Sample Number  Mean
1              49.34
2              48.39
3              48.71
4              49.54
5              43.04
6              50.92
7              52.27
8              50.60
9              48.56
10             50.59
11             48.56
12             50.29
13             48.77
14             51.62
15             44.80
16             47.64
17             50.06
18             46.45
19             49.14
20             42.94
means <- c(49.34, 48.39, 48.71, 49.54, 43.04, 50.92, 52.27, 50.60, 48.56, 50.59, 48.56, 50.29, 48.77, 51.62, 44.80, 47.64, 50.06, 46.45, 49.14, 42.94)

m_sample <- mean(means)

print(m_sample)
[1] 48.6115
  • The mean of the 20 sample means (48.61) is very close to our population mean of 48.92.
hist(sample_means, 
     main = "", 
     xlab = "Sample Means",
     col = "steelblue")

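  • This histogram shows that the sample means are approximately normally distributed and centered near the population mean, as the central limit theorem predicts for samples of size 30. This suggests that our samples are representative of the population, although 20 samples are too few to establish normality firmly.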
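
The comparison between the sampling distribution and the population can be made a little more concrete. Under the central limit theorem, means of samples of size 30 should scatter around the population mean with a standard error of about sd / sqrt(30). The sketch below re-enters the 20 means printed in 2.2 and the population values from 2.1. The names `pop_mean`, `pop_sd`, `se_theory`, and `se_observed` are chosen for illustration.

```r
# Sample means copied from the output in 2.2
sample_means <- c(49.33682, 48.38715, 48.70933, 49.53925, 43.03658,
                  50.92297, 52.27180, 50.59589, 48.55560, 50.58571,
                  48.56179, 50.28900, 48.77118, 51.61966, 44.80247,
                  47.64498, 50.05546, 46.45337, 49.14148, 42.94104)

pop_mean <- 48.91702  # population mean from 2.1
pop_sd   <- 19.94411  # population SD from 2.1
n_obs    <- 30        # observations per sample

# Standard error of the mean expected in theory
se_theory <- pop_sd / sqrt(n_obs)

# Spread actually observed across the 20 sample means
se_observed <- sd(sample_means)

c(theory = se_theory, observed = se_observed)

# Gap between the mean of the sample means and the population mean
mean(sample_means) - pop_mean
```

If the observed spread is of the same order as the theoretical standard error and the gap is small relative to it, the sample means are behaving as expected.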