1. (a) The mean of the observations in our sample is a statistic and a number.
(b) The mean of the random variable X is a parameter and a number.
(c) The mean of the sample, \( (X_1 + \cdots + X_{50})/50 \), is a random variable.
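To make the distinction concrete, the sketch below simulates repeated samples of size 50. It assumes, purely for illustration, that X is normal with mean 10 and standard deviation 5 (hypothetical values, not given in the problem). The parameter `mu` stays fixed, while the statistic (the sample mean) changes from one sample to the next.

```r
set.seed(339)   # fixed seed so the illustration is reproducible
mu <- 10        # hypothetical value of the parameter E(X)
sigma <- 5      # hypothetical standard deviation of X
n <- 50

# Each call draws a fresh sample X_1, ..., X_50 and returns its mean
one.sample.mean <- function() mean(rnorm(n, mean = mu, sd = sigma))
xbars <- replicate(1000, one.sample.mean())

head(xbars)   # the statistic differs from sample to sample
mean(xbars)   # centers near the fixed parameter mu
sd(xbars)     # close to sigma / sqrt(n), about 0.707
```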
2. (a) The cutoff value \( z_{0.05} \) is 1.645.
qnorm(1 - 0.05, mean = 0, sd = 1)
## [1] 1.645
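A two-sided 90% interval leaves 5% of the standard normal area in each tail, which is why the cutoff is \( z_{0.05} \), the point with 5% of the area above it. A quick check with `pnorm()`, which undoes `qnorm()`:

```r
zstar <- qnorm(1 - 0.05, mean = 0, sd = 1)      # upper 5% point of N(0, 1)
zstar.upper <- qnorm(0.05, lower.tail = FALSE)  # same cutoff, stated as an upper tail
area.below <- pnorm(zstar)                      # area to the left of the cutoff: 0.95
area.between <- pnorm(zstar) - pnorm(-zstar)    # central area used by the 90% CI: 0.90
c(zstar = zstar, area.below = area.below, area.between = area.between)
```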
(b) A 90% CI for the mean using only the first observation (with known \( \sigma = 5 \)) is
\( (8.03 - 1.645 \cdot 5,\ 8.03 + 1.645 \cdot 5) = (-0.195,\ 16.255). \)
# One observation x = 8.03, known sigma = 5, z cutoff 1.645
low = 8.03 - 1.645 * 5
high = 8.03 + 1.645 * 5
print(c(low, high))
## [1] -0.195 16.255
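Parts (b) and (c) both apply \( \bar{x} \pm z_{0.05}\,\sigma/\sqrt{n} \), with n = 1 in (b). A small helper makes that shared formula explicit; the name `z.ci` is hypothetical and not part of the original solution.

```r
# Hypothetical helper: z-interval for a mean when the population
# standard deviation sigma is known
z.ci <- function(x, sigma, conf = 0.90) {
  n <- length(x)
  zstar <- qnorm(1 - (1 - conf) / 2)   # conf = 0.90 gives qnorm(0.95)
  half.width <- zstar * sigma / sqrt(n)
  c(lower = mean(x) - half.width, upper = mean(x) + half.width)
}

# Part (b): only the first observation, so n = 1
ci.one <- z.ci(8.03, sigma = 5)
ci.one
```

Because `qnorm()` returns the unrounded cutoff 1.644854, the endpoints differ from the hand calculation only in the third decimal place. The same function applies to the eight observations in part (c) by passing the whole vector; the half-width then shrinks by a factor of \( \sqrt{8} \).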
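(c) First we compute the mean of the observed values. Since the population standard deviation is known (\( \sigma = 5 \)), we then use the z cutoff and the standard error \( \sigma/\sqrt{n} \) to compute the 90% CI.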
y = c(8.03, 8.04, 4.06, 12.78, 13.78, 10.31, 9.51, 7.27)
n = length(y)                                # sample size, 8
obs.mean = mean(y)                           # sample mean
sigma = 5                                    # known population standard deviation
zstar = qnorm(1 - 0.05, mean = 0, sd = 1)    # z cutoff for a 90% CI
low = obs.mean - zstar * sigma/sqrt(n)
high = obs.mean + zstar * sigma/sqrt(n)
print(c(low, high))
## [1] 6.315 12.130
A 90% CI for the mean, computed from all eight observations, is (6.315, 12.130).
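The 90% describes the procedure rather than this particular interval: if we drew many samples of size 8 and built an interval from each, about 90% of them would contain the true mean. The sketch checks this by simulation, assuming a normal population with a hypothetical true mean of 9 and the known \( \sigma = 5 \).

```r
set.seed(2009)   # fixed seed so the result is reproducible
mu <- 9          # hypothetical true mean (unknown in practice)
sigma <- 5       # known population standard deviation, as in part (c)
n <- 8
zstar <- qnorm(1 - 0.05)
half.width <- zstar * sigma / sqrt(n)

# For each simulated sample, record whether its 90% CI contains mu
covers <- replicate(10000, {
  y <- rnorm(n, mean = mu, sd = sigma)
  (mean(y) - half.width < mu) && (mu < mean(y) + half.width)
})
coverage <- mean(covers)   # proportion of intervals that captured mu
coverage
```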
3. (a) There were 510 fines assessed in California in 2009.
# Change this path to the folder that contains CAfines.csv
setwd("/Users/traves/Dropbox/SM339/Homework1")
fines = read.csv("CAfines.csv")
names(fines)
## [1] "X" "DOTN" "Name" "Amount"
length(fines$Amount)
## [1] 510
(b) The maximum fine assessed in California in 2009 was $36,370.
max(fines$Amount)
## [1] 36370
(c) Let obs.mean be the mean of the 510 amounts in fines$Amount and let s be their standard deviation. Let \( t_{0.025,509} = 1.964636 \) be the cutoff that leaves an area of 0.025 in the upper tail of a Student-t distribution with 509 degrees of freedom, so that the central area is 0.95. Then the lower bound of the 95% CI is \[ \text{obs.mean} - t_{0.025,509} \cdot \frac{s}{\sqrt{510}}. \]
qt(1 - 0.025, df = 509)
## [1] 1.965
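The t cutoff is larger than the normal cutoff to account for estimating \( \sigma \) by s, but the gap shrinks as the degrees of freedom grow; with 509 degrees of freedom it is already close to 1.96. The degrees-of-freedom values other than 509 are chosen only for comparison.

```r
df <- c(5, 30, 100, 509, Inf)
tstar <- qt(1 - 0.025, df = df)   # upper 2.5% point for each df
zstar <- qnorm(1 - 0.025)         # normal cutoff, about 1.96
round(cbind(df, tstar, excess.over.z = tstar - zstar), 4)
```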
(d) A 95% CI for the mean of \( X \) is \( (4888, 6060) \) (in dollars, rounded to the nearest dollar).
tstar = qt(1 - 0.025, df = 509)   # t cutoff with n - 1 = 509 degrees of freedom
obs.mean = mean(fines$Amount)     # sample mean
s = sd(fines$Amount)              # sample standard deviation
low = obs.mean - tstar * s/sqrt(510)
high = obs.mean + tstar * s/sqrt(510)
print(c(low, high))
## [1] 4888 6060
We could have used the following code instead:
t.test(fines$Amount, conf.level = 0.95)
##
## One Sample t-test
##
## data: fines$Amount
## t = 18.35, df = 509, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 4888 6060
## sample estimates:
## mean of x
## 5474
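The hand computation and `t.test()` use the same formula, \( \bar{x} \pm t_{0.025,n-1}\, s/\sqrt{n} \). The sketch wraps it in a helper (the name `t.ci` is hypothetical) that takes n from the data instead of hard-coding 510, and checks it against `t.test()` on a short hypothetical vector of amounts.

```r
# Hypothetical helper: t-interval for a mean when sigma is unknown
# and estimated by the sample standard deviation
t.ci <- function(x, conf = 0.95) {
  n <- length(x)                              # avoids hard-coding n
  tstar <- qt(1 - (1 - conf) / 2, df = n - 1)
  half.width <- tstar * sd(x) / sqrt(n)
  c(lower = mean(x) - half.width, upper = mean(x) + half.width)
}

# Hypothetical fine amounts, used only to check t.ci() against t.test()
amounts <- c(1200, 5400, 800, 15000, 3200, 9800, 2500, 21000, 4100, 700)
by.hand <- t.ci(amounts)
by.ttest <- t.test(amounts, conf.level = 0.95)$conf.int
by.hand
by.ttest
```

Called as `t.ci(fines$Amount)`, the helper should reproduce the interval (4888, 6060) from part (d).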
(e) We could enter:
require(lattice)
## Loading required package: lattice
require(latticeExtra)
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
pdf(file = "plotoutput.pdf", height = 5, width = 5)   # open a 5 x 5 inch PDF device
hist(fines$Amount, main = "Histogram of Fines", xlab = "Fine Amounts (in $)")
dev.off()                                              # close the device and write the file
## pdf
## 2
We get a pdf file, plotoutput.pdf, containing the histogram:
[Figure: Histogram of Fines; x-axis: Fine Amounts (in $)]
We could also have replaced the hist() line with the following to get a density plot (densityplot() comes from the lattice package):
densityplot(fines$Amount, main = "Density Plot of Fines", xlab = "Fine Amounts (in $)")
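One caveat when mixing base and lattice graphics in a script: `densityplot()` returns a "trellis" object that is only drawn when printed. At the interactive prompt this happens automatically, but inside a function or a script run with `source()` an explicit `print()` is needed. The sketch below writes both plots to a temporary PDF; the simulated exponential amounts are a hypothetical stand-in for `fines$Amount`.

```r
library(lattice)   # densityplot() is provided by lattice

# Hypothetical right-skewed amounts standing in for fines$Amount
set.seed(1)
amounts <- rexp(510, rate = 1 / 5000)

out <- tempfile(fileext = ".pdf")   # temporary file instead of "plotoutput.pdf"
pdf(file = out, height = 5, width = 5)
hist(amounts, main = "Histogram of Fines", xlab = "Fine Amounts (in $)")
# print() draws the trellis object; without it, nothing would be
# written on this page when the script is source()d
print(densityplot(amounts, main = "Density Plot of Fines",
                  xlab = "Fine Amounts (in $)"))
dev.off()                           # close the device so the file is written
file.exists(out)
```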
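(f) The sample mean is a random variable, and when the sample is large enough it is approximately normally distributed, even if the individual observations (here, the fine amounts) are not. This is the content of the Central Limit Theorem, which applies to independent, identically distributed observations with finite variance.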
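A small simulation makes the statement concrete. It assumes a hypothetical exponential population (strongly right-skewed, with mean 1 and standard deviation 1) and compares the skewness of individual draws with the skewness of sample means of size 510, the size of the fines data. The CLT predicts that the means are nearly symmetric, with standard deviation close to \( \sigma/\sqrt{n} \).

```r
set.seed(74)
n <- 510      # same sample size as the fines data
reps <- 5000

# Hypothetical skewed population: exponential with mean 1 and sd 1
means <- replicate(reps, mean(rexp(n, rate = 1)))
single.draws <- rexp(reps, rate = 1)

# Moment-based sample skewness (0 for a symmetric distribution)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

c(skew.individual = skewness(single.draws), skew.of.means = skewness(means))
c(sd.of.means = sd(means), clt.prediction = 1 / sqrt(n))
```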
(g) There were 46 fines issued in California in 2009 for more than $15,000.
length(fines$Amount[fines$Amount > 15000])
## [1] 46
Alternatively, tabulate the logical vector of comparisons:
big = fines$Amount > 15000
table(big)
## big
## FALSE TRUE
## 464 46
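Because R treats TRUE as 1 and FALSE as 0, the count can also be taken directly with `sum()`, and `mean()` gives the proportion. The sketch uses a short hypothetical vector of amounts; note that the strict `>` excludes a fine of exactly $15,000.

```r
# Hypothetical amounts; one fine is exactly $15,000
amounts <- c(500, 16000, 15000, 42000, 3000, 15001)

big <- amounts > 15000     # logical vector, TRUE where the fine exceeds $15,000
sum(big)                   # TRUE counts as 1 and FALSE as 0, so this gives 3
mean(big)                  # proportion over $15,000: 0.5
sum(amounts >= 15000)      # ">=" also counts the $15,000 fine, giving 4
```

With the homework data, `sum(fines$Amount > 15000)` should return the same 46, provided no amounts are missing (with NA values, add `na.rm = TRUE`).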