Solutions to USNA SM339 Homework 1

1. (a) The mean of the observations in our sample is a statistic and a number.

(b) The mean of the random variable X is a parameter and a number.

(c) The mean of the sample, $ (X_1 + \cdots + X_{50})/50 $, is a random variable.
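The distinction in (b) and (c) can be seen in a small simulation. The sketch below assumes, purely for illustration, that X is normal with mean 10 and standard deviation 5 (these values are not from the problem). The parameter mu stays fixed, while each new sample of 50 gives a different sample mean.

```r
# Hypothetical population: X ~ Normal(mean = 10, sd = 5); values chosen for illustration only
set.seed(339)
mu = 10      # the parameter: a fixed number
# Each call to rnorm draws a new sample of 50, so each sample mean is a new realization
sample.means = replicate(1000, mean(rnorm(50, mean = mu, sd = 5)))
print(head(sample.means))   # the realized means differ from sample to sample
print(sd(sample.means))     # should be near 5/sqrt(50), about 0.707
```

Once a particular sample has been observed, its mean is a single number, as in part (a).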

2. (a) The cutoff value $ z_{0.05} $ is 1.645.

qnorm(1 - 0.05, mean = 0, sd = 1)

## [1] 1.645

(b) A 90% CI for the mean using only the first observation is $ (8.03 - 1.645 \times 5,\ 8.03 + 1.645 \times 5) = (-0.195, 16.255) $.

high = 8.03 + 1.645 * 5
low = 8.03 - 1.645 * 5
print(c(low, high))

## [1] -0.195 16.255

(c) First we compute the mean of the observed values; the standard deviation is taken as known, $ \sigma = 5 $, as in part (b). Then we compute the 90% CI.

y = c(8.03, 8.04, 4.06, 12.78, 13.78, 10.31, 9.51, 7.27)
n = length(y)
obs.mean = mean(y)
sigma = 5
zstar = qnorm(1 - 0.05, mean = 0, sd = 1)
low = obs.mean - zstar * sigma/sqrt(n)
high = obs.mean + zstar * sigma/sqrt(n)
print(c(low, high))

## [1]  6.315 12.130

A 90% CI for the mean, computed using all eight observed values, is (6.315, 12.130).
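Parts (b) and (c) use the same formula with different sample sizes, so they can be wrapped in one function. The helper z.ci below is a hypothetical name introduced for this sketch, not part of the original solution. It assumes the standard deviation sigma is known.

```r
# Hypothetical helper (not part of the original solution): z-interval with known sigma
z.ci = function(y, sigma, conf.level = 0.90) {
  alpha = 1 - conf.level
  zstar = qnorm(1 - alpha/2)                  # e.g. z_{0.05} for a 90% interval
  half.width = zstar * sigma/sqrt(length(y))
  c(mean(y) - half.width, mean(y) + half.width)
}
y = c(8.03, 8.04, 4.06, 12.78, 13.78, 10.31, 9.51, 7.27)
ci.one = z.ci(y[1], sigma = 5)   # first observation only, as in (b)
ci.all = z.ci(y, sigma = 5)      # all eight observations, as in (c)
print(ci.one)
print(ci.all)
# The interval from one observation is sqrt(8) times as wide as the one from eight
print(diff(ci.one)/diff(ci.all))
```

Because the function uses the unrounded value of qnorm(0.95) rather than 1.645, its result for (b) may differ from the hand computation in the third decimal place.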

3. (a) There were 510 fines assessed in California in 2009.

setwd("/Users/traves/Dropbox/SM339/Homework1")
fines = read.csv("CAfines.csv")
names(fines)

## [1] "X"      "DOTN"   "Name"   "Amount"

length(fines$Amount)

## [1] 510

(b) The maximum fine assessed in California in 2009 was $36,370.

max(fines$Amount)

## [1] 36370

(c) Let obs.mean be the mean of the list of 510 amounts in fines$Amount and let s be the standard deviation of these amounts. Let $ t_{0.025,509} = 1.964636 $ be the cutoff point for a Student-t distribution with 509 degrees of freedom at the 95% confidence level. Then the formula for the lower bound of the 95% CI is \[ \text{obs.mean} - t_{0.025,509} \cdot \frac{s}{\sqrt{510}}. \]

qt(1 - 0.025, df = 509)

## [1] 1.965

(d) A 95% CI for the mean of $ X $ is $ (4888,6060) $ (in dollars, rounded to nearest dollar).

tstar = qt(1 - 0.025, df = 509)
obs.mean = mean(fines$Amount)
s = sd(fines$Amount)
low = obs.mean - tstar * s/sqrt(510)
high = obs.mean + tstar * s/sqrt(510)
print(c(low, high))

## [1] 4888 6060

We could have used the following code instead:

t.test(fines$Amount, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  fines$Amount 
## t = 18.35, df = 509, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0 
## 95 percent confidence interval:
##  4888 6060 
## sample estimates:
## mean of x 
##      5474
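The object returned by t.test is a list, so the interval can be extracted directly from its conf.int component instead of being read off the printed output. The sketch below checks that the manual formula and t.test agree. It uses simulated stand-in data (an assumption made so the example runs without CAfines.csv), so its numbers will not match the fines data.

```r
# Simulated stand-in data (hypothetical; not the CAfines.csv amounts)
set.seed(1)
x = rexp(510, rate = 1/5000)
n = length(x)
tstar = qt(1 - 0.025, df = n - 1)
manual = c(mean(x) - tstar * sd(x)/sqrt(n), mean(x) + tstar * sd(x)/sqrt(n))
builtin = t.test(x, conf.level = 0.95)$conf.int   # numeric vector of length 2
print(manual)
print(as.numeric(builtin))
print(all.equal(manual, as.numeric(builtin)))     # TRUE: the two methods agree
```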

(e) We could enter:

require(lattice)

## Loading required package: lattice

require(latticeExtra)

## Loading required package: latticeExtra

## Loading required package: RColorBrewer

pdf(file = "plotoutput.pdf", height = 5, width = 5)
hist(fines$Amount, main = "Histogram of Fines", xlab = "Fine Amounts (in $)")
dev.off()

## pdf 
##   2

We get a pdf file with the following graphic:

hist(fines$Amount, main = "Histogram of Fines", xlab = "Fine Amounts (in $)")

[Figure: histogram of the fine amounts, titled "Histogram of Fines", with x-axis label "Fine Amounts (in $)"]

We could also have replaced the second line with the following to get a density plot:

densityplot(fines$Amount, main = "Density Plot of Fines", xlab = "Fine Amounts (in $)")

[Figure: density plot of the fine amounts, with x-axis label "Fine Amounts (in $)"]

(f) The mean of a sample is a random variable, and if the sample is large enough (and the underlying distribution has finite variance) the sample mean is approximately normally distributed. This is the content of the Central Limit Theorem.
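A simulation can illustrate this. The sketch below assumes a strongly skewed population, exponential with mean 1, chosen only for illustration and not taken from the fines data. It compares the skewness of sample means for samples of size 2 and size 200. The function skew is a hypothetical helper defined here, not a base R function.

```r
# Hypothetical skewed population: exponential with mean 1 (chosen for illustration only)
set.seed(339)
means.n2   = replicate(5000, mean(rexp(2)))     # small samples: still visibly skewed
means.n200 = replicate(5000, mean(rexp(200)))   # large samples: close to normal
# Sample skewness (third standardized moment); near 0 for a symmetric distribution
skew = function(v) mean((v - mean(v))^3)/sd(v)^3
print(skew(means.n2))
print(skew(means.n200))
# The spread also shrinks: the sd of the mean should be near 1/sqrt(200)
print(sd(means.n200))
```

The skewness of the sample means should fall noticeably as the sample size grows, which is why a t-interval can be reasonable for 510 fines even though the fine amounts themselves are skewed.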

(g) There were 46 fines issued in California in 2009 for more than $15,000.

length(fines$Amount[fines$Amount > 15000])

## [1] 46

Or use code like this:

big = fines$Amount > 15000
table(big)

## big
## FALSE  TRUE 
##   464    46
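A third option relies on R treating TRUE as 1 and FALSE as 0 in arithmetic, so sum gives the count and mean gives the proportion. The vector amounts below is made up for illustration and is not the CAfines.csv data.

```r
# Hypothetical fine amounts for illustration (not the CAfines.csv data)
amounts = c(1200, 18000, 36370, 500, 15000, 22000)
big = amounts > 15000   # logical vector; 15000 itself is not "more than" 15000
print(sum(big))         # count of TRUE values: 3
print(mean(big))        # proportion of TRUE values: 0.5
```

Applied to the fines data, sum(fines$Amount > 15000) would give the same count of 46, and the corresponding proportion is 46/510, about 9% of the fines.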