Exam # 1

Brendan Mulholland

date()
## [1] "Thu Sep 27 13:58:04 2012"

Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.

(1) The average salaries in Major League Baseball for the years 1990-1999 are given (in millions) by

0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72

salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
salary
##  [1] 0.57 0.89 1.08 1.12 1.18 1.07 1.17 1.38 1.44 1.72

Find the differences from one year to the next.

saldiff = diff(salary)
saldiff
## [1]  0.32  0.19  0.04  0.06 -0.11  0.10  0.21  0.06  0.28
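With its default `lag = 1`, `diff()` subtracts each element from the one that follows it. The minimal sketch below reproduces the same nine differences with plain vector indexing on the salary values above:

```r
salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)

# Drop the first element and the last element, then subtract elementwise:
# x[2] - x[1], x[3] - x[2], ..., x[10] - x[9]
manual_diff = salary[-1] - salary[-length(salary)]

# diff() with the default lag = 1 gives the same nine differences
all.equal(manual_diff, diff(salary))
```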

For all years in which average salary increased over the previous year, what is the average of those increases?

mean(saldiff[saldiff > 0])
## [1] 0.1575
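`saldiff > 0` is a logical vector, so `mean(saldiff > 0)` returns the proportion of years with an increase (8 of 9), not the average size of those increases. The sketch below computes both quantities side by side. The average increase comes from first subsetting the positive differences:

```r
salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
saldiff = diff(salary)

# Proportion of year-to-year changes that were increases (TRUE counts as 1)
prop_increase = mean(saldiff > 0)

# Average size of the increases: keep only the positive differences, then average
avg_increase = mean(saldiff[saldiff > 0])

prop_increase  # 8/9
avg_increase   # about 0.1575 (millions)
```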

(2) Create a histogram from the sample of temperatures in the airquality data frame.

Temperature = airquality$Temp
hist(Temperature)

[Figure: histogram of Temperature]
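`hist()` also returns the bins it draws. The minimal sketch below uses the default breaks (Sturges' rule). It retrieves the bin edges and counts without drawing anything:

```r
Temperature = airquality$Temp

# plot = FALSE returns the "histogram" object without drawing it
h = hist(Temperature, plot = FALSE)

h$breaks  # bin edges chosen by the default Sturges rule
h$counts  # number of temperatures falling in each bin

# Every observation lands in exactly one bin
sum(h$counts) == length(Temperature)
```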

What are the maximum and minimum temperatures in the sample?

max(Temperature)
## [1] 97
min(Temperature)
## [1] 56

What are the mean and median temperatures?

mean(Temperature)
## [1] 77.88
median(Temperature)
## [1] 79
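`summary()` returns the minimum, maximum, mean, median and the quartiles in one call. Here is a short sketch on the same vector:

```r
Temperature = airquality$Temp

# summary() on a numeric vector gives Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s = summary(Temperature)
s

# Individual entries can be pulled out by name
s[["Min."]]    # 56
s[["Max."]]    # 97
s[["Median"]]  # 79
```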

Arrange the temperatures from smallest to largest.

sort(Temperature, decreasing = FALSE)
##   [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
##  [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
##  [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
##  [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
##  [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97

What is the 97th percentile temperature value?

quantile(Temperature, 0.97)
## 97% 
##  93 
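By default `quantile()` uses `type = 7`. That rule places the p-th quantile at position 1 + (n - 1)p in the sorted data and interpolates linearly between the neighbouring values. For n = 153 and p = 0.97 the position is 148.44, and sorted values 148 and 149 are both 93. The minimal sketch below implements that rule in a hypothetical helper, `type7_quantile`, and checks it against `quantile()`:

```r
Temperature = airquality$Temp

# Hypothetical helper: type-7 sample quantile (R's default definition)
type7_quantile = function(x, p) {
  x = sort(x)
  n = length(x)
  h = 1 + (n - 1) * p   # fractional position in the sorted data
  lo = floor(h)
  hi = ceiling(h)
  # Linear interpolation between the two neighbouring order statistics
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

type7_quantile(Temperature, 0.97)  # 93, same as quantile(Temperature, 0.97)
```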

What is the rank of the 50th temperature value in the series?

rank(Temperature)[50]
## [1] 42
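`rank()` defaults to `ties.method = "average"`, so tied values share the mean of the sorted positions they occupy. The 50th temperature is 73, and the five 73s fill sorted positions 40 through 44, so its rank is 42. The small sketch below reproduces that calculation:

```r
Temperature = airquality$Temp

v = Temperature[50]            # the 50th temperature in the series
below = sum(Temperature < v)   # how many values are strictly smaller
tied  = sum(Temperature == v)  # how many values share this temperature

# Average of the tied positions below + 1, ..., below + tied
avg_rank = below + (tied + 1) / 2

avg_rank == rank(Temperature)[50]
```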

(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns?

require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
dim(carbon)
## [1] 24  2
names(carbon)
## [1] "Monoxide" "Site"    

Create side-by-side box plots showing the distribution of monoxide levels at the different sites.

require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
## 
## movies
ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + xlab("Site") + ylab("Carbon Monoxide Levels") + 
    geom_boxplot()

[Figure: box plots of carbon monoxide levels by site]

(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R.

dailyice = read.table("http://myweb.fsu.edu/jelsner/DailyIceVolume.txt", header = TRUE)

Subset the data frame for the years 2000-2011, inclusive.

subset = subset(dailyice, Year <= 2011 & Year >= 2000)
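`subset()` keeps the rows where the condition is `TRUE` and drops rows where it is `NA`. Storing the result in a variable named `subset` still works, because R skips non-function objects when it looks up a function call, but a distinct name is clearer. The sketch below avoids the network by applying the same kind of range filter to the built-in `airquality` data (June through August) instead of the ice-volume file. It then checks the result against plain logical row indexing:

```r
# Same range-filter pattern as the Year filter above, applied to Month
summer = subset(airquality, Month >= 6 & Month <= 8)

# Equivalent logical row indexing (Month has no NAs, so the results match)
summer2 = airquality[airquality$Month >= 6 & airquality$Month <= 8, ]

all.equal(summer, summer2)
nrow(summer)  # 92 days: 30 in June + 31 in July + 31 in August
```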

For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume.

require(ggplot2)
ggplot(subset, aes(x = factor(Year), y = Vol)) + xlab("Year") + geom_boxplot()

[Figure: box plots of ice volume by year, 2000-2011]

Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?

ameans = tapply(subset$Vol, subset$Year, mean)
t.test(diff(ameans))
## 
##  One Sample t-test
## 
## data:  diff(ameans) 
## t = -2.229, df = 10, p-value = 0.04991
## alternative hypothesis: true mean is not equal to 0 
## 95 percent confidence interval:
##  -1.1732056 -0.0002893 
## sample estimates:
## mean of x 
##   -0.5867 
## 

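The p-value (0.0499) is just below 0.05. At the 5% significance level I therefore reject the null hypothesis that the mean year-to-year change in annual mean ice volume over 2000-2011 is zero. The estimated mean change is about -0.59 (in the units of Vol) per year. The 95% confidence interval (-1.17, -0.0003) barely excludes zero, so the evidence of a decline is significant but marginal.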
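Given a single vector, `t.test()` tests whether that vector's mean is zero. It uses t = mean(d) / (sd(d) / sqrt(n)) on n - 1 degrees of freedom, which is why 11 year-to-year differences give df = 10 above. The minimal sketch below computes the statistic and its two-sided p-value in a hypothetical helper, `one_sample_t`. The ice-volume file has to be fetched over the network, so the helper is checked on the salary differences from question (1) instead:

```r
# Hypothetical helper: one-sample t statistic and two-sided p-value for H0: mean = 0
one_sample_t = function(d) {
  n = length(d)
  t_stat = mean(d) / (sd(d) / sqrt(n))
  p_value = 2 * pt(-abs(t_stat), df = n - 1)
  c(t = t_stat, df = n - 1, p = p_value)
}

saldiff = diff(c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72))
res = one_sample_t(saldiff)
res

# Should agree with the built-in one-sample test
tt = t.test(saldiff)
all.equal(unname(res["t"]), unname(tt$statistic))
all.equal(unname(res["p"]), tt$p.value)
```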