Exam # 1

Michael Patterson

date()
## [1] "Thu Sep 27 13:56:05 2012"

Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.

(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions of dollars) by

0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72

Find the differences from one year to the next. For the years in which the average salary increased over the previous year, what is the average increase?

cash = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
diff(cash)
## [1]  0.32  0.19  0.04  0.06 -0.11  0.10  0.21  0.06  0.28
cashchange = diff(cash)
mean(cashchange[cashchange > 0])  # average over the increases only
## [1] 0.1575
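
The selection of increases above can also be written with a logical index, which avoids a pitfall of negative indexing with which(): if no element is negative, which() returns an empty vector and the negative index then drops every element. The sketch below reuses the salary figures from the question; the names salary, changes, increases, and no_drops are illustrative only.

```r
# Salary figures from the question (in millions)
salary <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)

# Year-over-year differences, then keep only the increases
changes <- diff(salary)
increases <- changes[changes > 0]
mean(increases)

# Pitfall: with no negative values, -which(...) selects nothing at all
no_drops <- c(0.1, 0.2, 0.3)
no_drops[-which(no_drops < 0)]  # empty vector, not the original three values
no_drops[no_drops >= 0]         # logical indexing keeps all three values
```

With the salary data both forms give the same mean, because one decrease is present.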

(2) Create a histogram from the sample of temperatures in the airquality data frame. What are the maximum and minimum temperatures in the sample? What are the mean and median temperatures? Arrange the temperatures from smallest to largest. What is the 97th percentile temperature value? What is the rank of the 50th temperature value in the series?

attach(airquality)
max(Temp)
## [1] 97
min(Temp)
## [1] 56
mean(Temp)
## [1] 77.88
median(Temp)
## [1] 79
sort(Temp, decreasing = FALSE)
##   [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
##  [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
##  [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
##  [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
##  [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
quantile(Temp, 0.97)
## 97% 
##  93 
rank(Temp)[50]
## [1] 42
detach(airquality)
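
The question also asks for a histogram, which the calls above do not produce. A minimal sketch follows, assuming the built-in airquality data frame and using `$` instead of attach(); the axis label and title are illustrative choices. It also shows why the rank of the 50th value comes out as 42: that value is tied with four others, and rank() averages tied positions by default.

```r
temp <- airquality$Temp

# Histogram of the temperature sample; hist() also returns the bin counts
h <- hist(temp, main = "Temperatures in airquality", xlab = "Temp")

# The 50th observation is tied with others, so its default rank is the
# average of the positions the tied values occupy in the sorted series
temp[50]
sum(temp == temp[50])
rank(temp)[50]                        # average of the tied positions
rank(temp, ties.method = "min")[50]   # first position of the tied block
```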

(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns? Create side-by-side box plots showing the distribution of monoxide levels at the different sites.

# Install UsingR only if it is missing; a CRAN mirror must be given explicitly
if (!requireNamespace("UsingR", quietly = TRUE)) install.packages("UsingR", repos = "https://cloud.r-project.org")
require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
attach(carbon)
dim(carbon)  #Dimensions of the data set
## [1] 24  2
names(carbon)
## [1] "Monoxide" "Site"    
# Install ggplot2 only if it is missing; a CRAN mirror must be given explicitly
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2", repos = "https://cloud.r-project.org")
require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
## 
## movies
ggplot(carbon, aes(x = as.factor(Site), y = Monoxide)) + geom_boxplot() + ylab("Monoxide") + 
    xlab("Site")

[Figure: side-by-side box plots of Monoxide by Site]

detach(carbon)

(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R. Subset the data frame for the years 2000-2011, inclusive. For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume. Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?

loc = "http://myweb.fsu.edu/jelsner/DailyIceVolume.txt"
inc = read.table(loc, header = TRUE)
attach(inc)
oughts = subset(inc, Year >= 2000, c(Year, day, Vol), drop = FALSE)
cutoughts = subset(oughts, Year < 2012, c(Year, day, Vol), drop = FALSE)
ggplot(cutoughts, aes(as.factor(Year), Vol)) + geom_boxplot() + ylab("Volume") + 
    xlab("Year")

[Figure: side-by-side box plots of Volume by Year, 2000-2011]

detach(inc)
require("reshape2")
## Loading required package: reshape2
incwide = dcast(cutoughts, day ~ Year, value.var = "Vol")
incdata = as.data.frame(incwide)

# Annual means for 2000-2011 (columns 2 to 13), using the first 365 days of each year
means = colMeans(incdata[1:365, 2:13])
t.test(diff(means))
## 
##  One Sample t-test
## 
## data:  diff(means) 
## t = -2.229, df = 10, p-value = 0.04991
## alternative hypothesis: true mean is not equal to 0 
## 95 percent confidence interval:
##  -1.1732056 -0.0002893 
## sample estimates:
## mean of x 
##   -0.5867 
## 
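# With a p-value of 0.0499, just below 0.05, the mean year-to-year change in
# annual mean ice volume (-0.587) is marginally significant at the 5% level.
# Note that this test addresses the average change between consecutive years,
# not a direct comparison of 2000 with 2011.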
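
Because the p-value sits so close to 0.05, it may help to cross-check it against the t statistic and degrees of freedom in the printout. The sketch below uses only the rounded values shown in the t.test() output, so small rounding differences from the printed p-value and interval are expected; the variable names are illustrative.

```r
# Values copied from the t.test() printout above (rounded as printed)
t_stat <- -2.229
df <- 10
mean_diff <- -0.5867

# Two-sided p-value: probability in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df)

# Standard error implied by the printout, and the 95% interval half-width
se <- mean_diff / t_stat
half_width <- qt(0.975, df) * se

p_value
c(lower = mean_diff - half_width, upper = mean_diff + half_width)
```

The interval's upper end lies only just below zero, which is consistent with a p-value that only just falls under 0.05.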