date()
## [1] "Thu Sep 27 13:52:39 2012"
Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.
(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions of dollars) by
0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72
Find the differences from one year to the next. For all years in which the average salary increased over the previous year, what is the average of those increases?
salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
diff(salary)
## [1] 0.32 0.19 0.04 0.06 -0.11 0.10 0.21 0.06 0.28
which(diff(salary) > 0)
## [1] 1 2 3 4 6 7 8 9
mean(diff(salary)[diff(salary) > 0]) # average only the positive year-to-year changes
## [1] 0.1575
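To see which years the increases belong to, the differences can be labelled with the year in which each change took effect. This is a small sketch; the names `years`, `change`, and `increases` are introduced here for illustration and are not part of the original answer.

```r
# Salaries as given in the question (millions of dollars), 1990-1999
salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
years = 1990:1999

# Year-to-year change, named by the later year of each pair
change = diff(salary)
names(change) = years[-1]

# Keep only the increases, then average them
increases = change[change > 0]
names(increases)   # the years that saw an increase
mean(increases)    # average over all increases
```

Selecting with `change > 0` avoids hard-coding the position of the one decrease, so the same lines work if the salary vector is extended.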
(2) Create a histogram from the sample of temperatures in the airquality data frame. What are the maximum and minimum temperatures in the sample? What are the mean and median temperatures? Arrange the temperatures from smallest to largest. What is the 97th percentile temperature value? What is the rank of the 50th temperature value in the series?
require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
attach(airquality)
hist(Temp, main = "Histogram of Temperatures", xlab = "Temperature")
max(Temp)
## [1] 97
min(Temp)
## [1] 56
mean(Temp)
## [1] 77.88
median(Temp)
## [1] 79
sort(Temp)
## [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
## [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
## [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
## [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
## [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
quantile(Temp, probs = c(0.97))
## 97%
## 93
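By default `quantile()` uses its type 7 rule, which interpolates linearly between neighbouring order statistics. The sketch below reproduces the 97th percentile by hand under that assumption; the names `temp`, `h`, `lo`, `hi`, and `manual` are illustrative only.

```r
# Sorted temperatures from the built-in airquality data frame
temp = sort(airquality$Temp)
n = length(temp)
p = 0.97

# Type 7: fractional position h = (n - 1) * p + 1 in the sorted sample
h = (n - 1) * p + 1
lo = floor(h)
hi = ceiling(h)

# Interpolate linearly between the two neighbouring sorted values
manual = temp[lo] + (h - lo) * (temp[hi] - temp[lo])
manual
```

Here the two neighbouring sorted values are equal, so the interpolation has no effect and the result is a temperature that occurs in the sample.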
rank(Temp)
## [1] 23.5 38.0 46.5 12.5 1.0 20.0 17.5 7.5 10.0 31.0 46.5
## [12] 31.0 20.0 27.5 5.5 15.5 20.0 3.0 27.5 12.5 7.5 42.0
## [23] 10.0 10.0 3.0 5.5 3.0 23.5 91.0 77.5 57.0 71.5 46.5
## [34] 23.5 112.0 117.0 77.5 101.0 129.0 138.0 129.0 148.0 144.0 101.0
## [45] 83.0 77.5 65.0 38.0 17.5 42.0 57.0 65.0 57.0 57.0 57.0
## [56] 50.5 71.5 42.0 83.0 65.0 107.5 112.0 117.0 91.0 112.0 107.5
## [67] 107.5 133.0 144.0 144.0 135.5 101.0 42.0 91.0 140.5 83.0 91.0
## [78] 101.0 112.0 129.0 117.0 46.5 91.0 101.0 123.0 117.0 101.0 123.0
## [89] 133.0 123.0 107.5 91.0 91.0 91.0 101.0 123.0 117.0 129.0 135.5
## [100] 138.0 138.0 144.0 123.0 123.0 101.0 83.0 77.5 65.0 77.5 57.0
## [111] 71.5 71.5 65.0 38.0 50.5 77.5 91.0 123.0 133.0 153.0 150.5
## [122] 152.0 150.5 140.5 144.0 148.0 148.0 129.0 112.0 83.0 71.5 50.5
## [133] 42.0 91.0 57.0 65.0 35.0 35.0 71.5 23.5 57.0 27.5 101.0
## [144] 15.5 35.0 91.0 31.0 14.0 33.0 65.0 50.5 57.0 27.5
rank(Temp)[50]
## [1] 42
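`rank()` averages the positions of tied values by default (`ties.method = "average"`), which is why many of the ranks above end in .5. As a rough check, the rank of the 50th temperature can be rebuilt from the sorted series; `temp` and `positions` are names chosen for this sketch.

```r
temp = airquality$Temp

# Positions in the sorted series occupied by the value of the 50th observation
positions = which(sort(temp) == temp[50])
positions

# With the default ties.method = "average", the rank is the mean of those positions
mean(positions)
```

The mean of the tied positions agrees with `rank(Temp)[50]`.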
detach(airquality)
(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns? Create side-by-side box plots showing the distribution of monoxide levels at the different sites.
head(carbon)
## Monoxide Site
## 1 0.106 1
## 2 0.127 1
## 3 0.132 1
## 4 0.105 1
## 5 0.117 1
## 6 0.109 1
dim(carbon)
## [1] 24 2
names(carbon)
## [1] "Monoxide" "Site"
require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
##
## movies
ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + geom_boxplot()
(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R. Subset the data frame for the years 2000-2011, inclusive. For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume. Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?
loc = "http://myweb.fsu.edu/jelsner/DailyIceVolume.txt"
Ice = read.table(loc, header = TRUE)
head(Ice)
## Year day Vol
## 1 1979 1 26.41
## 2 1979 2 26.50
## 3 1979 3 26.58
## 4 1979 4 26.67
## 5 1979 5 26.77
## 6 1979 6 26.87
attach(Ice)
Ice.df = subset(Ice, Year %in% 2000:2011) # %in% tests membership; == would recycle 2000:2011 element by element
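Comparing a column with a vector of years using `==` recycles the shorter vector and compares element by element, so it does not test whether each year falls in the set. `%in%` does the membership test. A toy illustration with a made-up vector of years (`years`, `target`, `by_eq`, and `by_in` are hypothetical names, not part of the ice data):

```r
# Made-up data: two observations per year, 1998-2003
years = rep(1998:2003, each = 2)
target = 2000:2002

# Element-wise comparison: target is recycled along years
by_eq = years == target

# Membership test: TRUE wherever the year is one of the targets
by_in = years %in% target

sum(by_eq)   # rows kept by the recycled comparison
sum(by_in)   # rows kept by the membership test
years[by_in]
```

The recycled comparison silently drops rows whose year is in range but happens to line up with a different element of `target`.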
ggplot(Ice.df, aes(x = factor(Year), y = Vol)) + geom_boxplot()
# The p-value is very small (< 2.2e-16), so we reject the null hypothesis of equal means:
# the mean daily ice volume in 2011 is lower than in 2000.
t.test(Vol[Year == 2000], Vol[Year == 2011])
##
## Welch Two Sample t-test
##
## data: Vol[Year == 2000] and Vol[Year == 2011]
## t = 14.27, df = 723.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5.566 7.342
## sample estimates:
## mean of x mean of y
## 19.63 13.18
##
detach(Ice)
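`t.test()` with two samples defaults to the Welch test, which does not assume equal variances. The sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand. The helper `welch_t` is hypothetical, and the data are simulated purely for illustration; they are not the ice volume series.

```r
# Hypothetical helper: Welch two-sample t statistic, df, and two-sided p-value
welch_t = function(x, y) {
  vx = var(x) / length(x)   # squared standard error of mean(x)
  vy = var(y) / length(y)   # squared standard error of mean(y)
  t_stat = (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df = (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value = 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p.value = p_value)
}

# Simulated samples (assumption: made-up numbers, not the DailyIceVolume data)
set.seed(1)
x = rnorm(40, mean = 20, sd = 3)
y = rnorm(35, mean = 18, sd = 5)

res = welch_t(x, y)
ref = t.test(x, y)
res
```

The values in `res` should agree with `ref$statistic`, `ref$parameter`, and `ref$p.value` from the built-in test.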