date()
## [1] "Thu Sep 27 14:52:17 2012"
Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.
(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions) by
0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72
Find the differences from one year to the next. For all years in which there was an increase in average salary over the previous year, what is the average over all increases?
(2) Create a histogram from the sample of temperatures in the airquality data frame. What are the maximum and minimum temperatures in the sample? What are the mean and median temperatures? Arrange the temperatures from smallest to largest. What is the 97th percentile temperature value? What is the rank of the 50th temperature value in the series?
(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns? Create side-by-side box plots showing the distribution of monoxide levels at the different sites.
(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R. Subset the data frame for the years 2000-2011, inclusive. For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume. Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?
(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions) by
0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72
Find the differences from one year to the next.
avgSalary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
diff(avgSalary)
## [1] 0.32 0.19 0.04 0.06 -0.11 0.10 0.21 0.06 0.28
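As a quick check on what diff() computes, the sketch below rebuilds the same year-to-year differences by subtracting each value from the one that follows it. The names n and manual are illustrative and not part of the original answer.

```r
avgSalary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
n = length(avgSalary)

# Each difference is "this year minus the previous year"
manual = avgSalary[2:n] - avgSalary[1:(n - 1)]

# Ten salaries give nine differences, matching diff()
same = isTRUE(all.equal(manual, diff(avgSalary)))
same
```

The result has one fewer element than the input because the first year has no previous year to compare against.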
For all years in which there was an increase in average salary over the previous year, what is the average over all increases?
year = c(1990:1999)
BS = data.frame(year, avgSalary)
BS
## year avgSalary
## 1 1990 0.57
## 2 1991 0.89
## 3 1992 1.08
## 4 1993 1.12
## 5 1994 1.18
## 6 1995 1.07
## 7 1996 1.17
## 8 1997 1.38
## 9 1998 1.44
## 10 1999 1.72
# Keep only the positive year-to-year changes, then average them
increases = diff(avgSalary)[diff(avgSalary) > 0]
mean(increases)
## [1] 0.1575
(2) Create a histogram from the sample of temperatures in the airquality data frame.
attach(airquality)
hist(airquality$Temp)
What are the maximum and minimum temperatures in the sample?
summary(airquality$Temp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 56.0 72.0 79.0 77.9 85.0 97.0
What are the mean and median temperatures?
mean(airquality$Temp)
## [1] 77.88
median(airquality$Temp)
## [1] 79
Arrange the temperatures from smallest to largest.
sort(airquality$Temp)
## [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
## [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
## [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
## [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
## [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
What is the 97th percentile temperature value?
quantile(airquality$Temp, probs = 0.97)
## 97%
## 93
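To see where a value like this comes from, here is a minimal sketch of the default interpolation rule used by quantile() (type 7), applied by hand to the built-in airquality temperatures. The variable names are illustrative.

```r
x = sort(airquality$Temp)
n = length(x)
p = 0.97

# Type 7: fractional position h in the sorted sample
h = (n - 1) * p + 1
lo = floor(h)

# Linear interpolation between the two neighbouring order statistics
manual = x[lo] + (h - lo) * (x[lo + 1] - x[lo])

agree = isTRUE(all.equal(manual, unname(quantile(airquality$Temp, probs = p))))
agree
```

Other values of the type argument use different interpolation rules and can give slightly different answers for small samples.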
What is the rank of the 50th temperature value in the series?
rank(airquality$Temp)[50]
## [1] 42
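The rank reported here reflects how rank() handles ties: by default, tied values all receive the average of the positions they occupy in the sorted series. In the sorted output above, the value 73 fills positions 40 through 44, whose average is 42, which is consistent with the 50th observation being one of those 73s. A small sketch on a made-up vector (not the airquality data) shows the behaviour:

```r
# Hypothetical toy vector with a three-way tie at 73
x = c(73, 56, 73, 90, 73)

# Default ties.method = "average": the 73s share positions 2, 3, 4
avgRank = rank(x)

# ties.method = "min" gives every tied value the lowest shared position
minRank = rank(x, ties.method = "min")

avgRank
minRank
```

Here each 73 gets rank 3 under the default method and rank 2 under "min".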
(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set?
require("UsingR")
## Loading required package: UsingR
## Loading required package: MASS
attach(carbon)
dim(carbon)
## [1] 24 2
What are the names of the columns?
head(carbon)
## Monoxide Site
## 1 0.106 1
## 2 0.127 1
## 3 0.132 1
## 4 0.105 1
## 5 0.117 1
## 6 0.109 1
names(carbon)
## [1] "Monoxide" "Site"
Create side-by-side box plots showing the distribution of monoxide levels at the different sites.
require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
##
## movies
ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + geom_boxplot() + xlab("Site")
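Side-by-side box plots can also be drawn with base R's formula interface, which needs no extra packages. The sketch below uses the built-in airquality data (temperature by month) purely as a stand-in; the same pattern, response ~ group, would apply to Monoxide and Site.

```r
# One box per month, drawn next to each other in a single panel
b = boxplot(Temp ~ Month, data = airquality,
            xlab = "Month", ylab = "Temperature")

# boxplot() invisibly returns the statistics it plotted
groups = b$names
groups
```

The formula form treats the grouping variable as categorical, so each group gets its own box on a shared axis.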
(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R.
loc = "http://myweb.fsu.edu/jelsner/DailyIceVolume.txt"
DIV = read.table(loc, header = TRUE)
Subset the data frame for the years 2000-2011, inclusive.
attach(DIV)
subDIV = subset(DIV, subset = (Year >= 2000 & Year <= 2011), select = c(Year, day, Vol))
For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume.
require(ggplot2)
ggplot(subDIV, aes(x = factor(Year), y = Vol)) + geom_boxplot() + xlab("Year")
Perform a t test on the difference in annual means between 2000 and 2011.
# Compare the daily ice volumes of 2000 with those of 2011 (Welch two-sample t test)
vol2000 = subDIV$Vol[subDIV$Year == 2000]
vol2011 = subDIV$Vol[subDIV$Year == 2011]
t.test(vol2000, vol2011)
What do you conclude?
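When drawing a conclusion from a two-sample t test, the pieces to read are the two sample means, the confidence interval for their difference, and the p-value. The sketch below shows how to pull these out of the object returned by t.test(). It runs on simulated numbers, not the ice volume data; the means, spread, and sample sizes are invented for illustration only.

```r
set.seed(1)

# Simulated stand-ins for two years of daily values (hypothetical numbers)
a = rnorm(365, mean = 20, sd = 3)
b = rnorm(365, mean = 15, sd = 3)

# Default is the Welch test, which does not assume equal variances
tt = t.test(a, b)

tt$estimate   # sample means of the two groups
tt$conf.int   # 95% confidence interval for the difference in means
tt$p.value    # a small value is evidence against equal means
```

If the confidence interval excludes zero and the p-value is small, the data give evidence that the two annual means differ.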