date()
## [1] "Thu Sep 27 13:58:04 2012"
Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.
(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions of dollars) by
0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72
salary = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
salary
## [1] 0.57 0.89 1.08 1.12 1.18 1.07 1.17 1.38 1.44 1.72
Find the differences from one year to the next.
saldiff = diff(salary)
saldiff
## [1] 0.32 0.19 0.04 0.06 -0.11 0.10 0.21 0.06 0.28
For the years in which the average salary increased over the previous year, what is the average of those increases?
mean(saldiff[saldiff > 0])
## [1] 0.1575
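The two expressions that are easy to confuse here answer different questions: taking the mean of a logical vector gives the proportion of TRUE values, while subsetting first gives the mean of the selected values. A minimal sketch, with the differences typed in by hand from the output above and the names `prop_increase` and `avg_increase` chosen only for illustration:

```r
# Year-over-year changes, copied from the diff(salary) output
saldiff <- c(0.32, 0.19, 0.04, 0.06, -0.11, 0.10, 0.21, 0.06, 0.28)

# Mean of a logical vector: fraction of years with an increase
prop_increase <- mean(saldiff > 0)

# Logical subsetting, then mean: average size of the increases only
avg_increase <- mean(saldiff[saldiff > 0])

prop_increase
avg_increase
```

Eight of the nine changes are positive, so the first value is 8/9, while the second is the sum of the eight increases (1.26) divided by 8.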
(2) Create a histogram from the sample of temperatures in the airquality data frame.
Temperature = airquality$Temp
hist(Temperature)
What are the maximum and minimum temperatures in the sample?
max(Temperature)
## [1] 97
min(Temperature)
## [1] 56
What are the mean and median temperatures?
mean(Temperature)
## [1] 77.88
median(Temperature)
## [1] 79
Arrange the temperatures from smallest to largest.
sort(Temperature, decreasing = FALSE)
## [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
## [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
## [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
## [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
## [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
What is the 97th percentile temperature value?
quantile(Temperature, 0.97)
## 97%
## 93
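As a check on how the percentile is obtained, the sketch below reproduces the value by hand, assuming the default interpolation rule of quantile() (type 7), which interpolates linearly between the two order statistics around position (n - 1)p + 1. The names `q_manual` and `q_builtin` are illustrative only.

```r
# Sorted temperatures from the built-in airquality data frame
temps <- sort(airquality$Temp)
n <- length(temps)
p <- 0.97

# Fractional position of the p-th quantile under the type 7 rule
h <- (n - 1) * p + 1
lo <- floor(h)
hi <- ceiling(h)

# Linear interpolation between the neighbouring order statistics
q_manual <- temps[lo] + (h - lo) * (temps[hi] - temps[lo])

# Built-in result, without the "97%" name attached
q_builtin <- quantile(airquality$Temp, p, names = FALSE)

q_manual
q_builtin
```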
What is the rank of the 50th temperature value in the series?
rank(Temperature)[50]
## [1] 42
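The rank is a whole number here even though several days share the same temperature, because rank() by default assigns tied values the average of the positions they occupy. A rough sketch of that rule, with `manual_rank` as an illustrative name:

```r
temps <- airquality$Temp
v <- temps[50]   # the 50th temperature in the series

# Average rank of a tied value: number of strictly smaller values,
# plus the midpoint of the block of values equal to it
manual_rank <- sum(temps < v) + (sum(temps == v) + 1) / 2

manual_rank
rank(temps)[50]

# Other tie rules are available, e.g. the smallest position in the block
rank(temps, ties.method = "min")[50]
```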
(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns?
require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
dim(carbon)
## [1] 24 2
names(carbon)
## [1] "Monoxide" "Site"
Create side-by-side box plots showing the distribution of monoxide levels at the different sites.
require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
##
## movies
ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + xlab("Site") + ylab("Carbon Monoxide Levels") +
geom_boxplot()
(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R.
dailyice = read.table("http://myweb.fsu.edu/jelsner/DailyIceVolume.txt", header = TRUE)
Subset the data frame for the years 2000-2011, inclusive.
# Use a name other than "subset" so the subset() function is not shadowed
ice0011 = subset(dailyice, Year >= 2000 & Year <= 2011)
For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume.
require(ggplot2)
ggplot(ice0011, aes(x = factor(Year), y = Vol)) + xlab("Year") + geom_boxplot()
Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?
ameans = tapply(ice0011$Vol, ice0011$Year, mean)
t.test(diff(ameans))
##
## One Sample t-test
##
## data: diff(ameans)
## t = -2.229, df = 10, p-value = 0.04991
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -1.1732056 -0.0002893
## sample estimates:
## mean of x
## -0.5867
##
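The p-value (0.0499) is just below 0.05, so at the 5% level I reject the null hypothesis that the mean year-to-year change in annual mean ice volume is zero. The estimated mean change is -0.59 per year, so the data indicate a decline in ice volume between 2000 and 2011, although the evidence is marginal because the upper end of the confidence interval is very close to zero.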
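To make clear what the one-sample test on the year-to-year differences computes, the sketch below rebuilds the t statistic and two-sided p-value by hand. The vector `d` is hypothetical and stands in for diff(ameans); it is not the ice volume data.

```r
# Hypothetical year-to-year changes (NOT the actual ice volume differences)
d <- c(-1.2, 0.4, -0.9, -0.3, -1.5, 0.8, -0.7)
n <- length(d)

# t statistic: sample mean divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Same quantities from the built-in test of H0: true mean is 0
res <- t.test(d)

c(manual = t_manual, builtin = unname(res$statistic))
c(manual = p_manual, builtin = res$p.value)
```

A small p-value here counts as evidence against a mean change of zero; it does not by itself say how large the change is, which is why the estimate and confidence interval are reported alongside it.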