date()
## [1] "Thu Sep 27 13:58:22 2012"
Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.
(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions of dollars) by
0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72
Find the differences from one year to the next.
money = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
differences = diff(money)
differences
## [1] 0.32 0.19 0.04 0.06 -0.11 0.10 0.21 0.06 0.28
For all years in which there was an increase in average salary over the previous year, what is the average increase?
average = mean(differences[differences > 0])
average
## [1] 0.1575
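As a minimal sketch of the same calculation, the differences can be labelled by year so that the increases are selected by their sign rather than by position. The year labels 1991:1999 and the name `increases` are assumptions added here for illustration; each label is the year the change leads into.

```r
money = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
differences = diff(money)

# Label each change with the year it leads into (1991 through 1999).
names(differences) = 1991:1999

# Keep only the positive changes, whichever positions they occupy.
increases = differences[differences > 0]

names(differences)[differences < 0]   # year(s) with a decrease
mean(increases)                       # average over all increases
```

Selecting on `differences > 0` keeps working if the data change and the decrease is no longer the fifth element.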
(2) Create a histogram from the sample of temperatures in the airquality data frame.
require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
data(airquality)
hist(airquality$Temp)
What are the maximum and minimum temperatures in the sample?
temperatures = airquality$Temp
maximum = max(temperatures)
maximum
## [1] 97
minimum = min(temperatures)
minimum
## [1] 56
What are the mean and median temperatures? Arrange the temperatures from smallest to largest.
sortT = sort(temperatures)
sortT
## [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
## [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
## [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
## [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
## [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
meanT = mean(temperatures)
meanT
## [1] 77.88
medianT = median(temperatures)
medianT
## [1] 79
What is the 97th percentile temperature value?
T97 = quantile(temperatures, probs = 0.97)
T97
## 97%
## 93
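To see where this value comes from, the sketch below reproduces the percentile by hand, assuming the default `quantile()` method (type 7), which interpolates linearly between the two sorted values on either side of the fractional position `1 + (n - 1) * p`. The names `h`, `lo`, `hi`, and `manual` are introduced here for illustration only.

```r
x = sort(airquality$Temp)   # airquality ships with R's datasets package
n = length(x)
p = 0.97

h = 1 + (n - 1) * p         # fractional position in the sorted vector
lo = floor(h)
hi = ceiling(h)

# Linear interpolation between the neighbouring order statistics.
manual = x[lo] + (h - lo) * (x[hi] - x[lo])
manual

# Should agree with the built-in default.
quantile(x, probs = p, names = FALSE)
```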
What is the rank of the 50th temperature value in the series?
# rank() returns the rank of every value; index the result to get the 50th.
rank(temperatures)[50]
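In `rank()`, the `ties.method` argument takes one of a fixed set of character strings (for example "average", the default, or "min"), and the rank of a particular observation is obtained by indexing the result. A small sketch on a made-up vector `x`, chosen here only to show how ties are handled:

```r
x = c(10, 20, 10, 30)

rank(x)                        # 1.5 3.0 1.5 4.0 (tied values share the average rank)
rank(x, ties.method = "min")   # 1 3 1 4 (tied values take the smallest rank)

# Rank of the third observation under the default method.
rank(x)[3]                     # 1.5
```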
(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites.
What are the dimensions of this data set?
require(UsingR)
data(carbon)
dimensions = dim(carbon)
dimensions
## [1] 24 2
What are the names of the columns?
names(carbon)
## [1] "Monoxide" "Site"
Create side-by-side box plots showing the distribution of monoxide levels at the different sites.
require(ggplot2)
## Loading required package: ggplot2
## Attaching package: 'ggplot2'
## The following object(s) are masked from 'package:UsingR':
##
## movies
ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + geom_boxplot()
(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R.
loc = "http://myweb.fsu.edu/jelsner/DailyIceVolume.txt"
DIV = read.table(loc, header = TRUE)
Subset the data frame for the years 2000-2011, inclusive.
SubD = subset(DIV, Year >= 2000 & Year <= 2011)
For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume.
ggplot(SubD, aes(x = factor(Year), y = Vol)) + geom_boxplot()
Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?
# Daily ice volumes for the two years being compared.
vol2000 = SubD$Vol[SubD$Year == 2000]
vol2011 = SubD$Vol[SubD$Year == 2011]
# Two-sample (Welch) t test on the difference in means.
t.test(vol2000, vol2011)
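A two-sample t test needs two vectors of observations, one per year, rather than a vector and a single mean. The sketch below shows the shape of such a comparison on simulated data, since the real file may not be reachable. The data frame `sim`, its values, and the chosen means and standard deviations are all made up for illustration and say nothing about the actual ice volumes.

```r
set.seed(1)

# Hypothetical stand-in for the ice volume data: simulated daily values
# for two years, using the same column names as the real data frame.
sim = data.frame(
  Year = rep(c(2000, 2011), each = 100),
  Vol  = c(rnorm(100, mean = 20, sd = 2), rnorm(100, mean = 15, sd = 2))
)

# Annual means, one per year.
tapply(sim$Vol, sim$Year, mean)

# Welch two-sample t test comparing the two years.
result = t.test(Vol ~ Year, data = sim)
result$p.value    # a small p-value is evidence against equal means
result$conf.int   # confidence interval for the difference in means
```

With the real data, the same pattern applies after subsetting to the two years of interest. The conclusion then rests on the reported p-value and on whether the confidence interval for the difference excludes zero.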