Exam # 1

Param Maharaj

date()
## [1] "Thu Sep 27 13:59:23 2012"

Due Date/Time: September 27, 2012, 1:45pm
Each question is worth 25 points.

(1) The average salaries in major league baseball for the years 1990-1999 are given (in millions) by

0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72

Find the differences from one year to the next. For all years in which the average salary increased over the previous year, what is the average of those increases?

salaries = c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)
diff(salaries)
## [1]  0.32  0.19  0.04  0.06 -0.11  0.10  0.21  0.06  0.28
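increases = diff(salaries)[diff(salaries) > 0]  # keep only the positive year-to-year differences
mean(increases)  # average size of an increase
## [1] 0.1575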
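
Note on the two possible means: taking the mean of a logical vector gives the proportion of TRUE values, not the average of the selected numbers. The sketch below, which reuses the salary figures from the question and introduces the illustrative names changes, share_up and avg_up, computes both quantities so the difference is visible.

```r
salaries <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.44, 1.72)

changes <- diff(salaries)            # 9 year-over-year differences
went_up <- changes > 0               # logical vector: TRUE where salary rose

share_up <- mean(went_up)            # proportion of years with an increase (8 of 9)
avg_up   <- mean(changes[went_up])   # average size of the increases only

share_up
avg_up
```

The first value answers "how often did salaries rise?", the second answers the question actually asked, "by how much on average when they did rise?".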

(2) Create a histogram from the sample of temperatures in the airquality data frame. What are the maximum and minimum temperatures in the sample? What are the mean and median temperatures? Arrange the temperatures from smallest to largest. What is the 97th percentile temperature value? What is the rank of the 50th temperature value in the series?

require(UsingR)
require(ggplot2)
names(airquality)
## [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"    
hist(airquality$Temp)

plot of chunk airqualityHistogram

max(airquality$Temp)
## [1] 97
min(airquality$Temp)
## [1] 56
mean(airquality$Temp)
## [1] 77.88
median(airquality$Temp)
## [1] 79
sort(airquality$Temp)
##   [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67
##  [24] 67 67 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74
##  [47] 74 74 75 75 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78
##  [70] 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81
##  [93] 81 81 81 81 82 82 82 82 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85
## [116] 85 85 85 85 86 86 86 86 86 86 86 87 87 87 87 87 88 88 88 89 89 90 90
## [139] 90 91 91 92 92 92 92 92 93 93 93 94 94 96 97
quantile(airquality$Temp, 0.97)
## 97% 
##  93 
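airquality$Temp[50]  # the 50th temperature value in the series
## [1] 73
rank(airquality$Temp)[50]  # its rank among all values (tied values share the average rank)
## [1] 42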
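
Note on percentiles and ranks: indexing with Temp[97] returns the 97th observation, which is unrelated to the 97th percentile, and Temp[50] returns a value rather than its rank. The sketch below uses the built-in airquality data to check the quantile by hand against the sorted vector and to show how ties affect the rank. The names h, lo, hi and manual are illustrative, and the hand calculation assumes R's default quantile method (type 7).

```r
temps  <- airquality$Temp          # built-in data set, 153 observations
sorted <- sort(temps)
n      <- length(temps)

# 97th percentile via quantile() and via the default (type 7) interpolation rule
p97 <- quantile(temps, 0.97, names = FALSE)
h   <- (n - 1) * 0.97 + 1          # fractional position in the sorted vector
lo  <- floor(h)
hi  <- ceiling(h)
manual <- sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])

# Rank of the 50th observation; five observations share the value 73
rank_avg <- rank(temps)[50]                       # 42: average of positions 40 to 44
rank_min <- rank(temps, ties.method = "min")[50]  # 40: first position of the tie
```

Which tie rule to report is a judgment call; the default average rank is used in the answer above.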

(3) The data set carbon (UsingR) contains a list of carbon monoxide levels at three different sites. What are the dimensions of this data set? What are the names of the columns? Create side-by-side box plots showing the distribution of monoxide levels at the different sites.

require(UsingR)
dim(carbon)
## [1] 24  2
names(carbon)
## [1] "Monoxide" "Site"    
boxplot(Monoxide ~ Site, data = carbon)  # formula interface: one box of Monoxide per site

plot of chunk carbon

ggplot(carbon, aes(x = factor(Site), y = Monoxide)) + geom_boxplot()  # a numeric Site is treated as continuous, so convert it to a factor to get one box per site

plot of chunk carbon
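
Note on grouping: passing two vectors to boxplot() draws one box per vector, whereas the formula form response ~ group draws one box per group. The sketch below illustrates this on a small made-up data frame (toy is a hypothetical stand-in with the same column names as carbon, and its values are invented). It uses plot = FALSE so that only the computed box statistics are returned.

```r
# Hypothetical stand-in for carbon: invented values, numeric site codes 1 to 3
toy <- data.frame(
  Monoxide = c(0.10, 0.12, 0.21, 0.18, 0.19, 0.25, 0.27, 0.22, 0.34),
  Site     = rep(1:3, each = 3)
)

# Two separate vectors: one box for Site and one for Monoxide (not what we want)
by_variable <- boxplot(toy$Site, toy$Monoxide, plot = FALSE)

# Formula interface: Monoxide split by Site, one box per site
by_site <- boxplot(Monoxide ~ Site, data = toy, plot = FALSE)

ncol(by_variable$stats)  # number of boxes in the first call
by_site$names            # group labels in the second call
```

Dropping plot = FALSE draws the figures. The same idea applies in ggplot2, where the x aesthetic has to be discrete (for example factor(Site)) to get separate boxes.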


(4) Read the DailyIceVolume data set from the connection http://myweb.fsu.edu/jelsner/DailyIceVolume.txt into R. Subset the data frame for the years 2000-2011, inclusive. For the years 2000-2011, create side-by-side box plots showing the distribution of ice volume. Perform a t test on the difference in annual means between 2000 and 2011. What do you conclude?

loc = "http://myweb.fsu.edu/jelsner/DailyIceVolume.txt"
Ice = read.table(loc, header = TRUE)
names(Ice)
## [1] "Year" "day"  "Vol" 
attach(Ice)
Ice2 = subset(Ice, Year >= 2000 & Year <= 2011)  # keep the years 2000-2011, inclusive
ggplot(Ice2, aes(x = factor(Year), y = Vol)) + geom_boxplot()  # one box per year

plot of chunk importfromWeb
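
Note on subsetting: comparing Year with the string "2000:2011" tests for that literal text, so nothing matches. A range of years is selected with a logical condition or with %in%. The sketch below shows both on a simulated data frame (toy is a hypothetical stand-in with the same column names as Ice, and its values are random, not the real ice volumes).

```r
set.seed(1)
# Hypothetical stand-in for Ice: 15 years with 10 made-up daily values each
toy <- data.frame(
  Year = rep(1998:2012, each = 10),
  day  = rep(1:10, times = 15),
  Vol  = rnorm(150, mean = 20, sd = 2)
)

# Two equivalent ways to keep 2000 through 2011, inclusive
recent_a <- subset(toy, Year >= 2000 & Year <= 2011)
recent_b <- toy[toy$Year %in% 2000:2011, ]

range(recent_a$Year)           # first and last year kept
table(factor(recent_a$Year))   # rows per year, the groups a box plot would use
```

Either form can replace the attach() approach, since the columns are referenced through the data frame itself.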


t.test(Vol[Year == 2000], Vol[Year == 2011])
## 
##  Welch Two Sample t-test
## 
## data:  Vol[Year == 2000] and Vol[Year == 2011] 
## t = 14.27, df = 723.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  5.566 7.342 
## sample estimates:
## mean of x mean of y 
##     19.63     13.18 
##

Conclusion: The mean daily ice volume in 2000 (19.63) is larger than in 2011 (13.18). The 95 percent confidence interval for the difference, 5.566 to 7.342, excludes zero and the p-value is below 2.2e-16, so we reject the null hypothesis of equal means. Ice volume was significantly lower in 2011 than in 2000.