By: Jessica Wheeler
load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/cbusairtemp.RData")
hist(air, xlab="Air Temperature (Fahrenheit)")
This distribution is skewed left, with a long left tail. It is bimodal, with two major peaks near 35 and 75 degrees. From the histogram, the center appears to be around 55 degrees, and the values range from about -10 to 90 degrees. There do not appear to be any outliers.
plot(air, type='l', xlab="Days", ylab="Air Temperature (Fahrenheit)", main="Time Plot: Air Temperatures")
The time plot shows a pattern in air temperature that repeats roughly every year (about 365 days). However, comparing the temperatures around day 200 with those around day 550, the second year reached noticeably colder temperatures than the first.
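One way to back up the visual comparison with numbers is to split the series into 365-day blocks and take each block's minimum. The sketch below uses a made-up two-year series (`toy_air`) rather than the actual `air` data, and assumes one observation per day; the names `toy_air`, `day`, `year`, and `yearly_min` are illustrative.

```r
# Made-up daily temperatures for two years; a stand-in for `air`, not the real data
set.seed(42)
day <- 1:730
toy_air <- 55 + 25 * cos(2 * pi * day / 365) + rnorm(730, sd = 5)

# Label each observation with its 365-day block (1 = first year, 2 = second year)
year <- (day - 1) %/% 365 + 1

# Coldest temperature reached in each block
yearly_min <- tapply(toy_air, year, min)
yearly_min
```

With the real data, replacing `toy_air` with `air` would give the yearly lows to compare, provided `air` holds one value per day.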
load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/talk.RData")
Stem Plot for Women:
stem(talk$WordsPerDay[talk$GenderMale1=="F"])
##
## The decimal point is 4 digit(s) to the right of the |
##
## 0 | 2
## 0 | 5567888888899
## 1 | 122244
## 1 | 5566677777788
## 2 | 0111224
## 2 | 5
## 3 | 2
Stem Plot for Men:
stem(talk$WordsPerDay[talk$GenderMale1=="M"])
##
## The decimal point is 4 digit(s) to the right of the |
##
## 0 | 1223
## 0 | 55577899
## 1 | 0000011222
## 1 | 56777
## 2 | 24444
## 2 | 67
## 3 | 01
## 3 | 6
The distribution for women is concentrated in the middle of its range, with two main peaks near 7,000 and 17,000 words, while the distribution for men is more spread out, with its main peak around 11,000 words. Both appear to have a center around 15,000 words, and both are roughly symmetric with a slight right skew.
boxplot(talk$WordsPerDay~talk$GenderMale1,main = "Words Per Day: Women Vs. Men",ylab = "Words Per Day")
summary(talk$WordsPerDay)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 695 8346 12460 14190 18050 36340
18050+1.5*IQR(talk$WordsPerDay)
## [1] 32606.75
The boxplot appears to flag an outlier in the female group: the woman who speaks 32,291 words per day. However, 32,291 is below the upper fence of 32,606.75 (Q3 + 1.5×IQR, computed here from all speakers combined), so it is not a potential outlier by the 1.5×IQR rule of thumb. The boxplot shows no outliers for the men.
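The fence computed above uses the quartiles of all speakers pooled together, whereas `boxplot()` draws each group's whiskers from that group's own quartiles, so the two can disagree about the same point. A minimal sketch of per-group fences follows, using made-up numbers rather than the `talk` data; `iqr_fences`, `toy`, `fences`, and `upper` are hypothetical names. Note that `boxplot()` uses hinges from `fivenum()`, which can differ slightly from `quantile()`.

```r
# Hypothetical helper: 1.5 x IQR fences for one numeric vector
iqr_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  step <- 1.5 * (q[2] - q[1])
  c(lower = q[1] - step, upper = q[2] + step)
}

# Made-up values for illustration; not the talk data
toy <- data.frame(
  words = c(5, 7, 8, 9, 10, 30, 4, 6, 12, 14, 16, 18),
  group = rep(c("F", "M"), each = 6),
  stringsAsFactors = FALSE
)

# One pair of fences per group
fences <- lapply(split(toy$words, toy$group), iqr_fences)

# Flag values above their own group's upper fence
upper <- sapply(fences, `[[`, "upper")
toy$flagged <- toy$words > upper[toy$group]
toy[toy$flagged, ]
```

Applied to the real data, the same idea would be `lapply(split(talk$WordsPerDay, talk$GenderMale1), iqr_fences)`.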
load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/wineries.RData")
summary(wineries)
## Date
## Min. :1860
## 1st Qu.:1934
## Median :1948
## Mean :1947
## 3rd Qu.:1975
## Max. :1983
boxplot(wineries, ylab="Year", main="Boxplot for Wineries")
par(mfrow=c(1,3))
hist(wineries$Date,breaks=3,main="3 breaks")
hist(wineries$Date,breaks=11, main="11 breaks")
hist(wineries$Date,breaks=50, main="50 breaks")
I prefer the option of 11 breaks. Three breaks do not show enough information, and 50 breaks are too cluttered. Eleven breaks show the shape of the distribution without showing too much detail.
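When `breaks` is a single number, `hist()` treats it as a suggestion and picks round breakpoints with `pretty()`, so the number of bins drawn can differ from the number requested. The sketch below compares requested and actual bin counts on made-up years, not `wineries$Date`; `years`, `requested`, and `actual` are illustrative names.

```r
# Made-up founding years for illustration; not wineries$Date
set.seed(1)
years <- round(runif(200, min = 1860, max = 1983))

requested <- c(3, 11, 50)

# Number of bins hist() actually uses for each requested value
actual <- sapply(requested, function(k) {
  length(hist(years, breaks = k, plot = FALSE)$counts)
})

data.frame(requested, actual)
```

To force an exact set of bins, `breaks` can instead be given as a vector of breakpoints, for example from `seq()`.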
mu<-202
sigma<-21
mu-2*sigma
## [1] 160
mu+2*sigma
## [1] 244
mu-sigma
## [1] 181
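These values look like the one- and two-standard-deviation cutoffs used with the 68-95-99.7 rule, although this section does not state the question or the distribution. Assuming a Normal distribution with mean 202 and standard deviation 21, the rule's approximate percentages can be checked against exact probabilities from `pnorm()`; the names `p_within_2sd` and `p_below_1sd` are illustrative.

```r
mu <- 202
sigma <- 21

# Exact Normal probability between mu - 2*sigma and mu + 2*sigma (rule says about 95%)
p_within_2sd <- pnorm(mu + 2 * sigma, mean = mu, sd = sigma) -
  pnorm(mu - 2 * sigma, mean = mu, sd = sigma)

# Exact Normal probability below mu - sigma (rule says about 16%)
p_below_1sd <- pnorm(mu - sigma, mean = mu, sd = sigma)

round(c(within_2sd = p_within_2sd, below_1sd = p_below_1sd), 4)
```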