By: Jessica Wheeler


Question 1

load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/cbusairtemp.RData")
hist(air, xlab="Air Temperature (Fahrenheit)")

  1. This distribution is skewed to the left, since it has a long left tail. It is bimodal, with two major peaks around 35 and 75 degrees. From the histogram, the center looks to be around 55 degrees, and the values range from about -10 to 90 degrees. There do not seem to be any outliers.

plot(air,type='l',xlab="Days",ylab="Air Temperature (Fahrenheit)",main="Time Plot: Air Temperatures")

We see a general repeating pattern in the air temperature every year (about 365 days). However, comparing the temperatures around day 200 to the temperatures around day 550, the time plot shows that the second year reached noticeably colder temperatures than the first.


Question 2

load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/talk.RData")

Stem Plot for Women:

stem(talk$WordsPerDay[talk$GenderMale1=="F"])
## 
##   The decimal point is 4 digit(s) to the right of the |
## 
##   0 | 2
##   0 | 5567888888899
##   1 | 122244
##   1 | 5566677777788
##   2 | 0111224
##   2 | 5
##   3 | 2

Stem Plot for Men:

stem(talk$WordsPerDay[talk$GenderMale1=="M"])
## 
##   The decimal point is 4 digit(s) to the right of the |
## 
##   0 | 1223
##   0 | 55577899
##   1 | 0000011222
##   1 | 56777
##   2 | 24444
##   2 | 67
##   3 | 01
##   3 | 6

We see that the distribution for women is highly concentrated in the middle, while the distribution for men is more spread out, with its main peak around 11000 words. The stem plot for women has two main peaks, at 7000 and 17000 words. Both distributions seem to have a center around 15000 words. Both are also close to symmetric, but slightly skewed to the right.

boxplot(talk$WordsPerDay~talk$GenderMale1,main = "Words Per Day: Women Vs. Men",ylab = "Words Per Day")

summary(talk$WordsPerDay)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     695    8346   12460   14190   18050   36340
18050+1.5*IQR(talk$WordsPerDay)
## [1] 32606.75

There seems to be an outlier in the Female group: the woman who speaks 32,291 words per day. However, 32,291 is below 32,606.75, the upper fence Q3 + 1.5×IQR computed above from the combined data, so by that calculation it is not a potential outlier under the 1.5×IQR rule of thumb. The men's group shows no outlier.
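The 1.5×IQR rule can also be applied within each gender, which is how `boxplot` decides which points to draw individually (it uses hinges, which can differ slightly from the quartiles that `quantile` returns). The sketch below shows one way to do that check. The data frame `talk_demo` and the helpers `iqr_fences` and `flag_outliers` are hypothetical names introduced here, and the values are made up for illustration; they are not the assignment data.

```r
# Hypothetical stand-in for the `talk` data frame; the values are made up
# for illustration and are NOT the assignment data.
talk_demo <- data.frame(
  WordsPerDay = c(5000, 8000, 12000, 17000, 21000, 45000,
                  1000, 7000, 10000, 16000, 24000, 36000),
  GenderMale1 = rep(c("F", "M"), each = 6)
)

# Lower and upper 1.5 x IQR fences for one numeric vector.
iqr_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  step <- 1.5 * (q[2] - q[1])
  c(lower = q[1] - step, upper = q[2] + step)
}

# TRUE for values that fall outside the fences of their own vector.
flag_outliers <- function(x) {
  f <- iqr_fences(x)
  x < f[["lower"]] | x > f[["upper"]]
}

# Apply the rule separately within each gender; ave() returns 0/1 values
# here, so convert them back to logical.
talk_demo$outlier <- as.logical(
  ave(talk_demo$WordsPerDay, talk_demo$GenderMale1, FUN = flag_outliers)
)

# Show the flagged rows, if any.
talk_demo[talk_demo$outlier, ]
```

With the real data, replacing `talk_demo` by `talk` would give fences based on each group's own quartiles rather than on the combined sample.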

  1. I do not think that this data supports the stereotype that women are more talkative than men. We can see from the boxplot that the typical woman might talk more than the typical man, but the distribution is more widely spread for men. Also, in our data the most talkative person is a man. Thus, the data does not support the claim that women are more talkative than men overall.

Question 3

load("~/Documents/STAT 5301/all_three_datasets_DOWNLOAD_THIS/wineries.RData")
summary(wineries)
##       Date     
##  Min.   :1860  
##  1st Qu.:1934  
##  Median :1948  
##  Mean   :1947  
##  3rd Qu.:1975  
##  Max.   :1983
boxplot(wineries, ylab="Year", main="Boxplot for Wineries")

par(mfrow=c(1,3))
hist(wineries$Date,breaks=3,main="3 breaks")
hist(wineries$Date,breaks=11, main="11 breaks")
hist(wineries$Date,breaks=50, main="50 breaks")

I prefer the option of 11 breaks. 3 breaks do not show enough information, and 50 breaks are too cluttered. 11 breaks show the shape of the distribution without showing too much detail.
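When `breaks` is given as a single number, `hist` treats it as a suggestion and picks "pretty" breakpoints, so the number of bins actually drawn may differ from the number requested. The sketch below checks this on simulated years; `years_demo` is a hypothetical vector generated for illustration, not the wineries data.

```r
set.seed(1)
# Simulated founding years, for illustration only (NOT the wineries data).
years_demo <- round(runif(200, min = 1860, max = 1983))

# For each requested number of breaks, count the bins hist() actually uses.
requested <- c(3, 11, 50)
actual_bins <- sapply(requested, function(k) {
  h <- hist(years_demo, breaks = k, plot = FALSE)
  length(h$counts)
})

# Side-by-side comparison of requested and actual bin counts.
data.frame(requested, actual_bins)
```

Running the same comparison on `wineries$Date` would show how many bins each of the three panels really contains.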

  1. Our boxplot tells us that the middle half of the wineries were founded between 1934 and 1975 (the quartiles in the summary above), with a median of 1948. We also see that our data lies between about 1870 and 1980, except for one winery founded in 1860. Our histogram shows the frequency of winery foundings increasing as time goes on, with a few major peaks around 1875, 1905, 1940, and 1980. One could prefer either the boxplot or the histogram depending on the research interest, but I think that the histogram gives a better overall picture. It clearly shows 4 major peaks while also demonstrating that more and more wineries are being founded over time. I think that the boxplot shows that many more wineries were founded in the mid to late 1900s, but it does not clearly demonstrate the major peaks.

Question 4

mu<-202
sigma<-21
mu-2*sigma
## [1] 160
mu+2*sigma
## [1] 244
  1. The middle 95% of player weights are between 160 and 244 pounds.
mu+2*sigma
## [1] 244
  1. Since \(\mu\) is at the 50% mark and 95% of weights fall between \(\mu-2\sigma\) and \(\mu+2\sigma\), 50% + 95% / 2 = 50% + 47.5% = 97.5% of weights fall at or below \(\mu+2\sigma\). Thus, the heaviest 100% - 97.5% = 2.5% fall above \(\mu+2\sigma\). The heaviest 2.5% of players are above 244 pounds.
mu-sigma
## [1] 181
  1. Since \(\mu\) is at the 50% mark and 68% of weights fall between \(\mu-\sigma\) and \(\mu+\sigma\), 50% - (68% / 2) = 50% - 34% = 16% of weights fall below \(\mu-\sigma\). The lightest 16% of players are below 181 pounds.
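The 68-95-99.7 figures used above are rounded. As a rough cross-check, and assuming player weights are approximately normal with the stated mean and standard deviation (which the question appears to take as given), `pnorm` gives the corresponding normal proportions. The names `middle`, `upper_tail`, and `lower_tail` are introduced here for illustration.

```r
mu <- 202
sigma <- 21

# Proportion of weights between mu - 2*sigma and mu + 2*sigma (rule: 95%).
middle <- pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)

# Proportion above mu + 2*sigma (rule: 2.5%).
upper_tail <- pnorm(mu + 2 * sigma, mu, sigma, lower.tail = FALSE)

# Proportion below mu - sigma (rule: 16%).
lower_tail <- pnorm(mu - sigma, mu, sigma)

round(c(middle = middle, upper_tail = upper_tail, lower_tail = lower_tail), 4)
```

The proportions come out near 0.9545, 0.0228, and 0.1587, so the rule-of-thumb answers of 95%, 2.5%, and 16% are close approximations rather than exact values.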