# Save the dataset fruitfly and rmarkdown file in the same folder!!!
# Import dataset fruitfly
fruitfly<-read.csv("fruitfly.csv")
Compare the distribution of lifespan among the five experimental groups of fruitflies.
Hint: use the Console in RStudio to examine the dataset before attempting this exercise. For instance, type “names(fruitfly)” (without quotes) in the Console to see the variables in the dataset, and type “fruitfly” to see the entire dataset.
# Plot the appropriate figure to visualize the association between one quantitative variable and one categorical variable.
boxplot(lifespan~type, data = fruitfly)
We use a side-by-side boxplot because lifespan is quantitative and type is categorical.
# get the group means of one quantitative variable categorized by one categorical variable
# tapply(quantitative, categorical, function)
tapply(fruitfly$lifespan, fruitfly$type, mean)
## 1 2 3 4 5
## 63.56 64.80 63.36 56.76 38.72
tapply(fruitfly$lifespan, fruitfly$type, sd)
## 1 2 3 4 5
## 16.45215 15.65248 14.53983 14.92838 12.10207
The group supplied with 8 virgin females (type=5) has the shortest mean lifespan (38.72 days).
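The two `tapply` calls can be combined into a single summary table. A minimal sketch, assuming the data frame has the `lifespan` and `type` columns used above; `summarise_by_group` is a made-up helper name, not part of the original exercise:

```r
# Hypothetical helper: one row per group with n, mean, and sd of lifespan,
# sorted from the shortest to the longest mean lifespan.
summarise_by_group <- function(data) {
  groups <- split(data$lifespan, data$type)   # list of lifespans, one per type
  out <- data.frame(
    type = names(groups),
    n    = sapply(groups, length),
    mean = sapply(groups, mean),
    sd   = sapply(groups, sd),
    row.names = NULL
  )
  out[order(out$mean), ]
}

# Usage, once the dataset has been read in:
# summarise_by_group(fruitfly)
```

With the dataset loaded, `summarise_by_group(fruitfly)` would list the groups in order of mean lifespan, so the shortest-lived group appears in the first row.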
Let’s compare the lifespan distribution of the group supplied with 8 virgin females with that of the group supplied with 8 newly pregnant females, using the normal distribution as a model for each group.
# Supplied with 8 newly pregnant females N(63.4, 14.5)
# P(X<30)
round(pnorm(30, 63.4, 14.5), 4)
## [1] 0.0106
# P(30<X<50)
round(diff(pnorm(c(30,50), 63.4, 14.5)), 4)
## [1] 0.1671
# P(50<X<70)
round(diff(pnorm(c(50,70), 63.4, 14.5)), 4)
## [1] 0.4978
# P(X>70)
round(1-pnorm(70, 63.4, 14.5), 4)
## [1] 0.3245
# Supplied with 8 virgin females N(38.7,12.1)
# P(X<30)
round(pnorm(30, 38.7, 12.1), 4)
## [1] 0.2361
# P(30<X<50)
round(diff(pnorm(c(30,50), 38.7, 12.1)), 4)
## [1] 0.5888
# P(50<X<70)
round(diff(pnorm(c(50,70), 38.7, 12.1)), 4)
## [1] 0.1703
# P(X>70)
round(1-pnorm(70, 38.7, 12.1), 4)
## [1] 0.0048
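The eight calculations above follow one pattern: evaluate the normal CDF at the cut points 30, 50, and 70 and take differences. A compact sketch under the same rounded parameters, N(63.4, 14.5) and N(38.7, 12.1); the helper name `interval_probs` and the row labels are made up for this illustration:

```r
# Hypothetical helper: probability of each lifespan interval under N(mean, sd).
interval_probs <- function(mean, sd) {
  # Adding -Inf and Inf makes the four intervals cover the whole real line.
  cdf <- pnorm(c(-Inf, 30, 50, 70, Inf), mean = mean, sd = sd)
  probs <- diff(cdf)
  names(probs) <- c("X<30", "30<X<50", "50<X<70", "X>70")
  round(probs, 4)
}

comparison <- rbind(
  pregnant8 = interval_probs(63.4, 14.5),  # supplied with 8 newly pregnant females
  virgin8   = interval_probs(38.7, 12.1)   # supplied with 8 virgin females
)
comparison
```

Each row sums to 1 up to rounding, which is a quick check that no interval was missed. Read side by side, about 24% of the flies supplied with 8 virgin females are expected to die before day 30, against about 1% of the flies supplied with 8 newly pregnant females.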
tapply(fruitfly$lifespan, fruitfly$type, mean)
## 1 2 3 4 5
## 63.56 64.80 63.36 56.76 38.72
mean(c(81, 65, 56, 70, 56))
## [1] 65.6
Supplied with 8 newly pregnant females, mean: 63.36 days
Supplied with 8 virgin females, mean: 38.72 days
Answer: The escaped fruitflies most likely came from the group supplied with 8 newly pregnant females, because that group's mean lifespan (63.36 days) is much closer to the mean of the escaped fruitflies (65.6 days) than the mean of the group supplied with 8 virgin females (38.72 days) is.
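The comparison behind this answer is simple arithmetic on the means printed above. A small sketch, assuming the five values passed to `mean()` are the lifespans of the escaped flies; the variable names are chosen only for this illustration:

```r
escaped <- c(81, 65, 56, 70, 56)      # lifespans of the escaped flies (days)
group_means <- c(pregnant8 = 63.36,   # 8 newly pregnant females, from tapply()
                 virgin8   = 38.72)   # 8 virgin females, from tapply()

# Absolute distance between the escaped-fly mean and each group mean
distance <- abs(mean(escaped) - group_means)
distance

# Label of the group whose mean is closest to the escaped-fly mean
closest <- names(which.min(distance))
closest
```

The distances are about 2.2 days for the newly pregnant group and about 26.9 days for the virgin group, so `closest` picks the newly pregnant group.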
fruitflysubset<-subset(fruitfly,type==5)
Fill in the table to compare the theoretical quantiles (calculated using the normal distribution) and observed quantiles (calculated using fruitflysubset) from the two groups. (Round all quantiles to one decimal place.)
# Supplied with 8 virgin females N(38.7,12.1)
# Theoretical quantiles
round(qnorm(p=c(0.10,0.25,0.50,0.75,0.90),mean=38.72,sd=12.10),digits=1)
## [1] 23.2 30.6 38.7 46.9 54.2
# Observed quantiles
round(quantile(x=fruitflysubset$lifespan,probs=c(0.10,0.25,0.50,0.75,0.90)),digits=1)
## 10% 25% 50% 75% 90%
## 21.8 32.0 40.0 47.0 54.0
hist(x=fruitfly$lifespan)
hist(x=fruitflysubset$lifespan)
Answer: The theoretical and observed quantiles are close together (they differ by at most 1.4 days). This sample is consistent with the assumption that lifespan in this group follows a normal distribution.
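The table asked for in this question can be assembled directly in R. A sketch, in which the observed quantiles are copied by hand from the `quantile()` output above (so the block runs without the dataset) and the column names are chosen for illustration:

```r
probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Theoretical quantiles under N(38.72, 12.10)
theoretical <- round(qnorm(probs, mean = 38.72, sd = 12.10), 1)

# Observed quantiles, copied from the quantile() output above
observed <- c(21.8, 32.0, 40.0, 47.0, 54.0)

quantile_table <- data.frame(
  prob        = probs,
  theoretical = theoretical,
  observed    = observed,
  difference  = round(observed - theoretical, 1)
)
quantile_table
```

The largest gap is 1.4 days, which is small next to a standard deviation of about 12 days. With the full data, `qqnorm(fruitflysubset$lifespan)` followed by `qqline(fruitflysubset$lifespan)` gives a graphical version of the same check.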
Let’s compare the lifespan distribution of the group supplied with 8 virgin females with that of the group supplied with 8 newly pregnant females, using the binomial distribution.
# Supplied with 8 virgin females N(38.7,12.1)
fruitflysubset<-subset(fruitfly,type==5)
# P(X>50)
round(1-pnorm(50, 38.7, 12.1), 4)
## [1] 0.1752
# 8 virgin females: Binomial(n = 10, p = 0.1752), probability that x out of 10 flies live longer than 50 days
round(data.frame(x=0:10,prob=dbinom(x=0:10,size=10,prob=0.1752)), 4)
##     x   prob
## 1   0 0.1457
## 2   1 0.3095
## 3   2 0.2958
## 4   3 0.1676
## 5   4 0.0623
## 6   5 0.0159
## 7   6 0.0028
## 8   7 0.0003
## 9   8 0.0000
## 10  9 0.0000
## 11 10 0.0000
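The same calculation can be repeated for the group supplied with 8 newly pregnant females so the two binomial distributions sit side by side. A sketch, assuming 10 independent flies per group and the normal models N(38.7, 12.1) and N(63.4, 14.5) used earlier; the column names are made up for this illustration:

```r
n <- 10  # number of flies, as in the dbinom() call above

# P(lifespan > 50) under each normal model, rounded to 4 decimals as above
p_virgin   <- round(1 - pnorm(50, mean = 38.7, sd = 12.1), 4)
p_pregnant <- round(1 - pnorm(50, mean = 63.4, sd = 14.5), 4)

# P(exactly x of the 10 flies live longer than 50 days), for each group
binom_table <- data.frame(
  x         = 0:n,
  virgin8   = round(dbinom(0:n, size = n, prob = p_virgin), 4),
  pregnant8 = round(dbinom(0:n, size = n, prob = p_pregnant), 4)
)
binom_table

# Expected number of flies (out of 10) living longer than 50 days
expected <- n * c(virgin8 = p_virgin, pregnant8 = p_pregnant)
round(expected, 2)
```

The `virgin8` column reproduces the table above. In the `pregnant8` column most of the probability sits at large values of x, because a fly in that group is far more likely to live past 50 days.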