# Save the dataset fruitfly and rmarkdown file in the same folder!!!
# Import dataset fruitfly
fruitfly <- read.csv("fruitfly.csv")
Compare the distribution of lifespan among the five experimental groups of fruitflies.
Hint: use the Console in RStudio to examine the dataset before attempting this exercise. For instance, type “names(fruitfly)” (without quotes) in the Console to see the variables in the dataset, and type “fruitfly” to see the entire dataset.
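A few other base R functions are handy for this kind of first look. The sketch below uses a small made-up data frame named `toy` (a hypothetical stand-in, since the real file is not reproduced here); the same calls should work on `fruitfly` once it is loaded.

```r
# Hypothetical stand-in for fruitfly.csv; the values below are made up.
toy <- data.frame(
  lifespan = c(40, 37, 44, 47, 21, 68),
  type     = c(1, 1, 2, 2, 5, 5)
)

names(toy)        # variable names
str(toy)          # type of each column and a preview of its values
head(toy, 3)      # first three rows only
table(toy$type)   # number of rows (flies) in each group
```

Printing only the first few rows with `head()` is usually easier to read than printing the whole dataset.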
# Plot the appropriate figure to visualize the association between one quantitative variable and one categorical variable.
boxplot(lifespan~type, data = fruitfly)
# Get the group means of one quantitative variable, grouped by one categorical variable
# tapply(quantitative, categorical, function)
tapply(fruitfly$lifespan, fruitfly$type, mean)
## 1 2 3 4 5
## 63.56 64.80 63.36 56.76 38.72
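The same `tapply` pattern can also give the standard deviation of each group by swapping `mean` for `sd`. A minimal sketch on a made-up data frame (`toy` is hypothetical and its values are not from the fruit fly data):

```r
# Hypothetical toy data; values are invented for illustration only.
toy <- data.frame(
  lifespan = c(40, 50, 60, 20, 30, 40),
  type     = c(1, 1, 1, 2, 2, 2)
)

# Mean and standard deviation of lifespan within each group
group_means <- tapply(toy$lifespan, toy$type, mean)
group_sds   <- tapply(toy$lifespan, toy$type, sd)

group_means
group_sds
```

On the real data, the corresponding call would presumably be `tapply(fruitfly$lifespan, fruitfly$type, sd)`.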
The group with the shortest average lifespan is group 5 (supplied with 8 virgin females), with a mean of 38.72 days and a standard deviation of 12.10 days.
38.7 12.1

### Question 2:
Let’s use the normal distribution to compare the lifespan distributions of the group supplied with 8 virgin females and the group supplied with 8 newly pregnant females.
# Supplied with 8 newly pregnant females N(63.4, 14.5)
# P(X<30)
round(pnorm(30, 63.4, 14.5), 4)
## [1] 0.0106
# P(30<X<50)
round(diff(pnorm(c(30,50), 63.4, 14.5)), 4)
## [1] 0.1671
# P(50<X<70)
round(diff(pnorm(c(50,70), 63.4, 14.5)), 4)
## [1] 0.4978
# P(X>70)
round(1-pnorm(70, 63.4, 14.5), 4)
## [1] 0.3245
# Supplied with 8 virgin females N(38.7, 12.1)
# P(X<30)
round(pnorm(30, 38.7, 12.1), 4)
## [1] 0.2361
# P(30<X<50)
round(diff(pnorm(c(30,50), 38.7, 12.1)), 4)
## [1] 0.5888
# P(50<X<70)
round(diff(pnorm(c(50,70), 38.7, 12.1)), 4)
## [1] 0.1703
# P(X>70)
round(1-pnorm(70, 38.7, 12.1), 4)
## [1] 0.0048
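The four probabilities for each group can also be computed in one step, which avoids retyping the mean and standard deviation in every call. The sketch below is one possible way to do it; `interval_probs` is a hypothetical helper name, and the cut points 30, 50 and 70 are taken from the calculations above.

```r
# Hypothetical helper (not part of the original analysis): probabilities of
# X < 30, 30 < X < 50, 50 < X < 70 and X > 70 under N(mean, sd).
interval_probs <- function(mean, sd, cuts = c(30, 50, 70)) {
  # pnorm(-Inf) is 0 and pnorm(Inf) is 1, so diff() yields every interval
  round(diff(pnorm(c(-Inf, cuts, Inf), mean = mean, sd = sd)), 4)
}

pregnant <- interval_probs(63.4, 14.5)  # supplied with 8 newly pregnant females
virgin   <- interval_probs(38.7, 12.1)  # supplied with 8 virgin females

result <- rbind(pregnant, virgin)
colnames(result) <- c("X<30", "30<X<50", "50<X<70", "X>70")
result
```

Because the intervals cover the whole real line, each row should sum to 1 up to rounding, which is a quick check against a mistyped parameter.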
65.6
The flies most likely came from the group ‘supplied with 8 newly pregnant females’, because the mean lifespan of the 5 escaped flies was 65.6 days. This is close to that group’s mean of 63.4 days. The group ‘supplied with 8 virgin females’ had a mean of approximately 38.7 days, which is far shorter than the average lifespan of the five escaped flies.
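One way to put a number on "closer" is to ask how many standard errors the escaped flies' mean lies from each group mean. The sketch below goes slightly beyond the original answer and assumes the 5 flies behave like an independent random sample from one of the two normal models above; `z_score` is a hypothetical helper name.

```r
# Assumptions: n = 5 independent flies, sample mean 65.6 (the value printed
# above), and the N(63.4, 14.5) and N(38.7, 12.1) models for the two groups.
n    <- 5
xbar <- 65.6

# Distance of the sample mean from a group mean, in standard errors
z_score <- function(xbar, mu, sigma, n) (xbar - mu) / (sigma / sqrt(n))

z_pregnant <- z_score(xbar, 63.4, 14.5, n)
z_virgin   <- z_score(xbar, 38.7, 12.1, n)

round(c(pregnant = z_pregnant, virgin = z_virgin), 2)
# roughly 0.34 and 4.97
```

Under these assumptions the sample mean is well within one standard error of the newly pregnant group's mean but nearly five standard errors from the virgin group's mean, which agrees with the conclusion above.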
fruitflysubset <- subset(fruitfly, type == 5)
norm(fruitflysubset)
# Theoretical
round(qnorm(p=c(.10,.25,.50,.75,.90), mean=38.72, sd=12.10207), 1)
# Observed
round(quantile(x=fruitflysubset$lifespan, probs=c(.10,.25,.50,.75,.90)), 1)
# 10th percentile (Theoretical, Observed): (23.2, 21.8)
# 25th percentile (Theoretical, Observed): (30.6, 32)
# 50th percentile (Theoretical, Observed): (38.7, 40)
# 75th percentile (Theoretical, Observed): (46.9, 47)
# 90th percentile (Theoretical, Observed): (54.2, 54)
The theoretical and observed percentiles are close together (they differ by at most about 1.4 days), which supports the assumption that lifespan in this group is approximately normally distributed.
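The percentile comparison can be laid out as a small table so the differences are easy to scan. The sketch below uses simulated values in place of the real group 5 lifespans; the sample size of 25 and the seed are assumptions made only for illustration, so its numbers will not match the ones reported above.

```r
# Simulated stand-in for the group 5 lifespans (NOT the real data);
# the sample size of 25 is an assumption for illustration.
set.seed(1)
simulated <- rnorm(25, mean = 38.72, sd = 12.10207)

probs <- c(.10, .25, .50, .75, .90)

# Theoretical percentiles from a normal model fitted to the sample,
# next to the percentiles observed in the sample itself
comparison <- data.frame(
  percentile  = probs * 100,
  theoretical = qnorm(probs, mean = mean(simulated), sd = sd(simulated)),
  observed    = unname(quantile(simulated, probs = probs))
)
comparison$difference <- comparison$observed - comparison$theoretical

round(comparison, 1)
```

On the real data, `simulated` would be replaced by `fruitflysubset$lifespan`. The base R functions `qqnorm()` and `qqline()` show the same comparison graphically across all observations.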