# Save the dataset fruitfly and rmarkdown file in the same folder!!!
# Import dataset fruitfly

fruitfly<-read.csv("fruitfly.csv")

Question 1:

Compare the distribution of lifespan among the five experimental groups of fruitflies.

  1. Produce an appropriate figure to compare the distribution of lifespan among the five experimental groups of fruitflies. What figure did you produce?

Hint: use the Console in RStudio to examine the dataset before attempting this exercise. For instance, type “names(fruitfly)” (without quotes) in the Console to see the variables in the dataset and type “fruitfly” to see the entire dataset.

# Plot the appropriate figure to visualize the association between one quantitative variable and one categorical variable.
boxplot(lifespan~type, data = fruitfly)

We produced side-by-side boxplots of lifespan, with one box for each experimental group (type).

  1. Identify the group with the shortest average lifespan and provide the mean and standard deviation of lifespan among this group.
# Get the group means of one quantitative variable categorized by one categorical variable

# tapply(quantitative, categorical, function)

tapply(fruitfly$lifespan, fruitfly$type, mean)
##     1     2     3     4     5 
## 63.56 64.80 63.36 56.76 38.72
tapply(fruitfly$lifespan, fruitfly$type, sd)
##        1        2        3        4        5 
## 16.45215 15.65248 14.53983 14.92838 12.10207

The group supplied with 8 virgin females (type = 5) has the shortest average lifespan: mean 38.72 days, standard deviation 12.10 days.
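
To report the mean and standard deviation together and pick out the lowest-mean group programmatically, something like the following could be used. This is a minimal sketch on a made-up toy data frame (`toy` and its values are hypothetical, not the fruitfly data) so that it runs without fruitfly.csv; with the real data, `toy` would be replaced by `fruitfly`.

```r
# Toy data (hypothetical values, NOT the fruitfly data): two groups of three
toy <- data.frame(
  lifespan = c(10, 20, 30, 40, 50, 60),
  type     = c(1, 1, 1, 2, 2, 2)
)

# Group means and standard deviations, as in the tapply() calls above
group_mean <- tapply(toy$lifespan, toy$type, mean)
group_sd   <- tapply(toy$lifespan, toy$type, sd)

# Label of the group with the smallest mean
shortest <- names(which.min(group_mean))

# Mean and SD for that group only
c(mean = group_mean[[shortest]], sd = group_sd[[shortest]])
```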

Question 2:

Let’s compare the lifespan distribution between the group supplied with 8 virgin females and the group supplied with 8 newly pregnant females with the normal distribution.

  1. Using the normal distribution, fill in the table below to calculate the probability of surviving within the given range of days. Some answers have been filled in for you. (Round all probabilities to 4 decimals.)
# Supplied with 8 newly pregnant females N(63.4, 14.5)
# P(X<30)
round(pnorm(30, 63.4, 14.5), 4)
## [1] 0.0106
# P(30<X<50)
round(diff(pnorm(c(30,50), 63.4, 14.5)), 4)
## [1] 0.1671
# P(50<X<70)
round(diff(pnorm(c(50,70), 63.4, 14.5)), 4)
## [1] 0.4978
# P(X>70)
round(1-pnorm(70, 63.4, 14.5), 4)
## [1] 0.3245
# Supplied with 8 virgin females N(38.7,12.1)
# P(X<30)
round(pnorm(30, 38.7, 12.1), 4)
## [1] 0.2361
# P(30<X<50)
round(diff(pnorm(c(30,50), 38.7, 12.1)), 4)
## [1] 0.5888
# P(50<X<70)
round(diff(pnorm(c(50,70), 38.7, 12.1)), 4)
## [1] 0.1703
# P(X>70)
round(1-pnorm(70, 38.7, 12.1), 4)
## [1] 0.0048
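
Because the four ranges cover every possible lifespan, each column of the table should sum to 1, which gives a quick check on the arithmetic. The sketch below builds both columns at once from the cut points; the helper name `range_probs` is introduced here for illustration only.

```r
# Hypothetical helper: probabilities of X<30, 30<X<50, 50<X<70, X>70
# under a normal distribution with the given mean and standard deviation
range_probs <- function(mean, sd, cuts = c(30, 50, 70)) {
  # Adding -Inf and Inf turns the cut points into four adjacent intervals
  diff(pnorm(c(-Inf, cuts, Inf), mean, sd))
}

prob_table <- data.frame(
  range    = c("X<30", "30<X<50", "50<X<70", "X>70"),
  pregnant = round(range_probs(63.4, 14.5), 4),  # N(63.4, 14.5)
  virgin   = round(range_probs(38.7, 12.1), 4)   # N(38.7, 12.1)
)
prob_table

# Each column should sum to 1, up to rounding
colSums(prob_table[, c("pregnant", "virgin")])
```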
  1. Suppose five fruitflies escape from their experimental conditions in a different lab, but they were noted to survive 81, 65, 56, 70, and 56 days. Do you think they came from the ‘supplied with 8 newly pregnant females’ group or the ‘supplied with 8 virgin females’ group, and why?
tapply(fruitfly$lifespan, fruitfly$type, mean)
##     1     2     3     4     5 
## 63.56 64.80 63.36 56.76 38.72
mean(c(81, 65, 56, 70, 56))
## [1] 65.6

Mean lifespan, supplied with 8 newly pregnant females: 63.36 days. Mean lifespan, supplied with 8 virgin females: 38.72 days. Mean lifespan of the five escaped fruitflies: 65.6 days.

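Answer: The escaped fruitflies most likely came from the group supplied with 8 newly pregnant females, because their mean lifespan (65.6 days) is much closer to that group's mean (63.36 days) than to the mean of the group supplied with 8 virgin females (38.72 days). The table above points the same way: under N(38.7, 12.1) a lifespan above 70 days has probability only 0.0048, yet two of the five escaped fruitflies lived at least 70 days.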
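
Comparing means alone ignores the spread within each group. One way to take the spread into account, offered here as a sketch rather than as the intended solution, is to compare how likely the five observed lifespans are under each of the two normal models used in the table above. It uses the log of the normal density (`dnorm` with `log = TRUE`) and assumes the five fruitflies are independent.

```r
escaped <- c(81, 65, 56, 70, 56)

# Log-likelihood of the five lifespans under each normal model,
# assuming independent observations
loglik_pregnant <- sum(dnorm(escaped, mean = 63.4, sd = 14.5, log = TRUE))
loglik_virgin   <- sum(dnorm(escaped, mean = 38.7, sd = 12.1, log = TRUE))

# The model with the larger (less negative) log-likelihood fits better
c(pregnant = loglik_pregnant, virgin = loglik_virgin)
loglik_pregnant > loglik_virgin
```

A larger log-likelihood for the newly pregnant model agrees with the conclusion drawn from the means.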

  1. Submit the following code to create a data set that only contains the group of fruitflies with the shortest average lifespan. Be sure to enter the number corresponding to the type of fruitflies you identified in (1b) after the double equal sign.
fruitflysubset<-subset(fruitfly,type==5)

Fill in the table to compare the theoretical quantiles (calculated using the normal distribution) and observed quantiles (calculated using fruitflysubset) from the two groups. (Round all quantiles to one decimal place.)

# Supplied with 8 virgin females N(38.7,12.1)
# Theoretical quantiles
round(qnorm(p=c(0.10,0.25,0.50,0.75,0.90),mean=38.72,sd=12.10),digits=1)
## [1] 23.2 30.6 38.7 46.9 54.2
# Observed quantiles
round(quantile(x=fruitflysubset$lifespan,probs=c(0.10,0.25,0.50,0.75,0.90)),digits=1)
##  10%  25%  50%  75%  90% 
## 21.8 32.0 40.0 47.0 54.0
  1. Are the theoretical and observed quantiles close together or far apart? Does that validate or invalidate the assumption that lifespan follows a normal distribution?
hist(x=fruitfly$lifespan)

hist(x=fruitflysubset$lifespan)

Answer: The theoretical and observed quantiles are close together: they differ by less than 1.5 days at every probability level. This supports the assumption that lifespan in this group follows a normal distribution, at least approximately.
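
To put a number on "close together", the gap between each theoretical and observed quantile can be computed directly. This sketch hard-codes the observed quantiles printed above so that it runs without fruitfly.csv; the variable names are illustrative.

```r
probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Theoretical quantiles under N(38.72, 12.10)
theoretical <- qnorm(probs, mean = 38.72, sd = 12.10)

# Observed quantiles copied from the quantile() output above
observed <- c(21.8, 32.0, 40.0, 47.0, 54.0)

# Difference at each probability level, and the largest absolute gap in days
gap <- round(observed - theoretical, 1)
data.frame(probs, theoretical = round(theoretical, 1), observed, gap)
max(abs(observed - theoretical))
```

With the data loaded, `qqnorm(fruitflysubset$lifespan)` followed by `qqline(fruitflysubset$lifespan)` would show the same comparison graphically.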

Optional Extra Credit

Let’s compare the lifespan distribution between the group supplied with 8 virgin females and the group supplied with 8 newly pregnant females with the binomial distribution.

  1. Calculate the proportion of fruitflies that survived at least 50 days among those that were supplied with 8 virgin females using the fruitflysubset data set. We will use this as our estimate of the probability of surviving at least 50 days.
# Supplied with 8 virgin females N(38.7,12.1)
fruitflysubset<-subset(fruitfly,type==5)

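# P(X>50) under the normal model N(38.7, 12.1); this is a model-based estimate, not the observed proportion in fruitflysubset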
round(1-pnorm(50, 38.7, 12.1), 4)
## [1] 0.1752
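
The question asks for the observed proportion in fruitflysubset, whereas the code above estimates the probability from the normal model. A sketch of the observed-proportion calculation follows. The lifespans here are made-up values so that the block runs on its own; with the real data the vector would be `fruitflysubset$lifespan`, and no result for the real data is claimed here.

```r
# Hypothetical lifespans in days (NOT the fruitfly data)
lifespans <- c(21, 35, 40, 47, 52, 60, 33, 44, 38, 29)

# The comparison gives TRUE where a fly survived at least 50 days;
# the mean of a logical vector is the proportion of TRUE values
prop_50 <- mean(lifespans >= 50)
round(prop_50, 4)
```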
  1. Consider a new set of 10 fruitflies in each of the groups supplied with 8 virgin females and the group supplied with 8 pregnant females. Use the binomial distribution to calculate probabilities associated with the number of fruitflies out of 10 that survive at least 50 days, and fill in the table below. (Use the probability that you calculated in 3a. Round all probabilities to four decimal places.)
# 8 virgin females N(38.7,12.1) probability
round(data.frame(x=0:10,prob=dbinom(x=0:10,size=10,prob=0.1752)), 4)
##     x   prob
## 1   0 0.1457
## 2   1 0.3095
## 3   2 0.2958
## 4   3 0.1676
## 5   4 0.0623
## 6   5 0.0159
## 7   6 0.0028
## 8   7 0.0003
## 9   8 0.0000
## 10  9 0.0000
## 11 10 0.0000
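
Two quick checks on the table: the eleven probabilities should sum to 1, and cumulative questions such as "at least one fruitfly survives 50 days" follow from the same distribution. The sketch below uses the 0.1752 estimate from above; the variable names are illustrative.

```r
p_virgin <- 0.1752   # estimate used in the table above
n_flies  <- 10

probs <- dbinom(0:n_flies, size = n_flies, prob = p_virgin)

# The probabilities over 0..10 survivors should sum to 1
sum(probs)

# P(at least one of the 10 survives 50 days) = 1 - P(none survive)
round(1 - dbinom(0, size = n_flies, prob = p_virgin), 4)

# P(at most 2 survive), using the cumulative distribution function
round(pbinom(2, size = n_flies, prob = p_virgin), 4)

# Expected number of survivors out of 10: n * p
n_flies * p_virgin
```

The same calls with a different `prob` would give the corresponding column for the other group.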