# Save the dataset fruitfly and rmarkdown file in the same folder!!!
# Import dataset fruitfly

fruitfly <- read.csv("fruitfly.csv")

Question 1:

Compare the distribution of lifespan among the five experimental groups of fruitflies.

  1. Produce an appropriate figure to compare the distribution of lifespan among the five experimental groups of fruitflies. What figure did you produce?

Hint: Use the Console in RStudio to examine the dataset before attempting this exercise. For instance, type “names(fruitfly)” (without quotes) in the Console to see the variables in the dataset, and type “fruitfly” to see the entire dataset.
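The inspection step in the hint can be sketched as follows. This is a minimal illustration on a made-up stand-in data frame called `toy` (a hypothetical name with invented values), assuming only the column names `lifespan` and `type` used elsewhere in this write-up; with the real data, the same calls would be applied to `fruitfly`.

```r
# Toy stand-in for the real data (values are made up for illustration only).
toy <- data.frame(
  lifespan = c(40, 37, 44, 65, 60, 70),
  type     = c(5, 5, 5, 3, 3, 3)
)

names(toy)        # variable names
str(toy)          # variable types and a preview of the values
head(toy, 3)      # first few rows
table(toy$type)   # number of flies in each group
```

Checking `table()` on the grouping variable is a quick way to confirm how many groups there are before choosing a figure.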

# Plot the appropriate figure to visualize the association between one quantitative variable and one categorical variable.
boxplot(lifespan~type, data = fruitfly)

  1. Identify the group with the shortest average lifespan and provide the mean and standard deviation of lifespan among this group.
# Get the group means of one quantitative variable categorized by one categorical variable

# tapply(quantitative, categorical, function)

tapply(fruitfly$lifespan, fruitfly$type, mean)
##     1     2     3     4     5 
## 63.56 64.80 63.36 56.76 38.72

Find the mean and standard deviation for group 5: the 8 virgin females
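The same `tapply()` pattern used for the group means also gives the group standard deviations. The sketch below runs on a made-up stand-in data frame (`toy`, a hypothetical name with invented values), so its numbers are not the assignment's results; with the real data, `fruitfly$lifespan` and `fruitfly$type` would take the place of the toy columns.

```r
# Toy stand-in for the real data (values are made up for illustration only).
toy <- data.frame(
  lifespan = c(40, 37, 44, 65, 60, 70),
  type     = c(5, 5, 5, 3, 3, 3)
)

# Mean and standard deviation of lifespan within each group
group_means <- tapply(toy$lifespan, toy$type, mean)
group_sds   <- tapply(toy$lifespan, toy$type, sd)

# Label of the group with the smallest mean, then its mean and SD
shortest <- names(which.min(group_means))
c(mean = group_means[[shortest]], sd = group_sds[[shortest]])
```

Looking up the group by name rather than by position avoids picking the wrong row if the group labels are not 1 through 5.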

Group 5 (supplied with 8 virgin females) has the shortest average lifespan, with a mean of 38.72 days and a standard deviation of 12.1020659 days. Rounded to one decimal, these are 38.7 and 12.1.

Question 2:

Let’s compare the lifespan distribution between the group supplied with 8 virgin females and the group supplied with 8 newly pregnant females with the normal distribution.

  1. Using the normal distribution, fill in the table below to calculate the probability of surviving within the given range of days. Some answers have been filled in for you. (Round all probabilities to 4 decimals.)
# Supplied with 8 newly pregnant females N(63.4, 14.5)
# P(X<30)
round(pnorm(30, 63.4, 14.5), 4)
## [1] 0.0106
# P(30<X<50)
round(diff(pnorm(c(30,50), 63.4, 14.5)), 4)
## [1] 0.1671
# P(50<X<70)
round(diff(pnorm(c(50,70), 63.4, 14.5)), 4)
## [1] 0.4978
# P(X>70)
round(1-pnorm(70, 63.4, 14.5), 4)
## [1] 0.3245
# Supplied with 8 virgin females N(38.7, 12.1)
# P(X<30)
round(pnorm(30, 38.7, 12.1), 4)
## [1] 0.2361
# P(30<X<50)
round(diff(pnorm(c(30,50), 38.7, 12.1)), 4)
## [1] 0.5888
# P(50<X<70)
round(diff(pnorm(c(50,70), 38.7, 12.1)), 4)
## [1] 0.1703
# P(X>70)
round(1-pnorm(70, 38.7, 12.1), 4)
## [1] 0.0048
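The range probabilities can also be computed in one call, which makes it easy to check that the four ranges add up to 1. The helper `range_probs()` below is a hypothetical convenience function, not part of the assignment; it is shown with the N(63.4, 14.5) model for the group supplied with 8 newly pregnant females, and should reproduce the four values listed for that group.

```r
# Hypothetical helper (not part of the original assignment): probability that
# a N(mean, sd) variable falls in each range defined by the cut points.
range_probs <- function(cuts, mean, sd) {
  # Pad with -Inf and Inf so the two outer tails are included.
  cdf <- pnorm(c(-Inf, cuts, Inf), mean = mean, sd = sd)
  diff(cdf)
}

# P(X<30), P(30<X<50), P(50<X<70), P(X>70) for N(63.4, 14.5)
pregnant <- range_probs(c(30, 50, 70), mean = 63.4, sd = 14.5)
round(pregnant, 4)

# The four ranges cover the whole number line, so this should be 1.
sum(pregnant)
```

Calling the helper with a different mean and standard deviation gives the corresponding row of the table for another group.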
  1. Suppose five fruitflies escape from their experimental conditions in a different lab, and they were noted to survive 81, 65, 56, 70, and 56 days. Do you think they came from the ‘supplied with 8 newly pregnant females’ group or the ‘supplied with 8 virgin females’ group, and why?

65.6

The flies most likely came from the ‘supplied with 8 newly pregnant females’ group, because the mean lifespan of the five escaped flies was 65.6 days. This is close to the mean of 63.4 days for the group ‘supplied with 8 newly pregnant females’. The group ‘supplied with 8 virgin females’ had a mean of approximately 38.7 days, which is far shorter than the average lifespan of the five escaped flies.
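The comparison behind this answer can be sketched in a few lines. The variable names (`escaped`, `groups`, `closest`) are made up for illustration; the lifespans and the two normal models come from the question and from Question 2.

```r
# Lifespans (days) of the five escaped flies, from the question
escaped <- c(81, 65, 56, 70, 56)
escaped_mean <- mean(escaped)

# Group models from Question 2: N(mean, sd)
groups <- data.frame(
  group = c("8 newly pregnant females", "8 virgin females"),
  mean  = c(63.4, 38.7),
  sd    = c(14.5, 12.1),
  stringsAsFactors = FALSE
)

# Distance of the escaped flies' mean from each group mean,
# in days and in units of that group's standard deviation
groups$distance <- abs(escaped_mean - groups$mean)
groups$z <- (escaped_mean - groups$mean) / groups$sd
groups

# Group whose mean is nearest to the escaped flies' mean
closest <- groups$group[which.min(groups$distance)]
closest
```

Comparing means is the simple approach used in the answer; it does not account for the sample size of five.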

  1. Submit the following code to create a data set that only contains the group of fruitflies with the shortest average lifespan. Be sure to enter the number corresponding to the type of fruitflies you identified in (1b) after the double equal sign.

fruitflysubset<-subset(fruitfly,type==5)
norm(fruitflysubset)

N(38.7, 12.1)

#Theoretical
round(qnorm(p=c(.10,.25,.50,.75,.90), mean=38.72, sd=12.10207), 1)
#Observed
round(quantile(x=fruitflysubset$lifespan, probs=c(.10,.25,.50,.75,.90)), 1)
#10th percentile (Theoretical, Observed) (23.2, 21.8)
#25th percentile (Theoretical, Observed) (30.6, 32)
#50th percentile (Theoretical, Observed) (38.7, 40)
#75th percentile (Theoretical, Observed) (46.9, 47)
#90th percentile (Theoretical, Observed) (54.2, 54)

  1. Are the theoretical and observed quantiles close together or far apart? Does that validate or invalidate the assumption that lifespan follows a normal distribution?

The theoretical and observed quantiles are close together (within about 1.4 days at every percentile compared), which supports the assumption that lifespan in this group follows a normal distribution.
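To make "close together" concrete, the gap at each percentile can be tabulated. In this sketch the theoretical quantiles are recomputed with `qnorm()`, while the observed quantiles are copied from the results reported above rather than recomputed from the data; the names `probs`, `observed`, and `gap` are made up for illustration.

```r
probs <- c(.10, .25, .50, .75, .90)

# Theoretical quantiles under N(38.72, 12.10207)
theoretical <- round(qnorm(probs, mean = 38.72, sd = 12.10207), 1)

# Observed quantiles, copied from the results reported above
observed <- c(21.8, 32, 40, 47, 54)

# Absolute gap (in days) between theoretical and observed at each percentile
gap <- abs(theoretical - observed)
data.frame(percentile = probs * 100, theoretical, observed, gap)

# Largest disagreement across the five percentiles
max(gap)
```

A normal quantile plot of the group's lifespans would be the usual graphical version of the same check.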