
dataset <- read.csv("/Users/sean/Desktop/R Studio/BIOS 338/Data/Data_Package_Assignment/Lifespan_18°Cand21°C_ywRflies.csv")
View(dataset)

1. Data package

Simons, Mirre (2024). Data From: Dietary restriction extends lifespan across different temperatures in the fly [Dataset]. Dryad. https://doi.org/10.5061/dryad.fttdz091j

2. Overview (5 points)

This data package explores how dietary restriction affects the lifespan of Drosophila melanogaster at different temperatures.

I chose this study because of my interest in aging and longevity research with the model organism Drosophila melanogaster. I am particularly interested in the mechanisms of aging and in how environmental factors, such as diet and temperature, can influence lifespan.

The purpose of the study was to investigate whether dietary restriction extends the lifespan of Drosophila melanogaster across different temperatures. Previous studies suggested that the lifespan extension effect of dietary restriction might not hold at lower temperatures in flies. This study aimed to test the robustness of the dietary restriction longevity response under different temperatures.

Since the data package was published on March 28, 2024, the data were likely collected prior to this date. The experiments were conducted in controlled laboratory settings, where flies were reared under specific conditions.

Drosophila Photo Credit: Wikimedia Commons, User Name: Fir0002

3. Histogram and descriptive statistics (5 points)

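mean_value <- mean(dataset$Age, na.rm = TRUE)
sd_value <- sd(dataset$Age, na.rm = TRUE) # standard deviation
n_value <- sum(!is.na(dataset$Age)) # number of non-missing ages, to match na.rm = TRUE above
se_value <- sd_value / sqrt(n_value) # standard error of the mean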

hist(dataset$Age,
     main = "Histogram of Age",
     xlab = "Age (days)",
     breaks = seq(min(dataset$Age, na.rm = TRUE), max(dataset$Age, na.rm = TRUE), 1), # one-day bins; assumes whole-day ages
     col = "skyblue",
     border = "black",
     lwd = 2)

abline(v=mean_value, col = "red", lwd = 3) # Mean lifespan value in Histogram
## Lines for standard deviation (Below & Above the mean)
abline(v= mean_value + sd_value, col = "blue", lwd = 2, lty = 2) 
abline(v= mean_value - sd_value, col = "blue", lwd = 2, lty = 2)

## Lines for standard error (Below & Above the mean)
abline(v= mean_value + se_value, col = "green", lwd = 2, lty = 3)
abline(v= mean_value - se_value, col = "green", lwd = 2, lty = 3)

The distribution of the data appears approximately symmetrical, without extreme skew to the left or right. The red line at the center of the histogram marks the mean of the Age variable. The blue dashed lines are +/- 1 standard deviation from the mean, and most of the data fall within this range. The green dotted lines are +/- 1 standard error from the mean, which is a much tighter interval. There do not appear to be any major outliers in this dataset. The tails of the distribution taper off gradually toward roughly 0 and 120 days.
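The standard error shown by the green lines is the standard deviation divided by the square root of the number of observations. As a minimal sketch of that arithmetic, the example below uses made-up ages (not values from the Dryad dataset) and counts only non-missing observations, which matters when the mean and standard deviation are computed with na.rm = TRUE. The names age, n_obs, and se_naive are illustrative only.

```r
# Toy ages in days (made-up values, NOT from the Dryad dataset)
age <- c(10, 20, 30, NA, 40)

n_obs    <- sum(!is.na(age))         # number of non-missing observations
mean_age <- mean(age, na.rm = TRUE)  # mean of the non-missing ages
sd_age   <- sd(age, na.rm = TRUE)    # sample standard deviation
se_age   <- sd_age / sqrt(n_obs)     # standard error of the mean

# Dividing by sqrt(length(age)) would also count the NA row,
# which makes the standard error look smaller than it is.
se_naive <- sd_age / sqrt(length(age))

print(c(mean = mean_age, sd = sd_age, se = se_age, se_naive = se_naive))
```

If a column has no missing values, the two versions of the standard error are identical.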

4. Y~X plot (5 points)

plot(dataset$Age, dataset$At.Risk,
     main = "Age vs At Risk",
     xlab = "Age (Days)",
     ylab = "At Risk",
     col = "blue")

# lm() fits a linear regression model; abline() draws the fitted line on the plot
abline(lm(At.Risk ~ Age, data = dataset), col = "red")
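To read the fitted line as numbers rather than only as a line on the plot, the model object returned by lm() can be stored and inspected. The sketch below uses a small made-up data frame (toy, not the Dryad data) with the same column names as the plot above, so the values it prints say nothing about the real flies.

```r
# Made-up life-table rows (NOT from the Dryad dataset)
toy <- data.frame(Age     = c(10, 20, 30, 40, 50),
                  At.Risk = c(100, 90, 70, 40, 10))

# Fit the same kind of model as in the plot: At.Risk as a linear function of Age
fit <- lm(At.Risk ~ Age, data = toy)

slope     <- coef(fit)[["Age"]]          # change in At.Risk per day of Age
intercept <- coef(fit)[["(Intercept)"]]  # fitted At.Risk at Age = 0
r_squared <- summary(fit)$r.squared      # share of variance explained by the line

print(c(intercept = intercept, slope = slope, r_squared = r_squared))
```

A negative slope here simply reflects that fewer flies remain at risk as age increases. A straight line is only a rough summary of a survival curve, which is usually not linear.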

5. Assessment (5 points)

I’d say the quality of the data package is “good.”

The dataset is well-documented with clear labels for each variable, like Age, At Risk, Dead, etc. It includes all the essential variables for lifespan studies, and its structure is straightforward. However, the data package could benefit from more details about the exact experimental setup.

The dataset is easy to follow because the variable names are intuitive and the content matches the research focus. For instance, the age of the flies, the number of flies at risk, and the treatment groups are all clearly labeled. Still, more information about the Treatment variable would improve understanding. For example, explaining exactly what “21C2” or “18C1” means in terms of temperature and dietary conditions would make it easier for someone new to interpret the results.
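One quick way to see which treatment codes are present is to tabulate the Treatment column. The sketch below uses a made-up data frame: "18C1" and "21C2" are the codes mentioned above, while "18C2" and "21C1" are hypothetical. Reading the first two characters as the temperature in °C is my assumption, not something documented in the data package, and the meaning of the trailing digit is left uninterpreted.

```r
# Made-up rows (NOT from the Dryad dataset); "18C2" and "21C1" are hypothetical codes
toy <- data.frame(Treatment = c("18C1", "18C1", "18C2", "21C1", "21C2", "21C2"))

# Count how many rows fall under each treatment code
counts <- table(toy$Treatment)
print(counts)

# ASSUMPTION: the first two characters encode temperature in degrees Celsius
toy$TempGuess <- as.numeric(substr(toy$Treatment, 1, 2))
print(unique(toy$TempGuess))
```

On the real data, the same table() call on dataset$Treatment would list every code, which could then be checked against the README in the data package.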

Overall, the dataset seems tidy and ready for analysis, which is great for reproducibility. It doesn’t need a lot of data cleaning.