Overview

This project is in two parts. Part one uses random number generation to show that the distribution of means of samples drawn from an exponential distribution closely follows a normal distribution when a large number of samples is simulated. In part two we compare the effect of supplement type and dosage size of Vitamin C on odontoblast length in guinea pig teeth.

Part I: Simulation of the Exponential Density Function

Synopsis

The Central Limit Theorem tells us to expect that, over repeated trials, the distribution of sample means drawn from the exponential distribution will resemble the normal distribution. It is reasonable to expect some level of symmetry in the data, but when the simulated distribution is overlaid with the normal curve we see that it is not exactly normally distributed. However, treating the simulated means as normally distributed, a 95% confidence interval does not provide sufficient evidence to reject the hypothesis that they are centered around a mean of 5.

Experiment

To generate the random data we set the seed to 42 and used the rexp function to populate a 1000 by 40 matrix. From this matrix we created a vector holding the mean of each of the 1000 rows, so each mean summarizes 40 exponential draws. Using the dplyr package, the mean of the 1000 calculated means was found to be 4.9865083, which is very close to the expected value of 5.
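The orientation of the simulation matrix determines how many means are produced, so it is worth checking directly. The following is a minimal sketch using the same seed and parameters as the appendix; the names sim and sim_means are illustrative and do not appear in the original code.

```r
# Sketch only: sim and sim_means are illustrative names, not from the appendix.
set.seed(42)
n <- 40          # exponential draws averaged per simulation
nosim <- 1000    # number of simulations
lambda <- 0.2

# rexp() fills the matrix column by column; nrow = nosim gives 1000 rows
sim <- matrix(rexp(n * nosim, rate = lambda), nrow = nosim)
print(dim(sim))                 # rows = simulations, columns = draws

# MARGIN = 1 applies mean() to each row, giving one mean per simulation
sim_means <- apply(sim, MARGIN = 1, FUN = mean)
print(length(sim_means))
print(mean(sim_means))          # should land near 1/lambda = 5
```

Using MARGIN = 2 instead would average each of the 40 columns over 1000 draws, which is a different experiment from the one analyzed here.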

Expected Values

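The exponential distribution is known to have a mean and standard deviation of 1/lambda. We let lambda equal 0.2, so the mean and standard deviation are both equal to 5.
The expected value of the variance is 1/lambda^2 = 25. The simulated value is 24.865, which is again close to the expected value. Note that the appendix code obtains this figure by squaring the simulated mean (4.9865083^2), which relies on the mean and standard deviation being equal for the exponential distribution; it is not a sample variance. The variance of the 1000 sample means themselves is expected to be (1/lambda)^2/40 = 0.625.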
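For reference, the theoretical quantities for a single exponential draw X and for the mean of n = 40 independent draws can be written out as follows. This is a standard derivation stated under the assumption of independent draws with rate lambda = 0.2; the notation \bar{X}_{40} is introduced here for illustration.

```latex
\begin{align*}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X) &= \frac{1}{\lambda^{2}} = 25, \\
E[\bar{X}_{40}] &= \frac{1}{\lambda} = 5, &
\operatorname{Var}(\bar{X}_{40}) &= \frac{1}{\lambda^{2} n} = \frac{25}{40} = 0.625, \\
 & &
\operatorname{sd}(\bar{X}_{40}) &= \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791.
\end{align*}
```

The last line is the standard deviation used for the normal curve drawn over the histogram in the appendix code.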

Plots

The histogram displays the 1000 simulated means overlaid with the normal density function with mean 5 and standard deviation 5/sqrt(40). The other graph is the QQ-Plot of the simulated means; the linearity of the chart strengthens the argument for normality.

To check that the simulated means are centered on the expected value, we use the Student's t distribution to generate a 95% confidence interval for their mean: (4.935, 5.038). Since the interval contains the expected mean of 5, there is not sufficient evidence to suggest that the simulated data has a mean different from 5.
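The interval reported by t.test can be reproduced by hand from the sample mean, the standard error, and a t critical value. The sketch below assumes the same seed and parameters as the appendix; t_interval is a hypothetical helper written for illustration and is not part of the original code.

```r
# Hypothetical helper (not in the original appendix): two-sided t confidence
# interval for the mean of a numeric vector.
t_interval <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  crit <- qt(1 - (1 - level) / 2, df = n - 1)    # t critical value
  mean(x) + c(-1, 1) * crit * se
}

# Same setup as the report: 1000 means of 40 exponentials with lambda = 0.2
set.seed(42)
sim_means <- rowMeans(matrix(rexp(40 * 1000, rate = 0.2), nrow = 1000))

manual_ci <- t_interval(sim_means)
builtin_ci <- as.numeric(t.test(sim_means)$conf.int)

print(round(manual_ci, 3))
print(all.equal(manual_ci, builtin_ci))   # the two computations should agree
```

With 999 degrees of freedom the t critical value is very close to the normal value of 1.96, so a normal-based interval would be nearly identical.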

Part II: Tooth Growth Analysis

Synopsis

We first compare tooth length of guinea pigs by the type of supplement given: Ascorbic Acid and Orange Juice. The null hypothesis is that there is no difference in mean tooth length between the two supplements. Second, we compare the two supplements within each dosage size: 0.5, 1.0, and 2.0 mg/day. Here too the null hypothesis is that there is no difference in mean tooth length.

Summary of data

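We assume the tooth lengths are approximately normally distributed, so that t intervals are appropriate for the sample sizes available (30 animals per supplement, 10 per combination of supplement and dosage). Below are the minimum, Q1, median, mean, Q3, and maximum for each supplement: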

Summary of tooth length by supplement:

Supplement      Min    Q1     Median  Mean   Q3     Max
Orange Juice    8.2    15.52  22.7    20.66  25.73  30.9
Ascorbic Acid   4.2    11.2   16.5    16.96  23.1   33.9

Plots

Confidence Intervals

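Each confidence interval below was computed at the 95% confidence level with R's t.test, which by default performs a Welch two-sample test (unequal variances). The sign of each interval depends on the order in which the two groups were passed to t.test in the appendix code.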

Supplement interval (Ascorbic Acid minus Orange Juice): -7.571, 0.171
Since the interval contains zero, this result suggests that there is not a significant difference in tooth growth between the supplements when all dosage sizes are pooled.

0.5 mg/day interval (Orange Juice minus Ascorbic Acid): 1.719, 8.781
Since both values are positive (the interval does not contain zero), there is significant evidence that tooth growth was greater with Orange Juice than with Ascorbic Acid at 0.5 mg/day.

1.0 mg/day interval (Ascorbic Acid minus Orange Juice): -9.058, -2.802
Here both values are negative, which suggests that tooth growth was significantly greater with Orange Juice than with Ascorbic Acid at the 1.0 mg/day rate.

2.0 mg/day interval (Ascorbic Acid minus Orange Juice): -3.638, 3.798
This interval contains zero. Therefore there is not sufficient evidence to suggest a difference in tooth growth between Ascorbic Acid and Orange Juice at the 2.0 mg/day rate.
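Because t.test reports an interval for the mean of its first argument minus the mean of its second, swapping the arguments flips the sign of the interval. The sketch below illustrates this on the 0.5 mg/day subset using the built-in ToothGrowth data with its original "OJ" and "VC" labels; the variable names are illustrative and not taken from the appendix.

```r
library(datasets)  # provides ToothGrowth

# Tooth lengths at the 0.5 mg/day dose, split by supplement
half <- subset(ToothGrowth, dose == 0.5)
oj <- half$len[half$supp == "OJ"]   # Orange Juice
vc <- half$len[half$supp == "VC"]   # Ascorbic Acid

# The interval is for mean(first argument) - mean(second argument)
ci_oj_minus_vc <- as.numeric(t.test(oj, vc)$conf.int)
ci_vc_minus_oj <- as.numeric(t.test(vc, oj)$conf.int)

print(round(ci_oj_minus_vc, 3))
print(round(ci_vc_minus_oj, 3))
```

An entirely positive Orange Juice minus Ascorbic Acid interval and an entirely negative Ascorbic Acid minus Orange Juice interval carry the same conclusion, so stating the direction of the difference next to each interval avoids misreading it.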

Conclusions

When simulating data to test whether means of exponential samples are normally distributed, we found that the 1000 simulated means do approximate a normal distribution. The simulation mean was not significantly different from the expected value of 5. Also, the QQ-Plot shows a reasonable approximation to a linear function.

In the analysis of the Tooth Growth data in guinea pigs we found that there was not a significant difference in tooth growth when comparing the way the supplement was delivered across all dosages. However, when we looked at each dosage size separately, we determined that Orange Juice produced greater tooth growth than Ascorbic Acid at 0.5 mg/day. At the 1.0 mg/day dosage the conclusion was again that Orange Juice provided greater tooth growth. Finally, at the 2.0 mg/day dosage we determined there was no significant difference between Orange Juice and Ascorbic Acid.

Appendix: R code

library(dplyr)
library(datasets)
library(rmarkdown)
library(tinytex)
library(ggplot2)
n <- 40
lambda <- 0.2
nosim <- 1000
set.seed(42)

simulation_data <- matrix(data = rexp(n*nosim,lambda),nrow = nosim)

simulation_means <- data.frame(means = apply(simulation_data,
                                             MARGIN = 1,FUN = mean))
df <- cbind(simulation_data,simulation_means)

mean_simulation <- simulation_means %>% 
        summarise(sim_mean = mean(means)) %>% 
        unlist()

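# Population variance of the exponential distribution: (1/lambda)^2 = 25
expected_variance <- round((1/lambda)^2,3)
# Square of the simulated mean (mean = sd for the exponential); this is the
# 24.865 figure quoted in the text and is not a sample variance
variance_simulation <- round(mean_simulation^2,3)
# Variance of the 1000 sample means, expected to be (1/lambda)^2/n = 0.625
variance_of_means <- round(var(simulation_means$means),3)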


# 95% t confidence interval for the mean of the 1000 simulated means
my_CI <- round(t.test(simulation_means$means)$conf.int,3)
simulation_means %>% 
        ggplot(aes(means)) +
        geom_histogram(aes(y = after_stat(density)),binwidth = 0.25, alpha = 0.8,
                       fill = "light blue", color = "black") +
        labs(title = "Distribution of Exponential Function")+
        stat_function(fun = dnorm,
                      args = list(mean = 1/lambda, sd = 1/lambda/sqrt(n)))
ggplot(df, aes(sample = means)) + 
        stat_qq(col = "red") +
        labs(title = "QQ-Plot of Simulation Means", y = "Means" )
ToothGrowth$supp <- gsub("OJ", "Orange Juice", ToothGrowth$supp)
ToothGrowth$supp <- gsub("VC", "Ascorbic Acid", ToothGrowth$supp)
supp_data <- ToothGrowth %>% group_by(supp) %>%
        summarize(means = round(mean(len),3),
                  variance = round(var(len),3),
                  .groups = "drop")

dose_data <- ToothGrowth %>% group_by(supp,dose) %>%
        summarize(means = round(mean(len),3),
                  variance = round(var(len),3),
                  .groups = "drop")

OJ_group <- ToothGrowth %>% 
        filter(supp == "Orange Juice")
OJ_summary <- as.table(round(summary(OJ_group$len),2))

VC_group <- ToothGrowth %>% 
        filter(supp == "Ascorbic Acid")
VC_summary <- as.table(round(summary(VC_group$len),2))

half_mg_dose <- ToothGrowth %>% filter(dose == 0.5)
half_summary <- half_mg_dose %>% 
        group_by(supp) %>%
        summarise(mean_len = round(mean(len),3),
                  var_len = round(var(len),3),
                  .groups = "drop")

full_mg_dose_group <- ToothGrowth %>% filter(dose == 1.0)
full_summary <- full_mg_dose_group %>%
        group_by(supp) %>%
        summarise(mean_len = round(mean(len),3),
                  var_len = round(var(len),3),
                  .groups = "drop")

double_mg_dose_group <- ToothGrowth %>% filter(dose == 2.0)
double_summary <- double_mg_dose_group %>%
        group_by(supp) %>%
        summarise(mean_len = round(mean(len),3),
                  var_len = round(var(len),3),
                  .groups = "drop")
# Boxplots of tooth length by supplement and by dosage (ggplot2 loaded above)
ggplot(ToothGrowth, aes(supp,len, fill = supp)) + 
        geom_boxplot() +
        labs(title = "Tooth Growth in Guinea Pigs by Supplement",
             x = "Supplement",y = "Tooth Length")

ggplot(ToothGrowth, aes(factor(dose),len, fill = factor(dose))) +
        geom_boxplot() +
        labs(title = "Tooth Growth by Dosage Size",
             x = "Dosage (in mg/day)", y = "Tooth Length") +
        scale_fill_discrete(name = "Dose")
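# t.test(x, y) gives a Welch interval for mean(x) - mean(y), so the order of
# the arguments sets the sign of each interval below.

# All dosages pooled: Ascorbic Acid minus Orange Juice
supp_CI <- round(t.test(VC_group$len,OJ_group$len)$conf.int,3)

# 0.5 mg/day: Orange Juice minus Ascorbic Acid (note the reversed order)
VC_half_group <- half_mg_dose %>% filter(supp == "Ascorbic Acid")
OJ_half_group <- half_mg_dose %>% filter(supp == "Orange Juice")
half_CI <- round(t.test(OJ_half_group$len, VC_half_group$len)$conf.int,3)

# 1.0 mg/day: Ascorbic Acid minus Orange Juice
VC_full_group <- full_mg_dose_group %>% filter(supp == "Ascorbic Acid")
OJ_full_group <- full_mg_dose_group %>% filter(supp == "Orange Juice")
full_CI <- round(t.test(VC_full_group$len,OJ_full_group$len)$conf.int,3)

# 2.0 mg/day: Ascorbic Acid minus Orange Juice
VC_double_group <- double_mg_dose_group %>% filter(supp == "Ascorbic Acid")
OJ_double_group <- double_mg_dose_group %>% filter(supp == "Orange Juice")
double_CI <- round(t.test(VC_double_group$len,OJ_double_group$len)$conf.int,3)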