The purpose of this exercise, across parts 1 and 2, is to show where the distribution is centered and compare it with the theoretical center, and to show how variable the distribution is and compare it with the theoretical variance. In addition, an exploratory data analysis with at least one plot or table highlighting basic features of the data is performed, with appropriate confidence intervals and tests whose results are interpreted in the context of the problem.
Set lambda = 0.2 and examine the distribution of averages of 40 exponentials. The mean and the standard deviation of the exponential distribution are both 1/lambda. The analysis uses 1000 simulations.
library(ggplot2)
set.seed(12)
n <- 40
lambda <- 0.2
Mean simulation
simulation_data <- replicate(1000, rexp(n, lambda))
mean_simulation <- apply(simulation_data, 2, mean)
Sample Mean
sample_mean <- mean(mean_simulation)
sample_mean
## [1] 5.010015
Theoretical Mean
theoretical_mean <- 1/lambda
theoretical_mean
## [1] 5
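One way to make the comparison with the theoretical center more formal is a one-sample t-interval on the simulated averages. The sketch below regenerates the averages with the same seed and parameters as above; the names `sim_means` and `mean_test` are illustrative and not part of the original analysis.

```r
# Illustrative sketch: 95% t-interval for the center of the simulated averages.
set.seed(12)
n <- 40
lambda <- 0.2

# 1000 averages of n exponential draws each
sim_means <- replicate(1000, mean(rexp(n, lambda)))

# One-sample t-test against the theoretical mean 1/lambda
mean_test <- t.test(sim_means, mu = 1 / lambda)
mean_test$conf.int   # 95% confidence interval for the center
mean_test$p.value    # a large value means no evidence against 1/lambda
```

If the interval contains 1/lambda = 5, the simulated center is consistent with the theoretical one.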
Plot the simulated means with the sample mean (red) and the theoretical mean (yellow)
hist(mean_simulation, xlab = "mean", main = "Exponential Function Simulations")
abline(v = sample_mean, col = "red")
abline(v = theoretical_mean, col = "yellow")
calculate expected standard deviation and variance of sample
expected_sd <- (1/lambda)/sqrt(n)
expected_var <- expected_sd^2
calculate standard deviation and variance of sample
# use names that do not mask the base functions sd() and var()
sample_sd <- sd(mean_simulation)
sample_var <- var(mean_simulation)
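The expected and simulated spread are computed above but never shown side by side. A minimal sketch of that comparison follows, assuming the same seed and parameters; the data frame `comparison` and the vector `sim_means` are illustrative names. The theoretical variance of an average of n exponentials is (1/lambda)^2 / n = 25/40 = 0.625.

```r
# Illustrative sketch: theoretical vs simulated spread of the averages.
set.seed(12)
n <- 40
lambda <- 0.2

sim_means <- replicate(1000, mean(rexp(n, lambda)))

comparison <- data.frame(
  quantity    = c("standard deviation", "variance"),
  theoretical = c((1 / lambda) / sqrt(n), (1 / lambda)^2 / n),
  simulated   = c(sd(sim_means), var(sim_means))
)
print(comparison)
```

The two columns should agree closely, with the gap shrinking as the number of simulations grows.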
graph simulation means distribution along with the normal distribution (blue curve)
smd <- seq(min(mean_simulation), max(mean_simulation), length=100)
smd_graph <- dnorm(smd, mean=theoretical_mean, sd=expected_sd)
hist(mean_simulation,
     breaks = n, prob = TRUE,   # prob = TRUE puts density, not counts, on the y-axis
     xlab = "means",
     ylab = "density",
     main = "Density of Means")
lines(smd, smd_graph, col = "blue", lty = 5)
library(datasets)
library(ggplot2)
list column names and first rows
colnames(ToothGrowth)
## [1] "len" "supp" "dose"
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
list summary
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
plot ToothGrowth
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(aes(x=dose, y=len), data = ToothGrowth) +
geom_boxplot(aes(fill=dose)) +
ggtitle("Tooth Length by Dose and Supplement Type") +
xlab("Dose") +
ylab("Tooth Length") +
facet_grid(~supp) +
theme(plot.title = element_text(lineheight = .9, face = "bold"))
Fit a two-way ANOVA with interaction
anova <- aov(len ~ supp * dose, data = ToothGrowth)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## supp 1 205.4 205.4 15.572 0.000231 ***
## dose 2 2426.4 1213.2 92.000 < 2e-16 ***
## supp:dose 2 108.3 54.2 4.107 0.021860 *
## Residuals 54 712.1 13.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Run Tukey HSD; four of the supp:dose comparisons have an adjusted p-value > 0.05
TukeyHSD(anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = len ~ supp * dose, data = ToothGrowth)
##
## $supp
## diff lwr upr p adj
## VC-OJ -3.7 -5.579828 -1.820172 0.0002312
##
## $dose
## diff lwr upr p adj
## 1-0.5 9.130 6.362488 11.897512 0.0e+00
## 2-0.5 15.495 12.727488 18.262512 0.0e+00
## 2-1 6.365 3.597488 9.132512 2.7e-06
##
## $`supp:dose`
## diff lwr upr p adj
## VC:0.5-OJ:0.5 -5.25 -10.048124 -0.4518762 0.0242521
## OJ:1-OJ:0.5 9.47 4.671876 14.2681238 0.0000046
## VC:1-OJ:0.5 3.54 -1.258124 8.3381238 0.2640208
## OJ:2-OJ:0.5 12.83 8.031876 17.6281238 0.0000000
## VC:2-OJ:0.5 12.91 8.111876 17.7081238 0.0000000
## OJ:1-VC:0.5 14.72 9.921876 19.5181238 0.0000000
## VC:1-VC:0.5 8.79 3.991876 13.5881238 0.0000210
## OJ:2-VC:0.5 18.08 13.281876 22.8781238 0.0000000
## VC:2-VC:0.5 18.16 13.361876 22.9581238 0.0000000
## VC:1-OJ:1 -5.93 -10.728124 -1.1318762 0.0073930
## OJ:2-OJ:1 3.36 -1.438124 8.1581238 0.3187361
## VC:2-OJ:1 3.44 -1.358124 8.2381238 0.2936430
## OJ:2-VC:1 9.29 4.491876 14.0881238 0.0000069
## VC:2-VC:1 9.37 4.571876 14.1681238 0.0000058
## VC:2-OJ:2 0.08 -4.718124 4.8781238 1.0000000
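As a complement to the Tukey comparisons, the supplement effect within each dose can also be examined with Welch two-sample t-tests, which give a confidence interval for the OJ minus VC difference. This is a sketch, not part of the original analysis; `tg` and `by_dose` are illustrative names, and no multiple-comparison adjustment is applied.

```r
# Illustrative sketch: Welch t-tests of OJ vs VC within each dose level.
tg <- datasets::ToothGrowth   # untouched copy with numeric dose

by_dose <- lapply(c(0.5, 1, 2), function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ])
  data.frame(dose     = d,
             ci_lower = tt$conf.int[1],   # 95% CI for mean(OJ) - mean(VC)
             ci_upper = tt$conf.int[2],
             p_value  = tt$p.value)
})
by_dose <- do.call(rbind, by_dose)
print(by_dose)
```

An interval that excludes zero indicates a difference between the delivery methods at that dose; an interval that spans zero does not.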
Check the normality assumption with a Q-Q plot of the residuals
plot(anova, 2)
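The Q-Q plot can be backed up numerically. The sketch below refits the same model on a fresh copy of the data, runs a Shapiro-Wilk test on the residuals, and lists the three observations with the largest absolute residuals, which are the ones the plot labels. The names `fit`, `res`, `sw`, and `largest` are illustrative.

```r
# Illustrative sketch: numeric check of residual normality and extreme residuals.
tg <- datasets::ToothGrowth
fit <- aov(len ~ supp * factor(dose), data = tg)
res <- residuals(fit)

# Shapiro-Wilk test; a small p-value would suggest non-normal residuals
sw <- shapiro.test(res)
sw$p.value

# Row numbers of the three largest absolute residuals
largest <- order(abs(res), decreasing = TRUE)[1:3]
tg[largest, ]
```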
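Tooth length increases with the dose of vitamin C: every pairwise difference between doses is significant. The delivery methods also differ. Orange juice (OJ) gives longer teeth than ascorbic acid (VC) on average (difference 3.7, adjusted p = 0.0002), but the difference disappears at dose 2 (VC:2-OJ:2, adjusted p = 1.0), and for OJ the increase from dose 1 to dose 2 is not significant (adjusted p = 0.32). The analysis assumes that subjects were randomly assigned to the categories and that the residuals are approximately normal. In the Q-Q plot, observations 32 and 49 (OJ) and observation 23 (VC) show up as potential outliers.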