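Part 1

Simulate 1000 averages of 40 exponential random variables with rate lambda = 0.2. Show the sample mean and compare it to the theoretical mean of the distribution, 1/lambda = 5.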

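set.seed(1)  # fix the random seed so the simulation is reproducible
lambda <- 0.2  # rate of the exponential distribution
# 1000 rows (simulations), each holding 40 exponential draws
permData <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40)
permDataMean <- rowMeans(permData)  # mean of each simulated sample of 40
hist(permDataMean, breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "lightblue")
abline(v = 1/lambda, lty = 1, lwd = 5, col = "blue")  # theoretical mean 1/lambda = 5
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "theoretical mean")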
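The histogram makes the comparison visually. A minimal sketch of the same comparison as numbers follows; it re-simulates the data with `replicate()` instead of reusing `permData` so that it runs on its own. The seed and the names `simMeans`, `sampleMean` and `theoreticalMean` are introduced here for illustration only.

```r
# Sketch: numeric comparison of the simulated and theoretical means.
# Assumes the same setup as above: rate lambda = 0.2, 1000 simulations of n = 40.
set.seed(2020)  # arbitrary seed, chosen only for reproducibility
lambda <- 0.2
n <- 40
nsim <- 1000

# Each element is the mean of one simulated sample of 40 exponentials
simMeans <- replicate(nsim, mean(rexp(n, lambda)))

sampleMean <- mean(simMeans)    # centre of the simulated sampling distribution
theoreticalMean <- 1 / lambda   # mean of an exponential(lambda) is 1/lambda = 5
c(sample = sampleMean, theoretical = theoreticalMean)
```

With 1000 simulations the sample mean should land very close to 5, since the standard error of the average of the 1000 means is only about 5 / sqrt(40) / sqrt(1000).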

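Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution. Here the variance of each of the 1000 samples of 40 is compared with the population variance (1/lambda)^2 = 25.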

permDataVar <- apply(permData, 1, var)  # sample variance of each row of 40 draws
hist(permDataVar, breaks = 50, main = "The distribution of 1000 variances of 40 random exponentials", xlab = "Value of variances", ylab = "Frequency of variances", col = "lightblue")
abline(v = (1/lambda)^2, lty = 1, lwd = 5, col = "blue")  # theoretical variance (1/lambda)^2 = 25
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "theoretical variance")
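Another common reading of this step compares the variance of the 1000 sample means with the value the Central Limit Theorem predicts for the mean of n = 40 draws, (1/lambda)^2 / n = 25 / 40 = 0.625. A minimal sketch of that comparison, re-simulating under an arbitrary seed; `varOfMeans` and `theoreticalVarOfMeans` are names introduced here for illustration.

```r
# Sketch: variance of the sample means versus sigma^2 / n.
set.seed(2020)  # arbitrary seed, chosen only for reproducibility
lambda <- 0.2
n <- 40
nsim <- 1000

simMeans <- replicate(nsim, mean(rexp(n, lambda)))

varOfMeans <- var(simMeans)                  # spread of the simulated means
theoreticalVarOfMeans <- (1 / lambda)^2 / n  # sigma^2 / n = 25 / 40 = 0.625
c(sample = varOfMeans, theoretical = theoreticalVarOfMeans)
```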

Show that the distribution of the sample means is approximately normal.

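par(mfrow = c(2, 1))  # two plots stacked vertically

hist(permDataMean, breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "orange")
# 1000 normal draws with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(40) of the sample mean
permNorm <- rnorm(1000, mean = 1/lambda, sd = (1/lambda) / sqrt(40))
hist(permNorm, breaks = 50, main = "A normal distribution with the theoretical mean and SD of the sample mean", xlab = "Normal variables", ylab = "Frequency", col = "lightgreen")
par(mfrow = c(1, 1))  # restore the default single-plot layout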
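Comparing two histograms by eye is a rough check. A minimal sketch of two more direct checks follows: the correlation in a normal Q-Q comparison (close to 1 when the points lie on a straight line), and the share of standardized means inside plus or minus 1.96 (close to 0.95 for a normal distribution). The seed and the names `z`, `qqCor` and `coverage` are introduced here for illustration; interactively, `qqnorm(z); qqline(z)` draws the corresponding plot.

```r
# Sketch: numeric normality checks for the distribution of sample means.
set.seed(2020)  # arbitrary seed, chosen only for reproducibility
lambda <- 0.2
n <- 40
nsim <- 1000

simMeans <- replicate(nsim, mean(rexp(n, lambda)))

mu <- 1 / lambda              # theoretical mean of the sample mean
se <- (1 / lambda) / sqrt(n)  # theoretical standard error of the sample mean

# Standardize with the theoretical mean and standard error
z <- (simMeans - mu) / se

# qqnorm() with plot.it = FALSE returns the theoretical normal quantiles (x)
# paired with the observed values (y) without drawing anything
qq <- qqnorm(z, plot.it = FALSE)
qqCor <- cor(qq$x, qq$y)

# Share of standardized means within +/- 1.96
coverage <- mean(abs(z) < qnorm(0.975))
c(qqCor = qqCor, coverage = coverage)
```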

Part 2

Load the ToothGrowth data and perform some basic exploratory data analyses.

library(ggplot2)
data(ToothGrowth)
# Tooth length against dose, with a separate linear fit for each supplement
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) + geom_point() + geom_smooth(method = "lm", formula = y ~ x) + labs(title = "ToothGrowth", x = "Dose of supplements (mg/day)", y = "Length of teeth")
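A scatter plot with fitted lines treats dose as continuous, although only three dose levels occur. A minimal base-R sketch of two complementary views follows: a count table showing how many guinea pigs fall in each supplement/dose cell, and side-by-side boxplots per group. The names `tab` and the colour choices are illustrative.

```r
# Sketch: group counts and per-group boxplots for ToothGrowth (base R only).
data(ToothGrowth)

# Number of observations for each supplement/dose combination
tab <- table(ToothGrowth$supp, ToothGrowth$dose)
tab

# One box per supplement/dose group (labels such as "OJ.0.5", "VC.0.5", ...)
boxplot(len ~ supp + dose, data = ToothGrowth,
        col = c("orange", "lightblue"),
        xlab = "Supplement.Dose", ylab = "Length of teeth",
        main = "ToothGrowth: length by supplement and dose")
```

The table shows a balanced design, with the same number of animals in every cell, which keeps the group means directly comparable.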

Provide a basic summary of the data.

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

sd(ToothGrowth$len)

## [1] 7.649315

library(dplyr)
# Mean tooth length for each supplement/dose combination; the result stays grouped by supp
sumstats <- ToothGrowth %>% group_by(supp, dose) %>% summarize(len.mean = mean(len), .groups = "drop_last")

sumstats

## # A tibble: 6 x 3
## # Groups:   supp [2]
##   supp   dose len.mean
##   <fct> <dbl>    <dbl>
## 1 OJ      0.5    13.2 
## 2 OJ      1      22.7 
## 3 OJ      2      26.1 
## 4 VC      0.5     7.98
## 5 VC      1      16.8 
## 6 VC      2      26.1
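The table above reports only group means. A minimal sketch that adds the within-group standard deviations uses base R's `aggregate()`; with dplyr the equivalent would be adding `len.sd = sd(len)` inside `summarize()`. The name `sdByGroup` is introduced here for illustration.

```r
# Sketch: standard deviation of tooth length in each supplement/dose group.
data(ToothGrowth)

sdByGroup <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
names(sdByGroup)[3] <- "len.sd"  # rename the summarized column
sdByGroup
```

If the spreads differ noticeably between groups, an unequal-variance (Welch) t-test is the safer default when comparing them.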

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

dose0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
# one-sample t interval for mean tooth length at dose 0.5 (both supplements pooled)
t.test(x = dose0.5$len, conf.level = 0.95)$conf.int

## [1]  8.499046 12.710954
## attr(,"conf.level")
## [1] 0.95

dose1 <- ToothGrowth[ToothGrowth$dose == 1, ]
# one-sample t interval for mean tooth length at dose 1 (both supplements pooled)
t.test(x = dose1$len, conf.level = 0.95)$conf.int

## [1] 17.66851 21.80149
## attr(,"conf.level")
## [1] 0.95

dose2 <- ToothGrowth[ToothGrowth$dose == 2, ]
# one-sample t interval for mean tooth length at dose 2 (both supplements pooled)
t.test(x = dose2$len, conf.level = 0.95)$conf.int

## [1] 24.33364 27.86636
## attr(,"conf.level")
## [1] 0.95
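The intervals above pool both supplements, so they describe the effect of dose rather than the difference between OJ and VC. A minimal sketch of a supplement comparison, not part of the original analysis, runs a two-sample t-test of OJ against VC within each dose. With a formula, `t.test()` defaults to Welch's unequal-variance test, and the interval is for the mean of OJ minus the mean of VC. The names `doses` and `results` are introduced here for illustration.

```r
# Sketch: two-sample t-tests comparing OJ and VC within each dose.
data(ToothGrowth)
doses <- c(0.5, 1, 2)

results <- t(sapply(doses, function(d) {
  # Welch two-sample t-test of len by supp, restricted to one dose level
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  c(dose = d,
    lower = tt$conf.int[1],  # 95% CI for mean(OJ) - mean(VC)
    upper = tt$conf.int[2],
    p.value = tt$p.value)
}))
results
```

An interval that lies entirely above zero favours OJ at that dose; one that contains zero gives no evidence of a difference.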

State your conclusions and the assumptions needed for your conclusions.

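The group means suggest a difference between OJ and VC at doses of 0.5 and 1 mg/day, where OJ gives longer teeth on average (13.2 vs 7.98 and 22.7 vs 16.8). At a dose of 2 mg/day the two supplements give almost the same mean length (26.1 for both), so it is difficult to say which one is more helpful at that dose. The per-dose confidence intervals above pool both supplements and do not overlap (8.50 to 12.71, 17.67 to 21.80, and 24.33 to 27.87), which suggests that tooth length increases with dose. These conclusions assume that the guinea pigs are independent and were randomly assigned to groups, and that tooth length within each group is approximately normally distributed, as the t-based intervals require.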

Stat_interface-Project

Yaswanth Pulavarthi

8/25/2020