Synopsis

The project consists of two parts:

1. A simulation exercise to explore statistical inference
2. Basic inferential data analysis

Basic Inferential Data Analysis

We analyze the ToothGrowth data from the R datasets package. The data set contains 60 observations of odontoblast length (odontoblasts are the cells responsible for tooth growth) in guinea pigs: 10 animals at each of three dose levels of vitamin C (0.5, 1 and 2 mg/day) for each of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC).

Load necessary packages

library(ggplot2)

Load the ToothGrowth data and perform some basic exploratory data analyses

Load data

data(ToothGrowth)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Provide a basic summary of the data

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
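As a quick check on the design described above, the following sketch cross-tabulates delivery method against dose. It uses only base R; the variable name cell_counts is introduced here for illustration and is not part of the original analysis.

```r
data(ToothGrowth)

# Count observations for each combination of delivery method and dose
cell_counts <- with(ToothGrowth, table(supp, dose))
cell_counts
```

If the design is balanced as described, every cell should contain 10 observations.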

Graph

ggplot(data = ToothGrowth, aes(x = supp, y = len)) +
    geom_boxplot(aes(fill = supp)) +
    xlab("Supplement Type") +
    ylab("Tooth length")

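# With stat = "identity" the 10 observations in each group are stacked,
# so each bar's height is the group's total tooth length, not its mean
ggplot(data = ToothGrowth, aes(x = as.factor(dose), y = len, fill = supp)) +
    geom_bar(stat = "identity") +
    facet_grid(. ~ supp) +
    xlab("Dose (mg)") +
    ylab("Tooth length")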
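Because the bar chart stacks individual observations, it can help to look at the group means directly. The following is a minimal sketch using base R's aggregate(); the name group_means is an assumption introduced for this example.

```r
data(ToothGrowth)

# Mean tooth length for each combination of delivery method and dose
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

The resulting table has one row per supplement and dose combination, which makes the gap between OJ and VC at each dose easy to read off.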

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

hyp1 <- t.test(len ~ supp, data = ToothGrowth)
hyp1$conf.int

## [1] -0.1710156  7.5710156
## attr(,"conf.level")
## [1] 0.95

hyp1$p.value

## [1] 0.06063451

hyp2<-t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5))
hyp2$conf.int

## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95

hyp2$p.value

## [1] 0.006358607

hyp3<-t.test(len ~ supp, data = subset(ToothGrowth, dose == 1))
hyp3$conf.int

## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95

hyp3$p.value

## [1] 0.001038376

hyp4<-t.test(len ~ supp, data = subset(ToothGrowth, dose == 2))
hyp4$conf.int

## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95

hyp4$p.value

## [1] 0.9638516
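The three per-dose tests above repeat the same call with a different subset. As a sketch, they can be collected into one table by looping over the dose levels; the names doses and results are introduced here for illustration, and the calls are the same t.test() calls used for hyp2 to hyp4.

```r
data(ToothGrowth)

# Dose levels present in the data: 0.5, 1 and 2
doses <- sort(unique(ToothGrowth$dose))

# Run the OJ vs VC Welch t-test within each dose and keep the key numbers
results <- do.call(rbind, lapply(doses, function(d) {
    tt <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
    data.frame(dose     = d,
               ci_lower = tt$conf.int[1],
               ci_upper = tt$conf.int[2],
               p_value  = tt$p.value)
}))
results
```

Each row should reproduce the confidence interval and p-value reported for the corresponding dose above, which makes the three comparisons easier to read side by side.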

State your conclusions and the assumptions needed for your conclusions.

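Pooling all doses, the 95% confidence interval for the difference in mean tooth length between OJ and VC (-0.17 to 7.57) contains zero and the p-value (0.061) exceeds 0.05, so we cannot reject the null hypothesis of no difference between the delivery methods. Broken down by dose, the picture changes. At 0.5 and 1 mg the intervals lie entirely above zero (p = 0.006 and p = 0.001), so OJ is associated with significantly greater tooth growth than VC. At 2 mg the interval (-3.80 to 3.64) is centered near zero (p = 0.96), so there is no evidence of a difference between OJ and VC.

These conclusions rest on the following assumptions: the guinea pigs are a representative sample and were randomly assigned to dose and delivery method; the observations are independent, with unpaired groups; and tooth length is approximately normally distributed within each group being compared, both for all doses pooled and within each dose level. Because t.test() performs Welch's test by default, equal variances across groups are not assumed.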
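The role of the variance assumption can be illustrated with a small sketch. t.test() runs Welch's test unless var.equal = TRUE is passed; the names low_dose, welch and pooled are introduced here for illustration, and the comparison is a robustness check rather than part of the original analysis.

```r
data(ToothGrowth)
low_dose <- subset(ToothGrowth, dose == 0.5)

# Default: Welch's t-test, which does not assume equal variances
welch <- t.test(len ~ supp, data = low_dose)

# Alternative: classical two-sample t-test with a pooled variance
pooled <- t.test(len ~ supp, data = low_dose, var.equal = TRUE)

# Compare the p-values and degrees of freedom of the two versions
c(welch = welch$p.value, pooled = pooled$p.value)
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```

With equal group sizes, as here, the two versions share the same t statistic and differ only in their degrees of freedom, so they would be expected to lead to the same conclusion.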

Statistical Inference Course Project Part 2

Henrys Kasereka

10/30/2020