Problem

The assignment consists of two parts. Create an R Markdown script (.Rmd) and generate HTML output for the code and text. Remember to use code chunks, text, and other components of reproducible research as required.

  1. Use the CO2 dataset in R
    1. To get definitions of the columns type help(CO2)
    2. Calculate means & standard deviations for 4 groups broken down by Type and Treatment
    3. Perform one-way tests twice: once for Type and once for Treatment
    4. Perform a two-way test for Type and Treatment
  2. Use the mtcars dataset in R
    1. Use the table() function with the following combinations
      1. The variables vs and am
      2. The variables gear and carb
      3. The variables cyl and gear
      4. For each of the three cases above guess what the results of a Chi-Squared analysis will be
      5. Ignore warnings for low values in the cells
    2. Perform a Chi-Squared analysis on the mtcars dataset for each of the three cases above

1.1. To get definitions of the columns type help(CO2)

help(CO2)
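
Besides the help page, the column types and factor levels can be checked directly. A minimal sketch using base R only:

```r
# Inspect the structure of the built-in CO2 data frame.
str(CO2)

# The two grouping factors used in the analyses that follow.
levels(CO2$Type)
levels(CO2$Treatment)
```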

1.2. Calculate means & standard deviations for 4 groups broken down by Type and Treatment

First we will calculate the mean:

aggregate(CO2[,4:5],list(CO2$Type,CO2$Treatment),mean)
##       Group.1    Group.2 conc   uptake
## 1      Quebec nonchilled  435 35.33333
## 2 Mississippi nonchilled  435 25.95238
## 3      Quebec    chilled  435 31.75238
## 4 Mississippi    chilled  435 15.81429

Then we will calculate the standard deviation:

aggregate(CO2[,4:5],list(CO2$Type,CO2$Treatment),sd)
##       Group.1    Group.2     conc   uptake
## 1      Quebec nonchilled 301.4216 9.596371
## 2 Mississippi nonchilled 301.4216 7.402136
## 3      Quebec    chilled 301.4216 9.644823
## 4 Mississippi    chilled 301.4216 4.058976
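
As an alternative sketch, the formula interface of aggregate() can return the mean and standard deviation of uptake for the four groups in a single call. The name groupStats is only illustrative and is not part of the original solution:

```r
# Mean and SD of uptake for each Type x Treatment group in one call.
# groupStats is an illustrative name, not from the original solution.
groupStats <- aggregate(
  uptake ~ Type + Treatment,
  data = CO2,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)
groupStats
```

Restricting the summary to uptake also leaves out conc, whose mean and standard deviation are identical in every group.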

1.3. Perform one-way tests twice: once for Type and once for Treatment

# One-way ANOVA of uptake by Type
fitType <- lm(uptake ~ Type, data = CO2)
anova(fitType)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Type       1 3365.5  3365.5  43.519 3.835e-09 ***
## Residuals 82 6341.4    77.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the test for Type, the p-value is 3.835e-09. Therefore, we reject the null hypothesis and conclude that the origin of the plant has a significant effect on the uptake rate.

# One-way ANOVA of uptake by Treatment
fitTreatment <- lm(uptake ~ Treatment, data = CO2)
anova(fitTreatment)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## Treatment  1  988.1  988.11  9.2931 0.003096 **
## Residuals 82 8718.9  106.33                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the test for Treatment, the p-value is 0.003096. Therefore, we reject the null hypothesis and conclude that the treatment has a significant effect on the uptake rate: uptake depends on Treatment.
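
Because each factor has only two levels, a one-way ANOVA is equivalent to a pooled-variance two-sample t test, with the F value equal to the squared t statistic. The following is a sketch of that check for Type; tType and aovType are illustrative names:

```r
# Pooled-variance two-sample t test of uptake by Type.
tType <- t.test(uptake ~ Type, data = CO2, var.equal = TRUE)

# One-way ANOVA on the same factor.
aovType <- anova(lm(uptake ~ Type, data = CO2))

# With two groups, the squared t statistic equals the ANOVA F value.
unname(tType$statistic)^2
aovType[["F value"]][1]
```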

1.4. Perform a two-way test for Type and Treatment

lmTypeTreatment <- lm(uptake ~ Type * Treatment, data = CO2)
anova(lmTypeTreatment)
## Analysis of Variance Table
## 
## Response: uptake
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## Type            1 3365.5  3365.5 52.5086 2.378e-10 ***
## Treatment       1  988.1   988.1 15.4164 0.0001817 ***
## Type:Treatment  1  225.7   225.7  3.5218 0.0642128 .  
## Residuals      80 5127.6    64.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-values for both Type and Treatment are very low, so we reject the null hypothesis for each main effect: both Type and Treatment affect the uptake rate. The p-value for the interaction term is 0.064, which is above 0.05, so we fail to reject the null hypothesis of no interaction. There is no significant evidence that the effect of Treatment on uptake depends on Type.
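
The interaction test can also be read as a comparison of two nested models, one without and one with the Type:Treatment term. A minimal sketch, with additive, full, and interactionTest as illustrative names:

```r
# Additive model: main effects only.
additive <- lm(uptake ~ Type + Treatment, data = CO2)

# Full model: main effects plus the Type:Treatment interaction.
full <- lm(uptake ~ Type * Treatment, data = CO2)

# F test for whether adding the interaction improves the fit.
interactionTest <- anova(additive, full)
interactionTest
```

The F test in the second row of this comparison corresponds to the Type:Treatment row of the two-way table.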

2.1. Use the table() function on the mtcars dataset with the following combinations

1. The variables vs and am
# with() evaluates table() inside mtcars, so attach() is not needed
vsAm <- with(mtcars, table(vs, am))
vsAm
##    am
## vs   0  1
##   0 12  6
##   1  7  7
2. The variables gear and carb
gearCarb <- with(mtcars, table(gear, carb))
gearCarb
##     carb
## gear 1 2 3 4 6 8
##    3 3 4 3 5 0 0
##    4 4 4 0 4 0 0
##    5 0 2 0 1 1 1
3. The variables cyl and gear
cylGear <- with(mtcars, table(cyl, gear))
cylGear
##    gear
## cyl  3  4  5
##   4  1  8  2
##   6  2  4  1
##   8 12  0  2
4. For each of the three cases above guess what the results of a Chi-Squared analysis will be
  1. No significant association between engine type (vs) and transmission (am)
  2. Significant association between gear and carburetors (carb)
  3. Significant association between cylinders (cyl) and gear

2.2. Perform a Chi-Squared analysis on the mtcars dataset for each of the three cases above

chisq.test(vsAm) 
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  vsAm
## X-squared = 0.34754, df = 1, p-value = 0.5555

The p-value of the above test is greater than 0.05, so we fail to reject the null hypothesis of independence: there is no significant evidence of an association between vs and am.
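
For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default, which is why the output header mentions it. As a sketch, the corrected and uncorrected statistics can be compared side by side; corrected and uncorrected are illustrative names:

```r
# Rebuild the 2 x 2 table so this snippet runs on its own.
vsAm <- with(mtcars, table(vs, am))

# Default for 2 x 2 tables: Yates' continuity correction.
corrected <- chisq.test(vsAm)

# Plain Pearson statistic without the correction.
uncorrected <- chisq.test(vsAm, correct = FALSE)

c(corrected = unname(corrected$statistic),
  uncorrected = unname(uncorrected$statistic))
```

The correction shrinks the statistic, so the corrected test is the more conservative of the two.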

chisq.test(gearCarb) 
## 
##  Pearson's Chi-squared test
## 
## data:  gearCarb
## X-squared = 16.518, df = 10, p-value = 0.08573

The p-value of the above test is greater than 0.05, so we fail to reject the null hypothesis of independence: there is no significant evidence that gear and carb are associated.

chisq.test(cylGear) 
## 
##  Pearson's Chi-squared test
## 
## data:  cylGear
## X-squared = 18.036, df = 4, p-value = 0.001214

The p-value of the above test is less than 0.05, so we reject the null hypothesis and conclude that cyl and gear are not independent of each other.
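
The assignment says to ignore the warnings about low cell counts. For reference, the sketch below shows where they come from (small expected counts) and one possible alternative, a Monte Carlo p-value from chisq.test(). The seed, the value of B, and the names expectedCounts and simTest are assumptions made for illustration:

```r
# Rebuild the table so this snippet runs on its own.
gearCarb <- with(mtcars, table(gear, carb))

# Expected counts under independence; values below 5 trigger the warning.
expectedCounts <- suppressWarnings(chisq.test(gearCarb))$expected
min(expectedCounts)

# Monte Carlo p-value, which does not rely on the chi-squared approximation.
set.seed(1)  # arbitrary seed, chosen only for reproducibility
simTest <- chisq.test(gearCarb, simulate.p.value = TRUE, B = 10000)
simTest$p.value
```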