Assignment #3

The assignment consists of two parts. Create an R Markdown script (.Rmd) and generate HTML output containing the code and text. Remember to use code chunks, text, and the other components of reproducible research as required.

  1. Use the CO2 dataset in R
    1. To get definitions of the columns type help(CO2)
    2. Calculate means & standard deviations for 4 groups broken down by Type and Treatment
    3. Perform one-way tests twice: once for Type and once for Treatment
    4. Perform a two-way test for Type and Treatment
  2. Use the mtcars dataset in R
    1. Use the table() function with the following combinations
      1. The variables vs and am
      2. The variables gear and carb
      3. The variables cyl and gear
      4. For each of the three cases above guess what the results of a Chi-Squared analysis will be
      5. Ignore warnings for low values in the cells
    2. Perform a Chi-Squared analysis on the mtcars dataset for each of the three cases above

Question 1: Use the CO2 dataset in R

1) To get definitions of the columns type help(CO2)

#To get definitions of the columns type help(CO2)
str(CO2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
##  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
##  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
##  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
##  - attr(*, "formula")=Class 'formula' length 3 uptake ~ conc | Plant
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula' length 2 ~Treatment * Type
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Ambient carbon dioxide concentration"
##   ..$ y: chr "CO2 uptake rate"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(uL/L)"
##   ..$ y: chr "(umol/m^2 s)"

2) Calculate means & standard deviations for 4 groups broken down by Type and Treatment

library(dplyr)  # provides %>%, group_by(), summarise() and ungroup()
CO2_summary=CO2 %>% ungroup() %>% group_by(Type, Treatment)%>%
  summarise(mean_conc=mean(conc),
            std_conc=sd(conc),
            mean_uptake=mean(uptake),
            std_uptake=sd(uptake)) %>% ungroup()
CO2_summary
## Source: local data frame [4 x 6]
## 
##          Type  Treatment mean_conc std_conc mean_uptake std_uptake
##        (fctr)     (fctr)     (dbl)    (dbl)       (dbl)      (dbl)
## 1      Quebec nonchilled       435 301.4216    35.33333   9.596371
## 2      Quebec    chilled       435 301.4216    31.75238   9.644823
## 3 Mississippi nonchilled       435 301.4216    25.95238   7.402136
## 4 Mississippi    chilled       435 301.4216    15.81429   4.058976
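
As a cross-check on the table above, the same uptake summaries can be sketched in base R without dplyr. This is only an illustration: the names co2, group_mean, group_sd and group_n are my own, and as.data.frame() is used on the assumption that dropping the extra groupedData classes is harmless here. It also reports the group sizes, which shows that the design is balanced.

```r
# Base R cross-check of the uptake summaries by Type and Treatment.
# The object names below are illustrative, not part of the assignment.
co2 <- as.data.frame(CO2)  # keep only the plain data.frame class

# Each call returns a 2 x 2 matrix: rows = Type, columns = Treatment
group_mean <- with(co2, tapply(uptake, list(Type, Treatment), mean))
group_sd   <- with(co2, tapply(uptake, list(Type, Treatment), sd))
group_n    <- with(co2, tapply(uptake, list(Type, Treatment), length))

print(round(group_mean, 5))
print(round(group_sd, 6))
print(group_n)
```

The values in group_mean and group_sd should agree with the mean_uptake and std_uptake columns above, and equal counts in group_n mean the order of terms in the later two-way model does not change the sums of squares.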

3) Perform one-way tests twice: once for Type and once for Treatment

# One-way ANOVA of uptake on Type (an F test; with two levels it matches a pooled t test)
fit_Type=lm(uptake~Type, data=CO2)
anova(fit_Type)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Type       1 3365.5  3365.5  43.519 3.835e-09 ***
## Residuals 82 6341.4    77.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# One-way ANOVA of uptake on Treatment
fit_Treatment=lm(uptake~Treatment, data=CO2)
anova(fit_Treatment)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## Treatment  1  988.1  988.11  9.2931 0.003096 **
## Residuals 82 8718.9  106.33                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • In the test for Type, p-value = 3.8346861 × 10^{-9}. We therefore reject the null hypothesis of equal mean uptake and conclude that the origin of the plant made a difference to the uptake rate.

  • In the test for Treatment, p-value = 0.0030957. We therefore reject the null hypothesis of equal mean uptake and conclude that the treatment made a difference to the uptake rate.
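
Because Type has only two levels, the one-way F test should be equivalent to a two-sample t test with pooled variance (F equals t squared, with the same p-value). The sketch below checks this for Type; the object names are my own, and var.equal = TRUE is set deliberately, since the default Welch t test would not match exactly.

```r
# Compare the one-way ANOVA F test with a pooled-variance t test for Type.
co2 <- as.data.frame(CO2)  # plain data.frame copy of the built-in data

aov_type <- anova(lm(uptake ~ Type, data = co2))
f_type   <- aov_type[["F value"]][1]
p_f_type <- aov_type[["Pr(>F)"]][1]

# var.equal = TRUE requests the pooled-variance (Student) t test
t_type <- t.test(uptake ~ Type, data = co2, var.equal = TRUE)
t_stat <- unname(t_type$statistic)

print(c(F = f_type, t_squared = t_stat^2))
print(c(p_from_F = p_f_type, p_from_t = t_type$p.value))
```

The same comparison could be repeated for Treatment by swapping the variable in both formulas.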

4) Perform a two-way test for Type and Treatment

fit_Type_Treatment=lm(uptake~Type*Treatment, data=CO2)
anova(fit_Type_Treatment)
## Analysis of Variance Table
## 
## Response: uptake
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## Type            1 3365.5  3365.5 52.5086 2.378e-10 ***
## Treatment       1  988.1   988.1 15.4164 0.0001817 ***
## Type:Treatment  1  225.7   225.7  3.5218 0.0642128 .  
## Residuals      80 5127.6    64.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Because the p-values for Type (2.378e-10) and Treatment (0.0001817) are both far below 0.05, we reject the null hypotheses and conclude that both Type and Treatment made a difference to the uptake rate.
  • Because the p-value for the interaction term is 0.064, which is above 0.05, we cannot reject the null hypothesis of no interaction. The data do not show a significant interaction between Type and Treatment in their effect on the uptake rate.
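
To see what the interaction term measures, it may help to compute it directly from the four cell means: it is the chilling effect in Mississippi plants minus the chilling effect in Quebec plants. The sketch below assumes R's default treatment contrasts, under which this difference of differences should equal the interaction coefficient of the fitted model; the object names are my own.

```r
# Interaction as a "difference of differences" of the four cell means.
co2 <- as.data.frame(CO2)  # plain data.frame copy of the built-in data

cell_means <- with(co2, tapply(uptake, list(Type, Treatment), mean))

# Effect of chilling within each Type (chilled minus nonchilled)
chill_effect <- cell_means[, "chilled"] - cell_means[, "nonchilled"]
interaction_contrast <- chill_effect[["Mississippi"]] - chill_effect[["Quebec"]]

# With default treatment contrasts this matches the model coefficient
fit <- lm(uptake ~ Type * Treatment, data = co2)
interaction_coef <- coef(fit)[["TypeMississippi:Treatmentchilled"]]

print(chill_effect)
print(c(contrast = interaction_contrast, coefficient = interaction_coef))
```

Chilling lowers uptake in both origins, more so in Mississippi, but the two-way test above says this gap is not significant at the 0.05 level.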

Question 2: Use the mtcars dataset in R

1) Use the table() function with the following combinations

A. The variables vs and am

B. The variables gear and carb

C. The variables cyl and gear

# A
tb_vs_am=with(mtcars, table(vs,am))
# B
tb_gear_carb=with(mtcars, table(gear, carb))
# C
tb_cyl_gear=with(mtcars, table(cyl, gear))

tb_vs_am
##    am
## vs   0  1
##   0 12  6
##   1  7  7
tb_gear_carb
##     carb
## gear 1 2 3 4 6 8
##    3 3 4 3 5 0 0
##    4 4 4 0 4 0 0
##    5 0 2 0 1 1 1
tb_cyl_gear
##    gear
## cyl  3  4  5
##   4  1  8  2
##   6  2  4  1
##   8 12  0  2

D. For each of the three cases above guess what the results of a Chi-Squared analysis will be

E. Ignore warnings for low values in the cells

My guess is that in none of the three cases are the two variables dependent on each other.
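
One way to make such a guess less arbitrary is to compare each observed table with the counts expected under independence, (row total × column total) / grand total. The sketch below does this for vs and am only; expected_counts is a hypothetical helper of my own, and its result is compared with the expected element that chisq.test() returns.

```r
# Expected cell counts under independence for a two-way table.
# expected_counts() is an illustrative helper, not a base R function.
expected_counts <- function(tb) {
  outer(rowSums(tb), colSums(tb)) / sum(tb)
}

tb_vs_am <- with(mtcars, table(vs, am))
expected <- expected_counts(tb_vs_am)

print(tb_vs_am)   # observed counts
print(expected)   # counts expected if vs and am were independent

# The same numbers are stored in the result of chisq.test()
builtin_expected <- chisq.test(tb_vs_am)$expected
```

The further the observed counts sit from the expected ones, relative to the size of the expected counts, the larger the Chi-squared statistic will be.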

2) Perform a Chi-Squared analysis on the mtcars dataset for each of the three cases above

chisq.test(tb_vs_am)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tb_vs_am
## X-squared = 0.3475, df = 1, p-value = 0.5555
chisq.test(tb_gear_carb)
## Warning in chisq.test(tb_gear_carb): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  tb_gear_carb
## X-squared = 16.5181, df = 10, p-value = 0.08573
chisq.test(tb_cyl_gear)
## Warning in chisq.test(tb_cyl_gear): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  tb_cyl_gear
## X-squared = 18.0364, df = 4, p-value = 0.001214
  • From the above results, because the p-values for the first two tables (0.5555 and 0.08573) are higher than 0.05, we do not reject the null hypothesis of independence: there is no significant evidence that vs and am, or gear and carb, are associated.
  • Because the p-value for the third contingency table is 0.0012141, we reject the null hypothesis and conclude that cyl and gear are not independent of each other, so my guess turned out to be wrong for this pair. However, given the small cell counts in the contingency tables, more samples would be needed for more accurate test results.
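
Since the two warnings come from small expected counts, one possible follow-up, not required by the assignment, is to ask chisq.test() for a Monte Carlo p-value instead of relying on the Chi-squared approximation. The sketch below uses the simulate.p.value and B arguments; the seed and the number of replicates are arbitrary choices of mine.

```r
# Monte Carlo p-values for the two tables that triggered warnings.
set.seed(1)  # arbitrary seed, only so the simulated p-values are repeatable

tb_gear_carb <- with(mtcars, table(gear, carb))
tb_cyl_gear  <- with(mtcars, table(cyl, gear))

# B is the number of simulated tables; 10000 is an arbitrary choice
sim_gear_carb <- chisq.test(tb_gear_carb, simulate.p.value = TRUE, B = 10000)
sim_cyl_gear  <- chisq.test(tb_cyl_gear,  simulate.p.value = TRUE, B = 10000)

print(c(gear_carb = sim_gear_carb$p.value, cyl_gear = sim_cyl_gear$p.value))
```

The test statistics are the same as before; only the way the p-value is obtained changes, so the simulated p-values give a second opinion on the conclusions above.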