Statistical Concepts

CO2 dataset

Calculate means and standard deviations

help(CO2)
## starting httpd help server ...
##  done
summary(CO2)
##      Plant             Type         Treatment       conc     
##  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95  
##  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175  
##  Qn3    : 7                                    Median : 350  
##  Qc1    : 7                                    Mean   : 435  
##  Qc3    : 7                                    3rd Qu.: 675  
##  Qc2    : 7                                    Max.   :1000  
##  (Other):42                                                  
##      uptake     
##  Min.   : 7.70  
##  1st Qu.:17.90  
##  Median :28.30  
##  Mean   :27.21  
##  3rd Qu.:37.12  
##  Max.   :45.50  
## 
colnames(CO2)
## [1] "Plant"     "Type"      "Treatment" "conc"      "uptake"
m1<-subset(CO2,Type=="Quebec" & Treatment=="nonchilled")

m2<-subset(CO2,Type=="Quebec"  & Treatment=="chilled")

m3<-subset(CO2, Type=="Mississippi" & Treatment=="nonchilled")

m4<-subset(CO2, Type=="Mississippi" & Treatment=="chilled")


mean(m1$conc)
## [1] 435
mean(m2$conc)
## [1] 435
mean(m3$conc)
## [1] 435
mean(m4$conc)
## [1] 435
mean(m1$uptake)
## [1] 35.33333
mean(m2$uptake)
## [1] 31.75238
mean(m3$uptake)
## [1] 25.95238
mean(m4$uptake)
## [1] 15.81429
sd(m1$conc)
## [1] 301.4216
sd(m2$conc)
## [1] 301.4216
sd(m3$conc)
## [1] 301.4216
sd(m4$conc)
## [1] 301.4216
sd(m1$uptake)
## [1] 9.596371
sd(m2$uptake)
## [1] 9.644823
sd(m3$uptake)
## [1] 7.402136
sd(m4$uptake)
## [1] 4.058976
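
The separate calls above can also be collapsed into one grouped summary. This is a minimal sketch using base R's aggregate(); the name grp_stats is illustrative and not part of the original analysis.

```r
# Mean and standard deviation of uptake for every Type x Treatment combination.
grp_stats <- aggregate(
  uptake ~ Type + Treatment,
  data = CO2,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)
grp_stats
```

Each row corresponds to one of the four subsets m1 to m4, so the values should match the individual mean() and sd() results.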

Perform a one-way ANOVA of uptake on treatment and on type separately. Since the mean and standard deviation of conc are identical in all four groups, it is meaningless to test conc.

trt=lm(uptake~Treatment, CO2)
anova(trt)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## Treatment  1  988.1  988.11  9.2931 0.003096 **
## Residuals 82 8718.9  106.33                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
type=lm(uptake~Type,data=CO2)
anova(type)
## Analysis of Variance Table
## 
## Response: uptake
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Type       1 3365.5  3365.5  43.519 3.835e-09 ***
## Residuals 82 6341.4    77.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since both p-values are below 0.05, we reject the null hypothesis in each case, which means that both the type and the treatment make a difference in uptake.
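
With only two groups, a one-way ANOVA is equivalent to a pooled-variance two-sample t-test: the F statistic is the square of the t statistic, and the p-values coincide. The sketch below checks this for Treatment; tt_trt and f_trt are illustrative names.

```r
# Pooled-variance t-test of uptake between nonchilled and chilled plants.
tt_trt <- t.test(uptake ~ Treatment, data = CO2, var.equal = TRUE)

# F statistic from the one-way ANOVA on the same factor.
f_trt <- anova(lm(uptake ~ Treatment, data = CO2))[["F value"]][1]

# The squared t statistic and the F statistic should agree.
unname(tt_trt$statistic)^2
f_trt
```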

Perform a two-way ANOVA with interaction

tt=lm(uptake~Type*Treatment,data=CO2)
anova(tt)
## Analysis of Variance Table
## 
## Response: uptake
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## Type            1 3365.5  3365.5 52.5086 2.378e-10 ***
## Treatment       1  988.1   988.1 15.4164 0.0001817 ***
## Type:Treatment  1  225.7   225.7  3.5218 0.0642128 .  
## Residuals      80 5127.6    64.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We conclude that type and treatment each make a difference in uptake individually. However, the interaction term is not significant at the 0.05 level (p = 0.064), so we cannot conclude that the interaction between type and treatment makes a difference in uptake.
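
To see what the interaction term measures, it helps to look at the four cell means directly. The interaction asks whether the drop in uptake due to chilling differs between Quebec and Mississippi plants. This is a minimal sketch; cell_means and chill_effect are illustrative names.

```r
# 2 x 2 table of mean uptake: rows are Type, columns are Treatment.
cell_means <- with(CO2, tapply(uptake, list(Type, Treatment), mean))
cell_means

# Drop in mean uptake caused by chilling, within each Type.
chill_effect <- cell_means[, "nonchilled"] - cell_means[, "chilled"]
chill_effect
```

The drop is larger for Mississippi plants (about 10.1) than for Quebec plants (about 3.6), but the F test for Type:Treatment does not find this difference significant at the 0.05 level.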

mtcars dataset

Let’s take a look at the frequency counts for different pairs of variables

carvsam<- table(mtcars$vs,mtcars$am)
carvsam
##    
##      0  1
##   0 12  6
##   1  7  7
cargc <-table(mtcars$gear,mtcars$carb)
cargc
##    
##     1 2 3 4 6 8
##   3 3 4 3 5 0 0
##   4 4 4 0 4 0 0
##   5 0 2 0 1 1 1
carcg<- table(mtcars$cyl, mtcars$gear)
carcg
##    
##      3  4  5
##   4  1  8  2
##   6  2  4  1
##   8 12  0  2

Our null hypothesis is that the two variables in each table are independent of each other
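
Under independence, the expected count in each cell is (row total × column total) / grand total, and the chi-square statistic measures how far the observed counts fall from these. Below is a sketch for the vs by am table; obs, expected and x2 are illustrative names.

```r
# Observed counts for vs by am.
obs <- table(mtcars$vs, mtcars$am)

# Expected counts if vs and am were independent.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
expected

# Pearson statistic without the continuity correction.
x2 <- sum((obs - expected)^2 / expected)
x2
```

For 2 x 2 tables chisq.test() applies Yates' continuity correction by default, so its reported statistic is smaller than this uncorrected value; chisq.test(obs, correct = FALSE) reproduces it.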

Perform the chi-square test

chisq.test(carvsam)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  carvsam
## X-squared = 0.34754, df = 1, p-value = 0.5555
chisq.test(cargc)
## Warning in chisq.test(cargc): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  cargc
## X-squared = 16.518, df = 10, p-value = 0.08573
chisq.test(carcg)
## Warning in chisq.test(carcg): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  carcg
## X-squared = 18.036, df = 4, p-value = 0.001214

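As a result, we fail to reject independence for vs and am (p = 0.556) and for gear and carb (p = 0.086), although this does not prove that they are independent. The test suggests that cyl and gear are dependent (p = 0.0012), but we cannot draw that conclusion with enough confidence, because the sample size is too small: R warns that the chi-squared approximation may be incorrect.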
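
The warnings come from small expected counts, which make the chi-squared approximation unreliable. One common alternative in that situation, not used in the original analysis, is Fisher's exact test. This is a sketch for the cyl by gear table; exp_carcg and n_small are illustrative names.

```r
carcg <- table(mtcars$cyl, mtcars$gear)

# Expected counts under independence; suppressWarnings() hides the approximation warning.
exp_carcg <- suppressWarnings(chisq.test(carcg))$expected

# Number of cells whose expected count is below 5.
n_small <- sum(exp_carcg < 5)
n_small

# Fisher's exact test does not rely on the large-sample approximation.
fisher.test(carcg)
```

If the exact test also gives a small p-value, the evidence against independence of cyl and gear rests on firmer ground than the chi-squared approximation alone.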