Topics for today!

  1. Topic: One sample t.test()
  2. Topic: Two sample t.test()
  3. Topic: Paired two sample t.test()
  4. Topic: Checking assumptions of t.test()

Data

setwd("~/Desktop/R Materials/mih140/Assignments/'19 Assignments")
cba = read.table("cba_admissions_1999.txt", sep = "\t", header = TRUE, quote = "", allowEscapes = TRUE)

1. Topic: One sample t.test()

# QU: Do scholarship students score above 600 on average in math?
cba_schol = cba[cba$scholarship_yes_no == 1, ]
cba_schol_math = cba_schol[!is.na(cba_schol$SAT_math),]
t.test(cba_schol_math$SAT_math, mu = 600, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  cba_schol_math$SAT_math
## t = 9.43, df = 196, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 600
## 95 percent confidence interval:
##  638.0555      Inf
## sample estimates:
## mean of x 
##  646.1421
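
The numbers that t.test() reports can be reproduced by hand. The sketch below uses simulated scores as a stand-in for cba_schol_math$SAT_math (the vector x, its mean and its sd are assumptions for illustration, not the real data) and checks the t statistic, the one-sided p-value, and the lower confidence bound against t.test().

```r
# Hypothetical stand-in for cba_schol_math$SAT_math (simulated, not the real file)
set.seed(1)
x <- rnorm(50, mean = 640, sd = 70)
mu0 <- 600                      # hypothesized mean under H0

n  <- length(x)
se <- sd(x) / sqrt(n)           # standard error of the sample mean

# t = (sample mean - hypothesized mean) / standard error
t_manual <- (mean(x) - mu0) / se

# One-sided p-value for alternative = "greater": P(T > t) with n - 1 df
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# Lower end of the one-sided 95 percent confidence interval (upper end is Inf)
lower_manual <- mean(x) - qt(0.95, df = n - 1) * se

res <- t.test(x, mu = mu0, alternative = "greater")
all.equal(t_manual, unname(res$statistic))   # TRUE
all.equal(p_manual, res$p.value)             # TRUE
all.equal(lower_manual, res$conf.int[1])     # TRUE
```

With alternative = "greater" the interval is one-sided, which is why the output above shows Inf as its upper end.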

2. Topic: Two sample t.test()

# QU: Do scholarship and non-scholarship students have statistically significantly different average SAT_math scores?
cba = cba[!is.na(cba$SAT_math),]
cba_schol = cba[cba$scholarship_yes_no == 1, ]
cba_noschol = cba[cba$scholarship_yes_no == 0, ]

t.test(cba_schol$SAT_math, cba_noschol$SAT_math, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  cba_schol$SAT_math and cba_noschol$SAT_math
## t = 13.725, df = 286.13, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  63.45039 84.69595
## sample estimates:
## mean of x mean of y 
##  646.1421  572.0690
# Yes, the difference in means is statistically significant (p-value < 2.2e-16).
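
By default t.test() runs the Welch version of the two sample test, which does not assume equal variances; that is why the degrees of freedom in the output are not a whole number. A minimal sketch of the calculation, using two simulated groups as stand-ins for the scholarship and non-scholarship scores (group sizes, means and sds are assumptions for illustration):

```r
# Hypothetical stand-ins for the two groups (simulated, not the real data)
set.seed(2)
x <- rnorm(60, mean = 645, sd = 60)   # plays the role of cba_schol$SAT_math
y <- rnorm(90, mean = 575, sd = 80)   # plays the role of cba_noschol$SAT_math

# Squared standard error of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value
p_manual <- 2 * pt(abs(t_manual), df = df_manual, lower.tail = FALSE)

res <- t.test(x, y, alternative = "two.sided")
all.equal(t_manual, unname(res$statistic))    # TRUE
all.equal(df_manual, unname(res$parameter))   # TRUE
all.equal(p_manual, res$p.value)              # TRUE
```

Passing var.equal = TRUE to t.test() switches to the pooled-variance test, which uses length(x) + length(y) - 2 degrees of freedom instead.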

3. Topic: Paired two sample t.test()

# QU: Do scholarship students have statistically significantly different SAT_math and SAT_verbal scores?
cba = cba[!is.na(cba$SAT_math),]
cba = cba[!is.na(cba$SAT_verbal),]
cba_schol = cba[cba$scholarship_yes_no == 1, ]
t.test(cba_schol$SAT_math, cba_schol$SAT_verbal, alternative = "two.sided", paired = TRUE)
## 
##  Paired t-test
## 
## data:  cba_schol$SAT_math and cba_schol$SAT_verbal
## t = 3.0547, df = 196, p-value = 0.002567
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   5.738456 26.647331
## sample estimates:
## mean of the differences 
##                16.19289
# Compare with paired = FALSE: which test gives the higher p-value? Why?
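
A paired t-test is the same as a one sample t-test on the within-student differences. The sketch below shows this on simulated scores, where each student's math and verbal scores share a common component (the variables ability, math and verbal are made up for illustration and are positively correlated by construction).

```r
# Hypothetical paired scores (simulated, not the real data)
set.seed(3)
ability <- rnorm(100, mean = 620, sd = 60)             # shared per-student component
math    <- ability + rnorm(100, mean = 15, sd = 30)
verbal  <- ability + rnorm(100, mean = 0,  sd = 30)

paired   <- t.test(math, verbal, paired = TRUE)
one_samp <- t.test(math - verbal, mu = 0)              # one sample test on the differences
unpaired <- t.test(math, verbal, paired = FALSE)       # Welch test, ignores the pairing

# The paired test and the one sample test on differences agree exactly
all.equal(unname(paired$statistic), unname(one_samp$statistic))   # TRUE
all.equal(paired$p.value, one_samp$p.value)                       # TRUE

# Print both p-values side by side to compare
c(paired = paired$p.value, unpaired = unpaired$p.value)
```

Both tests estimate the same mean difference; they differ only in the standard error. Pairing subtracts the variation that the two scores share, so when the pairs are positively correlated the paired standard error is smaller and the paired test typically gives the smaller p-value.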

4. Topic: Checking assumptions of t.test()

# Assumption 1: The data (in each sample!) are drawn from a normal distribution.
# Check with a normal Q-Q plot: the points should fall roughly on a straight line, with no extreme outliers. Outliers are a bigger problem than mild curvature.

# Example:
par(mfrow = c(1,2))
qqnorm(cba_schol$SAT_math)
qqnorm(cba_schol$SAT_verbal)

par(mfrow = c(1,1))
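
To see what a problem looks like on a Q-Q plot, it can help to compare a well-behaved sample with one that has a single extreme value. A minimal sketch on simulated scores (scores and scores_out are made-up vectors, not columns of cba), with qqline() adding a reference line through the quartiles:

```r
# Hypothetical scores (simulated, not the real data)
set.seed(4)
scores     <- rnorm(200, mean = 640, sd = 70)   # roughly normal sample
scores_out <- c(scores, 1500)                   # same sample plus one extreme outlier

par(mfrow = c(1, 2))
qq_clean <- qqnorm(scores, main = "Roughly normal")
qqline(scores)                                  # reference line through the quartiles
qq_out <- qqnorm(scores_out, main = "With one outlier")
qqline(scores_out)
par(mfrow = c(1, 1))
```

In the second panel the outlier sits far above the reference line in the top right corner. For the paired test the quantity that needs to look normal is the difference, so the same check can be run on cba_schol$SAT_math - cba_schol$SAT_verbal.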

# Assumption 2 (for two sample data): Observations in each group are independent of one another! This cannot be checked with a plot; it comes from knowing how the data were collected. To address it, simply state whether the observations are independent and why.