Task Chapter 4 ~ A/B Testing

Lab5 ~ Hypothesis Testing

Contact	\(\downarrow\)
Email	naftaligunawan@gmail.com
Instagram	https://www.instagram.com/nbrigittag/
RPubs	https://rpubs.com/naftalibrigitta/
Name	Naftali Brigitta Gunawan
Student ID (NIM)	20214920002

Exercise 1

The built-in data set immer (from the MASS package) records the barley yield of the same fields in the years 1931 and 1932. The yield data are stored in the data frame columns Y1 and Y2. Assuming that the data in immer follow the normal distribution, find the 95% confidence interval estimate of the difference between the mean barley yields of 1931 and 1932.

Estimate the difference between the means of matched samples using your textbook formula.
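
The textbook formula is not quoted in this report. As an assumption, the standard form for matched samples is given here, where \(d_i\) is the difference within pair \(i\), \(\bar{d}\) is the mean of the differences and \(s_d\) is their sample standard deviation:

```latex
\[
\bar{d} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}},
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
\]
```

Note that \(s_d\) is computed from deviations around \(\bar{d}\), not from the raw differences.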

Answer :

library(MASS)                                         # load the MASS package 
DT::datatable(immer)                                  # load your data set in a table

Y_1  = immer$Y1
Y_2  = immer$Y2
beda = Y_1-Y_2                                        # paired differences ("beda" = difference)

n        = length(beda)                              # number of paired observations
d        = mean(beda)                                # mean of the differences (d bar)
sd       = sqrt((1/(n-1))*sum((beda-d)^2))           # standard deviation of the differences
a        = 1-0.95 ; a                                # find the alpha

## [1] 0.05

t        = qt(1-(a/2), df = n-1)                     # critical t value
E        = c(-t,t) * (sd/sqrt(n)); round(E, digits = 2)   # margin of error

## [1] -9.79  9.79

CI       = round(d+E, digits = 2) ; CI               # find the interval

## [1]  6.12 25.70

So, the 95% confidence interval for the mean difference in barley yield (1931 minus 1932) is \(6.12 \le \mu_d \le 25.70\). Because the interval lies entirely above zero, the mean yield in 1931 was higher than in 1932.
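
As a cross-check, the hand-computed interval can be compared with R's built-in paired t-test. This is a minimal sketch, not part of the original answer; the variable names `ci_manual` and `ci_ttest` are chosen here for illustration.

```r
library(MASS)                                   # provides the immer data set

d_i <- immer$Y1 - immer$Y2                      # paired differences, 1931 minus 1932
n   <- length(d_i)

# Interval from the matched-samples formula: mean +/- t * s_d / sqrt(n)
ci_manual <- mean(d_i) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d_i) / sqrt(n)

# Same interval from the built-in paired t-test
ci_ttest <- as.numeric(t.test(immer$Y1, immer$Y2, paired = TRUE,
                              conf.level = 0.95)$conf.int)

stopifnot(isTRUE(all.equal(ci_manual, ci_ttest)))   # both routes must agree
print(round(ci_manual, 2))
```

If the two intervals disagree, the usual culprit is the standard deviation: `sd()` centers the differences on their mean before squaring.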

Exercise 2

In the data frame column mpg of the data set mtcars, there are gas mileage data of various 1974 U.S. automobiles. Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual). In particular, the gas mileage for manual and automatic transmissions are two independent data populations. Assuming that the data in mtcars follows the normal distribution, find the 95% confidence interval estimate of the difference between the mean gas mileage of manual and automatic transmissions.

Estimate the difference between the means of two independent samples using your textbook formula.
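
The textbook formula is not quoted in this report. As an assumption, the pooled-variance form for the difference between two independent means is given here; it requires the two populations to have equal variances. \(\bar{x}_1, s_1, n_1\) and \(\bar{x}_2, s_2, n_2\) are the sample mean, standard deviation and size of each group:

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
s_p^2 = \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2}
\]
```

The critical value and the standard error must come from the same model: a pooled standard error goes with \(n_1+n_2-2\) degrees of freedom.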

Answer :

library(MASS)                                         # load the MASS package 
DT::datatable(mtcars)                                 # load your data set in a table

L = mtcars$am == 0       
Xauto = mean(mtcars[L,]$mpg)
Xmanual = mean(mtcars[!L,]$mpg) 

Sauto = sd(mtcars[L,]$mpg)
Smanual = sd(mtcars[!L,]$mpg) 

n.auto = length(mtcars[L,]$mpg)
n.manual = length(mtcars[!L,]$mpg) 

alpha = 1 - 0.95

diff = Xauto - Xmanual; diff

## [1] -7.244939

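t. = qt(1-(alpha/2), df = n.auto+n.manual-2)          # critical t value, df = 30

# pooled standard error (assumes equal variances in both groups)
SE = sqrt(((((n.auto-1)*Sauto^2)+((n.manual-1)*Smanual^2)) / (n.manual+n.auto - 2))*((1/n.manual)+(1/n.auto)))

t = diff / SE                                         # pooled t statistic
t

## [1] -4.106127

lower = diff - t. * SE
lower

## [1] -10.84837

upper = diff + t. * SE
upper

## [1] -3.64151

So, the estimated difference in mean gas mileage (automatic minus manual) is \(-7.24\) mpg, and the 95% confidence interval is \(-10.85 \le \mu_{auto} - \mu_{manual} \le -3.64\). The interval uses the pooled standard error, so it matches the 30 degrees of freedom of the critical value. Because it lies entirely below zero, manual cars have the higher mean mileage.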
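
The choice of standard error matters here, so a short sketch comparing the two common options may help. It is not part of the original answer: `t.test` with `var.equal = TRUE` gives the pooled interval, while R's default is the Welch interval, which does not assume equal variances. The names `ci_pooled` and `ci_welch` are chosen here for illustration.

```r
# Split mpg by transmission type (am: 0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Pooled-variance interval: assumes equal population variances, df = n1 + n2 - 2
pooled <- t.test(auto, manual, var.equal = TRUE, conf.level = 0.95)

# Welch interval: R's default, with Satterthwaite-adjusted degrees of freedom
welch  <- t.test(auto, manual, conf.level = 0.95)

ci_pooled <- as.numeric(pooled$conf.int)
ci_welch  <- as.numeric(welch$conf.int)

print(round(ci_pooled, 2))
print(round(ci_welch, 2))
```

The Welch interval is somewhat wider because the two groups have noticeably different sample standard deviations. Both intervals exclude zero.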

Exercise 3

In the built-in data set named quine, children from an Australian town are classified by ethnic background, gender, age, learning status and the number of days absent from school. The data frame column Eth indicates whether the student is Aboriginal or not (“A” or “N”), and the column Sex indicates male or female (“M” or “F”). Assuming that the data in quine follow the normal distribution, find the 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students, each within their own ethnic group.

Estimate the difference between two population proportions using your textbook formula.
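
The textbook formula is not quoted in this report. As an assumption, the standard large-sample (Wald) interval for the difference between two proportions is given here, where \(\hat{p}_1, n_1\) and \(\hat{p}_2, n_2\) are the sample proportion and size of each group:

```latex
\[
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]
```

Each term under the square root uses its own group's proportion and its own group's sample size.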

Answer :

library(MASS)                                         # load the MASS package 
DT::datatable(quine)                                  # load your data set in a table

table(quine$Eth, quine$Sex)

##    
##      F  M
##   A 38 31
##   N 42 35

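library(dplyr)
counts = quine %>%
  count(Eth, Sex)                                     # counts per ethnic group and sex (long format)

# summary table with row and column totals, values taken from table() above
aus = data.frame(
  "Note"     = c("A","N","Total"),
  "F"     = c(38, 42, 38+42),
  "M"     = c(31, 35, 31+35),
  "Total" = c(38+31, 42+35, 38+31+42+35)); aus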

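# 1 for A, and 2 for N
phat1 = 38/69                                                               # female proportion, Aboriginal
n1    = 69
phat2 = 42/77                                                               # female proportion, Non-Aboriginal
n2    = 77
a     = 0.05
pe    = phat1-phat2                                                         # point estimate 
x     = qnorm(1-(a/2)) * sqrt(((phat1*(1-phat1))/n1)+((phat2*(1-phat2))/n2))  # margin of error
CI    = round(pe + c(-x,x), digits = 3); CI

## [1] -0.156  0.167

So, the 95% confidence interval for the difference between the female proportions (Aboriginal minus Non-Aboriginal) is \(-0.156 \le p_1 - p_2 \le 0.167\). Because the interval contains zero, the data do not show a difference between the two female proportions.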
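
As a cross-check, the same interval can be obtained from `prop.test` on the Eth by Sex table. This is a minimal sketch, not part of the original answer; `correct = FALSE` switches off the continuity correction so that the result matches the plain large-sample formula. The names `ci_manual` and `ci_prop` are chosen here for illustration.

```r
library(MASS)                                   # provides the quine data set

tab <- table(quine$Eth, quine$Sex)              # rows: A, N; columns: F, M
n   <- as.numeric(rowSums(tab))                 # group sizes
p   <- as.numeric(tab[, "F"]) / n               # female proportion in each group

# Large-sample interval computed by hand
se        <- sqrt(sum(p * (1 - p) / n))
ci_manual <- (p[1] - p[2]) + c(-1, 1) * qnorm(0.975) * se

# Same interval from prop.test; the first column (F) is treated as "success"
ci_prop <- as.numeric(prop.test(tab, correct = FALSE,
                                conf.level = 0.95)$conf.int)

stopifnot(isTRUE(all.equal(ci_manual, ci_prop)))    # both routes must agree
print(round(ci_manual, 3))
```

Taking the group sizes from `rowSums(tab)` instead of typing them avoids transcription mistakes in `n1` and `n2`.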