A/B Testing
Statistical Computing
| Contact | : \(\downarrow\) |
| dhelaagatha@gmail.com | |
| https://www.instagram.com/dhelaagatha/ | |
| RPubs | https://rpubs.com/dhelaasafiani/ |
| Name | Dhela Agatha |
| Student ID (NIM) | 20214920009 |
| Study Program | Statistics 2021 |
Matched Samples
Two data samples are matched if they come from repeated observations of the same subject. Here, we assume that the data populations follow the normal distribution. Using the paired t-test, we can obtain an interval estimate of the difference of the population means.
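For reference, the interval estimate described above is usually written as follows. The notation is ours rather than the source's: \(d_i\) is the difference within pair \(i\), \(\bar{d}\) is the mean of the differences, and \(s_d\) is their sample standard deviation.

```latex
\bar{d} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}},
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
```

Note that \(s_d\) is centered on \(\bar{d}\); using the raw sum of squared differences instead would overstate the spread.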
Example
The built-in data set named immer records the barley yield of the same fields in the years 1931 and 1932. The yield data are presented in the data frame columns Y1 and Y2. Assuming that the data in immer follow the normal distribution, find the 95% confidence interval estimate of the difference between the mean barley yields in 1931 and 1932.
library(MASS) # load the MASS package
DT::datatable(immer) # load your data set in a table
t.test(immer$Y1, immer$Y2, paired=TRUE)
##
## Paired t-test
##
## data: immer$Y1 and immer$Y2
## t = 3.324, df = 29, p-value = 0.002413
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 6.121954 25.704713
## sample estimates:
## mean difference
## 15.91333
Between years 1931 and 1932 in the data set immer, the 95% confidence interval of the difference in means of the barley yields is the interval between 6.122 and 25.705.
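A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below illustrates this with the same immer columns; the object names `paired` and `one.sample` are ours.

```r
library(MASS)  # provides the immer data set

# Paired test on the two yield columns
paired <- t.test(immer$Y1, immer$Y2, paired = TRUE)

# One-sample test on the per-field differences
one.sample <- t.test(immer$Y1 - immer$Y2)

# Both calls should report the same 95% confidence interval
all.equal(as.numeric(paired$conf.int), as.numeric(one.sample$conf.int))
```

This is why the interval only depends on the mean and standard deviation of the differences, not on the two columns separately.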
Your Exercise
Estimate the difference between the means of matched samples using your textbook formula.
Y1 = immer$Y1
Y2 = immer$Y2
beda = Y1-Y2 # paired differences
n = length(beda)
mean.beda = mean(beda) # mean of the differences
sdiff = sqrt((1/(n-1))*sum((beda-mean.beda)^2)) # sample SD of the differences, centered on their mean
alpha = 1-0.95 ; alpha
## [1] 0.05
t.crit = qt(1-(alpha/2), df = n-1) # critical value
E = c(-t.crit,t.crit) * (sdiff/sqrt(n)); round(E, digits = 3)
## [1] -9.791  9.791
Interval = round(mean.beda+E, digits = 3) ; Interval
## [1]  6.122 25.705
Between 1931 and 1932 in the immer data set, the 95% confidence interval of the difference in mean barley yields is the interval between 6.122 and 25.705, which agrees with the t.test output.
Independent Samples
In this section, we develop both large-sample and small-sample methodologies for comparing two population means. Note that in the small-sample case we use the t-statistic.
Population Mean Between Two Independent Samples
Two data samples are independent if they come from unrelated populations and the samples do not affect each other. Here, we assume that the data populations follow the normal distribution.
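Two common textbook forms of the small-sample interval are sketched below. The source does not say which one the course textbook uses, so both are given; the symbols (\(\bar{x}_i\), \(s_i\), \(n_i\), \(s_p\), \(\nu\)) are our notation.

```latex
% Unequal variances (Welch), with \nu from the Welch-Satterthwaite approximation
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\,\nu}\,\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

% Equal variances (pooled)
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
```

The standard error and the degrees of freedom must come from the same form; mixing the Welch standard error with the pooled degrees of freedom gives an interval that matches neither method.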
Example
In the data frame column mpg of the data set mtcars, there are gas mileage data of various 1974 U.S. automobiles. Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual). In particular, the gas mileage for manual and automatic transmissions are two independent data populations. Assuming that the data in mtcars follows the normal distribution, find the 95% confidence interval estimate of the difference between the mean gas mileage of manual and automatic transmissions.
DT::datatable(mtcars)
L = mtcars$am == 0 # declare transmission type
mpg.auto = mtcars[L,]$mpg # automatic transmission mileage
mpg.manual = mtcars[!L,]$mpg # manual transmission mileage
t.test(mpg.auto, mpg.manual)
##
## Welch Two Sample t-test
##
## data: mpg.auto and mpg.manual
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
In mtcars, the mean mileage of automatic transmissions is 17.147 mpg and that of manual transmissions is 24.392 mpg. The 95% confidence interval of the difference in mean gas mileage (automatic minus manual) is between -11.2802 and -3.2097 mpg, so manual transmissions average between 3.2097 and 11.2802 mpg more.
t.test(mpg ~ am, data=mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
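The fractional degrees of freedom (df = 18.332) in the output come from the Welch-Satterthwaite approximation, which t.test uses by default when equal variances are not assumed. A minimal sketch that reproduces it by hand follows; the variable names `x`, `y`, `vx`, `vy`, `df.welch`, `se` and `ci` are ours.

```r
x <- mtcars$mpg[mtcars$am == 0]  # automatic transmission mileage
y <- mtcars$mpg[mtcars$am == 1]  # manual transmission mileage

vx <- var(x) / length(x)  # squared standard error of mean(x)
vy <- var(y) / length(y)  # squared standard error of mean(y)

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Unpooled standard error and the resulting 95% interval
se <- sqrt(vx + vy)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df.welch) * se

round(df.welch, 3)
round(ci, 3)
```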
Your Exercise
Estimate the difference between two population means using your textbook formula.
Xauto = mean(mtcars[L,]$mpg)
Xmanual = mean(mtcars[!L,]$mpg)
Sauto = sd(mtcars[L,]$mpg)
Smanual = sd(mtcars[!L,]$mpg)
n.auto = length(mtcars[L,]$mpg)
n.manual = length(mtcars[!L,]$mpg)
alpha = 1 - 0.95
diff = Xauto - Xmanual
# pooled variance (equal population variances assumed) and standard error of the difference
Sp2 = (((n.auto-1)*Sauto^2)+((n.manual-1)*Smanual^2)) / (n.auto+n.manual-2)
SE = sqrt(Sp2*((1/n.auto)+(1/n.manual)))
t. = qt(1-(alpha/2), df = n.auto+n.manual-2) # critical value
t = diff / SE # pooled t statistic
t
## [1] -4.106127
lower = diff - t. * SE
upper = diff + t. * SE
round(c(lower, upper), digits = 3)
## [1] -10.848  -3.642
In mtcars, the mean mileage of automatic transmissions is 17.147 mpg and that of manual transmissions is 24.392 mpg. Under the pooled-variance (equal variances) formula with 30 degrees of freedom, the 95% confidence interval of the difference in mean gas mileage (automatic minus manual) is between -10.848 and -3.642 mpg. This is slightly narrower than the Welch interval reported by t.test above, which does not assume equal variances.
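The pooled t statistic computed above can be cross-checked against t.test by setting var.equal = TRUE, which switches from the default Welch test to the equal-variance test. This is a small sketch; the object name `pooled` is ours.

```r
# Equal-variance (pooled) two-sample t-test on the same data
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

pooled$statistic  # pooled t statistic
pooled$parameter  # degrees of freedom, n.auto + n.manual - 2
pooled$conf.int   # 95% interval under the equal-variance assumption
```

Comparing `pooled$conf.int` with hand-computed limits is a quick way to confirm that the standard error and the degrees of freedom belong to the same method.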
Comparison of Proportions
A survey conducted in two distinct populations will produce different results. It is often necessary to compare the survey response proportion between the two populations. Here, we assume that the samples are large enough for the sample proportions to be approximately normally distributed.
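Under the normal approximation, the interval estimate for the difference of two proportions is commonly written as below. The symbols are our notation: \(\hat{p}_1\) and \(\hat{p}_2\) are the sample proportions and \(n_1\), \(n_2\) the group sizes.

```latex
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}\,
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
```

For a 95% interval, \(z_{\alpha/2} \approx 1.96\).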
Example
In the built-in data set named quine, children from an Australian town are classified by ethnic background, gender, age, learning status and the number of days absent from school. The data frame column Eth indicates whether the student is Aboriginal or not (“A” or “N”), and the column Sex indicates male or female (“M” or “F”). Assuming that the data in quine follow the normal distribution, find the 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students, each within their own ethnic group.
In R, we can tally the student ethnicity against the gender with the table function. As the result shows, within the Aboriginal student population, 38 students are female, whereas within the Non-Aboriginal student population, 42 are female.
library(MASS) # load the MASS package
DT::datatable(quine)
table(quine$Eth, quine$Sex) # table of Non & Aboriginal
##
## F M
## A 38 31
## N 42 35
prop.test(table(quine$Eth, quine$Sex), correct=FALSE) # comparison of Two Population Proportions
##
## 2-sample test for equality of proportions without continuity correction
##
## data: table(quine$Eth, quine$Sex)
## X-squared = 0.0040803, df = 1, p-value = 0.9491
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1564218 0.1669620
## sample estimates:
## prop 1 prop 2
## 0.5507246 0.5454545
The 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students is between -15.6% and 16.7%.
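prop.test also accepts the counts directly, which makes it explicit which numbers enter the estimate. The sketch below uses the counts read off the table above (38 of 69 and 42 of 77); the names `females`, `totals` and `res` are ours.

```r
# Number of female students and group sizes, taken from the table above
females <- c(A = 38, N = 42)
totals  <- c(A = 69, N = 77)

# Two-sample test for equality of proportions, without continuity correction
res <- prop.test(x = females, n = totals, correct = FALSE)

res$estimate  # sample proportions of females in each group
res$conf.int  # 95% interval for the difference in proportions
```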
Your Exercise
Estimate the difference between two population proportions using your textbook formula.
A = 38/69
N = 42/77
n1 = 69
n2 = 77
p.bar = (38+42)/(69+77)
p.bar
## [1] 0.5479452
z = ((A-N)-(0)) / sqrt((p.bar*(1-p.bar)*((1/n1)+(1/n2))))
z
## [1] 0.06387745
S = sqrt((A*(1-A)/n1)+(N*(1-N)/n2))
S
## [1] 0.08249739
lower = (A-N) - (1.96*S)
lower
## [1] -0.1564248
upper = (A-N) + (1.96*S)
upper
## [1] 0.166965
The 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students is between -15.6% and 16.7%.
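The hand-computed limits differ from the prop.test output in the sixth decimal place because 1.96 is a rounded critical value. A minimal sketch using qnorm(0.975) instead is shown below; the names `p1`, `p2`, `se`, `z.crit` and `ci` are ours.

```r
p1 <- 38 / 69; n1 <- 69  # female proportion, Aboriginal students
p2 <- 42 / 77; n2 <- 77  # female proportion, Non-Aboriginal students

# Unpooled standard error of the difference in proportions
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Exact 97.5% quantile of the standard normal instead of the rounded 1.96
z.crit <- qnorm(0.975)

ci <- (p1 - p2) + c(-1, 1) * z.crit * se
ci
```

With the exact quantile the limits should agree with the interval that prop.test reports with correct=FALSE.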