Statistical Computing
~ Assignment 4 ~
| Contact | ↓ |
| Email | diyasaryanugroho@gmail.com |
| Instagram | https://www.instagram.com/diasary_nm/ |
| RPubs | https://rpubs.com/diyasarya/ |
Exercise 1
In the built-in data set named immer, the barley yield
in years 1931 and 1932 of the same field are recorded. The yield data
are presented in the data frame columns Y1 and Y2. Assuming that the
data in immer follows the normal distribution, find the 95%
confidence interval estimate of the difference between the mean barley
yields between years 1931 and 1932.
Estimate the difference between the means of matched samples
using your textbook formula.
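For reference, one common textbook form of the matched-samples interval works on the per-field differences. The symbols d_i, \bar d and s_d are notation introduced here for illustration; they do not appear in the assignment text.

```latex
\bar d \;\pm\; t_{\alpha/2,\;n-1}\,\frac{s_d}{\sqrt{n}},
\qquad
d_i = Y1_i - Y2_i,\quad
\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i,\quad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(d_i-\bar d\bigr)^2}
```

With a 95% confidence level, \alpha = 0.05 and the critical value is taken from the t distribution with n - 1 degrees of freedom.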
library(MASS)        # load the MASS package, which provides immer
DT::datatable(immer) # display the data set as an interactive table
Answer 1
Hypothesis
H0: There is no difference in mean barley yield between 1931 and 1932 (μ1 = μ2).
H1: There is a difference in mean barley yield between 1931 and 1932 (μ1 ≠ μ2).
n <- nrow(immer)                     # number of fields (30)
rata1 <- mean(immer$Y1)              # mean yield, 1931
rata2 <- mean(immer$Y2)              # mean yield, 1932
sd1 <- sd(immer$Y1)                  # standard deviation, 1931
sd2 <- sd(immer$Y2)                  # standard deviation, 1932
sd3 <- sqrt(1/2*(sd1^2 + sd2^2))     # pooled standard deviation (equal sample sizes)
sddiff <- sd3*sqrt(2/n)              # standard error of the difference in means
df <- 2*n-2                          # degrees of freedom of the pooled two-sample test
t <- (rata1 - rata2)/sddiff          # t statistic
t
## [1] 2.319955
tcrit <- qt(0.025, df)               # lower critical value (negative)
tcrit1 <- c(tcrit, -tcrit)           # two-sided critical values
tcrit1
## [1] -2.001717  2.001717
From the two code chunks above, the pooled t statistic is 2.319955 and the non-rejection region is -2.001717 < t < 2.001717. Because 2.319955 > 2.001717, the statistic falls in the critical region, so H0 is rejected at the 5% significance level: the mean barley yields of 1931 and 1932 differ.
lower <- (rata1 - rata2) + tcrit * sddiff   # lower bound (tcrit is negative)
upper <- (rata1 - rata2) - tcrit * sddiff   # upper bound
CI <- c(lower, upper)                       # confidence interval
CI
## [1]  2.182895 29.643772
At the 95% confidence level, the interval for the difference in means is 2.182895 < μ1 - μ2 < 29.643772.
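As a cross-check on the manual calculation, the same pooled interval can be requested from R's built-in `t.test`. This is a minimal sketch: `var.equal = TRUE` selects the pooled-variance test, which treats the two years as independent samples, and with equal sample sizes it should agree with the hand-computed values above. The name `res` is chosen here for illustration.

```r
library(MASS)  # provides the immer data set

# Pooled (equal-variance) two-sample t test on the two yield columns
res <- t.test(immer$Y1, immer$Y2, var.equal = TRUE, conf.level = 0.95)

res$statistic  # t statistic
res$parameter  # degrees of freedom (2n - 2)
res$conf.int   # 95% confidence interval for mean(Y1) - mean(Y2)
```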
diff <- rata1 - rata2                # difference in means
diff
## [1] 15.91333
The difference between the mean yields of 1931 and 1932 is 15.91333.
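Because both yields come from the same fields, the exercise describes matched samples, and a paired analysis is a natural alternative to the pooled calculation. The sketch below computes the interval from the per-field differences and then repeats it with `t.test(..., paired = TRUE)`. The names `d`, `n_d`, `se_d`, `t_crit` and `ci_paired` are introduced here for illustration.

```r
library(MASS)  # provides the immer data set

d <- immer$Y1 - immer$Y2           # per-field yield difference, 1931 minus 1932
n_d <- length(d)                   # number of matched pairs
se_d <- sd(d) / sqrt(n_d)          # standard error of the mean difference
t_crit <- qt(0.975, df = n_d - 1)  # two-sided 95% critical value, n - 1 df

ci_paired <- mean(d) + c(-1, 1) * t_crit * se_d
ci_paired

# The same interval from the built-in paired t test
t.test(immer$Y1, immer$Y2, paired = TRUE)$conf.int
```

The point estimate is the same mean difference as before; only the standard error and the degrees of freedom change, because the pairing removes field-to-field variation.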
Exercise 2
In the data frame column mpg of the data set mtcars,
there are gas mileage data of various 1974 U.S. automobiles. Meanwhile,
another data column in mtcars, named am, indicates the
transmission type of the automobile model (0 = automatic, 1 = manual).
In particular, the gas mileage for manual and automatic transmissions
are two independent data populations. Assuming that the data in
mtcars follows the normal distribution, find the 95%
confidence interval estimate of the difference between the proportion
gas mileage of manual and automatic transmissions.
Estimate the difference between two population proportions using
your textbook formula.
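For reference, the usual large-sample (Wald) interval for the difference between two independent proportions is shown below. The hat notation is introduced here for illustration; \hat p_1 and \hat p_2 are the sample proportions and n_1, n_2 the corresponding group sizes.

```latex
(\hat p_1 - \hat p_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}
```

For a 95% confidence level, z_{\alpha/2} = z_{0.025} \approx 1.96.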
DT::datatable(mtcars) # display the data set as an interactive table
Answer 2
library(dplyr)
n <- nrow(mtcars)     # total number of cars (32)
n1 <- mtcars %>%      # number of automatic cars (am == 0)
  filter(am == 0) %>%
  nrow()
n2 <- mtcars %>%      # number of manual cars (am == 1)
  filter(am == 1) %>%
  nrow()
p1 <- n1/n                                    # proportion of automatic cars
p2 <- n2/n                                    # proportion of manual cars
SE <- sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))   # standard error of the difference
ME <- qnorm(0.975) * SE                       # margin of error at 95% confidence (z = 1.96)
SE
## [1] 0.1767767
ME
## [1] 0.346476
The standard error is 0.1767767 (17.68%) and the margin of error is 0.346476 (34.65%).
diffp <- p1-p2                                # difference in proportions
lb <- diffp - ME                              # lower bound
ub <- diffp + ME                              # upper bound
CI <- c(lb, ub)                               # confidence interval
CI
## [1] -0.158976  0.533976
At the 95% confidence level, the interval is -0.158976 < p1 - p2 < 0.533976.
diffp
## [1] 0.1875
The difference between the proportion of automatic models and the proportion of manual models is 0.1875, or 18.75%.
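The exercise statement mentions gas mileage, so an alternative reading is that the quantity of interest is the difference in mean mpg between the two transmission types rather than the share of each type. A minimal sketch under that assumption, using the Welch two-sample t test that `t.test` applies by default (the name `res` is chosen here for illustration):

```r
# Assumption: the target is mean mpg by transmission type,
# not the proportion of automatic versus manual cars.
res <- t.test(mpg ~ am, data = mtcars, conf.level = 0.95)

res$estimate  # mean mpg for am = 0 (automatic) and am = 1 (manual)
res$conf.int  # 95% confidence interval for mean(automatic) - mean(manual)
```

The formula interface orders the groups by the levels of `am`, so the interval refers to automatic minus manual.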
Exercise 3
In the built-in data set named quine, children from an Australian
town are classified by ethnic background, gender, age, learning status
and the number of days absent from school. In effect, the data frame
column Eth indicates whether the student is Aboriginal or Not (“A” or
“N”), and the column Sex indicates Male or Female (“M” or “F”). Assuming
that the data in quine follows the normal distribution, find the 95%
confidence interval estimate of the difference between the female
proportion of Aboriginal students and the female proportion of
Non-Aboriginal students, each within their own ethnic group.
In R, we can tally the student ethnicity against the gender with
the table function. As the result shows, within the Aboriginal student
population, 38 students are female. Whereas within the Non-Aboriginal
student population, 42 are female.
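The tally described above can be sketched as follows; `tab` is a name chosen here for illustration.

```r
library(MASS)  # provides the quine data set

tab <- table(quine$Eth, quine$Sex)  # rows: ethnicity (A, N); columns: sex (F, M)
tab

tab[, "F"]    # female counts within each ethnic group
rowSums(tab)  # group sizes, the denominators for the female proportions
```

Dividing the female counts by the row totals gives the female proportion within each ethnic group.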
Estimate the difference between two population proportions using
your textbook formula.
DT::datatable(quine) # display the data set as an interactive table
Answer 3
library(dplyr)
n <- nrow(quine)     # total number of students (146)
n1 <- quine %>%      # number of male Aboriginal students
  filter(Eth == "A", Sex == "M") %>%
  nrow()
n2 <- quine %>%      # number of male Non-Aboriginal students
  filter(Eth == "N", Sex == "M") %>%
  nrow()
n3 <- quine %>%      # number of female Aboriginal students
  filter(Eth == "A", Sex == "F") %>%
  nrow()
n4 <- quine %>%      # number of female Non-Aboriginal students
  filter(Eth == "N", Sex == "F") %>%
  nrow()
data.frame(Quine = c("A", "N", "Total"),
           M = c(n1, n2, n1+n2),
           F = c(n3, n4, n3+n4),
           Total = c(n1+n3, n2+n4, sum(n1,n2,n3,n4)))
p1 <- n3/(n1+n3)                                        # female proportion, Aboriginal students
p2 <- n4/(n2+n4)                                        # female proportion, Non-Aboriginal students
SE <- sqrt((p1*(1-p1)/(n1+n3)) + (p2*(1-p2)/(n2+n4)))   # standard error of the difference
ME <- qnorm(0.975) * SE                                 # margin of error at 95% confidence (z = 1.96)
SE
## [1] 0.08249739
ME
## [1] 0.1616919
The standard error is 0.08249739 (8.25%) and the margin of error is 0.1616919 (16.17%).
diffp <- p1-p2                                          # difference in proportions
lb <- diffp - ME                                        # lower bound
ub <- diffp + ME                                        # upper bound
CI <- c(lb, ub)                                         # confidence interval
CI
## [1] -0.1564218  0.1669620
At the 95% confidence level, the interval is -0.1564218 < p1 - p2 < 0.1669620.
diffp
## [1] 0.005270092
The difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students is 0.005270092, or 0.53%.
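The built-in `prop.test` offers a way to check a manual two-proportion interval. A minimal sketch: `correct = FALSE` switches off the continuity correction, so the interval it returns is the plain Wald interval with z = `qnorm(0.975)`. The names `tab` and `res` are chosen here for illustration.

```r
library(MASS)  # provides the quine data set

tab <- table(quine$Eth, quine$Sex)  # rows: A, N; columns: F, M

# Female counts and group sizes per ethnic group, without continuity correction
res <- prop.test(x = as.vector(tab[, "F"]),
                 n = as.vector(rowSums(tab)),
                 correct = FALSE, conf.level = 0.95)

res$estimate  # female proportion within each ethnic group (A first, then N)
res$conf.int  # 95% confidence interval for p(A) - p(N)
```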