Statistical Computing

~ Assignment 4 ~

Contact	: \(\downarrow\)
Email	diyasaryanugroho@gmail.com
Instagram	https://www.instagram.com/diasary_nm/
RPubs	https://rpubs.com/diyasarya/

Exercise 1

In the built-in data set named immer, the barley yields of the same fields in 1931 and 1932 are recorded. The yield data are presented in the data frame columns Y1 and Y2. Assuming that the data in immer follow the normal distribution, find the 95% confidence interval estimate of the difference between the mean barley yields of 1931 and 1932.

Estimate the difference between the means of matched samples using your textbook formula.

library(MASS)                                         # load the MASS package, which provides immer
DT::datatable(immer)                                  # display the data set as an interactive table

Answer 1

Hypotheses
H0 : There is no difference between the mean yields of 1931 and 1932 (μ1 = μ2).
H1 : There is a difference between the mean yields of 1931 and 1932 (μ1 ≠ μ2).

n <- nrow(immer)                  # Number of observations per year (30)
rata1 <- mean(immer$Y1)           # Mean yield, 1931
rata2 <- mean(immer$Y2)           # Mean yield, 1932
sd1 <- sd(immer$Y1)               # Standard deviation, 1931
sd2 <- sd(immer$Y2)               # Standard deviation, 1932
sd3 <- sqrt(1/2*(sd1^2 + sd2^2))  # Pooled standard deviation (equal group sizes)
sddiff <- sd3*sqrt(2/n)           # Standard error of the difference in means
df <- 2*n-2                       # Degrees of freedom of the pooled test
t <- (rata1 - rata2)/sddiff       # t statistic
t

## [1] 2.319955

tcrit <- qt(0.025, df)            # Lower critical value (2.5% quantile of the t distribution)
tcrit1 <- c(tcrit, -tcrit)        # Lower and upper two-sided critical values
tcrit1

## [1] -2.001717  2.001717

From the two code chunks above, the pooled t statistic is 2.319955 and the non-rejection region is -2.001717 < t < 2.001717. Since 2.319955 > 2.001717, the statistic falls outside that region, so H0 is rejected at the 5% significance level: "There is a difference between the mean yields of 1931 and 1932".

lower <- (rata1 - rata2) + tcrit * sddiff       # Lower limit (tcrit is negative)
upper <- (rata1 - rata2) - tcrit * sddiff       # Upper limit
CI <- c(lower, upper)                           # Confidence interval
CI

## [1]  2.182895 29.643772

At the 95% confidence level, the confidence interval is 2.182895 < μ1 - μ2 < 29.643772.
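Written out, the quantities computed above correspond to the equal-variance (pooled) two-sample formulas for two groups of equal size n. The notation below is an assumption chosen to mirror the variable names rata1, rata2, sd1, sd2 and sd3 in the code:

```latex
\[
s_p = \sqrt{\tfrac{1}{2}\left(s_1^2 + s_2^2\right)}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{2/n}}, \qquad
\mathrm{df} = 2n - 2,
\]
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{0.975,\,\mathrm{df}}\; s_p\sqrt{2/n} .
\]
```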

diff <- rata1 - rata2                         # Mean Difference
diff

## [1] 15.91333

The difference between the mean yields of 1931 and 1932 is 15.91333.
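Because Y1 and Y2 are recorded on the same fields, the exercise describes matched samples, for which textbooks usually work with the per-field differences rather than a pooled standard deviation. The following is a minimal sketch of that paired interval at the same 95% level. The names d, n_d, se_d, t_crit and ci_paired are illustrative and not part of the answer above:

```r
library(MASS)                            # provides the immer data set

d      <- immer$Y1 - immer$Y2            # per-field difference, 1931 minus 1932
n_d    <- length(d)                      # number of matched pairs
se_d   <- sd(d) / sqrt(n_d)              # standard error of the mean difference
t_crit <- qt(0.975, df = n_d - 1)        # two-sided 95% critical value, n_d - 1 df
ci_paired <- mean(d) + c(-1, 1) * t_crit * se_d
ci_paired

# Cross-check against the built-in paired t test
t.test(immer$Y1, immer$Y2, paired = TRUE)$conf.int
```

The point estimate is the same mean difference as above; only the standard error and the degrees of freedom change, so the paired interval generally differs from the pooled one.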

Exercise 2

In the data frame column mpg of the data set mtcars, there are gas mileage data of various 1974 U.S. automobiles. Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual). In particular, the gas mileage for manual and automatic transmissions are two independent data populations. Assuming that the data in mtcars follows the normal distribution, find the 95% confidence interval estimate of the difference between the proportion gas mileage of manual and automatic transmissions.

Estimate the difference between two population proportions using your textbook formula.

DT::datatable(mtcars)                                 # display the data set as an interactive table

Answer 2

library(dplyr)
n <- nrow(mtcars)                             # Total number of car models (32)
n1 <- mtcars %>%                              # Number of automatic models
  filter(am == 0) %>%
  nrow()
n2 <- mtcars %>%                              # Number of manual models
  filter(am == 1) %>%
  nrow()
p1 <- n1/n                                    # Proportion of automatic models
p2 <- n2/n                                    # Proportion of manual models
SE <- sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))   # Standard error of the difference
ME <- qnorm(0.975) * SE                       # Margin of error (z is about 1.96 for 95%)
SE

## [1] 0.1767767

ME

## [1] 0.346476

The standard error is 0.1767767, or 17.68%, and the margin of error is 0.346476, or 34.65%.

diffp <- p1-p2                              # Difference in proportions
lb <- diffp - ME                            # Lower limit
ub <- diffp + ME                            # Upper limit
CI <- c(lb, ub)                             # Confidence interval
CI

## [1] -0.158976  0.533976

At the 95% confidence level, the confidence interval is -0.158976 < p1 - p2 < 0.533976.
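As a check on the reported standard error: with p1 = n1/n and p2 = n2/n we have 1 - p1 = p2, so both variance terms share the numerator p1 p2 and the expression collapses to a function of n alone. A short derivation under that definition, using n = n1 + n2 = 32:

```latex
\[
\mathrm{SE}^2
= \frac{p_1 p_2}{n_1} + \frac{p_1 p_2}{n_2}
= \frac{n_1 n_2}{n^2}\cdot\frac{n_1 + n_2}{n_1 n_2}
= \frac{1}{n},
\qquad
\mathrm{SE} = \frac{1}{\sqrt{32}} \approx 0.1767767 .
\]
```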

diffp

## [1] 0.1875

The difference between the proportion of automatic models and the proportion of manual models is 0.1875, or 18.75%.

Exercise 3

In the built-in data set named quine, children from an Australian town are classified by ethnic background, gender, age, learning status and the number of days absent from school. In effect, the data frame column Eth indicates whether the student is Aboriginal or Not (“A” or “N”), and the column Sex indicates Male or Female (“M” or “F”). Assuming that the data in quine follow the normal distribution, find the 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students, each within their own ethnic group.

In R, we can tally the student ethnicity against the gender with the table function. As the result shows, within the Aboriginal student population, 38 students are female. Whereas within the Non-Aboriginal student population, 42 are female.

Estimate the difference between two population proportions using your textbook formula.

DT::datatable(quine)                                  # display the data set as an interactive table

Answer 3

library(dplyr)
n <- nrow(quine)                              # Total number of students (146)
n1 <- quine %>%                               # Number of male Aboriginal students
  filter(Eth == "A", Sex == "M") %>%
  nrow()
n2 <- quine %>%                               # Number of male Non-Aboriginal students
  filter(Eth == "N", Sex == "M") %>%
  nrow()
n3 <- quine %>%                               # Number of female Aboriginal students
  filter(Eth == "A", Sex == "F") %>%
  nrow()
n4 <- quine %>%                               # Number of female Non-Aboriginal students
  filter(Eth == "N", Sex == "F") %>%
  nrow()
data.frame(Quine = c("A", "N", "Total"),
           M = c(n1, n2, n1+n2),
           F = c(n3, n4, n3+n4),
           Total = c(n1+n3, n2+n4, sum(n1,n2,n3,n4)))
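The exercise mentions tallying ethnicity against gender with the table function. A minimal sketch of that approach, which should give the same counts as the data frame above; tab is an illustrative name:

```r
library(MASS)                                    # provides the quine data set

tab <- table(Eth = quine$Eth, Sex = quine$Sex)   # cross-tabulate ethnicity by gender
addmargins(tab)                                  # append row and column totals
```

Individual cells can then be read by name, for example tab["A", "F"] for the number of female Aboriginal students.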

p1 <- n3/(n1+n3)                              # Proportion of females among Aboriginal students
p2 <- n4/(n2+n4)                              # Proportion of females among Non-Aboriginal students
SE <- sqrt((p1*(1-p1)/(n1+n3)) + (p2*(1-p2)/(n2+n4)))   # Standard error of the difference
ME <- qnorm(0.975) * SE                       # Margin of error (z is about 1.96 for 95%)
SE

## [1] 0.08249739

ME

## [1] 0.1616919

The standard error is 0.08249739, or 8.25%, and the margin of error is 0.1616919, or 16.17%.

diffp <- p1-p2                              # Difference in proportions
lb <- diffp - ME                            # Lower limit
ub <- diffp + ME                            # Upper limit
CI <- c(lb, ub)                             # Confidence interval
CI

## [1] -0.1564218  0.1669620

At the 95% confidence level, the confidence interval is -0.1564218 < p1 - p2 < 0.1669620.

diffp

## [1] 0.005270092

The difference between the female proportion among Aboriginal students and the female proportion among Non-Aboriginal students is 0.005270092, or 0.53%.
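As a cross-check, an interval of the same form can be obtained from base R's prop.test. This is a sketch and not part of the answer above: it assumes the first column of the table (F) is treated as the success count, and it turns off the continuity correction so that the result is comparable to the textbook formula with the normal quantile qnorm(0.975). The names tab and res are illustrative:

```r
library(MASS)                               # provides the quine data set

tab <- table(quine$Eth, quine$Sex)          # rows: A, N; columns: F, M
# With a two-column table, prop.test reads column 1 as successes (F)
# and column 2 as failures (M) within each row (ethnic group).
res <- prop.test(tab, correct = FALSE)      # no Yates continuity correction
res$estimate                                # female proportion in each group
res$conf.int                                # 95% interval for the difference
```

Setting correct = TRUE (the default) would widen the interval slightly, which is why it is disabled for the comparison.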