Statistical Analysis : Parameter Estimation and Hypothesis Testing

A retrospective study

A retrospective study is an observational study that enrolls participants who already have a disease or condition. In other words, all cases have already occurred before the study begins. Researchers then look back in time, using questionnaires, medical records, and other methods, to explore the existing data. The goal is to identify potential risk factors or other associations that the group has in common.

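library(dplyr)       # provides the %>% pipe
library(kableExtra)  # provides kbl() and kable_styling()

# 2x2 table: rows = smoking status, columns = study group
mat_smoke = matrix(c(688,21,650,59),2,2)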
colnames(mat_smoke) = c("cases","control")
rownames(mat_smoke) = c("yes", "no")

mat_smoke %>% kbl %>%  kable_styling()

	cases	control
yes	688	650
no	21	59

# x = number of cases among smokers and non-smokers; n = row totals
risk_diff_conf = prop.test(x = c(mat_smoke[1,1], mat_smoke[2,1]),
                           n = c(sum(mat_smoke[1,]), sum(mat_smoke[2,])))

risk_diff_conf$conf.int

## [1] 0.1450106 0.3583900
## attr(,"conf.level")
## [1] 0.95

We are 95% confident that the difference in the proportion of throat cancer cases between smokers and non-smokers lies within this interval.

Because the interval does not include 0, we conclude that the prevalence of throat cancer differs between smokers and non-smokers.
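
As a cross-check, the interval reported by prop.test can be reproduced by hand. The sketch below assumes the default settings of prop.test (a Wald interval for the difference of two proportions with Yates' continuity correction); the variable names are illustrative and not part of the original analysis.

```r
# Counts taken from the 2x2 table above
cases = c(yes = 688, no = 21)              # cases among smokers / non-smokers
total = c(yes = 688 + 650, no = 21 + 59)   # row totals

p_hat = cases / total                      # proportion of cases in each row
diff_p = unname(p_hat["yes"] - p_hat["no"])

# Standard error of the difference of two independent proportions
se = sqrt(sum(p_hat * (1 - p_hat) / total))

# Yates' continuity correction, which prop.test applies by default
yates = min(0.5, abs(diff_p) / sum(1 / total))
width = qnorm(0.975) * se + yates * sum(1 / total)

ci_manual = c(diff_p - width, diff_p + width)
ci_manual
```

The two values should agree with risk_diff_conf$conf.int up to rounding.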

Hamburger Meat : Normal Distribution

mu = 1
sd = 0.15

prob_1 = pnorm(1,mu,sd,lower.tail = FALSE)      # P(x > 1)
prob_2 = pnorm(1.05,mu,sd)-pnorm(0.95,mu,sd)    # P(0.95 < x < 1.05)
prob_3 = pnorm(.8,mu,sd)                        # P(x < 0.80)
prob_4 = pnorm(1.45,mu,sd,lower.tail = FALSE)   # P(x > 1.45)

res = rbind(c("p(x > 1)" ,prob_1 ),
            c("p(0.95 < x < 1.05)" ,prob_2 ),
            c("p(x < .80)" ,prob_3 ),
            c("p(x > 1.45), unusual because the probability is small" ,prob_4 ))
res %>%  kbl %>% kable_styling()

p(x > 1)	0.5
p(0.95 < x < 1.05)	0.261117319636473
p(x < .80)	0.0912112197258679
p(x > 1.45), unusual because the probability is small	0.00134989803163009
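
The same probabilities can be obtained by first standardizing to a z-score, z = (x - mu) / sd, and then using the standard normal distribution. A minimal sketch for the last row of the table follows; the names mu_x, sd_x, and z_score are illustrative.

```r
mu_x = 1      # mean used above
sd_x = 0.15   # standard deviation used above

# Convert a raw value to its z-score
z_score = function(x) (x - mu_x) / sd_x

# 1.45 lies three standard deviations above the mean
z = z_score(1.45)

p_direct = pnorm(1.45, mu_x, sd_x, lower.tail = FALSE)  # on the original scale
p_standard = pnorm(z, lower.tail = FALSE)               # on the z scale

c(z = z, direct = p_direct, standardized = p_standard)
```

Both calls give the same upper-tail probability, since pnorm with a mean and standard deviation standardizes internally.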

Whitefly : Binomial -> Normal Distribution

p = .1
n = 100

# part a: mean of the binomial distribution
mu = n*p

# part b: normal approximation, central 95% limits
sd = sqrt(n*p*(1-p))
b = qnorm(.975,mu,sd)
a = qnorm(.025,mu,sd)

What is the average number of fields sampled that are infested with whitefly?	10
Within what limits would you expect to find the number of infested fields, with probability approximately 95%?	[4.12,15.88]
What might you conclude if you found that x = 25 fields were infested? Is it possible that one of the characteristics of a binomial experiment is not satisfied in this experiment? Explain.	Because 25 lies outside the 95% interval, such an outcome is unlikely under this model. A possible cause is that infestation is not independent between one field and the fields near it.
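
To see how unusual x = 25 would be, the normal approximation can be compared with the exact binomial distribution. This is a sketch assuming the same n = 100 and p = 0.1 as above; the variable names are illustrative.

```r
n_fields = 100
p_infest = 0.1

mu_b = n_fields * p_infest                          # binomial mean
sd_b = sqrt(n_fields * p_infest * (1 - p_infest))   # binomial standard deviation

# How many standard deviations above the mean is x = 25?
z_25 = (25 - mu_b) / sd_b

# Exact binomial tail probability P(X >= 25)
p_exact = pbinom(24, n_fields, p_infest, lower.tail = FALSE)

# Exact central 95% limits, for comparison with the normal approximation
limits_exact = qbinom(c(.025, .975), n_fields, p_infest)

z_25
p_exact
limits_exact
```

A count five standard deviations above the mean is very unlikely if the fields really are independent with p = 0.1, which supports questioning the binomial assumptions.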

Hypothesis Testing : mean

kuota_ipb = c(71,25,26,99,94,89,66,66,90,53, 71,22,49,68,83,78,84,30,43,96)

# 90% confidence interval
t.test(kuota_ipb,conf.level = .9)$conf.int

## [1] 55.3627 74.9373
## attr(,"conf.level")
## [1] 0.9

# Hypothesis test: is the mean greater than 50?
t.test(kuota_ipb,alternative = "greater" ,mu = 50)

## 
##  One Sample t-test
## 
## data:  kuota_ipb
## t = 2.6766, df = 19, p-value = 0.007462
## alternative hypothesis: true mean is greater than 50
## 95 percent confidence interval:
##  55.3627     Inf
## sample estimates:
## mean of x 
##     65.15

\[ H_0 : \mu \le 50 \] \[ H_1 : \mu > 50 \]

Reject H0 because the p-value (0.007462) is below 0.05. There is sufficient evidence that the mean monthly internet quota used by IPB students is greater than 50 GB.
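
The test statistic and p-value reported by t.test can be recomputed directly from the formula t = (mean - mu_0) / (s / sqrt(n)). The following is a minimal sketch; the names mu_0, n_obs, se_mean, t_stat, and p_value are illustrative.

```r
kuota_ipb = c(71,25,26,99,94,89,66,66,90,53, 71,22,49,68,83,78,84,30,43,96)
mu_0 = 50                                     # hypothesized mean under H0

n_obs = length(kuota_ipb)
se_mean = sd(kuota_ipb) / sqrt(n_obs)         # standard error of the sample mean

t_stat = (mean(kuota_ipb) - mu_0) / se_mean   # one-sample t statistic

# Upper-tail p-value, matching alternative = "greater"
p_value = pt(t_stat, df = n_obs - 1, lower.tail = FALSE)

c(t = t_stat, df = n_obs - 1, p = p_value)
```

The upper tail is used because the alternative hypothesis is that the mean exceeds 50.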

Comparing book prices

site_a = c( 115, 79, 43, 140, 99, 30, 80, 99, 119, 69)
site_b = c( 110, 79, 40, 129, 99, 30, 69, 99, 109, 66)

The prices recorded for each site are for the same books, so the samples from site a and site b are dependent (paired).

res = rbind(c("Mean site a" , mean(site_a)),
            c("Mean site b" , mean(site_b)),
            c("mean of the difference scores" , mean(site_a - site_b)))

res %>%  kbl() %>%  kable_styling()

Mean site a	87.3
Mean site b	83
mean of the difference scores	4.3

The mean price difference between site a and site b for these ten books is 4.3.

result = t.test(site_a,site_b,conf.level = .9,paired = TRUE)

result$conf.int

## [1] 1.566674 7.033326
## attr(,"conf.level")
## [1] 0.9

With 90% confidence, the mean difference in book prices between the two websites lies in the interval [1.566674, 7.033326].
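
A paired t-test is equivalent to a one-sample t-test on the per-book differences, so the interval can also be built by hand. The sketch below assumes the same data as above; the names d, n_d, margin, and ci_manual are illustrative.

```r
site_a = c(115, 79, 43, 140, 99, 30, 80, 99, 119, 69)
site_b = c(110, 79, 40, 129, 99, 30, 69, 99, 109, 66)

d = site_a - site_b     # per-book price differences
n_d = length(d)

# 90% interval: mean difference +/- t quantile * standard error
margin = qt(0.95, df = n_d - 1) * sd(d) / sqrt(n_d)
ci_manual = c(mean(d) - margin, mean(d) + margin)

ci_manual
t.test(d, conf.level = .9)$conf.int   # one-sample test on the differences
```

Working with the differences removes the book-to-book variation in price, which is why the paired design is appropriate here.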