dr.sc. Luka Šikić
16 December 2019
## [1] 50 60 60 64 66 66 67 69 70 74 76 76 77 79 79 79 81 82 82 89
## [1] 72.3
\[ \begin{array}{ll} H_0: & \mu = 67.5 \\ H_1: & \mu \neq 67.5 \end{array} \]
Graphical illustration of the null and alternative hypotheses assumed by the one-sample \(z\)-test (two-sided version). Both hypotheses assume that the population (the data) follows a normal distribution and that the standard deviation is known (\(\sigma_0\)). Under the null hypothesis the population mean \(\mu\) equals a value \(\mu_0\) specified in advance. Under the alternative hypothesis the population mean differs from that value, \(\mu \neq \mu_0\).
The solid line shows the theoretical distribution under the null hypothesis from which the sociology students' grades are assumed to have been “generated”.
\[ \bar{X} - \mu_0 \]
\[ X \sim \mbox{Normal}(\mu_0,\sigma^2) \]
\[ \mbox{SE}({\bar{X}}) = \frac{\sigma}{\sqrt{N}} \]
\[ \bar{X} \sim \mbox{Normal}(\mu_0,\mbox{SE}({\bar{X}})^2) \]
\[ z_{\bar{X}} = \frac{\bar{X} - \mu_0}{\mbox{SE}({\bar{X}})} \]
\[ z_{\bar{X}} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{N}} \]
\[ z_{\bar{X}} \sim \mbox{Normal}(0,1) \]
Critical regions for the two-sided \(z\)-test
Critical regions for the one-sided \(z\)-test
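The boundaries of these critical regions can be read off the standard normal quantile function. The following is a minimal sketch, assuming a significance level of \(\alpha = .05\); the level is an assumption made here, since the slides do not state it at this point.

```r
# Assumed significance level (not stated on the slide)
alpha <- 0.05

# Two-sided test: alpha is split equally between the two tails
crit.two.sided <- qnorm( p = 1 - alpha / 2 )
print( crit.two.sided )

# One-sided (upper-tailed) test: all of alpha sits in a single tail
crit.one.sided <- qnorm( p = 1 - alpha )
print( crit.one.sided )
```

With these values the two-sided test rejects the null when \(|z|\) exceeds roughly 1.96, and the upper-tailed test rejects when \(z\) exceeds roughly 1.64.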
# Store the sample mean of the grades in a variable
sample.mean <- mean( grades )
print( sample.mean ) # Inspect the value
## [1] 72.3
# Define the hypothesised population mean
mu.null <- 67.5
# Define the assumed (known) population standard deviation
sd.true <- 9.5
# Define the sample size
N <- length( grades )
print( N ) # Inspect the value
## [1] 20
# Define the standard error of the sampling distribution of the mean
sem.true <- sd.true / sqrt(N)
print(sem.true) # Inspect the value
## [1] 2.124265
# Store the test statistic in a variable
z.score <- (sample.mean - mu.null) / sem.true
print( z.score ) # Inspect the value
## [1] 2.259606
# Probability in the upper tail of the distribution
upper.area <- pnorm( q = z.score, lower.tail = FALSE )
print( upper.area ) # Inspect the value
## [1] 0.01192287
# Probability in the lower tail of the distribution
lower.area <- pnorm( q = -z.score, lower.tail = TRUE )
print( lower.area ) # Inspect the value
## [1] 0.01192287
# Two-sided p-value: the sum of the two tail areas
p.value <- lower.area + upper.area
print( p.value ) # Inspect the value
## [1] 0.02384574
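The individual steps of the \(z\)-test can be collected into one small function. This is only a sketch: `z.test.sketch` is a hypothetical helper written for illustration, not a function from any package, and the `grades` vector is typed in from the values printed at the top of these slides.

```r
# Grades as printed at the top of these slides
grades <- c( 50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
             76, 76, 77, 79, 79, 79, 81, 82, 82, 89 )

# Hypothetical helper (not part of any package): two-sided one-sample z-test
z.test.sketch <- function( x, mu, sigma ) {
  sem <- sigma / sqrt( length(x) )                    # standard error of the mean
  z   <- ( mean(x) - mu ) / sem                       # test statistic
  p   <- 2 * pnorm( q = abs(z), lower.tail = FALSE )  # area in both tails
  c( z = z, p.value = p )
}

result <- z.test.sketch( x = grades, mu = 67.5, sigma = 9.5 )
print( result )
```

The result should agree with the step-by-step calculation (a \(z\)-score of about 2.26 and a two-sided \(p\)-value of about 0.024).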
Graphical illustration of the null and alternative hypotheses for the \(t\)-test. Note the similarity to the \(z\)-test. Under the null hypothesis the population mean \(\mu\) equals a value \(\mu_0\) specified in advance; under the alternative it does not. As with the \(z\)-test we assume a normal distribution; the difference is that for the \(t\)-test we do not assume that the standard deviation \(\sigma\) is known in advance.
## [1] 9.520615
\[ t = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{N} } \]
- Show the distribution graphically
The \(t\) distribution with 2 degrees of freedom (left) and 10 degrees of freedom (right), with the standard normal distribution (mean 0, standard deviation 1) shown as a dashed line. The \(t\) distribution has heavier tails (higher kurtosis) than the normal distribution. The difference is pronounced for small degrees of freedom but negligible for larger ones: with many degrees of freedom the \(t\) distribution is almost identical to the normal distribution.
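The effect of the heavier tails can be made concrete by comparing cut-off values. The sketch below assumes a two-sided test at the 5% level and an arbitrary selection of degrees of freedom (2 and 10 as in the figure, 19 as in the grades example, and 100 as a "large" case).

```r
# 97.5% quantiles: the cut-off for a two-sided test at the 5% level
df     <- c( 2, 10, 19, 100 )
t.crit <- qt( p = 0.975, df = df )
z.crit <- qnorm( p = 0.975 )

# How far each t cut-off lies beyond the normal cut-off
comparison <- data.frame( df = df, t.crit = t.crit, excess = t.crit - z.crit )
print( comparison )
```

The `excess` column shrinks towards zero as the degrees of freedom grow, which is the numerical counterpart of the two curves becoming indistinguishable.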
##
## One sample t-test
##
## Data variable: grades
##
## Descriptive statistics:
## grades
## mean 72.300
## std dev. 9.521
##
## Hypotheses:
## null: population mean equals 67.5
## alternative: population mean not equal to 67.5
##
## Test results:
## t-statistic: 2.255
## degrees of freedom: 19
## p-value: 0.036
##
## Other information:
## two-sided 95% confidence interval: [67.844, 76.756]
## estimated effect size (Cohen's d): 0.504
\(t(19) = 2.25\), \(p<.05\), CI\(_{95} = [67.8, 76.8]\)
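The numbers in this report can be reproduced by hand from the \(t\) formula. A minimal sketch in base R, with the `grades` vector typed in from the values printed at the top of these slides:

```r
# Grades as printed at the top of these slides
grades <- c( 50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
             76, 76, 77, 79, 79, 79, 81, 82, 82, 89 )
mu.null <- 67.5

N       <- length( grades )
sem.hat <- sd( grades ) / sqrt( N )            # estimated standard error
t.stat  <- ( mean(grades) - mu.null ) / sem.hat
df      <- N - 1

# Two-sided p-value and 95% confidence interval for the mean
p.value <- 2 * pt( q = abs(t.stat), df = df, lower.tail = FALSE )
ci      <- mean( grades ) + c( -1, 1 ) * qt( p = 0.975, df = df ) * sem.hat

print( round( c( t = t.stat, df = df, p = p.value ), 3 ) )
print( round( ci, 3 ) )
```

The only change from the \(z\)-test is that `sd( grades )` replaces the assumed \(\sigma\) and `pt()` replaces `pnorm()`.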
## 'data.frame': 33 obs. of 2 variables:
## $ grade: num 65 72 66 74 73 71 66 76 69 79 ...
## $ tutor: Factor w/ 2 levels "Anastasia","Bernadette": 1 2 2 1 1 2 2 2 2 2 ...
## grade tutor
## 1 65 Anastasia
## 2 72 Bernadette
## 3 66 Bernadette
## 4 74 Anastasia
## 5 73 Anastasia
## 6 71 Bernadette
| | Mean | Std dev | N |
|---|---|---|---|
| Anastasia | 74.53 | 9.00 | 15 |
| Bernadette | 69.06 | 5.77 | 18 |
Histogram of the distribution of grades in Anastasia's class
Histogram of the distribution of grades in Bernadette's class
\[ \begin{array}{ll} H_0: & \mu_1 = \mu_2 \\ H_1: & \mu_1 \neq \mu_2 \end{array} \]
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}} \]
Graphical illustration of the null and alternative hypotheses assumed by Student's \(t\)-test. The null hypothesis assumes that both groups have the same mean (\(\mu_1 = \mu_2\)), whereas under the alternative hypothesis the means differ. Note the assumption that the population distributions are normal and have equal standard deviations.
\[ \mu_1 - \mu_2 = 0 \]
\[ \bar{X}_1 - \bar{X}_2 \]
\[ \begin{array}{rcl} w_1 &=& N_1 - 1\\ w_2 &=& N_2 - 1 \end{array} \]
\[ \hat\sigma^2_p = \frac{w_1 {\hat\sigma_1}^2 + w_2 {\hat\sigma_2}^2}{w_1 + w_2} \]
\[ \hat\sigma_p = \sqrt{\frac{w_1 {\hat\sigma_1}^2 + w_2 {\hat\sigma_2}^2}{w_1 + w_2}} \]
\[ X_{ik} - \bar{X}_k \]
\[ \frac{\sum_{ik} \left( X_{ik} - \bar{X}_k \right)^2}{N} \]
3. Apply the correction (denominator)
\[ \hat\sigma^2_p = \frac{\sum_{ik} \left( X_{ik} - \bar{X}_k \right)^2}{N -2} \]
\[ \mbox{SE}({\bar{X}_1 - \bar{X}_2}) = \hat\sigma_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} \]
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}({\bar{X}_1 - \bar{X}_2})} \]
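The pooled standard deviation, the standard error and the \(t\)-statistic can be traced through with the group summaries alone. This sketch uses the means and standard deviations as they are printed (rounded to three decimals) in the `independentSamplesTTest` output in these slides, so the result is only approximate.

```r
# Group summaries, rounded as printed in the test output
mean.1 <- 74.533; sd.1 <- 8.999; n.1 <- 15   # Anastasia
mean.2 <- 69.056; sd.2 <- 5.775; n.2 <- 18   # Bernadette

# Weights and pooled standard deviation
w.1 <- n.1 - 1
w.2 <- n.2 - 1
sd.pooled <- sqrt( ( w.1 * sd.1^2 + w.2 * sd.2^2 ) / ( w.1 + w.2 ) )

# Standard error of the difference and the t-statistic
se.diff <- sd.pooled * sqrt( 1 / n.1 + 1 / n.2 )
t.stat  <- ( mean.1 - mean.2 ) / se.diff
df      <- n.1 + n.2 - 2
p.value <- 2 * pt( q = abs(t.stat), df = df, lower.tail = FALSE )

print( round( c( sd.pooled = sd.pooled, t = t.stat, df = df, p = p.value ), 3 ) )
```

Note that the pooled estimate lies between the two group standard deviations and closer to Bernadette's, because her class is the larger one.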
## grade tutor
## 1 65 Anastasia
## 2 72 Bernadette
## 3 66 Bernadette
## 4 74 Anastasia
## 5 73 Anastasia
## 6 71 Bernadette
# Run the test
independentSamplesTTest(
  formula = grade ~ tutor, # Formula: outcome ~ grouping variable
  data = harpo, # Data
  var.equal = TRUE # Assume equal variances
)
##
## Student's independent samples t-test
##
## Outcome variable: grade
## Grouping variable: tutor
##
## Descriptive statistics:
## Anastasia Bernadette
## mean 74.533 69.056
## std dev. 8.999 5.775
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: 2.115
## degrees of freedom: 31
## p-value: 0.043
##
## Other information:
## two-sided 95% confidence interval: [0.197, 10.759]
## estimated effect size (Cohen's d): 0.74
\(t(31) = 2.1\), \(p<.05\), \(CI_{95} = [0.2, 10.8]\), \(d = .74\)
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}({\bar{X}_1 - \bar{X}_2})} \]
\[ \mbox{SE}({\bar{X}_1 - \bar{X}_2}) = \sqrt{ \frac{{\hat{\sigma}_1}^2}{N_1} + \frac{{\hat{\sigma}_2}^2}{N_2} } \]
\[ \mbox{df} = \frac{ ({\hat{\sigma}_1}^2 / N_1 + {\hat{\sigma}_2}^2 / N_2)^2 }{ ({\hat{\sigma}_1}^2 / N_1)^2 / (N_1 -1 ) + ({\hat{\sigma}_2}^2 / N_2)^2 / (N_2 -1 ) } \]
Graphical illustration of the null and alternative hypotheses assumed by the Welch \(t\)-test. As with Student's \(t\)-test we assume normally distributed populations, but note that the alternative hypothesis no longer requires the two populations to have equal variances.
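The Welch standard error and its fractional degrees of freedom follow directly from the two formulas above. As before, this is a sketch based on the rounded group summaries printed in the test output in these slides, so the figures match the output only approximately.

```r
# Group summaries, rounded as printed in the test output
mean.1 <- 74.533; sd.1 <- 8.999; n.1 <- 15   # Anastasia
mean.2 <- 69.056; sd.2 <- 5.775; n.2 <- 18   # Bernadette

# Per-group variance of the mean
v.1 <- sd.1^2 / n.1
v.2 <- sd.2^2 / n.2

# Welch standard error, t-statistic and degrees of freedom
se.diff <- sqrt( v.1 + v.2 )
t.stat  <- ( mean.1 - mean.2 ) / se.diff
df      <- ( v.1 + v.2 )^2 / ( v.1^2 / ( n.1 - 1 ) + v.2^2 / ( n.2 - 1 ) )
p.value <- 2 * pt( q = abs(t.stat), df = df, lower.tail = FALSE )

print( round( c( t = t.stat, df = df, p = p.value ), 3 ) )
```

The degrees of freedom come out at about 23 rather than 31, which is why the Welch test is slightly more conservative than Student's test on the same data.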
independentSamplesTTest(
  formula = grade ~ tutor, # Formula: outcome ~ grouping variable
  data = harpo # Data
)
##
## Welch's independent samples t-test
##
## Outcome variable: grade
## Grouping variable: tutor
##
## Descriptive statistics:
## Anastasia Bernadette
## mean 74.533 69.056
## std dev. 8.999 5.775
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: 2.034
## degrees of freedom: 23.025
## p-value: 0.054
##
## Other information:
## two-sided 95% confidence interval: [-0.092, 11.048]
## estimated effect size (Cohen's d): 0.724
## 'data.frame': 20 obs. of 3 variables:
## $ id : Factor w/ 20 levels "student1","student10",..: 1 12 14 15 16 17 18 19 20 2 ...
## $ grade_test1: num 42.9 51.8 71.7 51.6 63.5 58 59.8 50.8 62.5 61.9 ...
## $ grade_test2: num 44.6 54 72.3 53.4 63.8 59.3 60.8 51.6 64.3 63.2 ...
## id grade_test1 grade_test2
## 1 student1 42.9 44.6
## 2 student2 51.8 54.0
## 3 student3 71.7 72.3
## 4 student4 51.6 53.4
## 5 student5 63.5 63.8
## 6 student6 58.0 59.3
## vars n mean sd median trimmed mad min max range skew
## id* 1 20 10.50 5.92 10.5 10.50 7.41 1.0 20.0 19.0 0.00
## grade_test1 2 20 56.98 6.62 57.7 56.92 7.71 42.9 71.7 28.8 0.05
## grade_test2 3 20 58.38 6.41 59.7 58.35 6.45 44.6 72.3 27.7 -0.05
## kurtosis se
## id* -1.38 1.32
## grade_test1 -0.35 1.48
## grade_test2 -0.39 1.43
Mean grade for test 1 and test 2, with the associated 95% confidence intervals.
Scatterplot of the grades on the first test against the grades on the second test
# Create a vector of grade differences between the second and the first test
chico$improvement <- chico$grade_test2 - chico$grade_test1
head( chico ) # Inspect the data
## id grade_test1 grade_test2 improvement
## 1 student1 42.9 44.6 1.7
## 2 student2 51.8 54.0 2.2
## 3 student3 71.7 72.3 0.6
## 4 student4 51.6 53.4 1.8
## 5 student5 63.5 63.8 0.3
## 6 student6 58.0 59.3 1.3
Histogram of the individual improvements in grade between the first and the second test. Note that almost the entire distribution lies above 0: the vast majority of students improved their result on the second test.
## 2.5% 97.5%
## [1,] 0.9508686 1.859131
\[ D_{i} = X_{i1} - X_{i2} \]
\[ \begin{array}{ll} H_0: & \mu_D = 0 \\ H_1: & \mu_D \neq 0 \end{array} \]
\[ t = \frac{\bar{D}}{\mbox{SE}({\bar{D}})} \]
\[ t = \frac{\bar{D}}{\hat\sigma_D / \sqrt{N}} \]
##
## One sample t-test
##
## Data variable: chico$improvement
##
## Descriptive statistics:
## improvement
## mean 1.405
## std dev. 0.970
##
## Hypotheses:
## null: population mean equals 0
## alternative: population mean not equal to 0
##
## Test results:
## t-statistic: 6.475
## degrees of freedom: 19
## p-value: <.001
##
## Other information:
## two-sided 95% confidence interval: [0.951, 1.859]
## estimated effect size (Cohen's d): 1.448
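Because the paired test is just a one-sample test on the differences, its statistic can be checked from the summary of `improvement` alone. The sketch below uses the mean and standard deviation as printed above (rounded to three decimals), so the \(t\)-value agrees with the output only to about two decimals.

```r
# Summary of the difference scores, rounded as printed in the output
d.mean <- 1.405
d.sd   <- 0.970
n      <- 20

# Standard error of the mean difference and the t-statistic
se.d   <- d.sd / sqrt( n )
t.stat <- d.mean / se.d
df     <- n - 1

# 95% confidence interval for the mean improvement
ci <- d.mean + c( -1, 1 ) * qt( p = 0.975, df = df ) * se.d

print( round( c( t = t.stat, df = df ), 2 ) )
print( round( ci, 3 ) )
```

The average improvement is small in absolute terms, but the differences vary so little between students that the standard error is tiny and the \(t\)-statistic is large.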
pairedSamplesTTest(
  formula = ~ grade_test2 + grade_test1, # One-sided formula naming the two paired measurements
  data = chico # Data
)
##
## Paired samples t-test
##
## Variables: grade_test2 , grade_test1
##
## Descriptive statistics:
## grade_test2 grade_test1 difference
## mean 58.385 56.980 1.405
## std dev. 6.406 6.616 0.970
##
## Hypotheses:
## null: population means equal for both measurements
## alternative: different population means for each measurement
##
## Test results:
## t-statistic: 6.475
## degrees of freedom: 19
## p-value: <.001
##
## Other information:
## two-sided 95% confidence interval: [0.951, 1.859]
## estimated effect size (Cohen's d): 1.448
# Restructure the data from wide to long form
chico2 <- wideToLong( chico, within="time" )
head( chico2 ) # Inspect the data
## id improvement time grade
## 1 student1 1.7 test1 42.9
## 2 student2 2.2 test1 51.8
## 3 student3 0.6 test1 71.7
## 4 student4 1.8 test1 51.6
## 5 student5 0.3 test1 63.5
## 6 student6 1.3 test1 58.0
## id improvement time grade
## 1 student1 1.7 test1 42.9
## 21 student1 1.7 test2 44.6
## 10 student10 1.3 test1 61.9
## 30 student10 1.3 test2 63.2
## 11 student11 1.4 test1 50.4
## 31 student11 1.4 test2 51.8
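For readers without the `lsr` package, a similar wide-to-long restructuring can be sketched with base R's `reshape()`. This is an alternative technique, not what the slides use, and the data frame `chico.head` below contains only the six rows shown in the printed preview of `chico`.

```r
# First six rows of the wide data, copied from the printed preview
chico.head <- data.frame(
  id          = paste0( "student", 1:6 ),
  grade_test1 = c( 42.9, 51.8, 71.7, 51.6, 63.5, 58.0 ),
  grade_test2 = c( 44.6, 54.0, 72.3, 53.4, 63.8, 59.3 )
)

# Base R alternative to wideToLong(): stack the two grade columns
chico.long <- reshape(
  data      = chico.head,
  direction = "long",
  varying   = c( "grade_test1", "grade_test2" ), # columns to stack
  v.names   = "grade",                           # name of the stacked column
  timevar   = "time",                            # name of the within-subject factor
  times     = c( "test1", "test2" ),             # labels for the two occasions
  idvar     = "id"                               # identifies each student
)
print( chico.long )
```

Each student now occupies two rows, one per test, which is the layout that the `grade ~ time` formula expects.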
# Run the test
pairedSamplesTTest(
  formula = grade ~ time, # Formula: outcome ~ within-subject factor
  data = chico2, # Data
  id = "id" # Name of the ID variable
)
##
## Paired samples t-test
##
## Outcome variable: grade
## Grouping variable: time
## ID variable: id
##
## Descriptive statistics:
## test1 test2 difference
## mean 56.980 58.385 -1.405
## std dev. 6.616 6.406 0.970
##
## Hypotheses:
## null: population means equal for both measurements
## alternative: different population means for each measurement
##
## Test results:
## t-statistic: -6.475
## degrees of freedom: 19
## p-value: <.001
##
## Other information:
## two-sided 95% confidence interval: [-1.859, -0.951]
## estimated effect size (Cohen's d): 1.448
pairedSamplesTTest(
formula = grade ~ time + (id),
data = chico2
)
pairedSamplesTTest( grade ~ time + (id), chico2 )
library(psych)
library(lsr)
# Run a one-sided test
oneSampleTTest( x = grades,
  mu = 67.5,
  one.sided = "greater" # Alternative: population mean greater than mu
)
##
## One sample t-test
##
## Data variable: grades
##
## Descriptive statistics:
## grades
## mean 72.300
## std dev. 9.521
##
## Hypotheses:
## null: population mean less than or equal to 67.5
## alternative: population mean greater than 67.5
##
## Test results:
## t-statistic: 2.255
## degrees of freedom: 19
## p-value: 0.018
##
## Other information:
## one-sided 95% confidence interval: [68.619, Inf]
## estimated effect size (Cohen's d): 0.504
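The one-sided \(p\)-value above is exactly half of the two-sided one, because the observed effect lies in the predicted direction. A minimal sketch with base R's `t.test()`, with the `grades` vector typed in from the values printed at the top of these slides:

```r
# Grades as printed at the top of these slides
grades <- c( 50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
             76, 76, 77, 79, 79, 79, 81, 82, 82, 89 )

# Same data, same null value, two different alternatives
two.sided <- t.test( x = grades, mu = 67.5 )
one.sided <- t.test( x = grades, mu = 67.5, alternative = "greater" )

print( c( two.sided = two.sided$p.value, one.sided = one.sided$p.value ) )
```

Had the sample mean fallen below 67.5, the upper-tailed \(p\)-value would instead have been larger than 0.5, which is why the direction must be chosen before looking at the data.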
# Run a one-sided independent samples test
independentSamplesTTest(
  formula = grade ~ tutor,
  data = harpo,
  one.sided = "Anastasia"
)
##
## Welch's independent samples t-test
##
## Outcome variable: grade
## Grouping variable: tutor
##
## Descriptive statistics:
## Anastasia Bernadette
## mean 74.533 69.056
## std dev. 8.999 5.775
##
## Hypotheses:
## null: population means are equal, or smaller for group 'Anastasia'
## alternative: population mean is larger for group 'Anastasia'
##
## Test results:
## t-statistic: 2.034
## degrees of freedom: 23.025
## p-value: 0.027
##
## Other information:
## one-sided 95% confidence interval: [0.863, Inf]
## estimated effect size (Cohen's d): 0.724
# Run a one-sided paired samples test
pairedSamplesTTest(
  formula = ~ grade_test2 + grade_test1,
  data = chico,
  one.sided = "grade_test2"
)
##
## Paired samples t-test
##
## Variables: grade_test2 , grade_test1
##
## Descriptive statistics:
## grade_test2 grade_test1 difference
## mean 58.385 56.980 1.405
## std dev. 6.406 6.616 0.970
##
## Hypotheses:
## null: population means are equal, or smaller for measurement 'grade_test2'
## alternative: population mean is larger for measurement 'grade_test2'
##
## Test results:
## t-statistic: 6.475
## degrees of freedom: 19
## p-value: <.001
##
## Other information:
## one-sided 95% confidence interval: [1.03, Inf]
## estimated effect size (Cohen's d): 1.448
# Run the standard one-sample test of a mean
t.test( x = grades, # Data
  mu = 67.5 # Hypothesised mean
)
##
## One Sample t-test
##
## data: grades
## t = 2.2547, df = 19, p-value = 0.03615
## alternative hypothesis: true mean is not equal to 67.5
## 95 percent confidence interval:
## 67.84422 76.75578
## sample estimates:
## mean of x
## 72.3
# Run the test for independent samples
t.test( formula = grade ~ tutor, # Formula: outcome ~ grouping variable
  data = harpo ) # Data
##
## Welch Two Sample t-test
##
## data: grade by tutor
## t = 2.0342, df = 23.025, p-value = 0.05361
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09249349 11.04804904
## sample estimates:
## mean in group Anastasia mean in group Bernadette
## 74.53333 69.05556
# Run the test for paired samples
t.test( x = chico$grade_test2, # First measurement
  y = chico$grade_test1, # Second measurement
  paired = TRUE # Paired samples
)
##
## Paired t-test
##
## data: chico$grade_test2 and chico$grade_test1
## t = 6.4754, df = 19, p-value = 3.321e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9508686 1.8591314
## sample estimates:
## mean of the differences
## 1.405
\[ d = \frac{\mbox{(mean 1)} - \mbox{(mean 2)}}{\mbox{std dev}} \]
| \(d\)-value | rough interpretation |
|---|---|
| about 0.2 | small effect |
| about 0.5 | moderate effect |
| about 0.8 | large effect |
\[ d = \frac{\bar{X} - \mu_0}{\hat{\sigma}} \]
# Compute Cohen's d for one sample
cohensD( x = grades, # Data
  mu = 67.5 # Compare against a mean of 67.5
)
## [1] 0.5041691
## [1] 0.5041691
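The same value follows directly from the one-sample formula for \(d\). A short sketch in base R, with the `grades` vector typed in from the values printed at the top of these slides:

```r
# Grades as printed at the top of these slides
grades <- c( 50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
             76, 76, 77, 79, 79, 79, 81, 82, 82, 89 )

# Cohen's d: distance from the null value in standard deviation units
d.manual <- ( mean(grades) - 67.5 ) / sd( grades )
print( d.manual )

# The t-statistic is the same quantity scaled by sqrt(N)
t.from.d <- d.manual * sqrt( length(grades) )
print( t.from.d )
```

The second line makes the relationship explicit: \(t\) grows with the sample size, while \(d\) does not, which is why \(d\) is reported as a measure of effect size.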
\[ \delta = \frac{\mu_1 - \mu_2}{\sigma} \]
\[ d = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_p} \]
# Compute the effect size in R
cohensD( formula = grade ~ tutor, # Formula: outcome ~ grouping variable
  data = harpo, # Data
  method = "pooled" # Use the pooled standard deviation
)
## [1] 0.7395614
\[ \delta^\prime = \frac{\mu_1 - \mu_2}{\sigma^\prime} \]
\[ \sigma^\prime = \sqrt{\displaystyle{\frac{ {\sigma_1}^2 + {\sigma_2}^2}{2}}} \]
\[ d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\displaystyle{\frac{ {\hat\sigma_1}^2 + {\hat\sigma_2}^2}{2}}}} \]
# Compute the effect size in R
cohensD( formula = grade ~ tutor, # Formula: outcome ~ grouping variable
  data = harpo, # Data
  method = "unequal" # Do not assume equal variances
)
## [1] 0.7244995
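The two variants differ only in the denominator. The sketch below recomputes both from the rounded group summaries printed in the test output in these slides, so the values match `cohensD()` only to about three decimals.

```r
# Group summaries, rounded as printed in the test output
mean.1 <- 74.533; sd.1 <- 8.999; n.1 <- 15   # Anastasia
mean.2 <- 69.056; sd.2 <- 5.775; n.2 <- 18   # Bernadette

# "pooled": weight each variance by its degrees of freedom
sd.pooled <- sqrt( ( (n.1 - 1) * sd.1^2 + (n.2 - 1) * sd.2^2 ) / ( n.1 + n.2 - 2 ) )
d.pooled  <- ( mean.1 - mean.2 ) / sd.pooled

# "unequal": simple average of the two variances
sd.average <- sqrt( ( sd.1^2 + sd.2^2 ) / 2 )
d.unequal  <- ( mean.1 - mean.2 ) / sd.average

print( round( c( pooled = d.pooled, unequal = d.unequal ), 3 ) )
```

The unequal-variance version is slightly smaller here because the simple average gives more weight to Anastasia's larger variance than the pooled estimate does.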
\[ d = \frac{\bar{D}}{\hat{\sigma}_D} \]
# Compute the effect size in R
cohensD( x = chico$grade_test2, # First variable
  y = chico$grade_test1, # Second variable
  method = "paired" # Use the paired-samples method
)
## [1] 1.447952
Histogram of normally distributed data; the plot is based on a simulation of 100 observations.
## Normally distributed data
## Skew= -0.02936155
## Kurtosis= -0.06035938
##
## Shapiro-Wilk normality test
##
## data: data
## W = 0.99108, p-value = 0.7515
QQ plot of normally distributed data; the plot is based on a simulation of 100 observations.
Histogram of 100 observations of skewed data.
## Gamma distributed data
## Skew= 1.889475
## Kurtosis= 4.4396
##
## Shapiro-Wilk normality test
##
## data: data
## W = 0.81758, p-value = 8.908e-10
QQ plot of the skewed data; the plot is based on a simulation of 100 observations.
Histogram of 100 observations from a distribution with a lot of mass in the tails.
## Heavy-Tailed Data
## Skew= -0.05308273
## Kurtosis= 7.508765
##
## Shapiro-Wilk normality test
##
## data: data
## W = 0.83892, p-value = 4.718e-09
QQ plot of 100 observations from a distribution with a lot of mass in the tails.
normal.data <- rnorm( n = 100 ) # Generate 100 normally distributed numbers
hist( x = normal.data ) # Draw a histogram
\[ W = \frac{ \left( \sum_{i = 1}^N a_i X_i \right)^2 }{ \sum_{i = 1}^N (X_i - \bar{X})^2} \]
##
## Shapiro-Wilk normality test
##
## data: normal.data
## W = 0.98654, p-value = 0.4076
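Output of this kind comes from base R's `shapiro.test()`. The sketch below applies it to one normal and one skewed sample; the seed and the gamma parameters are arbitrary choices made here for illustration, so the numbers will differ from those shown on the slides.

```r
set.seed( 1 ) # Arbitrary seed, used only to make the sketch reproducible

# One normal sample and one right-skewed sample (assumed gamma parameters)
normal.data <- rnorm( n = 100 )
skewed.data <- rgamma( n = 100, shape = 1, scale = 2 )

# Shapiro-Wilk test: W close to 1 is consistent with normality
sw.normal <- shapiro.test( x = normal.data )
sw.skewed <- shapiro.test( x = skewed.data )

print( sw.normal )
print( sw.skewed )
```

A small \(p\)-value is evidence against normality; a large one does not prove the data are normal, it only means the test found no clear departure.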
## scores group
## 1 6.4 A
## 2 10.7 A
## 3 11.9 A
## 4 7.3 A
## 5 10.0 A
## 6 14.5 B
## 7 10.4 B
## 8 12.9 B
## 9 11.7 B
## 10 13.0 B
##
## Wilcoxon rank sum test
##
## data: scores by group
## W = 3, p-value = 0.05556
## alternative hypothesis: true location shift is not equal to 0
## [1] 6.4 10.7 11.9 7.3 10.0
## [1] 14.5 10.4 12.9 11.7 13.0
##
## Wilcoxon rank sum test
##
## data: score.A and score.B
## W = 3, p-value = 0.05556
## alternative hypothesis: true location shift is not equal to 0
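The statistic \(W = 3\) can be verified by counting. In this sketch every score in group A is compared with every score in group B, and \(W\) is the number of comparisons that group A wins; the two vectors are the ones printed above.

```r
# Scores for the two groups, as printed above
score.A <- c( 6.4, 10.7, 11.9, 7.3, 10.0 )
score.B <- c( 14.5, 10.4, 12.9, 11.7, 13.0 )

# 5 x 5 table of pairwise comparisons: TRUE where the A score is larger
wins <- outer( score.A, score.B, FUN = ">" )
print( wins )

# The test statistic is the number of TRUE entries
W <- sum( wins )
print( W )
```

Only 3 of the 25 comparisons favour group A, far fewer than the 12.5 expected if the groups did not differ, although with five observations per group this is not quite enough to reach significance at the 5% level.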
## before after change
## 1 30 6 -24
## 2 43 29 -14
## 3 21 11 -10
## 4 24 31 7
## 5 23 17 -6
## 6 40 2 -38
## 7 29 31 2
## 8 56 21 -35
## 9 38 8 -30
## 10 16 21 5
##
## Wilcoxon signed rank test
##
## data: happiness$change
## V = 7, p-value = 0.03711
## alternative hypothesis: true location is not equal to 0
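The statistic \(V = 7\) can also be computed by hand. The sketch below uses the usual definition of the signed rank statistic (rank the absolute changes, then add up the ranks that belong to positive changes); the `before` and `after` values are copied from the table above.

```r
# Happiness scores, as printed above
before <- c( 30, 43, 21, 24, 23, 40, 29, 56, 38, 16 )
after  <- c(  6, 29, 11, 31, 17,  2, 31, 21,  8, 21 )
change <- after - before

# Rank the changes by absolute size, ignoring their sign
ranks <- rank( abs( change ) )

# V is the sum of the ranks of the positive changes
V <- sum( ranks[ change > 0 ] )
print( V )
```

Only three changes are positive and they are also the three smallest or nearly smallest in size, so their ranks add up to very little compared with the maximum possible value of 55.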
# Wilcoxon test for paired samples
wilcox.test( x = happiness$after,
  y = happiness$before,
  paired = TRUE
)
##
## Wilcoxon signed rank test
##
## data: happiness$after and happiness$before
## V = 7, p-value = 0.03711
## alternative hypothesis: true location shift is not equal to 0