References

Basic Statistics in R:

Reproducible research in R:

\( \chi^2 \):

Outline

Estimation

# uniform distribution:
runiform <- round(runif(100, 0, 10), 0)
runiform[1:10]
##  [1]  7  1  4  4  8 10  1  7  7  4
mean(runiform)
## [1] 5.21

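# Draw n rounded uniform values on [0, 10] and return their mean.
# The first argument x receives the replication index from sapply() and is unused.
return_mean <- function(x, n) {
    runiform <- round(runif(n, 0, 10), 0)
    return(mean(runiform))
}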

1000 replications, sample size 10

replications <- 1:1000

unif.10 <- sapply(replications, return_mean, 10)
summary(unif.10)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.20    4.40    5.00    5.05    5.70    8.00 
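# freq = FALSE draws the histogram on the density scale so the density() curve is visible
hist(unif.10, freq = FALSE)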
lines(density(unif.10))

[Figure: histogram of unif.10 with its density curve]
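The following sketch checks this simulation against theory. Rounding the uniform draws gives a discrete distribution on 0..10: the values 0 and 10 each have probability 0.05, and 1..9 each have probability 0.1. That distribution has mean 5 and variance 8.5, so the standard error of a mean of 10 draws should be close to sqrt(8.5 / 10), about 0.92. The fixed seed and the replicate() formulation are our assumptions, not the source's.

```r
# Hypothetical check: spread of the simulated sample means vs. theory.
# round(runif(n, 0, 10), 0) yields 0 and 10 with probability 0.05 each
# and each of 1..9 with probability 0.1.
values <- 0:10
probs  <- c(0.05, rep(0.1, 9), 0.05)
mu     <- sum(values * probs)              # population mean, 5
sigma2 <- sum((values - mu)^2 * probs)     # population variance, 8.5

set.seed(1)  # assumption: fixed seed so the run is reproducible
# Same simulation as return_mean() with sapply(), written with replicate()
unif.10 <- replicate(1000, mean(round(runif(10, 0, 10), 0)))

# Simulated vs. theoretical standard error of the mean of 10 draws
c(simulated_sd = sd(unif.10), theoretical_sd = sqrt(sigma2 / 10))
```

The two numbers should agree closely. The interquartile range in the summary() output above (4.40 to 5.70) is consistent with a standard error of roughly 0.9.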

Statistical hypothesis testing

Type I and Type II errors (table from Wikipedia)

Decision                               H0 is true (truly not guilty)   H1 is true (truly guilty)
Accept null hypothesis (acquittal)     Right decision                  Wrong decision: Type II error
Reject null hypothesis (conviction)    Wrong decision: Type I error    Right decision
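The following simulation sketch makes the two error types concrete. It applies the goodness-of-fit test from the next section to a die rolled 150 times. When the die is fair (H0 true), the share of rejections at alpha = 0.05 estimates the Type I error rate, which should be close to alpha. For a loaded die (H1 true), the share of non-rejections estimates the Type II error rate. The loaded-die probabilities, the seed, the number of replications and the helper fair_die_p_value() are hypothetical choices made for illustration.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Hypothetical helper: roll a die 150 times with face probabilities p
# and return the p-value of the goodness-of-fit test against a fair die.
fair_die_p_value <- function(p) {
  counts <- tabulate(sample(1:6, 150, replace = TRUE, prob = p), nbins = 6)
  chisq.test(counts, p = rep(1/6, 6))$p.value
}

alpha <- 0.05

# H0 true: the die really is fair, so every rejection is a Type I error
p_h0 <- replicate(2000, fair_die_p_value(rep(1/6, 6)))
type1_rate <- mean(p_h0 < alpha)

# H1 true: an assumed loaded die, so every non-rejection is a Type II error
loaded <- c(rep(0.15, 5), 0.25)
p_h1 <- replicate(2000, fair_die_p_value(loaded))
type2_rate <- mean(p_h1 >= alpha)

c(type1 = type1_rate, type2 = type2_rate)
```

Lowering alpha reduces the Type I rate and raises the Type II rate for the same sample size. This is the trade-off the table describes.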

Pearson's goodness-of-fit test (Pearson's \( \chi^2 \) test)

Types:

Goodness of fit (example from Simple R)

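\[ \chi^2 = \sum \limits_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i} \]

Here \( O_i \) is the observed count in category \( i \), \( E_i \) is the count expected under \( H_0 \) (for \( N \) observations with hypothesized probabilities \( p_i \), \( E_i = N p_i \)), and \( n \) is the number of categories. Under \( H_0 \) the statistic approximately follows a \( \chi^2 \) distribution with \( n - 1 \) degrees of freedom.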

freq = c(22, 21, 22, 27, 22, 36)
# specify probabilities (uniform, as here, is the default anyway)
probs = c(1, 1, 1, 1, 1, 1)/6  # or use rep(1/6,6)
chisq.test(freq, p = probs)
## 
##  Chi-squared test for given probabilities
## 
## data:  freq 
## X-squared = 6.72, df = 5, p-value = 0.2423
## 
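As a worked check of the formula, the sketch below computes the same test by hand. The 150 rolls give \( E_i = 150 \cdot 1/6 = 25 \) for every face. The squared deviations sum to 168, so \( \chi^2 = 168/25 = 6.72 \) with \( 6 - 1 = 5 \) degrees of freedom. The variable names are ours.

```r
freq  <- c(22, 21, 22, 27, 22, 36)
probs <- rep(1/6, 6)

expected <- sum(freq) * probs                  # E_i = N * p_i = 25 for every face
x2 <- sum((freq - expected)^2 / expected)      # Pearson statistic: 6.72
df <- length(freq) - 1                         # 6 categories -> 5 degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)  # upper-tail probability

c(statistic = x2, df = df, p.value = p_value)
```

The p-value matches the 0.2423 reported by chisq.test().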

Test for independence
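For an \( r \times c \) contingency table, the test of independence uses the same statistic. The expected counts are built from the margins under the hypothesis that rows and columns are independent, where \( R_i \) are the row totals, \( C_j \) the column totals and \( N \) the grand total:

\[ E_{ij} = \frac{R_i C_j}{N}, \qquad \chi^2 = \sum \limits_{i=1}^{r}\sum \limits_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1) \]

For the 5 by 5 table in Problem 1 this gives \( df = 4 \cdot 4 = 16 \), which matches the chisq.test() output there.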

The problems are taken from Homework 2 of E. Ponarin's course.

Problem 1

Code chunk from the problem statement (probably by Prof. Ponarin):

freq <- c(50, 45, 8, 18, 8, 28, 174, 84, 154, 55, 11, 78, 110, 223, 
    96, 14, 150, 185, 714, 447, 0, 42, 72, 320, 411)
fo <- gl(5, 5, 25, ordered = TRUE)
fo
##  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
## Levels: 1 < 2 < 3 < 4 < 5
so <- gl(5, 1, 25, ordered = TRUE)
so
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## Levels: 1 < 2 < 3 < 4 < 5
TabMod <- xtabs(freq ~ fo + so)
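gl(5, 5, 25) repeats each level five times, giving the row index. gl(5, 1, 25) cycles through the levels, giving the column index. As a result, xtabs() fills the 5 by 5 table row by row from freq. The sketch below confirms this against matrix(..., byrow = TRUE) and prints the margins that the expected counts are built from. The comparison matrix m is our own illustration.

```r
freq <- c(50, 45, 8, 18, 8, 28, 174, 84, 154, 55, 11, 78, 110, 223,
    96, 14, 150, 185, 714, 447, 0, 42, 72, 320, 411)
fo <- gl(5, 5, 25, ordered = TRUE)   # 1,1,1,1,1,2,2,... : row index
so <- gl(5, 1, 25, ordered = TRUE)   # 1,2,3,4,5,1,2,... : column index
TabMod <- xtabs(freq ~ fo + so)

# The same table built directly: freq filled in row by row
m <- matrix(freq, nrow = 5, byrow = TRUE)
all(as.vector(TabMod) == as.vector(m))

# Row and column totals (used for the expected counts)
addmargins(TabMod)
```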

Hint plots for Problem 1 (compare them with the conclusion for the problem):

library(vcd)
mosaic(TabMod, shade = TRUE)

[Figure: shaded mosaic plot of TabMod]

chisq.test(TabMod)
## 
##  Pearson's Chi-squared test
## 
## data:  TabMod 
## X-squared = 1199, df = 16, p-value < 2.2e-16
## 

Expected values:

tm.exp <- chisq.test(TabMod)$expected
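The expected counts are simply \( R_i C_j / N \), the outer product of the row and column totals divided by the grand total. The sketch below rebuilds them by hand and compares them with chisq.test(). Note that chisq.test() emits a "Chi-squared approximation may be incorrect" warning for this table because the expected count in cell [1, 1] (129 * 103 / 3497, about 3.8) is below 5. The sketch suppresses that warning, which is our choice.

```r
freq <- c(50, 45, 8, 18, 8, 28, 174, 84, 154, 55, 11, 78, 110, 223,
    96, 14, 150, 185, 714, 447, 0, 42, 72, 320, 411)
fo <- gl(5, 5, 25, ordered = TRUE)
so <- gl(5, 1, 25, ordered = TRUE)
TabMod <- xtabs(freq ~ fo + so)

# E_ij = R_i * C_j / N
N <- sum(TabMod)
expected_manual <- outer(rowSums(TabMod), colSums(TabMod)) / N

# chisq.test() warns here because one expected count is below 5
expected_chisq <- suppressWarnings(chisq.test(TabMod))$expected

round(expected_manual, 2)
all.equal(expected_manual, expected_chisq, check.attributes = FALSE)
```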

Table of raw residuals, observed minus expected (compare with the plots and the conclusion):

round((TabMod - tm.exp), 2)
##    so
## fo       1      2      3      4      5
##   1  46.20  26.96  -8.93 -34.71 -29.52
##   2  13.42 104.78  19.03 -48.27 -88.96
##   3  -4.26   5.57  42.01  11.33 -54.65
##   4 -30.48 -61.15 -13.20  96.96   7.86
##   5 -24.89 -76.16 -38.91 -25.30 165.26
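Raw differences grow with the size of a cell, so the large cells dominate the table above. The sketch below computes Pearson residuals, \( (O_{ij} - E_{ij})/\sqrt{E_{ij}} \), which put all cells on a comparable scale; their squares sum to the \( \chi^2 \) statistic. As far as we understand the vcd documentation, mosaic(shade = TRUE) also shades by Pearson residuals by default. chisq.test() already returns these residuals as $residuals, and standardized residuals as $stdres.

```r
freq <- c(50, 45, 8, 18, 8, 28, 174, 84, 154, 55, 11, 78, 110, 223,
    96, 14, 150, 185, 714, 447, 0, 42, 72, 320, 411)
fo <- gl(5, 5, 25, ordered = TRUE)
so <- gl(5, 1, 25, ordered = TRUE)
TabMod <- xtabs(freq ~ fo + so)

# Suppress the small-expected-count warning (cell [1, 1] has E < 5)
ct <- suppressWarnings(chisq.test(TabMod))

# Pearson residuals by hand: (O - E) / sqrt(E)
pearson <- (TabMod - ct$expected) / sqrt(ct$expected)
round(pearson, 2)

# They match chisq.test()'s own residuals, and their squares sum to X-squared
all.equal(as.vector(pearson), as.vector(ct$residuals))
sum(pearson^2)
```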

Problem 2

Code chunk from the problem statement (probably by Prof. Ponarin):

Journals <- matrix(c(61, 63, 20, 59, 42, 3, 47, 41), ncol = 4, byrow = TRUE)
Journals
##      [,1] [,2] [,3] [,4]
## [1,]   61   63   20   59
## [2,]   42    3   47   41
row.names(Journals) <- LETTERS[1:2]
Journals
##   [,1] [,2] [,3] [,4]
## A   61   63   20   59
## B   42    3   47   41
colnames(Journals) <- c("Neither", "A only", "B only", "Both")
Journals
##   Neither A only B only Both
## A      61     63     20   59
## B      42      3     47   41
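The code for the hint plots below is not shown. The following base-R sketch applies the same test of independence to the Journals table and prints the expected counts and Pearson residuals, so they can be compared with the plots. It does not reproduce those plots, and the object name jt is ours.

```r
Journals <- matrix(c(61, 63, 20, 59, 42, 3, 47, 41), ncol = 4, byrow = TRUE)
rownames(Journals) <- LETTERS[1:2]
colnames(Journals) <- c("Neither", "A only", "B only", "Both")

# Pearson's chi-squared test of independence on the 2 x 4 table
jt <- chisq.test(Journals)
jt

# Expected counts under independence, and Pearson residuals (O - E) / sqrt(E)
round(jt$expected, 2)
round(jt$residuals, 2)
```

All expected counts in this table are well above 5, so chisq.test() gives no small-count warning here.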

Hint plots:

[Figure: first hint plot for Problem 2]

[Figure: second hint plot for Problem 2]