Hypothesis Testing Exercises

———

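Before testing a mean with t.test(), check the data for normality (and, in the two-sample case, for homogeneity of variance). In this exercise the null hypothesis H0 is that the platelet count of painters does not differ from that of normal men, i.e. mu = 225; the alternative is H1: mu != 225.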

x <- c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231, 
    256, 183, 190, 158, 224, 175)
# Normality test:
shapiro.test(x)  # p-value = 0.3768 > 0.05: cannot reject H0, so x is treated as normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.9506, p-value = 0.3768

t.test(x, mu = 225)

## 
##  One Sample t-test
## 
## data:  x
## t = -3.4783, df = 19, p-value = 0.002516
## alternative hypothesis: true mean is not equal to 225
## 95 percent confidence interval:
##  172.3827 211.9173
## sample estimates:
## mean of x 
##    192.15

p-value = 0.002516 < 0.05, so we reject the null hypothesis: the platelet count of painters differs from that of normal men.
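To make the reported statistic less of a black box, the one-sample t statistic and its two-sided p-value can be recomputed from their definitions. This is a minimal sketch; the names `mu0`, `t_manual` and `p_manual` are introduced here for illustration only.

```r
# Platelet counts from the exercise
x <- c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231,
       256, 183, 190, 158, 224, 175)
mu0 <- 225                                        # mean under H0
n <- length(x)

t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # t = (xbar - mu0) / (s / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)    # two-sided p-value, n - 1 df

fit <- t.test(x, mu = mu0)
c(manual = t_manual, t.test = unname(fit$statistic))
c(manual = p_manual, t.test = fit$p.value)
```

Both pairs should agree up to floating-point error.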

———

x <- c(1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948)
1 - pnorm(1000, mean(x), sd(x))

## [1] 0.4912059

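The estimated probability of exceeding 1000 hours is 0.4912059, assuming a normal distribution with the sample mean and sample standard deviation.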
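The same upper-tail probability can be obtained without the subtraction, either through the `lower.tail` argument of pnorm() or by standardising first. A short sketch under the same normality assumption; `p_upper`, `z` and `p_z` are illustrative names.

```r
x <- c(1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948)

# Upper tail directly: P(X > 1000) for X ~ N(mean(x), sd(x)^2)
p_upper <- pnorm(1000, mean = mean(x), sd = sd(x), lower.tail = FALSE)

# Equivalent route: standardise, then use the standard normal
z <- (1000 - mean(x)) / sd(x)
p_z <- pnorm(z, lower.tail = FALSE)

c(p_upper = p_upper, p_z = p_z)
```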

———

Mean test for paired data:

x <- c(113, 120, 138, 120, 100, 118, 138, 123)
y <- c(138, 116, 125, 136, 110, 132, 130, 110)
t.test(x, y, paired = TRUE)

## 
##  Paired t-test
## 
## data:  x and y
## t = -0.6513, df = 7, p-value = 0.5357
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -15.628891   8.878891
## sample estimates:
## mean of the differences 
##                  -3.375

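p-value = 0.5357 > 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the two methods differ in efficacy.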
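A paired t-test is a one-sample t-test on the within-pair differences. The sketch below checks that equivalence on the same data; the variable names `d`, `paired` and `one_sample` are illustrative.

```r
x <- c(113, 120, 138, 120, 100, 118, 138, 123)
y <- c(138, 116, 125, 136, 110, 132, 130, 110)

d <- x - y                            # within-pair differences
paired <- t.test(x, y, paired = TRUE)
one_sample <- t.test(d, mu = 0)       # H0: mean difference is 0

c(paired = paired$p.value, one_sample = one_sample$p.value)
```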

———

x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4, 
    4.5, 4.6, 2.5, 6, -1.4)
y <- c(3.7, 6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2, 
    1.2, 3.1, 1.7, -2)

(1) Test whether the data are normally distributed:

shapiro.test(x)  # p-value = 0.7527 > 0.05: cannot reject H0, so x is treated as normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.9699, p-value = 0.7527

shapiro.test(y)  # p-value = 0.7754 > 0.05: y is treated as normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.971, p-value = 0.7754

ks.test(x, "pnorm", mean(x), sd(x))

## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  x
## D = 0.1065, p-value = 0.9771
## alternative hypothesis: two-sided

ks.test(y, "pnorm", mean(y), sd(y))

## Warning in ks.test(y, "pnorm", mean(y), sd(y)): ties should not be present
## for the Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D = 0.1197, p-value = 0.9368
## alternative hypothesis: two-sided

sort(x)

##  [1] -5.6 -1.6 -1.4 -0.7 -0.5  0.4  0.7  1.7  2.0  2.5  2.5  2.8  3.0  3.5
## [15]  4.0  4.5  4.6  5.8  6.0  7.1

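x1 <- table(cut(x, breaks = c(-6, -3, 0, 3, 6, 9)))   # observed counts per interval
p <- pnorm(c(-3, 0, 3, 6, 9), mean(x), sd(x))         # fitted normal CDF at the break points
p <- c(p[1], p[2] - p[1], p[3] - p[2], p[4] - p[3], 1 - p[4])   # cell probabilities
# Note: the second positional argument of chisq.test() is `y`, not `p`, so this
# call runs a contingency-table test; a goodness-of-fit test needs chisq.test(x1, p = p).
chisq.test(x1, p)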

## Warning in chisq.test(x1, p): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  x1 and p
## X-squared = 15, df = 12, p-value = 0.2414
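For a goodness-of-fit check against the fitted normal, the cell probabilities have to be passed by name as `p =`. The following is a minimal sketch of that call, using open-ended outer cells so the probabilities sum to 1; `breaks`, `obs` and `gof` are illustrative names, and the warning about small expected counts is suppressed only to keep the output short.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)

breaks <- c(-Inf, -3, 0, 3, 6, Inf)
obs <- table(cut(x, breaks = breaks))                     # observed counts per cell
p <- diff(pnorm(breaks, mean = mean(x), sd = sd(x)))      # cell probabilities under the fitted normal

# Goodness-of-fit test: note the named argument p =
gof <- suppressWarnings(chisq.test(as.vector(obs), p = p))
gof
```

chisq.test() reports k - 1 degrees of freedom here and does not account for the two parameters estimated from the data, so the printed p-value should be read with that caveat.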

(2) Mean tests:

t.test(x, y, var.equal = TRUE)  # equal variances

## 
##  Two Sample t-test
## 
## data:  x and y
## t = -0.6419, df = 38, p-value = 0.5248
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.326179  1.206179
## sample estimates:
## mean of x mean of y 
##     2.065     2.625

t.test(x, y, var.equal = FALSE)  # unequal variances

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -0.6419, df = 36.086, p-value = 0.525
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.32926  1.20926
## sample estimates:
## mean of x mean of y 
##     2.065     2.625

t.test(x, y, paired = TRUE)  # paired data

## 
##  Paired t-test
## 
## data:  x and y
## t = -0.6464, df = 19, p-value = 0.5257
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.373146  1.253146
## sample estimates:
## mean of the differences 
##                   -0.56

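All three p-values are greater than 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the two group means differ.

(3) Variance test: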

var.test(x, y)

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 1.5984, num df = 19, denom df = 19, p-value = 0.3153
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6326505 4.0381795
## sample estimates:
## ratio of variances 
##           1.598361

p-value = 0.3153 > 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the two variances differ.
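The F statistic is simply the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail of the F distribution. A sketch that recomputes both; `f_manual`, `df1`, `df2` and `p_manual` are illustrative names.

```r
x <- c(-0.7, -5.6, 2, 2.8, 0.7, 3.5, 4, 5.8, 7.1, -0.5, 2.5, -1.6, 1.7, 3, 0.4,
       4.5, 4.6, 2.5, 6, -1.4)
y <- c(3.7, 6.5, 5, 5.2, 0.8, 0.2, 0.6, 3.4, 6.6, -1.1, 6, 3.8, 2, 1.6, 2, 2.2,
       1.2, 3.1, 1.7, -2)

f_manual <- var(x) / var(y)               # ratio of sample variances
df1 <- length(x) - 1
df2 <- length(y) - 1
lower <- pf(f_manual, df1, df2)           # P(F <= f) under H0
p_manual <- 2 * min(lower, 1 - lower)     # two-sided p-value

fit <- var.test(x, y)
c(manual = f_manual, var.test = unname(fit$statistic))
c(manual = p_manual, var.test = fit$p.value)
```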

———

(1) Test whether the data are normally distributed:

x <- c(126, 125, 136, 128, 123, 138, 142, 116, 110, 108, 115, 140)
y <- c(162, 172, 177, 170, 175, 152, 157, 159, 160, 162)
shapiro.test(x)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.9396, p-value = 0.4934

shapiro.test(y)

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.938, p-value = 0.5313

ks.test(x, "pnorm", mean(x), sd(x))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  x
## D = 0.1464, p-value = 0.9266
## alternative hypothesis: two-sided

ks.test(y, "pnorm", mean(y), sd(y))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D = 0.2222, p-value = 0.707
## alternative hypothesis: two-sided

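All p-values are greater than 0.05, so we fail to reject the null hypothesis and treat both samples as normally distributed.

(2) Homogeneity-of-variance test: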

var.test(x, y)

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 1.9646, num df = 11, denom df = 9, p-value = 0.32
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5021943 7.0488630
## sample estimates:
## ratio of variances 
##           1.964622

p-value = 0.32 > 0.05, so we fail to reject the null hypothesis and treat the two variances as equal.

(3) Two-sample mean test:

t.test(x, y, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = -8.8148, df = 20, p-value = 2.524e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -48.24975 -29.78358
## sample estimates:
## mean of x mean of y 
##  125.5833  164.6000

p-value = 2.524e-08 < 0.05, so we reject the null hypothesis: antithrombin activity differs between the new-drug group and the control group.
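With `var.equal = TRUE` the test uses a pooled variance estimate. As a sketch of what that means, the statistic can be rebuilt by hand; `sp2`, `t_manual` and `p_manual` are names chosen here for illustration.

```r
x <- c(126, 125, 136, 128, 123, 138, 142, 116, 110, 108, 115, 140)
y <- c(162, 172, 177, 170, 175, 152, 157, 159, 160, 162)
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

fit <- t.test(x, y, var.equal = TRUE)
c(manual = t_manual, t.test = unname(fit$statistic))
```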

———

The problem calls for a hypothesis test on a binomial proportion. Null hypothesis H0: the proportion of elderly people is 14.7%.

binom.test(57, 400, p = 0.147)

## 
##  Exact binomial test
## 
## data:  57 and 400
## number of successes = 57, number of trials = 400, p-value = 0.8876
## alternative hypothesis: true probability of success is not equal to 0.147
## 95 percent confidence interval:
##  0.1097477 0.1806511
## sample estimates:
## probability of success 
##                 0.1425

p-value = 0.8876 > 0.05, so we fail to reject the null hypothesis: the survey result is consistent with an elderly proportion of 14.7%.

———

Null hypothesis H0: proportion of female chicks <= 0.5; alternative H1: proportion of female chicks > 0.5.

binom.test(178, 328, p = 0.5, alternative = "greater")

## 
##  Exact binomial test
## 
## data:  178 and 328
## number of successes = 178, number of trials = 328, p-value =
## 0.06794
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.4957616 1.0000000
## sample estimates:
## probability of success 
##              0.5426829

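p-value = 0.06794 > 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the treatment increases the proportion of female chicks.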
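For the one-sided alternative "greater", the exact p-value is the binomial upper tail P(X >= 178) under p = 0.5. A minimal sketch that computes it directly with pbinom(); `p_manual` is an illustrative name.

```r
successes <- 178
trials <- 328

# P(X >= 178) = P(X > 177) for X ~ Binomial(328, 0.5)
p_manual <- pbinom(successes - 1, size = trials, prob = 0.5, lower.tail = FALSE)

fit <- binom.test(successes, trials, p = 0.5, alternative = "greater")
c(manual = p_manual, binom.test = fit$p.value)
```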

———

To test whether the data follow a theoretical distribution, use chisq.test():

chisq.test(c(315, 101, 108, 32), p = c(9, 3, 3, 1)/16)

## 
##  Chi-squared test for given probabilities
## 
## data:  c(315, 101, 108, 32)
## X-squared = 0.47, df = 3, p-value = 0.9254

p-value = 0.9254 > 0.05, so we fail to reject the null hypothesis: the results are consistent with the law of independent assortment.
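Pearson's statistic sums (observed - expected)^2 / expected over the cells. The sketch below recomputes it for the 9:3:3:1 ratio; `obs`, `p0`, `expd` and `stat` are illustrative names.

```r
obs <- c(315, 101, 108, 32)          # observed counts
p0 <- c(9, 3, 3, 1) / 16             # theoretical proportions
expd <- sum(obs) * p0                # expected counts under H0

stat <- sum((obs - expd)^2 / expd)   # Pearson chi-squared statistic
p_val <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)

fit <- chisq.test(obs, p = p0)
c(manual = stat, chisq.test = unname(fit$statistic))
```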

———

Apply Pearson's chi-squared test:

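y <- c(92, 68, 28, 11, 1, 0)        # observed frequencies
x <- 0:5                            # corresponding counts
q <- ppois(x, mean(rep(x, y)))      # Poisson CDF at the sample mean
n <- length(y)
p <- numeric(n)                     # start from a fresh probability vector
p[1] <- q[1]
p[n] <- 1 - q[n - 1]                # last cell absorbs the upper tail
for (i in 2:(n - 1)) p[i] <- q[i] - q[i - 1]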
p

## [1] 0.447087927 0.359905781 0.144862077 0.038871324 0.007822854 0.001450038

chisq.test(y, p = p)

## 
##  Chi-squared test for given probabilities
## 
## data:  y
## X-squared = 2.1596, df = 5, p-value = 0.8267

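R warns here because Pearson's chi-squared test expects a frequency of at least 5 in every cell, so the sparse upper cells are merged.

Pearson's chi-squared test after merging:

z <- c(92, 68, 28, 12)              # counts for 0, 1, 2, and 3 or more
n <- length(z)
p <- p[1:(n - 1)]                   # parenthesised: 1:n - 1 would mean (1:n) - 1
p[n] <- 1 - q[n - 1]
chisq.test(z, p = p)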

## 
##  Chi-squared test for given probabilities
## 
## data:  z
## X-squared = 0.9113, df = 3, p-value = 0.8227

p-value = 0.8227 > 0.05, so we fail to reject the null hypothesis: the data are consistent with a Poisson distribution.
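Because the Poisson rate is estimated from the same data, a common textbook adjustment is to subtract one further degree of freedom, which chisq.test() does not do on its own. The sketch below applies that adjustment by hand; it is an illustration of the idea rather than part of the original solution, and `lambda`, `df_adj` and `p_adj` are names introduced here.

```r
y <- c(92, 68, 28, 11, 1, 0)                 # original frequencies for counts 0..5
lambda <- sum(0:5 * y) / sum(y)              # estimated Poisson rate (sample mean)

z <- c(92, 68, 28, 12)                       # merged cells: 0, 1, 2, 3 or more
p <- c(dpois(0:2, lambda), ppois(2, lambda, lower.tail = FALSE))

fit <- chisq.test(z, p = p)                  # reports df = 4 - 1 = 3
df_adj <- length(z) - 1 - 1                  # one extra df lost for estimating lambda
p_adj <- pchisq(unname(fit$statistic), df = df_adj, lower.tail = FALSE)

c(unadjusted = fit$p.value, adjusted = p_adj)
```

On this data the adjusted p-value is smaller but still well above 0.05, so the conclusion is unchanged.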

———

Use ks.test() to test whether the two samples come from the same population distribution:

x <- c(2.36, 3.14, 7.52, 3.48, 2.76, 5.43, 6.54, 7.41)
y <- c(4.38, 4.25, 6.53, 3.28, 7.21, 6.55)
ks.test(x, y)

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  x and y
## D = 0.375, p-value = 0.6374
## alternative hypothesis: two-sided

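p-value = 0.6374 > 0.05, so we fail to reject the null hypothesis: the two samples are consistent with a common population distribution.

———

Test of independence for a contingency table. Null hypothesis: using an electronic fetal monitor during delivery has no effect on the caesarean section rate.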

x <- c(358, 2492, 229, 2745)
dim(x) <- c(2, 2)
x

##      [,1] [,2]
## [1,]  358  229
## [2,] 2492 2745

chisq.test(x)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  x
## X-squared = 37.4143, df = 1, p-value = 9.552e-10

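p-value = 9.552e-10 < 0.05, so we reject the null hypothesis: using an electronic fetal monitor during delivery is associated with the caesarean section rate.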
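For a 2x2 table chisq.test() applies Yates' continuity correction by default. A sketch of the corrected statistic, built from the expected counts under independence; `expected` and `stat` are illustrative names.

```r
x <- matrix(c(358, 2492, 229, 2745), nrow = 2)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Yates' correction subtracts 0.5 from each |observed - expected|
stat <- sum((abs(x - expected) - 0.5)^2 / expected)
p_val <- pchisq(stat, df = 1, lower.tail = FALSE)

fit <- chisq.test(x)
c(manual = stat, chisq.test = unname(fit$statistic))
```

The fixed 0.5 applies here because every |observed - expected| exceeds 0.5; R caps the correction at the smallest such difference otherwise.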

———

Test of independence for a contingency table. Null hypothesis: exercise time and the 1500 m running record are independent.

x <- c(45, 46, 28, 11, 12, 20, 23, 12, 10, 28, 30, 35)
dim(x) <- c(4, 3)
x

##      [,1] [,2] [,3]
## [1,]   45   12   10
## [2,]   46   20   28
## [3,]   28   23   30
## [4,]   11   12   35

chisq.test(x)

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 40.401, df = 6, p-value = 3.799e-07

p-value = 3.799e-07 < 0.05, so we reject the null hypothesis: exercise time is associated with the 1500 m running record.
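After a significant result it can help to see which cells drive it. One option, sketched here, is to inspect the Pearson residuals (observed - expected) / sqrt(expected), whose squares add up to the test statistic; `res` is an illustrative name.

```r
x <- matrix(c(45, 46, 28, 11, 12, 20, 23, 12, 10, 28, 30, 35), nrow = 4)

fit <- chisq.test(x)
res <- (x - fit$expected) / sqrt(fit$expected)   # Pearson residuals per cell

round(res, 2)     # large absolute values mark cells far from independence
sum(res^2)        # equals the chi-squared statistic
```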

———

Because some cell counts are below 5, use Fisher's exact test:

x <- c(3, 6, 4, 4)
dim(x) <- c(2, 2)
fisher.test(x)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 0.6372
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.04624382 5.13272210
## sample estimates:
## odds ratio 
##   0.521271

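p-value = 0.6372 > 0.05, so we fail to reject the null hypothesis: the two variables are treated as independent, i.e. there is no significant evidence that the two processes differ in product quality.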
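Fisher's exact test conditions on the table margins, so the top-left cell follows a hypergeometric distribution under H0. The sketch below rebuilds the two-sided p-value as the total probability of all tables no more likely than the observed one; the small tolerance factor and the names `support`, `d` and `p_manual` are choices made for this illustration.

```r
x <- matrix(c(3, 6, 4, 4), nrow = 2)

m <- sum(x[, 1])                  # first column total
n <- sum(x[, 2])                  # second column total
k <- sum(x[1, ])                  # first row total

support <- max(0, k - n):min(k, m)          # possible values of the top-left cell
d <- dhyper(support, m, n, k)               # their probabilities under H0
d_obs <- dhyper(x[1, 1], m, n, k)           # probability of the observed table

# Two-sided p-value: tables at most as probable as the observed one
p_manual <- sum(d[d <= d_obs * (1 + 1e-7)])

fit <- fisher.test(x)
c(manual = p_manual, fisher.test = fit$p.value)
```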

———

The two measurements are taken on the same individuals, so use McNemar's test:

x <- c(58, 1, 8, 2, 42, 9, 3, 7, 17)
dim(x) <- c(3, 3)
mcnemar.test(x)

## 
##  McNemar's Chi-squared test
## 
## data:  x
## McNemar's chi-squared = 2.8561, df = 3, p-value = 0.4144

p-value = 0.4144 > 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the two measurements differ.
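For a table larger than 2x2, mcnemar.test() computes a symmetry statistic over the off-diagonal pairs (often called the McNemar-Bowker test), with r(r - 1)/2 degrees of freedom. A sketch of that computation; `upper`, `stat` and `df` are illustrative names.

```r
x <- matrix(c(58, 1, 8, 2, 42, 9, 3, 7, 17), nrow = 3)

upper <- upper.tri(x)
n_ij <- x[upper]                 # counts above the diagonal
n_ji <- t(x)[upper]              # mirrored counts below the diagonal

stat <- sum((n_ij - n_ji)^2 / (n_ij + n_ji))   # symmetry statistic
df <- nrow(x) * (nrow(x) - 1) / 2
p_val <- pchisq(stat, df = df, lower.tail = FALSE)

fit <- mcnemar.test(x)
c(manual = stat, mcnemar.test = unname(fit$statistic))
```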

———

(1) Sign test. Null hypothesis: median >= 14.6; alternative: median < 14.6.

x <- c(13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43)
binom.test(sum(x > 14.6), length(x), alternative = "less")

## 
##  Exact binomial test
## 
## data:  sum(x > 14.6) and length(x)
## number of successes = 1, number of trials = 10, p-value = 0.01074
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
##  0.0000000 0.3941633
## sample estimates:
## probability of success 
##                    0.1

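p-value = 0.01074 < 0.05, so we reject the null hypothesis: the median fish length in this pond is below 14.6.

(2) Wilcoxon signed-rank test. Null hypothesis: median >= 14.6; alternative: median < 14.6.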

x <- c(13.32, 13.06, 14.02, 11.86, 13.58, 13.77, 13.51, 14.42, 14.44, 15.43)
wilcox.test(x, mu = 14.6, alternative = "less", exact = FALSE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  x
## V = 4.5, p-value = 0.01087
## alternative hypothesis: true location is less than 14.6

p-value = 0.01087 < 0.05, so we reject the null hypothesis: the median fish length in this pond is below 14.6.

———

(1) Sign test of whether the two measurements differ significantly. Null hypothesis: the two populations do not differ.

x <- c(48, 33, 37.5, 48, 42.5, 40, 42, 36, 11.3, 22, 36, 27.3, 14.2, 32.1, 52, 
    38, 17.3, 20, 21, 46.1)
y <- c(37, 41, 23.4, 17, 31.5, 40, 31, 36, 5.7, 11.5, 21, 6.1, 26.5, 21.3, 44.5, 
    28, 22.6, 20, 11, 22.3)
binom.test(sum(x > y), length(x))

## 
##  Exact binomial test
## 
## data:  sum(x > y) and length(x)
## number of successes = 14, number of trials = 20, p-value = 0.1153
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4572108 0.8810684
## sample estimates:
## probability of success 
##                    0.7

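p-value = 0.1153 > 0.05, so we fail to reject the null hypothesis: the sign test finds no significant difference between the two measurements.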
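The call above counts the three tied pairs (where x equals y) as trials that are not successes. The usual convention for the sign test is to drop ties before counting; the sketch below does that as an alternative reading of the exercise, not as the original author's method. The names `d` and `res` are illustrative.

```r
x <- c(48, 33, 37.5, 48, 42.5, 40, 42, 36, 11.3, 22, 36, 27.3, 14.2, 32.1, 52,
       38, 17.3, 20, 21, 46.1)
y <- c(37, 41, 23.4, 17, 31.5, 40, 31, 36, 5.7, 11.5, 21, 6.1, 26.5, 21.3, 44.5,
       28, 22.6, 20, 11, 22.3)

d <- x - y
d <- d[d != 0]                                # discard tied pairs
res <- binom.test(sum(d > 0), length(d))      # sign test on the untied pairs
res
```

With ties removed there are 14 positive differences out of 17, and the two-sided p-value falls below 0.05, so how ties are handled changes the conclusion on this data.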

(2) Wilcoxon signed-rank test of whether the two measurements differ significantly. Null hypothesis: the two populations do not differ.

wilcox.test(x, y, paired = TRUE, exact = FALSE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  x and y
## V = 136, p-value = 0.005191
## alternative hypothesis: true location shift is not equal to 0

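p-value = 0.005191 < 0.05, so we reject the null hypothesis: the two measurements differ.

(3) Wilcoxon rank-sum test of whether the two measurements differ significantly. Null hypothesis: the two populations do not differ.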

wilcox.test(x, y, exact = FALSE)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  x and y
## W = 274.5, p-value = 0.04524
## alternative hypothesis: true location shift is not equal to 0

p-value = 0.04524 < 0.05, so we reject the null hypothesis: the two measurements differ.

(4) Normality and homogeneity-of-variance tests on the data

Normality tests:

ks.test(x, "pnorm", mean(x), sd(x))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  x
## D = 0.1407, p-value = 0.8235
## alternative hypothesis: two-sided

shapiro.test(x)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.9507, p-value = 0.3773

ks.test(y, "pnorm", mean(y), sd(y))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D = 0.1014, p-value = 0.973
## alternative hypothesis: two-sided

shapiro.test(y)

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.9667, p-value = 0.6848

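All p-values are greater than 0.05, so we fail to reject the null hypothesis and treat x and y as normally distributed.

Homogeneity-of-variance test: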

var.test(x, y)

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 1.1406, num df = 19, denom df = 19, p-value = 0.7772
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4514788 2.8817689
## sample estimates:
## ratio of variances 
##           1.140639

p-value = 0.7772 > 0.05, so we fail to reject the null hypothesis and treat the variances of x and y as equal.

t-test:

t.test(x, y, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = 2.2428, df = 38, p-value = 0.03082
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   0.8125529 15.8774471
## sample estimates:
## mean of x mean of y 
##    33.215    24.870

p-value = 0.03082 < 0.05, so we reject the null hypothesis: the means of x and y differ.

———

Spearman rank correlation test:

x <- c(24, 17, 20, 41, 52, 23, 46, 18, 15, 29)
y <- c(8, 1, 4, 7, 9, 5, 10, 3, 2, 6)
cor.test(x, y, method = "spearman", exact = FALSE)

## 
##  Spearman's rank correlation rho
## 
## data:  x and y
## S = 10, p-value = 5.484e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9393939

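p-value = 5.484e-05 < 0.05, so we reject the null hypothesis: the two variables are correlated, and the correlation is positive.

Kendall rank correlation test:

cor.test(x, y, method = "kendall", exact = FALSE)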

## 
##  Kendall's rank correlation tau
## 
## data:  x and y
## z = 3.3094, p-value = 0.000935
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.8222222

p-value = 0.000935 < 0.05, so we reject the null hypothesis: the two variables are correlated, and the correlation is positive.
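Spearman's rho is the Pearson correlation of the ranks, and without ties it reduces to 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference. A short sketch checking both forms on this data; `rx`, `ry`, `S` and the `rho_*` names are illustrative.

```r
x <- c(24, 17, 20, 41, 52, 23, 46, 18, 15, 29)
y <- c(8, 1, 4, 7, 9, 5, 10, 3, 2, 6)
n <- length(x)

rx <- rank(x)
ry <- rank(y)
S <- sum((rx - ry)^2)                          # the S statistic reported by cor.test()

rho_ranks <- cor(rx, ry)                       # Pearson correlation of the ranks
rho_formula <- 1 - 6 * S / (n * (n^2 - 1))     # closed form, valid without ties

c(S = S, rho_ranks = rho_ranks, rho_formula = rho_formula)
```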

———

Apply the Wilcoxon rank-sum test:

x <- rep(1:5, c(0, 1, 9, 7, 3))
y <- rep(1:5, c(2, 2, 11, 4, 1))
wilcox.test(x, y, exact = FALSE)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  x and y
## W = 266, p-value = 0.05509
## alternative hypothesis: true location shift is not equal to 0

p-value = 0.05509 > 0.05, so we fail to reject the null hypothesis: we cannot conclude that the new method is significantly more effective than the original therapy.

———

囧囧有神

January 11, 2016