HW_3 N094220018 陳彥伯

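#1. Test whether the mean of a sample equals a given value
#One-sample t-test. H0: the mean equals 16; H1: the mean differs from 16. H1 has no direction, so a two-sided test is used.
N1 <- c(16.3, 16.2, 15.8, 15.4, 16, 15.6, 15.5, 16.1, 15.9, 16.1)
t.test(N1, mu = 16, alternative = "two.sided")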

## 
##  One Sample t-test
## 
## data:  N1
## t = -1.1326, df = 9, p-value = 0.2867
## alternative hypothesis: true mean is not equal to 16
## 95 percent confidence interval:
##  15.67029 16.10971
## sample estimates:
## mean of x 
##     15.89

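#From the two-sided one-sample t-test above, p-value = 0.2867, which is greater than 0.05 (and also greater than 0.01), so H0 (the mean equals 16) cannot be rejected. The 95% confidence interval (15.67029, 16.10971) contains 16. This does not prove that the mean is 16; it only means the data give no significant evidence that the mean differs from 16.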
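
#A minimal sketch that recomputes the test above by hand, to show where t, the p-value, and the confidence interval come from. The names mu0, se, t_stat, p_val, and ci95 are illustrative and are not part of the original homework.

```r
# One-sample t statistic: t = (sample mean - mu0) / (s / sqrt(n))
N1 <- c(16.3, 16.2, 15.8, 15.4, 16, 15.6, 15.5, 16.1, 15.9, 16.1)
mu0 <- 16                                   # hypothesized mean under H0
n <- length(N1)
se <- sd(N1) / sqrt(n)                      # standard error of the sample mean
t_stat <- (mean(N1) - mu0) / se
p_val <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
ci95 <- mean(N1) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_val), 4)          # should agree with t.test()
round(ci95, 5)
```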

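#2. Using R's built-in dataset airquality, build a regression model with the two variables Wind and Temp to examine whether temperature (Temp) affects wind speed (Wind).
#The question is whether temperature affects wind speed, so the independent variable x is Temp and the dependent variable y is Wind.
View(airquality)
#Draw a scatter plot with a smoothed curve, with Temp on the x-axis and Wind on the y-axis, and compute the correlation coefficient, which is -0.4579879 (a moderate negative correlation, not a strong one).
scatter.smooth(x = airquality$Temp, y = airquality$Wind, main = "Wind ~ Temp")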

cor(airquality$Wind, airquality$Temp)

## [1] -0.4579879

#Fit a linear regression model of Wind on Temp
model1 <- lm(airquality$Wind~airquality$Temp)
summary(model1)

## 
## Call:
## lm(formula = airquality$Wind ~ airquality$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.5784 -2.4489 -0.2261  1.9853  9.7398 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     23.23369    2.11239  10.999  < 2e-16 ***
## airquality$Temp -0.17046    0.02693  -6.331 2.64e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.142 on 151 degrees of freedom
## Multiple R-squared:  0.2098, Adjusted R-squared:  0.2045 
## F-statistic: 40.08 on 1 and 151 DF,  p-value: 2.642e-09

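#The fitted regression line is y = 23.23369 - 0.17046x, that is, Wind = 23.23369 - 0.17046 * Temp (0.02693 is the standard error of the slope, not the intercept). The slope has p-value = 2.64e-09, and R-squared = 0.2098.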
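
#A short sketch of reading the intercept and slope from the fitted object instead of copying them from the printed table, which avoids mixing up the Estimate and Std. Error columns. The object names fit, b, temp_new, pred, manual, and r2 are illustrative, and the temperature of 80 is a hypothetical input chosen only for the example.

```r
# Same model as above, written with a data argument
fit <- lm(Wind ~ Temp, data = airquality)
b <- coef(fit)                 # named vector: "(Intercept)" and "Temp"
b

# Predicted wind speed at a hypothetical temperature of 80
temp_new <- 80
pred <- predict(fit, newdata = data.frame(Temp = temp_new))
manual <- b[["(Intercept)"]] + b[["Temp"]] * temp_new
all.equal(unname(pred), manual)

# In simple linear regression, R-squared equals the squared correlation
r2 <- summary(fit)$r.squared
all.equal(r2, cor(airquality$Wind, airquality$Temp)^2)
```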
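#Residual diagnostics: shapiro.test checks whether the residuals are normally distributed, durbinWatsonTest checks whether they are independent, and ncvTest checks whether their variance is constant.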
library(car)

## Loading required package: carData

shapiro.test(model1$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model1$residuals
## W = 0.9856, p-value = 0.1132

durbinWatsonTest(model1)

##  lag Autocorrelation D-W Statistic p-value
##    1       0.1425864      1.701754   0.062
##  Alternative hypothesis: rho != 0

ncvTest(model1)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.3131126, Df = 1, p = 0.57578

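#The corresponding p-values are 0.1132, 0.062, and 0.5758. (durbinWatsonTest estimates its p-value by bootstrap simulation, so it changes slightly from run to run.) All three are greater than 0.05, so at the 5% significance level there is no evidence against normality, independence, or constant variance of the residuals. The linear model is therefore a reasonable description of the relationship between Wind and Temp, but not a strong one: R-squared is only 0.2098, so Temp explains only about 21% of the variation in Wind.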
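
#A minimal sketch of how the Durbin-Watson statistic and the lag-1 autocorrelation reported above can be computed from the residuals in base R. The names fit, e, dw, and r1 are illustrative. The p-value is not reproduced here because car estimates it by simulation.

```r
fit <- lm(Wind ~ Temp, data = airquality)
e <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals (values near 2 suggest
# no first-order autocorrelation)
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)

c(DW = dw, r1 = r1)
```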

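#3. Question 3 is a one-sample proportion test. Both binom.test and prop.test could be used; binom.test is exact, so it is used here.
#With alternative = "less", the hypotheses are H0: Teacher Hsu's hit rate >= 0.8, H1: hit rate < 0.8 (the equality must sit in H0).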
binom.test(83, 100, p=0.8, alternative = "less", conf.level = 0.99)

## 
##  Exact binomial test
## 
## data:  83 and 100
## number of successes = 83, number of trials = 100, p-value = 0.8077
## alternative hypothesis: true probability of success is less than 0.8
## 99 percent confidence interval:
##  0.0000000 0.9076597
## sample estimates:
## probability of success 
##                   0.83

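#The p-value above is 0.8077, which is greater than 0.01, and the one-sided 99% confidence interval is 0 to 0.9076597. H0 cannot be rejected, so there is no evidence that Teacher Hsu's hit rate is below 0.8. Failing to reject H0 does not by itself prove that the hit rate is at least 0.8.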
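
#A short sketch showing where the exact p-value and the upper confidence limit come from. For alternative = "less", the p-value is the binomial probability of 83 or fewer hits when p = 0.8, and the upper limit is the Clopper-Pearson bound. The names hits, p0, p_less, bt, and upper are illustrative.

```r
n <- 100      # number of trials
hits <- 83    # number of successes
p0 <- 0.8     # hit rate under H0

# P(X <= 83) when X ~ Binomial(100, 0.8)
p_less <- pbinom(hits, size = n, prob = p0)

bt <- binom.test(hits, n, p = p0, alternative = "less", conf.level = 0.99)
all.equal(p_less, bt$p.value)

# One-sided 99% Clopper-Pearson upper limit
upper <- qbeta(0.99, hits + 1, n - hits)
c(p_value = p_less, upper = upper)
```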

#4. Two-sample test of whether the means differ. First check normality and equality of variances to decide whether the t-test is appropriate. The two samples are independent, so paired = FALSE.
x <- c(69.4, 69.7, 72.3, 71.8, 70.3, 68.2, 74.6, 70.2, 77.8, 65.4, 74.0, 78.0, 72.8, 84.0, 62.6, 69.6, 71.6, 69.5, 72.9, 70.4)
y <- c(65.1, 80.5, 65.8, 73.5, 71.1, 58.6, 64.4, 74.5, 79.5, 69.8, 74.6, 71.5, 68.2, 77.0, 64.2, 79.0, 69.0, 62.0, 83.4, 68.2)
summary(x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   62.60   69.58   71.00   71.75   73.17   84.00

str(x)

##  num [1:20] 69.4 69.7 72.3 71.8 70.3 68.2 74.6 70.2 77.8 65.4 ...

shapiro.test(x)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.93781, p-value = 0.2179

shapiro.test(y)

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98088, p-value = 0.9449

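# For x, p-value = 0.2179 > 0.05, so the null hypothesis that x is normally distributed cannot be rejected.
# For y, p-value = 0.9449 > 0.05, so the null hypothesis that y is normally distributed cannot be rejected.
# Next, test whether x and y have equal variances.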
var.test(x,y, alternative = "two.sided")

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 0.47229, num df = 19, denom df = 19, p-value = 0.1106
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1869385 1.1932202
## sample estimates:
## ratio of variances 
##          0.4722911
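
# A minimal sketch of how the F statistic and two-sided p-value above can be reproduced by hand. The F statistic is the ratio of the two sample variances. The names f_stat, df1, df2, and p_val are illustrative.

```r
x <- c(69.4, 69.7, 72.3, 71.8, 70.3, 68.2, 74.6, 70.2, 77.8, 65.4,
       74.0, 78.0, 72.8, 84.0, 62.6, 69.6, 71.6, 69.5, 72.9, 70.4)
y <- c(65.1, 80.5, 65.8, 73.5, 71.1, 58.6, 64.4, 74.5, 79.5, 69.8,
       74.6, 71.5, 68.2, 77.0, 64.2, 79.0, 69.0, 62.0, 83.4, 68.2)

f_stat <- var(x) / var(y)          # ratio of sample variances
df1 <- length(x) - 1               # numerator degrees of freedom
df2 <- length(y) - 1               # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability
lower_tail <- pf(f_stat, df1, df2)
p_val <- 2 * min(lower_tail, 1 - lower_tail)

c(F = f_stat, p = p_val)
```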

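# F test, H0: x and y have equal variances. p = 0.1106 > 0.05, and the 95% confidence interval for the variance ratio (0.1869 to 1.1932) contains 1, so H0 cannot be rejected at the 5% level. The variances are therefore treated as equal, and var.equal = TRUE is used in t.test.
#Since x and y are both consistent with normality and with equal variances, the following t-test is used to check whether their means differ.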
t.test(x,y, alternative = "two.sided", paired = FALSE, var.equal = TRUE, conf.level = 0.99)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = 0.41744, df = 38, p-value = 0.6787
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
##  -4.176695  5.696695
## sample estimates:
## mean of x mean of y 
##    71.755    70.995

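#H0: the means of x and y do not differ. p-value = 0.6787 > 0.01, and the 99% confidence interval (-4.176695 to 5.696695) contains 0, so H0 cannot be rejected at the 1% level. There is no significant evidence that the mean weights of x and y differ.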
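
#A minimal sketch of the pooled-variance calculation behind the two-sample t-test with var.equal = TRUE. The names sp2, se, t_stat, p_val, and ci99 are illustrative and are not part of the original homework.

```r
x <- c(69.4, 69.7, 72.3, 71.8, 70.3, 68.2, 74.6, 70.2, 77.8, 65.4,
       74.0, 78.0, 72.8, 84.0, 62.6, 69.6, 71.6, 69.5, 72.9, 70.4)
y <- c(65.1, 80.5, 65.8, 73.5, 71.1, 58.6, 64.4, 74.5, 79.5, 69.8,
       74.6, 71.5, 68.2, 77.0, 64.2, 79.0, 69.0, 62.0, 83.4, 68.2)

n1 <- length(x)
n2 <- length(y)
df <- n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
se <- sqrt(sp2 * (1 / n1 + 1 / n2))     # standard error of the difference

t_stat <- (mean(x) - mean(y)) / se
p_val <- 2 * pt(-abs(t_stat), df = df)  # two-sided p-value
ci99 <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.995, df = df) * se

round(c(t = t_stat, p = p_val), 4)
round(ci99, 4)
```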

Yanbo Chen

1/3/2021