進階應用統計 - Homework 1

by 陳柏宇, 學號 : 606890100

Lab activity 1

The American Automobile Association has published data(Defensive Driving : Managing Time and Space, 1991) that looks at the relationship between the average stopping distance (y = distance, in feet) and the speed of a car (x = speed, in miles per hour).

Create a fitted line plot of the data. Does a line do a good job of describling the trend in the data ?

carstop <- read.table("C:/Users/USER/Documents/carstopping.txt",header = TRUE)
attach(carstop)
carstop.model <- lm(Distance ~ Speed)
summary(carstop.model)

## 
## Call:
## lm(formula = Distance ~ Speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.738 -22.351  -7.738  16.622  47.083 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -44.1667    22.0821   -2.00   0.0924 .  
## Speed         5.6726     0.5279   10.75 3.84e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.21 on 6 degrees of freedom
## Multiple R-squared:  0.9506, Adjusted R-squared:  0.9424 
## F-statistic: 115.5 on 1 and 6 DF,  p-value: 3.837e-05

估計的截距項 β 約為 -44.167 ；斜率 β1 約為 5.673 。

plot(Speed, Distance)
abline(-44.167, 5.673)

由上圖可看出fitted line配適的不錯，能夠描述此筆data的trend。

Does car speed explain a large portion of the variability in the average stopping distance ?

anova(carstop.model)

## Analysis of Variance Table
## 
## Response: Distance
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## Speed      1 135150  135150  115.48 3.837e-05 ***
## Residuals  6   7022    1170                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

是的。透過 ANOVA table 與上述 summary( ) 的 R-squared，可以表示在 Distance 的變異中有 94% ~ 95% 可以用 Speed 的變異來解釋，也就是這個 model 可以解釋 94% ~ 95% 的 Distance 變異。

Lab activity 2

The following questions illustrate the point : The large sample size results in a sample slope that is significantly different from 0, but not meaningfully different from 0.

Create a fitted line plot and perform a standaed regression analysis on the data set. Does there appear to be strong linear relation between x and y ?

pract <- read.table("C:/Users/USER/Documents/practical.txt",header = TRUE)
attach(pract)
pract.model <- lm(y ~ x)
summary(pract.model)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3623 -0.6559  0.0131  0.6818  3.1889 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.006235   0.061969   80.79   <2e-16 ***
## x           0.099798   0.005576   17.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.017 on 998 degrees of freedom
## Multiple R-squared:  0.243,  Adjusted R-squared:  0.2422 
## F-statistic: 320.3 on 1 and 998 DF,  p-value: < 2.2e-16

plot(x, y)
abline(5.006, 0.1)

cor.test(~y + x)

## 
##  Pearson's product-moment correlation
## 
## data:  y and x
## t = 17.897, df = 998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4445046 0.5384542
## sample estimates:
##       cor 
## 0.4929148

由上圖呈現與 x & y 的 Correlation ，學生認為 x 與 y 不具有強的線性相關。

Conduct the test H0 : β1 = 0. What is your conclusion about the relationship between x and y ?

從上述 summary( ) 結果已知拒絕 H0 ，因此 β1 不是呈現水平。

Calculate…

pract.model2 <- lm(y ~ offset(x))
summary(pract.model2)

## 
## Call:
## lm(formula = y ~ offset(x))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3245  -4.4736  -0.1879   4.4886  11.5463 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -3.5457     0.1673  -21.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.292 on 999 degrees of freedom

plot(x, y)
abline(-3.546, 1)

Slope β1 = 1

pract.pred <- predict(pract.model, newdata = data.frame(x = pract$x), se.fit = T, interval = 'confidence')
pract.pred2 <- predict(pract.model, newdata = data.frame(x = pract$x), se.fit = T,interval = 'prediction')
plot(x, y)
abline(5.006, 0.1)
lines(pract$x, pract.pred$fit[ ,3], lty = 'dashed', col = 'red')
lines(pract$x, pract.pred$fit[ ,2], lty = 'dashed', col = 'red')
lines(pract$x, pract.pred2$fit[ ,3], lty = 'dashed', col = 'blue')
lines(pract$x, pract.pred2$fit[ ,2], lty = 'dashed', col = 'blue')

Lab activity 3

Lab activity 4

library(ggplot2)

powerNum <- numeric()
powerNum2 <- numeric()

myPower <- function(n, mu, mu0, sigma2, nrep){
  ybar <- numeric(nrep)
  s2 <- numeric(nrep)
  power.mu <- logical(nrep)
  set.seed(1234)
  for(i in 1:nrep){
    y <- rnorm(n, mean = mu, sd = sqrt(sigma2))
    ybar[i] <- mean(y)
    s2[i] <- var(y)
    s <- sqrt(var(y))
    power.mu[i] <- abs(ybar[i] - mu0) > qt(.975, n - 1) * s / sqrt(n)
  }
  return(sum(power.mu) / nrep)
}

for(i in 2:50){
  powerNum[i-1] <- myPower(n = i, mu = 6.5, mu0 = 5, sigma2 = 4, nrep = 10000)
}

for(i in 2:50){
  powtt <- power.t.test(n = i, delta = 1.5, sd = sqrt(4), strict = T, type = "one.sample")
  powerNum2[i-1] <- powtt$power
}

ggplot(data.frame(x = powerNum), aes(x)) + stat_function(fun=sin)