Data import

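dta <- read.csv("C:/Users/User/Desktop/LearnR/CA/CAdata/lowbwt.csv")  # read the low birth weight data
View(dta)    # inspect the data in the viewer (interactive sessions only)
attach(dta)  # make the columns (LWT, BWT, ...) available by name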

(a) Please calculate Pearson’s correlation coefficient and p-value between LWT and BWT.

cor.test(LWT, BWT, method="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  LWT and BWT
## t = 2.5856, df = 187, p-value = 0.01048
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04423134 0.32003247
## sample estimates:
##       cor 
## 0.1857887
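As a quick check on the output above, the t statistic and p-value can be recomputed by hand from the reported correlation, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 = 187 degrees of freedom. This is only a sketch that reuses the printed (rounded) numbers; the variable names are illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.1857887   # sample correlation
df <- 187         # degrees of freedom, n - 2

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 4)
signif(p_value, 4)
```

The two printed values should agree with the `t` and `p-value` lines reported by `cor.test()`, up to rounding.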

(b) Then draw the scatterplot of these two variables.

plot(LWT, BWT)

with(dta, smoothScatter(LWT, BWT))
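A fitted line often makes a weak positive association such as this one easier to see. The sketch below is not part of the original answer. It assumes that `birthwt` from the MASS package (189 rows, lower-case column names `lwt` and `bwt`) can stand in for `lowbwt.csv`, which is not available here; with the original file, the same calls would use `dta`, `LWT`, and `BWT` instead.

```r
library(MASS)  # recommended package shipped with R; provides birthwt

# ASSUMPTION: birthwt is used as a stand-in for lowbwt.csv
dat <- birthwt

# Simple linear regression of birth weight on mother's weight
line_fit <- lm(bwt ~ lwt, data = dat)

# Scatterplot with the least-squares line overlaid
plot(dat$lwt, dat$bwt,
     xlab = "Mother's weight at last menstrual period (lb)",
     ylab = "Birth weight (g)")
abline(line_fit, col = "red", lwd = 2)
```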

(c) Please use multiple linear regression to analyze the risk factors for BWT using all independent variables.

(Outcome: BWT; Predictors: AGE, LWT, RACE, SMOKE, PTL, HT, UI, and FTV)

(BWT, AGE, LWT, PTL, and FTV are treated as continuous variables)

(RACE, SMOKE, HT, and UI are treated as categorical variables)

Hint: R code: factor(RACE)
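To see why the hint matters, the toy example below compares the design matrix R builds for a numerically coded variable with and without `factor()`. The vector `race` is made-up illustration data, not taken from `lowbwt.csv`.

```r
# Hypothetical toy data: a three-level variable coded 1, 2, 3
race <- c(1, 2, 3, 1, 2, 3)

# Treated as numeric: a single slope, which assumes equally spaced levels
mm_numeric <- model.matrix(~ race)

# Treated as a factor: one dummy column per non-reference level
mm_factor <- model.matrix(~ factor(race))

colnames(mm_numeric)
colnames(mm_factor)
```

With `factor()`, level 1 becomes the reference category and each remaining level gets its own coefficient, which is why a three-level RACE contributes two terms to a regression.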

fit <- lm(BWT ~ AGE + LWT + factor(RACE) + factor(SMOKE) + PTL + factor(HT) + factor(UI) + FTV)  # factor() declares a variable as categorical
summary(fit)
## 
## Call:
## lm(formula = BWT ~ AGE + LWT + factor(RACE) + factor(SMOKE) + 
##     PTL + factor(HT) + factor(UI) + FTV)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1826.20  -434.94    57.59   472.67  1702.68 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2930.102    312.839   9.366  < 2e-16 ***
## AGE              -3.650      9.618  -0.379 0.704787    
## LWT               4.354      1.735   2.509 0.012990 *  
## factor(RACE)2  -489.442    149.953  -3.264 0.001316 ** 
## factor(RACE)3  -357.051    114.729  -3.112 0.002163 ** 
## factor(SMOKE)1 -350.618    106.454  -3.294 0.001192 ** 
## PTL             -48.839    101.950  -0.479 0.632492    
## factor(HT)1    -592.812    202.279  -2.931 0.003824 ** 
## factor(UI)1    -514.928    138.857  -3.708 0.000278 ***
## FTV             -14.072     46.458  -0.303 0.762323    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 650.2 on 179 degrees of freedom
## Multiple R-squared:  0.2427, Adjusted R-squared:  0.2046 
## F-statistic: 6.373 on 9 and 179 DF,  p-value: 7.961e-08
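An alternative that avoids `attach()` is to convert the categorical columns once and pass the data frame through the `data` argument. The sketch below assumes that `birthwt` from the MASS package (lower-case column names) can stand in for `lowbwt.csv`; the names `dat`, `cat_vars`, and `fit2` are illustrative. It also prints 95% confidence intervals, which the summary table does not show.

```r
library(MASS)  # recommended package shipped with R; provides birthwt

# ASSUMPTION: birthwt is used as a stand-in for lowbwt.csv
dat <- birthwt

# Convert the categorical predictors to factors in one step
cat_vars <- c("race", "smoke", "ht", "ui")
dat[cat_vars] <- lapply(dat[cat_vars], factor)

# Same model structure as above, with the data passed explicitly
fit2 <- lm(bwt ~ age + lwt + race + smoke + ptl + ht + ui + ftv, data = dat)

# 95% confidence intervals for the coefficients
confint(fit2)
```

Passing `data =` keeps the model tied to a single data frame, so later calls such as `predict()` do not depend on what happens to be attached to the search path.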

(d) Please use multiple linear regression to analyze the risk factors for BWT using the stepwise method.

library(MASS)

step <- stepAIC(fit, direction = "both")  # direction is one of "both", "backward", "forward"; in the trace, "-" rows drop a term (backward) and "+" rows add one (forward)
## Start:  AIC=2458.13
## BWT ~ AGE + LWT + factor(RACE) + factor(SMOKE) + PTL + factor(HT) + 
##     factor(UI) + FTV
## 
##                 Df Sum of Sq      RSS    AIC
## - FTV            1     38784 75709672 2456.2
## - AGE            1     60875 75731762 2456.3
## - PTL            1     97013 75767900 2456.4
## <none>                       75670887 2458.1
## - LWT            1   2661527 78332414 2462.7
## - factor(HT)     1   3630845 79301732 2465.0
## - factor(SMOKE)  1   4585833 80256720 2467.2
## - factor(RACE)   2   6627291 82298178 2470.0
## - factor(UI)     1   5813481 81484368 2470.1
## 
## Step:  AIC=2456.23
## BWT ~ AGE + LWT + factor(RACE) + factor(SMOKE) + PTL + factor(HT) + 
##     factor(UI)
## 
##                 Df Sum of Sq      RSS    AIC
## - AGE            1     82254 75791926 2454.4
## - PTL            1     93251 75802922 2454.5
## <none>                       75709672 2456.2
## + FTV            1     38784 75670887 2458.1
## - LWT            1   2623848 78333519 2460.7
## - factor(HT)     1   3592170 79301842 2463.0
## - factor(SMOKE)  1   4569078 80278749 2465.3
## - factor(RACE)   2   6601034 82310705 2468.0
## - factor(UI)     1   5791945 81501616 2468.2
## 
## Step:  AIC=2454.43
## BWT ~ LWT + factor(RACE) + factor(SMOKE) + PTL + factor(HT) + 
##     factor(UI)
## 
##                 Df Sum of Sq      RSS    AIC
## - PTL            1    119803 75911729 2452.7
## <none>                       75791926 2454.4
## + AGE            1     82254 75709672 2456.2
## + FTV            1     60164 75731762 2456.3
## - LWT            1   2542283 78334209 2458.7
## - factor(HT)     1   3545242 79337168 2461.1
## - factor(SMOKE)  1   4490240 80282166 2463.3
## - factor(UI)     1   5723278 81515204 2466.2
## - factor(RACE)   2   6614759 82406685 2466.2
## 
## Step:  AIC=2452.73
## BWT ~ LWT + factor(RACE) + factor(SMOKE) + factor(HT) + factor(UI)
## 
##                 Df Sum of Sq      RSS    AIC
## <none>                       75911729 2452.7
## + PTL            1    119803 75791926 2454.4
## + AGE            1    108807 75802922 2454.5
## + FTV            1     58868 75852861 2454.6
## - LWT            1   2671613 78583342 2457.3
## - factor(HT)     1   3583850 79495579 2459.4
## - factor(SMOKE)  1   4911219 80822948 2462.6
## - factor(RACE)   2   6674129 82585858 2464.7
## - factor(UI)     1   6327025 82238754 2465.9
step$anova # display results
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## BWT ~ AGE + LWT + factor(RACE) + factor(SMOKE) + PTL + factor(HT) + 
##     factor(UI) + FTV
## 
## Final Model:
## BWT ~ LWT + factor(RACE) + factor(SMOKE) + factor(HT) + factor(UI)
## 
## 
##    Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                          179   75670887 2458.130
## 2 - FTV  1  38784.31       180   75709672 2456.227
## 3 - AGE  1  82254.45       181   75791926 2454.432
## 4 - PTL  1 119802.88       182   75911729 2452.730
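The AIC values in the trace can be reproduced from the residual sums of squares. For `lm` fits, `stepAIC()` relies on `extractAIC()`, which computes n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients. The sketch below reuses numbers printed above; `aic_lm` is a helper written for this illustration, not a MASS function.

```r
# n = 189 observations: 179 residual df + 10 coefficients in the full model
n <- 189

# AIC as computed by extractAIC() for a linear model (additive constant dropped)
aic_lm <- function(rss, n_coef, n_obs) {
  n_obs * log(rss / n_obs) + 2 * n_coef
}

# Full model: intercept + 9 slope terms, RSS from the first step
aic_full <- aic_lm(75670887, 10, n)

# Final model: intercept, LWT, 2 RACE dummies, SMOKE, HT, UI = 7 coefficients
aic_final <- aic_lm(75911729, 7, n)

round(c(full = aic_full, final = aic_final), 2)
```

These values should match the `Start` and final `Step` AIC lines. They differ from what `AIC()` reports for the same fits because `extractAIC()` omits an additive constant, which does not affect the model comparison.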