library(tidyverse)
library(car)
DATE <- c("324", "325", "326", "327", "328", "329", "330", "331", "401",
          "402", "403", "404", "405", "406", "407", "408", "409", "410", "411"
          , "412", "413", "414", "415", "416", "417", "418", "419")
Taipei_Temperature <- c(21, 28, 25, 17, 18, 24, 26, 23, 17, 15, 17, 21, 26, 26, 23, 24
                 , 28, 30, 29, 31, 30, 24, 20, 22, 20, 22, 18)
Taipei <- c(0, 0, 0, 1, 5, 1, 7, 10, 19, 13, 22, 29, 41, 84, 61, 74, 67, 96, 
            101, 142, 147, 145, 185, 221, 245, 270, 287)
Weekday <- c("Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", 
         "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "N", "N", "Y", "Y")
cvdta <- data.frame(DATE, Weekday, Taipei_Temperature, Taipei)
head(cvdta)
##   DATE Weekday Taipei_Temperature Taipei
## 1  324       Y                 21      0
## 2  325       Y                 28      0
## 3  326       N                 25      0
## 4  327       N                 17      1
## 5  328       Y                 18      5
## 6  329       Y                 24      1
str(cvdta)
## 'data.frame':    27 obs. of  4 variables:
##  $ DATE              : chr  "324" "325" "326" "327" ...
##  $ Weekday           : chr  "Y" "Y" "N" "N" ...
##  $ Taipei_Temperature: num  21 28 25 17 18 24 26 23 17 15 ...
##  $ Taipei            : num  0 0 0 1 5 1 7 10 19 13 ...
#convert char to factor variables
cvdta$DATE <- as.factor(cvdta$DATE)
cvdta$Weekday <- as.factor(cvdta$Weekday)
ggplot(aes(y = Taipei, x = Taipei_Temperature, color = Weekday), data = cvdta) +
  geom_point() +
  geom_smooth(method = lm, se = F) + 
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

由此圖可看出weekdayY與WeekdayN的斜率不相同。

#run multiple linear regression
cvmod <- lm(Taipei ~ Taipei_Temperature + Weekday, data = cvdta)
summary(cvmod)
## 
## Call:
## lm(formula = Taipei ~ Taipei_Temperature + Weekday, data = cvdta)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -87.86 -74.68 -23.07  54.75 206.71 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)        66.662153  96.804435   0.689    0.498
## Taipei_Temperature  0.756913   4.178546   0.181    0.858
## WeekdayY            0.002713  40.597481   0.000    1.000
## 
## Residual standard error: 94.29 on 24 degrees of freedom
## Multiple R-squared:  0.001425,   Adjusted R-squared:  -0.08179 
## F-statistic: 0.01712 on 2 and 24 DF,  p-value: 0.983

H0=台北每日確診數量和台北最高溫有相關

H1=台北每日確診數量和台北最高溫無相關

由於T值小於2(T= 0.181<2),且近乎零,以及P值大於0.05(P=0.858>0.05),則表示台北每日確診數量和台北最高溫的平均數差異不顯著,則兩者之間的相關不顯著。 故表示在95%的信心水準下,無法推翻虛無假設,也表示台北每日確診數量和台北最高溫之間無相關。且台北每日最高溫無法預測台北每日新增確診數量。

得出公式:Taipei=0.76(Taipei_Temperature)+0.002(WeekdayY)+94.29

Diagnostic

Checking the normality of the residuals

hist( x = residuals(cvmod),
      xlab = "Value of residual",
      main = "",
      breaks = 20)

從上圖可看出Residuals不是常態分布,且接近正偏態,表示預測值檢觀察值不集中於0,則此model是有很大誤差的。 Normal Q-Q - to examine whether the residuals are normally distributed. - It’s good if residuals points follow the straight dashed line.

# Plot of the theoretical quantiles according to the model, against the quantiles of the standardised residuals.
plot(cvmod, which = 2)

Checking the linearity of the relationship Residuals vs Fitted - A horizontal line, without distinct patterns is an indication for a linear relationship, what is good.

# Plot of the fitted values against the residuals for cvmod, with a line showing the relationship between the two.
plot(cvmod, which  = 1)

從此圖可看出不是水平線,預測值的誤差會隨著預測值而變動,表示其中還有一些變異是無法被此model解釋的。 Advance version

library(car)
residualPlots(cvmod)

##                    Test stat Pr(>|Test stat|)
## Taipei_Temperature   -0.3281           0.7458
## Weekday                                      
## Tukey test           -0.3284           0.7426

Checking the homogeneity of variance Scale-Location (or Spread-Location). - Horizontal line with equally spread points is a good indication of homoscedasticity. - If homogeneity of variance is violated - the standard error estimates associated with the regression coefficients are no longer entirely reliable - t tests for the coefficients aren’t quite right either.

# Plot of the fitted values (model predictions) against the square root of the abs standardised residuals.
plot(cvmod, which = 3)