DATE <- c("324", "325", "326", "327", "328", "329", "330", "331", "401",
          "402", "403", "404", "405", "406", "407", "408", "409", "410", "411", "412")
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281, 382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78, 149, 123, 136, 144, 191, 112)

# Create a data frame from those vectors and assign it to an object

cvdta <- data.frame(DATE, Domestic, Oversea)
# Treat the date labels as categories, not numbers
cvdta$DATE <- as.factor(cvdta$DATE)
library(car)        # provides scatterplot()
library(tidyverse)  # provides ggplot2

Look at your data

scatterplot(Domestic ~ Oversea, 
            data = cvdta, 
            smooth = FALSE)

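This scatterplot pairs each day's domestic count with its imported count for 3/24 through 4/12, so it shows how the two series relate to each other, not how either one changes over time. The raw vectors show that domestic cases climbed from 15 on 3/24 to 551 on 4/12, while imported cases fluctuated between 63 and 244 without a comparably steep trend. The regression line slopes upward, but the points are widely scattered around it, which hints at only a weak positive association.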
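To check the time trend directly, the two series can be plotted against the date. The following is a minimal sketch, not part of the original analysis: the `day` index column, the long-format `cv_long` object, and the per-series slopes are illustrative additions, and the date labels are assumed to be in calendar order as entered.

```r
library(ggplot2)
library(tidyr)

DATE <- c("324", "325", "326", "327", "328", "329", "330", "331", "401",
          "402", "403", "404", "405", "406", "407", "408", "409", "410", "411", "412")
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281, 382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78, 149, 123, 136, 144, 191, 112)
cvdta <- data.frame(DATE, Domestic, Oversea)

# Hypothetical helper column: day index 1..20, assuming rows are in calendar order
cvdta$day <- seq_len(nrow(cvdta))

# Reshape to long format: one row per (day, source) pair
cv_long <- pivot_longer(cvdta, cols = c(Domestic, Oversea),
                        names_to = "source", values_to = "cases")

# One line per source, with the original date labels on the x axis
trend_plot <- ggplot(cv_long, aes(x = day, y = cases, colour = source)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = cvdta$day, labels = cvdta$DATE) +
  labs(x = "Date (Mdd)", y = "Confirmed cases") +
  theme_bw()
print(trend_plot)

# Average change in cases per day for each series (slope of cases on day index)
slope_domestic <- unname(coef(lm(Domestic ~ day, data = cvdta))["day"])
slope_oversea  <- unname(coef(lm(Oversea ~ day, data = cvdta))["day"])
print(c(domestic = slope_domestic, oversea = slope_oversea))
```

Comparing the two slopes is a rough way to say which series rose faster over the period; a straight line is only an approximation for the domestic series, which accelerates in April.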

ggplot(cvdta, aes(x = Oversea, y = Domestic)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'

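The fitted line slopes upward, so the two counts are positively associated: days with more imported cases tend to have more domestic cases. The gray band, however, is not an "explained error"; it is the 95% confidence interval around the fitted line that geom_smooth() draws by default. A strong linear relationship would produce a narrow band. Here the band is wide, so we go on to fit a regression model and quantify how much of the variation in domestic cases the imported count accounts for.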
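The strength of the association and the width of the gray band can both be computed rather than read off the plot. The sketch below is an illustrative addition: it uses `cor.test()` for the Pearson correlation and `predict(..., interval = "confidence")` to reproduce the kind of band `geom_smooth()` draws. The three `Oversea` values in `grid` are arbitrary points chosen near the low end, the middle, and the high end of the observed range.

```r
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281, 382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78, 149, 123, 136, 144, 191, 112)
cvdta <- data.frame(Domestic, Oversea)

# Pearson correlation with its 95% confidence interval and p-value
ct <- cor.test(cvdta$Oversea, cvdta$Domestic)
r <- unname(ct$estimate)
print(ct)

# Same simple linear model as the plot's fitted line
cvmod <- lm(Domestic ~ Oversea, data = cvdta)

# 95% confidence interval for the fitted mean at three illustrative Oversea values
grid <- data.frame(Oversea = c(65, 125, 245))
band <- predict(cvmod, newdata = grid, interval = "confidence", level = 0.95)
band_width <- band[, "upr"] - band[, "lwr"]
print(cbind(grid, band, width = band_width))
```

The band is narrowest near the mean of `Oversea` and widens toward the extremes, which is why the gray region in the plot flares out at both ends.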

Fitting the simple linear regression model

cvmod <- lm(Domestic ~ Oversea, data = cvdta)
summary(cvmod)
## 
## Call:
## lm(formula = Domestic ~ Oversea, data = cvdta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -187.66 -136.95  -97.39  156.82  360.77 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   74.210    122.743   0.605    0.553
## Oversea        1.036      0.940   1.102    0.285
## 
## Residual standard error: 174.5 on 18 degrees of freedom
## Multiple R-squared:  0.0632, Adjusted R-squared:  0.01115 
## F-statistic: 1.214 on 1 and 18 DF,  p-value: 0.285

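When imported cases are used to predict domestic cases, the slope estimate is 1.036 with a p-value of .285. The p-value is not a 28.5% chance of a wrong prediction; it means that if the true slope were zero, a t statistic at least this extreme would occur about 28.5% of the time. Because .285 is greater than .05, the slope is not statistically significant. In addition, the multiple R-squared is .063 (adjusted R-squared .011), so imported cases account for only about 6% of the variance in domestic cases. A linear model based on imported cases alone therefore has little predictive value for domestic cases in this period.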
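A confidence interval for the slope conveys the same conclusion as the p-value in a more direct form. The sketch below is an illustrative addition that refits the same model and extracts the interval with `confint()`, along with the two R-squared values from `summary()`; the object names `slope_ci`, `r2`, and `adj_r2` are hypothetical.

```r
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281, 382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78, 149, 123, 136, 144, 191, 112)
cvdta <- data.frame(Domestic, Oversea)

cvmod <- lm(Domestic ~ Oversea, data = cvdta)

# 95% confidence interval for each coefficient; keep the slope row
ci <- confint(cvmod, level = 0.95)
slope_ci <- unname(ci["Oversea", ])
print(ci)

# Share of variance explained, before and after adjusting for model size
fit <- summary(cvmod)
r2 <- fit$r.squared
adj_r2 <- fit$adj.r.squared
print(c(r.squared = r2, adj.r.squared = adj_r2))

# A slope interval that spans zero agrees with a non-significant p-value
slope_spans_zero <- slope_ci[1] < 0 && slope_ci[2] > 0
print(slope_spans_zero)
```

If the interval for `Oversea` includes zero, the data are compatible with no linear relationship at all, which matches a p-value above .05.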