# Dates as month-day codes: "324" is 3/24 and "412" is 4/12 (20 consecutive days)
DATE <- c("324", "325", "326", "327", "328", "329", "330", "331", "401",
          "402", "403", "404", "405", "406", "407", "408", "409", "410", "411", "412")
# Daily confirmed case counts: domestic and imported (overseas)
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281, 382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78, 149, 123, 136, 144, 191, 112)
# Create a data frame from these vectors and assign it to an object
cvdta <- data.frame(DATE, Domestic, Oversea)
# Treat the date codes as labels, not numbers
cvdta$DATE <- as.factor(cvdta$DATE)
library(car)        # provides scatterplot()
library(tidyverse)  # loads ggplot2 for the second plot
Look at your data
scatterplot(Domestic ~ Oversea,
            data = cvdta,
            smooth = FALSE)
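Each point in this plot is one day between 3/24 and 4/12, with imported cases on the x-axis and domestic cases on the y-axis, so the plot shows how the two counts relate to each other, not how either one changes over time. The raw data show that domestic cases rose from 15 on 3/24 to 551 on 4/12, while imported cases fluctuated between 63 and 244 without a comparably clear upward trend. The regression line slopes upward, but the points are widely scattered around it.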
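A scatterplot of one count against the other cannot show which series grew faster over time. One way to check that is to regress each series on a day index and compare the slopes. The following is a minimal base-R sketch; the names `day`, `slope_domestic`, and `slope_oversea` are illustrative and not part of the original analysis, and it assumes the 20 observations are consecutive days.

```r
# Data repeated from the top of the document so the block runs on its own.
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281,
              382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78,
             149, 123, 136, 144, 191, 112)

# Assumption: observations are consecutive days (3/24 to 4/12),
# so a day index 1..20 can stand in for the date.
day <- seq_along(Domestic)

# Slope of each series against the day index (average change in cases per day).
slope_domestic <- unname(coef(lm(Domestic ~ day))["day"])
slope_oversea  <- unname(coef(lm(Oversea ~ day))["day"])

print(c(domestic = slope_domestic, oversea = slope_oversea))
```

Comparing the two slopes gives a direct answer to which series grew faster over the period.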
ggplot(cvdta, aes(x = Oversea, y = Domestic)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
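In terms of correlation, the two variables are positively related: days with more imported cases tend to have more domestic cases. However, the grey band is the 95% confidence interval around the fitted line (the default for `geom_smooth()` with `se = TRUE`). A strong, precisely estimated relationship would produce a narrow band, and here the band is wide. We therefore fit a regression model to see how much of one variable the other can explain.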
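The strength of the correlation and the width of the band around the fitted line can also be read off as numbers. The sketch below uses only base R; the object names (`ct`, `fit`, `grid`, `band`) and the grid of imported-case values are illustrative choices, and the interval from `predict()` should correspond to the band drawn by `geom_smooth(method = "lm")`.

```r
# Data repeated from the top of the document so the block runs on its own.
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281,
              382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78,
             149, 123, 136, 144, 191, 112)

# Pearson correlation with its significance test and 95% confidence interval.
ct <- cor.test(Domestic, Oversea)
r <- unname(ct$estimate)
print(r)
print(ct$conf.int)

# 95% confidence interval for the fitted line at a few imported-case values
# (hypothetical grid chosen to span the observed range).
fit <- lm(Domestic ~ Oversea)
grid <- data.frame(Oversea = c(60, 120, 180, 240))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(cbind(grid, band))
```

A confidence interval for the correlation that includes zero, or a band wide enough to admit a flat line, both point to a relationship that is not well determined by these 20 observations.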
Fitting the simple linear regression model
cvmod <- lm(Domestic ~ Oversea, data = cvdta)
summary(cvmod)
##
## Call:
## lm(formula = Domestic ~ Oversea, data = cvdta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -187.66 -136.95 -97.39 156.82 360.77
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.210 122.743 0.605 0.553
## Oversea 1.036 0.940 1.102 0.285
##
## Residual standard error: 174.5 on 18 degrees of freedom
## Multiple R-squared: 0.0632, Adjusted R-squared: 0.01115
## F-statistic: 1.214 on 1 and 18 DF, p-value: 0.285
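From these results, when imported cases are used to predict domestic cases, the p-value for the slope is .285: if the true slope were zero, a slope at least this far from zero would be observed about 28.5% of the time. Because this exceeds .05, the relationship does not reach statistical significance. In addition, imported cases explain only about 6.3% of the variance in domestic cases (multiple R-squared = 0.0632; adjusted R-squared = 0.011), so the linear relationship between the two is weak and neither predicts the other well.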
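The quantities discussed above can be pulled out of the fitted model directly instead of being read from the printed summary. This is a minimal sketch using base R; the variable names (`r2`, `adj_r2`, `p_slope`, `ci_slope`, `adj_manual`) are illustrative and not part of the original analysis.

```r
# Data repeated from the top of the document so the block runs on its own.
Domestic <- c(15, 14, 21, 83, 34, 33, 56, 87, 104, 160, 183, 133, 216, 281,
              382, 384, 442, 431, 439, 551)
Oversea <- c(124, 122, 82, 120, 93, 63, 107, 152, 132, 244, 97, 142, 65, 78,
             149, 123, 136, 144, 191, 112)

fit <- lm(Domestic ~ Oversea)
s <- summary(fit)

# Share of variance in Domestic explained by Oversea, raw and adjusted.
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Adjusted R-squared by hand for a model with one predictor:
# 1 - (1 - R^2) * (n - 1) / (n - 2)
n <- length(Domestic)
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - 2)

# p-value and 95% confidence interval for the slope.
p_slope <- coef(s)["Oversea", "Pr(>|t|)"]
ci_slope <- confint(fit)["Oversea", ]

print(c(r2 = r2, adj_r2 = adj_r2, p_slope = p_slope))
print(ci_slope)
```

If the confidence interval for the slope contains zero, that matches a p-value above .05: the data are compatible with no linear relationship at all.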