Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?
processors <- read.csv("full_data.csv")
head(processors)
##         date    location new_cases new_deaths total_cases total_deaths
## 1 2019-12-31 Afghanistan         0          0           0            0
## 2 2020-01-01 Afghanistan         0          0           0            0
## 3 2020-01-02 Afghanistan         0          0           0            0
## 4 2020-01-03 Afghanistan         0          0           0            0
## 5 2020-01-04 Afghanistan         0          0           0            0
## 6 2020-01-05 Afghanistan         0          0           0            0
plot(processors$new_cases, processors$new_deaths)

# Simple linear regression of daily deaths on daily cases
lm1 <- lm(processors$new_deaths ~ processors$new_cases)
plot(processors$new_cases, processors$new_deaths)
abline(lm1)

summary(lm1)
##
## Call:
## lm(formula = processors$new_deaths ~ processors$new_cases)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -402.45    0.05    0.09    0.09  510.77
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -0.0941680  0.2169339  -0.434    0.664
## processors$new_cases  0.0433618  0.0002312 187.585   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.09 on 6269 degrees of freedom
## Multiple R-squared: 0.8488, Adjusted R-squared: 0.8488
## F-statistic: 3.519e+04 on 1 and 6269 DF, p-value: < 2.2e-16
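
The slope estimate of about 0.0434 corresponds to roughly 4.3 additional reported deaths per 100 additional new cases, and the intercept is not distinguishable from zero (p = 0.664). A minimal sketch of how one might attach interval estimates to such a fit is given below. It runs on a simulated stand-in data frame (`toy`, `fit`, and every number generated here are assumptions for illustration, not the COVID data) and uses the `data =` form of `lm()` so that `predict()` can accept new values.

```r
# Simulated stand-in data (an assumption): a linear trend plus constant-variance noise
set.seed(1)
toy <- data.frame(new_cases = round(runif(200, 0, 5000)))
toy$new_deaths <- 0.043 * toy$new_cases + rnorm(200, sd = 15)

# Passing data = keeps coefficient names short and lets predict() take newdata
fit <- lm(new_deaths ~ new_cases, data = toy)

# 95% confidence intervals for the intercept and the slope
ci <- confint(fit, level = 0.95)
print(ci)

# Prediction intervals for two hypothetical daily case counts
pred <- predict(fit,
                newdata = data.frame(new_cases = c(100, 1000)),
                interval = "prediction")
print(pred)
```

The same two calls should work on the real data after refitting as `lm(new_deaths ~ new_cases, data = processors)`. The intervals are only trustworthy if the residual checks further down look reasonable.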
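# Residuals vs. fitted values: an even band around zero would support the linear model
plot(fitted(lm1), resid(lm1))
abline(h = 0, lty = 2)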
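
A residuals-versus-fitted plot is read by eye, so a numeric companion can help when judging whether the spread grows with the fitted values. The sketch below computes a studentized Breusch-Pagan style statistic by hand: squared residuals are regressed on the predictor, and n times the R-squared of that auxiliary regression is compared to a chi-squared distribution with 1 degree of freedom. The data are simulated with deliberately non-constant variance; `toy`, `fit`, `aux`, `bp_stat`, and `bp_p` are hypothetical names introduced here.

```r
# Simulated stand-in data (an assumption): the noise SD grows with new_cases
set.seed(42)
n <- 500
toy <- data.frame(new_cases = runif(n, 0, 1000))
toy$new_deaths <- 0.04 * toy$new_cases +
  rnorm(n, sd = 0.02 * toy$new_cases + 0.1)

fit <- lm(new_deaths ~ new_cases, data = toy)

# Auxiliary regression: squared residuals on the predictor
toy$sq_resid <- resid(fit)^2
aux <- lm(sq_resid ~ new_cases, data = toy)

# Test statistic n * R^2, referred to chi-squared with 1 df (one predictor)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
cat("statistic:", bp_stat, " p-value:", bp_p, "\n")
```

A small p-value points to heteroscedasticity, in which case the standard errors in `summary()` are not reliable even if the fitted line itself looks sensible.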

qqnorm(resid(lm1))
qqline(resid(lm1))
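
The residual quartiles in the summary (0.05 to 0.09) sit very close to zero while the extremes reach -402.45 and 510.77, a pattern that is at least consistent with heavy tails and would show up as points bending away from the line at both ends of the Q-Q plot. `shapiro.test()` accepts at most 5000 observations and this fit uses 6271 (6269 residual degrees of freedom plus two coefficients), so a simple alternative is sketched below: the excess kurtosis of the standardized residuals and the share lying beyond 3 standard deviations. The data are simulated with heavy-tailed noise purely for illustration; `toy`, `fit`, `z`, `excess_kurtosis`, and `tail_share` are assumed names.

```r
# Simulated stand-in data (an assumption): t-distributed noise with 2 df has heavy tails
set.seed(7)
n <- 2000
toy <- data.frame(new_cases = runif(n, 0, 1000))
toy$new_deaths <- 0.04 * toy$new_cases + 5 * rt(n, df = 2)

fit <- lm(new_deaths ~ new_cases, data = toy)

# Internally standardized residuals
z <- rstandard(fit)

# Excess kurtosis is about 0 for normal data and positive for heavy tails
excess_kurtosis <- mean((z - mean(z))^4) / mean((z - mean(z))^2)^2 - 3

# Share of residuals beyond 3 SD; about 0.0027 would be expected under normality
tail_share <- mean(abs(z) > 3)

cat("excess kurtosis:", excess_kurtosis, " share beyond 3 SD:", tail_share, "\n")
```

Applied to `lm1`, a large positive excess kurtosis would suggest that, despite the high R-squared, the normal-error assumption behind the reported p-values does not hold well.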
