I will be using a dataset that I analyzed back in DATA607, before I had learned about regression.
head(gundata_clean)
## # A tibble: 6 × 12
##    year Crime/…¹ Murde…² Robbe…³ Priso…⁴ %Blac…⁵ %Whit…⁶ Popul…⁷ Incom…⁸ Densi…⁹
##   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  1977     414.    14.2    96.8      83    8.38    55.1    3.78   9563.  0.0746
## 2  1978     419.    13.3    99.1      94    8.35    55.1    3.83   9932   0.0756
## 3  1979     413.    13.2   110.      144    8.33    55.1    3.87   9877.  0.0762
## 4  1980     448.    13.2   132.      141    8.41    54.9    3.90   9541.  0.0768
## 5  1981     470.    11.9   126.      149    8.48    54.9    3.92   9548.  0.0772
## 6  1982     448.    10.6   112       183    8.51    54.9    3.93   9479.  0.0773
## # … with 2 more variables: state <chr>, ShallCarryLaw <chr>, and abbreviated
## # variable names ¹`Crime/100k`, ²`Murder/100k`, ³`Robbery/100k`,
## # ⁴`Prisoners/100k`, ⁵`%Black`, ⁶`%White`, ⁷`Population/mil`,
## # ⁸`Income/capitaofstate`, ⁹`Density(sqrmi/1000)`
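library(dplyr)     # select() and the %>% pipe
library(corrplot)  # correlation-matrix plot
# Drop the character columns; cor() only accepts numeric data
gundata_corplot <- gundata_clean %>%
  select(-state, -ShallCarryLaw)
# Pairwise Pearson correlations between all numeric variables
gundata_cor <- cor(gundata_corplot, method = "pearson")
# Diverging palette: red for negative correlations, blue for positive
col_gd <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
# Upper-triangle heat map, variables reordered by hierarchical clustering,
# with each correlation coefficient printed in its cell
corrplot(gundata_cor, method = "color", col = col_gd(200),
         type = "upper", order = "hclust",
         addCoef.col = "black",
         tl.col = "black", tl.srt = 45)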
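Robbery and murder are components of the overall crime rate, so they are obviously correlated with it, and using them as predictors would be circular. I leave those parameters out because I want to see which underlying factors are associated with crime, so I regress the crime rate on population density and income per capita.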
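One way to back up the choice of predictors is to rank the remaining numeric columns by their correlation with the crime rate after excluding its own components. The sketch below is a minimal illustration under stated assumptions: `rank_predictors()` is a hypothetical helper (not from any package), and because `gundata_clean` isn't reproduced here, the data are simulated with the same column names.

```r
# Hypothetical helper (not from any package): rank numeric columns by their
# Pearson correlation with a response, after dropping columns that are
# themselves components of that response.
rank_predictors <- function(df, response, exclude = character()) {
  num <- df[vapply(df, is.numeric, logical(1))]     # numeric columns only
  r <- cor(num, method = "pearson")[, response]     # correlations with the response
  r <- r[setdiff(names(r), c(response, exclude))]   # drop response and excluded columns
  r[order(abs(r), decreasing = TRUE)]               # strongest association first
}

# Simulated stand-in data (assumption), reusing column names from gundata_clean
set.seed(1)
n <- 200
density <- runif(n)
income <- rnorm(n, mean = 10000, sd = 1000)
murder <- rnorm(n, mean = 8, sd = 2)
crime <- 100 + 300 * density + 10 * murder + rnorm(n, sd = 20)
toy <- data.frame(`Crime/100k` = crime, `Murder/100k` = murder,
                  `Density(sqrmi/1000)` = density,
                  `Income/capitaofstate` = income,
                  state = "XX", check.names = FALSE)

res <- rank_predictors(toy, "Crime/100k", exclude = "Murder/100k")
print(res)
```

On the real data the equivalent call would presumably be `rank_predictors(gundata_corplot, "Crime/100k", exclude = c("Murder/100k", "Robbery/100k"))`; keeping the sign of each correlation (rather than only its magnitude) shows the direction of each association as well.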
model <- lm(`Crime/100k` ~ `Density(sqrmi/1000)` + `Income/capitaofstate`,
            data = gundata_corplot)
model
##
## Call:
## lm(formula = `Crime/100k` ~ `Density(sqrmi/1000)` + `Income/capitaofstate`,
## data = gundata_corplot)
##
## Coefficients:
## (Intercept) `Density(sqrmi/1000)` `Income/capitaofstate`
## 85.39321 146.67552 0.02667
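To make the coefficients concrete, the fitted equation can be evaluated by hand for the first row shown by `head(gundata_clean)` (1977). This is only a worked sketch: it uses the rounded coefficients and the rounded predictor values from the printouts, so the result is approximate rather than what `predict(model)` would return exactly.

```r
# Coefficients as printed by `model` (rounded in the printout)
b0 <- 85.39321
b_density <- 146.67552
b_income <- 0.02667

# Predictor values from the first row of head(gundata_clean), also rounded
density <- 0.0746
income <- 9563

# Fitted crime rate per 100k: intercept + slope * predictor for each term
fitted_crime <- b0 + b_density * density + b_income * income
print(round(fitted_crime, 1))   # about 351.4, versus an observed 414

# Income slope rescaled: change in crime per 100k for each extra $1,000 of income
print(b_income * 1000)          # 26.67
```

The observed crime rate in that row is about 414, so this state-year sits roughly 63 crimes per 100k above the fitted plane.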
summary(model)
##
## Call:
## lm(formula = `Crime/100k` ~ `Density(sqrmi/1000)` + `Income/capitaofstate`,
## data = gundata_corplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -712.70 -189.17 -34.69 163.41 868.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.539e+01 4.036e+01 2.116 0.0346 *
## `Density(sqrmi/1000)` 1.467e+02 5.543e+00 26.460 <2e-16 ***
## `Income/capitaofstate` 2.667e-02 2.941e-03 9.067 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 241.6 on 1170 degrees of freedom
## Multiple R-squared: 0.4785, Adjusted R-squared: 0.4776
## F-statistic: 536.8 on 2 and 1170 DF, p-value: < 2.2e-16
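The adjusted R-squared and F-statistic in the summary both follow from the multiple R-squared and the degrees of freedom. The short check below recomputes them from the printed values using the standard formulas; it assumes n = 1173 observations, inferred from 1170 residual degrees of freedom plus three estimated coefficients.

```r
# Values copied from summary(model)
r2 <- 0.4785            # Multiple R-squared
df_resid <- 1170        # residual degrees of freedom
p <- 2                  # number of predictors (density, income)
n <- df_resid + p + 1   # number of observations: 1173

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
# Overall F-statistic: explained vs unexplained variance per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(round(adj_r2, 4))  # 0.4776
print(round(f_stat, 1))  # 536.8
```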
plot(model)
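The model isn't very strong: with an adjusted R-squared of 0.4776, density and income explain only about 48% of the variation in the crime rate. The normal Q-Q plot also shows a substantial skew in the residuals, so they are not close to normally distributed and the normality assumption of the linear model does not hold well.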
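The skew seen in the Q-Q plot can also be put into numbers. The sketch below is self-contained: since `gundata_clean` isn't reproduced here, it simulates right-skewed data (an assumption for illustration), fits `lm()`, draws only the Q-Q panel with `plot(fit, which = 2)`, and computes a moment-based skewness with `residual_skewness()`, a hypothetical helper rather than a base R function. On the real model the same calls would be `residual_skewness(model)` and `shapiro.test(residuals(model))`.

```r
# Hypothetical helper: moment-based sample skewness of a model's residuals
# (0 for a symmetric distribution, > 0 when the right tail is longer).
residual_skewness <- function(fit) {
  r <- residuals(fit)
  m2 <- mean((r - mean(r))^2)   # second central moment
  m3 <- mean((r - mean(r))^3)   # third central moment
  m3 / m2^1.5
}

# Simulated stand-in data (assumption): linear signal plus right-skewed
# exponential noise, so the residuals are clearly non-normal.
set.seed(42)
n <- 1000
x <- runif(n)
y <- 100 + 150 * x + rexp(n, rate = 1 / 50)
fit <- lm(y ~ x)

plot(fit, which = 2)            # normal Q-Q plot of the residuals only
skew <- residual_skewness(fit)
print(skew)                     # positive for right-skewed residuals
sw <- shapiro.test(residuals(fit))  # Shapiro-Wilk test, accepts 3 to 5000 values
print(sw)
```

A skewness well above zero together with a small Shapiro-Wilk p-value points the same way as a Q-Q plot that bends away from its reference line.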