library(usdata)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ccm <- county_complete %>%
  filter(state %in% c("Washington", "Alabama")) %>%
  mutate(WA = ifelse(state == "Washington", 1, 0))
# Make sure it worked
table(ccm$WA)
##
## 0 1
## 67 39
The goal is to fit a logistic regression that identifies Washington counties from the county characteristics in ccm, the Washington and Alabama subset of the county_complete data frame. For the first model, use unemployment_rate_2007 and poverty_2017 as the predictors.
WAMod1 = glm(WA ~ unemployment_rate_2007 + poverty_2017,data=ccm,family=binomial)
summary(WAMod1)
##
## Call:
## glm(formula = WA ~ unemployment_rate_2007 + poverty_2017, family = binomial,
## data = ccm)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1560 1.2675 0.123 0.902
## unemployment_rate_2007 1.6467 0.3333 4.940 7.82e-07 ***
## poverty_2017 -0.5456 0.1063 -5.130 2.89e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 139.462 on 105 degrees of freedom
## Residual deviance: 58.548 on 103 degrees of freedom
## AIC: 64.548
##
## Number of Fisher Scoring iterations: 6
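The estimates in the summary are on the log-odds scale, so it can help to convert them before interpreting them. The sketch below copies the three rounded estimates from the output above instead of refitting the model, and evaluates them for a made-up county with 5% unemployment and 15% poverty. Those two values, and the names b, odds_ratios, and p_wa, are illustrative assumptions and do not come from the data.

```r
# Rounded estimates copied from the summary(WAMod1) output above
b <- c(intercept = 0.1560,
       unemployment_rate_2007 = 1.6467,
       poverty_2017 = -0.5456)

# Odds ratios: multiplicative change in the odds of "Washington" for a
# one-point increase in each predictor, holding the other one fixed
odds_ratios <- exp(b[-1])
round(odds_ratios, 2)

# Hypothetical county (assumed values, not a row of ccm)
unemp <- 5
pov <- 15

# Linear predictor on the log-odds scale, then the inverse logit
log_odds <- b[["intercept"]] +
  b[["unemployment_rate_2007"]] * unemp +
  b[["poverty_2017"]] * pov
p_wa <- plogis(log_odds)
p_wa
```

With the fitted model still in the session, exp(coef(WAMod1)) and predict(WAMod1, newdata = ..., type = "response") should give the same quantities without the rounding.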
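# Predicted probability that each county is in Washington
ProbWA = predict(WAMod1,type="response")
# Classify a county as Washington when that probability exceeds 0.5
PredWA = ProbWA > .5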
# Create the confusion matrix
table(ccm$WA,PredWA)
## PredWA
## FALSE TRUE
## 0 63 4
## 1 6 33
# Compute the overall accuracy rate
AccRate = mean(PredWA == ccm$WA)
AccRate
## [1] 0.9056604
Of the 39 counties in Washington, 33 were classified correctly and 6 were incorrectly identified as Alabama counties. Of the 67 counties in Alabama, 63 were classified correctly and 4 were incorrectly identified as Washington counties.
The overall accuracy was about 90.6% (96 of the 106 counties).
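Overall accuracy combines the two states into one number, so it may be worth looking at the per-state rates as well. The sketch below rebuilds the confusion matrix from the counts printed above and computes the accuracy, the share of Washington counties found (sensitivity), the share of Alabama counties found (specificity), and the accuracy of always guessing the larger class. The object names are illustrative and do not appear in the original code.

```r
# Counts copied from the confusion matrix printed above
# (rows = actual WA value, columns = predicted class)
cm <- matrix(c(63, 6, 4, 33), nrow = 2,
             dimnames = list(WA = c("0", "1"),
                             PredWA = c("FALSE", "TRUE")))

# Correct predictions sit on the diagonal: (63 + 33) / 106
accuracy <- sum(diag(cm)) / sum(cm)

# Washington counties correctly identified: 33 / 39
sensitivity <- cm["1", "TRUE"] / sum(cm["1", ])

# Alabama counties correctly identified: 63 / 67
specificity <- cm["0", "FALSE"] / sum(cm["0", ])

# Accuracy of always guessing the larger class (Alabama): 67 / 106
baseline <- max(rowSums(cm)) / sum(cm)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, baseline = baseline), 3)
```

Comparing accuracy with baseline shows how much the two predictors add over ignoring them entirely.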
Create your own model using no more than two variables.
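One possible way to organize this exercise is to wrap the fit, classify, and score steps used above in a small helper, so that different pairs of variables can be compared with one call each. The function name fit_and_score and its cutoff argument are hypothetical, and the sketch assumes a response coded 0/1 like WA.

```r
# Hypothetical helper: fit a logistic regression, classify at a cutoff,
# and return the confusion matrix, accuracy, and AIC
fit_and_score <- function(formula, data, cutoff = 0.5) {
  fit <- glm(formula, data = data, family = binomial)

  # Fitted probabilities for the rows the model actually used
  prob <- predict(fit, type = "response")
  pred <- prob > cutoff

  # Response values for those same rows (rows with missing values are dropped)
  actual <- model.response(model.frame(fit))

  list(confusion = table(actual, pred),
       accuracy = mean(pred == actual),
       aic = AIC(fit))
}

# Usage with the data frame built earlier, if it exists in the session;
# swap in any two predictors from ccm to try a different model
if (exists("ccm")) {
  fit_and_score(WA ~ unemployment_rate_2007 + poverty_2017, ccm)
}
```

Because every candidate here is scored on the same counties it was fit to, the accuracy figures are likely to be somewhat optimistic, so it may be safer to treat them as a rough comparison between models.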