Logistic Regression - County data - Model explainer
Author
Tural Sadigov
Libraries and data
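We use tidyverse (Wickham et al. 2019) for data wrangling, tidymodels (Kuhn and Wickham 2020) for modeling, and stats2data (Sadigov 2022) for the county data. We keep each county's name, state, 2017 population, median household income, and metro status, log-transform the population, and hold out 20% of the counties for testing, stratified by metro status.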
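```{r}
library(tidyverse)
library(tidymodels)
library(stats2data)

# Alternative: read the same data directly from GitHub
# one <- 'https://raw.githubusercontent.com/'
# two <- 'turalsadigov/'
# three <- 'MATH_254/main/data/'
# four <- 'county.csv'
# url <- str_c(one, two, three, four)
# county <- read_csv(url)

# Keep the variables we need, merge name and state into one identifier,
# treat metro as a factor (the response), and drop incomplete rows
df <- stats2data::county %>%
  select(name, state, pop2017, median_hh_income, metro) %>%
  unite('name/state', name:state, sep = '/') %>%
  mutate(metro = factor(metro)) %>%
  drop_na()

# Work with population on the log scale
df <- df %>%
  mutate(pop2017 = log(pop2017))

# Reproducible 80/20 split, stratified by the response
set.seed(2022)
df_split <- initial_split(data = df, prop = 0.80, strata = metro)
df_training <- training(df_split)
df_testing <- testing(df_split)
```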
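To see what `strata = metro` buys us in the split above, here is a minimal base R sketch of a stratified 80/20 split. It is only an illustration of the idea, not the algorithm rsample uses internally, and the data frame `toy` with its values is a made-up stand-in for the county data (the names `toy`, `toy_training`, and `toy_testing` are hypothetical).

```r
# Synthetic stand-in for the county data (hypothetical values, not the real data)
set.seed(2022)
n <- 1000
toy <- data.frame(
  pop2017 = log(rlnorm(n, meanlog = 10, sdlog = 1.5)),
  metro   = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.63, 0.37)))
)

# Stratified split: within each level of metro, sample 80% of the row indices
train_idx <- unlist(
  lapply(
    split(seq_len(nrow(toy)), toy$metro),
    function(idx) idx[sample.int(length(idx), size = floor(0.80 * length(idx)))]
  ),
  use.names = FALSE
)
toy_training <- toy[train_idx, ]
toy_testing  <- toy[-train_idx, ]

# The class proportions should be nearly identical in the two pieces
prop.table(table(toy_training$metro))
prop.table(table(toy_testing$metro))
```

Because the sampling is done separately inside each class, the share of metro counties is nearly the same in the training and testing sets, which is what we would hope to see when running `count(metro)` on `df_training` and `df_testing` from the real split.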
Baniecki, Hubert, and Przemyslaw Biecek. 2019. “modelStudio: Interactive Studio with Explanations for ML Predictive Models.” Journal of Open Source Software 4: 1798. https://doi.org/10.21105/joss.01798.
Greenwell, Brandon M., and Bradley C. Boehmke. 2020. “Variable Importance Plots: An Introduction to the vip Package.” The R Journal 12. https://doi.org/10.32614/RJ-2020-013.
Kuhn, Max, and Hadley Wickham. 2020. “Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles.” https://www.tidymodels.org.
Maksymiuk, Szymon, Alicja Gosiewska, and Przemyslaw Biecek. 2020. “Landscape of R Packages for eXplainable Artificial Intelligence.” https://arxiv.org/abs/2009.13248.
Sadigov, Tural. 2022. “Stats2data: Data Package for MATH 254, Statistical Modeling and Applications, at Hamilton College.”
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4: 1686. https://doi.org/10.21105/joss.01686.