Exploring the Data With Summary Stats

I created five number summaries for all variables to provide a baseline of the state of Michigan, the data will be exported into a CVS file and transformed into an aesthetically pleasing dynamic table

Here’s the final table!

The script below:

Creates a descriptive variable names
Pulls a five number summary for all variables (grouped by county) using the ::pysch:: package
Rounds the data one decimal, for aesthetics
Trims off unnecessary rows and columns
Writes a CSV file for export

Step 1

install packages in console if needed, load the ::tidyverse::

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.1.3     ✔ stringr 1.4.0
## ✔ readr   1.4.0     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Step 2

Rename variables with descriptive titles, allowing for a cleaner looking final table
Select the numerical variables I can summarize

table_covid_census <-  read.csv("~/STA 518/BrookemWalters-Portfolio/Stats 518 Final Project/data/covid_census.csv") %>% 
  select(County, population19E, householdsE, median_ageE,median_incomeE, bach_degree_plus_a25E, public_assist_rate,
         unemployment_rate, percent_asian, percent_asian, percent_black,percent_native, percent_pacific_islander, 
         percent_white, percent_hispanic,Total_Deaths, Deaths_Per_Pop_Thousand) %>% 
  rename( 
    Population = "population19E",
    Households = "householdsE",
    `Median Age` = "median_ageE",
    `Median HHI` = "median_incomeE",
    `% A25+ College Grad+` = "bach_degree_plus_a25E",
    `% of HH on Public Assist.` = "public_assist_rate",
    `Unemployment Rate` = "unemployment_rate",
    `% Asian` = "percent_asian",
    `% Black` = "percent_black",    
    `% Native American` = "percent_native", 
    `% Pacific Islander` = "percent_pacific_islander",
    `% White` = "percent_white",            
    `% Hispanic` = "percent_hispanic",
    `Total Covid Deaths` = "Total_Deaths",
    `Covid Deaths Per Thousand` = "Deaths_Per_Pop_Thousand"
  )
glimpse(table_covid_census)

## Rows: 83
## Columns: 16
## $ County                      <chr> "Alcona", "Alger", "Allegan", "Alpena", "A…
## $ Population                  <int> 10396, 9098, 117104, 28431, 23301, 15013, …
## $ Households                  <int> 1326, 1379, 28264, 5315, 4204, 2718, 1472,…
## $ `Median Age`                <dbl> 58.9, 49.6, 40.2, 48.1, 51.6, 50.0, 45.8, …
## $ `Median HHI`                <int> 43341, 45184, 65071, 42603, 57256, 45679, …
## $ `% A25+ College Grad+`      <dbl> 18.4, 19.3, 23.6, 18.0, 29.6, 12.9, 15.2, …
## $ `% of HH on Public Assist.` <dbl> 35.1, 22.8, 12.1, 30.2, 19.8, 30.5, 33.0, …
## $ `Unemployment Rate`         <dbl> 6.1, 4.7, 3.4, 7.5, 4.2, 6.8, 4.1, 5.3, 5.…
## $ `% Asian`                   <dbl> 0.3, 0.8, 0.7, 0.5, 0.4, 0.2, 0.5, 0.6, 0.…
## $ `% Black`                   <dbl> 0.4, 6.9, 1.2, 0.6, 0.4, 0.4, 8.7, 0.4, 1.…
## $ `% Native American`         <dbl> 0.6, 3.9, 0.3, 0.3, 0.6, 1.3, 10.2, 0.2, 0…
## $ `% Pacific Islander`        <dbl> 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.…
## $ `% White`                   <dbl> 95.5, 83.5, 87.4, 95.2, 94.2, 94.5, 72.7, …
## $ `% Hispanic`                <dbl> 1.6, 1.7, 7.5, 1.4, 2.3, 2.0, 1.7, 3.1, 5.…
## $ `Total Covid Deaths`        <int> 72, 15, 368, 143, 70, 72, 55, 177, 607, 73…
## $ `Covid Deaths Per Thousand` <dbl> 6.92, 1.65, 3.12, 5.03, 3.00, 4.84, 6.70, …

Step 3

# I want county for another project, but I'll remove it here so I can summarize the numerical data
sum_table_covid_census <- select(table_covid_census, -County)

Calculate a five number summary of all variable using the “describe” function from the {pysch} package
- **note* the ::pysch:: has a major conflict the {ggplot2} -to avoid bugs later, it’s best to run it as part of the function

sum_table_covid_census <- psych::describe(sum_table_covid_census, fast = TRUE ) %>% 
  mutate_if(is.numeric, round,1)
glimpse(sum_table_covid_census)

## Rows: 15
## Columns: 8
## $ vars  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ n     <dbl> 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83
## $ mean  <dbl> 120167.6, 25974.1, 45.0, 53504.5, 23.1, 24.8, 5.9, 1.1, 3.8, 1.3…
## $ sd    <dbl> 259934.0, 58861.8, 6.4, 9444.7, 9.0, 6.7, 1.5, 1.5, 5.9, 2.6, 0.…
## $ min   <dbl> 2102.0, 306.0, 28.7, 38356.0, 12.0, 10.1, 3.4, 0.0, 0.3, 0.0, 0.…
## $ max   <dbl> 1753059.0, 414277.0, 59.1, 84274.0, 56.7, 42.7, 10.0, 9.1, 38.1,…
## $ range <dbl> 1750957.0, 413971.0, 30.4, 45918.0, 44.7, 32.6, 6.6, 9.1, 37.8, …
## $ se    <dbl> 28531.5, 6460.9, 0.7, 1036.7, 1.0, 0.7, 0.2, 0.2, 0.6, 0.3, 0.0,…

Step 4

Change the first column to the variable names, for table readability
All of the variables are for all 83 counties, removing “n” column

sum_table_covid_census <- tibble::rownames_to_column(sum_table_covid_census, "socioeconomic factor")

sum_table_covid_census <-   select(sum_table_covid_census, -vars , -n)

Step 5

Write an CSV file for use import into the dynamic summary table

write.csv(sum_table_covid_census,"~/STA 518/BrookemWalters-Portfolio/Stats 518 Final Project/Tables_Shiny/MIShiny/covid_census_summary_stats.csv")

Step 7

Clean up the full data set from Step 2 for use in the shiny table

table_covid_census <-  table_covid_census %>% 
  select(County, Population, Households,`Total Covid Deaths`,
         `Covid Deaths Per Thousand`,  `Median Age`, `Median HHI`,
         `% A25+ College Grad+`, `Unemployment Rate`, `% of HH on Public Assist.`,
         `% Asian`, `% Black`, `% Native American`, `% Pacific Islander`,
         `% White`, `% Hispanic`)

Step 8

Write a CSV file for use import into the dynamic county table

write.csv(table_covid_census,"~/STA 518/BrookemWalters-Portfolio/Stats 518 Final Project/Tables_Shiny/MIShiny/Covid_Census_Table.csv")

Exploring the Data With Summary Stats

brooke

7/12/2022