Data Tidying

Untidy Examples:

Example 1: Racial Wealth

Source: https://apps.urban.org/features/wealth-inequality-charts/

Original Table

library(readxl) # provides read_excel()
wealthbyrace <- read_excel("WealthbyRace.xlsx")

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5

wealthbyrace

## # A tibble: 13 x 5
##    `Average Family Wealth by Race/Et… ...2        ...3        ...4        ...5  
##                                 <dbl> <chr>       <chr>       <chr>       <chr> 
##  1                                 NA Non-White   White       Black       Hispa…
##  2                               1963 19503.8372… 140632.660… <NA>        <NA>  
##  3                               1983 73233.6157… 324057.599… 67269.6000… 62562…
##  4                               1989 <NA>        424082.4    78092.2     84397…
##  5                               1992 <NA>        373825.9    80779.48    90751…
##  6                               1995 <NA>        394522.3    68908.6399… 96487…
##  7                               1998 <NA>        497581.1    94972.45    12851…
##  8                               2001 <NA>        662337.1    97930.09    11985…
##  9                               2004 <NA>        715453.3    146127.9    15872…
## 10                               2007 <NA>        802519.8    156285.1    215534
## 11                               2010 <NA>        715067.3    110569.1    12803…
## 12                               2013 <NA>        717069.1    102106      11116…
## 13                               2016 <NA>        919336.1    139523.1    19172…

Tidied Table

wealthbyrace <- read_excel("WealthbyRace.xlsx", skip = 1) # skip the title row so the next row supplies the column names

## New names:
## * `` -> ...1

tidied_wealthbyrace <- wealthbyrace %>%
  rename(year = 1) %>% # name the unnamed first column
  pivot_longer(
    cols = `Non-White`:Hispanic, # the four race/ethnicity columns
    names_to = "race",
    values_to = "wealth_family"
  ) # one row per year and race
tidied_wealthbyrace

## # A tibble: 48 x 3
##     year race      wealth_family
##    <dbl> <chr>             <dbl>
##  1  1963 Non-White        19504.
##  2  1963 White           140633.
##  3  1963 Black               NA 
##  4  1963 Hispanic            NA 
##  5  1983 Non-White        73234.
##  6  1983 White           324058.
##  7  1983 Black            67270.
##  8  1983 Hispanic         62562.
##  9  1989 Non-White           NA 
## 10  1989 White           424082.
## # … with 38 more rows
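
The long table above keeps a row for every year and race combination, including those with no estimate (Black and Hispanic in 1963, for example). If those rows are not wanted, `pivot_longer()` accepts `values_drop_na = TRUE`. A minimal sketch, assuming a two-row stand-in for the spreadsheet with values rounded from the printed output; `wealth_sample` and `wealth_long` are hypothetical names, not objects from the original analysis:

```r
library(dplyr)
library(tidyr)

# Stand-in for the first two data rows of the spreadsheet
# (values rounded from the printed output; not the full file)
wealth_sample <- tribble(
  ~year, ~`Non-White`, ~White, ~Black, ~Hispanic,
  1963,  19504,        140633, NA,     NA,
  1983,  73234,        324058, 67270,  62562
)

wealth_long <- wealth_sample %>%
  pivot_longer(
    cols = -year,            # every column except year
    names_to = "race",
    values_to = "wealth_family",
    values_drop_na = TRUE    # omit year/race pairs with no estimate
  )
wealth_long
```

Whether to drop the missing rows depends on the analysis: keeping them makes the gaps in the source explicit, while dropping them gives a shorter table for plotting.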

Example 2: Hate Crimes

Source: https://catalog.data.gov/dataset/hate-crimes-by-county-and-bias-type-beginning-2010

Original Table

hatecrimes <- read_csv("hatecrimes.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   County = col_character(),
##   `Crime Type` = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

hatecrimes

## # A tibble: 605 x 44
##    County  Year `Crime Type`         `Anti-Male` `Anti-Female` `Anti-Transgende…
##    <chr>  <dbl> <chr>                      <dbl>         <dbl>             <dbl>
##  1 Albany  2010 Crimes Against Pers…           0             0                 0
##  2 Albany  2010 Property Crimes                0             0                 0
##  3 Albany  2011 Crimes Against Pers…           0             0                 0
##  4 Albany  2011 Property Crimes                0             0                 0
##  5 Albany  2012 Crimes Against Pers…           0             0                 0
##  6 Albany  2012 Property Crimes                0             0                 0
##  7 Albany  2013 Crimes Against Pers…           0             0                 0
##  8 Albany  2013 Property Crimes                0             0                 0
##  9 Albany  2014 Crimes Against Pers…           0             0                 0
## 10 Albany  2014 Property Crimes                0             0                 0
## # … with 595 more rows, and 38 more variables:
## #   Anti-Gender Identity Expression <dbl>, Anti-Age* <dbl>, Anti-White <dbl>,
## #   Anti-Black <dbl>, Anti-American Indian/Alaskan Native <dbl>,
## #   Anti-Asian <dbl>, Anti-Native Hawaiian/Pacific Islander <dbl>,
## #   Anti-Multi-Racial Groups <dbl>, Anti-Other Race <dbl>, Anti-Jewish <dbl>,
## #   Anti-Catholic <dbl>, Anti-Protestant <dbl>, Anti-Islamic (Muslim) <dbl>,
## #   Anti-Multi-Religious Groups <dbl>, Anti-Atheism/Agnosticism <dbl>,
## #   Anti-Religious Practice Generally <dbl>, Anti-Other Religion <dbl>,
## #   Anti-Buddhist <dbl>, Anti-Eastern Orthodox (Greek, Russian, etc.) <dbl>,
## #   Anti-Hindu <dbl>, Anti-Jehovahs Witness <dbl>, Anti-Mormon <dbl>,
## #   Anti-Other Christian <dbl>, Anti-Sikh <dbl>, Anti-Hispanic <dbl>,
## #   Anti-Arab <dbl>, Anti-Other Ethnicity/National Origin <dbl>,
## #   Anti-Non-Hispanic* <dbl>, Anti-Gay Male <dbl>, Anti-Gay Female <dbl>,
## #   Anti-Gay (Male and Female) <dbl>, Anti-Heterosexual <dbl>,
## #   Anti-Bisexual <dbl>, Anti-Physical Disability <dbl>,
## #   Anti-Mental Disability <dbl>, Total Incidents <dbl>, Total Victims <dbl>,
## #   Total Offenders <dbl>

Tidied Table

tidied_hatecrimes <- hatecrimes %>%
  select(County, Year, `Crime Type`, `Anti-Asian`) %>% # keep only the bias type of interest
  pivot_wider(names_from = Year, values_from = `Anti-Asian`) # one column per year, one row per county and crime type
tidied_hatecrimes

## # A tibble: 116 x 12
##    County  `Crime Type`  `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017`
##    <chr>   <chr>          <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Albany  Crimes Again…      0      0      1      0      0      0      0      0
##  2 Albany  Property Cri…      0      0      0      0      0     NA      0      0
##  3 Allega… Crimes Again…     NA     NA     NA      0     NA     NA     NA     NA
##  4 Allega… Property Cri…     NA     NA     NA     NA     NA     NA      0     NA
##  5 Bronx   Crimes Again…      0      0      0      0      0      1      0      0
##  6 Bronx   Property Cri…      0      0      0      0      0      0      0      0
##  7 Broome  Crimes Again…      0      0      0      0     NA      0      1      0
##  8 Broome  Property Cri…     NA      0      0      0      0     NA     NA     NA
##  9 Cattar… Crimes Again…     NA      0      0      0     NA      0     NA      0
## 10 Cayuga  Crimes Again…      0     NA      0     NA     NA      0     NA      0
## # … with 106 more rows, and 2 more variables: 2018 <dbl>, 2019 <dbl>
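
The wide layout above is convenient for reading one bias type across years. An alternative is to keep every bias column and gather them into a single long table, with one row per county, year, crime type, and bias type. A minimal sketch, assuming a two-row stand-in built from the Albany "Property Crimes" rows printed above; `hatecrimes_sample`, `bias_type`, and `incidents` are hypothetical names chosen for illustration:

```r
library(dplyr)
library(tidyr)

# Stand-in for two rows of the CSV, limited to three of the bias columns
hatecrimes_sample <- tribble(
  ~County,  ~Year, ~`Crime Type`,     ~`Anti-Male`, ~`Anti-Female`, ~`Anti-Asian`,
  "Albany", 2010,  "Property Crimes", 0,            0,              0,
  "Albany", 2011,  "Property Crimes", 0,            0,              0
)

hatecrimes_long <- hatecrimes_sample %>%
  pivot_longer(
    cols = starts_with("Anti-"), # all bias-type columns
    names_to = "bias_type",
    values_to = "incidents"
  )
hatecrimes_long
```

On the full data set the same `starts_with("Anti-")` selection would leave the `Total Incidents`, `Total Victims`, and `Total Offenders` columns untouched, so they would need to be dropped or handled separately.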

Data Tidying

Teresa Lewandowski

3/4/2021
