Week 8: Apply it to your data 7

Import your data

Mydata <- read_csv("../00_data/Mydata.csv")

## New names:
## Rows: 65706 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): lake, species, comments, region dbl (4): ...1, year, grand_total, values
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

Mydata

## # A tibble: 65,706 × 8
##     ...1  year lake  species      grand_total comments region            values
##    <dbl> <dbl> <chr> <chr>              <dbl> <chr>    <chr>              <dbl>
##  1     1  1991 Erie  American Eel           1 <NA>     Michigan (MI)          0
##  2     2  1991 Erie  American Eel           1 <NA>     New York (NY)          0
##  3     3  1991 Erie  American Eel           1 <NA>     Ohio (OH)              0
##  4     4  1991 Erie  American Eel           1 <NA>     Pennsylvania (PA)      0
##  5     5  1991 Erie  American Eel           1 <NA>     U.S. Total             0
##  6     6  1991 Erie  American Eel           1 <NA>     Canada (ONT)           1
##  7     7  1992 Erie  American Eel           0 <NA>     Michigan (MI)          0
##  8     8  1992 Erie  American Eel           0 <NA>     New York (NY)          0
##  9     9  1992 Erie  American Eel           0 <NA>     Ohio (OH)              0
## 10    10  1992 Erie  American Eel           0 <NA>     Pennsylvania (PA)      0
## # ℹ 65,696 more rows

Pivoting

long to wide form

Mydata_long <- Mydata %>%
    
    pivot_longer(cols = c(`lake`, `species`), 
                 names_to = "presence", 
                 values_to = "area")

# Pivoting is done but does not make sense as Data was already well arranged

wide to long form

Mydata_long %>%
    pivot_wider(names_from = presence, values_from = area)

## # A tibble: 65,706 × 8
##     ...1  year grand_total comments region            values lake  species     
##    <dbl> <dbl>       <dbl> <chr>    <chr>              <dbl> <chr> <chr>       
##  1     1  1991           1 <NA>     Michigan (MI)          0 Erie  American Eel
##  2     2  1991           1 <NA>     New York (NY)          0 Erie  American Eel
##  3     3  1991           1 <NA>     Ohio (OH)              0 Erie  American Eel
##  4     4  1991           1 <NA>     Pennsylvania (PA)      0 Erie  American Eel
##  5     5  1991           1 <NA>     U.S. Total             0 Erie  American Eel
##  6     6  1991           1 <NA>     Canada (ONT)           1 Erie  American Eel
##  7     7  1992           0 <NA>     Michigan (MI)          0 Erie  American Eel
##  8     8  1992           0 <NA>     New York (NY)          0 Erie  American Eel
##  9     9  1992           0 <NA>     Ohio (OH)              0 Erie  American Eel
## 10    10  1992           0 <NA>     Pennsylvania (PA)      0 Erie  American Eel
## # ℹ 65,696 more rows

#reverting pivot back to normal

Separating and Uniting

Separate a column

Mydata_sep <- Mydata %>%
    
    separate(col = region, into = c("City", "State"))

## Warning: Expected 2 pieces. Additional pieces discarded in 65706 rows [1, 2, 3, 4, 5, 6,
## 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# Most values were successfully separated apart from anomalies

Unite two columns

Mydata_sep %>%
    
    unite(col = "Region", City:State, sep = "/")

## # A tibble: 65,706 × 8
##     ...1  year lake  species      grand_total comments Region          values
##    <dbl> <dbl> <chr> <chr>              <dbl> <chr>    <chr>            <dbl>
##  1     1  1991 Erie  American Eel           1 <NA>     Michigan/MI          0
##  2     2  1991 Erie  American Eel           1 <NA>     New/York             0
##  3     3  1991 Erie  American Eel           1 <NA>     Ohio/OH              0
##  4     4  1991 Erie  American Eel           1 <NA>     Pennsylvania/PA      0
##  5     5  1991 Erie  American Eel           1 <NA>     U/S                  0
##  6     6  1991 Erie  American Eel           1 <NA>     Canada/ONT           1
##  7     7  1992 Erie  American Eel           0 <NA>     Michigan/MI          0
##  8     8  1992 Erie  American Eel           0 <NA>     New/York             0
##  9     9  1992 Erie  American Eel           0 <NA>     Ohio/OH              0
## 10    10  1992 Erie  American Eel           0 <NA>     Pennsylvania/PA      0
## # ℹ 65,696 more rows

# Again most values were successfully separated apart from anomalies

Missing Values

#Use function to change explicit missing values to implicit in a column to get rid of useless rows

library(dplyr)

Mydata %>% 
    filter(!is.na(grand_total))

## # A tibble: 33,939 × 8
##     ...1  year lake  species      grand_total comments region            values
##    <dbl> <dbl> <chr> <chr>              <dbl> <chr>    <chr>              <dbl>
##  1     1  1991 Erie  American Eel           1 <NA>     Michigan (MI)          0
##  2     2  1991 Erie  American Eel           1 <NA>     New York (NY)          0
##  3     3  1991 Erie  American Eel           1 <NA>     Ohio (OH)              0
##  4     4  1991 Erie  American Eel           1 <NA>     Pennsylvania (PA)      0
##  5     5  1991 Erie  American Eel           1 <NA>     U.S. Total             0
##  6     6  1991 Erie  American Eel           1 <NA>     Canada (ONT)           1
##  7     7  1992 Erie  American Eel           0 <NA>     Michigan (MI)          0
##  8     8  1992 Erie  American Eel           0 <NA>     New York (NY)          0
##  9     9  1992 Erie  American Eel           0 <NA>     Ohio (OH)              0
## 10    10  1992 Erie  American Eel           0 <NA>     Pennsylvania (PA)      0
## # ℹ 33,929 more rows

Week 8: Apply it to your data 7

Angus Somers

2025-04-11

Import your data

Pivoting

long to wide form

wide to long form

Separating and Uniting

Separate a column

Unite two columns

Missing Values