Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

ufo_sightings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/ufo_sightings.csv')

## Rows: 96429 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): city, state, country_code, shape, reported_duration, summary, day_...
## dbl  (1): duration_seconds
## lgl  (1): has_images
## dttm (2): reported_date_time, reported_date_time_utc
## date (1): posted_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

places <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/places.csv')

## Rows: 14417 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): city, alternate_city_names, state, country, country_code, timezone
## dbl (4): latitude, longitude, population, elevation_m
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data 1: Places

Columns: state, country_code, elevation_m
Rows: 10

Data 2: UFO Sightings

Columns: state, country_code, shape
Rows: 10

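library(dplyr)  # provides %>%, select(), sample_n() and the join functions

set.seed(1234)  # fix the random seed so the same 10 rows are sampled each time
places_small <- places %>% select(state, country_code, elevation_m) %>% sample_n(10)
ufo_sightings_small <- ufo_sightings %>% select(state, country_code, shape) %>% sample_n(10)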

places_small

## # A tibble: 10 × 3
##    state       country_code elevation_m
##    <chr>       <chr>              <dbl>
##  1 OH          US                   262
##  2 MA          US                    55
##  3 MN          US                   298
##  4 CA          US                    53
##  5 NJ          US                     7
##  6 ME          US                    12
##  7 CT          US                   145
##  8 Bacs-Kiskun HU                    NA
##  9 IN          US                   150
## 10 WV          US                   290

ufo_sightings_small

## # A tibble: 10 × 3
##    state country_code shape   
##    <chr> <chr>        <chr>   
##  1 GA    US           triangle
##  2 NC    US           cylinder
##  3 NJ    US           triangle
##  4 CA    US           circle  
##  5 TX    US           triangle
##  6 AZ    US           sphere  
##  7 FL    US           sphere  
##  8 CA    US           unknown 
##  9 MI    US           triangle
## 10 FL    US           circle
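
Before joining, it can help to check which key pairs the two samples share. The sketch below is only an illustration: the key columns are typed in by hand from the two printed samples, and the names `places_keys`, `ufo_keys` and `shared_keys` are made up for this example.

```r
library(dplyr)

# Key columns copied by hand from the printed places_small sample
places_keys <- tibble::tibble(
  state = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c("US", "US", "US", "US", "US", "US", "US", "HU", "US", "US")
)

# Key columns copied by hand from the printed ufo_sightings_small sample
ufo_keys <- tibble::tibble(
  state = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10)
)

# Distinct (state, country_code) pairs that occur in both samples
shared_keys <- dplyr::intersect(places_keys, ufo_keys)
shared_keys

# Number of sightings per shared key
shared_counts <- ufo_keys %>%
  semi_join(shared_keys, by = c("state", "country_code")) %>%
  count(state, country_code)
shared_counts
```

Under these assumptions only CA and NJ appear in both samples, and CA occurs twice among the sightings.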

3. inner_join

Describe the resulting data:

Columns: state, country_code, elevation_m, shape
Rows: 3

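How is it different from the original two datasets? It has 3 rows instead of the 10 in each original dataset, and it contains all columns from both datasets. Only rows whose state and country_code appear in both samples are kept (CA and NJ), and CA appears twice because it matches two sightings.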

places_small %>% inner_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 3 × 4
##   state country_code elevation_m shape   
##   <chr> <chr>              <dbl> <chr>   
## 1 CA    US                    53 circle  
## 2 CA    US                    53 unknown 
## 3 NJ    US                     7 triangle
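
The row count of an inner join depends on how often each key repeats, not just on how many keys match. Here is a minimal sketch with two hypothetical toy tables (`elev` and `sight`, modelled on the samples above but not taken from the real data):

```r
library(dplyr)

# Hypothetical toy tables: one row per state on the left,
# CA repeated twice on the right
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

# Only keys present in both tables survive; the CA row of `elev`
# is repeated once per matching row of `sight`
matched <- inner_join(elev, sight, by = "state")
matched
nrow(matched)  # 3: two CA rows and one NJ row
```

OH and TX are dropped because each exists in only one of the two tables.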

4. left_join

Describe the resulting data:

Columns: state, country_code, elevation_m, shape
Rows: 11

How is it different from the original two datasets?
* It has one more row than places_small (11 instead of 10) because CA matches two sightings, so the CA row is repeated.
* It keeps every row of the left table (places_small) with its elevation_m and adds the shape column from the right table (ufo_sightings_small); shape is NA where no sighting matches.
* Only 3 rows have a match on state and country_code.

places_small %>% left_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 11 × 4
##    state       country_code elevation_m shape   
##    <chr>       <chr>              <dbl> <chr>   
##  1 OH          US                   262 <NA>    
##  2 MA          US                    55 <NA>    
##  3 MN          US                   298 <NA>    
##  4 CA          US                    53 circle  
##  5 CA          US                    53 unknown 
##  6 NJ          US                     7 triangle
##  7 ME          US                    12 <NA>    
##  8 CT          US                   145 <NA>    
##  9 Bacs-Kiskun HU                    NA <NA>    
## 10 IN          US                   150 <NA>    
## 11 WV          US                   290 <NA>
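
The extra row in a left join comes from a key that matches more than once on the right side. This sketch uses the same kind of hypothetical toy tables (`elev` and `sight` are invented for illustration):

```r
library(dplyr)

# Hypothetical toy tables: CA appears twice in `sight`
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

# Every row of `elev` is kept; unmatched rows get NA in `shape`
joined <- left_join(elev, sight, by = "state")
joined
nrow(joined)  # 4: the 3 rows of `elev` plus 1 extra CA match

# Rows of `elev` that found no sighting
unmatched <- joined %>% filter(is.na(shape))
unmatched
```

TX never shows up because it exists only in the right table.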

5. right_join

Describe the resulting data:

Columns: state, country_code, elevation_m, shape
Rows: 10

How is it different from the original two datasets?
* It has the same number of rows as ufo_sightings_small (10): every row of the right table is kept.
* elevation_m from the left table is filled in only for the 3 rows that match on state and country_code; it is NA for the other 7.
* Rows of places_small without a matching sighting are dropped.

places_small %>% right_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 10 × 4
##    state country_code elevation_m shape   
##    <chr> <chr>              <dbl> <chr>   
##  1 CA    US                    53 circle  
##  2 CA    US                    53 unknown 
##  3 NJ    US                     7 triangle
##  4 GA    US                    NA triangle
##  5 NC    US                    NA cylinder
##  6 TX    US                    NA triangle
##  7 AZ    US                    NA sphere  
##  8 FL    US                    NA sphere  
##  9 MI    US                    NA triangle
## 10 FL    US                    NA circle
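
A right join can be read as a left join with the two tables swapped. The sketch below checks that on hypothetical toy tables (`elev` and `sight` are invented names); only the column order and row order differ, so both results are reordered before comparing.

```r
library(dplyr)

# Hypothetical toy tables
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

# Keep all rows of `sight`, written two ways
via_right <- right_join(elev, sight, by = "state")
via_left  <- left_join(sight, elev, by = "state")

# Put columns and rows in the same order before comparing
a <- via_right %>% select(state, elevation_m, shape) %>% arrange(state, shape)
b <- via_left  %>% select(state, elevation_m, shape) %>% arrange(state, shape)
same_rows <- isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
same_rows
```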

6. full_join

Describe the resulting data:

Columns: state, country_code, elevation_m, shape
Rows: 18

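How is it different from the original two datasets?
* It has 18 rows: the 3 matched rows, the 8 rows found only in places_small, and the 7 rows found only in ufo_sightings_small.
* It keeps every row from both tables, with NA in the columns of whichever table has no match.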

places_small %>% full_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 18 × 4
##    state       country_code elevation_m shape   
##    <chr>       <chr>              <dbl> <chr>   
##  1 OH          US                   262 <NA>    
##  2 MA          US                    55 <NA>    
##  3 MN          US                   298 <NA>    
##  4 CA          US                    53 circle  
##  5 CA          US                    53 unknown 
##  6 NJ          US                     7 triangle
##  7 ME          US                    12 <NA>    
##  8 CT          US                   145 <NA>    
##  9 Bacs-Kiskun HU                    NA <NA>    
## 10 IN          US                   150 <NA>    
## 11 WV          US                   290 <NA>    
## 12 GA          US                    NA triangle
## 13 NC          US                    NA cylinder
## 14 TX          US                    NA triangle
## 15 AZ          US                    NA sphere  
## 16 FL          US                    NA sphere  
## 17 MI          US                    NA triangle
## 18 FL          US                    NA circle
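
The 18 rows can be accounted for as 3 matched rows plus 8 rows found only in places_small plus 7 rows found only in ufo_sightings_small. The same bookkeeping is sketched here on hypothetical toy tables (`elev` and `sight` are invented for illustration):

```r
library(dplyr)

# Hypothetical toy tables
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

full       <- full_join(elev, sight, by = "state")
both       <- inner_join(elev, sight, by = "state")  # matched rows
only_left  <- anti_join(elev, sight, by = "state")   # rows only in `elev`
only_right <- anti_join(sight, elev, by = "state")   # rows only in `sight`

# A full join is the matched rows plus the leftovers from each side
nrow(full)                                           # 5
nrow(both) + nrow(only_left) + nrow(only_right)      # 3 + 1 + 1 = 5
```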

7. semi_join

Describe the resulting data:

Columns: state, country_code, elevation_m
Rows: 2

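How is it different from the original two datasets?
* It has no shape column: only the columns of places_small are returned.
* It keeps only the 2 rows of places_small whose state and country_code have a match in ufo_sightings_small, and CA is not repeated even though it matches two sightings.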

places_small %>% semi_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 2 × 3
##   state country_code elevation_m
##   <chr> <chr>              <dbl>
## 1 CA    US                    53
## 2 NJ    US                     7
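
Unlike an inner join, a semi join only filters the left table: it never adds columns and never repeats a row. A minimal sketch on hypothetical toy tables (`elev` and `sight` are invented names):

```r
library(dplyr)

# Hypothetical toy tables: CA appears twice in `sight`
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

# Filtering join: each matching row of `elev` is returned once,
# with the columns of `elev` only
kept <- semi_join(elev, sight, by = "state")
kept
nrow(kept)                                # 2
nrow(inner_join(elev, sight, by = "state"))  # 3, because CA is repeated
```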

8. anti_join

Describe the resulting data:

Columns: state, country_code, elevation_m
Rows: 8

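How is it different from the original two datasets?
* It has no shape column: only the columns of places_small are returned.
* It keeps the 8 rows of places_small that have no matching sighting, which is the complement of the semi_join result.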

places_small %>% anti_join(ufo_sightings_small, by = c("country_code", "state"))

## # A tibble: 8 × 3
##   state       country_code elevation_m
##   <chr>       <chr>              <dbl>
## 1 OH          US                   262
## 2 MA          US                    55
## 3 MN          US                   298
## 4 ME          US                    12
## 5 CT          US                   145
## 6 Bacs-Kiskun HU                    NA
## 7 IN          US                   150
## 8 WV          US                   290
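
The semi_join and anti_join results split places_small into two parts: 2 rows with a match and 8 rows without, 10 in total. The sketch below checks that split on hypothetical toy tables (`elev` and `sight` are invented for illustration):

```r
library(dplyr)

# Hypothetical toy tables
elev <- tibble::tibble(
  state = c("CA", "NJ", "OH"),
  elevation_m = c(53, 7, 262)
)
sight <- tibble::tibble(
  state = c("CA", "CA", "NJ", "TX"),
  shape = c("circle", "unknown", "triangle", "triangle")
)

with_match    <- semi_join(elev, sight, by = "state")  # CA, NJ
without_match <- anti_join(elev, sight, by = "state")  # OH

# Every row of `elev` lands in exactly one of the two results
nrow(with_match) + nrow(without_match) == nrow(elev)

# Stacking the two parts gives back the original table
rebuilt <- bind_rows(with_match, without_match) %>% arrange(state)
rebuilt
```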


Daniel Lee

2022-10-05
