Import two related datasets from the TidyTuesday project.
ufo_sightings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/ufo_sightings.csv')
## Rows: 96429 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): city, state, country_code, shape, reported_duration, summary, day_...
## dbl (1): duration_seconds
## lgl (1): has_images
## dttm (2): reported_date_time, reported_date_time_utc
## date (1): posted_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
places <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/places.csv')
## Rows: 14417 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): city, alternate_city_names, state, country, country_code, timezone
## dbl (4): latitude, longitude, population, elevation_m
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
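Describe the two datasets:
Data 1: Places (14,417 rows, 10 columns). One row per place, with its city, state, country, timezone, coordinates, population and elevation.
Data 2: UFO Sightings (96,429 rows, 12 columns). One row per reported sighting, with its location, date and time, shape, duration and a summary.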
library(dplyr)  # provides %>%, select(), sample_n() and the join functions
set.seed(1234)
places_small <- places %>% select(state, country_code, elevation_m) %>% sample_n(10)
ufo_sightings_small <- ufo_sightings %>% select(state, country_code, shape) %>% sample_n(10)
places_small
## # A tibble: 10 × 3
## state country_code elevation_m
## <chr> <chr> <dbl>
## 1 OH US 262
## 2 MA US 55
## 3 MN US 298
## 4 CA US 53
## 5 NJ US 7
## 6 ME US 12
## 7 CT US 145
## 8 Bacs-Kiskun HU NA
## 9 IN US 150
## 10 WV US 290
ufo_sightings_small
## # A tibble: 10 × 3
## state country_code shape
## <chr> <chr> <chr>
## 1 GA US triangle
## 2 NC US cylinder
## 3 NJ US triangle
## 4 CA US circle
## 5 TX US triangle
## 6 AZ US sphere
## 7 FL US sphere
## 8 CA US unknown
## 9 MI US triangle
## 10 FL US circle
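The samples above come from `sample_n()`, so they can change with the R version or the dataset revision. As a minimal sketch for reproducing the joins below without downloading anything, the two samples can be typed in by hand (values copied from the printed output above). The `shared_keys` name is only illustrative; it previews which `state`/`country_code` combinations exist in both tables.

```r
library(dplyr)

# Values copied from the printed samples above
places_small <- tibble(
  state        = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c(rep("US", 7), "HU", "US", "US"),
  elevation_m  = c(262, 55, 298, 53, 7, 12, 145, NA, 150, 290)
)
ufo_sightings_small <- tibble(
  state        = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10),
  shape        = c("triangle", "cylinder", "triangle", "circle", "triangle",
                   "sphere", "sphere", "unknown", "triangle", "circle")
)

# Key combinations that appear in both tables (duplicates removed)
shared_keys <- intersect(
  select(places_small, state, country_code),
  select(ufo_sightings_small, state, country_code)
)
shared_keys
```

Only the keys listed in `shared_keys` can produce matched rows in any of the joins that follow.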
Describe the resulting data:
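How is it different from the original two datasets? The result has 3 rows instead of the 10 in each sample, because an inner join keeps only the rows whose state and country_code appear in both tables. It has 4 columns: the two key columns, plus elevation_m from places_small and shape from ufo_sightings_small. CA appears twice because the single CA place matches two CA sightings.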
places_small %>% inner_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 3 × 4
## state country_code elevation_m shape
## <chr> <chr> <dbl> <chr>
## 1 CA US 53 circle
## 2 CA US 53 unknown
## 3 NJ US 7 triangle
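The row count of the inner join can be checked by counting how many sightings each place matches. The sketch below rebuilds the samples by hand (values copied from the printed output) and uses an illustrative column name, `n_sightings`, that is not part of the original data.

```r
library(dplyr)

# Values copied from the printed samples above
places_small <- tibble(
  state        = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c(rep("US", 7), "HU", "US", "US"),
  elevation_m  = c(262, 55, 298, 53, 7, 12, 145, NA, 150, 290)
)
ufo_sightings_small <- tibble(
  state        = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10),
  shape        = c("triangle", "cylinder", "triangle", "circle", "triangle",
                   "sphere", "sphere", "unknown", "triangle", "circle")
)

# Number of sightings per key combination
sightings_per_key <- ufo_sightings_small %>%
  count(state, country_code, name = "n_sightings")

# Attach that count to each place; places without sightings get 0
matches_per_place <- places_small %>%
  left_join(sightings_per_key, by = c("country_code", "state")) %>%
  mutate(n_sightings = coalesce(n_sightings, 0L))

inner <- inner_join(places_small, ufo_sightings_small, by = c("country_code", "state"))

# Each place contributes one output row per matching sighting
sum(matches_per_place$n_sightings) == nrow(inner)
```

The CA place contributes two rows and the NJ place one, which accounts for all three rows of the inner join.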
Describe the resulting data:
How is it different from the original two datasets?
* All 10 rows of places_small (the left table) are kept, and one extra row appears (11 in total) because the CA place matches two sightings.
* elevation_m is carried over unchanged from places_small; shape is filled in only for the 3 matching rows and is NA everywhere else.
* Rows match when state and country_code are the same in both tables.
places_small %>% left_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 11 × 4
## state country_code elevation_m shape
## <chr> <chr> <dbl> <chr>
## 1 OH US 262 <NA>
## 2 MA US 55 <NA>
## 3 MN US 298 <NA>
## 4 CA US 53 circle
## 5 CA US 53 unknown
## 6 NJ US 7 triangle
## 7 ME US 12 <NA>
## 8 CT US 145 <NA>
## 9 Bacs-Kiskun HU NA <NA>
## 10 IN US 150 <NA>
## 11 WV US 290 <NA>
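An `NA` in `shape` after a left join could mean "no match" or "matched a sighting whose shape was missing". One way to tell them apart, sketched here with a hypothetical helper column `has_sighting` added to the right table before joining:

```r
library(dplyr)

# Values copied from the printed samples above
places_small <- tibble(
  state        = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c(rep("US", 7), "HU", "US", "US"),
  elevation_m  = c(262, 55, 298, 53, 7, 12, 145, NA, 150, 290)
)
ufo_sightings_small <- tibble(
  state        = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10),
  shape        = c("triangle", "cylinder", "triangle", "circle", "triangle",
                   "sphere", "sphere", "unknown", "triangle", "circle")
)

# Mark every sighting, join, then turn the NA flags of unmatched places into FALSE
flagged <- places_small %>%
  left_join(
    mutate(ufo_sightings_small, has_sighting = TRUE),
    by = c("country_code", "state")
  ) %>%
  mutate(has_sighting = coalesce(has_sighting, FALSE))

count(flagged, has_sighting)
```

With these samples the flag is `TRUE` for the 3 matched rows and `FALSE` for the 8 places without a sighting.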
Describe the resulting data:
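How is it different from the original two datasets?
* Same number of rows as ufo_sightings_small (10): every sighting from the right table is kept.
* shape is shown for every row; elevation_m is filled in only for the 3 rows that have a matching place and is NA everywhere else.
* Rows match when state and country_code are the same in both tables; the matched rows are listed first, followed by the unmatched sightings.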
places_small %>% right_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 10 × 4
## state country_code elevation_m shape
## <chr> <chr> <dbl> <chr>
## 1 CA US 53 circle
## 2 CA US 53 unknown
## 3 NJ US 7 triangle
## 4 GA US NA triangle
## 5 NC US NA cylinder
## 6 TX US NA triangle
## 7 AZ US NA sphere
## 8 FL US NA sphere
## 9 MI US NA triangle
## 10 FL US NA circle
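A right join returns the same rows as a left join with the two tables swapped; only the row and column order differ. The sketch below checks this on hand-typed copies of the samples (values copied from the printed output). The names `via_right`, `via_left` and `same_rows` are illustrative.

```r
library(dplyr)

# Values copied from the printed samples above
places_small <- tibble(
  state        = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c(rep("US", 7), "HU", "US", "US"),
  elevation_m  = c(262, 55, 298, 53, 7, 12, 145, NA, 150, 290)
)
ufo_sightings_small <- tibble(
  state        = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10),
  shape        = c("triangle", "cylinder", "triangle", "circle", "triangle",
                   "sphere", "sphere", "unknown", "triangle", "circle")
)

keys <- c("country_code", "state")
via_right <- right_join(places_small, ufo_sightings_small, by = keys)
via_left  <- left_join(ufo_sightings_small, places_small, by = keys)

# Put columns and rows in a common order before comparing
normalise <- function(df) {
  df %>%
    select(state, country_code, elevation_m, shape) %>%
    arrange(state, shape) %>%
    as.data.frame()
}
same_rows <- isTRUE(all.equal(normalise(via_right), normalise(via_left)))
same_rows
```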
Describe the resulting data:
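How is it different from the original two datasets?
* More rows (18): the 3 matched rows, plus 8 places with no matching sighting, plus 7 sightings with no matching place.
* Every row from both tables is shown; where one side has no match, its column (elevation_m or shape) is NA.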
places_small %>% full_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 18 × 4
## state country_code elevation_m shape
## <chr> <chr> <dbl> <chr>
## 1 OH US 262 <NA>
## 2 MA US 55 <NA>
## 3 MN US 298 <NA>
## 4 CA US 53 circle
## 5 CA US 53 unknown
## 6 NJ US 7 triangle
## 7 ME US 12 <NA>
## 8 CT US 145 <NA>
## 9 Bacs-Kiskun HU NA <NA>
## 10 IN US 150 <NA>
## 11 WV US 290 <NA>
## 12 GA US NA triangle
## 13 NC US NA cylinder
## 14 TX US NA triangle
## 15 AZ US NA sphere
## 16 FL US NA sphere
## 17 MI US NA triangle
## 18 FL US NA circle
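The 18 rows of the full join can be broken down into matched rows plus the unmatched rows of each side. A short check on hand-typed copies of the samples (values copied from the printed output; the `n_*` names are illustrative):

```r
library(dplyr)

# Values copied from the printed samples above
places_small <- tibble(
  state        = c("OH", "MA", "MN", "CA", "NJ", "ME", "CT", "Bacs-Kiskun", "IN", "WV"),
  country_code = c(rep("US", 7), "HU", "US", "US"),
  elevation_m  = c(262, 55, 298, 53, 7, 12, 145, NA, 150, 290)
)
ufo_sightings_small <- tibble(
  state        = c("GA", "NC", "NJ", "CA", "TX", "AZ", "FL", "CA", "MI", "FL"),
  country_code = rep("US", 10),
  shape        = c("triangle", "cylinder", "triangle", "circle", "triangle",
                   "sphere", "sphere", "unknown", "triangle", "circle")
)

keys <- c("country_code", "state")
n_matched     <- nrow(inner_join(places_small, ufo_sightings_small, by = keys))
n_only_places <- nrow(anti_join(places_small, ufo_sightings_small, by = keys))
n_only_ufo    <- nrow(anti_join(ufo_sightings_small, places_small, by = keys))
n_full        <- nrow(full_join(places_small, ufo_sightings_small, by = keys))

# Matched rows + rows found only on the left + rows found only on the right
c(matched = n_matched, only_places = n_only_places, only_ufo = n_only_ufo, full = n_full)
```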
Describe the resulting data:
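How is it different from the original two datasets?
* There is no shape column: a semi join returns only the columns of places_small.
* Only 2 rows remain, the places (CA and NJ) that have at least one matching sighting. Unlike the inner join, CA is not duplicated.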
places_small %>% semi_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 2 × 3
## state country_code elevation_m
## <chr> <chr> <dbl>
## 1 CA US 53
## 2 NJ US 7
Describe the resulting data:
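How is it different from the original two datasets?
* There is no shape column: an anti join returns only the columns of places_small.
* 8 rows remain, the places that have no matching sighting. These are exactly the rows the semi join dropped.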
places_small %>% anti_join(ufo_sightings_small, by = c("country_code", "state"))
## # A tibble: 8 × 3
## state country_code elevation_m
## <chr> <chr> <dbl>
## 1 OH US 262
## 2 MA US 55
## 3 MN US 298
## 4 ME US 12
## 5 CT US 145
## 6 Bacs-Kiskun HU NA
## 7 IN US 150
## 8 WV US 290