Import two related datasets from the TidyTuesday project.
College <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/salary_potential.csv")
## Rows: 935 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): name, state_name
## dbl (5): rank, early_career_pay, mid_career_pay, make_world_better_percent, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
College
## # A tibble: 935 × 7
## rank name state_name early_career_pay mid_career_pay make_world_better_pe…¹
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1 Aubu… Alabama 54400 104500 51
## 2 2 Univ… Alabama 57500 103900 59
## 3 3 The … Alabama 52300 97400 50
## 4 4 Tusk… Alabama 54500 93500 61
## 5 5 Samf… Alabama 48400 90500 52
## 6 6 Spri… Alabama 46600 89100 53
## 7 7 Birm… Alabama 49100 88300 48
## 8 8 Univ… Alabama 48600 87200 57
## 9 9 Univ… Alabama 47700 86400 56
## 10 10 Alab… Alabama 48700 83500 58
## # ℹ 925 more rows
## # ℹ abbreviated name: ¹make_world_better_percent
## # ℹ 1 more variable: stem_percent <dbl>
Tuition <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/tuition_income.csv")
## Rows: 209012 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): name, state, campus, income_lvl
## dbl (3): total_price, year, net_cost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Tuition
## # A tibble: 209,012 × 7
## name state total_price year campus net_cost income_lvl
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <chr>
## 1 Piedmont International Un… NC 20174 2016 On Ca… 11475 0 to 30,0…
## 2 Piedmont International Un… NC 20174 2016 On Ca… 11451 30,001 to…
## 3 Piedmont International Un… NC 20174 2016 On Ca… 16229 48_001 to…
## 4 Piedmont International Un… NC 20174 2016 On Ca… 15592 75,001 to…
## 5 Piedmont International Un… NC 20514 2017 On Ca… 11668. 0 to 30,0…
## 6 Piedmont International Un… NC 20514 2017 On Ca… 11644. 30,001 to…
## 7 Piedmont International Un… NC 20514 2017 On Ca… 16503. 48_001 to…
## 8 Piedmont International Un… NC 20514 2017 On Ca… 15855. 75,001 to…
## 9 Piedmont International Un… NC 20514 2017 On Ca… 0 Over 110,…
## 10 Piedmont International Un… NC 20829 2018 On Ca… 11848. 0 to 30,0…
## # ℹ 209,002 more rows
Describe the two datasets:
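Data 1: College (salary_potential.csv) has 935 rows and 7 columns: rank, name, state_name, early_career_pay, mid_career_pay, make_world_better_percent, and stem_percent.
Data 2: Tuition (tuition_income.csv) has 209,012 rows and 7 columns: name, state, total_price, year, campus, net_cost, and income_lvl. The same school appears on several rows, one for each year and income level shown in the printout.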
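The Tuition printout lists the same school on several rows, which matters once the two tables are joined by name. Below is a minimal sketch of how a repeated key multiplies rows. The tables `schools` and `prices` and all their values are made up for illustration; they are not taken from the TidyTuesday files.

```r
library(dplyr)

# Hypothetical tables: one row per school vs. several rows per school.
schools <- tibble(name = c("A College", "B College"), rank = c(1, 2))
prices <- tibble(
  name = c("A College", "A College", "A College", "B College"),
  year = c(2016, 2017, 2018, 2016)
)

# Count how many rows each school contributes on the many-row side.
rows_per_school <- prices %>% count(name)
rows_per_school

# Each school row is repeated once for every matching price row.
combined <- inner_join(schools, prices, by = "name")
combined
```

The joined table has one row per matching pair, so its row count follows the many-row table rather than the one-row-per-school table.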
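library(dplyr)  # provides %>%, select(), sample_n(), and the join functions
# Keep three columns from each dataset and draw 10 random rows (different on every run)
College_small <- College %>% select(name, rank, state_name) %>% sample_n(10)
Tuition_small <- Tuition %>% select(campus, name, year) %>% sample_n(10)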
College_small
## # A tibble: 10 × 3
## name rank state_name
## <chr> <dbl> <chr>
## 1 Stevenson University 15 Maryland
## 2 University of Wisconsin-Platteville 4 Wisconsin
## 3 Kennesaw State University 5 Georgia
## 4 Nova Southeastern University 14 Florida
## 5 Taylor University 10 Indiana
## 6 Menlo College 20 California
## 7 Yeshiva University 15 New-York
## 8 LeTourneau University 18 Texas
## 9 University of Nebraska Medical Center 3 Nebraska
## 10 Colgate University 7 New-York
Tuition_small
## # A tibble: 10 × 3
## campus name year
## <chr> <chr> <dbl>
## 1 Off Campus Marquette University 2014
## 2 Off Campus South Dakota School of Mines and Technology 2018
## 3 On Campus Santa Barbara Business College-Ventura 2018
## 4 Off Campus Whittier College 2014
## 5 Off Campus Cortiva Institute-New Jersey 2015
## 6 Off Campus Wesleyan College 2018
## 7 Off Campus Platt College-Los Angeles 2012
## 8 Off Campus Brigham Young University-Idaho 2018
## 9 On Campus Gonzaga University 2011
## 10 Off Campus Johnston Community College 2013
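Because the 10 rows are drawn at random, rerunning the code gives different samples from the ones printed above. A minimal sketch of making such a draw repeatable follows, assuming an arbitrary seed of 123 and a made-up table `toy` that stands in for the real data. `slice_sample()` is dplyr's newer replacement for `sample_n()`.

```r
library(dplyr)

# Hypothetical stand-in for a larger dataset.
toy <- tibble(name = paste("College", 1:20), rank = 1:20)

set.seed(123)  # arbitrary seed, chosen only for illustration
sample_a <- toy %>% slice_sample(n = 5)

set.seed(123)  # resetting the same seed reproduces the same draw
sample_b <- toy %>% slice_sample(n = 5)

identical(sample_a, sample_b)
```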
Describe the resulting data:
How is it different from the original two datasets?
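None of the 10 sampled school names in College_small also appears in Tuition_small, so the inner join returns 0 rows. The result still has 5 columns: the join key name, rank and state_name from College_small, and campus and year from Tuition_small.
joined_data <- College_small %>%
  inner_join(Tuition_small)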
## Joining with `by = join_by(name)`
joined_data
## # A tibble: 0 × 5
## # ℹ 5 variables: name <chr>, rank <dbl>, state_name <chr>, campus <chr>,
## # year <dbl>
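The two random samples share no school names, so the output above cannot show what a successful match looks like. Here is a small sketch with made-up tables `x` and `y` (hypothetical names and values) in which two of the three names overlap. Passing `by = "name"` states the key explicitly instead of relying on the "Joining with" message.

```r
library(dplyr)

# Hypothetical toy tables, not drawn from the TidyTuesday files.
x <- tibble(name = c("A College", "B College", "C College"), rank = c(1, 2, 3))
y <- tibble(name = c("B College", "C College", "D College"), year = c(2016, 2017, 2018))

# Only names present in both tables survive an inner join.
matched <- inner_join(x, y, by = "name")
matched
```

"A College" and "D College" each appear in only one table, so neither reaches the result.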
Describe the resulting data:
How is it different from the original two datasets?
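No name in College_small matched a name in Tuition_small. The left join therefore keeps all 10 rows of College_small and adds the campus and year columns from Tuition_small, filled with NA.
left <- College_small %>%
  left_join(Tuition_small)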
## Joining with `by = join_by(name)`
left
## # A tibble: 10 × 5
## name rank state_name campus year
## <chr> <dbl> <chr> <chr> <dbl>
## 1 Stevenson University 15 Maryland <NA> NA
## 2 University of Wisconsin-Platteville 4 Wisconsin <NA> NA
## 3 Kennesaw State University 5 Georgia <NA> NA
## 4 Nova Southeastern University 14 Florida <NA> NA
## 5 Taylor University 10 Indiana <NA> NA
## 6 Menlo College 20 California <NA> NA
## 7 Yeshiva University 15 New-York <NA> NA
## 8 LeTourneau University 18 Texas <NA> NA
## 9 University of Nebraska Medical Center 3 Nebraska <NA> NA
## 10 Colgate University 7 New-York <NA> NA
Describe the resulting data:
How is it different from the original two datasets?
The right join keeps all 10 rows of Tuition_small (y) and shows the columns of both tables. No name matched, so rank and state_name, which come from College_small (x), are NA in every row.
right <- College_small %>%
  right_join(Tuition_small)
## Joining with `by = join_by(name)`
right
## # A tibble: 10 × 5
## name rank state_name campus year
## <chr> <dbl> <chr> <chr> <dbl>
## 1 Marquette University NA <NA> Off Campus 2014
## 2 South Dakota School of Mines and Technology NA <NA> Off Campus 2018
## 3 Santa Barbara Business College-Ventura NA <NA> On Campus 2018
## 4 Whittier College NA <NA> Off Campus 2014
## 5 Cortiva Institute-New Jersey NA <NA> Off Campus 2015
## 6 Wesleyan College NA <NA> Off Campus 2018
## 7 Platt College-Los Angeles NA <NA> Off Campus 2012
## 8 Brigham Young University-Idaho NA <NA> Off Campus 2018
## 9 Gonzaga University NA <NA> On Campus 2011
## 10 Johnston Community College NA <NA> Off Campus 2013
Describe the resulting data:
How is it different from the original two datasets?
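The full join keeps every row from both tables. Because no name matched, nothing was combined: the 10 rows of College_small and the 10 rows of Tuition_small are stacked into 20 rows, with NA in the columns that come from the other table.
full <- College_small %>%
  full_join(Tuition_small)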
## Joining with `by = join_by(name)`
full
## # A tibble: 20 × 5
## name rank state_name campus year
## <chr> <dbl> <chr> <chr> <dbl>
## 1 Stevenson University 15 Maryland <NA> NA
## 2 University of Wisconsin-Platteville 4 Wisconsin <NA> NA
## 3 Kennesaw State University 5 Georgia <NA> NA
## 4 Nova Southeastern University 14 Florida <NA> NA
## 5 Taylor University 10 Indiana <NA> NA
## 6 Menlo College 20 California <NA> NA
## 7 Yeshiva University 15 New-York <NA> NA
## 8 LeTourneau University 18 Texas <NA> NA
## 9 University of Nebraska Medical Center 3 Nebraska <NA> NA
## 10 Colgate University 7 New-York <NA> NA
## 11 Marquette University NA <NA> Off Campus 2014
## 12 South Dakota School of Mines and Technology NA <NA> Off Campus 2018
## 13 Santa Barbara Business College-Ventura NA <NA> On Campus 2018
## 14 Whittier College NA <NA> Off Campus 2014
## 15 Cortiva Institute-New Jersey NA <NA> Off Campus 2015
## 16 Wesleyan College NA <NA> Off Campus 2018
## 17 Platt College-Los Angeles NA <NA> Off Campus 2012
## 18 Brigham Young University-Idaho NA <NA> Off Campus 2018
## 19 Gonzaga University NA <NA> On Campus 2011
## 20 Johnston Community College NA <NA> Off Campus 2013
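The row counts of the left, right, and full joins follow from how many keys match. The sketch below uses made-up tables `x` and `y` (hypothetical values, with unique names in each table) where exactly one name overlaps, so the three counts differ.

```r
library(dplyr)

# Hypothetical toy tables with one shared name ("C College").
x <- tibble(name = c("A College", "B College", "C College"), rank = c(1, 2, 3))
y <- tibble(name = c("C College", "D College"), year = c(2017, 2018))

left_rows  <- nrow(left_join(x, y, by = "name"))   # every row of x
right_rows <- nrow(right_join(x, y, by = "name"))  # every row of y
full_rows  <- nrow(full_join(x, y, by = "name"))   # matched rows plus the unmatched rows of x and y

c(left = left_rows, right = right_rows, full = full_rows)
```

With no shared names at all, as in the samples above, the full join count is simply the two table sizes added together.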
Describe the resulting data:
How is it different from the original two datasets?
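The semi join keeps only the rows of College_small (x) whose name has a match in Tuition_small (y), and it returns only the columns of x. No names matched, so 0 rows were returned.
semi <- College_small %>%
  semi_join(Tuition_small)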
## Joining with `by = join_by(name)`
semi
## # A tibble: 0 × 3
## # ℹ 3 variables: name <chr>, rank <dbl>, state_name <chr>
Describe the resulting data:
How is it different from the original two datasets?
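The anti join returns the rows of College_small (x) whose name has no match in Tuition_small (y). Since no names matched, all 10 rows of College_small are returned, with only its own 3 columns.
anti <- College_small %>%
  anti_join(Tuition_small)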
## Joining with `by = join_by(name)`
anti
## # A tibble: 10 × 3
## name rank state_name
## <chr> <dbl> <chr>
## 1 Stevenson University 15 Maryland
## 2 University of Wisconsin-Platteville 4 Wisconsin
## 3 Kennesaw State University 5 Georgia
## 4 Nova Southeastern University 14 Florida
## 5 Taylor University 10 Indiana
## 6 Menlo College 20 California
## 7 Yeshiva University 15 New-York
## 8 LeTourneau University 18 Texas
## 9 University of Nebraska Medical Center 3 Nebraska
## 10 Colgate University 7 New-York
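The semi join and the anti join split the first table into two complementary pieces. The sketch below checks this on made-up tables `x` and `y` (hypothetical values) and also uses `intersect()` to list the shared names before joining.

```r
library(dplyr)

# Hypothetical toy tables with one shared name ("C College").
x <- tibble(name = c("A College", "B College", "C College"), rank = c(1, 2, 3))
y <- tibble(name = c("C College", "D College"), year = c(2017, 2018))

# Names present in both tables; an empty result means no join on name can match.
shared <- intersect(x$name, y$name)

kept    <- semi_join(x, y, by = "name")  # rows of x with a match in y
dropped <- anti_join(x, y, by = "name")  # rows of x without a match in y

# Every row of x lands in exactly one of the two results.
nrow(kept) + nrow(dropped) == nrow(x)
```

Neither result gains columns from `y`; both keep only the columns of `x`.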