In this vignette I use a couple of tidyverse packages with the college majors data (the majors-list file and related files) from fivethirtyeight.com to demonstrate how to use one or more capabilities of each selected package.
The first tidyverse package, used here to read the files, is readr. It provides a fast way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV).
readr supports the following file formats with these read_*() functions:
read_csv(): comma-separated values (CSV) files
read_tsv(): tab-separated values (TSV) files
read_delim(): delimited files (CSV and TSV are important special cases)
read_fwf(): fixed-width files
read_table(): whitespace-separated files
read_log(): web log files
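As a small illustration of the read_*() family, the sketch below writes two tiny files to temporary locations and reads them back. The file contents and the object names (csv_path, tsv_path, majors_csv, majors_tsv, majors_delim) are made up for this example and are not part of the fivethirtyeight data.

```r
library(readr)

# Write two tiny example files to temporary locations (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("code,major", "1100,GENERAL AGRICULTURE", "1101,"), csv_path)
writeLines(c("code\tmajor", "1100\tGENERAL AGRICULTURE"), tsv_path)

# col_types fixes the column types up front instead of letting readr guess;
# na = "" treats empty fields as missing values.
majors_csv <- read_csv(
  csv_path,
  col_types = cols(code = col_character(), major = col_character()),
  na = ""
)

# read_tsv() works the same way for tab-separated input.
majors_tsv <- read_tsv(tsv_path, col_types = cols(.default = col_character()))

# read_delim() is the general form; delim = "," reads the same CSV file.
majors_delim <- read_delim(
  csv_path,
  delim = ",",
  col_types = cols(.default = col_character()),
  na = ""
)
```

Declaring col_types explicitly keeps codes such as 1100 as text, which matters when an identifier column could otherwise be guessed as numeric.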
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.2
## Warning: package 'stringr' was built under R version 4.1.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(dplyr)
college_major <- read_csv("https://raw.githubusercontent.com/uzmabb182/CUNY-SPS-Assignments/main/data_607/tidyverse_assignment/majors-list.csv", na="")
##
## -- Column specification --------------------------------------------------------
## cols(
## FOD1P = col_character(),
## Major = col_character(),
## Major_Category = col_character()
## )
head(college_major)
## # A tibble: 6 x 3
## FOD1P Major Major_Category
## <chr> <chr> <chr>
## 1 1100 GENERAL AGRICULTURE Agriculture & Natural Resources
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3 1102 AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4 1103 ANIMAL SCIENCES Agriculture & Natural Resources
## 5 1104 FOOD SCIENCE Agriculture & Natural Resources
## 6 1105 PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources
Next, dplyr's rename() changes a column name using the new_name = old_name syntax; here FOD1P becomes Major_Id:
college_major <- as_tibble(college_major) # so it prints a little nicer
college_major <- rename(college_major, Major_Id = FOD1P)
head(college_major)
## # A tibble: 6 x 3
## Major_Id Major Major_Category
## <chr> <chr> <chr>
## 1 1100 GENERAL AGRICULTURE Agriculture & Natural Resources
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3 1102 AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4 1103 ANIMAL SCIENCES Agriculture & Natural Resources
## 5 1104 FOOD SCIENCE Agriculture & Natural Resources
## 6 1105 PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources
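The rename() call above can be tried on a toy table as well. This is a minimal sketch with made-up rows; toy, renamed, renamed_both and Major_Name are illustrative names only.

```r
library(dplyr)

# A two-row stand-in for the majors list (hypothetical data).
toy <- tibble(
  FOD1P = c("1100", "1101"),
  Major = c("GENERAL AGRICULTURE", "AGRICULTURAL ECONOMICS")
)

# rename() takes new_name = old_name pairs and leaves all other columns alone.
renamed <- rename(toy, Major_Id = FOD1P)

# Several columns can be renamed in a single call.
renamed_both <- rename(toy, Major_Id = FOD1P, Major_Name = Major)

names(renamed)
names(renamed_both)
```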
all_ages <- read_csv("https://raw.githubusercontent.com/uzmabb182/CUNY-SPS-Assignments/main/data_607/tidyverse_assignment/all-ages.csv", na="")
##
## -- Column specification --------------------------------------------------------
## cols(
## Major_code = col_double(),
## Major = col_character(),
## Major_category = col_character(),
## Total = col_double(),
## Employed = col_double(),
## Employed_full_time_year_round = col_double(),
## Unemployed = col_double(),
## Unemployment_rate = col_double(),
## Median = col_double(),
## P25th = col_double(),
## P75th = col_double()
## )
head(all_ages)
## # A tibble: 6 x 11
## Major_code Major Major_category Total Employed Employed_full_t~ Unemployed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1100 GENERA~ Agriculture & ~ 128148 90245 74078 2423
## 2 1101 AGRICU~ Agriculture & ~ 95326 76865 64240 2266
## 3 1102 AGRICU~ Agriculture & ~ 33955 26321 22810 821
## 4 1103 ANIMAL~ Agriculture & ~ 103549 81177 64937 3619
## 5 1104 FOOD S~ Agriculture & ~ 24280 17281 12722 894
## 6 1105 PLANT ~ Agriculture & ~ 79409 63043 51077 2070
## # ... with 4 more variables: Unemployment_rate <dbl>, Median <dbl>,
## # P25th <dbl>, P75th <dbl>
summary <- all_ages %>%
group_by(Major_category) %>%
summarise(Employed = mean(Employed, na.rm = TRUE)) %>%
arrange(desc(Employed))
summary
## # A tibble: 16 x 2
## Major_category Employed
## <chr> <dbl>
## 1 Business 579219.
## 2 Communications & Journalism 355760.
## 3 Social Science 207978.
## 4 Health 182724.
## 5 Education 177075.
## 6 Humanities & Liberal Arts 166612.
## 7 Arts 163587.
## 8 Psychology & Social Work 156887
## 9 Law & Public Policy 143785.
## 10 Computers & Mathematics 128237
## 11 Industrial Arts & Consumer Services 107683.
## 12 Engineering 90413.
## 13 Physical Sciences 70713.
## 14 Biology & Life Science 67647
## 15 Agriculture & Natural Resources 48042.
## 16 Interdisciplinary 35706
ggplot(summary, aes(x = Employed, y = Major_category, fill = Major_category)) +
geom_col(position = "dodge")
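The group_by(), summarise() and arrange() pipeline above can be checked by hand on a toy table. The sketch below uses made-up numbers and also shows one possible way to order the bars by value with reorder(); by_category and p are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Five made-up majors in three categories; one value is missing on purpose.
toy <- tibble(
  Major_category = c("Arts", "Arts", "Business", "Business", "Health"),
  Employed = c(100, 300, 1000, NA, 400)
)

# Mean per category, ignoring the NA, sorted from largest to smallest.
by_category <- toy %>%
  group_by(Major_category) %>%
  summarise(Employed = mean(Employed, na.rm = TRUE)) %>%
  arrange(desc(Employed))

# reorder() sorts the y-axis categories by their Employed value,
# so the bars appear in order of size rather than alphabetically.
p <- ggplot(by_category, aes(x = Employed, y = reorder(Major_category, Employed))) +
  geom_col() +
  labs(y = "Major_category")
```

Without na.rm = TRUE the Business mean would be NA, because one of its two values is missing.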
grad_students <- read_csv("https://raw.githubusercontent.com/uzmabb182/CUNY-SPS-Assignments/main/data_607/tidyverse_assignment/grad-students.csv", na="")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Major = col_character(),
## Major_category = col_character()
## )
## i Use `spec()` for the full column specifications.
head(grad_students)
## # A tibble: 6 x 22
## Major_code Major Major_category Grad_total Grad_sample_size Grad_employed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5601 CONSTRUC~ Industrial Art~ 9173 200 7098
## 2 6004 COMMERCI~ Arts 53864 882 40492
## 3 6211 HOSPITAL~ Business 24417 437 18368
## 4 2201 COSMETOL~ Industrial Art~ 5411 72 3590
## 5 2001 COMMUNIC~ Computers & Ma~ 9109 171 7512
## 6 3201 COURT RE~ Law & Public P~ 1542 22 1008
## # ... with 16 more variables: Grad_full_time_year_round <dbl>,
## # Grad_unemployed <dbl>, Grad_unemployment_rate <dbl>, Grad_median <dbl>,
## # Grad_P25 <dbl>, Grad_P75 <dbl>, Nongrad_total <dbl>,
## # Nongrad_employed <dbl>, Nongrad_full_time_year_round <dbl>,
## # Nongrad_unemployed <dbl>, Nongrad_unemployment_rate <dbl>,
## # Nongrad_median <dbl>, Nongrad_P25 <dbl>, Nongrad_P75 <dbl>,
## # Grad_share <dbl>, Grad_premium <dbl>
Mutating joins allow you to combine variables from multiple tables.
merge_df <- all_ages %>% left_join(grad_students, by = "Major_code")
merge_df
## # A tibble: 173 x 32
## Major_code Major.x Major_category.x Total Employed Employed_full_tim~
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1100 GENERAL AGR~ Agriculture & Nat~ 128148 90245 74078
## 2 1101 AGRICULTURE~ Agriculture & Nat~ 95326 76865 64240
## 3 1102 AGRICULTURA~ Agriculture & Nat~ 33955 26321 22810
## 4 1103 ANIMAL SCIE~ Agriculture & Nat~ 103549 81177 64937
## 5 1104 FOOD SCIENCE Agriculture & Nat~ 24280 17281 12722
## 6 1105 PLANT SCIEN~ Agriculture & Nat~ 79409 63043 51077
## 7 1106 SOIL SCIENCE Agriculture & Nat~ 6586 4926 4042
## 8 1199 MISCELLANEO~ Agriculture & Nat~ 8549 6392 5074
## 9 1301 ENVIRONMENT~ Biology & Life Sc~ 106106 87602 65238
## 10 1302 FORESTRY Agriculture & Nat~ 69447 48228 39613
## # ... with 163 more rows, and 26 more variables: Unemployed <dbl>,
## # Unemployment_rate <dbl>, Median <dbl>, P25th <dbl>, P75th <dbl>,
## # Major.y <chr>, Major_category.y <chr>, Grad_total <dbl>,
## # Grad_sample_size <dbl>, Grad_employed <dbl>,
## # Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>
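The .x and .y suffixes in the joined table come from columns that exist in both inputs but are not part of the join key. A minimal sketch on toy tables (all_ages_toy and grad_toy are hypothetical stand-ins for the real data) shows where they come from, and one way to avoid them by dropping the duplicated column before joining:

```r
library(dplyr)

# Toy stand-ins: both tables carry a Major column besides the key.
all_ages_toy <- tibble(
  Major_code = c(1100, 1101, 1102),
  Major = c("A", "B", "C"),
  Total = c(10, 20, 30)
)
grad_toy <- tibble(
  Major_code = c(1100, 1101),
  Major = c("A", "B"),
  Grad_total = c(4, 5)
)

# left_join() keeps every row of the left table; rows without a match
# get NA in the columns that come from the right table.
joined <- left_join(all_ages_toy, grad_toy, by = "Major_code")
names(joined)

# Dropping the shared non-key column first avoids the suffixes entirely.
joined_clean <- left_join(all_ages_toy, select(grad_toy, -Major), by = "Major_code")
names(joined_clean)
```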
merge_df <- select(merge_df, -c(Major_category.y, Major.y))
head(merge_df)
## # A tibble: 6 x 30
## Major_code Major.x Major_category.x Total Employed Employed_full_tim~
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1100 GENERAL AGRI~ Agriculture & Nat~ 128148 90245 74078
## 2 1101 AGRICULTURE ~ Agriculture & Nat~ 95326 76865 64240
## 3 1102 AGRICULTURAL~ Agriculture & Nat~ 33955 26321 22810
## 4 1103 ANIMAL SCIEN~ Agriculture & Nat~ 103549 81177 64937
## 5 1104 FOOD SCIENCE Agriculture & Nat~ 24280 17281 12722
## 6 1105 PLANT SCIENC~ Agriculture & Nat~ 79409 63043 51077
## # ... with 24 more variables: Unemployed <dbl>, Unemployment_rate <dbl>,
## # Median <dbl>, P25th <dbl>, P75th <dbl>, Grad_total <dbl>,
## # Grad_sample_size <dbl>, Grad_employed <dbl>,
## # Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>
merge_df <- merge_df %>%
rename(Major_category = Major_category.x,
Major = Major.x
)
head(merge_df)
## # A tibble: 6 x 30
## Major_code Major Major_category Total Employed Employed_full_t~ Unemployed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1100 GENERA~ Agriculture & ~ 128148 90245 74078 2423
## 2 1101 AGRICU~ Agriculture & ~ 95326 76865 64240 2266
## 3 1102 AGRICU~ Agriculture & ~ 33955 26321 22810 821
## 4 1103 ANIMAL~ Agriculture & ~ 103549 81177 64937 3619
## 5 1104 FOOD S~ Agriculture & ~ 24280 17281 12722 894
## 6 1105 PLANT ~ Agriculture & ~ 79409 63043 51077 2070
## # ... with 23 more variables: Unemployment_rate <dbl>, Median <dbl>,
## # P25th <dbl>, P75th <dbl>, Grad_total <dbl>, Grad_sample_size <dbl>,
## # Grad_employed <dbl>, Grad_full_time_year_round <dbl>,
## # Grad_unemployed <dbl>, Grad_unemployment_rate <dbl>, Grad_median <dbl>,
## # Grad_P25 <dbl>, Grad_P75 <dbl>, Nongrad_total <dbl>,
## # Nongrad_employed <dbl>, Nongrad_full_time_year_round <dbl>,
## # Nongrad_unemployed <dbl>, Nongrad_unemployment_rate <dbl>,
## # Nongrad_median <dbl>, Nongrad_P25 <dbl>, Nongrad_P75 <dbl>,
## # Grad_share <dbl>, Grad_premium <dbl>
# Note: this URL points to grad-students.csv, so women_stem holds the same data as grad_students.
women_stem <- read_csv("https://raw.githubusercontent.com/uzmabb182/CUNY-SPS-Assignments/main/data_607/tidyverse_assignment/grad-students.csv", na="")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Major = col_character(),
## Major_category = col_character()
## )
## i Use `spec()` for the full column specifications.
head(women_stem)
## # A tibble: 6 x 22
## Major_code Major Major_category Grad_total Grad_sample_size Grad_employed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5601 CONSTRUC~ Industrial Art~ 9173 200 7098
## 2 6004 COMMERCI~ Arts 53864 882 40492
## 3 6211 HOSPITAL~ Business 24417 437 18368
## 4 2201 COSMETOL~ Industrial Art~ 5411 72 3590
## 5 2001 COMMUNIC~ Computers & Ma~ 9109 171 7512
## 6 3201 COURT RE~ Law & Public P~ 1542 22 1008
## # ... with 16 more variables: Grad_full_time_year_round <dbl>,
## # Grad_unemployed <dbl>, Grad_unemployment_rate <dbl>, Grad_median <dbl>,
## # Grad_P25 <dbl>, Grad_P75 <dbl>, Nongrad_total <dbl>,
## # Nongrad_employed <dbl>, Nongrad_full_time_year_round <dbl>,
## # Nongrad_unemployed <dbl>, Nongrad_unemployment_rate <dbl>,
## # Nongrad_median <dbl>, Nongrad_P25 <dbl>, Nongrad_P75 <dbl>,
## # Grad_share <dbl>, Grad_premium <dbl>
recent_grads <- read_csv("https://raw.githubusercontent.com/uzmabb182/CUNY-SPS-Assignments/main/data_607/tidyverse_assignment/recent-grads.csv", na="")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Major = col_character(),
## Major_category = col_character()
## )
## i Use `spec()` for the full column specifications.
head(recent_grads)
## # A tibble: 6 x 21
## Rank Major_code Major Total Men Women Major_category ShareWomen Sample_size
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 1 2419 PETR~ 2339 2057 282 Engineering 0.121 36
## 2 2 2416 MINI~ 756 679 77 Engineering 0.102 7
## 3 3 2415 META~ 856 725 131 Engineering 0.153 3
## 4 4 2417 NAVA~ 1258 1123 135 Engineering 0.107 16
## 5 5 2405 CHEM~ 32260 21239 11021 Engineering 0.342 289
## 6 6 2418 NUCL~ 2573 2200 373 Engineering 0.145 17
## # ... with 12 more variables: Employed <dbl>, Full_time <dbl>, Part_time <dbl>,
## # Full_time_year_round <dbl>, Unemployed <dbl>, Unemployment_rate <dbl>,
## # Median <dbl>, P25th <dbl>, P75th <dbl>, College_jobs <dbl>,
## # Non_college_jobs <dbl>, Low_wage_jobs <dbl>
merge_df <- merge_df %>% left_join(recent_grads, by = "Major_code")
merge_df
## # A tibble: 173 x 50
## Major_code Major.x Major_category.x Total.x Employed.x Employed_full_ti~
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1100 GENERAL AG~ Agriculture & Na~ 128148 90245 74078
## 2 1101 AGRICULTUR~ Agriculture & Na~ 95326 76865 64240
## 3 1102 AGRICULTUR~ Agriculture & Na~ 33955 26321 22810
## 4 1103 ANIMAL SCI~ Agriculture & Na~ 103549 81177 64937
## 5 1104 FOOD SCIEN~ Agriculture & Na~ 24280 17281 12722
## 6 1105 PLANT SCIE~ Agriculture & Na~ 79409 63043 51077
## 7 1106 SOIL SCIEN~ Agriculture & Na~ 6586 4926 4042
## 8 1199 MISCELLANE~ Agriculture & Na~ 8549 6392 5074
## 9 1301 ENVIRONMEN~ Biology & Life S~ 106106 87602 65238
## 10 1302 FORESTRY Agriculture & Na~ 69447 48228 39613
## # ... with 163 more rows, and 44 more variables: Unemployed.x <dbl>,
## # Unemployment_rate.x <dbl>, Median.x <dbl>, P25th.x <dbl>, P75th.x <dbl>,
## # Grad_total <dbl>, Grad_sample_size <dbl>, Grad_employed <dbl>,
## # Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>, Rank <dbl>,
## # Major.y <chr>, Total.y <dbl>, Men <dbl>, Women <dbl>,
## # Major_category.y <chr>, ShareWomen <dbl>, Sample_size <dbl>,
## # Employed.y <dbl>, Full_time <dbl>, Part_time <dbl>,
## # Full_time_year_round <dbl>, Unemployed.y <dbl>, Unemployment_rate.y <dbl>,
## # Median.y <dbl>, P25th.y <dbl>, P75th.y <dbl>, College_jobs <dbl>,
## # Non_college_jobs <dbl>, Low_wage_jobs <dbl>
merge_df <- select(merge_df, -c(Major_category.y, Major.y))
merge_df
## # A tibble: 173 x 48
## Major_code Major.x Major_category.x Total.x Employed.x Employed_full_ti~
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1100 GENERAL AG~ Agriculture & Na~ 128148 90245 74078
## 2 1101 AGRICULTUR~ Agriculture & Na~ 95326 76865 64240
## 3 1102 AGRICULTUR~ Agriculture & Na~ 33955 26321 22810
## 4 1103 ANIMAL SCI~ Agriculture & Na~ 103549 81177 64937
## 5 1104 FOOD SCIEN~ Agriculture & Na~ 24280 17281 12722
## 6 1105 PLANT SCIE~ Agriculture & Na~ 79409 63043 51077
## 7 1106 SOIL SCIEN~ Agriculture & Na~ 6586 4926 4042
## 8 1199 MISCELLANE~ Agriculture & Na~ 8549 6392 5074
## 9 1301 ENVIRONMEN~ Biology & Life S~ 106106 87602 65238
## 10 1302 FORESTRY Agriculture & Na~ 69447 48228 39613
## # ... with 163 more rows, and 42 more variables: Unemployed.x <dbl>,
## # Unemployment_rate.x <dbl>, Median.x <dbl>, P25th.x <dbl>, P75th.x <dbl>,
## # Grad_total <dbl>, Grad_sample_size <dbl>, Grad_employed <dbl>,
## # Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>, Rank <dbl>,
## # Total.y <dbl>, Men <dbl>, Women <dbl>, ShareWomen <dbl>, Sample_size <dbl>,
## # Employed.y <dbl>, Full_time <dbl>, Part_time <dbl>,
## # Full_time_year_round <dbl>, Unemployed.y <dbl>, Unemployment_rate.y <dbl>,
## # Median.y <dbl>, P25th.y <dbl>, P75th.y <dbl>, College_jobs <dbl>,
## # Non_college_jobs <dbl>, Low_wage_jobs <dbl>
# Strip the ".x" suffix; the dot is escaped and the pattern anchored to the end of the name
colnames(merge_df) <- sub("\\.x$", "", colnames(merge_df))
merge_df
## # A tibble: 173 x 48
## Major_code Major Major_category Total Employed Employed_full_t~ Unemployed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1100 GENERA~ Agriculture &~ 128148 90245 74078 2423
## 2 1101 AGRICU~ Agriculture &~ 95326 76865 64240 2266
## 3 1102 AGRICU~ Agriculture &~ 33955 26321 22810 821
## 4 1103 ANIMAL~ Agriculture &~ 103549 81177 64937 3619
## 5 1104 FOOD S~ Agriculture &~ 24280 17281 12722 894
## 6 1105 PLANT ~ Agriculture &~ 79409 63043 51077 2070
## 7 1106 SOIL S~ Agriculture &~ 6586 4926 4042 264
## 8 1199 MISCEL~ Agriculture &~ 8549 6392 5074 261
## 9 1301 ENVIRO~ Biology & Lif~ 106106 87602 65238 4736
## 10 1302 FOREST~ Agriculture &~ 69447 48228 39613 2144
## # ... with 163 more rows, and 41 more variables: Unemployment_rate <dbl>,
## # Median <dbl>, P25th <dbl>, P75th <dbl>, Grad_total <dbl>,
## # Grad_sample_size <dbl>, Grad_employed <dbl>,
## # Grad_full_time_year_round <dbl>, Grad_unemployed <dbl>,
## # Grad_unemployment_rate <dbl>, Grad_median <dbl>, Grad_P25 <dbl>,
## # Grad_P75 <dbl>, Nongrad_total <dbl>, Nongrad_employed <dbl>,
## # Nongrad_full_time_year_round <dbl>, Nongrad_unemployed <dbl>,
## # Nongrad_unemployment_rate <dbl>, Nongrad_median <dbl>, Nongrad_P25 <dbl>,
## # Nongrad_P75 <dbl>, Grad_share <dbl>, Grad_premium <dbl>, Rank <dbl>,
## # Total.y <dbl>, Men <dbl>, Women <dbl>, ShareWomen <dbl>, Sample_size <dbl>,
## # Employed.y <dbl>, Full_time <dbl>, Part_time <dbl>,
## # Full_time_year_round <dbl>, Unemployed.y <dbl>, Unemployment_rate.y <dbl>,
## # Median.y <dbl>, P25th.y <dbl>, P75th.y <dbl>, College_jobs <dbl>,
## # Non_college_jobs <dbl>, Low_wage_jobs <dbl>
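Removing a join suffix from column names can also be written with dplyr's rename_with(). The sketch below is an illustration on a made-up one-row table; Tax_index is a hypothetical column included only to show why the dot in the pattern should be escaped and anchored.

```r
library(dplyr)

# One-row toy table; Tax_index is a made-up column name.
toy <- tibble(Major.x = "A", Total.x = 1, Total.y = 2, Tax_index = 3)

# In a regular expression an unescaped "." matches any character,
# so the pattern ".x" also removes the "ax" inside Tax_index.
loose <- gsub(".x", "", names(toy))

# "\\.x$" matches a literal ".x" at the end of the name only.
# rename_with() applies the function to the columns selected by ends_with().
cleaned <- rename_with(toy, function(nm) sub("\\.x$", "", nm), ends_with(".x"))

loose
names(cleaned)
```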
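Since dplyr 1.0.0, summarise() can return more than one row per group. This makes quantile() much easier to use inside summarise(): it returns multiple values, which was awkward to handle previously. (In dplyr 1.1.0 and later, reframe() is the recommended function for results with several rows per group.)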
merge_df %>%
group_by(Major_category) %>%
summarise(x = quantile(Grad_unemployment_rate, c(0.25, 0.5, 0.75)), q = c(0.25, 0.5, 0.75))
## `summarise()` has grouped output by 'Major_category'. You can override using the `.groups` argument.
## # A tibble: 48 x 3
## # Groups: Major_category [16]
## Major_category x q
## <chr> <dbl> <dbl>
## 1 Agriculture & Natural Resources 0.0223 0.25
## 2 Agriculture & Natural Resources 0.0304 0.5
## 3 Agriculture & Natural Resources 0.0344 0.75
## 4 Arts 0.0407 0.25
## 5 Arts 0.0541 0.5
## 6 Arts 0.0627 0.75
## 7 Biology & Life Science 0.0210 0.25
## 8 Biology & Life Science 0.0250 0.5
## 9 Biology & Life Science 0.0329 0.75
## 10 Business 0.0419 0.25
## # ... with 38 more rows
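An alternative layout is one row per group with one column per quantile, which does not depend on multi-row summaries. This is a sketch on made-up rates; the names toy, rate, q25, q50 and q75 are illustrative.

```r
library(dplyr)

# Three made-up unemployment rates per category.
toy <- tibble(
  Major_category = rep(c("Arts", "Business"), each = 3),
  rate = c(0.02, 0.04, 0.06, 0.10, 0.20, 0.30)
)

# Each quantile() call returns a single value, so the result has
# one row per category and one column per quantile.
# names = FALSE drops the "25%"-style names from the result.
quartiles <- toy %>%
  group_by(Major_category) %>%
  summarise(
    q25 = quantile(rate, 0.25, names = FALSE),
    q50 = quantile(rate, 0.50, names = FALSE),
    q75 = quantile(rate, 0.75, names = FALSE)
  )

quartiles
```

The wide form is convenient for tables, while the long form with a q column is usually easier to plot.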
summary1 <- merge_df %>%
group_by(Major_category) %>%
summarise(Grad_unemployed = mean(Grad_unemployed, na.rm = TRUE)) %>%
arrange(desc(Grad_unemployed))
summary1
## # A tibble: 16 x 2
## Major_category Grad_unemployed
## <chr> <dbl>
## 1 Business 7846.
## 2 Social Science 6725.
## 3 Humanities & Liberal Arts 5669.
## 4 Psychology & Social Work 5492
## 5 Communications & Journalism 4433.
## 6 Education 4184.
## 7 Arts 3070.
## 8 Computers & Mathematics 2642
## 9 Physical Sciences 2403
## 10 Biology & Life Science 2287.
## 11 Engineering 2244.
## 12 Health 2164.
## 13 Law & Public Policy 2002.
## 14 Industrial Arts & Consumer Services 1283.
## 15 Agriculture & Natural Resources 500.
## 16 Interdisciplinary 261
ggplot(data = summary1) +
geom_count(mapping = aes(x = Grad_unemployed, y = Major_category, color = Major_category) )