library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
For my final project I will be using data on college majors. The data comes from FiveThirtyEight and can be found on their GitHub page: https://github.com/fivethirtyeight/data/tree/master/college-majors. The data is split across several .csv files; the one I will be working with is all-ages.csv.
What I want to show is the different majors by category along with their median salary, and/or employment versus unemployment by major. I feel this data is important because it can help students decide which major to pick and see which majors are more likely to lead to a job. The data is a little old, so it can also be compared with more recent data to see how the job market has changed over the years.
data <- read.csv("https://raw.githubusercontent.com/bpersaud104/Data608/master/all-ages.csv")
head(data, 50)
## Major_code Major
## 1 1100 GENERAL AGRICULTURE
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3 1102 AGRICULTURAL ECONOMICS
## 4 1103 ANIMAL SCIENCES
## 5 1104 FOOD SCIENCE
## 6 1105 PLANT SCIENCE AND AGRONOMY
## 7 1106 SOIL SCIENCE
## 8 1199 MISCELLANEOUS AGRICULTURE
## 9 1301 ENVIRONMENTAL SCIENCE
## 10 1302 FORESTRY
## 11 1303 NATURAL RESOURCES MANAGEMENT
## 12 1401 ARCHITECTURE
## 13 1501 AREA ETHNIC AND CIVILIZATION STUDIES
## 14 1901 COMMUNICATIONS
## 15 1902 JOURNALISM
## 16 1903 MASS MEDIA
## 17 1904 ADVERTISING AND PUBLIC RELATIONS
## 18 2001 COMMUNICATION TECHNOLOGIES
## 19 2100 COMPUTER AND INFORMATION SYSTEMS
## 20 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
## 21 2102 COMPUTER SCIENCE
## 22 2105 INFORMATION SCIENCES
## 23 2106 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY
## 24 2107 COMPUTER NETWORKING AND TELECOMMUNICATIONS
## 25 2201 COSMETOLOGY SERVICES AND CULINARY ARTS
## 26 2300 GENERAL EDUCATION
## 27 2301 EDUCATIONAL ADMINISTRATION AND SUPERVISION
## 28 2303 SCHOOL STUDENT COUNSELING
## 29 2304 ELEMENTARY EDUCATION
## 30 2305 MATHEMATICS TEACHER EDUCATION
## 31 2306 PHYSICAL AND HEALTH EDUCATION TEACHING
## 32 2307 EARLY CHILDHOOD EDUCATION
## 33 2308 SCIENCE AND COMPUTER TEACHER EDUCATION
## 34 2309 SECONDARY TEACHER EDUCATION
## 35 2310 SPECIAL NEEDS EDUCATION
## 36 2311 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION
## 37 2312 TEACHER EDUCATION: MULTIPLE LEVELS
## 38 2313 LANGUAGE AND DRAMA EDUCATION
## 39 2314 ART AND MUSIC EDUCATION
## 40 2399 MISCELLANEOUS EDUCATION
## 41 2400 GENERAL ENGINEERING
## 42 2401 AEROSPACE ENGINEERING
## 43 2402 BIOLOGICAL ENGINEERING
## 44 2403 ARCHITECTURAL ENGINEERING
## 45 2404 BIOMEDICAL ENGINEERING
## 46 2405 CHEMICAL ENGINEERING
## 47 2406 CIVIL ENGINEERING
## 48 2407 COMPUTER ENGINEERING
## 49 2408 ELECTRICAL ENGINEERING
## 50 2409 ENGINEERING MECHANICS PHYSICS AND SCIENCE
## Major_category Total Employed
## 1 Agriculture & Natural Resources 128148 90245
## 2 Agriculture & Natural Resources 95326 76865
## 3 Agriculture & Natural Resources 33955 26321
## 4 Agriculture & Natural Resources 103549 81177
## 5 Agriculture & Natural Resources 24280 17281
## 6 Agriculture & Natural Resources 79409 63043
## 7 Agriculture & Natural Resources 6586 4926
## 8 Agriculture & Natural Resources 8549 6392
## 9 Biology & Life Science 106106 87602
## 10 Agriculture & Natural Resources 69447 48228
## 11 Agriculture & Natural Resources 83188 65937
## 12 Engineering 294692 216770
## 13 Humanities & Liberal Arts 103740 75798
## 14 Communications & Journalism 987676 790696
## 15 Communications & Journalism 418104 314438
## 16 Communications & Journalism 211213 170474
## 17 Communications & Journalism 186829 147433
## 18 Computers & Mathematics 62141 49609
## 19 Computers & Mathematics 253782 218248
## 20 Computers & Mathematics 29317 22828
## 21 Computers & Mathematics 783292 656372
## 22 Computers & Mathematics 77805 66393
## 23 Computers & Mathematics 39362 32366
## 24 Computers & Mathematics 51771 44071
## 25 Industrial Arts & Consumer Services 42325 33388
## 26 Education 1438867 843693
## 27 Education 4037 3113
## 28 Education 2396 1492
## 29 Education 1446701 819393
## 30 Education 68808 47203
## 31 Education 281661 193542
## 32 Education 157079 113460
## 33 Education 56477 36224
## 34 Education 224262 129486
## 35 Education 149689 108272
## 36 Education 127022 78785
## 37 Education 88067 58885
## 38 Education 181445 111347
## 39 Education 231861 155159
## 40 Education 225553 126054
## 41 Engineering 503080 359172
## 42 Engineering 65734 44944
## 43 Engineering 32748 24270
## 44 Engineering 19587 13713
## 45 Engineering 18347 12876
## 46 Engineering 188046 131697
## 47 Engineering 358593 262831
## 48 Engineering 154160 128742
## 49 Engineering 671647 489965
## 50 Engineering 20582 14909
## Employed_full_time_year_round Unemployed Unemployment_rate Median P25th
## 1 74078 2423 0.02614711 50000 34000
## 2 64240 2266 0.02863606 54000 36000
## 3 22810 821 0.03024832 63000 40000
## 4 64937 3619 0.04267890 46000 30000
## 5 12722 894 0.04918845 62000 38500
## 6 51077 2070 0.03179089 50000 35000
## 7 4042 264 0.05086705 63000 39400
## 8 5074 261 0.03923042 52000 35000
## 9 65238 4736 0.05128983 52000 38000
## 10 39613 2144 0.04256333 58000 40500
## 11 50595 3789 0.05434128 52000 37100
## 12 163020 20394 0.08599113 63000 40400
## 13 50530 5525 0.06793896 46000 32000
## 14 595739 54390 0.06436031 50000 35000
## 15 235407 20754 0.06191675 50000 35000
## 16 125489 15431 0.08300476 48000 32000
## 17 111552 10624 0.06721626 50000 34000
## 18 37261 4609 0.08500867 50000 34500
## 19 189950 11945 0.05189124 65000 45000
## 20 18747 2265 0.09026422 60000 40000
## 21 561052 34196 0.04951866 78000 51000
## 22 57604 3704 0.05284106 68000 46200
## 23 28156 2626 0.07504572 55000 40000
## 24 35954 2748 0.05869412 55000 36000
## 25 25780 1941 0.05494070 40000 26200
## 26 591863 38742 0.04390352 43000 32000
## 27 2468 0 0.00000000 58000 44750
## 28 1093 169 0.10174594 41000 33200
## 29 501786 32685 0.03835916 40000 31000
## 30 29494 1610 0.03298302 43000 34000
## 31 136343 9389 0.04626696 48400 34000
## 32 71133 5890 0.04935065 35300 27000
## 33 24817 1596 0.04219989 46000 35000
## 34 88917 5925 0.04375568 45000 34000
## 35 71615 5357 0.04714466 42000 34000
## 36 51632 3800 0.04601320 45000 33000
## 37 37892 2032 0.03335686 40000 30000
## 38 67651 5624 0.04808029 42000 32000
## 39 94756 6629 0.04097337 42600 32000
## 40 91322 5145 0.03921524 50000 35600
## 41 312023 17986 0.04768824 75000 50000
## 42 38491 1969 0.04197131 80000 58000
## 43 18621 1521 0.05897406 62000 40000
## 44 11180 1017 0.06904277 78000 50000
## 45 9202 1105 0.07903583 65000 40000
## 46 109406 6388 0.04626136 86000 60000
## 47 220528 14823 0.05338659 78000 55000
## 48 111025 7456 0.05474383 80000 60000
## 49 422317 26064 0.05050879 88000 60000
## 50 12257 683 0.04380452 65000 45000
## P75th
## 1 80000
## 2 80000
## 3 98000
## 4 72000
## 5 90000
## 6 75000
## 7 88000
## 8 75000
## 9 75000
## 10 80000
## 11 75000
## 12 93500
## 13 71000
## 14 80000
## 15 80000
## 16 70000
## 17 75000
## 18 75000
## 19 90000
## 20 85000
## 21 105000
## 22 95000
## 23 80000
## 24 80000
## 25 60000
## 26 59000
## 27 79000
## 28 50000
## 29 50000
## 30 60000
## 31 66500
## 32 45800
## 33 61000
## 34 60000
## 35 53000
## 36 64000
## 37 51000
## 38 54000
## 39 56000
## 40 71000
## 41 100000
## 42 110000
## 43 91000
## 44 102000
## 45 96000
## 46 120000
## 47 105000
## 48 107000
## 49 116000
## 50 100000
summary(data)
## Major_code Major Major_category Total
## Min. :1100 Length:173 Length:173 Min. : 2396
## 1st Qu.:2403 Class :character Class :character 1st Qu.: 24280
## Median :3608 Mode :character Mode :character Median : 75791
## Mean :3880 Mean : 230257
## 3rd Qu.:5503 3rd Qu.: 205763
## Max. :6403 Max. :3123510
## Employed Employed_full_time_year_round Unemployed
## Min. : 1492 Min. : 1093 Min. : 0
## 1st Qu.: 17281 1st Qu.: 12722 1st Qu.: 1101
## Median : 56564 Median : 39613 Median : 3619
## Mean : 166162 Mean : 126308 Mean : 9725
## 3rd Qu.: 142879 3rd Qu.: 111025 3rd Qu.: 8862
## Max. :2354398 Max. :1939384 Max. :147261
## Unemployment_rate Median P25th P75th
## Min. :0.00000 Min. : 35000 Min. :24900 Min. : 45800
## 1st Qu.:0.04626 1st Qu.: 46000 1st Qu.:32000 1st Qu.: 70000
## Median :0.05472 Median : 53000 Median :36000 Median : 80000
## Mean :0.05736 Mean : 56816 Mean :38697 Mean : 82506
## 3rd Qu.:0.06904 3rd Qu.: 65000 3rd Qu.:42000 3rd Qu.: 95000
## Max. :0.15615 Max. :125000 Max. :78000 Max. :210000
The data consists of 11 variables and 173 observations. Each observation is a different major, so the dataset covers 173 majors. For each major it gives the total number of students, along with employment figures and the median salary.
sapply(data, function(x) sum(is.na(x)))
## Major_code Major
## 0 0
## Major_category Total
## 0 0
## Employed Employed_full_time_year_round
## 0 0
## Unemployed Unemployment_rate
## 0 0
## Median P25th
## 0 0
## P75th
## 0
Here we see that there is no missing data in the dataset.
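A zero count of `NA` values does not rule out values that are present but unusual. As a minimal sketch, the check below uses three rows copied from the `head()` output above (the object name `majors_sample` is made up for this illustration) and flags majors that report zero unemployed people. On the full dataset the same expressions would be applied to `data`.

```r
# Three rows copied from the head() output above (rows 26-28); illustration only.
majors_sample <- data.frame(
  Major = c("GENERAL EDUCATION",
            "EDUCATIONAL ADMINISTRATION AND SUPERVISION",
            "SCHOOL STUDENT COUNSELING"),
  Total = c(1438867, 4037, 2396),
  Unemployed = c(38742, 0, 169),
  stringsAsFactors = FALSE
)

# Count missing values per column (same result as the sapply() call above).
na_counts <- colSums(is.na(majors_sample))

# Majors with zero unemployed: not missing, but worth a second look.
zero_unemployed <- majors_sample[majors_sample$Unemployed == 0, ]

print(na_counts)
print(zero_unemployed)
```

In this sample the only such row is a small major (4,037 people in total). Whether that reflects a small survey sample rather than genuinely zero unemployment is an assumption the data alone cannot confirm.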
unique(data$Major_category)
## [1] "Agriculture & Natural Resources" "Biology & Life Science"
## [3] "Engineering" "Humanities & Liberal Arts"
## [5] "Communications & Journalism" "Computers & Mathematics"
## [7] "Industrial Arts & Consumer Services" "Education"
## [9] "Law & Public Policy" "Interdisciplinary"
## [11] "Health" "Social Science"
## [13] "Physical Sciences" "Psychology & Social Work"
## [15] "Arts" "Business"
The majors are split into 16 categories, which makes it easy to identify a major by the subject area it falls under.
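To see how the majors are distributed across those categories, one option is to tabulate the category column. The sketch below uses base R on six rows copied from the `head()` output above; `majors_sample` is a made-up name, and on the full dataset the call would be `table(data$Major_category)`.

```r
# Six rows copied from the head() output above; illustration only.
majors_sample <- data.frame(
  Major = c("GENERAL AGRICULTURE", "ANIMAL SCIENCES", "ENVIRONMENTAL SCIENCE",
            "COMMUNICATIONS", "JOURNALISM", "COMPUTER SCIENCE"),
  Major_category = c("Agriculture & Natural Resources",
                     "Agriculture & Natural Resources",
                     "Biology & Life Science",
                     "Communications & Journalism",
                     "Communications & Journalism",
                     "Computers & Mathematics"),
  stringsAsFactors = FALSE
)

# Number of majors in each category, largest first.
category_counts <- sort(table(majors_sample$Major_category), decreasing = TRUE)
print(category_counts)
```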
data %>%
  select(Major, Total) %>%
  arrange(desc(Total)) %>%
  group_by(Major)
## # A tibble: 173 x 2
## # Groups: Major [173]
## Major Total
## <chr> <int>
## 1 BUSINESS MANAGEMENT AND ADMINISTRATION 3123510
## 2 GENERAL BUSINESS 2148712
## 3 ACCOUNTING 1779219
## 4 NURSING 1769892
## 5 PSYCHOLOGY 1484075
## 6 ELEMENTARY EDUCATION 1446701
## 7 GENERAL EDUCATION 1438867
## 8 MARKETING AND MARKETING RESEARCH 1114624
## 9 ENGLISH LANGUAGE AND LITERATURE 1098647
## 10 COMMUNICATIONS 987676
## # ... with 163 more rows
The top 5 majors with the most students are Business Management and Administration, General Business, Accounting, Nursing, and Psychology.
data %>%
  select(Major, Employed) %>%
  arrange(desc(Employed)) %>%
  group_by(Major)
## # A tibble: 173 x 2
## # Groups: Major [173]
## Major Employed
## <chr> <int>
## 1 BUSINESS MANAGEMENT AND ADMINISTRATION 2354398
## 2 GENERAL BUSINESS 1580978
## 3 ACCOUNTING 1335825
## 4 NURSING 1325711
## 5 PSYCHOLOGY 1055854
## 6 MARKETING AND MARKETING RESEARCH 890125
## 7 GENERAL EDUCATION 843693
## 8 ELEMENTARY EDUCATION 819393
## 9 COMMUNICATIONS 790696
## 10 ENGLISH LANGUAGE AND LITERATURE 708882
## # ... with 163 more rows
data %>%
  select(Major, Unemployed) %>%
  arrange(desc(Unemployed)) %>%
  group_by(Major)
## # A tibble: 173 x 2
## # Groups: Major [173]
## Major Unemployed
## <chr> <int>
## 1 BUSINESS MANAGEMENT AND ADMINISTRATION 147261
## 2 GENERAL BUSINESS 85626
## 3 PSYCHOLOGY 79066
## 4 ACCOUNTING 75379
## 5 COMMUNICATIONS 54390
## 6 ENGLISH LANGUAGE AND LITERATURE 52248
## 7 MARKETING AND MARKETING RESEARCH 51839
## 8 POLITICAL SCIENCE AND GOVERNMENT 40376
## 9 GENERAL EDUCATION 38742
## 10 BIOLOGY 36757
## # ... with 163 more rows
The five majors with the most employed people are Business Management and Administration, General Business, Accounting, Nursing, and Psychology. The five with the most unemployed people are Business Management and Administration, General Business, Psychology, Accounting, and Communications. The two lists differ slightly, but both closely track the majors with the most students, since raw counts of employed and unemployed people grow with the size of the major.
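Because raw counts mostly reflect the size of a major, a rate is a fairer basis for comparison. The sketch below recomputes the rate as unemployed divided by employed plus unemployed, which reproduces the `Unemployment_rate` column for the first row of the `head()` output (2423 / (90245 + 2423) ≈ 0.0261). The counts are copied from the tables above; `top_majors` and `Rate` are names made up for this illustration.

```r
# Employed and Unemployed counts copied from the tables above
# (the five majors with the most employed people).
top_majors <- data.frame(
  Major = c("BUSINESS MANAGEMENT AND ADMINISTRATION", "GENERAL BUSINESS",
            "ACCOUNTING", "NURSING", "PSYCHOLOGY"),
  Employed = c(2354398, 1580978, 1335825, 1325711, 1055854),
  Unemployed = c(147261, 85626, 75379, 36503, 79066),
  stringsAsFactors = FALSE
)

# Rate = unemployed / (employed + unemployed).
top_majors$Rate <- top_majors$Unemployed /
  (top_majors$Employed + top_majors$Unemployed)

# Rank by rate instead of by raw count.
by_rate <- top_majors[order(top_majors$Rate, decreasing = TRUE), ]
print(by_rate[, c("Major", "Rate")])
```

Ranked this way, Psychology has the highest rate of the five and Nursing the lowest, even though Business Management and Administration has by far the most unemployed people.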
data %>%
  select(Major, Median) %>%
  arrange(desc(Median)) %>%
  group_by(Major)
## # A tibble: 173 x 2
## # Groups: Major [173]
## Major Median
## <chr> <int>
## 1 PETROLEUM ENGINEERING 125000
## 2 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION 106000
## 3 NAVAL ARCHITECTURE AND MARINE ENGINEERING 97000
## 4 METALLURGICAL ENGINEERING 96000
## 5 NUCLEAR ENGINEERING 95000
## 6 MINING AND MINERAL ENGINEERING 92000
## 7 MATHEMATICS AND COMPUTER SCIENCE 92000
## 8 ELECTRICAL ENGINEERING 88000
## 9 CHEMICAL ENGINEERING 86000
## 10 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 85000
## # ... with 163 more rows
The five majors with the highest median salary are Petroleum Engineering, Pharmacy Pharmaceutical Sciences and Administration, Naval Architecture and Marine Engineering, Metallurgical Engineering, and Nuclear Engineering.
data %>%
  select(Major, Employed, Unemployed, Median) %>%
  arrange(desc(Employed)) %>%
  group_by(Major)
## # A tibble: 173 x 4
## # Groups: Major [173]
## Major Employed Unemployed Median
## <chr> <int> <int> <int>
## 1 BUSINESS MANAGEMENT AND ADMINISTRATION 2354398 147261 58000
## 2 GENERAL BUSINESS 1580978 85626 60000
## 3 ACCOUNTING 1335825 75379 65000
## 4 NURSING 1325711 36503 62000
## 5 PSYCHOLOGY 1055854 79066 45000
## 6 MARKETING AND MARKETING RESEARCH 890125 51839 56000
## 7 GENERAL EDUCATION 843693 38742 43000
## 8 ELEMENTARY EDUCATION 819393 32685 40000
## 9 COMMUNICATIONS 790696 54390 50000
## 10 ENGLISH LANGUAGE AND LITERATURE 708882 52248 50000
## # ... with 163 more rows
data %>%
  select(Major, Employed, Unemployed, Median) %>%
  arrange(desc(Median)) %>%
  group_by(Major)
## # A tibble: 173 x 4
## # Groups: Major [173]
## Major Employed Unemployed Median
## <chr> <int> <int> <int>
## 1 PETROLEUM ENGINEERING 14002 617 125000
## 2 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATI~ 124058 4414 106000
## 3 NAVAL ARCHITECTURE AND MARINE ENGINEERING 10690 449 97000
## 4 METALLURGICAL ENGINEERING 6939 326 96000
## 5 NUCLEAR ENGINEERING 7320 527 95000
## 6 MINING AND MINERAL ENGINEERING 7416 366 92000
## 7 MATHEMATICS AND COMPUTER SCIENCE 5874 150 92000
## 8 ELECTRICAL ENGINEERING 489965 26064 88000
## 9 CHEMICAL ENGINEERING 131697 6388 86000
## 10 GEOLOGICAL AND GEOPHYSICAL ENGINEERING 4120 0 85000
## # ... with 163 more rows
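Here we compare the number of people employed in a major with its median salary. The majors with the most employed people are different from the majors with the highest median salary: none of the ten largest majors by employment appears among the ten best paid. This suggests that the number of people employed in a major is not closely tied to its median salary, although comparing two top-10 lists is not enough on its own to rule out a relationship.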
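A more direct way to check for a relationship is to compute a correlation across majors rather than compare ranked lists. The sketch below shows the call on ten rows copied from the two tables above; `majors_sample` and `rho` are names made up for this illustration. Because these rows were hand-picked from the two extremes, the value it produces says nothing about the full dataset, where the equivalent call would be `cor(data$Employed, data$Median, method = "spearman")`.

```r
# Rows copied from the two tables above: the five majors with the most
# employed people and the five with the highest median salary.
# A hand-picked, non-representative subset used only to show the call.
majors_sample <- data.frame(
  Employed = c(2354398, 1580978, 1335825, 1325711, 1055854,
               14002, 124058, 10690, 6939, 7320),
  Median = c(58000, 60000, 65000, 62000, 45000,
             125000, 106000, 97000, 96000, 95000)
)

# Spearman rank correlation: based on ranks, so it is less sensitive
# to the heavy skew in Employed than the default Pearson method.
rho <- cor(majors_sample$Employed, majors_sample$Median, method = "spearman")
print(rho)
```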
A Shiny app was created to visualize the dataset. It can be found here: https://bpersaud104.shinyapps.io/Data608FinalProject/
The visualization was designed with the structure of the dataset in mind. The dataset comes from FiveThirtyEight, and the underlying data is from the American Community Survey 2010-2012 Public Use Microdata Series. It covers 173 majors split into 16 categories. It also includes other information, such as the total number of students in each major, employment by major, and median salary.
The Shiny app consists of three graphs: one showing the total students in a major, one showing the median salary of a major, and one showing employment versus unemployment for a major. For the first two graphs, you select a major category to narrow the majors down by subject. For the third graph, you select a single major from a list of all the majors to see its employment versus unemployment. I think this is important because it lets a student explore the data and see key information about the different majors. Whether the student already knows what major they want to pursue or is undecided, this can help them pick a major that will help them get a job.
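The app's source is not shown here, so the following is only a guess at the filtering step behind the first two graphs: given a category, keep the majors in it and sort them by median salary. The helper name `majors_in_category` is hypothetical, and the four rows are copied from the `head()` output above. In a Shiny app the category would presumably come from a `selectInput()` and the result would be drawn inside `renderPlot()`.

```r
# Hypothetical helper: rows for one category, sorted by median salary.
majors_in_category <- function(df, category) {
  subset_df <- df[df$Major_category == category, c("Major", "Median")]
  subset_df[order(subset_df$Median, decreasing = TRUE), ]
}

# Four rows copied from the head() output above; illustration only.
majors_sample <- data.frame(
  Major = c("COMPUTER AND INFORMATION SYSTEMS", "COMPUTER SCIENCE",
            "INFORMATION SCIENCES", "GENERAL EDUCATION"),
  Major_category = c("Computers & Mathematics", "Computers & Mathematics",
                     "Computers & Mathematics", "Education"),
  Median = c(65000, 78000, 68000, 43000),
  stringsAsFactors = FALSE
)

# Keep only the selected category, highest median salary first.
selected <- majors_in_category(majors_sample, "Computers & Mathematics")
print(selected)
```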