library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 2.0.1 v dplyr 0.8.0.1
## v tidyr 0.8.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(knitr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Choosing a major is one of the most important decisions in a student’s life. A good estimate of the job and salary prospects of a major helps students make an informed choice. Numerous articles advise which course to choose and which will be in demand with prospective employers. ‘STEM’, which stands for Science, Technology, Engineering and Mathematics, is often highlighted as a strong choice for finding high-paying jobs. In this project, I investigate whether choosing a STEM major has an impact on a graduate’s salary.
Data collection: The recent-graduates data compiled by census.gov covers graduates during 2010-2012, with observations for 173 majors.
Cases: Each of the 173 observations represents one major and records the number of graduates, their employment figures, salary ranges and related measures.
Variables: We study two variables: the median salary, which is the response variable, and the course major (grouped as STEM or non-STEM in the analysis), which is the explanatory variable.
Type of study: This is an observational study, since the data were gathered by observing graduates over a period of time rather than through an experiment.
Scope of inference: The population of interest is graduates in the United States. Because the data were sampled randomly from a large population, the findings can be generalized to that population.
Causality: Random sampling alone does not support causal claims. Because this is an observational study, the data can show an association between major type and salary but cannot by itself establish a causal relationship.
Source: https://github.com/fivethirtyeight/data/tree/master/college-majors
grads <- read.csv("https://raw.githubusercontent.com/san123i/CUNY/master/Semester1/606/Final%20Project/grad-students.csv")
head(grads)%>%
kable() %>%
kable_styling() %>% scroll_box(width = "800px")
| Major_code | Major | Major_category | Grad_total | Grad_sample_size | Grad_employed | Grad_full_time_year_round | Grad_unemployed | Grad_unemployment_rate | Grad_median | Grad_P25 | Grad_P75 | Nongrad_total | Nongrad_employed | Nongrad_full_time_year_round | Nongrad_unemployed | Nongrad_unemployment_rate | Nongrad_median | Nongrad_P25 | Nongrad_P75 | Grad_share | Grad_premium |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5601 | CONSTRUCTION SERVICES | Industrial Arts & Consumer Services | 9173 | 200 | 7098 | 6511 | 681 | 0.0875434 | 75000 | 53000 | 110000 | 86062 | 73607 | 62435 | 3928 | 0.0506610 | 65000 | 47000 | 98000 | 0.0963196 | 0.1538462 |
| 6004 | COMMERCIAL ART AND GRAPHIC DESIGN | Arts | 53864 | 882 | 40492 | 29553 | 2482 | 0.0577559 | 60000 | 40000 | 89000 | 461977 | 347166 | 250596 | 25484 | 0.0683859 | 48000 | 34000 | 71000 | 0.1044198 | 0.2500000 |
| 6211 | HOSPITALITY MANAGEMENT | Business | 24417 | 437 | 18368 | 14784 | 1465 | 0.0738668 | 65000 | 45000 | 100000 | 179335 | 145597 | 113579 | 7409 | 0.0484229 | 50000 | 35000 | 75000 | 0.1198369 | 0.3000000 |
| 2201 | COSMETOLOGY SERVICES AND CULINARY ARTS | Industrial Arts & Consumer Services | 5411 | 72 | 3590 | 2701 | 316 | 0.0809012 | 47000 | 24500 | 85000 | 37575 | 29738 | 23249 | 1661 | 0.0528998 | 41600 | 29000 | 60000 | 0.1258782 | 0.1298077 |
| 2001 | COMMUNICATION TECHNOLOGIES | Computers & Mathematics | 9109 | 171 | 7512 | 5622 | 466 | 0.0584106 | 57000 | 40600 | 83700 | 53819 | 43163 | 34231 | 3389 | 0.0728003 | 52000 | 36000 | 78000 | 0.1447527 | 0.0961538 |
| 3201 | COURT REPORTING | Law & Public Policy | 1542 | 22 | 1008 | 860 | 0 | 0.0000000 | 75000 | 55000 | 120000 | 8921 | 6967 | 6063 | 518 | 0.0692051 | 50000 | 34000 | 75000 | 0.1473765 | 0.5000000 |
summary(grads)
## Major_code Major
## Min. :1100 ACCOUNTING : 1
## 1st Qu.:2403 ACTUARIAL SCIENCE : 1
## Median :3608 ADVERTISING AND PUBLIC RELATIONS : 1
## Mean :3880 AEROSPACE ENGINEERING : 1
## 3rd Qu.:5503 AGRICULTURAL ECONOMICS : 1
## Max. :6403 AGRICULTURE PRODUCTION AND MANAGEMENT: 1
## (Other) :167
## Major_category Grad_total Grad_sample_size
## Engineering :29 Min. : 1542 Min. : 22
## Education :16 1st Qu.: 15284 1st Qu.: 314
## Humanities & Liberal Arts:15 Median : 37872 Median : 688
## Biology & Life Science :14 Mean : 127672 Mean : 2251
## Business :13 3rd Qu.: 148255 3rd Qu.: 2528
## Health :12 Max. :1184158 Max. :21994
## (Other) :74
## Grad_employed Grad_full_time_year_round Grad_unemployed
## Min. : 1008 Min. : 770 Min. : 0
## 1st Qu.: 12659 1st Qu.: 9894 1st Qu.: 453
## Median : 28930 Median : 22523 Median : 1179
## Mean : 94037 Mean : 72861 Mean : 3506
## 3rd Qu.:109944 3rd Qu.: 80794 3rd Qu.: 3329
## Max. :915341 Max. :703347 Max. :35718
##
## Grad_unemployment_rate Grad_median Grad_P25 Grad_P75
## Min. :0.00000 Min. : 47000 Min. :24500 Min. : 65000
## 1st Qu.:0.02607 1st Qu.: 65000 1st Qu.:45000 1st Qu.: 93000
## Median :0.03665 Median : 75000 Median :50000 Median :108000
## Mean :0.03934 Mean : 76756 Mean :52597 Mean :112087
## 3rd Qu.:0.04805 3rd Qu.: 90000 3rd Qu.:60000 3rd Qu.:130000
## Max. :0.13851 Max. :135000 Max. :85000 Max. :294000
##
## Nongrad_total Nongrad_employed Nongrad_full_time_year_round
## Min. : 2232 Min. : 1328 Min. : 980
## 1st Qu.: 20564 1st Qu.: 15914 1st Qu.: 11755
## Median : 68993 Median : 50092 Median : 38384
## Mean : 214720 Mean : 154554 Mean : 120737
## 3rd Qu.: 184971 3rd Qu.: 129179 3rd Qu.: 103629
## Max. :2996892 Max. :2253649 Max. :1882507
##
## Nongrad_unemployed Nongrad_unemployment_rate Nongrad_median
## Min. : 0 Min. :0.00000 Min. : 37000
## 1st Qu.: 880 1st Qu.:0.04198 1st Qu.: 48700
## Median : 3157 Median :0.05103 Median : 55000
## Mean : 8486 Mean :0.05395 Mean : 58584
## 3rd Qu.: 7409 3rd Qu.:0.06439 3rd Qu.: 65000
## Max. :136978 Max. :0.16091 Max. :126000
##
## Nongrad_P25 Nongrad_P75 Grad_share Grad_premium
## Min. :25000 Min. : 48000 Min. :0.09632 Min. :-0.0250
## 1st Qu.:34000 1st Qu.: 72000 1st Qu.:0.26757 1st Qu.: 0.2308
## Median :38000 Median : 80000 Median :0.39875 Median : 0.3208
## Mean :40078 Mean : 84333 Mean :0.40059 Mean : 0.3285
## 3rd Qu.:44000 3rd Qu.: 97000 3rd Qu.:0.49912 3rd Qu.: 0.4000
## Max. :80000 Max. :215000 Max. :0.93117 Max. : 1.6471
##
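The level counts shown for `Major` and `Major_category` in the summary above appear only when text columns are read as factors, which was the `read.csv` default before R 4.0. A minimal sketch, assuming a newer R version and using a hypothetical three-row extract (values copied from the `head(grads)` table) in place of the real file:

```r
# Hypothetical three-row extract standing in for the real CSV file.
csv_text <- "Major,Major_category,Grad_median
CONSTRUCTION SERVICES,Industrial Arts & Consumer Services,75000
HOSPITALITY MANAGEMENT,Business,65000
COURT REPORTING,Law & Public Policy,75000"

# stringsAsFactors = TRUE restores the pre-4.0 behaviour, so summary()
# reports level counts for text columns instead of "Length/Class/Mode".
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(toy)
summary(toy)
```

The same argument can be passed to the original `read.csv()` call if the factor-style summary is wanted.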
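# Classify a major category as "STEM" or "NON-STEM" by keyword matching.
isSTEM <- function(majorCategory){
# Flag categories that mention a STEM keyword.
result <- grepl("science|math|engineering|technology|computer", majorCategory, ignore.case = TRUE)
# Exclude social sciences, which also match "science".
result <- result & !grepl("Social", majorCategory, ignore.case = TRUE)
return(ifelse(result, "STEM", "NON-STEM"))
}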
grads <- grads %>% mutate("Major_Type_STEM"=isSTEM(Major_category))
x <- grads %>% select(Major, Major_category, Major_Type_STEM)
#x[order(x$Major_Type_STEM),]
x[order(x$Major_Type_STEM),] %>%
kable() %>%
kable_styling()
| | Major | Major_category | Major_Type_STEM |
|---|---|---|---|
| 1 | CONSTRUCTION SERVICES | Industrial Arts & Consumer Services | NON-STEM |
| 2 | COMMERCIAL ART AND GRAPHIC DESIGN | Arts | NON-STEM |
| 3 | HOSPITALITY MANAGEMENT | Business | NON-STEM |
| 4 | COSMETOLOGY SERVICES AND CULINARY ARTS | Industrial Arts & Consumer Services | NON-STEM |
| 6 | COURT REPORTING | Law & Public Policy | NON-STEM |
| 7 | MARKETING AND MARKETING RESEARCH | Business | NON-STEM |
| 8 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources | NON-STEM |
| 10 | ADVERTISING AND PUBLIC RELATIONS | Communications & Journalism | NON-STEM |
| 11 | FILM VIDEO AND PHOTOGRAPHIC ARTS | Arts | NON-STEM |
| 12 | ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION | Industrial Arts & Consumer Services | NON-STEM |
| 14 | MASS MEDIA | Communications & Journalism | NON-STEM |
| 15 | TRANSPORTATION SCIENCES AND TECHNOLOGIES | Industrial Arts & Consumer Services | NON-STEM |
| 17 | MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION | Business | NON-STEM |
| 20 | MISCELLANEOUS FINE ARTS | Arts | NON-STEM |
| 21 | CRIMINAL JUSTICE AND FIRE PROTECTION | Law & Public Policy | NON-STEM |
| 22 | BUSINESS MANAGEMENT AND ADMINISTRATION | Business | NON-STEM |
| 23 | CRIMINOLOGY | Social Science | NON-STEM |
| 24 | MANAGEMENT INFORMATION SYSTEMS AND STATISTICS | Business | NON-STEM |
| 26 | OPERATIONS LOGISTICS AND E-COMMERCE | Business | NON-STEM |
| 27 | GENERAL BUSINESS | Business | NON-STEM |
| 28 | MEDICAL TECHNOLOGIES TECHNICIANS | Health | NON-STEM |
| 30 | COMMUNICATIONS | Communications & Journalism | NON-STEM |
| 31 | ACTUARIAL SCIENCE | Business | NON-STEM |
| 33 | JOURNALISM | Communications & Journalism | NON-STEM |
| 34 | MEDICAL ASSISTING SERVICES | Health | NON-STEM |
| 36 | ACCOUNTING | Business | NON-STEM |
| 37 | FINE ARTS | Arts | NON-STEM |
| 38 | NURSING | Health | NON-STEM |
| 41 | MULTI/INTERDISCIPLINARY STUDIES | Interdisciplinary | NON-STEM |
| 43 | GENERAL AGRICULTURE | Agriculture & Natural Resources | NON-STEM |
| 44 | FORESTRY | Agriculture & Natural Resources | NON-STEM |
| 45 | LIBERAL ARTS | Humanities & Liberal Arts | NON-STEM |
| 46 | HUMAN SERVICES AND COMMUNITY ORGANIZATION | Psychology & Social Work | NON-STEM |
| 47 | VISUAL AND PERFORMING ARTS | Arts | NON-STEM |
| 48 | NATURAL RESOURCES MANAGEMENT | Agriculture & Natural Resources | NON-STEM |
| 49 | STUDIO ARTS | Arts | NON-STEM |
| 50 | FAMILY AND CONSUMER SCIENCES | Industrial Arts & Consumer Services | NON-STEM |
| 51 | PHYSICAL FITNESS PARKS RECREATION AND LEISURE | Industrial Arts & Consumer Services | NON-STEM |
| 52 | FINANCE | Business | NON-STEM |
| 54 | PLANT SCIENCE AND AGRONOMY | Agriculture & Natural Resources | NON-STEM |
| 55 | HUMAN RESOURCES AND PERSONNEL MANAGEMENT | Business | NON-STEM |
| 56 | INTERNATIONAL BUSINESS | Business | NON-STEM |
| 57 | COMPOSITION AND RHETORIC | Humanities & Liberal Arts | NON-STEM |
| 58 | DRAMA AND THEATER ARTS | Arts | NON-STEM |
| 59 | BUSINESS ECONOMICS | Business | NON-STEM |
| 62 | HEALTH AND MEDICAL ADMINISTRATIVE SERVICES | Health | NON-STEM |
| 63 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources | NON-STEM |
| 65 | GEOGRAPHY | Social Science | NON-STEM |
| 68 | INTERDISCIPLINARY SOCIAL SCIENCES | Social Science | NON-STEM |
| 70 | SOIL SCIENCE | Agriculture & Natural Resources | NON-STEM |
| 71 | PRE-LAW AND LEGAL STUDIES | Law & Public Policy | NON-STEM |
| 77 | EARLY CHILDHOOD EDUCATION | Education | NON-STEM |
| 78 | SOCIOLOGY | Social Science | NON-STEM |
| 79 | GENERAL SOCIAL SCIENCES | Social Science | NON-STEM |
| 80 | ANIMAL SCIENCES | Agriculture & Natural Resources | NON-STEM |
| 81 | TREATMENT THERAPY PROFESSIONS | Health | NON-STEM |
| 82 | MISCELLANEOUS AGRICULTURE | Agriculture & Natural Resources | NON-STEM |
| 84 | HUMANITIES | Humanities & Liberal Arts | NON-STEM |
| 85 | FOOD SCIENCE | Agriculture & Natural Resources | NON-STEM |
| 88 | SOCIAL PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 91 | ART HISTORY AND CRITICISM | Humanities & Liberal Arts | NON-STEM |
| 92 | MISCELLANEOUS HEALTH MEDICAL PROFESSIONS | Health | NON-STEM |
| 93 | GENERAL MEDICAL AND HEALTH SERVICES | Health | NON-STEM |
| 94 | INTERCULTURAL AND INTERNATIONAL STUDIES | Humanities & Liberal Arts | NON-STEM |
| 95 | NUTRITION SCIENCES | Health | NON-STEM |
| 96 | ECONOMICS | Social Science | NON-STEM |
| 97 | PHYSICAL AND HEALTH EDUCATION TEACHING | Education | NON-STEM |
| 98 | COMMUNITY AND PUBLIC HEALTH | Health | NON-STEM |
| 100 | THEOLOGY AND RELIGIOUS VOCATIONS | Humanities & Liberal Arts | NON-STEM |
| 102 | MISCELLANEOUS EDUCATION | Education | NON-STEM |
| 104 | PUBLIC ADMINISTRATION | Law & Public Policy | NON-STEM |
| 105 | ELEMENTARY EDUCATION | Education | NON-STEM |
| 106 | INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 107 | MILITARY TECHNOLOGIES | Industrial Arts & Consumer Services | NON-STEM |
| 108 | GENERAL EDUCATION | Education | NON-STEM |
| 109 | MUSIC | Arts | NON-STEM |
| 110 | ART AND MUSIC EDUCATION | Education | NON-STEM |
| 111 | LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE | Humanities & Liberal Arts | NON-STEM |
| 113 | ANTHROPOLOGY AND ARCHEOLOGY | Humanities & Liberal Arts | NON-STEM |
| 114 | SOCIAL WORK | Psychology & Social Work | NON-STEM |
| 115 | ENGLISH LANGUAGE AND LITERATURE | Humanities & Liberal Arts | NON-STEM |
| 116 | TEACHER EDUCATION: MULTIPLE LEVELS | Education | NON-STEM |
| 118 | PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION | Health | NON-STEM |
| 119 | OTHER FOREIGN LANGUAGES | Humanities & Liberal Arts | NON-STEM |
| 120 | PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 121 | AREA ETHNIC AND CIVILIZATION STUDIES | Humanities & Liberal Arts | NON-STEM |
| 126 | HISTORY | Humanities & Liberal Arts | NON-STEM |
| 127 | MISCELLANEOUS SOCIAL SCIENCES | Social Science | NON-STEM |
| 130 | FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES | Humanities & Liberal Arts | NON-STEM |
| 131 | SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION | Education | NON-STEM |
| 133 | POLITICAL SCIENCE AND GOVERNMENT | Social Science | NON-STEM |
| 134 | INTERNATIONAL RELATIONS | Social Science | NON-STEM |
| 137 | MISCELLANEOUS PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 139 | SECONDARY TEACHER EDUCATION | Education | NON-STEM |
| 141 | UNITED STATES HISTORY | Humanities & Liberal Arts | NON-STEM |
| 144 | LANGUAGE AND DRAMA EDUCATION | Education | NON-STEM |
| 146 | PUBLIC POLICY | Law & Public Policy | NON-STEM |
| 147 | MATHEMATICS TEACHER EDUCATION | Education | NON-STEM |
| 148 | SCIENCE AND COMPUTER TEACHER EDUCATION | Education | NON-STEM |
| 150 | PHILOSOPHY AND RELIGIOUS STUDIES | Humanities & Liberal Arts | NON-STEM |
| 151 | SPECIAL NEEDS EDUCATION | Education | NON-STEM |
| 158 | LIBRARY SCIENCE | Education | NON-STEM |
| 164 | EDUCATIONAL PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 168 | COMMUNICATION DISORDERS SCIENCES AND SERVICES | Health | NON-STEM |
| 169 | COUNSELING PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 170 | CLINICAL PSYCHOLOGY | Psychology & Social Work | NON-STEM |
| 171 | HEALTH AND MEDICAL PREPARATORY PROGRAMS | Health | NON-STEM |
| 172 | SCHOOL STUDENT COUNSELING | Education | NON-STEM |
| 173 | EDUCATIONAL ADMINISTRATION AND SUPERVISION | Education | NON-STEM |
| 5 | COMMUNICATION TECHNOLOGIES | Computers & Mathematics | STEM |
| 9 | COMPUTER PROGRAMMING AND DATA PROCESSING | Computers & Mathematics | STEM |
| 13 | MECHANICAL ENGINEERING RELATED TECHNOLOGIES | Engineering | STEM |
| 16 | COMPUTER NETWORKING AND TELECOMMUNICATIONS | Computers & Mathematics | STEM |
| 18 | MISCELLANEOUS ENGINEERING TECHNOLOGIES | Engineering | STEM |
| 19 | INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | STEM |
| 25 | COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY | Computers & Mathematics | STEM |
| 29 | COMPUTER AND INFORMATION SYSTEMS | Computers & Mathematics | STEM |
| 32 | ELECTRICAL ENGINEERING TECHNOLOGY | Engineering | STEM |
| 35 | ENGINEERING TECHNOLOGIES | Engineering | STEM |
| 39 | INFORMATION SCIENCES | Computers & Mathematics | STEM |
| 40 | ARCHITECTURAL ENGINEERING | Engineering | STEM |
| 42 | NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES | Physical Sciences | STEM |
| 53 | PETROLEUM ENGINEERING | Engineering | STEM |
| 60 | ENGINEERING AND INDUSTRIAL MANAGEMENT | Engineering | STEM |
| 61 | COMPUTER SCIENCE | Computers & Mathematics | STEM |
| 64 | ENVIRONMENTAL SCIENCE | Biology & Life Science | STEM |
| 66 | MISCELLANEOUS ENGINEERING | Engineering | STEM |
| 67 | ECOLOGY | Biology & Life Science | STEM |
| 69 | ARCHITECTURE | Engineering | STEM |
| 72 | GENERAL ENGINEERING | Engineering | STEM |
| 73 | MULTI-DISCIPLINARY OR GENERAL SCIENCE | Physical Sciences | STEM |
| 74 | CIVIL ENGINEERING | Engineering | STEM |
| 75 | COMPUTER ENGINEERING | Engineering | STEM |
| 76 | MINING AND MINERAL ENGINEERING | Engineering | STEM |
| 83 | MECHANICAL ENGINEERING | Engineering | STEM |
| 86 | INDUSTRIAL AND MANUFACTURING ENGINEERING | Engineering | STEM |
| 87 | GEOLOGICAL AND GEOPHYSICAL ENGINEERING | Engineering | STEM |
| 89 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | STEM |
| 90 | MATHEMATICS AND COMPUTER SCIENCE | Computers & Mathematics | STEM |
| 99 | ELECTRICAL ENGINEERING | Engineering | STEM |
| 101 | OCEANOGRAPHY | Physical Sciences | STEM |
| 103 | BIOLOGICAL ENGINEERING | Engineering | STEM |
| 112 | MATERIALS ENGINEERING AND MATERIALS SCIENCE | Engineering | STEM |
| 117 | GEOLOGY AND EARTH SCIENCE | Physical Sciences | STEM |
| 122 | PHYSICAL SCIENCES | Physical Sciences | STEM |
| 123 | ATMOSPHERIC SCIENCES AND METEOROLOGY | Physical Sciences | STEM |
| 124 | CHEMICAL ENGINEERING | Engineering | STEM |
| 125 | AEROSPACE ENGINEERING | Engineering | STEM |
| 128 | APPLIED MATHEMATICS | Computers & Mathematics | STEM |
| 129 | STATISTICS AND DECISION SCIENCE | Computers & Mathematics | STEM |
| 132 | MATHEMATICS | Computers & Mathematics | STEM |
| 135 | ENVIRONMENTAL ENGINEERING | Engineering | STEM |
| 136 | MISCELLANEOUS BIOLOGY | Biology & Life Science | STEM |
| 138 | METALLURGICAL ENGINEERING | Engineering | STEM |
| 140 | GEOSCIENCES | Physical Sciences | STEM |
| 142 | ENGINEERING MECHANICS PHYSICS AND SCIENCE | Engineering | STEM |
| 143 | COGNITIVE SCIENCE AND BIOPSYCHOLOGY | Biology & Life Science | STEM |
| 145 | NUCLEAR ENGINEERING | Engineering | STEM |
| 149 | MICROBIOLOGY | Biology & Life Science | STEM |
| 152 | BOTANY | Biology & Life Science | STEM |
| 153 | BIOLOGY | Biology & Life Science | STEM |
| 154 | ASTRONOMY AND ASTROPHYSICS | Physical Sciences | STEM |
| 155 | CHEMISTRY | Physical Sciences | STEM |
| 156 | PHYSIOLOGY | Biology & Life Science | STEM |
| 157 | BIOMEDICAL ENGINEERING | Engineering | STEM |
| 159 | MOLECULAR BIOLOGY | Biology & Life Science | STEM |
| 160 | PHARMACOLOGY | Biology & Life Science | STEM |
| 161 | ZOOLOGY | Biology & Life Science | STEM |
| 162 | PHYSICS | Physical Sciences | STEM |
| 163 | NEUROSCIENCE | Biology & Life Science | STEM |
| 165 | BIOCHEMICAL SCIENCES | Biology & Life Science | STEM |
| 166 | GENETICS | Biology & Life Science | STEM |
| 167 | MATERIALS SCIENCE | Engineering | STEM |
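The classification in the table above can be sanity-checked on category names alone. The following is a small sketch, not the report's own code: `is_stem` is a hypothetical re-statement of the keyword rule so that the example runs by itself, and the category vector is a hand-picked subset of the values in `Major_category`.

```r
# Hypothetical re-statement of the keyword rule, so the sketch is self-contained.
is_stem <- function(major_category) {
  hit <- grepl("science|math|engineering|technology|computer",
               major_category, ignore.case = TRUE)
  # "Social Science" matches "science" but should not count as STEM.
  hit <- hit & !grepl("social", major_category, ignore.case = TRUE)
  ifelse(hit, "STEM", "NON-STEM")
}

# Hand-picked subset of category names from the table above.
categories <- c("Engineering", "Computers & Mathematics", "Biology & Life Science",
                "Physical Sciences", "Social Science", "Health", "Business")
stem_label <- is_stem(categories)

# Count categories per label and list which category landed where.
table(stem_label)
split(categories, stem_label)
```

Because the rule looks only at the category, majors such as ACTUARIAL SCIENCE (Business) and MATHEMATICS TEACHER EDUCATION (Education) are labelled NON-STEM, as the table shows. Matching on the `Major` column as well would change those labels.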
Let’s analyze the number of graduates in each major category.
The plots below show that the largest number of graduates fall under the ‘Education’ category, whereas the fewest fall under the ‘Interdisciplinary’ and ‘Agriculture & Natural Resources’ categories. Also, only four major categories are classified as STEM: ‘Biology & Life Science’, ‘Computers & Mathematics’, ‘Engineering’ and ‘Physical Sciences’.
ggplot(grads, aes(x=reorder(Major_category, Grad_total), y = Grad_total, fill=Major_Type_STEM)) + geom_bar(stat="identity") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+ labs( title = 'Graduates by major category type')
# Total graduates per major category (avoid naming the result `list`, which masks base::list).
grads_by_category <- aggregate(Grad_total ~ Major_category, grads, sum)
ggplot(grads_by_category, aes(x=reorder(Major_category, Grad_total), y=Grad_total, label=Grad_total)) + geom_col() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+ labs(title = '# of Graduates by major category type') +
geom_text(size = 3, position = position_stack(vjust = 0.5), color='white')
# Total graduates per STEM / NON-STEM type.
grads_by_type <- aggregate(Grad_total ~ Major_Type_STEM, grads, sum)
ggplot(grads_by_type, aes(x=reorder(Major_Type_STEM, Grad_total), y=Grad_total, label=Grad_total, fill=Major_Type_STEM)) + geom_col() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+ labs(title = '# of Graduates by STEM type') +
geom_text(size = 3, position = position_stack(vjust = 0.5), color='white')
The plots below show graduate salaries (median, 25th percentile and 75th percentile) by major category and by STEM type.
ggplot(grads, aes(Major_category, Grad_median, fill=Major_Type_STEM)) + geom_boxplot() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
ggplot(grads, aes(x=Major, y=Grad_median, shape=Major_Type_STEM, colour=Major_Type_STEM)) + geom_point() + scale_shape_manual(values=c(19, 2))
ggplot(grads, aes(x=reorder(Major_category, Grad_P25), y = Grad_P25, fill=Major_Type_STEM)) + geom_bar(stat="identity") +labs(title = 'a. 25Perc of Grad salary by Major_category') + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
ggplot(grads, aes(x=reorder(Major_category, Grad_P75), y = Grad_P75, fill=Major_Type_STEM)) + geom_bar(stat="identity") +labs(title = 'b. 75Perc of Grad salary by Major_category') + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
ggplot(grads, aes(x=reorder(Major_category, Grad_median), y = Grad_median, fill=Major_Type_STEM)) + geom_bar(stat="identity") +labs(title = 'c. Median of Grad salary by Major_category') + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
ggplot(grads, aes(Major_Type_STEM, Grad_median, fill=Major_Type_STEM)) + geom_boxplot() + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
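A numeric summary can complement the box plots above. The following is a minimal sketch in base R, assuming a data frame with the same `Major_Type_STEM` and `Grad_median` column names; the six salary values are hypothetical and are not taken from the dataset.

```r
# Hypothetical toy data with the same column names as `grads`.
toy <- data.frame(
  Major_Type_STEM = c("STEM", "STEM", "STEM", "NON-STEM", "NON-STEM", "NON-STEM"),
  Grad_median     = c(95000, 88000, 102000, 60000, 65000, 75000)
)

# Per-group count, mean and median of the median salaries.
group_n      <- tapply(toy$Grad_median, toy$Major_Type_STEM, length)
group_mean   <- tapply(toy$Grad_median, toy$Major_Type_STEM, mean)
group_median <- tapply(toy$Grad_median, toy$Major_Type_STEM, median)

group_summary <- data.frame(
  n      = as.vector(group_n),
  mean   = as.vector(group_mean),
  median = as.vector(group_median),
  row.names = names(group_n)
)
print(group_summary)
```

Replacing `toy` with `grads` would give the group sizes and centres that the ANOVA below compares.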
We use the Analysis of Variance (ANOVA) method to draw an inference about the hypothesis. Below are the conditions we evaluated and found to be satisfied.
a. Independence of cases - The cases in the data are independent of one another.
b. Normality of residuals - The normal Q-Q plot below shows that the residuals are approximately normally distributed, without major outliers.
c. Homogeneity of variances
Let’s begin the inference by stating the hypotheses.
H0: The average median salary of graduates is the same for STEM and non-STEM majors.
HA: The average median salary of graduates is not the same for STEM and non-STEM majors.
grads_anova <- aov(grads$Grad_median ~ grads$Major_Type_STEM)
summary(grads_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## grads$Major_Type_STEM 1 1.644e+10 1.644e+10 85.8 <2e-16 ***
## Residuals 171 3.276e+10 1.916e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(grads_anova)
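The diagnostic plots can be complemented with formal checks of the normality and equal-variance conditions. The sketch below is one possible approach rather than part of the original analysis: it applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances, on simulated data whose group sizes, means and spread are hypothetical.

```r
# Simulated stand-in for `grads`; sizes, means and sd are hypothetical.
set.seed(606)
toy <- data.frame(
  Major_Type_STEM = rep(c("STEM", "NON-STEM"), times = c(30, 50)),
  Grad_median     = c(rnorm(30, mean = 95000, sd = 14000),
                      rnorm(50, mean = 68000, sd = 14000))
)

toy_anova <- aov(Grad_median ~ Major_Type_STEM, data = toy)

# Condition b: normality of residuals (H0: residuals are normal).
shapiro_result <- shapiro.test(residuals(toy_anova))

# Condition c: homogeneity of variances (H0: group variances are equal).
bartlett_result <- bartlett.test(Grad_median ~ Major_Type_STEM, data = toy)

print(shapiro_result)
print(bartlett_result)
```

A large p-value in either test means the data give no evidence against that condition. Bartlett's test is itself sensitive to non-normality, so it is best read alongside the residual plots.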
Since the p-value is less than 0.05, we reject the null hypothesis (H0) and conclude that the average graduate salary for STEM and non-STEM majors is not equal. From the box plot, we can also see that the median salary for STEM majors is higher than for non-STEM majors.
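With only two groups, a one-way ANOVA is equivalent to a pooled-variance two-sample t-test: the F statistic equals the square of the t statistic. The sketch below illustrates this on simulated data (all values hypothetical) and also runs Welch's t-test, which does not assume equal variances, as a possible robustness check.

```r
# Simulated stand-in for `grads`; sizes, means and sd are hypothetical.
set.seed(606)
toy <- data.frame(
  Major_Type_STEM = rep(c("STEM", "NON-STEM"), times = c(30, 50)),
  Grad_median     = c(rnorm(30, mean = 95000, sd = 14000),
                      rnorm(50, mean = 68000, sd = 14000))
)

# F statistic from the one-way ANOVA table.
f_value <- summary(aov(Grad_median ~ Major_Type_STEM, data = toy))[[1]][["F value"]][1]

# t statistic from the pooled-variance (var.equal = TRUE) two-sample t-test.
pooled  <- t.test(Grad_median ~ Major_Type_STEM, data = toy, var.equal = TRUE)
t_value <- unname(pooled$statistic)

# The two agree up to floating-point error: F = t^2.
all.equal(f_value, t_value^2)

# Welch's t-test drops the equal-variance assumption.
welch <- t.test(Grad_median ~ Major_Type_STEM, data = toy)
print(welch)
```

If Welch's test and the ANOVA lead to the same decision on the real data, the conclusion does not hinge on the equal-variance condition.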
Using the given data, we carried out an exploratory analysis and identified a clear pattern: graduates of STEM majors have a higher median salary than graduates of non-STEM majors. The ANOVA confirmed that graduate median salaries for STEM and non-STEM majors are not equal. Therefore, we can state that graduate median salaries are higher for STEM majors in this data.
Future research: I believe several other factors can influence graduate salaries, such as market conditions, the number of open job opportunities, and the type of economy (developed, emerging or underdeveloped). With access to such datasets, a more realistic and robust model could be built.