The goal of this assignment is to practice loading data into a data frame and to study the dataset and its associated description. The data comes from the fivethirtyeight.com datasets and underlies an article about picking a college major. All data is from the American Community Survey 2010-2012 Public Use Microdata Series; a copy was uploaded to GitHub.
First, we load the data from GitHub:
CollegeMajors <- read.csv("https://raw.githubusercontent.com/aaitelmouden/DATA607S2020/master/Week1/recent-grads.csv")
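To see what `read.csv()` does without relying on the network, here is a minimal sketch that parses a small inline CSV. The column names mirror a few of the real headers, but the two rows and their values are invented for illustration, and `csv_text`, `toy`, and `col_types` are hypothetical names.

```r
# Inline CSV standing in for recent-grads.csv (values are made up).
csv_text <- "Rank,Major,Total,Median
1,MAJOR A,2000,60000
2,MAJOR B,,45000"

# text = parses a string instead of a file or URL.
# stringsAsFactors = FALSE keeps Major as plain character data.
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One class per column; the blank Total field is read as NA.
col_types <- sapply(toy, class)
print(col_types)
print(toy)
```

On the real file, the same `sapply(CollegeMajors, class)` call is a quick way to confirm that numeric columns were not read as text.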
Let’s view the first several rows of the dataset before we start transforming it:
head(CollegeMajors)
# Retrieve the dimensions of the data.
dim(CollegeMajors)
## [1] 173 21
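`dim()` returns the row count followed by the column count, so the output above means 173 rows and 21 columns. A small sketch on a made-up data frame (not the real data; `toy` is a hypothetical name) shows how `dim()`, `nrow()`, and `ncol()` relate:

```r
# Toy data frame standing in for CollegeMajors (values are made up).
toy <- data.frame(
  Major  = c("A", "B", "C"),
  Median = c(30000, 45000, 60000),
  stringsAsFactors = FALSE
)

d      <- dim(toy)   # integer vector: c(rows, columns)
n_rows <- nrow(toy)  # same value as d[1]
n_cols <- ncol(toy)  # same value as d[2]
print(d)
```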
# Column names in our data frame: the column names are meaningful, so no replacement is needed. Instead, the headers for recent-grads.csv are described below:
| Header | Description |
|---|---|
| Rank | Rank by median earnings |
| Major_code | Major code, FOD1P in ACS PUMS |
| Major | Major description |
| Major_category | Category of major from Carnevale et al |
| Total | Total number of people with major |
| Sample_size | Sample size (unweighted) of full-time, year-round ONLY (used for earnings) |
| Men | Male graduates |
| Women | Female graduates |
| ShareWomen | Women as share of total |
| Employed | Number employed (ESR == 1 or 2) |
| Full_time | Employed 35 hours or more |
| Part_time | Employed less than 35 hours |
| Full_time_year_round | Employed at least 50 weeks (WKW == 1) and at least 35 hours (WKHP >= 35) |
| Unemployed | Number unemployed (ESR == 3) |
| Unemployment_rate | Unemployed / (Unemployed + Employed) |
| Median | Median earnings of full-time, year-round workers |
| P25th | 25th percentile of earnings |
| P75th | 75th percentile of earnings |
| College_jobs | Number with job requiring a college degree |
| Non_college_jobs | Number with job not requiring a college degree |
| Low_wage_jobs | Number in low-wage service jobs |
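Two of the columns described above are ratios of other columns: `ShareWomen` and `Unemployment_rate`. The sketch below recomputes both on invented numbers, following the formulas in the table; `toy` is a hypothetical data frame, not the real data.

```r
# Made-up counts for two hypothetical majors.
toy <- data.frame(
  Total      = c(1000, 2000),
  Women      = c(250, 1500),
  Employed   = c(900, 1600),
  Unemployed = c(100, 400)
)

# Women as share of total.
toy$ShareWomen <- toy$Women / toy$Total

# Unemployed / (Unemployed + Employed), as described in the table.
toy$Unemployment_rate <- toy$Unemployed / (toy$Unemployed + toy$Employed)

print(toy)
```

The same expressions could be applied to `CollegeMajors` and compared with the stored columns using `all.equal()`; rounding or missing values in the real file may produce small differences.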
# Keep only the major name, the employment counts, and the median earnings.
MajorEmploy <- subset(CollegeMajors, select = c(Major, Employed, Unemployed, Median))
head(MajorEmploy)
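A natural next step after selecting columns is to sort the rows. The following is a minimal sketch, assuming we want the highest median earnings first; it runs on a made-up data frame, and `toy`, `toy_employ`, and `toy_sorted` are hypothetical names.

```r
# Toy data with one extra column (Rank) that the subset drops.
toy <- data.frame(
  Rank       = c(2, 3, 1),
  Major      = c("X", "Y", "Z"),
  Employed   = c(100, 300, 200),
  Unemployed = c(10, 20, 5),
  Median     = c(40000, 30000, 55000),
  stringsAsFactors = FALSE
)

# Same column selection as above, applied to the toy data.
toy_employ <- subset(toy, select = c(Major, Employed, Unemployed, Median))

# order() returns row positions; decreasing = TRUE puts the largest Median first.
toy_sorted <- toy_employ[order(toy_employ$Median, decreasing = TRUE), ]
print(toy_sorted)
```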
# We'll use the levels() function to list the available majors.
levels(factor(CollegeMajors$Major))
## [1] "ACCOUNTING"
## [2] "ACTUARIAL SCIENCE"
## [3] "ADVERTISING AND PUBLIC RELATIONS"
## [4] "AEROSPACE ENGINEERING"
## [5] "AGRICULTURAL ECONOMICS"
## [6] "AGRICULTURE PRODUCTION AND MANAGEMENT"
## [7] "ANIMAL SCIENCES"
## [8] "ANTHROPOLOGY AND ARCHEOLOGY"
## [9] "APPLIED MATHEMATICS"
## [10] "ARCHITECTURAL ENGINEERING"
## [11] "ARCHITECTURE"
## [12] "AREA ETHNIC AND CIVILIZATION STUDIES"
## [13] "ART AND MUSIC EDUCATION"
## [14] "ART HISTORY AND CRITICISM"
## [15] "ASTRONOMY AND ASTROPHYSICS"
## [16] "ATMOSPHERIC SCIENCES AND METEOROLOGY"
## [17] "BIOCHEMICAL SCIENCES"
## [18] "BIOLOGICAL ENGINEERING"
## [19] "BIOLOGY"
## [20] "BIOMEDICAL ENGINEERING"
## [21] "BOTANY"
## [22] "BUSINESS ECONOMICS"
## [23] "BUSINESS MANAGEMENT AND ADMINISTRATION"
## [24] "CHEMICAL ENGINEERING"
## [25] "CHEMISTRY"
## [26] "CIVIL ENGINEERING"
## [27] "CLINICAL PSYCHOLOGY"
## [28] "COGNITIVE SCIENCE AND BIOPSYCHOLOGY"
## [29] "COMMERCIAL ART AND GRAPHIC DESIGN"
## [30] "COMMUNICATION DISORDERS SCIENCES AND SERVICES"
## [31] "COMMUNICATION TECHNOLOGIES"
## [32] "COMMUNICATIONS"
## [33] "COMMUNITY AND PUBLIC HEALTH"
## [34] "COMPOSITION AND RHETORIC"
## [35] "COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY"
## [36] "COMPUTER AND INFORMATION SYSTEMS"
## [37] "COMPUTER ENGINEERING"
## [38] "COMPUTER NETWORKING AND TELECOMMUNICATIONS"
## [39] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [40] "COMPUTER SCIENCE"
## [41] "CONSTRUCTION SERVICES"
## [42] "COSMETOLOGY SERVICES AND CULINARY ARTS"
## [43] "COUNSELING PSYCHOLOGY"
## [44] "COURT REPORTING"
## [45] "CRIMINAL JUSTICE AND FIRE PROTECTION"
## [46] "CRIMINOLOGY"
## [47] "DRAMA AND THEATER ARTS"
## [48] "EARLY CHILDHOOD EDUCATION"
## [49] "ECOLOGY"
## [50] "ECONOMICS"
## [51] "EDUCATIONAL ADMINISTRATION AND SUPERVISION"
## [52] "EDUCATIONAL PSYCHOLOGY"
## [53] "ELECTRICAL ENGINEERING"
## [54] "ELECTRICAL ENGINEERING TECHNOLOGY"
## [55] "ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION"
## [56] "ELEMENTARY EDUCATION"
## [57] "ENGINEERING AND INDUSTRIAL MANAGEMENT"
## [58] "ENGINEERING MECHANICS PHYSICS AND SCIENCE"
## [59] "ENGINEERING TECHNOLOGIES"
## [60] "ENGLISH LANGUAGE AND LITERATURE"
## [61] "ENVIRONMENTAL ENGINEERING"
## [62] "ENVIRONMENTAL SCIENCE"
## [63] "FAMILY AND CONSUMER SCIENCES"
## [64] "FILM VIDEO AND PHOTOGRAPHIC ARTS"
## [65] "FINANCE"
## [66] "FINE ARTS"
## [67] "FOOD SCIENCE"
## [68] "FORESTRY"
## [69] "FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES"
## [70] "GENERAL AGRICULTURE"
## [71] "GENERAL BUSINESS"
## [72] "GENERAL EDUCATION"
## [73] "GENERAL ENGINEERING"
## [74] "GENERAL MEDICAL AND HEALTH SERVICES"
## [75] "GENERAL SOCIAL SCIENCES"
## [76] "GENETICS"
## [77] "GEOGRAPHY"
## [78] "GEOLOGICAL AND GEOPHYSICAL ENGINEERING"
## [79] "GEOLOGY AND EARTH SCIENCE"
## [80] "GEOSCIENCES"
## [81] "HEALTH AND MEDICAL ADMINISTRATIVE SERVICES"
## [82] "HEALTH AND MEDICAL PREPARATORY PROGRAMS"
## [83] "HISTORY"
## [84] "HOSPITALITY MANAGEMENT"
## [85] "HUMAN RESOURCES AND PERSONNEL MANAGEMENT"
## [86] "HUMAN SERVICES AND COMMUNITY ORGANIZATION"
## [87] "HUMANITIES"
## [88] "INDUSTRIAL AND MANUFACTURING ENGINEERING"
## [89] "INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY"
## [90] "INDUSTRIAL PRODUCTION TECHNOLOGIES"
## [91] "INFORMATION SCIENCES"
## [92] "INTERCULTURAL AND INTERNATIONAL STUDIES"
## [93] "INTERDISCIPLINARY SOCIAL SCIENCES"
## [94] "INTERNATIONAL BUSINESS"
## [95] "INTERNATIONAL RELATIONS"
## [96] "JOURNALISM"
## [97] "LANGUAGE AND DRAMA EDUCATION"
## [98] "LIBERAL ARTS"
## [99] "LIBRARY SCIENCE"
## [100] "LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE"
## [101] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [102] "MARKETING AND MARKETING RESEARCH"
## [103] "MASS MEDIA"
## [104] "MATERIALS ENGINEERING AND MATERIALS SCIENCE"
## [105] "MATERIALS SCIENCE"
## [106] "MATHEMATICS"
## [107] "MATHEMATICS AND COMPUTER SCIENCE"
## [108] "MATHEMATICS TEACHER EDUCATION"
## [109] "MECHANICAL ENGINEERING"
## [110] "MECHANICAL ENGINEERING RELATED TECHNOLOGIES"
## [111] "MEDICAL ASSISTING SERVICES"
## [112] "MEDICAL TECHNOLOGIES TECHNICIANS"
## [113] "METALLURGICAL ENGINEERING"
## [114] "MICROBIOLOGY"
## [115] "MILITARY TECHNOLOGIES"
## [116] "MINING AND MINERAL ENGINEERING"
## [117] "MISCELLANEOUS AGRICULTURE"
## [118] "MISCELLANEOUS BIOLOGY"
## [119] "MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION"
## [120] "MISCELLANEOUS EDUCATION"
## [121] "MISCELLANEOUS ENGINEERING"
## [122] "MISCELLANEOUS ENGINEERING TECHNOLOGIES"
## [123] "MISCELLANEOUS FINE ARTS"
## [124] "MISCELLANEOUS HEALTH MEDICAL PROFESSIONS"
## [125] "MISCELLANEOUS PSYCHOLOGY"
## [126] "MISCELLANEOUS SOCIAL SCIENCES"
## [127] "MOLECULAR BIOLOGY"
## [128] "MULTI-DISCIPLINARY OR GENERAL SCIENCE"
## [129] "MULTI/INTERDISCIPLINARY STUDIES"
## [130] "MUSIC"
## [131] "NATURAL RESOURCES MANAGEMENT"
## [132] "NAVAL ARCHITECTURE AND MARINE ENGINEERING"
## [133] "NEUROSCIENCE"
## [134] "NUCLEAR ENGINEERING"
## [135] "NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES"
## [136] "NURSING"
## [137] "NUTRITION SCIENCES"
## [138] "OCEANOGRAPHY"
## [139] "OPERATIONS LOGISTICS AND E-COMMERCE"
## [140] "OTHER FOREIGN LANGUAGES"
## [141] "PETROLEUM ENGINEERING"
## [142] "PHARMACOLOGY"
## [143] "PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION"
## [144] "PHILOSOPHY AND RELIGIOUS STUDIES"
## [145] "PHYSICAL AND HEALTH EDUCATION TEACHING"
## [146] "PHYSICAL FITNESS PARKS RECREATION AND LEISURE"
## [147] "PHYSICAL SCIENCES"
## [148] "PHYSICS"
## [149] "PHYSIOLOGY"
## [150] "PLANT SCIENCE AND AGRONOMY"
## [151] "POLITICAL SCIENCE AND GOVERNMENT"
## [152] "PRE-LAW AND LEGAL STUDIES"
## [153] "PSYCHOLOGY"
## [154] "PUBLIC ADMINISTRATION"
## [155] "PUBLIC POLICY"
## [156] "SCHOOL STUDENT COUNSELING"
## [157] "SCIENCE AND COMPUTER TEACHER EDUCATION"
## [158] "SECONDARY TEACHER EDUCATION"
## [159] "SOCIAL PSYCHOLOGY"
## [160] "SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION"
## [161] "SOCIAL WORK"
## [162] "SOCIOLOGY"
## [163] "SOIL SCIENCE"
## [164] "SPECIAL NEEDS EDUCATION"
## [165] "STATISTICS AND DECISION SCIENCE"
## [166] "STUDIO ARTS"
## [167] "TEACHER EDUCATION: MULTIPLE LEVELS"
## [168] "THEOLOGY AND RELIGIOUS VOCATIONS"
## [169] "TRANSPORTATION SCIENCES AND TECHNOLOGIES"
## [170] "TREATMENT THERAPY PROFESSIONS"
## [171] "UNITED STATES HISTORY"
## [172] "VISUAL AND PERFORMING ARTS"
## [173] "ZOOLOGY"
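The listing above relies on `Major` being treated as a factor. Since R 4.0.0, `read.csv()` keeps text columns as character vectors by default, and `levels()` returns `NULL` for those. The sketch below, on a made-up vector with hypothetical variable names, shows the difference and two ways to get the sorted distinct values.

```r
# A character vector with a repeated value (made-up sample).
majors_chr <- c("BIOLOGY", "ACCOUNTING", "BIOLOGY")

chr_levels <- levels(majors_chr)          # NULL: character vectors have no levels
fct_levels <- levels(factor(majors_chr))  # sorted distinct values
uniq_vals  <- sort(unique(majors_chr))    # same result without creating a factor

print(fct_levels)
```

Passing `stringsAsFactors = TRUE` to `read.csv()` is another option if factor columns are wanted throughout.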
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
theme_set(theme_light())
CollegeMajors %>% # pipe operation
ggplot(aes(Median)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
As we can see, most majors have median earnings of a little over 30K.
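The `stat_bin()` message above suggests choosing a bin width explicitly. In ggplot2 this is done with `geom_histogram(binwidth = ...)`. To show what a given width does to the counts, here is a sketch in base R, so it runs without extra packages; the earnings values and the width of 10000 are assumptions chosen for illustration.

```r
# Made-up median earnings for eight hypothetical majors.
medians  <- c(22000, 28000, 31000, 33000, 36000, 39000, 52000, 75000)
binwidth <- 10000

# Bin edges every 10000 from 20000 to 80000.
breaks <- seq(20000, 80000, by = binwidth)

# plot = FALSE returns the binning result instead of drawing it.
h <- hist(medians, breaks = breaks, plot = FALSE)
print(h$counts)
```

A wider bin smooths the shape, and a narrower one shows more detail but more noise, so it is worth trying a few values on the real `Median` column.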
CollegeMajors %>%
count(Major_category, wt = Total, sort = TRUE) %>%
ggplot(aes(Major_category, n)) + geom_col() + coord_flip() +
labs(x = "",
y = "Total # of Graduates")
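In the pipeline above, `count(Major_category, wt = Total, sort = TRUE)` sums `Total` within each category and orders the result from largest to smallest. A rough base R equivalent, sketched on made-up data with hypothetical names (`toy`, `totals`), looks like this:

```r
# Four hypothetical majors in three categories (values are made up).
toy <- data.frame(
  Major_category = c("Engineering", "Business", "Engineering", "Arts"),
  Total          = c(100, 500, 250, 80),
  stringsAsFactors = FALSE
)

# Sum Total per category; na.rm = TRUE skips missing totals, if any.
totals <- tapply(toy$Total, toy$Major_category, sum, na.rm = TRUE)

# Largest category first, matching sort = TRUE.
totals <- sort(totals, decreasing = TRUE)
print(totals)
```

This is only meant to make the aggregation step explicit; the dplyr version stays more convenient for piping straight into `ggplot()`.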
R supports a wide variety of statistical and graphical techniques. Using R, I was able to explore a dataset about college majors, and after reading a little about the tidyverse library, I was able to produce a couple of figures that I found quite interesting.