This article delves into the economic realities that college students face when choosing a major, highlighting how a degree alone no longer guarantees financial success. It examines detailed data on earnings across different fields of study, revealing significant disparities in income potential. By analyzing trends and offering insights, the article underscores the importance of making informed choices when selecting a major, as it can dramatically influence graduates’ long-term financial outcomes and career trajectories.
The link to the article: https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
file_path<-"https://raw.githubusercontent.com/Natacode819/Character-Manipulation-and-Date-Processing/main/all-ages.csv"
majors<-read.csv(file_path)
head(majors, 10)
## Major_code Major
## 1 1100 GENERAL AGRICULTURE
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3 1102 AGRICULTURAL ECONOMICS
## 4 1103 ANIMAL SCIENCES
## 5 1104 FOOD SCIENCE
## 6 1105 PLANT SCIENCE AND AGRONOMY
## 7 1106 SOIL SCIENCE
## 8 1199 MISCELLANEOUS AGRICULTURE
## 9 1301 ENVIRONMENTAL SCIENCE
## 10 1302 FORESTRY
## Major_category Total Employed
## 1 Agriculture & Natural Resources 128148 90245
## 2 Agriculture & Natural Resources 95326 76865
## 3 Agriculture & Natural Resources 33955 26321
## 4 Agriculture & Natural Resources 103549 81177
## 5 Agriculture & Natural Resources 24280 17281
## 6 Agriculture & Natural Resources 79409 63043
## 7 Agriculture & Natural Resources 6586 4926
## 8 Agriculture & Natural Resources 8549 6392
## 9 Biology & Life Science 106106 87602
## 10 Agriculture & Natural Resources 69447 48228
## Employed_full_time_year_round Unemployed Unemployment_rate Median P25th
## 1 74078 2423 0.02614711 50000 34000
## 2 64240 2266 0.02863606 54000 36000
## 3 22810 821 0.03024832 63000 40000
## 4 64937 3619 0.04267890 46000 30000
## 5 12722 894 0.04918845 62000 38500
## 6 51077 2070 0.03179089 50000 35000
## 7 4042 264 0.05086705 63000 39400
## 8 5074 261 0.03923042 52000 35000
## 9 65238 4736 0.05128983 52000 38000
## 10 39613 2144 0.04256333 58000 40500
## P75th
## 1 80000
## 2 80000
## 3 98000
## 4 72000
## 5 90000
## 6 75000
## 7 88000
## 8 75000
## 9 75000
## 10 80000
colnames(majors)
## [1] "Major_code" "Major"
## [3] "Major_category" "Total"
## [5] "Employed" "Employed_full_time_year_round"
## [7] "Unemployed" "Unemployment_rate"
## [9] "Median" "P25th"
## [11] "P75th"
subset_majors <- majors %>% filter(grepl("DATA|STATISTICS", Major, ignore.case = TRUE)) # Case-insensitive match on the major name
subset_majors
## Major_code Major
## 1 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
## 2 3702 STATISTICS AND DECISION SCIENCE
## 3 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## Major_category Total Employed Employed_full_time_year_round
## 1 Computers & Mathematics 29317 22828 18747
## 2 Computers & Mathematics 24806 18808 14468
## 3 Business 156673 134478 118249
## Unemployed Unemployment_rate Median P25th P75th
## 1 2265 0.09026422 60000 40000 85000
## 2 1138 0.05705405 70000 43000 102000
## 3 6186 0.04397714 72000 50000 100000
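As an aside, the same filter can be sketched with stringr's str_detect() and regex(ignore_case = TRUE) in place of grepl(). The data frame toy_majors below is a hypothetical stand-in that reuses a few major names from the output above; it is not the full dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for `majors`, reusing a few names from the output above
toy_majors <- data.frame(
  Major = c("GENERAL AGRICULTURE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "FORESTRY"),
  stringsAsFactors = FALSE
)

# Keep rows whose Major contains "data" or "statistics", ignoring case
toy_subset <- toy_majors %>%
  filter(str_detect(Major, regex("data|statistics", ignore_case = TRUE)))

toy_subset$Major
```

Only the two rows containing DATA or STATISTICS remain, which mirrors what the grepl() version does on the real data.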
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
input_data <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
input_data
## [1] "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n[13] \"olive\" \"salal berry\""
First, I remove the bracketed indices such as [1] and [5]
cleaned_data <- gsub("\\[\\d+\\]", "", input_data) # Remove indices
cleaned_data
## [1] " \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n \"olive\" \"salal berry\""
Second, I remove newlines
cleaned_data <- gsub("\n", "", cleaned_data)
cleaned_data
## [1] " \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\" \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \"olive\" \"salal berry\""
Third, I replace multiple spaces with a single space
cleaned_data <- gsub(" +", " ", cleaned_data)
cleaned_data
## [1] " \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\" \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \"olive\" \"salal berry\""
Fourth, I add commas between items
cleaned_data <- gsub('" "', '", "', cleaned_data)
cleaned_data
## [1] " \"bell pepper\", \"bilberry\", \"blackberry\", \"blood orange\", \"blueberry\", \"cantaloupe\", \"chili pepper\", \"cloudberry\", \"elderberry\", \"lime\", \"lychee\", \"mulberry\", \"olive\", \"salal berry\""
Next, I format the final output
formatted_output <- paste0("c(",cleaned_data,")")
formatted_output
## [1] "c( \"bell pepper\", \"bilberry\", \"blackberry\", \"blood orange\", \"blueberry\", \"cantaloupe\", \"chili pepper\", \"cloudberry\", \"elderberry\", \"lime\", \"lychee\", \"mulberry\", \"olive\", \"salal berry\")"
Last, I remove the stray space left after the opening parenthesis
formatted_output <- gsub('\\(\\s+', '(', formatted_output) # Drop whitespace right after "("
formatted_output
## [1] "c(\"bell pepper\", \"bilberry\", \"blackberry\", \"blood orange\", \"blueberry\", \"cantaloupe\", \"chili pepper\", \"cloudberry\", \"elderberry\", \"lime\", \"lychee\", \"mulberry\", \"olive\", \"salal berry\")"
To show the result in the desired format, I print it with the cat() function
cat(formatted_output)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
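As a cross-check on the step-by-step cleaning above, the quoted items can also be pulled out in one pass with str_extract_all() and then joined. This is only a sketch: raw_text is a shortened, hypothetical copy of the printed vector, and the variable names are illustrative.

```r
library(stringr)

# Shortened, hypothetical copy of the printed vector used above
raw_text <- '[1] "bell pepper" "bilberry"\n[3] "lime" "salal berry"'

# Extract every double-quoted item, keeping the quotes
items <- str_extract_all(raw_text, '"[^"]*"')[[1]]

# Join the items with commas and wrap them in c( )
one_pass <- paste0("c(", paste(items, collapse = ", "), ")")
cat(one_pass)
# c("bell pepper", "bilberry", "lime", "salal berry")
```

Because only the quoted items are extracted, the indices, newlines, and extra spaces never need to be removed separately.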
The expressions described below, written as R string literals:
(.)\\1\\1
(.)(.)\\2\\1
(..)\\1
(.)\\.\\1.\\1
(.)(.)(.).*\\3\\2\\1
library(stringr)
Data to match expressions:
words<-c("aaaaa", "dddy" , "reremind", "xyxy", "agga", "129921", "a.a.a", "1.1.1", "a.b.a", "7755745", "abcxyzcbazyx", "123456321", "fggggddsss")
I use str_subset() to keep the words that the expression (.)\\1\\1 matches.
(.) matches any single character and stores it in a capturing group.
\\1 refers to the first capturing group, so it matches the same character captured in the first group.
The second \\1 again refers to the first capturing group, matching the same character once more.
Overall, (.)\\1\\1 will match any string where the same character appears three times in a row.
match1<-words%>%str_subset("(.)\\1\\1")
match1
## [1] "aaaaa" "dddy" "fggggddsss"
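To see which part of each word triggers the match, str_extract() can be used with the same pattern. This is a small sketch on a hypothetical subset of the words above.

```r
library(stringr)

# Hypothetical subset of the test words
triple_words <- c("aaaaa", "dddy", "agga", "fggggddsss")

# Return the first run of three identical characters, or NA when there is none
triple_hits <- str_extract(triple_words, "(.)\\1\\1")
triple_hits
# "aaa" "ddd" NA "ggg"
```

The NA for "agga" shows that two identical characters in a row are not enough.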
The next example is the (.)(.)\\2\\1 expression.
(.)(.) captures two consecutive characters. Each character is captured by a separate group:
The first (.) captures any single character and stores it in the first capturing group (\\1).
The second (.) captures any single character and stores it in the second capturing group (\\2).
\\2 matches the same character as captured by the second capturing group, so this part matches the second character again.
\\1 matches the same character as captured by the first capturing group, so this part matches the first character again.
Overall, (.)(.)\\2\\1 matches a sequence where two characters are followed by those same characters in reverse order, as in “agga” or the “2992” inside “129921”.
match2<-words%>%str_subset("(.)(.)\\2\\1")
match2
## [1] "aaaaa" "agga" "129921" "7755745" "fggggddsss"
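The capture groups can be inspected directly with str_match(), which returns the full match in the first column and one column per group. The sketch below uses a hypothetical subset of the words above.

```r
library(stringr)

# Hypothetical subset of the test words
mirror_words <- c("agga", "129921", "7755745", "xyxy")

# Column 1: full match; column 2: group 1; column 3: group 2
mirror_groups <- str_match(mirror_words, "(.)(.)\\2\\1")
mirror_groups[, 1]
# "agga" "2992" "7557" NA
```

For "7755745" the match is "7557", so group 1 is "7" and group 2 is "5"; "xyxy" repeats its pair without reversing it and therefore does not match.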
The third example is the (..)\\1 expression. This expression matches a sequence where a two-character substring is immediately followed by a repetition of the same two-character substring. The details are as follows:
(..) captures any two characters into a capturing group. This means the first part of the pattern is any two-character sequence.
\\1 refers to the content of the first capturing group, meaning it matches the same two-character sequence captured by the first group.
Overall, (..)\\1 will match any string where a two-character sequence is immediately followed by the same two-character sequence.
In the code below, the pattern is additionally wrapped in \\b word boundaries, so the repeated pair must start and end at a word boundary. This excludes words such as “reremind”, “aaaaa”, and “fggggddsss”, which the bare (..)\\1 would also match.
match3<-words%>%str_subset("\\b(..)\\1\\b")
match3
## [1] "xyxy" "a.a.a" "1.1.1"
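The effect of the \\b anchors can be checked by running the pattern with and without them. This is a sketch on a hypothetical subset of the words above.

```r
library(stringr)

# Hypothetical subset of the test words
pair_words <- c("reremind", "xyxy", "a.a.a", "aaaaa")

# Bare pattern: a repeated pair anywhere in the string
bare_pairs <- str_subset(pair_words, "(..)\\1")
# Bounded pattern: the repeated pair must sit between word boundaries
bounded_pairs <- str_subset(pair_words, "\\b(..)\\1\\b")

bare_pairs     # "reremind" "xyxy" "a.a.a" "aaaaa"
bounded_pairs  # "xyxy" "a.a.a"
```

"reremind" and "aaaaa" drop out of the bounded version because the character after the repeated pair is another word character, so there is no boundary at that position.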
The next example is the (.)\\.\\1.\\1 expression. This expression matches a pattern where a single character is repeated at specific positions in a string: first separated by a literal dot, then by any single character. Here’s a breakdown:
(.) captures any single character into the first capturing group.
\\. matches a literal dot.
\\1 refers to the same character captured by the first capturing group.
. is unescaped here, so it matches any single character, not only a dot.
\\1 refers to the same character again, matching the same character as the one captured in the first group.
Overall, (.)\\.\\1.\\1 will match a string where a single character is followed by a literal dot, then the same character, then any one character, and then the same character again.
match4<-words%>%str_subset("(.)\\.\\1.\\1")
match4
## [1] "a.a.a" "1.1.1"
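The difference between an escaped and an unescaped dot is easy to miss, so here is a short sketch comparing the two. The word "a.aXa" is a hypothetical addition chosen to expose the difference.

```r
library(stringr)

# "a.aXa" is a hypothetical test word, not part of the original data
dot_words <- c("a.a.a", "a.aXa", "a.b.a")

# Second separator unescaped: any single character is accepted there
loose_dots <- str_subset(dot_words, "(.)\\.\\1.\\1")
# Both separators escaped: only literal dots are accepted
strict_dots <- str_subset(dot_words, "(.)\\.\\1\\.\\1")

loose_dots   # "a.a.a" "a.aXa"
strict_dots  # "a.a.a"
```

If only literal dots should separate the repeated character, the second dot needs to be escaped as well.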
The final example is the (.)(.)(.).*\\3\\2\\1 expression. This expression matches a string where three consecutive characters are repeated in reverse order after any number of other characters. Here’s a breakdown:
(.)(.)(.) captures three consecutive characters into three separate capturing groups:
The first (.) captures the first character.
The second (.) captures the second character.
The third (.) captures the third character.
.* matches any number of any characters. This allows for any content to appear between the initial three characters and their reversed repetition.
\\3 refers to the third capturing group.
\\2 refers to the second capturing group.
\\1 refers to the first capturing group.
Overall, (.)(.)(.).*\\3\\2\\1 will match any string where three consecutive characters appear first; after these characters, there can be any sequence of characters; and the same three characters appear again in reverse order.
match5<-words%>%str_subset("(.)(.)(.).*\\3\\2\\1")
match5
## [1] "129921" "abcxyzcbazyx" "123456321"
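Because the pattern is not anchored, the three captured characters do not have to be the first three of the string. The sketch below checks this with str_match(); "xxabc--cba" and "abcabc" are hypothetical test words.

```r
library(stringr)

# The last two entries are hypothetical test words
reverse_words <- c("abcxyzcbazyx", "xxabc--cba", "abcabc")

# Column 1: full match; columns 2 to 4: the three captured characters
reverse_groups <- str_match(reverse_words, "(.)(.)(.).*\\3\\2\\1")
reverse_groups[, 1]
# "abcxyzcba" "abc--cba" NA
```

In "xxabc--cba" the match starts at the third character, and "abcabc" fails because the triple is repeated in the same order rather than reversed.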
Next, I construct regular expressions to match words that:
Start and end with the same character.
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice).
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s).
For words that start and end with the same character, I use ^(.)[\\s\\S]*\\1$:
^ asserts the start of the string.
(.) captures the first character.
[\\s\\S]* matches any characters (zero or more) in between, including newlines.
\\1 ensures that the last character matches the first captured character.
$ asserts the end of the string.
words_match1<-c("throughout", "level", "window", "see", "need", "refer")
words_match1
## [1] "throughout" "level" "window" "see" "need"
## [6] "refer"
match6<-words_match1%>%str_subset("^(.)[\\s\\S]*\\1$")
match6
## [1] "throughout" "level" "window" "refer"
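One edge case worth noting: the pattern needs at least two characters, so a one-letter word never matches. The sketch below compares it with a relaxed variant that makes the tail optional; the relaxed pattern and the test words are assumptions added for illustration, not part of the original solution.

```r
library(stringr)

# Hypothetical test words, including a single-character word
edge_words <- c("level", "see", "a", "ab")

strict_ends  <- "^(.)[\\s\\S]*\\1$"     # needs at least two characters
relaxed_ends <- "^(.)([\\s\\S]*\\1)?$"  # assumed variant: tail is optional

strict_result  <- str_subset(edge_words, strict_ends)
relaxed_result <- str_subset(edge_words, relaxed_ends)

strict_result   # "level"
relaxed_result  # "level" "a"
```

Whether a single character should count as starting and ending with the same character depends on how the task is read; the relaxed form is only one possible interpretation.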
For words that contain a repeated pair, I use (..).*?\\1:
(..): Captures any two consecutive characters (a pair).
.*?: Lazily matches any number of characters in between (if any).
\\1: Ensures the pair captured is repeated somewhere later in the word.
words_match2 <- c("church", "datada", "banana", "abcd", "chch", "different")
words_match2
## [1] "church" "datada" "banana" "abcd" "chch" "different"
match7<-words_match2%>%str_subset("(..).*?\\1")
match7
## [1] "church" "datada" "banana" "chch"
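The lazy quantifier changes which substring is matched, but not which words are selected. The sketch below illustrates this on the hypothetical string "chchch" and a small hypothetical word list.

```r
library(stringr)

# Lazy: stop at the nearest repetition of the pair
lazy_hit <- str_extract("chchch", "(..).*?\\1")
# Greedy: stretch to the farthest repetition of the pair
greedy_hit <- str_extract("chchch", "(..).*\\1")

lazy_hit    # "chch"
greedy_hit  # "chchch"

# For filtering, both forms select the same words
pair_check <- c("church", "abcd", "chch")
same_selection <- identical(str_subset(pair_check, "(..).*?\\1"),
                            str_subset(pair_check, "(..).*\\1"))
same_selection  # TRUE
```

So for str_subset() either quantifier works; the lazy form only matters when the matched text itself is extracted.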
For words that contain one letter in at least three places, I use (.)\\w*\\1\\w*\\1:
(.): Captures any single character into group 1.
\\w*: Matches any number of word characters (letters, digits, and underscores) in between.
\\1: Each of the two back-references requires the captured character again, so it must appear at least three times.
words_match3<- c("eleven", "dataset", "success", "mathematical", "complete")
words_match3
## [1] "eleven" "dataset" "success" "mathematical" "complete"
match8<-words_match3%>%str_subset("(.)\\w*\\1\\w*\\1") # \\w* rather than \\w+ so adjacent repeats also count
match8
## [1] "eleven" "success" "mathematical"
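The choice between \\w+ and \\w* after the first occurrence matters when the first two occurrences are adjacent. The sketch below compares the two on a hypothetical word list; "eerie" is an added test word with three “e”s, the first two next to each other.

```r
library(stringr)

# "eerie" is a hypothetical test word, not part of the original data
triple_check <- c("eleven", "eerie", "dataset")

# Requires at least one character between the first two occurrences
with_gap <- str_subset(triple_check, "(.)\\w+\\1\\w*\\1")
# Allows the occurrences to be adjacent
no_gap <- str_subset(triple_check, "(.)\\w*\\1\\w*\\1")

with_gap  # "eleven"
no_gap    # "eleven" "eerie"
```

On the five words tested above both forms give the same result, so the difference only shows up on words like "eerie".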