1

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

majors = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv')

head(majors)
unique(majors$Major)
##   [1] "PETROLEUM ENGINEERING"                                            
##   [2] "MINING AND MINERAL ENGINEERING"                                   
##   [3] "METALLURGICAL ENGINEERING"                                        
##   [4] "NAVAL ARCHITECTURE AND MARINE ENGINEERING"                        
##   [5] "CHEMICAL ENGINEERING"                                             
##   [6] "NUCLEAR ENGINEERING"                                              
##   [7] "ACTUARIAL SCIENCE"                                                
##   [8] "ASTRONOMY AND ASTROPHYSICS"                                       
##   [9] "MECHANICAL ENGINEERING"                                           
##  [10] "ELECTRICAL ENGINEERING"                                           
##  [11] "COMPUTER ENGINEERING"                                             
##  [12] "AEROSPACE ENGINEERING"                                            
##  [13] "BIOMEDICAL ENGINEERING"                                           
##  [14] "MATERIALS SCIENCE"                                                
##  [15] "ENGINEERING MECHANICS PHYSICS AND SCIENCE"                        
##  [16] "BIOLOGICAL ENGINEERING"                                           
##  [17] "INDUSTRIAL AND MANUFACTURING ENGINEERING"                         
##  [18] "GENERAL ENGINEERING"                                              
##  [19] "ARCHITECTURAL ENGINEERING"                                        
##  [20] "COURT REPORTING"                                                  
##  [21] "COMPUTER SCIENCE"                                                 
##  [22] "FOOD SCIENCE"                                                     
##  [23] "ELECTRICAL ENGINEERING TECHNOLOGY"                                
##  [24] "MATERIALS ENGINEERING AND MATERIALS SCIENCE"                      
##  [25] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"                    
##  [26] "CIVIL ENGINEERING"                                                
##  [27] "CONSTRUCTION SERVICES"                                            
##  [28] "OPERATIONS LOGISTICS AND E-COMMERCE"                              
##  [29] "MISCELLANEOUS ENGINEERING"                                        
##  [30] "PUBLIC POLICY"                                                    
##  [31] "ENVIRONMENTAL ENGINEERING"                                        
##  [32] "ENGINEERING TECHNOLOGIES"                                         
##  [33] "MISCELLANEOUS FINE ARTS"                                          
##  [34] "GEOLOGICAL AND GEOPHYSICAL ENGINEERING"                           
##  [35] "NURSING"                                                          
##  [36] "FINANCE"                                                          
##  [37] "ECONOMICS"                                                        
##  [38] "BUSINESS ECONOMICS"                                               
##  [39] "INDUSTRIAL PRODUCTION TECHNOLOGIES"                               
##  [40] "NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES"       
##  [41] "ACCOUNTING"                                                       
##  [42] "MATHEMATICS"                                                      
##  [43] "COMPUTER AND INFORMATION SYSTEMS"                                 
##  [44] "PHYSICS"                                                          
##  [45] "MEDICAL TECHNOLOGIES TECHNICIANS"                                 
##  [46] "INFORMATION SCIENCES"                                             
##  [47] "STATISTICS AND DECISION SCIENCE"                                  
##  [48] "APPLIED MATHEMATICS"                                              
##  [49] "PHARMACOLOGY"                                                     
##  [50] "OCEANOGRAPHY"                                                     
##  [51] "ENGINEERING AND INDUSTRIAL MANAGEMENT"                            
##  [52] "MEDICAL ASSISTING SERVICES"                                       
##  [53] "MATHEMATICS AND COMPUTER SCIENCE"                                 
##  [54] "COMPUTER PROGRAMMING AND DATA PROCESSING"                         
##  [55] "COGNITIVE SCIENCE AND BIOPSYCHOLOGY"                              
##  [56] "SCHOOL STUDENT COUNSELING"                                        
##  [57] "INTERNATIONAL RELATIONS"                                          
##  [58] "GENERAL BUSINESS"                                                 
##  [59] "ARCHITECTURE"                                                     
##  [60] "INTERNATIONAL BUSINESS"                                           
##  [61] "PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION"              
##  [62] "MOLECULAR BIOLOGY"                                                
##  [63] "MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION"                  
##  [64] "AGRICULTURE PRODUCTION AND MANAGEMENT"                            
##  [65] "GENERAL AGRICULTURE"                                              
##  [66] "MISCELLANEOUS ENGINEERING TECHNOLOGIES"                           
##  [67] "MECHANICAL ENGINEERING RELATED TECHNOLOGIES"                      
##  [68] "GENETICS"                                                         
##  [69] "MISCELLANEOUS SOCIAL SCIENCES"                                    
##  [70] "UNITED STATES HISTORY"                                            
##  [71] "INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY"                         
##  [72] "AGRICULTURAL ECONOMICS"                                           
##  [73] "PHYSICAL SCIENCES"                                                
##  [74] "MILITARY TECHNOLOGIES"                                            
##  [75] "CHEMISTRY"                                                        
##  [76] "ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION"
##  [77] "BUSINESS MANAGEMENT AND ADMINISTRATION"                           
##  [78] "MARKETING AND MARKETING RESEARCH"                                 
##  [79] "POLITICAL SCIENCE AND GOVERNMENT"                                 
##  [80] "GEOGRAPHY"                                                        
##  [81] "MICROBIOLOGY"                                                     
##  [82] "COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY"                  
##  [83] "BIOCHEMICAL SCIENCES"                                             
##  [84] "BOTANY"                                                           
##  [85] "COMPUTER NETWORKING AND TELECOMMUNICATIONS"                       
##  [86] "GEOLOGY AND EARTH SCIENCE"                                        
##  [87] "HUMAN RESOURCES AND PERSONNEL MANAGEMENT"                         
##  [88] "PRE-LAW AND LEGAL STUDIES"                                        
##  [89] "MISCELLANEOUS HEALTH MEDICAL PROFESSIONS"                         
##  [90] "PUBLIC ADMINISTRATION"                                            
##  [91] "GEOSCIENCES"                                                      
##  [92] "SOCIAL PSYCHOLOGY"                                                
##  [93] "ENVIRONMENTAL SCIENCE"                                            
##  [94] "COMMUNICATIONS"                                                   
##  [95] "CRIMINAL JUSTICE AND FIRE PROTECTION"                             
##  [96] "COMMERCIAL ART AND GRAPHIC DESIGN"                                
##  [97] "JOURNALISM"                                                       
##  [98] "MULTI-DISCIPLINARY OR GENERAL SCIENCE"                            
##  [99] "ADVERTISING AND PUBLIC RELATIONS"                                 
## [100] "AREA ETHNIC AND CIVILIZATION STUDIES"                             
## [101] "SPECIAL NEEDS EDUCATION"                                          
## [102] "PHYSIOLOGY"                                                       
## [103] "CRIMINOLOGY"                                                      
## [104] "NUTRITION SCIENCES"                                               
## [105] "HEALTH AND MEDICAL ADMINISTRATIVE SERVICES"                       
## [106] "COMMUNICATION TECHNOLOGIES"                                       
## [107] "TRANSPORTATION SCIENCES AND TECHNOLOGIES"                         
## [108] "NATURAL RESOURCES MANAGEMENT"                                     
## [109] "NEUROSCIENCE"                                                     
## [110] "MULTI/INTERDISCIPLINARY STUDIES"                                  
## [111] "ATMOSPHERIC SCIENCES AND METEOROLOGY"                             
## [112] "FORESTRY"                                                         
## [113] "SOIL SCIENCE"                                                     
## [114] "GENERAL EDUCATION"                                                
## [115] "HISTORY"                                                          
## [116] "FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES"    
## [117] "INTERCULTURAL AND INTERNATIONAL STUDIES"                          
## [118] "SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION"                      
## [119] "COMMUNITY AND PUBLIC HEALTH"                                      
## [120] "MATHEMATICS TEACHER EDUCATION"                                    
## [121] "EDUCATIONAL ADMINISTRATION AND SUPERVISION"                       
## [122] "HEALTH AND MEDICAL PREPARATORY PROGRAMS"                          
## [123] "MISCELLANEOUS BIOLOGY"                                            
## [124] "BIOLOGY"                                                          
## [125] "SOCIOLOGY"                                                        
## [126] "MASS MEDIA"                                                       
## [127] "TREATMENT THERAPY PROFESSIONS"                                    
## [128] "HOSPITALITY MANAGEMENT"                                           
## [129] "LANGUAGE AND DRAMA EDUCATION"                                     
## [130] "LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE"              
## [131] "MISCELLANEOUS EDUCATION"                                          
## [132] "INTERDISCIPLINARY SOCIAL SCIENCES"                                
## [133] "ECOLOGY"                                                          
## [134] "SECONDARY TEACHER EDUCATION"                                      
## [135] "GENERAL MEDICAL AND HEALTH SERVICES"                              
## [136] "PHILOSOPHY AND RELIGIOUS STUDIES"                                 
## [137] "ART AND MUSIC EDUCATION"                                          
## [138] "ENGLISH LANGUAGE AND LITERATURE"                                  
## [139] "ELEMENTARY EDUCATION"                                             
## [140] "PHYSICAL FITNESS PARKS RECREATION AND LEISURE"                    
## [141] "LIBERAL ARTS"                                                     
## [142] "FILM VIDEO AND PHOTOGRAPHIC ARTS"                                 
## [143] "GENERAL SOCIAL SCIENCES"                                          
## [144] "PLANT SCIENCE AND AGRONOMY"                                       
## [145] "SCIENCE AND COMPUTER TEACHER EDUCATION"                           
## [146] "PSYCHOLOGY"                                                       
## [147] "MUSIC"                                                            
## [148] "PHYSICAL AND HEALTH EDUCATION TEACHING"                           
## [149] "ART HISTORY AND CRITICISM"                                        
## [150] "FINE ARTS"                                                        
## [151] "FAMILY AND CONSUMER SCIENCES"                                     
## [152] "SOCIAL WORK"                                                      
## [153] "ANIMAL SCIENCES"                                                  
## [154] "VISUAL AND PERFORMING ARTS"                                       
## [155] "TEACHER EDUCATION: MULTIPLE LEVELS"                               
## [156] "MISCELLANEOUS PSYCHOLOGY"                                         
## [157] "HUMAN SERVICES AND COMMUNITY ORGANIZATION"                        
## [158] "HUMANITIES"                                                       
## [159] "THEOLOGY AND RELIGIOUS VOCATIONS"                                 
## [160] "STUDIO ARTS"                                                      
## [161] "COSMETOLOGY SERVICES AND CULINARY ARTS"                           
## [162] "MISCELLANEOUS AGRICULTURE"                                        
## [163] "ANTHROPOLOGY AND ARCHEOLOGY"                                      
## [164] "COMMUNICATION DISORDERS SCIENCES AND SERVICES"                    
## [165] "EARLY CHILDHOOD EDUCATION"                                        
## [166] "OTHER FOREIGN LANGUAGES"                                          
## [167] "DRAMA AND THEATER ARTS"                                           
## [168] "COMPOSITION AND RHETORIC"                                         
## [169] "ZOOLOGY"                                                          
## [170] "EDUCATIONAL PSYCHOLOGY"                                           
## [171] "CLINICAL PSYCHOLOGY"                                              
## [172] "COUNSELING PSYCHOLOGY"                                            
## [173] "LIBRARY SCIENCE"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.3
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
majors %>%
  filter(str_detect(Major, 'DATA|STATISTICS'))

We can see that 3 majors include the word DATA or STATISTICS: MANAGEMENT INFORMATION SYSTEMS AND STATISTICS, STATISTICS AND DECISION SCIENCE, and COMPUTER PROGRAMMING AND DATA PROCESSING.
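To list only the matching major names rather than whole rows, one option is base R's `grep()` with `value = TRUE`. The sketch below is illustrative only: `sample_majors` is a hand-copied stand-in for `majors$Major` (a few names taken from the listing above) so that it runs without downloading the file.

```r
# Hypothetical stand-in for majors$Major: a few names copied from the listing above
sample_majors <- c(
  "PETROLEUM ENGINEERING",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "STATISTICS AND DECISION SCIENCE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "LIBRARY SCIENCE"
)

# value = TRUE returns the matching strings instead of their positions;
# the | in the pattern means "DATA or STATISTICS"
matched_majors <- grep("DATA|STATISTICS", sample_majors, value = TRUE)
print(matched_majors)
```

On the full data frame the same call would be `grep("DATA|STATISTICS", majors$Major, value = TRUE)`.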

2

[1] “bell pepper” “bilberry” “blackberry” “blood orange”

[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”

[9] “elderberry” “lime” “lychee” “mulberry”

[13] “olive” “salal berry”

string = '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"

[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  

[9] "elderberry"   "lime"         "lychee"       "mulberry"    

[13] "olive"        "salal berry"'


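# Pull out each quoted item: letters, any one character (a space or a letter here), then more letters
string = unlist(str_extract_all(string, pattern = "\"([a-z]+.[a-z]+)\""))
# Drop the surrounding quote marks
string = str_remove_all(string, "\"")

# Build the text of a c(...) call by appending each item in quotes
hold = "c("

for (str in string)
{
  hold = paste(hold, "\"", str, "\", ", sep="")
}

# Remove the trailing ", " (substr positions start at 1)
hold = substr(hold, 1, nchar(hold)-2)

hold = paste(hold,")", sep="")

cat(hold)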
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
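The loop above can also be written without a loop. The following is a minimal base-R sketch of the same idea, not the method used above: it works on a shortened copy of the input, and the names `fruit_text`, `items` and `call_text` are illustrative assumptions.

```r
# Shortened copy of the printed vector; fruit_text is an illustrative name
fruit_text <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"

[5] "blueberry"    "cantaloupe"'

# Extract every double-quoted item, quotes included
quoted <- regmatches(fruit_text, gregexpr('"[a-z ]+"', fruit_text))[[1]]

# Strip the quote marks to get a plain character vector
items <- gsub('"', "", quoted, fixed = TRUE)

# Rebuild the c(...) text: quote each item, join with ", ", wrap in c( )
call_text <- paste0("c(", paste0('"', items, '"', collapse = ", "), ")")
cat(call_text)
```

Using `collapse = ", "` places the separator only between items, so there is no trailing comma to trim afterwards.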

3 Describe, in words, what these expressions will match:

1. (.)\1\1
2. "(.)(.)\\2\\1"
3. (..)\1
4. "(.).\\1.\\1"
5. "(.)(.)(.).*\\3\\2\\1"


1. This matches any character followed by the same character twice more, i.e. three identical characters in a row. Examples are "aaa", "bbb"
2. This stores the first and second characters, then matches the second character again followed by the first, so the pair is mirrored. Examples are "abba"
3. This stores two characters and matches the same two characters again immediately after. Examples are "abab"
4. This stores one character, allows any character next, then matches the stored character, any character, and the stored character again. Examples are "abaca"
5. This stores three characters, allows zero or more of any characters after them (the .*), and then matches the three stored characters in reverse order. Examples are "abccba" and "abcefghicba"
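The descriptions can be checked against their example strings. This is a small sketch using base R's `grepl()`; the vector names and the labels on the patterns are made up for illustration. Expressions 1 and 3 are written here with doubled backslashes, since inside an R string a single `\1` is read as an escape character rather than a backreference.

```r
# Each pattern is written as an R string, so every backslash is doubled
patterns <- c(
  three_same   = "(.)\\1\\1",
  mirrored     = "(.)(.)\\2\\1",
  pair_twice   = "(..)\\1",
  every_other  = "(.).\\1.\\1",
  reversed_end = "(.)(.)(.).*\\3\\2\\1"
)

# One example string per pattern, in the same order
examples <- c("aaa", "abba", "abab", "abaca", "abccba")

# perl = TRUE selects the PCRE engine, which supports backreferences
hits <- mapply(function(p, x) grepl(p, x, perl = TRUE), patterns, examples)
print(hits)

# A counterexample: "abcabc" never repeats three characters in reverse order
print(grepl(patterns[["reversed_end"]], "abcabc", perl = TRUE))
```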

4 Construct regular expressions to match words that:

1. Start and end with the same character.
2. Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.)
3. Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)

1. "^(.).*\\1$"
str_view(c("church", "abc", "aba"), "^(.).*\\1$", match = TRUE)
2. "(..).*\\1"
str_view(c("church", "abc", "aba", "abab"), "(..).*\\1", match = TRUE)
3. "(.).*\\1.*\\1"
str_view(c("abc", "eleven", "aaa"), "(.).*\\1.*\\1", match = TRUE)
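One edge case for the first pattern is a one-character word such as "a", which arguably starts and ends with the same character but is not matched by "^(.).*\\1$". The sketch below compares that pattern with a variant that makes the tail optional. The variant is an assumption about the intended behaviour, not part of the original answer, and it uses base R's `grepl()` rather than `str_view()`.

```r
words <- c("a", "aba", "abc", "church")

# Original pattern: needs at least two characters, so "a" is missed
strict <- grepl("^(.).*\\1$", words, perl = TRUE)

# Variant (an assumption about the intended behaviour): making the tail
# optional lets a one-character word count as starting and ending the same
lenient <- grepl("^(.)(.*\\1)?$", words, perl = TRUE)

print(data.frame(words, strict, lenient))
```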