The dataset, PUMS_reduced.csv, comes from the US Census Bureau's 2012-2016 American Community Survey (ACS) Public Use Microdata Sample (PUMS). The sample includes 67,248 New Hampshire residents.
See below for information on the Public Use Microdata Sample (PUMS).
# Load packages
library(tidyverse)
# Import data
PUMS_reduced <- read.csv("~/R/Buss Sat/DATA/PUMS_reduced.csv") %>% as_tibble()
PUMS_reduced
## # A tibble: 67,248 x 7
## X PUMA age education field_of_degree income occupation
## <int> <int> <int> <fct> <fct> <int> <fct>
## 1 1 1000 87 lessthanBA <NA> 11800 <NA>
## 2 2 900 42 lessthanBA <NA> 8800 Cashiers
## 3 3 800 43 BAorhigher English Language 10000 Human Resources ~
## 4 4 800 43 lessthanBA <NA> 112000 Securities, Comm~
## 5 5 800 14 lessthanBA <NA> NA <NA>
## 6 6 800 11 lessthanBA <NA> NA <NA>
## 7 7 900 63 lessthanBA <NA> 23900 Driver/Sales Wor~
## 8 8 900 59 BAorhigher Early Childhood E~ 34600 Elementary And M~
## 9 9 900 65 lessthanBA <NA> 9400 Retail Salespers~
## 10 10 300 50 lessthanBA <NA> 18000 Retail Sales Wor~
## # ... with 67,238 more rows
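The data are a mix of types: X, PUMA, age and income are integers, while education, field_of_degree and occupation are factors.
The R object is a tibble, which is a kind of data frame: each column is a variable and each row is one respondent.
It contains information on each resident's age, education, field of degree, occupation and income.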
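The object type and column types described above can be checked directly rather than read off the printout. The sketch below uses a small made-up tibble named `toy` (a hypothetical stand-in, not the real PUMS_reduced data) with the same kinds of columns.

```r
library(dplyr)

# Hypothetical toy data that mimics the column types of PUMS_reduced
toy <- tibble(
  age       = c(87L, 42L, 43L),
  education = factor(c("lessthanBA", "lessthanBA", "BAorhigher")),
  income    = c(11800L, 8800L, 10000L)
)

class(toy)           # class vector of the object; a tibble is also a data.frame
is.data.frame(toy)   # TRUE for tibbles as well as base data frames
col_types <- sapply(toy, class)  # class of each column
col_types
glimpse(toy)         # compact overview: one line per column
```

Running the same calls on PUMS_reduced should show which columns are integers and which are factors.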
Hint: Use count() with the sort argument.
PUMS_reduced %>% count(occupation, sort = TRUE)
## # A tibble: 469 x 2
## occupation n
## <fct> <int>
## 1 <NA> 23193
## 2 Miscellaneous Managers, Includ 1336
## 3 Retail Salespersons 1150
## 4 Elementary And Middle School T 1139
## 5 Cashiers 1077
## 6 Secretaries And Administrative 1045
## 7 Registered Nurses 1015
## 8 Line Supervisors Of Retail Sal 865
## 9 Driver/Sales Workers And Truck 855
## 10 Janitors And Building Cleaners 717
## # ... with 459 more rows
Setting aside missing values (NA, 23,193 rows), Miscellaneous Managers is the most common occupation, with 1,336 residents.
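Because NA sorts to the top of the count, it can help to drop missing occupations before counting. This is a minimal sketch on a made-up tibble (`toy` and its values are hypothetical); the same two steps would apply to PUMS_reduced.

```r
library(dplyr)

# Hypothetical toy data: NA is the single most frequent value
toy <- tibble(
  occupation = c("Cashiers", NA, NA, NA, "Cashiers", "Registered Nurses")
)

top_occupations <- toy %>%
  filter(!is.na(occupation)) %>%   # drop rows with no recorded occupation
  count(occupation, sort = TRUE)   # most frequent occupation first

top_occupations
```

With the missing rows removed, the first row of the result is the most common recorded occupation.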
Hint: Take PUMS_reduced, pipe it to dplyr::count, and pipe it to dplyr::filter. Remember you can enter more than one variable in the count() function.
PUMS_reduced %>% count(field_of_degree, occupation) %>% filter(field_of_degree == "Finance")
## # A tibble: 69 x 3
## field_of_degree occupation n
## <fct> <fct> <int>
## 1 Finance Accountants And Auditors 16
## 2 Finance Aircraft Pilots And Flight Eng 1
## 3 Finance Bill And Account Collectors 1
## 4 Finance Billing And Posting Clerks 1
## 5 Finance Bookkeeping, Accounting, And A 6
## 6 Finance Business Operations Specialist 1
## 7 Finance Cashiers 1
## 8 Finance Chefs And Head Cooks 1
## 9 Finance Chief Executives And Legislato 10
## 10 Finance "Claims Adjusters, Appraisers, " 1
## # ... with 59 more rows
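Accountants And Auditors is the most common occupation among Finance graduates in the rows shown (n = 16). The output is not sorted by n and only 10 of the 69 rows are printed, so sorting by n would be needed to confirm this.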
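One way to make the answer visible in the first row is to filter to Finance first and then count with `sort = TRUE`. The sketch below runs on a small hypothetical tibble (`toy`), not on the real data.

```r
library(dplyr)

# Hypothetical toy data with two fields of degree
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "History", "Finance"),
  occupation      = c("Accountants And Auditors", "Cashiers",
                      "Accountants And Auditors", "Cashiers",
                      "Chief Executives")
)

finance_jobs <- toy %>%
  filter(field_of_degree == "Finance") %>%  # keep Finance graduates only
  count(occupation, sort = TRUE)            # largest count first

finance_jobs
```

Filtering before counting gives the same counts as counting both variables and filtering afterwards; the sort just puts the most common occupation on top.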
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
PUMS_reduced %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income), n = n()) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 3
## field_of_degree median_income n
## <fct> <dbl> <int>
## 1 Petroleum Engineering 188000 1
## 2 Materials Science 154800 4
## 3 Nuclear Engineering 148300 7
## 4 Physical Sciences 129000 7
## 5 Mechanical Engineering Related Technologies 111900 14
## 6 Pharmacy Pharmaceutical Sciences 106700 83
## 7 Biological Engineering 101800 8
## 8 Metallurgical Engineering 99700 7
## 9 Naval Architecture 97000 18
## 10 Electrical Engineering 94000 465
## # ... with 159 more rows
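Petroleum Engineering is the field of degree with the highest median income, at 188,000. That figure rests on a single respondent (n = 1), so it should be read with caution.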
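Medians based on one or a handful of respondents can dominate the ranking. A possible safeguard, sketched here on made-up data, is to keep only groups above a minimum size. The tibble `toy` and the threshold `min_n` are hypothetical, and `na.rm = TRUE` is added so that a missing income does not turn a group's median into NA.

```r
library(dplyr)

# Hypothetical toy data: one field has a single, very high income
toy <- tibble(
  field_of_degree = c("Petroleum Engineering",
                      rep("Finance", 3), rep("History", 3)),
  income          = c(188000, 70000, 75000, 80000, 40000, NA, 50000)
)

min_n <- 3  # hypothetical minimum group size

ranked <- toy %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE),  # ignore missing incomes
            n = n()) %>%                                    # rows per field, NAs included
  filter(n >= min_n) %>%                                    # drop very small groups
  arrange(desc(median_income))

ranked
```

The right threshold is a judgment call; the point is only that n should be reported next to any median used for ranking.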
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, pipe it to dplyr::arrange, and pipe it to data.frame().
PUMS_reduced %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income)) %>%
  data.frame()
## field_of_degree median_income
## 1 Petroleum Engineering 188000.0
## 2 Materials Science 154800.0
## 3 Nuclear Engineering 148300.0
## 4 Physical Sciences 129000.0
## 5 Mechanical Engineering Related Technologies 111900.0
## 6 Pharmacy Pharmaceutical Sciences 106700.0
## 7 Biological Engineering 101800.0
## 8 Metallurgical Engineering 99700.0
## 9 Naval Architecture 97000.0
## 10 Electrical Engineering 94000.0
## 11 Computer Engineering 91075.0
## 12 Aerospace Engineering 89200.0
## 13 Information Sciences 88250.0
## 14 Management Information Systems 86650.0
## 15 Physics 86250.0
## 16 Computer Science 86000.0
## 17 Mechanical Engineering 85400.0
## 18 Transportation Sciences 85000.0
## 19 Materials Engineering 84200.0
## 20 Miscellaneous Business 82500.0
## 21 Oceanography 81505.0
## 22 General Engineering 80000.0
## 23 Industrial 76415.0
## 24 Computer Programming 76100.0
## 25 Actuarial Science 76000.0
## 26 Statistics 76000.0
## 27 Miscellaneous Social Sciences 75900.0
## 28 Finance 75000.0
## 29 Engineering Technologies 74040.0
## 30 Operations Logistics 74005.0
## 31 Agricultural Economics 74000.0
## 32 Electrical Engineering Technology 73000.0
## 33 Chemical Engineering 71450.0
## 34 Architectural Engineering 70000.0
## 35 Engineering 70000.0
## 36 Environmental Engineering 70000.0
## 37 Soil Science 70000.0
## 38 Mathematics 69100.0
## 39 Chemistry 69000.0
## 40 Engineering Mechanics Physics 69000.0
## 41 Computer Administration Management 68400.0
## 42 Medical Technologies Technicians 68000.0
## 43 Biochemical Sciences 66000.0
## 44 Economics 66000.0
## 45 Geosciences 66000.0
## 46 Business Economics 65540.0
## 47 Miscellaneous Engineering 65000.0
## 48 Construction Services 64500.0
## 49 Civil Engineering 64410.0
## 50 Physiology 63000.0
## 51 Computer 62140.0
## 52 Geological 59750.0
## 53 Biology 59040.0
## 54 Astronomy 59000.0
## 55 Plant Science 58500.0
## 56 Political Science 57500.0
## 57 Public Administration 57500.0
## 58 Biomedical Engineering 57000.0
## 59 Forestry 56600.0
## 60 Industrial Production Technologies 56400.0
## 61 Nuclear, Industrial Radiology, 56000.0
## 62 School Student Counseling 55750.0
## 63 Accounting 55007.0
## 64 Business Management 55000.0
## 65 Computer Networking 55000.0
## 66 Genetics 55000.0
## 67 Microbiology 55000.0
## 68 Treatment Therapy Professions 55000.0
## 69 United States History 54800.0
## 70 Pharmacology 54500.0
## 71 Humanities 54000.0
## 72 Marketing 54000.0
## 73 Communication Disorders Sciences 53750.0
## 74 Other Foreign Languages 53050.0
## 75 General Business 53010.0
## 76 Geology 52700.0
## 77 Nursing 52150.0
## 78 Multi-Disciplinary Or General Science 50004.0
## 79 Botany 50000.0
## 80 Geography 50000.0
## 81 Health 50000.0
## 82 Medical Assisting Services 50000.0
## 83 Miscellaneous Engineering Technologies 50000.0
## 84 Natural Resources Management 50000.0
## 85 Miscellaneous Education 49800.0
## 86 Human Resources 49500.0
## 87 Physical 49202.0
## 88 Mathematics Teacher Education 49200.0
## 89 Atmospheric Sciences 48835.0
## 90 Public Policy 48600.0
## 91 Educational Psychology 48250.0
## 92 Criminal Justice 48000.0
## 93 Pre-Law 48000.0
## 94 Secondary Teacher Education 48000.0
## 95 Counseling Psychology 47000.0
## 96 International Business 47000.0
## 97 Miscellaneous Biology 47000.0
## 98 History 46100.0
## 99 International Relations 45910.0
## 100 Language 45650.0
## 101 Animal Sciences 45000.0
## 102 French German Latin 45000.0
## 103 Intercultural 45000.0
## 104 Molecular Biology 44000.0
## 105 Mass Media 43105.0
## 106 Environmental Science 43050.0
## 107 Journalism 43000.0
## 108 Miscellaneous Health Medical Professions 43000.0
## 109 Applied Mathematics 42720.0
## 110 Miscellaneous Fine Arts 42500.0
## 111 Science 42500.0
## 112 Social Science Or History Teacher Education 42040.0
## 113 Architecture 42000.0
## 114 Community 41500.0
## 115 Library Science 41500.0
## 116 Philosophy 41400.0
## 117 Human Services 41250.0
## 118 Educational Administration 40900.0
## 119 Special Needs Education 40800.0
## 120 Communications 40610.0
## 121 Sociology 40500.0
## 122 Zoology 40500.0
## 123 Clinical Psychology 40050.0
## 124 Art 40000.0
## 125 Art History 40000.0
## 126 Nutrition Sciences 40000.0
## 127 Physical Fitness Parks Recreation 40000.0
## 128 Psychology 40000.0
## 129 Area Ethnic 39960.0
## 130 English Language 38920.0
## 131 Elementary Education 38600.0
## 132 Agriculture Production 38500.0
## 133 General Social Sciences 38500.0
## 134 Linguistics 38450.0
## 135 Music 38300.0
## 136 Hospitality Management 38005.0
## 137 Anthropology 37000.5
## 138 Film Video 36000.0
## 139 Liberal Arts 36000.0
## 140 Theology 35600.0
## 141 Ecology 35500.0
## 142 General Education 35200.0
## 143 Neuroscience 35000.0
## 144 Interdisciplinary Social Sciences 34260.0
## 145 Miscellaneous Psychology 34100.0
## 146 Advertising 33900.0
## 147 General Agriculture 33350.0
## 148 Family 32700.0
## 149 Cosmetology Services 32500.0
## 150 Communication Technologies 31950.0
## 151 Drama 30500.0
## 152 Social Work 30300.0
## 153 Fine Arts 30000.0
## 154 General Medical 30000.0
## 155 Composition 29200.0
## 156 Military Technologies 29000.0
## 157 Visual 28500.0
## 158 Commercial Art 28400.0
## 159 Early Childhood Education 28000.0
## 160 Teacher Education: Multiple Levels 27400.0
## 161 Studio Arts 27000.0
## 162 Criminology 25000.0
## 163 Multi/Interdisciplinary Studies 23000.0
## 164 Social Psychology 21500.0
## 165 Electrical, Mechanical, 17000.0
## 166 Food Science 16905.0
## 167 Cognitive Science 16500.0
## 168 Miscellaneous Agriculture 14500.0
## 169 <NA> NA
Finance ranks 28th among fields of degree, with a median income of 75,000.
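Instead of scanning the full printout for Finance, the rank can be computed by adding a row number after sorting. This is a sketch on a hypothetical summary table (`toy`, with made-up field names and incomes), assuming the data are already summarised to one row per field.

```r
library(dplyr)

# Hypothetical summary table: one row per field of degree
toy <- tibble(
  field_of_degree = c("A", "B", "Finance", "C"),
  median_income   = c(50000, 90000, 75000, 60000)
)

ranked <- toy %>%
  arrange(desc(median_income)) %>%  # highest median first
  mutate(rank = row_number())       # position in the sorted table

finance_rank <- ranked %>%
  filter(field_of_degree == "Finance") %>%
  pull(rank)

finance_rank
```

Note that `row_number()` breaks ties by row order; `min_rank()` would give tied fields the same rank instead.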