The dataset, PUMS_reduced.csv, comes from the US Census Bureau's 2012-2016 American Community Survey (ACS) Public Use Microdata Sample (PUMS). The sample includes 67,248 New Hampshire residents.
See below for information on the Public Use Microdata Sample (PUMS).
# Load packages
library(tidyverse)
# Import data
PUMS_reduced <- read.csv("~/R/busStat/Data/PUMS_reduced.csv") %>% as_tibble()
PUMS_reduced
## # A tibble: 67,248 x 7
## X PUMA age education field_of_degree income occupation
## <int> <int> <int> <fct> <fct> <int> <fct>
## 1 1 1000 87 lessthanBA <NA> 11800 <NA>
## 2 2 900 42 lessthanBA <NA> 8800 Cashiers
## 3 3 800 43 BAorhigher English Language 10000 Human Resources ~
## 4 4 800 43 lessthanBA <NA> 112000 Securities, Comm~
## 5 5 800 14 lessthanBA <NA> NA <NA>
## 6 6 800 11 lessthanBA <NA> NA <NA>
## 7 7 900 63 lessthanBA <NA> 23900 Driver/Sales Wor~
## 8 8 900 59 BAorhigher Early Childhood E~ 34600 Elementary And M~
## 9 9 900 65 lessthanBA <NA> 9400 Retail Salespers~
## 10 10 300 50 lessthanBA <NA> 18000 Retail Sales Wor~
## # ... with 67,238 more rows
The variable occupation is stored as a factor (shown as <fct> in the output above), which is R's type for categorical data.
The R object PUMS_reduced is a data frame (more specifically, a tibble): each row is one New Hampshire resident, each column is a variable, and the columns can hold different data types.
The first observation (first row) is a resident of PUMA 1000, the geographic unit covering the Seacoast region of New Hampshire, which includes Portsmouth. This individual is 87 years old and has less than a BA degree, so the field of degree is NA, with an income of about $11,800 a year. No occupation is recorded, so this individual may be retired at the age of 87.
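The three answers above can be checked directly in R. The following is a minimal sketch on a small made-up tibble, `toy` (hypothetical values standing in for PUMS_reduced, not the real data), showing how `class()` reports the type of a column and of the object, and how `slice()` returns the first observation.

```r
library(dplyr)

# Hypothetical toy data standing in for PUMS_reduced
toy <- tibble(
  age        = c(87L, 42L, 43L),
  income     = c(11800L, 8800L, 10000L),
  occupation = factor(c(NA, "Cashiers", "Human Resources Workers"))
)

# Type of a single variable
occupation_class <- class(toy$occupation)   # "factor"

# Type of the whole object: a tibble is a special kind of data frame
object_class <- class(toy)                  # "tbl_df" "tbl" "data.frame"
is_df <- is.data.frame(toy)                 # TRUE

# First observation (row) across all variables
first_row <- toy %>% slice(1)
```

Running the same calls on PUMS_reduced should confirm whether occupation is a factor or a character vector in a given R session, since `read.csv()` treats strings differently depending on the R version and the `stringsAsFactors` argument.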
Hint: Use count() with the sort argument.
PUMS_reduced %>% count(occupation, sort = TRUE)
## # A tibble: 469 x 2
## occupation n
## <fct> <int>
## 1 <NA> 23193
## 2 Miscellaneous Managers, Includ 1336
## 3 Retail Salespersons 1150
## 4 Elementary And Middle School T 1139
## 5 Cashiers 1077
## 6 Secretaries And Administrative 1045
## 7 Registered Nurses 1015
## 8 Line Supervisors Of Retail Sal 865
## 9 Driver/Sales Workers And Truck 855
## 10 Janitors And Building Cleaners 717
## # ... with 459 more rows
The most common value of occupation is NA (missing), with 23,193 individuals. Among the recorded occupations, the most common is Miscellaneous Managers, with 1,336 New Hampshire residents.
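Because NA is counted as its own group, it can crowd out the actual answer. As a sketch (again on a hypothetical `toy` tibble, not the PUMS data), missing occupations can be dropped with `filter(!is.na(...))` before counting:

```r
library(dplyr)

# Hypothetical toy data: three missing occupations, three recorded ones
toy <- tibble(
  occupation = c(NA, NA, NA, "Cashiers", "Cashiers", "Registered Nurses")
)

# NA is treated as a group; here it is the most frequent, so it sorts first
with_na <- toy %>% count(occupation, sort = TRUE)

# Drop missing occupations first to rank only the recorded ones
without_na <- toy %>%
  filter(!is.na(occupation)) %>%
  count(occupation, sort = TRUE)
```

In `without_na`, the first row is the most common recorded occupation.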
Hint: Take PUMS_reduced, pipe it to dplyr::count, and pipe it to dplyr::filter. Remember you can enter more than one variable in the count() function.
PUMS_reduced %>% count(field_of_degree, occupation, sort = TRUE) %>% filter(field_of_degree == "Finance")
## # A tibble: 69 x 3
## field_of_degree occupation n
## <fct> <fct> <int>
## 1 Finance <NA> 20
## 2 Finance Accountants And Auditors 16
## 3 Finance Chief Executives And Legislato 10
## 4 Finance Financial Managers 10
## 5 Finance Miscellaneous Managers, Includ 10
## 6 Finance Personal Financial Advisors 9
## 7 Finance Management Analysts 8
## 8 Finance Bookkeeping, Accounting, And A 6
## 9 Finance Financial Analysts 6
## 10 Finance Sales Representatives, Service 5
## # ... with 59 more rows
Among New Hampshire residents with a Finance degree, the most common value of occupation is NA (missing). The most common recorded occupation is Accountants And Auditors, with 16 individuals.
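The order of `count()` and `filter()` can be swapped without changing the Finance counts. The sketch below, on a hypothetical `toy` tibble, compares the approach from the hint with filtering first; the second form counts only the rows of interest.

```r
library(dplyr)

# Hypothetical toy data, not the PUMS sample
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "Nursing", "Nursing"),
  occupation      = c("Accountants And Auditors", "Accountants And Auditors",
                      "Financial Managers", "Registered Nurses", "Registered Nurses")
)

# Approach from the hint: count every (degree, occupation) pair, then keep Finance
count_then_filter <- toy %>%
  count(field_of_degree, occupation, sort = TRUE) %>%
  filter(field_of_degree == "Finance")

# Alternative: keep Finance rows first, then count occupations
filter_then_count <- toy %>%
  filter(field_of_degree == "Finance") %>%
  count(occupation, sort = TRUE)
```

Both results list the same occupations with the same counts; the first keeps the field_of_degree column, the second does not.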
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
PUMS_reduced %>%
  group_by(occupation) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))
## # A tibble: 469 x 2
## occupation median_income
## <fct> <dbl>
## 1 Dentists 230000
## 2 Physicians And Surgeons 200000
## 3 Petroleum, Mining And Geologic 177500.
## 4 Optometrists 150000
## 5 Air Traffic Controllers And Ai 141000
## 6 Nurse Anesthetists 135000
## 7 Aircraft Pilots And Flight Eng 130000
## 8 "Architectural And Engineering " 125000
## 9 Astronomers And Physicists 125000
## 10 Chemical Engineers 120000
## # ... with 459 more rows
The top occupation in terms of median income is Dentists, with a median income of $230,000.
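One caveat about the ranking above: `median()` returns NA whenever a group contains a missing income, and `arrange(desc(...))` places NA values last, so such occupations drop to the bottom of the table. The sketch below uses a hypothetical `toy` tibble to show the difference that `na.rm = TRUE` makes; whether the PUMS ranking changes with it is not shown in the output here.

```r
library(dplyr)

# Hypothetical toy data with one missing income among Cashiers
toy <- tibble(
  occupation = c("Dentists", "Dentists", "Cashiers", "Cashiers", "Cashiers"),
  income     = c(230000, 210000, 9000, NA, 11000)
)

# Without na.rm, any group containing a missing income gets a median of NA
default_median <- toy %>%
  group_by(occupation) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))

# With na.rm = TRUE, missing incomes are ignored within each group
narm_median <- toy %>%
  group_by(occupation) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  arrange(desc(median_income))
```

In `default_median`, Cashiers has a median of NA; in `narm_median`, it is the median of the two reported incomes.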
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, pipe it to dplyr::arrange, and pipe it to data.frame().
PUMS_reduced %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income)) %>%
  data.frame()
## field_of_degree median_income
## 1 Petroleum Engineering 188000.0
## 2 Materials Science 154800.0
## 3 Nuclear Engineering 148300.0
## 4 Physical Sciences 129000.0
## 5 Mechanical Engineering Related Technologies 111900.0
## 6 Pharmacy Pharmaceutical Sciences 106700.0
## 7 Biological Engineering 101800.0
## 8 Metallurgical Engineering 99700.0
## 9 Naval Architecture 97000.0
## 10 Electrical Engineering 94000.0
## 11 Computer Engineering 91075.0
## 12 Aerospace Engineering 89200.0
## 13 Information Sciences 88250.0
## 14 Management Information Systems 86650.0
## 15 Physics 86250.0
## 16 Computer Science 86000.0
## 17 Mechanical Engineering 85400.0
## 18 Transportation Sciences 85000.0
## 19 Materials Engineering 84200.0
## 20 Miscellaneous Business 82500.0
## 21 Oceanography 81505.0
## 22 General Engineering 80000.0
## 23 Industrial 76415.0
## 24 Computer Programming 76100.0
## 25 Actuarial Science 76000.0
## 26 Statistics 76000.0
## 27 Miscellaneous Social Sciences 75900.0
## 28 Finance 75000.0
## 29 Engineering Technologies 74040.0
## 30 Operations Logistics 74005.0
## 31 Agricultural Economics 74000.0
## 32 Electrical Engineering Technology 73000.0
## 33 Chemical Engineering 71450.0
## 34 Architectural Engineering 70000.0
## 35 Engineering 70000.0
## 36 Environmental Engineering 70000.0
## 37 Soil Science 70000.0
## 38 Mathematics 69100.0
## 39 Chemistry 69000.0
## 40 Engineering Mechanics Physics 69000.0
## 41 Computer Administration Management 68400.0
## 42 Medical Technologies Technicians 68000.0
## 43 Biochemical Sciences 66000.0
## 44 Economics 66000.0
## 45 Geosciences 66000.0
## 46 Business Economics 65540.0
## 47 Miscellaneous Engineering 65000.0
## 48 Construction Services 64500.0
## 49 Civil Engineering 64410.0
## 50 Physiology 63000.0
## 51 Computer 62140.0
## 52 Geological 59750.0
## 53 Biology 59040.0
## 54 Astronomy 59000.0
## 55 Plant Science 58500.0
## 56 Political Science 57500.0
## 57 Public Administration 57500.0
## 58 Biomedical Engineering 57000.0
## 59 Forestry 56600.0
## 60 Industrial Production Technologies 56400.0
## 61 Nuclear, Industrial Radiology, 56000.0
## 62 School Student Counseling 55750.0
## 63 Accounting 55007.0
## 64 Business Management 55000.0
## 65 Computer Networking 55000.0
## 66 Genetics 55000.0
## 67 Microbiology 55000.0
## 68 Treatment Therapy Professions 55000.0
## 69 United States History 54800.0
## 70 Pharmacology 54500.0
## 71 Humanities 54000.0
## 72 Marketing 54000.0
## 73 Communication Disorders Sciences 53750.0
## 74 Other Foreign Languages 53050.0
## 75 General Business 53010.0
## 76 Geology 52700.0
## 77 Nursing 52150.0
## 78 Multi-Disciplinary Or General Science 50004.0
## 79 Botany 50000.0
## 80 Geography 50000.0
## 81 Health 50000.0
## 82 Medical Assisting Services 50000.0
## 83 Miscellaneous Engineering Technologies 50000.0
## 84 Natural Resources Management 50000.0
## 85 Miscellaneous Education 49800.0
## 86 Human Resources 49500.0
## 87 Physical 49202.0
## 88 Mathematics Teacher Education 49200.0
## 89 Atmospheric Sciences 48835.0
## 90 Public Policy 48600.0
## 91 Educational Psychology 48250.0
## 92 Criminal Justice 48000.0
## 93 Pre-Law 48000.0
## 94 Secondary Teacher Education 48000.0
## 95 Counseling Psychology 47000.0
## 96 International Business 47000.0
## 97 Miscellaneous Biology 47000.0
## 98 History 46100.0
## 99 International Relations 45910.0
## 100 Language 45650.0
## 101 Animal Sciences 45000.0
## 102 French German Latin 45000.0
## 103 Intercultural 45000.0
## 104 Molecular Biology 44000.0
## 105 Mass Media 43105.0
## 106 Environmental Science 43050.0
## 107 Journalism 43000.0
## 108 Miscellaneous Health Medical Professions 43000.0
## 109 Applied Mathematics 42720.0
## 110 Miscellaneous Fine Arts 42500.0
## 111 Science 42500.0
## 112 Social Science Or History Teacher Education 42040.0
## 113 Architecture 42000.0
## 114 Community 41500.0
## 115 Library Science 41500.0
## 116 Philosophy 41400.0
## 117 Human Services 41250.0
## 118 Educational Administration 40900.0
## 119 Special Needs Education 40800.0
## 120 Communications 40610.0
## 121 Sociology 40500.0
## 122 Zoology 40500.0
## 123 Clinical Psychology 40050.0
## 124 Art 40000.0
## 125 Art History 40000.0
## 126 Nutrition Sciences 40000.0
## 127 Physical Fitness Parks Recreation 40000.0
## 128 Psychology 40000.0
## 129 Area Ethnic 39960.0
## 130 English Language 38920.0
## 131 Elementary Education 38600.0
## 132 Agriculture Production 38500.0
## 133 General Social Sciences 38500.0
## 134 Linguistics 38450.0
## 135 Music 38300.0
## 136 Hospitality Management 38005.0
## 137 Anthropology 37000.5
## 138 Film Video 36000.0
## 139 Liberal Arts 36000.0
## 140 Theology 35600.0
## 141 Ecology 35500.0
## 142 General Education 35200.0
## 143 Neuroscience 35000.0
## 144 Interdisciplinary Social Sciences 34260.0
## 145 Miscellaneous Psychology 34100.0
## 146 Advertising 33900.0
## 147 General Agriculture 33350.0
## 148 Family 32700.0
## 149 Cosmetology Services 32500.0
## 150 Communication Technologies 31950.0
## 151 Drama 30500.0
## 152 Social Work 30300.0
## 153 Fine Arts 30000.0
## 154 General Medical 30000.0
## 155 Composition 29200.0
## 156 Military Technologies 29000.0
## 157 Visual 28500.0
## 158 Commercial Art 28400.0
## 159 Early Childhood Education 28000.0
## 160 Teacher Education: Multiple Levels 27400.0
## 161 Studio Arts 27000.0
## 162 Criminology 25000.0
## 163 Multi/Interdisciplinary Studies 23000.0
## 164 Social Psychology 21500.0
## 165 Electrical, Mechanical, 17000.0
## 166 Food Science 16905.0
## 167 Cognitive Science 16500.0
## 168 Miscellaneous Agriculture 14500.0
## 169 <NA> NA
Finance ranks 28th in terms of median income, at $75,000.
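Instead of reading the rank off a long printed table, the rank can be computed as a column and looked up. This is a minimal sketch on a hypothetical `toy` tibble, using dplyr's `min_rank()`; the column name `rank` is an illustrative choice.

```r
library(dplyr)

# Hypothetical toy data, not the PUMS sample
toy <- tibble(
  field_of_degree = c("Petroleum Engineering", "Finance", "Finance", "Art", "Art"),
  income          = c(188000, 70000, 80000, 40000, NA)
)

ranked <- toy %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  arrange(desc(median_income)) %>%
  # Rank 1 is the highest median income; tied fields share the same rank
  mutate(rank = min_rank(desc(median_income)))

# Look up a single field of degree
finance_rank <- ranked %>% filter(field_of_degree == "Finance")
```

Note that `min_rank()` gives tied fields the same rank, so it can differ from the row number in a table that contains ties.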