Hello there, below is an example of some basic analyses I was able to perform looking at salary data at the University of North Carolina at Chapel Hill. It was part of the STOR 320 course, which helped me achieve my minor back in college.
Universities are typically opaque, bureaucratic institutions. Many public schools such as the University of North Carolina system make data about university employees publically available. Using this dataset what can we learn about the dynamics of the university system?"
I have removed any direct quotes from my professor, outside of the one above and the exact deliverables for the project.
This dataset comes from Ryan Thornburg’s website, and is from May of 2015
library(tidyverse)
library(dplyr)
library(lubridate)
# Load the UNC departments data
data <- read.csv(url("http://ryanthornburg.com/wp-content/uploads/2015/05/UNC_Salares_NandO_2015-05-06.csv"))
Take a first look at the data
colnames(data)
## [1] "name" "campus" "dept" "position" "exempt2" "employed"
## [7] "hiredate" "fte" "status" "stservyr" "statesal" "nonstsal"
## [13] "totalsal" "age"
str(data)
## 'data.frame': 12287 obs. of 14 variables:
## $ name : chr "AARON, NANCY G" "ABARBANELL, JEFFERY S" "ABARE, BETSY" "ABATE, AARON B" ...
## $ campus : chr "UNC-CH" "UNC-CH" "UNC-CH" "UNC-CH" ...
## $ dept : chr "Romance Languages" "Kenan-Flagler Business School" "Institute of Marine Sciences" "Medicine Administration" ...
## $ position: chr "Senior Lecturer" "Associate Professor" "Research Technician" "Accounting Technician" ...
## $ exempt2 : chr "Exempt" "Exempt" "Subject to State Personnel Act" "Subject to State Personnel Act" ...
## $ employed: int 9 9 12 12 12 12 12 12 12 12 ...
## $ hiredate: int 20030701 19990101 20110912 20090420 20120103 20051003 19960923 20130401 19870101 20120702 ...
## $ fte : num 1 1 1 1 1 1 1 1 1 1 ...
## $ status : chr "Fixed-Term" "Continuing" "Permanent" "Permanent" ...
## $ stservyr: int 11 17 3 5 2 15 34 11 27 2 ...
## $ statesal: int 46350 173000 0 0 41696 56588 41707 0 0 0 ...
## $ nonstsal: int 0 0 38170 50070 0 4412 0 80227 55803 32889 ...
## $ totalsal: int 46350 173000 38170 50070 41696 61000 41707 80227 55803 32889 ...
## $ age : int 55 57 54 29 35 41 62 36 64 26 ...
head(data)
## name campus dept
## 1 AARON, NANCY G UNC-CH Romance Languages
## 2 ABARBANELL, JEFFERY S UNC-CH Kenan-Flagler Business School
## 3 ABARE, BETSY UNC-CH Institute of Marine Sciences
## 4 ABATE, AARON B UNC-CH Medicine Administration
## 5 ABATEMARCO, JODI M UNC-CH School of Education
## 6 ABBOTT-LUNSFORD, SHELBY L UNC-CH Medicine Administration
## position exempt2 employed hiredate
## 1 Senior Lecturer Exempt 9 20030701
## 2 Associate Professor Exempt 9 19990101
## 3 Research Technician Subject to State Personnel Act 12 20110912
## 4 Accounting Technician Subject to State Personnel Act 12 20090420
## 5 Student Services Assistant Subject to State Personnel Act 12 20120103
## 6 HR Consultant Subject to State Personnel Act 12 20051003
## fte status stservyr statesal nonstsal totalsal age
## 1 1 Fixed-Term 11 46350 0 46350 55
## 2 1 Continuing 17 173000 0 173000 57
## 3 1 Permanent 3 0 38170 38170 54
## 4 1 Permanent 5 0 50070 50070 29
## 5 1 Permanent 2 41696 0 41696 35
## 6 1 Permanent 15 56588 4412 61000 41
Return a data frame with columns: name, dept, age,totalsal
data_small <- select(data, name, dept, age, totalsal)
head(data_small)
## name dept age totalsal
## 1 AARON, NANCY G Romance Languages 55 46350
## 2 ABARBANELL, JEFFERY S Kenan-Flagler Business School 57 173000
## 3 ABARE, BETSY Institute of Marine Sciences 54 38170
## 4 ABATE, AARON B Medicine Administration 29 50070
## 5 ABATEMARCO, JODI M School of Education 35 41696
## 6 ABBOTT-LUNSFORD, SHELBY L Medicine Administration 41 61000
Return a data frame with columns: statesal, nonstsal,totalsal using one of the
select_helpersfunctions. If needed, google ‘select_helpers in R’.
data_salary_only <- select(data, ends_with("sal"))
head(data_salary_only)
## statesal nonstsal totalsal
## 1 46350 0 46350
## 2 173000 0 173000
## 3 0 38170 38170
## 4 0 50070 50070
## 5 41696 0 41696
## 6 56588 4412 61000
Rename the
ftecolumn tofulltime. Make sure this change is saved (i.e.data <- ...).
colnames(data)[8] <- "fulltime"
## I found that 'fte' was the 8th unit in the sequence and reassigned the name
colnames(data)
## [1] "name" "campus" "dept" "position" "exempt2" "employed"
## [7] "hiredate" "fulltime" "status" "stservyr" "statesal" "nonstsal"
## [13] "totalsal" "age"
#These are to double check it was all saved in the data frame "data"
summary(data)
## name campus dept position
## Length:12287 Length:12287 Length:12287 Length:12287
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## exempt2 employed hiredate fulltime
## Length:12287 Min. : 9.00 Min. :19570901 Min. :0.00
## Class :character 1st Qu.:12.00 1st Qu.:20010701 1st Qu.:1.00
## Mode :character Median :12.00 Median :20070901 Median :1.00
## Mean :11.64 Mean :20050068 Mean :0.97
## 3rd Qu.:12.00 3rd Qu.:20120618 3rd Qu.:1.00
## Max. :12.00 Max. :20140929 Max. :1.00
## status stservyr statesal nonstsal
## Length:12287 Min. : 0.00 Min. : 0 Min. : 0
## Class :character 1st Qu.: 4.00 1st Qu.: 0 1st Qu.: 0
## Mode :character Median : 9.00 Median : 2696 Median : 38239
## Mean :10.68 Mean : 31372 Mean : 49567
## 3rd Qu.:15.00 3rd Qu.: 50000 3rd Qu.: 64763
## Max. :57.00 Max. :520000 Max. :625686
## totalsal age
## Min. : 3000 Min. :21.00
## 1st Qu.: 41884 1st Qu.:37.00
## Median : 59342 Median :46.00
## Mean : 80938 Mean :46.32
## 3rd Qu.: 92865 3rd Qu.:56.00
## Max. :819069 Max. :89.00
What is the mean salary in the Neurosurgery department?
by_dept <- group_by(data_small, dept)
by_dept <- filter(by_dept, dept == "Neurosurgery")
by_dept
## # A tibble: 17 x 4
## # Groups: dept [1]
## name dept age totalsal
## <chr> <chr> <int> <int>
## 1 BHOWMICK, DEB Neurosurgery 36 345100
## 2 CAMPBELL, DENNIS M Neurosurgery 34 621722
## 3 CARSON, LARRY V Neurosurgery 63 621722
## 4 DELAMETTER, GRETCHEN L Neurosurgery 56 100745
## 5 ELTON, SCOTT W Neurosurgery 47 400000
## 6 EWEND, MATTHEW G Neurosurgery 50 607648
## 7 FREUND, VICTOR T Neurosurgery 47 423373
## 8 HADAR, ELDAD J Neurosurgery 51 395550
## 9 JAIKUMAR, SIVAKUMAR Neurosurgery 45 360325
## 10 JAUFMANN, BRUCE P Neurosurgery 56 621722
## 11 KILPATRICK, MICHAUX R Neurosurgery 43 621722
## 12 PILLSBURY, MATTHEW C Neurosurgery 35 52713
## 13 SASAKI-ADAMS, DEANNA M Neurosurgery 38 345100
## 14 SHARPLESS, JULIE L Neurosurgery 51 162906
## 15 WADON, CAROL M Neurosurgery 56 505390
## 16 WATRAL, MELODY A Neurosurgery 59 100250
## 17 WU, JING Neurosurgery 45 175000
avg_sal_by_dept <- summarise(by_dept,
avg_sal = mean(totalsal, na.rm = TRUE))
avg_sal_by_dept
## # A tibble: 1 x 2
## dept avg_sal
## <chr> <dbl>
## 1 Neurosurgery 380058.
according to the table, the average salary for the Neurosurgery dept is: 380058.12
Return a data frame with employee’s in the Neurosurgery department making more than $500,000. Why might these professors be so well paid?
Neurosurgery <- select(data, name, dept, totalsal)
Neurosurgery <- filter(Neurosurgery, dept == "Neurosurgery" & totalsal >= 500000)
Neurosurgery
## name dept totalsal
## 1 CAMPBELL, DENNIS M Neurosurgery 621722
## 2 CARSON, LARRY V Neurosurgery 621722
## 3 EWEND, MATTHEW G Neurosurgery 607648
## 4 JAUFMANN, BRUCE P Neurosurgery 621722
## 5 KILPATRICK, MICHAUX R Neurosurgery 621722
## 6 WADON, CAROL M Neurosurgery 505390
These professors are likely so well paid because of the high risk that’s involved with brain surgery and the years of practice it takes before one can actually practice in the field
What is the total amount that full time Dermatology employees get paid?
Dermatology <- select(data, name, dept, fulltime, totalsal)
Dermatology <- filter(Dermatology, dept == "Dermatology" & fulltime == TRUE)
sum(Dermatology$totalsal)
## [1] 5272098
nrow(Dermatology)
## [1] 35
median(Dermatology$totalsal)
## [1] 85385
mean(Dermatology$totalsal)
## [1] 150631.4
ggplot(data = Dermatology, aes(x=totalsal))+geom_histogram(bins=20)
There are 35 full-time dermatology employees, and their total salary is $537,2098 The Median salary is: $85,385 The Mean salary is: $150,631.4
This means there is definitely a huge skew with a number of Dermatology faculty and staff making much more than others.
Create a data frame called radio_dept whose rows are the employees from the Radiology department. - include only the following columns: name, position, age, nonstsal, totalsal. - order the employees by salary First without pipes
radio_dept <- filter(data, dept == "Radiology")
radio_dept <- select(radio_dept, name, position, age, nonstsal, totalsal)
radio_dept <- arrange(radio_dept, desc(totalsal))
radio_dept
## name position age nonstsal
## 1 MAURO, MATTHEW A DIRECTOR 63 614176
## 2 LEE, JOSEPH K Professor 67 375000
## 3 BURKE, CHARLES T Clinical Associate Professor 44 365000
## 4 MOLINA, PAUL L Professor 56 334255
## 5 STAVAS, JOSEPH M Clinical Professor 59 345000
## 6 DIXON, ROBERT G Clinical Associate Professor 55 335000
## 7 CASTILLO, MAURICIO Professor 55 316255
## 8 SEMELKA, RICHARD C Professor 54 306255
## 9 SMITH, J K Professor with Tenure 52 292187
## 10 FIELDING, JULIA R Associate Professor 53 294005
## 11 RENNER, JORDAN B Professor 59 300000
## 12 WONG, TERENCE Z Professor 59 224255
## 13 SOLANDER, STEN Y Clinical Associate Professor 55 279255
## 14 WARSHAUER, DAVID M Professor 61 271995
## 15 LIN, WEILI Dixie Lee Boney Soo Professor 50 263805
## 16 FORDHAM, LYNN A Associate Professor 51 269255
## 17 CLARKE, JOHN P Clinical Associate Professor 63 275000
## 18 JEWELLS, VALERIE L Clinical Associate Professor 53 259255
## 19 CHONG, WUI K Clinical Associate Professor 57 254255
## 20 HYSLOP, WILLIAM B Clinical Associate Professor 54 254255
## 21 KUZMIAK, CHERIE M Associate Professor 46 254255
## 22 KOOMEN, MARCIA A Clinical Associate Professor 66 244255
## 23 BIRCHARD, KATHERINE R Clinical Assistant Professor 40 234255
## 24 HUANG, BENJAMIN Y Assistant Professor 41 239255
## 25 KIM, KYUNG Clinical Assistant Professor 43 251304
## 26 YU, HYEON Clinical Assistant Professor 47 251000
## 27 LURY, KENNETH M Clinical Assist. Prof. 60 234255
## 28 NISSMAN, DANIEL B Clinical Assistant Professor 46 199554
## 29 PARKER, LEONARD A Associate Professor 70 234255
## 30 SHEN, DINGGANG Professor with Tenure 45 174411
## 31 ISAACSON, ARI J Clinical Assistant Professor 36 240000
## 32 JORDAN, SHERYL G Clinical Associate Professor 55 219255
## 33 KHANDANI, AMIR H Associate Professor with Tenure 50 214255
## 34 LEE, SHEILA S Clinical Assistant Professor 38 230000
## 35 BURKE, LAUREN M Clinical Assistant Professor 33 204255
## 36 HARTMAN, HEIDI Clinical Assistant Professor 33 204255
## 37 LEE, YUEH Z Assistant Professor 41 134378
## 38 MEHTA, NISHA Clinical Assistant Professor 32 220000
## 39 NORTHAM, MEREDITH C Clinical Assistant Professor 33 220000
## 40 SAMS, CASSANDRA M Clinical Assistant Professor 33 220000
## 41 HAN, TAE IL Clinical Assistant Professor 48 180969
## 42 COLLICHIO, ROBERT J Assoc Chair for Admin/Radiology 61 181064
## 43 HEYNEMAN, LAURA E Clinical Associate Professor 51 149191
## 44 LI, ZIBO Associate Professor 36 150000
## 45 IVANOVIC, MARIJA Clinical Associate Professor 62 149968
## 46 SMITH, H. E Research Associate Professor 40 72500
## 47 MCCARTNEY, WILLIAM H Professor 69 134627
## 48 WILCOX, CLAIRE B Clinical Associate Professor 68 0
## 49 LEE, ELLIE R Clinical Assistant Professor 44 108191
## 50 CRAWFORD, THOMAS J Systems Specialist 55 114698
## 51 HENDERSON, LOUISE M Assistant Professor 40 109900
## 52 ALVAREZ, HORTENSIA Clinical Professor 57 109000
## 53 BOUGHTON, DANIEL J Business Officer 41 96717
## 54 AN, HONGYU Assistant Professor 45 100000
## 55 PARROTT, MATTHEW C Assistant Professor 37 93965
## 56 SHEIKH, ARIF Clinical Assistant Professor 46 81755
## 57 YUAN, HONG Research Assistant Professor 41 85000
## 58 GAO, WEI Assistant Professor 32 29000
## 59 WU, ZHANHONG Research Assistant Professor 39 0
## 60 YAP, PEW THIAN Assistant Professor 36 26667
## 61 BENEFIELD, THAD S Statistician 38 70840
## 62 PETRIN, FERNAND H Business Systems Analyst 57 68336
## 63 HOLLAND, VICKIE E HR Associate 49 44866
## 64 CREIGHTON, ANGELA H Contracts/Grants Manager 43 61719
## 65 AKER, DIXIE K Systems Analyst 45 60000
## 66 USSERY, LISA A Accounting Manager 50 58602
## 67 KIRK, SHANAH R Research Specialist 50 55940
## 68 RAMALHO, JORGE MIGUEL P Research Instructor 40 0
## 69 SHI, FENG Postdoctoral Research Associate 34 55000
## 70 WU, GUORONG POST-DOC RES ASSOC 36 55000
## 71 BOWEN, ELIZABETH A Executive Assistant 44 53987
## 72 MARSH, MARY W Research Associate 27 50000
## 73 STEED, DOREEN Research Mammographer 50 48564
## 74 NESBITT, ANNE Admin Support Specialist 50 48446
## 75 PRICE, CHERIE L HR Associate 57 48446
## 76 CLARK, MICHELE L Admin Support Specialist 45 48349
## 77 KNOP, GABRIEL F Social/Clinical Research Spec. 30 45000
## 78 ARMAO, DIANE M Research Instructor 59 43546
## 79 BOOMHOWER, JEREMY D Admin. Support Associate 38 42593
## 80 CARVER, VIRGINIA B Admin. Support Associate 39 42593
## 81 HAUSER, JASON M Admin. Support Associate 41 42593
## 82 HARTMAN, TERRY S Social/Clinical Research Asst. 26 42168
## 83 MELVILLE, WILMA C Administrative Secretary II 58 41789
## 84 BARBAL, ISABEL Admin. Support Associate 57 40061
## 85 PENDER, JENNIFER L Accounting Technician 39 37690
## 86 BIRDSONG, LAURIE B Public Communications Specialist 40 37681
## 87 FISCHER, MICHELLE C Admin. Support Associate 25 37142
## 88 HOOTS, TIFFANY N Social/Clinical Research Asst. 31 36360
## totalsal
## 1 614176
## 2 375000
## 3 365000
## 4 350000
## 5 345000
## 6 335000
## 7 332000
## 8 322000
## 9 310000
## 10 309750
## 11 300000
## 12 300000
## 13 295000
## 14 295000
## 15 286540
## 16 285000
## 17 275000
## 18 275000
## 19 270000
## 20 270000
## 21 270000
## 22 260000
## 23 255000
## 24 255000
## 25 251304
## 26 251000
## 27 250000
## 28 250000
## 29 250000
## 30 250000
## 31 240000
## 32 235000
## 33 230000
## 34 230000
## 35 220000
## 36 220000
## 37 220000
## 38 220000
## 39 220000
## 40 220000
## 41 196714
## 42 191064
## 43 161000
## 44 150000
## 45 149968
## 46 145000
## 47 142500
## 48 131250
## 49 120000
## 50 114698
## 51 109900
## 52 109000
## 53 101602
## 54 100000
## 55 98910
## 56 97500
## 57 85000
## 58 80000
## 59 80000
## 60 80000
## 61 70840
## 62 68336
## 63 62350
## 64 61719
## 65 60000
## 66 58602
## 67 55940
## 68 55000
## 69 55000
## 70 55000
## 71 53987
## 72 50000
## 73 48564
## 74 48446
## 75 48446
## 76 48349
## 77 45000
## 78 43546
## 79 42593
## 80 42593
## 81 42593
## 82 42168
## 83 41789
## 84 40061
## 85 37690
## 86 37681
## 87 37142
## 88 36360
Next with pipes
radio_dept_2 <- data %>%
filter(dept == "Radiology") %>%
select(name, position, age, nonstsal, totalsal) %>%
arrange(desc(totalsal))
radio_dept_2
## name position age nonstsal
## 1 MAURO, MATTHEW A DIRECTOR 63 614176
## 2 LEE, JOSEPH K Professor 67 375000
## 3 BURKE, CHARLES T Clinical Associate Professor 44 365000
## 4 MOLINA, PAUL L Professor 56 334255
## 5 STAVAS, JOSEPH M Clinical Professor 59 345000
## 6 DIXON, ROBERT G Clinical Associate Professor 55 335000
## 7 CASTILLO, MAURICIO Professor 55 316255
## 8 SEMELKA, RICHARD C Professor 54 306255
## 9 SMITH, J K Professor with Tenure 52 292187
## 10 FIELDING, JULIA R Associate Professor 53 294005
## 11 RENNER, JORDAN B Professor 59 300000
## 12 WONG, TERENCE Z Professor 59 224255
## 13 SOLANDER, STEN Y Clinical Associate Professor 55 279255
## 14 WARSHAUER, DAVID M Professor 61 271995
## 15 LIN, WEILI Dixie Lee Boney Soo Professor 50 263805
## 16 FORDHAM, LYNN A Associate Professor 51 269255
## 17 CLARKE, JOHN P Clinical Associate Professor 63 275000
## 18 JEWELLS, VALERIE L Clinical Associate Professor 53 259255
## 19 CHONG, WUI K Clinical Associate Professor 57 254255
## 20 HYSLOP, WILLIAM B Clinical Associate Professor 54 254255
## 21 KUZMIAK, CHERIE M Associate Professor 46 254255
## 22 KOOMEN, MARCIA A Clinical Associate Professor 66 244255
## 23 BIRCHARD, KATHERINE R Clinical Assistant Professor 40 234255
## 24 HUANG, BENJAMIN Y Assistant Professor 41 239255
## 25 KIM, KYUNG Clinical Assistant Professor 43 251304
## 26 YU, HYEON Clinical Assistant Professor 47 251000
## 27 LURY, KENNETH M Clinical Assist. Prof. 60 234255
## 28 NISSMAN, DANIEL B Clinical Assistant Professor 46 199554
## 29 PARKER, LEONARD A Associate Professor 70 234255
## 30 SHEN, DINGGANG Professor with Tenure 45 174411
## 31 ISAACSON, ARI J Clinical Assistant Professor 36 240000
## 32 JORDAN, SHERYL G Clinical Associate Professor 55 219255
## 33 KHANDANI, AMIR H Associate Professor with Tenure 50 214255
## 34 LEE, SHEILA S Clinical Assistant Professor 38 230000
## 35 BURKE, LAUREN M Clinical Assistant Professor 33 204255
## 36 HARTMAN, HEIDI Clinical Assistant Professor 33 204255
## 37 LEE, YUEH Z Assistant Professor 41 134378
## 38 MEHTA, NISHA Clinical Assistant Professor 32 220000
## 39 NORTHAM, MEREDITH C Clinical Assistant Professor 33 220000
## 40 SAMS, CASSANDRA M Clinical Assistant Professor 33 220000
## 41 HAN, TAE IL Clinical Assistant Professor 48 180969
## 42 COLLICHIO, ROBERT J Assoc Chair for Admin/Radiology 61 181064
## 43 HEYNEMAN, LAURA E Clinical Associate Professor 51 149191
## 44 LI, ZIBO Associate Professor 36 150000
## 45 IVANOVIC, MARIJA Clinical Associate Professor 62 149968
## 46 SMITH, H. E Research Associate Professor 40 72500
## 47 MCCARTNEY, WILLIAM H Professor 69 134627
## 48 WILCOX, CLAIRE B Clinical Associate Professor 68 0
## 49 LEE, ELLIE R Clinical Assistant Professor 44 108191
## 50 CRAWFORD, THOMAS J Systems Specialist 55 114698
## 51 HENDERSON, LOUISE M Assistant Professor 40 109900
## 52 ALVAREZ, HORTENSIA Clinical Professor 57 109000
## 53 BOUGHTON, DANIEL J Business Officer 41 96717
## 54 AN, HONGYU Assistant Professor 45 100000
## 55 PARROTT, MATTHEW C Assistant Professor 37 93965
## 56 SHEIKH, ARIF Clinical Assistant Professor 46 81755
## 57 YUAN, HONG Research Assistant Professor 41 85000
## 58 GAO, WEI Assistant Professor 32 29000
## 59 WU, ZHANHONG Research Assistant Professor 39 0
## 60 YAP, PEW THIAN Assistant Professor 36 26667
## 61 BENEFIELD, THAD S Statistician 38 70840
## 62 PETRIN, FERNAND H Business Systems Analyst 57 68336
## 63 HOLLAND, VICKIE E HR Associate 49 44866
## 64 CREIGHTON, ANGELA H Contracts/Grants Manager 43 61719
## 65 AKER, DIXIE K Systems Analyst 45 60000
## 66 USSERY, LISA A Accounting Manager 50 58602
## 67 KIRK, SHANAH R Research Specialist 50 55940
## 68 RAMALHO, JORGE MIGUEL P Research Instructor 40 0
## 69 SHI, FENG Postdoctoral Research Associate 34 55000
## 70 WU, GUORONG POST-DOC RES ASSOC 36 55000
## 71 BOWEN, ELIZABETH A Executive Assistant 44 53987
## 72 MARSH, MARY W Research Associate 27 50000
## 73 STEED, DOREEN Research Mammographer 50 48564
## 74 NESBITT, ANNE Admin Support Specialist 50 48446
## 75 PRICE, CHERIE L HR Associate 57 48446
## 76 CLARK, MICHELE L Admin Support Specialist 45 48349
## 77 KNOP, GABRIEL F Social/Clinical Research Spec. 30 45000
## 78 ARMAO, DIANE M Research Instructor 59 43546
## 79 BOOMHOWER, JEREMY D Admin. Support Associate 38 42593
## 80 CARVER, VIRGINIA B Admin. Support Associate 39 42593
## 81 HAUSER, JASON M Admin. Support Associate 41 42593
## 82 HARTMAN, TERRY S Social/Clinical Research Asst. 26 42168
## 83 MELVILLE, WILMA C Administrative Secretary II 58 41789
## 84 BARBAL, ISABEL Admin. Support Associate 57 40061
## 85 PENDER, JENNIFER L Accounting Technician 39 37690
## 86 BIRDSONG, LAURIE B Public Communications Specialist 40 37681
## 87 FISCHER, MICHELLE C Admin. Support Associate 25 37142
## 88 HOOTS, TIFFANY N Social/Clinical Research Asst. 31 36360
## totalsal
## 1 614176
## 2 375000
## 3 365000
## 4 350000
## 5 345000
## 6 335000
## 7 332000
## 8 322000
## 9 310000
## 10 309750
## 11 300000
## 12 300000
## 13 295000
## 14 295000
## 15 286540
## 16 285000
## 17 275000
## 18 275000
## 19 270000
## 20 270000
## 21 270000
## 22 260000
## 23 255000
## 24 255000
## 25 251304
## 26 251000
## 27 250000
## 28 250000
## 29 250000
## 30 250000
## 31 240000
## 32 235000
## 33 230000
## 34 230000
## 35 220000
## 36 220000
## 37 220000
## 38 220000
## 39 220000
## 40 220000
## 41 196714
## 42 191064
## 43 161000
## 44 150000
## 45 149968
## 46 145000
## 47 142500
## 48 131250
## 49 120000
## 50 114698
## 51 109900
## 52 109000
## 53 101602
## 54 100000
## 55 98910
## 56 97500
## 57 85000
## 58 80000
## 59 80000
## 60 80000
## 61 70840
## 62 68336
## 63 62350
## 64 61719
## 65 60000
## 66 58602
## 67 55940
## 68 55000
## 69 55000
## 70 55000
## 71 53987
## 72 50000
## 73 48564
## 74 48446
## 75 48446
## 76 48349
## 77 45000
## 78 43546
## 79 42593
## 80 42593
## 81 42593
## 82 42168
## 83 41789
## 84 40061
## 85 37690
## 86 37681
## 87 37142
## 88 36360
Make a histogram of Radiology salaries
ggplot(data = radio_dept) +
geom_histogram(mapping = aes(x = totalsal))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Create a data frame called
dept_summarywhose rows are the departments and whose columns are: department size, mean department salary, median department salary, and maximum salary (using totalsal for salary).
step1<- data %>%
arrange(dept) %>%
group_by(dept)
dept_summary <- summarise(step1,
mean_dept_sal = mean(totalsal),
median_dept_sal = median(totalsal),
maximum_dept_sal = max(totalsal),
total = n())
dept_summary
## # A tibble: 304 x 5
## dept mean_dept_sal median_dept_sal maximum_dept_sal total
## <chr> <dbl> <dbl> <int> <int>
## 1 Acad Sup Prog Student-A~ 55798. 50600 115000 15
## 2 Academic Advising 49985. 45000 109625 42
## 3 Accounting Services 57417. 59342 103306 17
## 4 Ackland Art Museum 51543. 41000 140050 19
## 5 Admissions 57487. 49000 195700 46
## 6 African Studies Center 35970 35970 43475 2
## 7 African, Afri-Amer & Di~ 65170. 68000 135608 23
## 8 AHEC Support-Comm Med C~ 69789. 64533 135193 26
## 9 Airport 47351 47351 47351 1
## 10 Alcohol Studies Center 49232. 49180. 84685 16
## # ... with 294 more rows
Order the departments by highest mean salary and print the 10 highest paid departments.
high_mean <- arrange(dept_summary, desc(mean_dept_sal))
head(high_mean)
## # A tibble: 6 x 5
## dept mean_dept_sal median_dept_sal maximum_dept_sal total
## <chr> <dbl> <dbl> <int> <int>
## 1 Neurosurgery 380058. 395550 621722 17
## 2 Provost 273790 240080 445000 4
## 3 Urology 216291. 237500 520569 20
## 4 Orthopaedics 216205. 240000 554559 31
## 5 Surgery 201917. 176083 677310 121
## 6 Anesthesiology 187177. 222645 620040 94
Order the departments by highest median salary and print the 10 highest paid departments.
high_median <- arrange(dept_summary, desc(median_dept_sal))
head(high_median)
## # A tibble: 6 x 5
## dept mean_dept_sal median_dept_sal maximum_dept_sal total
## <chr> <dbl> <dbl> <int> <int>
## 1 Neurosurgery 380058. 395550 621722 17
## 2 Provost 273790 240080 445000 4
## 3 Orthopaedics 216205. 240000 554559 31
## 4 Urology 216291. 237500 520569 20
## 5 Anesthesiology 187177. 222645 620040 94
## 6 Carolina Counts 182160 182160 182160 1
How many departments have at least 10 employees?
dept_summary_employees <- data %>%
group_by(dept) %>%
tally()%>%
filter(n >= 10)
dept_summary_employees
## # A tibble: 194 x 2
## dept n
## <chr> <int>
## 1 Acad Sup Prog Student-Athletes 15
## 2 Academic Advising 42
## 3 Accounting Services 17
## 4 Ackland Art Museum 19
## 5 Admissions 46
## 6 African, Afri-Amer & Diaspora 23
## 7 AHEC Support-Comm Med Care 26
## 8 Alcohol Studies Center 16
## 9 Allied Health Sciences 101
## 10 American Studies 18
## # ... with 184 more rows
There are 194 deptartments with at least 10 employees.
Which department hired the most people in 2010? Hint: you probably have to modify
hiredate.
data_2010 <- mutate(data, hiredate = parse_date_time(hiredate, "%Y%m%d")) %>%
filter(year(hiredate) == 2010) %>%
arrange(dept) %>%
group_by(dept)
nrow(data_2010)
## [1] 645
There were 645 employees hired in 2010
Make a list of all the department names and sort this list alphabetically. What is the 42nd department in this list?
department_names <- data %>%
select(dept) %>%
arrange(dept)
head(department_names)
## dept
## 1 Acad Sup Prog Student-Athletes
## 2 Acad Sup Prog Student-Athletes
## 3 Acad Sup Prog Student-Athletes
## 4 Acad Sup Prog Student-Athletes
## 5 Acad Sup Prog Student-Athletes
## 6 Acad Sup Prog Student-Athletes
department_names <- data %>%
select(dept) %>%
arrange(dept) %>%
slice(42)#hint given in class
department_names
## dept
## 1 Academic Advising
Plot number of people hired by the CS dept per year vs. year
data <- mutate(data, hiredate = parse_date_time(hiredate, "%Y%m%d"))
data_expand <- mutate(data, year = year(hiredate))
CS_data <- data_expand %>%
filter(dept == "Computer Science") %>%
group_by(year) %>%
summarise(n = n())
CS_data
## # A tibble: 27 x 2
## year n
## <dbl> <int>
## 1 1964 1
## 2 1967 1
## 3 1978 1
## 4 1981 1
## 5 1984 1
## 6 1985 1
## 7 1987 1
## 8 1989 2
## 9 1991 3
## 10 1993 3
## # ... with 17 more rows
ggplot(data = CS_data)+
geom_point(mapping = aes(x = year, y = n)) +
geom_smooth(mapping = aes(x= year, y = n), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
It looks as if there was a spike in hires in the early nineties, with an exponential growth starting in mid 2000s.
Now add STOR, Math, Biostatistics, SILS and Physics to the above plot
CS_data_exp <- data_expand %>%
group_by(dept, year) %>%
select(dept, year, position) %>%
filter(dept == "Computer Science" | dept == "School of Info & Libr Science" | dept == "Biostatistics"| dept == "Mathematics"| dept == "Physics-Astronomy"| dept == "Statistics and Operations Res") %>%
arrange(dept) %>%
tally()
CS_data_exp
## # A tibble: 149 x 3
## # Groups: dept [6]
## dept year n
## <chr> <dbl> <int>
## 1 Biostatistics 1967 2
## 2 Biostatistics 1972 1
## 3 Biostatistics 1973 1
## 4 Biostatistics 1980 1
## 5 Biostatistics 1981 1
## 6 Biostatistics 1982 1
## 7 Biostatistics 1985 2
## 8 Biostatistics 1987 1
## 9 Biostatistics 1990 4
## 10 Biostatistics 1991 2
## # ... with 139 more rows
ggplot(data = CS_data_exp) +
geom_point(mapping = aes(x = year , y = n, color = dept))
#Self-Guided Exploration
This was the part of the project where we basically made our own analyses. My writing has greatly improved from those days haha
age_and_sal <- data %>%
filter(dept == "Computer Science" | dept == "School of Info & Libr Science" | dept == "Biostatistics"| dept == "Mathematics"| dept == "Physics-Astronomy"| dept == "Statistics and Operations Res") %>%
select(age, dept, totalsal, hiredate) %>%
group_by(dept) %>%
summarise(avg_total_sal = mean(totalsal),
avg_age = mean(age))
age_and_sal
## # A tibble: 6 x 3
## dept avg_total_sal avg_age
## <chr> <dbl> <dbl>
## 1 Biostatistics 102439. 47.8
## 2 Computer Science 103307. 47.3
## 3 Mathematics 93061. 49.7
## 4 Physics-Astronomy 86372. 47.1
## 5 School of Info & Libr Science 82971. 47.4
## 6 Statistics and Operations Res 100471. 50.2
ggplot(data = age_and_sal) +
geom_point(mapping = aes(x=avg_age, y = avg_total_sal, color = dept))
The above scatterplot shows the relationship between average age and average salary by the six departments of Computer Science, Biostatistics, Mathematics, Physics, SILS, and STOR at UNC. It seems to be the case that the faculty in the Mathematics and STOR departments tend to be significantly older than the rest. This could be because of the number of younger individuals that come to study Information and Library Science, Physics, Biostatistics or Computer Science at UNC, thus skewing their average. These departments may have more graduate student employees. It could also be reflective of the STOR and Mathematics departments requiring more detail-oriented work in their field, thus taking more time to receive a position.
Psychology <- data %>%
filter(dept == "Psychology") %>%
select(age, position, totalsal)%>%
arrange(desc(totalsal))
head(Psychology)
## age position totalsal
## 1 56 DIRECTOR 196415
## 2 50 Kenan Distinguished Professor 171899
## 3 50 Professor 165000
## 4 51 Professor 160000
## 5 45 Professor 155000
## 6 65 Distinguished Professor 152110
nrow(Psychology)
## [1] 89
ggplot(data = Psychology)+
geom_point(mapping = aes(x = age, y = totalsal))+
geom_smooth(mapping = aes(x = age, y = totalsal))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The above graph shows within the psychology department one’s age generally has an effect on how much they are paid. However, looking at the curve I can see that the standard error is fairly high, showing that age may not necessarily be a factor in one’s salary. This leads me to question whether or not the position has a factor on how much someone makes in psychology.
UNC_age_sal <- data %>%
select(age, totalsal)
ggplot(data = UNC_age_sal)+
geom_point(mapping = aes(x = age, y = totalsal))+
geom_smooth(mapping = aes(x = age, y = totalsal))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The above graph shows the relationship between a person’s age and their total salary across UNC as a whole. This graph leads me to believe that there is no true relationship between age and salary across UNC. There are many other factors that influence what salary an employee of the university is likely to receive. It seems that factors such as position, department, and fulltime status show a much stronger representation of what causes an individual to have a higher salary at UNC. The questions asked in the problem set are certainly more important for understanding what type of salary a UNC employee receives.