Homework Assignment from College

Hello there, below is an example of some basic analyses I was able to perform looking at salary data at the University of North Carolina at Chapel Hill. It was part of the STOR 320 course, which helped me achieve my minor back in college.

The Premise:

Universities are typically opaque, bureaucratic institutions. Many public schools such as the University of North Carolina system make data about university employees publically available. Using this dataset what can we learn about the dynamics of the university system?"

I have removed any direct quotes from my professor, outside of the one above and the exact deliverables for the project.

The data

This dataset comes from Ryan Thornburg’s website, and is from May of 2015

library(tidyverse)
library(dplyr)
library(lubridate)

# Load the UNC departments data
data <- read.csv(url("http://ryanthornburg.com/wp-content/uploads/2015/05/UNC_Salares_NandO_2015-05-06.csv"))

Take a first look at the data

colnames(data)
##  [1] "name"     "campus"   "dept"     "position" "exempt2"  "employed"
##  [7] "hiredate" "fte"      "status"   "stservyr" "statesal" "nonstsal"
## [13] "totalsal" "age"
str(data)
## 'data.frame':    12287 obs. of  14 variables:
##  $ name    : chr  "AARON, NANCY G" "ABARBANELL, JEFFERY S" "ABARE, BETSY" "ABATE, AARON B" ...
##  $ campus  : chr  "UNC-CH" "UNC-CH" "UNC-CH" "UNC-CH" ...
##  $ dept    : chr  "Romance Languages" "Kenan-Flagler Business School" "Institute of Marine Sciences" "Medicine Administration" ...
##  $ position: chr  "Senior Lecturer" "Associate Professor" "Research Technician" "Accounting Technician" ...
##  $ exempt2 : chr  "Exempt" "Exempt" "Subject to State Personnel Act" "Subject to State Personnel Act" ...
##  $ employed: int  9 9 12 12 12 12 12 12 12 12 ...
##  $ hiredate: int  20030701 19990101 20110912 20090420 20120103 20051003 19960923 20130401 19870101 20120702 ...
##  $ fte     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ status  : chr  "Fixed-Term" "Continuing" "Permanent" "Permanent" ...
##  $ stservyr: int  11 17 3 5 2 15 34 11 27 2 ...
##  $ statesal: int  46350 173000 0 0 41696 56588 41707 0 0 0 ...
##  $ nonstsal: int  0 0 38170 50070 0 4412 0 80227 55803 32889 ...
##  $ totalsal: int  46350 173000 38170 50070 41696 61000 41707 80227 55803 32889 ...
##  $ age     : int  55 57 54 29 35 41 62 36 64 26 ...
head(data)
##                        name campus                          dept
## 1            AARON, NANCY G UNC-CH             Romance Languages
## 2     ABARBANELL, JEFFERY S UNC-CH Kenan-Flagler Business School
## 3              ABARE, BETSY UNC-CH  Institute of Marine Sciences
## 4            ABATE, AARON B UNC-CH       Medicine Administration
## 5        ABATEMARCO, JODI M UNC-CH           School of Education
## 6 ABBOTT-LUNSFORD, SHELBY L UNC-CH       Medicine Administration
##                     position                        exempt2 employed hiredate
## 1            Senior Lecturer                         Exempt        9 20030701
## 2        Associate Professor                         Exempt        9 19990101
## 3        Research Technician Subject to State Personnel Act       12 20110912
## 4      Accounting Technician Subject to State Personnel Act       12 20090420
## 5 Student Services Assistant Subject to State Personnel Act       12 20120103
## 6              HR Consultant Subject to State Personnel Act       12 20051003
##   fte     status stservyr statesal nonstsal totalsal age
## 1   1 Fixed-Term       11    46350        0    46350  55
## 2   1 Continuing       17   173000        0   173000  57
## 3   1  Permanent        3        0    38170    38170  54
## 4   1  Permanent        5        0    50070    50070  29
## 5   1  Permanent        2    41696        0    41696  35
## 6   1  Permanent       15    56588     4412    61000  41

Assignment

Question 1

Return a data frame with columns: name, dept, age,totalsal

data_small <- select(data, name, dept, age, totalsal)
head(data_small)
##                        name                          dept age totalsal
## 1            AARON, NANCY G             Romance Languages  55    46350
## 2     ABARBANELL, JEFFERY S Kenan-Flagler Business School  57   173000
## 3              ABARE, BETSY  Institute of Marine Sciences  54    38170
## 4            ABATE, AARON B       Medicine Administration  29    50070
## 5        ABATEMARCO, JODI M           School of Education  35    41696
## 6 ABBOTT-LUNSFORD, SHELBY L       Medicine Administration  41    61000

Return a data frame with columns: statesal, nonstsal,totalsal using one of the select_helpers functions. If needed, google ‘select_helpers in R’.

data_salary_only <- select(data, ends_with("sal"))
head(data_salary_only)
##   statesal nonstsal totalsal
## 1    46350        0    46350
## 2   173000        0   173000
## 3        0    38170    38170
## 4        0    50070    50070
## 5    41696        0    41696
## 6    56588     4412    61000

Question 2

Rename the fte column to fulltime. Make sure this change is saved (i.e. data <- ...).

colnames(data)[8] <- "fulltime"
## I found that 'fte' was the 8th unit in the sequence and reassigned the name
colnames(data)
##  [1] "name"     "campus"   "dept"     "position" "exempt2"  "employed"
##  [7] "hiredate" "fulltime" "status"   "stservyr" "statesal" "nonstsal"
## [13] "totalsal" "age"
#These are to double check it was all saved in the data frame "data"
summary(data)
##      name              campus              dept             position        
##  Length:12287       Length:12287       Length:12287       Length:12287      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    exempt2             employed        hiredate           fulltime   
##  Length:12287       Min.   : 9.00   Min.   :19570901   Min.   :0.00  
##  Class :character   1st Qu.:12.00   1st Qu.:20010701   1st Qu.:1.00  
##  Mode  :character   Median :12.00   Median :20070901   Median :1.00  
##                     Mean   :11.64   Mean   :20050068   Mean   :0.97  
##                     3rd Qu.:12.00   3rd Qu.:20120618   3rd Qu.:1.00  
##                     Max.   :12.00   Max.   :20140929   Max.   :1.00  
##     status             stservyr        statesal         nonstsal     
##  Length:12287       Min.   : 0.00   Min.   :     0   Min.   :     0  
##  Class :character   1st Qu.: 4.00   1st Qu.:     0   1st Qu.:     0  
##  Mode  :character   Median : 9.00   Median :  2696   Median : 38239  
##                     Mean   :10.68   Mean   : 31372   Mean   : 49567  
##                     3rd Qu.:15.00   3rd Qu.: 50000   3rd Qu.: 64763  
##                     Max.   :57.00   Max.   :520000   Max.   :625686  
##     totalsal           age       
##  Min.   :  3000   Min.   :21.00  
##  1st Qu.: 41884   1st Qu.:37.00  
##  Median : 59342   Median :46.00  
##  Mean   : 80938   Mean   :46.32  
##  3rd Qu.: 92865   3rd Qu.:56.00  
##  Max.   :819069   Max.   :89.00

Question 3

What is the mean salary in the Neurosurgery department?

by_dept <- group_by(data_small, dept)
by_dept <- filter(by_dept, dept == "Neurosurgery")
by_dept
## # A tibble: 17 x 4
## # Groups:   dept [1]
##    name                   dept           age totalsal
##    <chr>                  <chr>        <int>    <int>
##  1 BHOWMICK, DEB          Neurosurgery    36   345100
##  2 CAMPBELL, DENNIS M     Neurosurgery    34   621722
##  3 CARSON, LARRY V        Neurosurgery    63   621722
##  4 DELAMETTER, GRETCHEN L Neurosurgery    56   100745
##  5 ELTON, SCOTT W         Neurosurgery    47   400000
##  6 EWEND, MATTHEW G       Neurosurgery    50   607648
##  7 FREUND, VICTOR T       Neurosurgery    47   423373
##  8 HADAR, ELDAD J         Neurosurgery    51   395550
##  9 JAIKUMAR, SIVAKUMAR    Neurosurgery    45   360325
## 10 JAUFMANN, BRUCE P      Neurosurgery    56   621722
## 11 KILPATRICK, MICHAUX R  Neurosurgery    43   621722
## 12 PILLSBURY, MATTHEW C   Neurosurgery    35    52713
## 13 SASAKI-ADAMS, DEANNA M Neurosurgery    38   345100
## 14 SHARPLESS, JULIE L     Neurosurgery    51   162906
## 15 WADON, CAROL M         Neurosurgery    56   505390
## 16 WATRAL, MELODY A       Neurosurgery    59   100250
## 17 WU, JING               Neurosurgery    45   175000
avg_sal_by_dept <- summarise(by_dept, 
                             avg_sal = mean(totalsal, na.rm = TRUE))
avg_sal_by_dept
## # A tibble: 1 x 2
##   dept         avg_sal
##   <chr>          <dbl>
## 1 Neurosurgery 380058.

according to the table, the average salary for the Neurosurgery dept is: 380058.12

Return a data frame with employee’s in the Neurosurgery department making more than $500,000. Why might these professors be so well paid?

Neurosurgery <- select(data, name, dept, totalsal)

Neurosurgery <- filter(Neurosurgery, dept == "Neurosurgery" & totalsal >= 500000)
Neurosurgery
##                    name         dept totalsal
## 1    CAMPBELL, DENNIS M Neurosurgery   621722
## 2       CARSON, LARRY V Neurosurgery   621722
## 3      EWEND, MATTHEW G Neurosurgery   607648
## 4     JAUFMANN, BRUCE P Neurosurgery   621722
## 5 KILPATRICK, MICHAUX R Neurosurgery   621722
## 6        WADON, CAROL M Neurosurgery   505390

These professors are likely so well paid because of the high risk that’s involved with brain surgery and the years of practice it takes before one can actually practice in the field

Question 4

What is the total amount that full time Dermatology employees get paid?

Dermatology <- select(data, name, dept, fulltime, totalsal)
Dermatology <- filter(Dermatology, dept == "Dermatology" & fulltime == TRUE)
sum(Dermatology$totalsal)
## [1] 5272098
nrow(Dermatology)
## [1] 35
median(Dermatology$totalsal)
## [1] 85385
mean(Dermatology$totalsal)
## [1] 150631.4
ggplot(data = Dermatology, aes(x=totalsal))+geom_histogram(bins=20)

There are 35 full-time dermatology employees, and their total salary is $537,2098 The Median salary is: $85,385 The Mean salary is: $150,631.4

This means there is definitely a huge skew with a number of Dermatology faculty and staff making much more than others.

Question 5

Create a data frame called radio_dept whose rows are the employees from the Radiology department. - include only the following columns: name, position, age, nonstsal, totalsal. - order the employees by salary First without pipes

radio_dept <- filter(data, dept == "Radiology")
radio_dept <- select(radio_dept, name, position, age, nonstsal, totalsal)
radio_dept <- arrange(radio_dept, desc(totalsal))
radio_dept
##                       name                         position age nonstsal
## 1         MAURO, MATTHEW A                         DIRECTOR  63   614176
## 2            LEE, JOSEPH K                        Professor  67   375000
## 3         BURKE, CHARLES T     Clinical Associate Professor  44   365000
## 4           MOLINA, PAUL L                        Professor  56   334255
## 5         STAVAS, JOSEPH M               Clinical Professor  59   345000
## 6          DIXON, ROBERT G     Clinical Associate Professor  55   335000
## 7       CASTILLO, MAURICIO                        Professor  55   316255
## 8       SEMELKA, RICHARD C                        Professor  54   306255
## 9               SMITH, J K            Professor with Tenure  52   292187
## 10       FIELDING, JULIA R              Associate Professor  53   294005
## 11        RENNER, JORDAN B                        Professor  59   300000
## 12         WONG, TERENCE Z                        Professor  59   224255
## 13        SOLANDER, STEN Y     Clinical Associate Professor  55   279255
## 14      WARSHAUER, DAVID M                        Professor  61   271995
## 15              LIN, WEILI    Dixie Lee Boney Soo Professor  50   263805
## 16         FORDHAM, LYNN A              Associate Professor  51   269255
## 17          CLARKE, JOHN P     Clinical Associate Professor  63   275000
## 18      JEWELLS, VALERIE L     Clinical Associate Professor  53   259255
## 19            CHONG, WUI K     Clinical Associate Professor  57   254255
## 20       HYSLOP, WILLIAM B     Clinical Associate Professor  54   254255
## 21       KUZMIAK, CHERIE M              Associate Professor  46   254255
## 22        KOOMEN, MARCIA A     Clinical Associate Professor  66   244255
## 23   BIRCHARD, KATHERINE R     Clinical Assistant Professor  40   234255
## 24       HUANG, BENJAMIN Y              Assistant Professor  41   239255
## 25              KIM, KYUNG     Clinical Assistant Professor  43   251304
## 26               YU, HYEON     Clinical Assistant Professor  47   251000
## 27         LURY, KENNETH M           Clinical Assist. Prof.  60   234255
## 28       NISSMAN, DANIEL B     Clinical Assistant Professor  46   199554
## 29       PARKER, LEONARD A              Associate Professor  70   234255
## 30          SHEN, DINGGANG            Professor with Tenure  45   174411
## 31         ISAACSON, ARI J     Clinical Assistant Professor  36   240000
## 32        JORDAN, SHERYL G     Clinical Associate Professor  55   219255
## 33        KHANDANI, AMIR H  Associate Professor with Tenure  50   214255
## 34           LEE, SHEILA S     Clinical Assistant Professor  38   230000
## 35         BURKE, LAUREN M     Clinical Assistant Professor  33   204255
## 36          HARTMAN, HEIDI     Clinical Assistant Professor  33   204255
## 37             LEE, YUEH Z              Assistant Professor  41   134378
## 38            MEHTA, NISHA     Clinical Assistant Professor  32   220000
## 39     NORTHAM, MEREDITH C     Clinical Assistant Professor  33   220000
## 40       SAMS, CASSANDRA M     Clinical Assistant Professor  33   220000
## 41             HAN, TAE IL     Clinical Assistant Professor  48   180969
## 42     COLLICHIO, ROBERT J  Assoc Chair for Admin/Radiology  61   181064
## 43       HEYNEMAN, LAURA E     Clinical Associate Professor  51   149191
## 44                LI, ZIBO              Associate Professor  36   150000
## 45        IVANOVIC, MARIJA     Clinical Associate Professor  62   149968
## 46             SMITH, H. E     Research Associate Professor  40    72500
## 47    MCCARTNEY, WILLIAM H                        Professor  69   134627
## 48        WILCOX, CLAIRE B     Clinical Associate Professor  68        0
## 49            LEE, ELLIE R     Clinical Assistant Professor  44   108191
## 50      CRAWFORD, THOMAS J               Systems Specialist  55   114698
## 51     HENDERSON, LOUISE M              Assistant Professor  40   109900
## 52      ALVAREZ, HORTENSIA               Clinical Professor  57   109000
## 53      BOUGHTON, DANIEL J                 Business Officer  41    96717
## 54              AN, HONGYU              Assistant Professor  45   100000
## 55      PARROTT, MATTHEW C              Assistant Professor  37    93965
## 56            SHEIKH, ARIF     Clinical Assistant Professor  46    81755
## 57              YUAN, HONG     Research Assistant Professor  41    85000
## 58                GAO, WEI              Assistant Professor  32    29000
## 59            WU, ZHANHONG     Research Assistant Professor  39        0
## 60          YAP, PEW THIAN              Assistant Professor  36    26667
## 61       BENEFIELD, THAD S                     Statistician  38    70840
## 62       PETRIN, FERNAND H         Business Systems Analyst  57    68336
## 63       HOLLAND, VICKIE E                     HR Associate  49    44866
## 64     CREIGHTON, ANGELA H         Contracts/Grants Manager  43    61719
## 65           AKER, DIXIE K                  Systems Analyst  45    60000
## 66          USSERY, LISA A               Accounting Manager  50    58602
## 67          KIRK, SHANAH R              Research Specialist  50    55940
## 68 RAMALHO, JORGE MIGUEL P              Research Instructor  40        0
## 69               SHI, FENG  Postdoctoral Research Associate  34    55000
## 70             WU, GUORONG               POST-DOC RES ASSOC  36    55000
## 71      BOWEN, ELIZABETH A              Executive Assistant  44    53987
## 72           MARSH, MARY W               Research Associate  27    50000
## 73           STEED, DOREEN            Research Mammographer  50    48564
## 74           NESBITT, ANNE         Admin Support Specialist  50    48446
## 75         PRICE, CHERIE L                     HR Associate  57    48446
## 76        CLARK, MICHELE L         Admin Support Specialist  45    48349
## 77         KNOP, GABRIEL F   Social/Clinical Research Spec.  30    45000
## 78          ARMAO, DIANE M              Research Instructor  59    43546
## 79     BOOMHOWER, JEREMY D         Admin. Support Associate  38    42593
## 80      CARVER, VIRGINIA B         Admin. Support Associate  39    42593
## 81         HAUSER, JASON M         Admin. Support Associate  41    42593
## 82        HARTMAN, TERRY S   Social/Clinical Research Asst.  26    42168
## 83       MELVILLE, WILMA C      Administrative Secretary II  58    41789
## 84          BARBAL, ISABEL         Admin. Support Associate  57    40061
## 85      PENDER, JENNIFER L            Accounting Technician  39    37690
## 86      BIRDSONG, LAURIE B Public Communications Specialist  40    37681
## 87     FISCHER, MICHELLE C         Admin. Support Associate  25    37142
## 88        HOOTS, TIFFANY N   Social/Clinical Research Asst.  31    36360
##    totalsal
## 1    614176
## 2    375000
## 3    365000
## 4    350000
## 5    345000
## 6    335000
## 7    332000
## 8    322000
## 9    310000
## 10   309750
## 11   300000
## 12   300000
## 13   295000
## 14   295000
## 15   286540
## 16   285000
## 17   275000
## 18   275000
## 19   270000
## 20   270000
## 21   270000
## 22   260000
## 23   255000
## 24   255000
## 25   251304
## 26   251000
## 27   250000
## 28   250000
## 29   250000
## 30   250000
## 31   240000
## 32   235000
## 33   230000
## 34   230000
## 35   220000
## 36   220000
## 37   220000
## 38   220000
## 39   220000
## 40   220000
## 41   196714
## 42   191064
## 43   161000
## 44   150000
## 45   149968
## 46   145000
## 47   142500
## 48   131250
## 49   120000
## 50   114698
## 51   109900
## 52   109000
## 53   101602
## 54   100000
## 55    98910
## 56    97500
## 57    85000
## 58    80000
## 59    80000
## 60    80000
## 61    70840
## 62    68336
## 63    62350
## 64    61719
## 65    60000
## 66    58602
## 67    55940
## 68    55000
## 69    55000
## 70    55000
## 71    53987
## 72    50000
## 73    48564
## 74    48446
## 75    48446
## 76    48349
## 77    45000
## 78    43546
## 79    42593
## 80    42593
## 81    42593
## 82    42168
## 83    41789
## 84    40061
## 85    37690
## 86    37681
## 87    37142
## 88    36360

Next with pipes

radio_dept_2 <- data %>%
  filter(dept == "Radiology") %>%
  select(name, position, age, nonstsal, totalsal) %>%
  arrange(desc(totalsal))
 radio_dept_2
##                       name                         position age nonstsal
## 1         MAURO, MATTHEW A                         DIRECTOR  63   614176
## 2            LEE, JOSEPH K                        Professor  67   375000
## 3         BURKE, CHARLES T     Clinical Associate Professor  44   365000
## 4           MOLINA, PAUL L                        Professor  56   334255
## 5         STAVAS, JOSEPH M               Clinical Professor  59   345000
## 6          DIXON, ROBERT G     Clinical Associate Professor  55   335000
## 7       CASTILLO, MAURICIO                        Professor  55   316255
## 8       SEMELKA, RICHARD C                        Professor  54   306255
## 9               SMITH, J K            Professor with Tenure  52   292187
## 10       FIELDING, JULIA R              Associate Professor  53   294005
## 11        RENNER, JORDAN B                        Professor  59   300000
## 12         WONG, TERENCE Z                        Professor  59   224255
## 13        SOLANDER, STEN Y     Clinical Associate Professor  55   279255
## 14      WARSHAUER, DAVID M                        Professor  61   271995
## 15              LIN, WEILI    Dixie Lee Boney Soo Professor  50   263805
## 16         FORDHAM, LYNN A              Associate Professor  51   269255
## 17          CLARKE, JOHN P     Clinical Associate Professor  63   275000
## 18      JEWELLS, VALERIE L     Clinical Associate Professor  53   259255
## 19            CHONG, WUI K     Clinical Associate Professor  57   254255
## 20       HYSLOP, WILLIAM B     Clinical Associate Professor  54   254255
## 21       KUZMIAK, CHERIE M              Associate Professor  46   254255
## 22        KOOMEN, MARCIA A     Clinical Associate Professor  66   244255
## 23   BIRCHARD, KATHERINE R     Clinical Assistant Professor  40   234255
## 24       HUANG, BENJAMIN Y              Assistant Professor  41   239255
## 25              KIM, KYUNG     Clinical Assistant Professor  43   251304
## 26               YU, HYEON     Clinical Assistant Professor  47   251000
## 27         LURY, KENNETH M           Clinical Assist. Prof.  60   234255
## 28       NISSMAN, DANIEL B     Clinical Assistant Professor  46   199554
## 29       PARKER, LEONARD A              Associate Professor  70   234255
## 30          SHEN, DINGGANG            Professor with Tenure  45   174411
## 31         ISAACSON, ARI J     Clinical Assistant Professor  36   240000
## 32        JORDAN, SHERYL G     Clinical Associate Professor  55   219255
## 33        KHANDANI, AMIR H  Associate Professor with Tenure  50   214255
## 34           LEE, SHEILA S     Clinical Assistant Professor  38   230000
## 35         BURKE, LAUREN M     Clinical Assistant Professor  33   204255
## 36          HARTMAN, HEIDI     Clinical Assistant Professor  33   204255
## 37             LEE, YUEH Z              Assistant Professor  41   134378
## 38            MEHTA, NISHA     Clinical Assistant Professor  32   220000
## 39     NORTHAM, MEREDITH C     Clinical Assistant Professor  33   220000
## 40       SAMS, CASSANDRA M     Clinical Assistant Professor  33   220000
## 41             HAN, TAE IL     Clinical Assistant Professor  48   180969
## 42     COLLICHIO, ROBERT J  Assoc Chair for Admin/Radiology  61   181064
## 43       HEYNEMAN, LAURA E     Clinical Associate Professor  51   149191
## 44                LI, ZIBO              Associate Professor  36   150000
## 45        IVANOVIC, MARIJA     Clinical Associate Professor  62   149968
## 46             SMITH, H. E     Research Associate Professor  40    72500
## 47    MCCARTNEY, WILLIAM H                        Professor  69   134627
## 48        WILCOX, CLAIRE B     Clinical Associate Professor  68        0
## 49            LEE, ELLIE R     Clinical Assistant Professor  44   108191
## 50      CRAWFORD, THOMAS J               Systems Specialist  55   114698
## 51     HENDERSON, LOUISE M              Assistant Professor  40   109900
## 52      ALVAREZ, HORTENSIA               Clinical Professor  57   109000
## 53      BOUGHTON, DANIEL J                 Business Officer  41    96717
## 54              AN, HONGYU              Assistant Professor  45   100000
## 55      PARROTT, MATTHEW C              Assistant Professor  37    93965
## 56            SHEIKH, ARIF     Clinical Assistant Professor  46    81755
## 57              YUAN, HONG     Research Assistant Professor  41    85000
## 58                GAO, WEI              Assistant Professor  32    29000
## 59            WU, ZHANHONG     Research Assistant Professor  39        0
## 60          YAP, PEW THIAN              Assistant Professor  36    26667
## 61       BENEFIELD, THAD S                     Statistician  38    70840
## 62       PETRIN, FERNAND H         Business Systems Analyst  57    68336
## 63       HOLLAND, VICKIE E                     HR Associate  49    44866
## 64     CREIGHTON, ANGELA H         Contracts/Grants Manager  43    61719
## 65           AKER, DIXIE K                  Systems Analyst  45    60000
## 66          USSERY, LISA A               Accounting Manager  50    58602
## 67          KIRK, SHANAH R              Research Specialist  50    55940
## 68 RAMALHO, JORGE MIGUEL P              Research Instructor  40        0
## 69               SHI, FENG  Postdoctoral Research Associate  34    55000
## 70             WU, GUORONG               POST-DOC RES ASSOC  36    55000
## 71      BOWEN, ELIZABETH A              Executive Assistant  44    53987
## 72           MARSH, MARY W               Research Associate  27    50000
## 73           STEED, DOREEN            Research Mammographer  50    48564
## 74           NESBITT, ANNE         Admin Support Specialist  50    48446
## 75         PRICE, CHERIE L                     HR Associate  57    48446
## 76        CLARK, MICHELE L         Admin Support Specialist  45    48349
## 77         KNOP, GABRIEL F   Social/Clinical Research Spec.  30    45000
## 78          ARMAO, DIANE M              Research Instructor  59    43546
## 79     BOOMHOWER, JEREMY D         Admin. Support Associate  38    42593
## 80      CARVER, VIRGINIA B         Admin. Support Associate  39    42593
## 81         HAUSER, JASON M         Admin. Support Associate  41    42593
## 82        HARTMAN, TERRY S   Social/Clinical Research Asst.  26    42168
## 83       MELVILLE, WILMA C      Administrative Secretary II  58    41789
## 84          BARBAL, ISABEL         Admin. Support Associate  57    40061
## 85      PENDER, JENNIFER L            Accounting Technician  39    37690
## 86      BIRDSONG, LAURIE B Public Communications Specialist  40    37681
## 87     FISCHER, MICHELLE C         Admin. Support Associate  25    37142
## 88        HOOTS, TIFFANY N   Social/Clinical Research Asst.  31    36360
##    totalsal
## 1    614176
## 2    375000
## 3    365000
## 4    350000
## 5    345000
## 6    335000
## 7    332000
## 8    322000
## 9    310000
## 10   309750
## 11   300000
## 12   300000
## 13   295000
## 14   295000
## 15   286540
## 16   285000
## 17   275000
## 18   275000
## 19   270000
## 20   270000
## 21   270000
## 22   260000
## 23   255000
## 24   255000
## 25   251304
## 26   251000
## 27   250000
## 28   250000
## 29   250000
## 30   250000
## 31   240000
## 32   235000
## 33   230000
## 34   230000
## 35   220000
## 36   220000
## 37   220000
## 38   220000
## 39   220000
## 40   220000
## 41   196714
## 42   191064
## 43   161000
## 44   150000
## 45   149968
## 46   145000
## 47   142500
## 48   131250
## 49   120000
## 50   114698
## 51   109900
## 52   109000
## 53   101602
## 54   100000
## 55    98910
## 56    97500
## 57    85000
## 58    80000
## 59    80000
## 60    80000
## 61    70840
## 62    68336
## 63    62350
## 64    61719
## 65    60000
## 66    58602
## 67    55940
## 68    55000
## 69    55000
## 70    55000
## 71    53987
## 72    50000
## 73    48564
## 74    48446
## 75    48446
## 76    48349
## 77    45000
## 78    43546
## 79    42593
## 80    42593
## 81    42593
## 82    42168
## 83    41789
## 84    40061
## 85    37690
## 86    37681
## 87    37142
## 88    36360

Make a histogram of Radiology salaries

ggplot(data = radio_dept) +
  geom_histogram(mapping = aes(x = totalsal))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Question 6

Create a data frame called dept_summary whose rows are the departments and whose columns are: department size, mean department salary, median department salary, and maximum salary (using totalsal for salary).

step1<- data %>%
  arrange(dept) %>%
  group_by(dept) 

dept_summary <- summarise(step1,
            mean_dept_sal = mean(totalsal),
            median_dept_sal = median(totalsal),
            maximum_dept_sal = max(totalsal),
            total = n())
  dept_summary
## # A tibble: 304 x 5
##    dept                     mean_dept_sal median_dept_sal maximum_dept_sal total
##    <chr>                            <dbl>           <dbl>            <int> <int>
##  1 Acad Sup Prog Student-A~        55798.          50600            115000    15
##  2 Academic Advising               49985.          45000            109625    42
##  3 Accounting Services             57417.          59342            103306    17
##  4 Ackland Art Museum              51543.          41000            140050    19
##  5 Admissions                      57487.          49000            195700    46
##  6 African Studies Center          35970           35970             43475     2
##  7 African, Afri-Amer & Di~        65170.          68000            135608    23
##  8 AHEC Support-Comm Med C~        69789.          64533            135193    26
##  9 Airport                         47351           47351             47351     1
## 10 Alcohol Studies Center          49232.          49180.            84685    16
## # ... with 294 more rows

Order the departments by highest mean salary and print the 10 highest paid departments.

high_mean <- arrange(dept_summary, desc(mean_dept_sal))
head(high_mean)
## # A tibble: 6 x 5
##   dept           mean_dept_sal median_dept_sal maximum_dept_sal total
##   <chr>                  <dbl>           <dbl>            <int> <int>
## 1 Neurosurgery         380058.          395550           621722    17
## 2 Provost              273790           240080           445000     4
## 3 Urology              216291.          237500           520569    20
## 4 Orthopaedics         216205.          240000           554559    31
## 5 Surgery              201917.          176083           677310   121
## 6 Anesthesiology       187177.          222645           620040    94

Order the departments by highest median salary and print the 10 highest paid departments.

high_median <- arrange(dept_summary, desc(median_dept_sal))
head(high_median)
## # A tibble: 6 x 5
##   dept            mean_dept_sal median_dept_sal maximum_dept_sal total
##   <chr>                   <dbl>           <dbl>            <int> <int>
## 1 Neurosurgery          380058.          395550           621722    17
## 2 Provost               273790           240080           445000     4
## 3 Orthopaedics          216205.          240000           554559    31
## 4 Urology               216291.          237500           520569    20
## 5 Anesthesiology        187177.          222645           620040    94
## 6 Carolina Counts       182160           182160           182160     1

Question 7

How many departments have at least 10 employees?

dept_summary_employees <- data %>%
  group_by(dept)  %>%
  tally()%>%
  filter(n >= 10)
dept_summary_employees
## # A tibble: 194 x 2
##    dept                               n
##    <chr>                          <int>
##  1 Acad Sup Prog Student-Athletes    15
##  2 Academic Advising                 42
##  3 Accounting Services               17
##  4 Ackland Art Museum                19
##  5 Admissions                        46
##  6 African, Afri-Amer & Diaspora     23
##  7 AHEC Support-Comm Med Care        26
##  8 Alcohol Studies Center            16
##  9 Allied Health Sciences           101
## 10 American Studies                  18
## # ... with 184 more rows

There are 194 deptartments with at least 10 employees.

Question 8

Which department hired the most people in 2010? Hint: you probably have to modify hiredate.

data_2010 <- mutate(data, hiredate = parse_date_time(hiredate, "%Y%m%d")) %>% 
  filter(year(hiredate) == 2010) %>%
  arrange(dept) %>%
  group_by(dept)
nrow(data_2010)
## [1] 645

There were 645 employees hired in 2010

Question 9

Make a list of all the department names and sort this list alphabetically. What is the 42nd department in this list?

department_names <- data %>%
  select(dept) %>%
  arrange(dept)
head(department_names)
##                             dept
## 1 Acad Sup Prog Student-Athletes
## 2 Acad Sup Prog Student-Athletes
## 3 Acad Sup Prog Student-Athletes
## 4 Acad Sup Prog Student-Athletes
## 5 Acad Sup Prog Student-Athletes
## 6 Acad Sup Prog Student-Athletes
department_names <- data %>%
  select(dept) %>%
  arrange(dept) %>%
  slice(42)#hint given in class
department_names
##                dept
## 1 Academic Advising

Question 10

Plot number of people hired by the CS dept per year vs. year

data <-  mutate(data, hiredate = parse_date_time(hiredate, "%Y%m%d"))
data_expand <- mutate(data, year = year(hiredate))

CS_data <- data_expand %>% 
  filter(dept == "Computer Science") %>% 
  group_by(year) %>% 
  summarise(n = n())
CS_data
## # A tibble: 27 x 2
##     year     n
##    <dbl> <int>
##  1  1964     1
##  2  1967     1
##  3  1978     1
##  4  1981     1
##  5  1984     1
##  6  1985     1
##  7  1987     1
##  8  1989     2
##  9  1991     3
## 10  1993     3
## # ... with 17 more rows
ggplot(data = CS_data)+
  geom_point(mapping = aes(x = year, y = n)) +
  geom_smooth(mapping = aes(x= year, y = n), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

It looks as if there was a spike in hires in the early nineties, with an exponential growth starting in mid 2000s.

Now add STOR, Math, Biostatistics, SILS and Physics to the above plot

CS_data_exp <- data_expand %>% 
  group_by(dept, year) %>% 
  select(dept, year, position) %>% 
 filter(dept == "Computer Science" | dept == "School of Info & Libr Science" | dept == "Biostatistics"| dept ==  "Mathematics"| dept ==  "Physics-Astronomy"| dept == "Statistics and Operations Res") %>% 
  arrange(dept) %>% 
  tally()

CS_data_exp
## # A tibble: 149 x 3
## # Groups:   dept [6]
##    dept           year     n
##    <chr>         <dbl> <int>
##  1 Biostatistics  1967     2
##  2 Biostatistics  1972     1
##  3 Biostatistics  1973     1
##  4 Biostatistics  1980     1
##  5 Biostatistics  1981     1
##  6 Biostatistics  1982     1
##  7 Biostatistics  1985     2
##  8 Biostatistics  1987     1
##  9 Biostatistics  1990     4
## 10 Biostatistics  1991     2
## # ... with 139 more rows
ggplot(data = CS_data_exp) +
  geom_point(mapping = aes(x = year , y = n, color = dept))

#Self-Guided Exploration

This was the part of the project where we basically made our own analyses. My writing has greatly improved from those days haha

age_and_sal <- data %>% 
  filter(dept == "Computer Science" | dept == "School of Info & Libr Science" | dept == "Biostatistics"| dept ==  "Mathematics"| dept ==  "Physics-Astronomy"| dept == "Statistics and Operations Res") %>% 
  select(age, dept, totalsal, hiredate) %>% 
  group_by(dept) %>% 
  summarise(avg_total_sal = mean(totalsal), 
            avg_age = mean(age))
age_and_sal
## # A tibble: 6 x 3
##   dept                          avg_total_sal avg_age
##   <chr>                                 <dbl>   <dbl>
## 1 Biostatistics                       102439.    47.8
## 2 Computer Science                    103307.    47.3
## 3 Mathematics                          93061.    49.7
## 4 Physics-Astronomy                    86372.    47.1
## 5 School of Info & Libr Science        82971.    47.4
## 6 Statistics and Operations Res       100471.    50.2
ggplot(data = age_and_sal) +
  geom_point(mapping = aes(x=avg_age, y = avg_total_sal, color = dept))

The above scatterplot shows the relationship between average age and average salary by the six departments of Computer Science, Biostatistics, Mathematics, Physics, SILS, and STOR at UNC. It seems to be the case that the faculty in the Mathematics and STOR departments tend to be significantly older than the rest. This could be because of the number of younger individuals that come to study Information and Library Science, Physics, Biostatistics or Computer Science at UNC, thus skewing their average. These departments may have more graduate student employees. It could also be reflective of the STOR and Mathematics departments requiring more detail-oriented work in their field, thus taking more time to receive a position.

Psychology <- data %>% 
  filter(dept == "Psychology") %>% 
  select(age, position, totalsal)%>%
  arrange(desc(totalsal))
head(Psychology)
##   age                      position totalsal
## 1  56                      DIRECTOR   196415
## 2  50 Kenan Distinguished Professor   171899
## 3  50                     Professor   165000
## 4  51                     Professor   160000
## 5  45                     Professor   155000
## 6  65       Distinguished Professor   152110
nrow(Psychology)
## [1] 89
ggplot(data = Psychology)+
  geom_point(mapping = aes(x = age, y = totalsal))+
  geom_smooth(mapping = aes(x = age, y = totalsal))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The above graph shows within the psychology department one’s age generally has an effect on how much they are paid. However, looking at the curve I can see that the standard error is fairly high, showing that age may not necessarily be a factor in one’s salary. This leads me to question whether or not the position has a factor on how much someone makes in psychology.

UNC_age_sal <- data %>% 
  select(age, totalsal)

ggplot(data = UNC_age_sal)+
  geom_point(mapping = aes(x = age, y = totalsal))+
  geom_smooth(mapping = aes(x = age, y = totalsal))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

The above graph shows the relationship between a person’s age and their total salary across UNC as a whole. This graph leads me to believe that there is no true relationship between age and salary across UNC. There are many other factors that influence what salary an employee of the university is likely to receive. It seems that factors such as position, department, and fulltime status show a much stronger representation of what causes an individual to have a higher salary at UNC. The questions asked in the problem set are certainly more important for understanding what type of salary a UNC employee receives.