Task 1: Import the csv data set “Professorial Salaries” into R.

Upload the dataset from your computer

file = file.choose()
salary = read.csv(file)

Task 2: Examine the overall relationship between variables in the data set using “GGally” package. Interpret the graph.

library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
vars = salary[, c("Rank", "Discipline", "Yrs.since.phd", "Yrs.service", "NPubs", "Ncits", "Sex", "Salary")]
ggpairs(data = vars, mapping = aes(color = Sex))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Task 3: Describe the study population

library(table1)
## 
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
## 
##     units, units<-
table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary | Sex, data = salary)
Female
(N=39)
Male
(N=358)
Overall
(N=397)
Rank
AssocProf 10 (25.6%) 54 (15.1%) 64 (16.1%)
AsstProf 11 (28.2%) 56 (15.6%) 67 (16.9%)
Prof 18 (46.2%) 248 (69.3%) 266 (67.0%)
Discipline
A 18 (46.2%) 163 (45.5%) 181 (45.6%)
B 21 (53.8%) 195 (54.5%) 216 (54.4%)
Yrs.since.phd
Mean (SD) 16.5 (9.78) 22.9 (13.0) 22.3 (12.9)
Median [Min, Max] 17.0 [2.00, 39.0] 22.0 [1.00, 56.0] 21.0 [1.00, 56.0]
Yrs.service
Mean (SD) 11.6 (8.81) 18.3 (13.2) 17.6 (13.0)
Median [Min, Max] 10.0 [0, 36.0] 18.0 [0, 60.0] 16.0 [0, 60.0]
NPubs
Mean (SD) 20.2 (14.4) 17.9 (13.9) 18.2 (14.0)
Median [Min, Max] 18.0 [1.00, 50.0] 13.0 [1.00, 69.0] 13.0 [1.00, 69.0]
Ncits
Mean (SD) 40.7 (16.2) 40.2 (17.0) 40.2 (16.9)
Median [Min, Max] 36.0 [14.0, 70.0] 35.0 [1.00, 90.0] 35.0 [1.00, 90.0]
Salary
Mean (SD) 101000 (26000) 115000 (30400) 114000 (30300)
Median [Min, Max] 104000 [62900, 161000] 108000 [57800, 232000] 107000 [57800, 232000]

Task 4: Describe the study population with the median (IQR) presented for continuous variables

library(table1)
table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary| Sex, data = salary, render.continuous = c(. = "Median [Q1, Q3]"))
Female
(N=39)
Male
(N=358)
Overall
(N=397)
Rank
AssocProf 10 (25.6%) 54 (15.1%) 64 (16.1%)
AsstProf 11 (28.2%) 56 (15.6%) 67 (16.9%)
Prof 18 (46.2%) 248 (69.3%) 266 (67.0%)
Discipline
A 18 (46.2%) 163 (45.5%) 181 (45.6%)
B 21 (53.8%) 195 (54.5%) 216 (54.4%)
Yrs.since.phd
Median [Q1, Q3] 17.0 [10.0, 23.5] 22.0 [12.0, 33.0] 21.0 [12.0, 32.0]
Yrs.service
Median [Q1, Q3] 10.0 [4.00, 17.5] 18.0 [7.00, 27.0] 16.0 [7.00, 27.0]
NPubs
Median [Q1, Q3] 18.0 [11.0, 27.0] 13.0 [8.00, 25.8] 13.0 [8.00, 26.0]
Ncits
Median [Q1, Q3] 36.0 [28.5, 54.5] 35.0 [28.0, 50.0] 35.0 [28.0, 50.0]
Salary
Median [Q1, Q3] 104000 [77300, 117000] 108000 [92000, 135000] 107000 [91000, 134000]

Task 5: Compare female professors with male professors. Fill in the following table and interpret the findings.

library(compareGroups)
createTable(compareGroups(Sex ~ Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary, data = salary))
## 
## --------Summary descriptives table by 'Sex'---------
## 
## _____________________________________________________ 
##                   Female          Male      p.overall 
##                    N=39          N=358                
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## Rank:                                         0.014   
##     AssocProf   10 (25.6%)     54 (15.1%)             
##     AsstProf    11 (28.2%)     56 (15.6%)             
##     Prof        18 (46.2%)    248 (69.3%)             
## Discipline:                                   1.000   
##     A           18 (46.2%)    163 (45.5%)             
##     B           21 (53.8%)    195 (54.5%)             
## Yrs.since.phd  16.5 (9.78)    22.9 (13.0)    <0.001   
## Yrs.service    11.6 (8.81)    18.3 (13.2)    <0.001   
## NPubs          20.2 (14.4)    17.9 (13.9)     0.352   
## Ncits          40.7 (16.2)    40.2 (17.0)     0.851   
## Salary        101002 (25952) 115090 (30437)   0.003   
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Overall, the statistic show that the data contains a lot less data for women compare to men. Though women tend to have lower salaries and higher publications/citations, they have far less years of service and years since phd compare to men. This highlights that it does not necessary mean that there is unfair pay in the workforce, but rather that pay may be based on experience and how long someone is within the field rather than the publications or citations.