TRM Practical Data Analysis - Basic level

Lecture 4. Analysis of difference - Numeric data

Task 1. Determine whether salaries differed between male and female professors

1.1 Study design:

A cross-sectional study of 397 professors to determine whether salaries differed between male and female professors.
Null hypothesis: salaries did not differ between male and female professors.
Alternative hypothesis: salaries differed between male and female professors.

1.2 Import the “Professorial Salaries” dataset and name it “salary”

salary = read.csv("C:\\Thach\\UTS\\Teaching\\TRM\\Practical Data Analysis\\2024_Autumn semester\\Data\\Professorial Salaries.csv")
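
The absolute Windows path above only resolves on a machine with that exact folder structure. Below is a minimal sketch of a more portable import. It assumes a hypothetical project layout in which the CSV sits in a `Data` sub-folder. The toy data frame and the temporary folder are stand-ins so the example runs on its own; they are not the course dataset.

```r
# Hypothetical stand-in for the real file: a tiny data frame written to a
# temporary "Data" folder so that the example is self-contained.
project_dir = tempdir()
dir.create(file.path(project_dir, "Data"), showWarnings = FALSE)
toy = data.frame(Sex = c("Female", "Male", "Male"),
                 Salary = c(90000, 110000, 125000))
csv_path = file.path(project_dir, "Data", "Professorial Salaries.csv")
write.csv(toy, csv_path, row.names = FALSE)

# file.path() joins the path components with a separator that works on any OS.
salary_toy = read.csv(csv_path)

# Quick checks after any import: dimensions, column types, missing values.
dim(salary_toy)
str(salary_toy)
colSums(is.na(salary_toy))
```

With the real data, the same three checks can be run on `salary` straight after `read.csv()`.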

1.3 Describe characteristics of the study sample by sex

library(table1)

## 
## Attaching package: 'table1'

## The following objects are masked from 'package:base':
## 
##     units, units<-

table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary | Sex, data = salary)

	Female (N=39)	Male (N=358)	Overall (N=397)
Rank
AssocProf	10 (25.6%)	54 (15.1%)	64 (16.1%)
AsstProf	11 (28.2%)	56 (15.6%)	67 (16.9%)
Prof	18 (46.2%)	248 (69.3%)	266 (67.0%)
Discipline
A	18 (46.2%)	163 (45.5%)	181 (45.6%)
B	21 (53.8%)	195 (54.5%)	216 (54.4%)
Yrs.since.phd
Mean (SD)	16.5 (9.78)	22.9 (13.0)	22.3 (12.9)
Median [Min, Max]	17.0 [2.00, 39.0]	22.0 [1.00, 56.0]	21.0 [1.00, 56.0]
Yrs.service
Mean (SD)	11.6 (8.81)	18.3 (13.2)	17.6 (13.0)
Median [Min, Max]	10.0 [0, 36.0]	18.0 [0, 60.0]	16.0 [0, 60.0]
NPubs
Mean (SD)	20.2 (14.4)	17.9 (13.9)	18.2 (14.0)
Median [Min, Max]	18.0 [1.00, 50.0]	13.0 [1.00, 69.0]	13.0 [1.00, 69.0]
Ncits
Mean (SD)	40.7 (16.2)	40.2 (17.0)	40.2 (16.9)
Median [Min, Max]	36.0 [14.0, 70.0]	35.0 [1.00, 90.0]	35.0 [1.00, 90.0]
Salary
Mean (SD)	101000 (26000)	115000 (30400)	114000 (30300)
Median [Min, Max]	104000 [62900, 161000]	108000 [57800, 232000]	107000 [57800, 232000]
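
To see what the table summarises, the same kind of figures can be obtained with base R. The sketch below uses a small hypothetical data frame that only borrows the column names `Sex`, `Rank` and `Salary`; the values are invented for illustration.

```r
# Hypothetical toy data with the same column names as the salary dataset.
toy = data.frame(
  Sex    = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  Rank   = c("Prof", "AsstProf", "Prof", "Prof", "AssocProf", "Prof", "AsstProf"),
  Salary = c(95000, 72000, 110000, 120000, 88000, 135000, 76000)
)

# Numeric variable: mean, SD and median of Salary within each sex.
num_summary = aggregate(Salary ~ Sex, data = toy,
                        FUN = function(x) c(mean = mean(x), sd = sd(x), median = median(x)))
num_summary

# Categorical variable: counts and column percentages of Rank by sex.
counts = table(toy$Rank, toy$Sex)
percents = round(100 * prop.table(counts, margin = 2), 1)
counts
percents
```

`margin = 2` gives percentages within each sex (column percentages), which is how the Rank and Discipline rows of the table are laid out.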

1.4 Check the distribution of professors’ salaries

library(ggplot2)
p = ggplot(data = salary, aes(x = Salary))
p1 = p + geom_histogram(aes(y = after_stat(density)), color = "white", fill = "blue")
p2 = p1 + geom_density(col="red")
p2 + ggtitle("Distribution of professors' salaries") + theme_bw()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
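
The `stat_bin()` message appears because no bin width was supplied, so ggplot2 falls back to 30 bins. One common option, shown here only as a sketch, is the Freedman-Diaconis rule (bin width = 2 × IQR / n^(1/3)). The salaries below are simulated and hypothetical, and the plotting step runs only if ggplot2 is installed.

```r
# Simulated salaries (hypothetical values, not the course dataset).
set.seed(1)
toy = data.frame(Salary = rlnorm(397, meanlog = log(110000), sdlog = 0.25))

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
fd_binwidth = 2 * IQR(toy$Salary) / nrow(toy)^(1/3)
fd_binwidth

# Passing an explicit binwidth silences the `stat_bin()` message.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_fd = ggplot(toy, aes(x = Salary)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = fd_binwidth,
                   color = "white", fill = "blue") +
    geom_density(col = "red") +
    theme_bw()
  print(p_fd)
}
```

Any sensible width works; the point is to choose it deliberately rather than accept the default.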

Describe the differences in salaries between male and female professors (optional)

p = ggplot(data = salary, aes(x = Sex,  y = Salary, fill = Sex, col = Sex))
p1 = p + geom_boxplot(col = "black") + geom_jitter(alpha = 0.05) 
p1 + labs(x = "Sex", y = "Professors' salaries (USD)") + ggtitle("Professors' salaries by sex") + theme_bw()

1.5 Two-sample t-test (Welch) to determine whether salaries differed between male and female professors

t.test(Salary ~ Sex, data = salary)

## 
##  Welch Two Sample t-test
## 
## data:  Salary by Sex
## t = -3.1615, df = 50.122, p-value = 0.002664
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -23037.916  -5138.102
## sample estimates:
## mean in group Female   mean in group Male 
##             101002.4             115090.4

Interpretation: There is evidence (P = 0.003) that the mean salary of male professors was $14,088 higher than that of female professors (95% CI: $5,138 to $23,038).
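
The output is headed "Welch Two Sample t-test" because `t.test()` does not assume equal variances by default (`var.equal = FALSE`). As a rough check of where t, df and the confidence interval come from, the sketch below recomputes them from summary statistics. `welch_from_summary()` is a hypothetical helper written for this illustration. The SDs are the rounded values from the table in 1.3, so the result only approximates the reported output.

```r
# Welch's t statistic and Satterthwaite degrees of freedom from summary statistics.
# welch_from_summary() is a hypothetical helper, not part of any package.
welch_from_summary = function(m1, s1, n1, m2, s2, n2) {
  se2_1 = s1^2 / n1                      # squared standard error of mean 1
  se2_2 = s2^2 / n2                      # squared standard error of mean 2
  se = sqrt(se2_1 + se2_2)               # standard error of the difference
  t_stat = (m1 - m2) / se
  df = (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))
  p = 2 * pt(-abs(t_stat), df)           # two-sided p-value
  ci = (m1 - m2) + c(-1, 1) * qt(0.975, df) * se
  list(t = t_stat, df = df, p.value = p, conf.int = ci)
}

# Group 1 = Female, group 2 = Male. Means come from the t-test output;
# SDs are the rounded values from the table, so expect small discrepancies.
check = welch_from_summary(m1 = 101002.4, s1 = 26000, n1 = 39,
                           m2 = 115090.4, s2 = 30400, n2 = 358)
check
```

Applied to unrounded group means and SDs, the helper should reproduce the figures reported by `t.test()`.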

Task 2. Determine whether the number of publications differed between male and female professors in a subgroup of 26 associate professors in the Theoretical discipline

2.1 Study design:

A cross-sectional study of 26 associate professors in the Theoretical discipline to determine whether the number of publications differed between male and female associate professors.
Null hypothesis: the number of publications did not differ between male and female associate professors.
Alternative hypothesis: the number of publications differed between male and female associate professors.

2.2 Select a subgroup of associate professors in the Theoretical discipline

Assoc.A = subset(salary, Rank == "AssocProf" & Discipline == "A")
dim(Assoc.A)

## [1] 26  9
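
A cross-tabulation is a cheap way to confirm that the subgroup has the expected size before analysing it. The sketch below uses a hypothetical toy data frame (same column names and codes, invented values). It also shows a detail worth knowing: `subset()` drops rows where the condition is `NA`, whereas plain logical indexing keeps them as all-`NA` rows.

```r
# Hypothetical toy data using the same column names and codes as the salary dataset.
toy = data.frame(
  Rank       = c("AssocProf", "AssocProf", "Prof", "AsstProf", "AssocProf", NA),
  Discipline = c("A", "B", "A", "A", "A", "A"),
  NPubs      = c(12, 30, 45, 5, 8, 20)
)

# Cross-tabulate first: the AssocProf/A cell is the expected subgroup size.
cell_counts = table(toy$Rank, toy$Discipline)
cell_counts

# subset() silently drops rows where the condition evaluates to NA ...
sub_a = subset(toy, Rank == "AssocProf" & Discipline == "A")

# ... whereas plain logical indexing keeps them as all-NA rows.
idx_a = toy[toy$Rank == "AssocProf" & toy$Discipline == "A", ]

nrow(sub_a)
nrow(idx_a)
```

If the two row counts differ on real data, the grouping variables contain missing values that deserve a look.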

2.3 Describe characteristics of associate professors in Theoretical discipline by sex

library(table1)
table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + NPubs + Ncits + Salary | Sex, data = Assoc.A)

	Female (N=4)	Male (N=22)	Overall (N=26)
Rank
AssocProf	4 (100%)	22 (100%)	26 (100%)
Discipline
A	4 (100%)	22 (100%)	26 (100%)
Yrs.since.phd
Mean (SD)	18.5 (8.19)	17.7 (12.2)	17.8 (11.5)
Median [Min, Max]	19.0 [10.0, 26.0]	12.5 [8.00, 49.0]	13.0 [8.00, 49.0]
Yrs.service
Mean (SD)	15.5 (8.70)	13.1 (12.3)	13.5 (11.7)
Median [Min, Max]	15.0 [8.00, 24.0]	8.00 [1.00, 49.0]	8.00 [1.00, 49.0]
NPubs
Mean (SD)	10.0 (4.97)	21.6 (14.2)	19.8 (13.8)
Median [Min, Max]	10.0 [4.00, 16.0]	16.0 [3.00, 48.0]	16.0 [3.00, 48.0]
Ncits
Mean (SD)	38.5 (18.5)	44.3 (15.2)	43.4 (15.5)
Median [Min, Max]	37.5 [19.0, 60.0]	47.0 [24.0, 69.0]	47.0 [19.0, 69.0]
Salary
Mean (SD)	72100 (6400)	85000 (10600)	83100 (11100)
Median [Min, Max]	74100 [62900, 77500]	82400 [70000, 108000]	81900 [62900, 108000]

2.4 Check the distribution of number of publications among associate professors in Theoretical discipline

p = ggplot(data = Assoc.A, aes(x = NPubs))
p1 = p + geom_histogram(aes(y = after_stat(density)), color = "white", fill = "blue")
p2 = p1 + geom_density(col="red")
p2 + ggtitle("Distribution of number of publications among associate professors in Theoretical discipline") + theme_bw()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Describe the differences in number of publications between male and female associate professors in Theoretical discipline (optional)

p = ggplot(data = Assoc.A, aes(x = Sex,  y = NPubs, fill = Sex, col = Sex))
p1 = p + geom_boxplot(col = "black") + geom_jitter(alpha = 0.05) 
p1 + labs(x = "Sex", y = "Number of publications") + ggtitle("Number of publications by sex") + theme_bw()

2.5 Mann-Whitney non-parametric test to determine whether number of publications differed between male and female associate professors in Theoretical discipline

wilcox.test(NPubs ~ Sex, data = Assoc.A)

## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  NPubs by Sex
## W = 21.5, p-value = 0.1168
## alternative hypothesis: true location shift is not equal to 0
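
The statistic W can be reproduced by hand, which makes clear what the test compares. The sketch below uses two small hypothetical vectors (not the course data). All observations are ranked together, and W is the rank sum of the first group minus its smallest possible value. With the formula `NPubs ~ Sex`, the first group is the first level of `Sex`, here Female.

```r
# Hypothetical publication counts for two small groups (not the course data).
female_toy = c(4, 8, 12, 16)
male_toy   = c(3, 9, 15, 16, 22, 30, 41, 48)

# Rank all observations together; ties receive the average of their ranks.
ranks = rank(c(female_toy, male_toy))
n_f = length(female_toy)

# W = rank sum of the first group minus its smallest possible value n(n + 1) / 2.
w_by_hand = sum(ranks[seq_len(n_f)]) - n_f * (n_f + 1) / 2

# wilcox.test() reports the same statistic; exact = FALSE requests the normal
# approximation, which is what R falls back to when there are ties.
w_test = wilcox.test(female_toy, male_toy, exact = FALSE)
w_by_hand
unname(w_test$statistic)
```

The tie at 16 is why an exact p-value cannot be computed here, which mirrors the warning printed for the real data.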

2.6 Bootstrap to determine whether number of publications differed between male and female associate professors in Theoretical discipline

Differences in the MEAN number of publications

library(simpleboot)

## Warning: package 'simpleboot' was built under R version 4.3.2

## Simple Bootstrap Routines (1.1-7)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

male = Assoc.A %>% filter(Sex == "Male")
female = Assoc.A %>% filter(Sex == "Female")
set.seed(1234)

b.means = two.boot(male$NPubs, female$NPubs, mean, R = 1000)
hist(b.means$t, breaks = 20)

quantile(b.means$t, probs=c(0.025, 0.50, 0.975))

##      2.5%       50%     97.5% 
##  4.976136 11.568182 18.273295
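
The idea behind the two-sample bootstrap can be sketched in a few lines of base R. This illustrates the resampling scheme only and is not simpleboot's implementation. `boot_diff()` is a hypothetical helper and the vectors are invented. Each group is resampled with replacement, independently of the other, and the difference in the chosen statistic is recorded for every replicate.

```r
# Hypothetical publication counts (not the course data).
male_toy   = c(3, 9, 15, 16, 22, 30, 41, 48)
female_toy = c(4, 8, 12, 16)

# Hypothetical helper: resample each group with replacement, independently,
# and record stat(group 1) - stat(group 2) for every replicate.
boot_diff = function(x, y, stat, R = 1000) {
  replicate(R, stat(sample(x, replace = TRUE)) - stat(sample(y, replace = TRUE)))
}

set.seed(1234)
diff_means = boot_diff(male_toy, female_toy, mean, R = 1000)

# Percentile 95% confidence interval and bootstrap median of the difference.
ci_means = quantile(diff_means, probs = c(0.025, 0.50, 0.975))
ci_means
```

Passing `median` instead of `mean` as `stat` gives the corresponding interval for the difference in medians.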

Differences in the MEDIAN number of publications

set.seed(1234)

b.medians = two.boot(male$NPubs, female$NPubs, median, R = 1000)
hist(b.medians$t, breaks = 20)

quantile(b.medians$t, probs=c(0.025, 0.50, 0.975))

##    2.5%     50%   97.5% 
## -0.5000  7.0000 19.5125

Interpretation: The Mann-Whitney test gave no clear evidence (P = 0.12) that the number of publications differed between male and female associate professors in the Theoretical discipline. The bootstrap estimated that male associate professors had about 11.6 more publications on average than female associate professors (95% CI: 5.0 to 18.3), while the bootstrap difference in medians was 7.0 publications (95% CI: -0.5 to 19.5), an interval that includes zero. With only 4 female associate professors in this subgroup, these estimates are imprecise.

TRM Practical Data Analysis - Basic level

Thach Tran

2024-02-10

Task 3. Save your work and upload it to your Rpubs account