The Current Population Survey (CPS) is used to supplement census information between census years. These data consist of a random sample of 29 persons from the CPS, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. We wish to determine (i) whether wages are related to these characteristics and (ii) whether there is a gender gap in wages.
EDUCATION: Number of years of education.
SOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).
SEX: Indicator variable for sex (1=Female, 0=Male). In the data file used below it is stored as P = Female (perempuan) and L = Male (laki-laki).
EXPERIENCE: Number of years of work experience.
UNION: Indicator variable for union membership (1=Union member, 0=Not union member).
WAGE: Wage (dollars per hour).
AGE: Age (years).
RACE: Race (1=Other, 2=Hispanic, 3=White); not included in this 29-person extract.
OCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other); not included in this 29-person extract.
SECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).
MARR: Marital status (0=Unmarried, 1=Married), stored in the column Married.
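The SEX description above uses 1/0 codes, but the data file stores the letters P (perempuan, female) and L (laki-laki, male). A minimal sketch of mapping one onto the other, assuming P and L are the only codes present; the vector is the Sex column transcribed from the grup output further down, and the names sex_raw, sex_01 and sex_factor are hypothetical.

```r
# Sex column as printed in the data listing (P = female, L = male)
sex_raw <- c("P", "P", "L", "L", "L", "L", "L", "L", "L", "L",
             "L", "L", "L", "L", "L", "P", "L", "L", "L", "L",
             "L", "P", "L", "L", "L", "L", "L", "L", "L")

# Stop if any code other than P or L appears
stopifnot(all(sex_raw %in% c("P", "L")))

# 1 = Female, 0 = Male, matching the SEX description
sex_01 <- ifelse(sex_raw == "P", 1L, 0L)

# Labelled factor, convenient for tables and model output
sex_factor <- factor(sex_raw, levels = c("L", "P"), labels = c("Male", "Female"))

table(sex_factor)
```

Either form can be passed to manova() or boxM() as the grouping variable; the labelled factor simply makes printed output easier to read.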
library("openxlsx")
# Choose the Excel file interactively; pass a file path instead for non-interactive runs
data <- read.xlsx(file.choose())
data
## Education South Sex Experience Union Wage Age Sector Married
## 1 8 0 P 21 0 5.1 35 1 1
## 2 9 0 P 42 0 4.95 57 1 1
## 3 12 0 L 1 0 6.67 19 1 0
## 4 12 0 L 4 0 4 22 0 0
## 5 12 0 L 17 0 7.5 35 0 1
## 6 13 0 L 9 1 13.07 28 0 0
## 7 12 0 L 9 0 19.47 27 0 0
## 8 16 0 L 11 0 13.28 33 1 1
## 9 12 0 L 9 0 8.75 27 0 0
## 10 12 0 L 17 1 11.35 35 0 1
## 11 12 0 L 19 1 11.5 37 1 0
## 12 12 0 L 37 0 7.3 55 1 1
## 13 12 0 L 26 1 22.2 44 1 1
## 14 11 0 L 16 0 3.65 33 0 0
## 15 12 0 L 33 0 20.55 51 0 1
## 16 12 0 P 16 1 5.71 34 1 1
## 17 7 0 L 42 1 7 55 1 1
## 18 12 0 L 9 0 3.75 27 0 0
## 19 12 0 L 23 0 9.56 41 0 1
## 20 12 0 L 8 0 9.36 26 1 1
## 21 10 0 L 30 0 6.5 46 0 1
## 22 12 0 P 8 0 3.35 26 1 1
## 23 10 1 L 27 0 4.45 43 0 0
## 24 8 1 L 27 0 6.5 41 0 1
## 25 9 1 L 30 1 6.25 45 0 0
## 26 9 1 L 29 0 19.98 44 0 1
## 27 7 1 L 44 0 8 57 0 1
## 28 11 1 L 14 0 4.5 31 0 1
## 29 6 1 L 45 0 5.75 57 1 1
head(data)
## Education South Sex Experience Union Wage Age Sector Married
## 1 8 0 P 21 0 5.1 35 1 1
## 2 9 0 P 42 0 4.95 57 1 1
## 3 12 0 L 1 0 6.67 19 1 0
## 4 12 0 L 4 0 4 22 0 0
## 5 12 0 L 17 0 7.5 35 0 1
## 6 13 0 L 9 1 13.07 28 0 0
str(data)
## 'data.frame': 29 obs. of 9 variables:
## $ Education : num 8 9 12 12 12 13 12 16 12 12 ...
## $ South : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Sex : chr "P" "P" "L" "L" ...
## $ Experience: num 21 42 1 4 17 9 9 11 9 17 ...
## $ Union : num 0 0 0 0 0 1 0 0 0 1 ...
## $ Wage : chr "5.1" "4.95" "6.67" "4" ...
## $ Age : num 35 57 19 22 35 28 27 33 27 35 ...
## $ Sector : num 1 1 1 0 0 0 0 1 0 0 ...
## $ Married : num 1 1 0 0 1 0 0 1 0 1 ...
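# Wage was read as character; convert Education (col 1), Experience (col 4) and Wage (col 6) to numeric
data[, c(1, 4, 6)] <- lapply(data[, c(1, 4, 6)], as.numeric)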
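Only Wage actually needs converting (str() shows it as chr), and as.numeric() turns anything it cannot parse into NA with just a warning. A minimal sketch of a stricter conversion; to_numeric_checked() is a hypothetical helper, not part of any package, and the input is the first four Wage values shown by str(data).

```r
# Hypothetical helper: convert to numeric and stop if any non-missing entry
# fails to parse (plain as.numeric() would return NA with only a warning)
to_numeric_checked <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  bad <- !is.na(x) & is.na(out)
  if (any(bad)) {
    stop("could not parse: ", paste(unique(x[bad]), collapse = ", "))
  }
  out
}

# First few Wage values as printed by str(data)
wage_chr <- c("5.1", "4.95", "6.67", "4")
wage <- to_numeric_checked(wage_chr)

# A value with a decimal comma would stop with "could not parse: 5,1"
# to_numeric_checked("5,1")
```

Applied column by column (for example inside lapply()), this makes a silently broken conversion visible before it reaches the normality tests.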
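Assumption 1: multivariate normality of Education, Experience and Wage (Mardia's test)
H0: The data follow a multivariate normal distribution
H1: The data do not follow a multivariate normal distribution
alpha = 5% = 0.05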
# Test statistic
library(MVN)
## Warning: package 'MVN' was built under R version 4.4.1
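# Mardia's test for multivariate normality, Shapiro-Wilk for each variable, and a chi-square Q-Q plot
test = mvn(data[1:29,c(1,4,6)], mvnTest = "mardia", univariateTest = "SW", multivariatePlot = "qq")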
test
## $multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 13.2822084762488 0.208318454443323 YES
## 2 Mardia Kurtosis -0.214329504809508 0.830290111083336 YES
## 3 MVN <NA> <NA> YES
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk Education 0.8590 0.0012 NO
## 2 Shapiro-Wilk Experience 0.9451 0.1365 YES
## 3 Shapiro-Wilk Wage 0.8277 0.0003 NO
##
## $Descriptives
## n Mean Std.Dev Median Min Max 25th 75th Skew
## Education 29 10.827586 2.172375 12 6.00 16.0 9.0 12.00 -0.3918675
## Experience 29 21.482759 12.740881 19 1.00 45.0 9.0 30.00 0.3390871
## Wage 29 8.965517 5.436784 7 3.35 22.2 5.1 11.35 1.1815078
## Kurtosis
## Education -0.0565370
## Experience -1.0901924
## Wage 0.2018184
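p-value, Mardia skewness: 0.2083
p-value, Mardia kurtosis: 0.8303
For Mardia's skewness test, reject H0 if p-value < alpha.
For Mardia's kurtosis test, reject H0 if p-value < alpha.
Both p-values exceed alpha (0.2083 > 0.05 and 0.8303 > 0.05), so H0 is not rejected: the data can be treated as multivariate normal. Note, however, that the Shapiro-Wilk tests reject univariate normality for Education (p = 0.0012) and Wage (p = 0.0003); since multivariate normality implies that each variable is itself normally distributed, this conclusion should be read with some caution.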
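To make the two Mardia statistics less of a black box, here is a minimal sketch that computes them directly, assuming the large-sample definitions: ML covariance S (divisor n), m_ij = (x_i - xbar)' S^-1 (x_j - xbar), skewness b1 = sum(m_ij^3) / n^2 with n*b1/6 referred to chi-square on p(p+1)(p+2)/6 df (10 for p = 3), and kurtosis b2 = sum(m_ii^2) / n with (b2 - p(p+2)) / sqrt(8p(p+2)/n) referred to N(0, 1). The function name mardia_by_hand() is hypothetical, no small-sample correction is applied, and the three vectors are Education, Experience and Wage transcribed from the data listing above.

```r
# Hypothetical helper: Mardia's multivariate skewness and kurtosis (large-sample form)
mardia_by_hand <- function(X) {
  X  <- as.matrix(X)
  n  <- nrow(X)
  p  <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)  # centre each column
  S  <- crossprod(Xc) / n                       # ML covariance (divisor n)
  D  <- Xc %*% solve(S) %*% t(Xc)               # D[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  b1 <- sum(D^3) / n^2                          # multivariate skewness
  b2 <- sum(diag(D)^2) / n                      # multivariate kurtosis
  skew <- n * b1 / 6
  df   <- p * (p + 1) * (p + 2) / 6
  kurt <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  list(skew = skew, df = df,
       p_skew = pchisq(skew, df, lower.tail = FALSE),
       kurt = kurt, p_kurt = 2 * pnorm(-abs(kurt)))
}

education  <- c(8, 9, 12, 12, 12, 13, 12, 16, 12, 12, 12, 12, 12, 11, 12,
                12, 7, 12, 12, 12, 10, 12, 10, 8, 9, 9, 7, 11, 6)
experience <- c(21, 42, 1, 4, 17, 9, 9, 11, 9, 17, 19, 37, 26, 16, 33,
                16, 42, 9, 23, 8, 30, 8, 27, 27, 30, 29, 44, 14, 45)
wage       <- c(5.1, 4.95, 6.67, 4, 7.5, 13.07, 19.47, 13.28, 8.75, 11.35,
                11.5, 7.3, 22.2, 3.65, 20.55, 5.71, 7, 3.75, 9.56, 9.36,
                6.5, 3.35, 4.45, 6.5, 6.25, 19.98, 8, 4.5, 5.75)

res <- mardia_by_hand(cbind(education, experience, wage))
res
```

If the transcription and the assumed formulas match what MVN uses, res should agree with the Mardia rows of test$multivariateNormality; a mismatch would point to a different covariance divisor or correction in the package.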
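Assumption 2: homogeneity of the covariance matrices of the two Sex groups (Box's M test)
H0: The covariance matrices are equal
H1: The covariance matrices are not equal
alpha = 5% = 0.05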
library(biotools)
## Warning: package 'biotools' was built under R version 4.4.1
## Loading required package: MASS
## ---
## biotools version 4.2
# Grouping variable: Sex (P = female, L = male)
grup <- data$Sex;grup
## [1] "P" "P" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "L" "P" "L" "L" "L"
## [20] "L" "L" "P" "L" "L" "L" "L" "L" "L" "L"
head(grup)
## [1] "P" "P" "L" "L" "L" "L"
boxM(data = data[1:29,c(1,4,6)], grouping = grup)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: data[1:29, c(1, 4, 6)]
## Chi-Sq (approx.) = 5.2636, df = 6, p-value = 0.5105
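Reject H0 if p-value < alpha; otherwise do not reject H0.
Since the p-value (0.5105) > alpha (0.05), H0 is not rejected: the covariance matrices of the two Sex groups can be treated as equal, so the homogeneity assumption is satisfied. With only 4 women in the sample, however, the female covariance matrix is estimated from very few observations and the test has little power.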
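Box's M compares each group's covariance matrix S_i with the pooled matrix S_p through M = (N - k) ln|S_p| - sum((n_i - 1) ln|S_i|), then scales it by (1 - c) with c = (2p^2 + 3p - 1) / (6(p + 1)(k - 1)) * (sum(1/(n_i - 1)) - 1/(N - k)) to get an approximate chi-square on p(p + 1)(k - 1)/2 df. A minimal sketch, assuming this is the approximation behind biotools::boxM (its df of 6 for p = 3 and k = 2 is consistent with that); box_m_by_hand() is a hypothetical name and the data are transcribed from the listing above.

```r
# Hypothetical helper: Box's M test with the chi-square approximation
box_m_by_hand <- function(X, group) {
  X     <- as.matrix(X)
  group <- factor(group)
  p     <- ncol(X)
  k     <- nlevels(group)
  df_i  <- as.numeric(table(group)) - 1                       # n_i - 1 per group
  S_i   <- lapply(levels(group), function(g) cov(X[group == g, , drop = FALSE]))
  S_p   <- Reduce(`+`, Map(`*`, S_i, df_i)) / sum(df_i)       # pooled covariance
  M     <- sum(df_i) * log(det(S_p)) - sum(df_i * log(sapply(S_i, det)))
  c1    <- (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) *
           (sum(1 / df_i) - 1 / sum(df_i))
  chisq <- M * (1 - c1)
  df    <- p * (p + 1) * (k - 1) / 2
  list(M = M, chisq = chisq, df = df,
       p_value = pchisq(chisq, df, lower.tail = FALSE))
}

education  <- c(8, 9, 12, 12, 12, 13, 12, 16, 12, 12, 12, 12, 12, 11, 12,
                12, 7, 12, 12, 12, 10, 12, 10, 8, 9, 9, 7, 11, 6)
experience <- c(21, 42, 1, 4, 17, 9, 9, 11, 9, 17, 19, 37, 26, 16, 33,
                16, 42, 9, 23, 8, 30, 8, 27, 27, 30, 29, 44, 14, 45)
wage       <- c(5.1, 4.95, 6.67, 4, 7.5, 13.07, 19.47, 13.28, 8.75, 11.35,
                11.5, 7.3, 22.2, 3.65, 20.55, 5.71, 7, 3.75, 9.56, 9.36,
                6.5, 3.35, 4.45, 6.5, 6.25, 19.98, 8, 4.5, 5.75)
sex <- c("P", "P", rep("L", 13), "P", rep("L", 5), "P", rep("L", 7))

X  <- cbind(education, experience, wage)
bm <- box_m_by_hand(X, sex)
bm
```

The female group contributes only n_i - 1 = 3 degrees of freedom, the minimum for a non-singular 3 x 3 covariance matrix, which is why its log-determinant dominates M here.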
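H0: Sex has no effect on the response variables (the mean vectors of Education, Experience and Wage are equal for men and women)
H1: Sex has an effect on the response variables (the mean vectors differ)
alpha = 5% = 0.05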
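# One-way MANOVA: Education, Experience and Wage as responses, Sex as the factor
owm = manova(cbind(data$Education, data$Experience, data$Wage)~data$Sex)
# summary() reports Pillai's trace by default
summary(owm)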
## Df Pillai approx F num Df den Df Pr(>F)
## data$Sex 1 0.09884 0.91401 3 25 0.4484
## Residuals 27
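Reject H0 if p-value < alpha; otherwise do not reject H0.
Since the p-value (0.4484) > alpha (0.05), H0 is not rejected: at the 5% level there is no significant difference between men and women in the joint mean of Education, Experience and Wage. Because MANOVA tests the three variables together, this does not by itself show that Sex has no effect on wages; question (ii) about a gender gap in wages would call for a test on Wage alone.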
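Pillai's trace is V = tr(H (H + E)^-1), where H and E are the between-group and within-group SSCP matrices. With only two groups, the F approximation that summary.manova() reports reduces to F = ((N - k - p + 1) / p) * V / (1 - V) on p and N - k - p + 1 df, i.e. 3 and 25 here. A minimal sketch computing this directly, valid for two groups only; pillai_two_groups() is a hypothetical name and the data are transcribed from the listing above.

```r
# Hypothetical helper: Pillai's trace for a one-way MANOVA with two groups
pillai_two_groups <- function(Y, group) {
  Y     <- as.matrix(Y)
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  N <- nrow(Y); p <- ncol(Y); k <- 2
  grand <- colMeans(Y)
  H <- matrix(0, p, p)  # between-group SSCP
  E <- matrix(0, p, p)  # within-group SSCP
  for (g in levels(group)) {
    Yg <- Y[group == g, , drop = FALSE]
    d  <- colMeans(Yg) - grand
    H  <- H + nrow(Yg) * tcrossprod(d)
    E  <- E + crossprod(sweep(Yg, 2, colMeans(Yg)))
  }
  V      <- sum(diag(H %*% solve(H + E)))
  df2    <- N - k - p + 1
  f_stat <- (df2 / p) * V / (1 - V)
  list(pillai = V, f_stat = f_stat, df1 = p, df2 = df2,
       p_value = pf(f_stat, p, df2, lower.tail = FALSE))
}

education  <- c(8, 9, 12, 12, 12, 13, 12, 16, 12, 12, 12, 12, 12, 11, 12,
                12, 7, 12, 12, 12, 10, 12, 10, 8, 9, 9, 7, 11, 6)
experience <- c(21, 42, 1, 4, 17, 9, 9, 11, 9, 17, 19, 37, 26, 16, 33,
                16, 42, 9, 23, 8, 30, 8, 27, 27, 30, 29, 44, 14, 45)
wage       <- c(5.1, 4.95, 6.67, 4, 7.5, 13.07, 19.47, 13.28, 8.75, 11.35,
                11.5, 7.3, 22.2, 3.65, 20.55, 5.71, 7, 3.75, 9.56, 9.36,
                6.5, 3.35, 4.45, 6.5, 6.25, 19.98, 8, 4.5, 5.75)
sex <- c("P", "P", rep("L", 13), "P", rep("L", 5), "P", rep("L", 7))

out <- pillai_two_groups(cbind(education, experience, wage), sex)
out
```

Agreement with summary(owm) is a useful sanity check that the SSCP matrices are built correctly.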