Determinants of Wages Data (CPS 1985)

Data Description

Cross-section data originating from the May 1985 Current Population Survey by the US Census Bureau.

Loading the Dataset

wagedata <- read.csv("CPS1985.csv", header = TRUE, sep = ",")
dim(wagedata)
## [1] 534  11
head(wagedata)
##    wage education experience age ethnicity region gender occupation
## 1  5.10         8         21  35  hispanic  other female     worker
## 2  4.95         9         42  57      cauc  other female     worker
## 3  6.67        12          1  19      cauc  other   male     worker
## 4  4.00        12          4  22      cauc  other   male     worker
## 5  7.50        12         17  35      cauc  other   male     worker
## 6 13.07        13          9  28      cauc  other   male     worker
##          sector union married
## 1 manufacturing    no     yes
## 2 manufacturing    no     yes
## 3 manufacturing    no      no
## 4         other    no      no
## 5         other    no     yes
## 6         other   yes      no
str(wagedata)
## 'data.frame':    534 obs. of  11 variables:
##  $ wage      : num  5.1 4.95 6.67 4 7.5 ...
##  $ education : int  8 9 12 12 12 13 10 12 16 12 ...
##  $ experience: int  21 42 1 4 17 9 27 9 11 9 ...
##  $ age       : int  35 57 19 22 35 28 43 27 33 27 ...
##  $ ethnicity : chr  "hispanic" "cauc" "cauc" "cauc" ...
##  $ region    : chr  "other" "other" "other" "other" ...
##  $ gender    : chr  "female" "female" "male" "male" ...
##  $ occupation: chr  "worker" "worker" "worker" "worker" ...
##  $ sector    : chr  "manufacturing" "manufacturing" "manufacturing" "other" ...
##  $ union     : chr  "no" "no" "no" "no" ...
##  $ married   : chr  "yes" "yes" "no" "no" ...

The dataset has 11 variables and 534 observations. There are 4 numeric variables (wage, education, experience, age) and 7 categorical variables.
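
The variable counts can also be checked programmatically rather than read off the `str()` output. The sketch below is self-contained, so it rebuilds only the six rows printed by `head(wagedata)` in a hypothetical object named `wage_sample`; the column types match the full file, but the row count does not.

```r
# Hypothetical stand-in for wagedata: the six rows printed by head() above.
wage_sample <- data.frame(
  wage       = c(5.10, 4.95, 6.67, 4.00, 7.50, 13.07),
  education  = c(8L, 9L, 12L, 12L, 12L, 13L),
  experience = c(21L, 42L, 1L, 4L, 17L, 9L),
  age        = c(35L, 57L, 19L, 22L, 35L, 28L),
  ethnicity  = c("hispanic", "cauc", "cauc", "cauc", "cauc", "cauc"),
  region     = rep("other", 6),
  gender     = c("female", "female", "male", "male", "male", "male"),
  occupation = rep("worker", 6),
  sector     = c("manufacturing", "manufacturing", "manufacturing",
                 "other", "other", "other"),
  union      = c("no", "no", "no", "no", "no", "yes"),
  married    = c("yes", "yes", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Flag each column as numeric or not, then count both groups.
is_num        <- vapply(wage_sample, is.numeric, logical(1))
n_numeric     <- sum(is_num)
n_categorical <- sum(!is_num)

names(wage_sample)[is_num]   # names of the numeric columns
names(wage_sample)[!is_num]  # names of the character (categorical) columns
```

Running the same three lines on `wagedata` itself should give the counts stated above.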

# Return the mean, median, minimum, and maximum of one column of wagedata,
# selected by column index; missing values are ignored.
stats <- function(columnN) {
  x <- wagedata[, columnN]
  data.frame(
    Mean   = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    Min    = min(x, na.rm = TRUE),
    Max    = max(x, na.rm = TRUE)
  )
}

# One labelled row of statistics per numeric variable (columns 1 to 4).
Var1 <- cbind(Variable = "Wage", stats(1))
Var2 <- cbind(Variable = "Education", stats(2))
Var3 <- cbind(Variable = "Experience", stats(3))
Var4 <- cbind(Variable = "Age", stats(4))

Summary Statistics of the Numeric Variables

rbind(Var1, Var2, Var3, Var4)
##     Variable      Mean Median Min  Max
## 1       Wage  9.024064   7.78   1 44.5
## 2  Education 13.018727  12.00   2 18.0
## 3 Experience 17.822097  15.00   0 55.0
## 4        Age 36.833333  35.00  18 64.0
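
As an alternative to building one row per variable by hand, the same four statistics can be computed for all numeric columns in a single pass with `sapply()`. This is only a sketch: `num_sample` and `summarise_column` are hypothetical names, and the data are the six rows printed by `head(wagedata)`, so the numbers it produces differ from the full-sample table above.

```r
# Hypothetical stand-in: numeric columns of the six rows shown by head().
num_sample <- data.frame(
  wage       = c(5.10, 4.95, 6.67, 4.00, 7.50, 13.07),
  education  = c(8, 9, 12, 12, 12, 13),
  experience = c(21, 42, 1, 4, 17, 9),
  age        = c(35, 57, 19, 22, 35, 28)
)

# Summary statistics for a single numeric vector, ignoring missing values.
summarise_column <- function(x) {
  c(Mean   = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    Min    = min(x, na.rm = TRUE),
    Max    = max(x, na.rm = TRUE))
}

# sapply() returns statistics in rows and variables in columns;
# t() flips it so that each variable is one row, as in the table above.
summary_table <- t(sapply(num_sample, summarise_column))
summary_table
```

On the full data, `num_sample` would be replaced by `wagedata[, 1:4]`.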

Ethnicity vs. Sector of Employment

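# Named sector_by_ethnicity rather than "table" to avoid masking base::table().
sector_by_ethnicity <- ftable(ethnicity ~ sector, wagedata)
sector_by_ethnicity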
##               ethnicity cauc hispanic other
## sector                                     
## construction              21        0     3
## manufacturing             81        4    14
## other                    338       23    50

The table above yields two observations:

  1. The construction sector has no Hispanic workers in this sample.
  2. Most Caucasian workers (338 of 440, about 77%) work in sectors other than construction and manufacturing.
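
Raw counts are hard to compare across groups of very different sizes, so it can help to convert them to within-group shares. The sketch below re-enters the counts from the cross-tabulation by hand into a matrix (the names `counts` and `sector_share` are hypothetical) and uses `prop.table()` to compute, for each ethnic group, the share of workers in each sector.

```r
# Counts copied from the ftable output (rows: sector, columns: ethnicity).
counts <- matrix(
  c( 21,  0,  3,
     81,  4, 14,
    338, 23, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    sector    = c("construction", "manufacturing", "other"),
    ethnicity = c("cauc", "hispanic", "other")
  )
)

# Share of each ethnic group working in each sector (each column sums to 1).
sector_share <- prop.table(counts, margin = 2)
round(sector_share, 3)

# Group sizes, for judging how much weight a zero cell can bear.
colSums(counts)
```

The "other" sector accounts for roughly 77% of Caucasian, 85% of Hispanic, and 75% of other workers, so concentration outside construction and manufacturing is not specific to one group. The sample contains only 27 Hispanic workers, so the empty construction cell rests on a small group.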

plot(wagedata$experience, wagedata$wage, main = "Relationship between wage rate and experience",
     xlab = "Experience", ylab = "Wage rate ($)", pch = 19)

Relationship between experience and wage rate

The scatter plot above shows no clear linear relationship between workers' experience and their wage rate. A scatter plot alone, however, cannot establish that experience has no effect on wages.
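
A visual impression can be backed up with a number. The sketch below shows one way to quantify the relationship: Pearson and Spearman correlations, plus a log-wage regression with a quadratic experience term, which is an assumed specification and not one taken from this report. To stay self-contained it uses only the six rows printed by `head(wagedata)` in a hypothetical object `wage_sample`, so its output says nothing about the full sample.

```r
# Hypothetical stand-in: wage and experience from the six rows shown by head().
wage_sample <- data.frame(
  wage       = c(5.10, 4.95, 6.67, 4.00, 7.50, 13.07),
  experience = c(21, 42, 1, 4, 17, 9)
)

# Linear (Pearson) and rank-based (Spearman) correlation.
pearson  <- cor(wage_sample$experience, wage_sample$wage)
spearman <- cor(wage_sample$experience, wage_sample$wage, method = "spearman")

# Log-wage regression with a quadratic experience term, which lets the
# fitted relationship rise and then flatten or fall.
fit <- lm(log(wage) ~ experience + I(experience^2), data = wage_sample)
coef(fit)
```

On the full data, `wage_sample` would be replaced by `wagedata`, and `summary(fit)` would report standard errors for the experience terms.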