Exploratory Data Analysis in R.

Wages in Belgium

Description: a cross-section from 1994
number of observations : 1472
observation : individuals
country : Belgium
Usage: data(Bwages)

Format

A dataframe containing :

  • wage: gross hourly wage rate in euro
  • educ: education level from 1 [low] to 5 [high]
  • exper: years of experience
  • sex: a factor with levels (male, female)

Source: European Community Household Panel.
References: Verbeek, Marno (2004) A guide to modern econometrics, John Wiley and Sons, http://www.econ.kuleuven.ac.be/GME, chapter 3.

First we load the data

x <- data.frame(Ecdat::Bwages)

Now we can review the summary information of the dataset

str(x)
## 'data.frame':    1472 obs. of  4 variables:
##  $ wage : num  7.78 4.82 10.56 7.04 7.89 ...
##  $ educ : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ exper: int  23 15 31 32 9 15 26 23 13 22 ...
##  $ sex  : Factor w/ 1 level "1='male';0='female'": NA NA NA NA NA NA NA NA NA NA ...
summary(x)
##       wage             educ           exper      
##  Min.   : 2.191   Min.   :1.000   Min.   : 0.00  
##  1st Qu.: 8.113   1st Qu.:3.000   1st Qu.: 9.00  
##  Median :10.127   Median :3.000   Median :16.50  
##  Mean   :11.051   Mean   :3.378   Mean   :17.22  
##  3rd Qu.:12.755   3rd Qu.:4.000   3rd Qu.:24.00  
##  Max.   :47.576   Max.   :5.000   Max.   :47.00  
##                   sex      
##  1='male';0='female':   0  
##  NA's               :1472  
##                            
##                            
##                            
## 
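
The output above reports sex as a single-level factor with all 1472 values missing, so it is worth confirming this before using the column. A minimal sketch, assuming the Ecdat package is installed; the name n_missing is introduced here for illustration only, and other versions of the package may not show the same missing values.

```r
# Load the data as in the text above (assumes Ecdat is installed)
x <- data.frame(Ecdat::Bwages)

# Count missing values in each column (n_missing is an illustrative name)
n_missing <- colSums(is.na(x))
print(n_missing)

# Tabulate sex, keeping NA as its own category
print(table(x$sex, useNA = "ifany"))
```

If sex turns out to be entirely missing, it cannot support any comparison between groups, which is why the plots that follow use only wage, educ and exper.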

The histogram below shows a right-skewed distribution: most observations fall between 5 and 15 euros gross per hour, with a long tail of higher wages. This agrees with the summary above, where the mean wage (11.05) exceeds the median (10.13).

Histogram
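
A histogram like the one described could be drawn with base R roughly as follows. This is a sketch rather than the original plotting code: the bin count, title and axis label are assumptions, as are the names wage_mean and wage_median.

```r
# Load the data as in the text above (assumes Ecdat is installed)
x <- data.frame(Ecdat::Bwages)

# Histogram of gross hourly wages; 30 bins is an arbitrary choice
hist(x$wage, breaks = 30,
     main = "Gross hourly wage, Belgium 1994",
     xlab = "Wage (euro per hour)")

# A mean above the median is consistent with a long right tail
wage_mean <- mean(x$wage)
wage_median <- median(x$wage)
print(c(mean = wage_mean, median = wage_median))
```

Comparing the mean with the median is only a quick check on the direction of the skew; the shape of the histogram remains the main evidence.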

The scatterplot below tests the common assumption that years of experience have a large impact on hourly wages. It shows no clear relationship between the two. Higher wages appear more often as experience increases, but most observations remain clustered around 10 euros at every level of experience. The absence of a discernible relationship is as useful a finding as an easily observable one.

Scatterplot
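
One way to draw such a scatterplot and put a number on the strength of the relationship is sketched below. The lowess smoother and the Pearson correlation are additions for illustration, not part of the original analysis, and wage_exper_cor is a hypothetical name.

```r
# Load the data as in the text above (assumes Ecdat is installed)
x <- data.frame(Ecdat::Bwages)

# Scatterplot of wage against years of experience
plot(x$exper, x$wage,
     main = "Wage by years of experience",
     xlab = "Experience (years)",
     ylab = "Wage (euro per hour)")

# Overlay a lowess smoother to show the average trend, if any
lines(lowess(x$exper, x$wage), col = "red", lwd = 2)

# Pearson correlation as a rough measure of linear association
wage_exper_cor <- cor(x$exper, x$wage)
print(wage_exper_cor)
```

A correlation near zero would support the visual impression of a weak linear relationship, though it says nothing about non-linear patterns, which the smoother is better placed to reveal.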

The boxplot below illustrates the possible impact of education on hourly wages. The median does not rise as much as one might expect, but it does increase modestly with each education level. More striking is the widening range: wages for those with a level 1 education are low and fall within a narrow band, whereas wages for those with a level 5 education span a much greater range.

Boxplot
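
The boxplot and the two quantities discussed, the median and the spread at each education level, could be produced along these lines. This is a sketch under the assumption that the interquartile range is an acceptable stand-in for "range"; wage_median_by_educ and wage_iqr_by_educ are illustrative names.

```r
# Load the data as in the text above (assumes Ecdat is installed)
x <- data.frame(Ecdat::Bwages)

# One box per education level
boxplot(wage ~ educ, data = x,
        main = "Wage by education level",
        xlab = "Education level (1 = low, 5 = high)",
        ylab = "Wage (euro per hour)")

# Median and interquartile range of wage within each education level
wage_median_by_educ <- tapply(x$wage, x$educ, median)
wage_iqr_by_educ <- tapply(x$wage, x$educ, IQR)
print(round(cbind(median = wage_median_by_educ,
                  IQR = wage_iqr_by_educ), 2))
```

Reading the two columns side by side shows whether the spread grows faster than the median as education rises, which is the pattern the boxplot is meant to highlight.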