In this assignment we use a data set from the CANSIM tables, specifically Table 202-0101: Distribution of earnings, by sex, in 2011 constant dollars. This table contains 2100 series, with data for the years 1976-2011 (not all combinations necessarily have data for all years), and was last released on 2013-06-27.
This table contains data described by the following dimensions (Not all combinations are available):
We use only a part of this data set: 240 observations of 7 variables.
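# stringsAsFactors = TRUE reproduces the Factor columns shown below on R >= 4.0
ErnDat <- read.csv("EarningDistribution.csv", stringsAsFactors = TRUE)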
str(ErnDat)
## 'data.frame': 240 obs. of 7 variables:
## $ Year : int 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 ...
## $ Province : Factor w/ 5 levels "Atlantic provinces",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ SEX : Factor w/ 2 levels "Females","Males": 2 2 2 2 2 2 2 2 2 2 ...
## $ Income : int 39600 40600 39800 39400 39500 37900 37900 38200 37000 37700 ...
## $ EarnersCount: int 677 681 676 658 647 652 648 636 642 637 ...
## $ FYFTEarning : int 49100 49900 49800 50400 51000 48000 48500 47900 46300 47500 ...
## $ FYFTCount : int 375 374 370 351 336 330 328 327 330 333 ...
We have selected the most recent 24 years of data (1988-2011) and restricted ourselves to the following parameters:
library(ggplot2)
In this part we want to depict the empirical distribution of income for both sexes separately and compare these distributions.
ggplot(ErnDat, aes(x = Income, color = SEX)) + geom_density() + facet_wrap(~Province) +
xlab("average total income (dollars)")
As we see, in all the provinces, men generally earn higher incomes compared to women.
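A numeric summary can back up what the density plot suggests. The sketch below is not part of the original analysis; `mean_income_by_group` is a hypothetical helper name, and it assumes a data frame with the `Province`, `SEX` and `Income` columns shown in the `str()` output.

```r
# Hypothetical helper (not from the original report): mean of the Income
# column for every Province/SEX combination present in `dat`.
mean_income_by_group <- function(dat) {
  aggregate(Income ~ Province + SEX, data = dat, FUN = mean)
}

# Possible usage with the data frame loaded earlier:
# mean_income_by_group(ErnDat)
```

Comparing the "Males" and "Females" rows for each province gives a direct check of the visual impression.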
To compare the average earnings of full-year full-time (FYFT) workers across provinces, we can use the following simple plot.
# geom_jitter() alone draws each observation once; height = 0 keeps earnings exact
ggplot(ErnDat, aes(reorder(Province, FYFTEarning), FYFTEarning)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0)) + facet_wrap(~SEX) +
xlab("Province") + ylab("Earnings of a Full-Year Full-Time worker (dollars)")
We see that FYFT workers in Ontario have the highest average earnings among all the provinces, for both sexes. We also see that the highest earnings of female FYFT workers, across the provinces of Canada, are lower than the lowest earnings of male FYFT workers.
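The comparison between the top of the female range and the bottom of the male range can also be checked directly. This is a minimal sketch, assuming the `SEX` levels "Females" and "Males" and the `FYFTEarning` column from the `str()` output; `fyft_range_check` is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: compares the largest female FYFT earnings value
# with the smallest male one and reports whether the two ranges overlap.
fyft_range_check <- function(dat) {
  female_max <- max(dat$FYFTEarning[dat$SEX == "Females"], na.rm = TRUE)
  male_min   <- min(dat$FYFTEarning[dat$SEX == "Males"], na.rm = TRUE)
  list(female_max = female_max,
       male_min   = male_min,
       no_overlap = female_max < male_min)
}

# Possible usage with the data frame loaded earlier:
# fyft_range_check(ErnDat)
```

If `no_overlap` is `TRUE`, every female value in the data lies below every male value, which is what the plot appears to show.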
ggplot(ErnDat, aes(Year, FYFTCount, color = SEX)) + geom_point() + geom_line() +
facet_wrap(~Province) + ylab("Number of Full-Year Full-Time workers") +
xlab("Year")
The plots show a gap, of varying size, between the number of male and female full-year full-time workers. In the Atlantic provinces, for example, this gap has been closing over time.
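To see how the gap evolves, the male and female counts can be put side by side and subtracted. The following is a sketch in base R under the assumption that each `Province`/`Year` pair has one row per sex; `fyft_count_gap` is a hypothetical helper, not part of the original report.

```r
# Hypothetical helper: male minus female FYFTCount per Province and Year.
fyft_count_gap <- function(dat) {
  cols    <- c("Province", "Year", "FYFTCount")
  males   <- dat[dat$SEX == "Males", cols]
  females <- dat[dat$SEX == "Females", cols]
  # Join the two halves on Province and Year; the count columns become
  # FYFTCount.m (males) and FYFTCount.f (females).
  both <- merge(males, females, by = c("Province", "Year"),
                suffixes = c(".m", ".f"))
  both$Gap <- both$FYFTCount.m - both$FYFTCount.f
  both[order(both$Province, both$Year), c("Province", "Year", "Gap")]
}

# Possible usage with the data frame loaded earlier:
# subset(fyft_count_gap(ErnDat), Province == "Atlantic provinces")
```

A `Gap` column that shrinks toward zero over the years would correspond to the two lines converging in the plot.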
We want to show the relation between the average earnings of full-year full-time workers and average total income. The two sexes are shown in different colors.
ggplot(ErnDat, aes(x = Income, y = FYFTEarning, col = SEX)) + geom_point() +
geom_smooth(method = "lm") + xlab("Average total income (dollars)") + ylab("Average earnings of full-year full-time workers (dollars)")
We see that there is a close relationship between average total income and the average earnings of full-year full-time workers, across provinces and years.
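Because `geom_smooth(method = "lm")` fits a separate line for each colour group, a rough numeric companion to the plot is the Pearson correlation computed within each sex. This is only a sketch; `income_fyft_cor` is a hypothetical name, and it assumes the `Income`, `FYFTEarning` and `SEX` columns from the `str()` output.

```r
# Hypothetical helper: Pearson correlation between Income and FYFTEarning,
# computed separately for each level of SEX that occurs in `dat`.
income_fyft_cor <- function(dat) {
  groups <- split(dat, dat$SEX, drop = TRUE)
  sapply(groups, function(d) cor(d$Income, d$FYFTEarning))
}

# Possible usage with the data frame loaded earlier:
# income_fyft_cor(ErnDat)
```

Values close to 1 for both sexes would support the reading of a close linear relationship.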