library(haven)
## Warning: package 'haven' was built under R version 3.5.3
DVLFS <- read_dta("C:/Users/Saira Rasul/Desktop/data-viz/DVLFS.dta")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
library(MASS)
library(plyr)
## Warning: package 'plyr' was built under R version 3.5.3
1- The overall difference in wages by gender.
2- The comparison of wages between genders (female/male) at the provincial and regional levels.
3- Overall labor participation by gender at the provincial and regional levels.
4- The impact of formal and non-formal education on wages.
5- The contribution of workers to Pakistan's labor force by age.
DVLFS[, c("wages", "gender")]
## # A tibble: 27,345 x 2
## wages gender
## <dbl> <dbl>
## 1 82.5 1
## 2 60.2 1
## 3 73.8 1
## 4 65.4 1
## 5 96.2 1
## 6 248. 1
## 7 265. 1
## 8 248. 1
## 9 85.9 1
## 10 95.6 1
## # ... with 27,335 more rows
DVLFS$gender<-factor(DVLFS$gender)
DVLFS$gender<-revalue(DVLFS$gender, c("0"="Female", "1"="Male"))
p<-ggplot(DVLFS, aes(x=wages, colour=gender)) + geom_density()
p+labs(title="Comparison of wages by gender", x="Wage",y="Density")
DVLFS[, c("wages", "gender")]
## # A tibble: 27,345 x 2
## wages gender
## <dbl> <fct>
## 1 82.5 Male
## 2 60.2 Male
## 3 73.8 Male
## 4 65.4 Male
## 5 96.2 Male
## 6 248. Male
## 7 265. Male
## 8 248. Male
## 9 85.9 Male
## 10 95.6 Male
## # ... with 27,335 more rows
# gender was already converted to a labelled factor above, so it is not recoded again here.
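Re-running `revalue()` on a column that has already been recoded has no effect, because the original codes are no longer among the factor levels. A minimal sketch of a recode that names its codes explicitly and is safe to run more than once. It uses base R on a hypothetical toy vector, not the DVLFS data; `gender_raw` and `recode_gender()` are illustrative names, and the 0 = Female, 1 = Male coding is taken from the first recode in this document.

```r
# Hypothetical toy data standing in for DVLFS$gender (0 = Female, 1 = Male).
gender_raw <- c(1, 1, 0, 1, 0)

# Illustrative helper: recode only if the column is not yet a factor,
# so calling it a second time returns the input unchanged.
recode_gender <- function(x) {
  if (is.factor(x)) return(x)
  factor(x, levels = c(0, 1), labels = c("Female", "Male"))
}

gender <- recode_gender(gender_raw)    # first call does the recode
gender_again <- recode_gender(gender)  # second call is a no-op

print(table(gender_again))
```

Stating `levels` and `labels` together keeps the code-to-label mapping in one place, which avoids two recodes of the same column disagreeing with each other.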
p<-ggplot(DVLFS, aes(x=Age, y=wages,colour=gender)) + geom_point()
p
p+labs(title="Age-wage distribution by gender", x="Age in years",y="Wages")
sps<- ggplot(DVLFS,aes(x=Age,y=wages,color=gender))+geom_point()+scale_color_brewer(palette = "Set1")
sps+geom_line()
The scatter plots above represent this kind of data poorly: the points overlap heavily, so the viewer has to work hard to extract any pattern. We therefore move to bar charts, since a horizontal bar chart is much easier to read. They show clearly that males receive more wages in total than females, because males make up a larger share of the labor market.
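The bar charts in this document use `geom_bar(stat = "identity")` on row-level data, which stacks every observation, so each bar's height is a sum that reflects both wage levels and headcount. As a complementary view, here is a minimal sketch that separates the two by computing the mean wage and the worker count per gender, then builds a horizontal bar chart of the means. The data frame `toy` and its values are hypothetical, not the DVLFS data, and the plot is only built if ggplot2 is installed.

```r
# Hypothetical toy data; column names mirror DVLFS but the values are made up.
toy <- data.frame(
  gender = factor(c("Male", "Male", "Male", "Female", "Female")),
  wages  = c(80, 100, 120, 60, 90)
)

# One row per gender: mean wage, plus the number of workers in each group.
by_gender <- aggregate(wages ~ gender, data = toy, FUN = mean)
by_gender$n <- as.vector(table(toy$gender)[as.character(by_gender$gender)])
print(by_gender)

# Horizontal bar chart of the group means (built only if ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_mean <- ggplot(by_gender, aes(x = gender, y = wages, fill = gender)) +
    geom_col() +
    coord_flip() +
    labs(title = "Mean wage by gender", x = "Gender", y = "Mean wage")
  # Call print(p_mean) to draw the chart.
}
```

Reporting the mean and the count side by side makes it possible to tell whether one group's taller bar comes from higher pay or simply from more workers.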
# gender is already a factor with levels Female/Male, so no further recoding is needed.
p<-ggplot(data=DVLFS, aes(x=gender, y=lnwages, fill=gender))+geom_bar(stat="identity")
p + scale_fill_brewer(palette="Greens") + theme_minimal()+labs(title="Bar plot Gender wise")
DVLFS[, c("wages", "PROVINCE")]
## # A tibble: 27,345 x 2
## wages PROVINCE
## <dbl> <dbl+lbl>
## 1 82.5 1 [KHYBER PAKHTUNKHWA]
## 2 60.2 1 [KHYBER PAKHTUNKHWA]
## 3 73.8 1 [KHYBER PAKHTUNKHWA]
## 4 65.4 1 [KHYBER PAKHTUNKHWA]
## 5 96.2 1 [KHYBER PAKHTUNKHWA]
## 6 248. 1 [KHYBER PAKHTUNKHWA]
## 7 265. 1 [KHYBER PAKHTUNKHWA]
## 8 248. 1 [KHYBER PAKHTUNKHWA]
## 9 85.9 1 [KHYBER PAKHTUNKHWA]
## 10 95.6 1 [KHYBER PAKHTUNKHWA]
## # ... with 27,335 more rows
DVLFS$PROVINCE<-factor(DVLFS$PROVINCE)
DVLFS$PROVINCE<-revalue(DVLFS$PROVINCE, c("1"="KPK", "2"="Punjab" , "3"="Sindh" , "4"= "Baloch"))
p<-ggplot(DVLFS, aes(x=PROVINCE, y=lnwages, fill=gender))+geom_bar(stat="identity")
p + scale_fill_brewer(palette="Greens") + theme_minimal()+labs(title="Bar plot Province wise")
Wages by province are also examined with a violin plot, where the results are clearer:
ggplot(DVLFS, aes(x = PROVINCE, y = lnwages)) + geom_violin()
Next, we consider the effect of region on wages:
DVLFS[, c("wages", "Region")]
## # A tibble: 27,345 x 2
## wages Region
## <dbl> <dbl+lbl>
## 1 82.5 2 [Urban]
## 2 60.2 2 [Urban]
## 3 73.8 2 [Urban]
## 4 65.4 2 [Urban]
## 5 96.2 2 [Urban]
## 6 248. 2 [Urban]
## 7 265. 2 [Urban]
## 8 248. 2 [Urban]
## 9 85.9 2 [Urban]
## 10 95.6 2 [Urban]
## # ... with 27,335 more rows
DVLFS$Region<-factor(DVLFS$Region)
DVLFS$Region<-revalue(DVLFS$Region, c("1"="Rural", "2"="Urban"))
p<-ggplot(data=DVLFS, aes(x=Region, y=lnwages, fill=gender))+geom_bar(stat="identity")
p + scale_fill_brewer(palette="Greens") + theme_minimal()+labs(title="Bar plot Region wise")
Wages by region are also examined with a violin plot, where the results are clearer:
ggplot(DVLFS, aes(x = Region, y = lnwages)) + geom_violin()
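A violin plot shows the shape of the log-wage distribution within each group. The same comparison can be backed up numerically with per-group quartiles. A minimal base R sketch follows; the data frame `toy` and its values are hypothetical, not the DVLFS data.

```r
# Hypothetical toy data; column names mirror DVLFS but the values are made up.
toy <- data.frame(
  Region  = factor(c("Rural", "Rural", "Rural", "Urban", "Urban", "Urban")),
  lnwages = c(4.0, 4.2, 4.4, 4.5, 4.9, 5.3)
)

# Split log wages by region, then take the 25th, 50th and 75th percentiles.
# sapply() returns one column per region; t() puts regions in rows.
quartiles <- t(sapply(
  split(toy$lnwages, toy$Region),
  quantile,
  probs = c(0.25, 0.5, 0.75)
))
print(quartiles)
```

The same call works for any grouping column, for example `PROVINCE` in place of `Region`.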
p<-ggplot(DVLFS, aes(x = PROVINCE, y = lnwages, color=gender)) + geom_point()
p+labs(title="Gender-based comparison of wages at the provincial level")
p<-ggplot(DVLFS , aes(x = Region, y = lnwages, color=gender)) + geom_point()
p+labs(title="Gender-based comparison of wages at the regional level")
DVLFS[, c("wages", "nfe")]
## # A tibble: 27,345 x 2
## wages nfe
## <dbl> <dbl>
## 1 82.5 0
## 2 60.2 0
## 3 73.8 0
## 4 65.4 0
## 5 96.2 0
## 6 248. 0
## 7 265. 0
## 8 248. 0
## 9 85.9 0
## 10 95.6 1
## # ... with 27,335 more rows
DVLFS$nfe<-factor(DVLFS$nfe)
DVLFS$nfe<-revalue(DVLFS$nfe, c("0"="noForEdu", "1"="ForEdu"))
p<-ggplot(DVLFS, aes(x=nfe, y=Age,fill=nfe)) + geom_boxplot()+ labs(x = "Formal and non-formal education", y = "Completed years of age", title = "Boxplot of age by education type")
p
p <- ggplot(DVLFS, aes(x = Age)) + geom_histogram(fill = "green")
p +labs(title = "Age based distribution")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Taken together, the plots show that the majority of Pakistan's labor force is male, especially in Punjab. Overall, workers from rural regions contribute more, according to the recent labor force survey data for Pakistan. The labor force is most active around age 40, and participation declines after age 50. There is also a clear difference in overall earnings between people with formal and non-formal education.
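The age and education statements can be checked with simple group summaries alongside the plots. A minimal base R sketch follows, assuming ten-year age bands and the `ForEdu`/`noForEdu` labels used earlier. The data frame `toy` and all its values are hypothetical, so the numbers it prints say nothing about the DVLFS results.

```r
# Hypothetical toy data; column names mirror DVLFS but the values are made up.
toy <- data.frame(
  Age   = c(18, 25, 33, 38, 41, 44, 47, 52, 58, 63),
  nfe   = factor(c("ForEdu", "noForEdu", "ForEdu", "noForEdu", "ForEdu",
                   "ForEdu", "noForEdu", "ForEdu", "noForEdu", "noForEdu")),
  wages = c(90, 60, 110, 70, 130, 120, 80, 150, 75, 65)
)

# Ten-year age bands, closed on the left: [10,20), [20,30), ...
toy$age_band <- cut(toy$Age, breaks = seq(10, 70, by = 10), right = FALSE)

# Number of workers in each band, and the band with the most workers.
age_counts <- table(toy$age_band)
print(age_counts)
busiest_band <- names(which.max(age_counts))

# Mean wage by education type.
wage_by_edu <- tapply(toy$wages, toy$nfe, mean)
print(wage_by_edu)
```

Counting by band puts a number on where participation peaks, and the per-group means give a direct wage comparison that the age boxplot alone does not provide.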