# R Bridge Final Project

For my final project, I study a data set of 4360 wage records for males, covering the years 1980 to 1987. Many attributes were recorded, but I chose to focus on ethnicity, marital status, and years of experience. My goal is to see how these attributes relate to wages.

r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages("tidyverse")
## Installing package into 'C:/Users/NCC-1701D/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\NCC-1701D\AppData\Local\Temp\RtmpYlqbfP\downloaded_packages
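library(readr)
# Load the wage data; read.csv() is base R and already returns a data frame
Raw_Data <- read.csv("https://raw.githubusercontent.com/johnnyboy1287/Wages/main/Males.csv")

Raw_Data_DF <- data.frame(Raw_Data)

# Keep only rows with a strictly positive wage value (zeros are dropped too)
RemovingNegativeWages <- subset(Raw_Data_DF, wage > 0)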
require("knitr")
## Loading required package: knitr
require("ggplot2")
## Loading required package: ggplot2
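# Rename column 10, which holds the wage values, for readability
names(RemovingNegativeWages)[10] <- "HourlyWageinDollars"

# Pair each attribute of interest with the wage column
# NOTE: row indices past nrow(RemovingNegativeWages) come back as all-NA rows
Ethnicity <- RemovingNegativeWages[1:4360, c(7, 10)]
names(Ethnicity)[1] <- "Ethnicity"

Marital_Status <- RemovingNegativeWages[1:4360, c(8, 10)]
names(Marital_Status)[1] <- "Marital Status"

# NOTE: this range stops at 4350, not 4360 as in the two selections above
Experience <- RemovingNegativeWages[1:4350, c(5, 10)]
names(Experience)[1] <- "YearsofExperience"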
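
The hard-coded row ranges in the selections above are worth a check, because the filtered data may have fewer rows than the ranges ask for. Here is a minimal sketch of what base R does when a row index runs past the end of a data frame, and of a selection that avoids the problem by leaving the row index empty. The data frame `toy` and its values are made up for illustration and are not the wage data.

```r
# Toy data frame standing in for the filtered wage data (values are made up).
toy <- data.frame(exper = c(1, 2, 3), wage = c(1.2, 1.9, 1.3))

# Asking for 5 rows from a 3-row data frame pads the result with all-NA rows.
padded <- toy[1:5, c("exper", "wage")]
print(padded)
n_padded <- sum(is.na(padded$wage))   # number of padded rows
print(n_padded)

# Leaving the row index empty keeps every existing row and nothing more.
safe <- toy[, c("exper", "wage")]
print(nrow(safe))
```

Rows padded this way are one plausible source of the "Removed 33 rows" warnings that ggplot2 prints with the scatter plot. Selecting columns by name instead of by position also keeps the code working if the column order changes.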
head(Marital_Status)
##   Marital Status HourlyWageinDollars
## 1             no            1.197540
## 2             no            1.853060
## 3             no            1.344462
## 4             no            1.433213
## 5             no            1.568125
## 6             no            1.699891
head(Experience)
##   YearsofExperience HourlyWageinDollars
## 1                 1            1.197540
## 2                 2            1.853060
## 3                 3            1.344462
## 4                 4            1.433213
## 5                 5            1.568125
## 6                 6            1.699891
head(Ethnicity)
##   Ethnicity HourlyWageinDollars
## 1     other            1.197540
## 2     other            1.853060
## 3     other            1.344462
## 4     other            1.433213
## 5     other            1.568125
## 6     other            1.699891
mean(RemovingNegativeWages$HourlyWageinDollars)
## [1] 1.671902
mean(RemovingNegativeWages[RemovingNegativeWages$ethn == 'black', 'HourlyWageinDollars'])
## [1] 1.557653
mean(RemovingNegativeWages[RemovingNegativeWages$ethn == 'hisp', 'HourlyWageinDollars'])
## [1] 1.638204
mean(RemovingNegativeWages[RemovingNegativeWages$ethn == 'other', 'HourlyWageinDollars'])
## [1] 1.697145
mean(RemovingNegativeWages[RemovingNegativeWages$maried == 'yes', 'HourlyWageinDollars'])
## [1] 1.784073
mean(RemovingNegativeWages[RemovingNegativeWages$maried == 'no', 'HourlyWageinDollars'])
## [1] 1.583309
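
The filtered `mean()` calls above can also be written as one grouped summary per variable. The following is a minimal sketch with base R's `tapply()`. It runs on a small made-up data frame: `toy` and its values are hypothetical, and only the column names `ethn`, `maried`, and `HourlyWageinDollars` come from the code above.

```r
# Made-up rows for illustration; only the column names match the report.
toy <- data.frame(
  ethn = c("black", "hisp", "other", "other", "black", "hisp"),
  maried = c("yes", "no", "yes", "no", "no", "yes"),
  HourlyWageinDollars = c(1.5, 1.6, 1.8, 1.7, 1.4, 1.9)
)

# Mean of the wage column within each level of a grouping column.
means_by_ethn <- tapply(toy$HourlyWageinDollars, toy$ethn, mean)
means_by_married <- tapply(toy$HourlyWageinDollars, toy$maried, mean)

print(means_by_ethn)
print(means_by_married)
```

Applied to `RemovingNegativeWages`, the same two calls would return all group means at once, which avoids repeating the filter expression for every category.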

The output above shows how mean wages differ by ethnicity and marital status.

To summarize, here are the means by category:

Overall mean: 1.671902

Black: 1.557653

Hispanic: 1.638204

Other: 1.697145

Married: 1.784073

Not married: 1.583309
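
The gaps between these means are easier to compare when computed explicitly. The sketch below uses only the six means reported above; the vector name `reported` is my own. The final step rests on an assumption that the report does not confirm: if the file is a copy of the `Males` panel from the Ecdat package, whose documentation describes `wage` as the log of hourly wage, then a difference in means is a log difference and `exp()` turns it into a ratio.

```r
# The six means reported above, copied by hand.
reported <- c(overall = 1.671902, black = 1.557653, hisp = 1.638204,
              other = 1.697145, married = 1.784073, not_married = 1.583309)

# Gaps between categories of the same variable.
gap_married <- reported[["married"]] - reported[["not_married"]]
gap_other_black <- reported[["other"]] - reported[["black"]]
gap_other_hisp <- reported[["other"]] - reported[["hisp"]]

print(round(c(married = gap_married,
              other_vs_black = gap_other_black,
              other_vs_hisp = gap_other_hisp), 3))

# ASSUMPTION: only meaningful if the wage column holds log wages.
ratio_married <- exp(gap_married)
print(ratio_married)
```

Under that assumption, the marital-status gap of about 0.20 would correspond to a wage ratio of roughly 1.22 between the geometric means of the two groups, not a difference of 20 cents.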

# Boxplot of wages

ggplot(RemovingNegativeWages, aes(x = HourlyWageinDollars)) + geom_boxplot(fill = "slateblue", alpha = 0.2) + xlab("wages")

# Bar chart of observation counts by ethnicity
ggplot(Ethnicity, aes(x = Ethnicity)) + geom_bar()

# Scatter plot of wage against years of experience, with a smoothed trend line
ggplot(Experience, aes(x = YearsofExperience, y = HourlyWageinDollars)) +
    geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 33 rows containing non-finite values (stat_smooth).
## Warning: Removed 33 rows containing missing values (geom_point).
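
The smoothed scatter plot shows the trend visually. To put numbers on it, one option is a correlation coefficient together with the mean wage at each experience level. The following is a sketch on made-up data: `toy` and its values are hypothetical, and only the column names match the `Experience` data frame.

```r
# Made-up rows, including one all-NA row like those the warnings mention.
toy <- data.frame(
  YearsofExperience = c(1, 1, 2, 2, 3, 3, NA),
  HourlyWageinDollars = c(1.0, 1.2, 1.5, 1.7, 1.4, 1.6, NA)
)

# Pearson correlation, using only rows where both values are present.
r <- cor(toy$YearsofExperience, toy$HourlyWageinDollars, use = "complete.obs")
print(r)

# Mean wage at each experience level; rows with a missing group are dropped.
mean_by_exper <- tapply(toy$HourlyWageinDollars, toy$YearsofExperience,
                        mean, na.rm = TRUE)
print(mean_by_exper)

# Experience level with the highest mean wage.
best <- names(which.max(mean_by_exper))
print(best)
```

Run on `Experience`, the last step would report which experience level has the highest mean wage, so that reading does not depend on judging the plot by eye.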

In conclusion, marital status appears to be the attribute most strongly associated with wages: the mean for married men is about 0.20 higher than for unmarried men (1.784 versus 1.583), the largest gap among the variables I examined. Black and Hispanic men have lower mean wages than men in the "other" category. Surprisingly, wages did not rise with experience as steadily as I had expected, and men with 9 years of experience were the highest paid.