library(tidyverse)
setwd("/Users/mikea/Desktop/Datasets")
df <- read_csv("Salary_Data.csv")
str(df)
## spc_tbl_ [6,704 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Age : num [1:6704] 32 28 45 36 52 29 42 31 26 38 ...
## $ Gender : chr [1:6704] "Male" "Female" "Male" "Female" ...
## $ Education Level : chr [1:6704] "Bachelor's" "Master's" "PhD" "Bachelor's" ...
## $ Job Title : chr [1:6704] "Software Engineer" "Data Analyst" "Senior Manager" "Sales Associate" ...
## $ Years of Experience: num [1:6704] 5 3 15 7 20 2 12 4 1 10 ...
## $ Salary : num [1:6704] 90000 65000 150000 60000 200000 55000 120000 80000 45000 110000 ...
## - attr(*, "spec")=
## .. cols(
## .. Age = col_double(),
## .. Gender = col_character(),
## .. `Education Level` = col_character(),
## .. `Job Title` = col_character(),
## .. `Years of Experience` = col_double(),
## .. Salary = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
This dataset includes 6 variables.
The first variable, Age, is a numeric variable that records each person's age.
Gender is a character variable.
Education Level is also a character variable that records the level of education an employee has.
Job Title is a character variable that shows the job position someone holds.
Years of Experience is a numeric variable that shows the number of years someone has worked in that area.
Salary is a numeric variable that shows the amount of money earned.
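Before dropping incomplete rows, it can help to see how many values are missing in each column. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from Salary_Data.csv); the same two calls should work on `df`.

```r
# Hypothetical toy data standing in for the salary dataset
toy <- data.frame(
  Age = c(32, 28, NA, 36),
  Gender = c("Male", "Female", "Male", NA),
  Salary = c(90000, 65000, 150000, 60000),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Number of rows with no missing values, i.e. the rows na.omit() would keep
complete_rows <- sum(complete.cases(toy))
print(complete_rows)
```

Comparing `complete_rows` with `nrow(toy)` shows how many rows the NA removal will cost before it is applied.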
df <- df %>%
na.omit()
summary(df)
## Age Gender Education Level Job Title
## Min. :21.00 Length:6698 Length:6698 Length:6698
## 1st Qu.:28.00 Class :character Class :character Class :character
## Median :32.00 Mode :character Mode :character Mode :character
## Mean :33.62
## 3rd Qu.:38.00
## Max. :62.00
## Years of Experience Salary
## Min. : 0.000 Min. : 350
## 1st Qu.: 3.000 1st Qu.: 70000
## Median : 7.000 Median :115000
## Mean : 8.095 Mean :115329
## 3rd Qu.:12.000 3rd Qu.:160000
## Max. :34.000 Max. :250000
sd(df$Salary)
## [1] 52789.79
sd(df$`Years of Experience`)
## [1] 6.060291
sd(df$Age)
## [1] 7.615784
df <- df %>%
mutate(Gender = as.factor(Gender)) %>%
mutate(`Education Level`= factor(`Education Level`)) %>%
mutate(`Job Title` = factor(`Job Title`))
df <- df %>%
mutate(`Education Level` = ifelse(
grepl("bachelor", tolower(`Education Level`)), "Bachelor's Degree",
ifelse(grepl("master", tolower(`Education Level`)), "Master's Degree",
ifelse(grepl("phd", tolower(`Education Level`)), "PhD",
ifelse(grepl("high school", tolower(`Education Level`)), "High School",
as.character(`Education Level`))))))
addmargins(table(df$Gender))
##
## Female Male Other Sum
## 3013 3671 14 6698
prop.table(table(df$Gender))
##
## Female Male Other
## 0.449835772 0.548074052 0.002090176
#table(df$Gender, df$`Job Title`)
addmargins(table(df$Gender, df$`Education Level`))
##
## Bachelor's Degree High School Master's Degree PhD Sum
## Female 1198 251 1068 496 3013
## Male 1823 185 790 873 3671
## Other 0 12 2 0 14
## Sum 3021 448 1860 1369 6698
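The nested `ifelse()` recoding of Education Level used earlier can also be written with `dplyr::case_when()`, which some readers find easier to scan. This is only a sketch of that alternative on made-up labels (`toy`, `level`, and `level_clean` are hypothetical names), not a change to the analysis itself.

```r
library(dplyr)

# Hypothetical raw labels with inconsistent spelling and case
toy <- tibble(
  level = c("Bachelor's", "bachelor's degree", "Master's",
            "phD", "High School", "Other")
)

# Conditions are checked top to bottom; the first match wins.
# TRUE acts as the fallback and keeps the original label.
toy <- toy %>%
  mutate(level_clean = case_when(
    grepl("bachelor", tolower(level)) ~ "Bachelor's Degree",
    grepl("master", tolower(level)) ~ "Master's Degree",
    grepl("phd", tolower(level)) ~ "PhD",
    grepl("high school", tolower(level)) ~ "High School",
    TRUE ~ level
  ))

print(toy)
```

Because every branch returns a character string, the result is a plain character column, so it would need `factor()` again if a factor is wanted afterwards.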
education_counts <- table(df$`Education Level`)
barplot(education_counts, main = "Education Level Distribution", xlab = "Education Level", ylab = "Count")
gender_counts <- table(df$Gender)
pie(gender_counts, labels = names(gender_counts), main = "Gender Distribution")
hist(df$Salary, main = "Salary", xlab = "Salary Range", ylab = "Counts")
boxplot(df$Salary, main = "Salary", ylab = "Salary Range")
hist(df$`Years of Experience`, main = "Years of Experience", xlab = "Years")
boxplot(df$`Years of Experience`, main = "Years of Experience", ylab = "Years")
This data comprises 6,698 observations after removing NAs and has 6 variables: age, gender, education level, job title, years of experience, and salary. There are 3,671 males, 3,013 females, and 14 who identify as other. Afterwards, I compared the education levels between genders to see if there are any noticeable differences.
I discovered that more females than males have a master's degree (1,068 versus 790). However, males have more PhDs than females (873 versus 496). The high school counts were closer (251 females and 185 males), while more males than females hold a bachelor's degree (1,823 versus 1,198). Overall, the bachelor's level is the most common level of education, with 3,021 of the 6,698 observations (about 45%). I also made different charts to examine years of experience and salary ranges within this dataset. Years of experience range from 0 to 34. Salaries range from 350 to 250,000, and most people appear to fall in the 50,000 to 200,000 range.
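Because the dataset has more males than females, raw counts can be misleading when comparing education levels between genders. A minimal sketch of a within-gender comparison follows; the counts are copied from the Gender by Education Level table above, while the matrix name `edu_by_gender` and the shortened column labels are assumptions made for illustration.

```r
# Counts copied from the Gender by Education Level table
# (the "Other" row is left out because it has only 14 observations)
edu_by_gender <- matrix(
  c(1198, 251, 1068, 496,
    1823, 185,  790, 873),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Education = c("Bachelor's", "High School", "Master's", "PhD")
  )
)

# margin = 1 divides each count by its row total, giving the
# share of each education level within a gender
row_shares <- prop.table(edu_by_gender, margin = 1)
print(round(row_shares, 3))
```

On the full data, the same idea would be `prop.table(table(df$Gender, df$`Education Level`), margin = 1)`. Each row then sums to 1, so the master's and PhD shares can be compared directly across genders.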