Dr. Rich Huebner
January 31, 2018
This project is the Week 3 assignment for the Coursera course Developing Data Products, which asks us to create a plot using the plotly package.
I use a custom HR data set, available on Kaggle. My colleague, Dr. Carla Patalano, and I developed it as a teaching data set for HR students at New England College of Business. It contains numerous attributes, including pay rate, race, title, and department.
The data set has no missing values and is already cleansed, because we typically use it for data visualization.
Before applying any machine learning algorithms, some of the features would need to be scaled.
The data set can be retrieved from: https://www.kaggle.com/rhuebner/human-resources-data-set/data
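The scaling mentioned above can be illustrated with base R's `scale()`, which centers each numeric column on its mean and divides by its standard deviation. This is only a minimal sketch: `hr_demo` and its values are hypothetical stand-ins for the real data, and only the column names `Pay.Rate` and `Age` match fields used later in this report.

```r
# Hypothetical sample values standing in for the real HR data set
hr_demo <- data.frame(
  Pay.Rate = c(14, 20, 24, 45, 80),
  Age      = c(25, 32, 37, 44, 67)
)

# scale() centers each column (mean 0) and rescales it (standard deviation 1);
# it returns a matrix, so convert back to a data frame for convenience
hr_scaled <- as.data.frame(scale(hr_demo))

# Check the result: column means are (numerically) zero, standard deviations are one
round(colMeans(hr_scaled), 10)
apply(hr_scaled, 2, sd)
```

After standardizing, both columns are on a comparable scale, which matters for distance-based methods where a wide-ranging feature would otherwise dominate.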
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
hr <- read.csv('HRDataset_v6.csv', sep=',')
# Statistics for the Pay Rate field
summary(hr$Pay.Rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   14.00   20.00   24.00   31.28   45.31   80.00
# Statistics for the Age field.
summary(hr$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   25.00   32.00   37.00   38.87   44.00   67.00
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
After reviewing the plotly graph above, we can see that most of the production staff are paid less than staff in the other departments. Is this to be expected? Perhaps, since these are production/manufacturing workers who work on an assembly-line type of setup.
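The code that generated the plot is not shown in this report. A comparable chart could be sketched as follows, under several assumptions: a scatter plot of pay rate against age, colored by department; a department column named `Department`; and a small hypothetical data frame `hr_demo` in place of the real file. The department labels and all values are invented for illustration.

```r
library(plotly)

# Hypothetical sample rows; the real data would come from read.csv() as above
hr_demo <- data.frame(
  Age        = c(28, 35, 41, 52, 30, 45),
  Pay.Rate   = c(16, 20, 22, 55, 48, 62),
  Department = c("Production", "Production", "Production",
                 "IT/IS", "Sales", "Admin Offices")  # assumed labels
)

# Setting type and mode explicitly avoids the
# "No trace type specified" / "No scatter mode specifed" messages
p <- plot_ly(
  data  = hr_demo,
  x     = ~Age,
  y     = ~Pay.Rate,
  color = ~Department,
  type  = "scatter",
  mode  = "markers"
)

# Add a title and axis labels
p <- layout(
  p,
  title = "Pay rate by age and department",
  xaxis = list(title = "Age"),
  yaxis = list(title = "Pay rate")
)

p
```

Coloring the markers by department makes it straightforward to see whether one group, such as production staff, clusters at the lower end of the pay axis.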