Developing Data Products - Week 3 - Presentation using Plotly

Dr. Rich Huebner

January 31, 2018

Introduction

For the Week 3 assignment in the Coursera course Developing Data Products, we are asked to create a plot using the plotly package.

Approach

We use a custom HR data set available on Kaggle. My colleague Dr. Carla Patalano and I developed it as a teaching data set for HR students at New England College of Business. It contains numerous attributes, including pay rate, race, title, and department.

The data set has no missing values and is already cleansed, since we typically use it for data visualization exercises.
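A quick way to confirm the no-missing-values claim after loading the file is to count `NA` entries per column. The sketch below uses a small hypothetical data frame in place of `hr` so that it runs on its own. The `Department` column name is an assumption about the data set. By default, `read.csv()` reads blank fields in text columns as empty strings rather than `NA`, so the sketch counts those separately.

```r
# Hypothetical stand-in for hr; with the real data use hr <- read.csv('HRDataset_v6.csv')
toy <- data.frame(Pay.Rate   = c(15, 22, 30),
                  Age        = c(26, 31, 38),
                  Department = c("Production", "Sales", "IT/IS"),
                  stringsAsFactors = FALSE)

# Count NA values per column; all zeros means no missing values were parsed
na_counts <- colSums(is.na(toy))

# Blank text fields come through as "" rather than NA, so count them too
# (factor columns are converted to character before comparing)
blank_counts <- sapply(toy, function(col) {
  if (is.character(col) || is.factor(col)) sum(as.character(col) == "", na.rm = TRUE) else 0L
})

print(na_counts)
print(blank_counts)
```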

Applying machine learning algorithms to it would first require scaling some of the features.
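Here is a minimal sketch of what that scaling might look like. Base R's `scale()` standardizes each numeric column to mean 0 and standard deviation 1 (z-scores). The values below are hypothetical. On the real data you would select the numeric columns of `hr`, for example `hr[, c("Pay.Rate", "Age")]`. Z-score standardization is only one option, and min-max scaling to the range [0, 1] is shown as a common alternative.

```r
# Hypothetical numeric features standing in for columns of hr
features <- data.frame(Pay.Rate = c(15, 22, 30, 48, 75),
                       Age      = c(26, 31, 38, 45, 60))

# z-score standardization: (x - mean(x)) / sd(x), applied to each column
scaled <- as.data.frame(scale(features))

# Min-max scaling to [0, 1], as an alternative
min_max <- as.data.frame(lapply(features, function(x) (x - min(x)) / (max(x) - min(x))))

print(scaled)
print(min_max)
```

Z-scores suit methods that assume roughly centered inputs. Min-max scaling keeps every value within a fixed range but is sensitive to outliers at either end.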

The data set can be retrieved from: https://www.kaggle.com/rhuebner/human-resources-data-set/data

Load libraries

library(plotly)

## Loading required package: ggplot2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

Grab the data

hr <- read.csv('HRDataset_v6.csv', sep=',')

# Statistics for the Pay Rate field
summary(hr$Pay.Rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.00   20.00   24.00   31.28   45.31   80.00

# Statistics for the Age field.
summary(hr$Age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.00   32.00   37.00   38.87   44.00   67.00
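A note on the column names: by default, `read.csv()` passes the header row through `make.names()` (`check.names = TRUE`). This is why a header containing a space is accessed with a dot, as in `hr$Pay.Rate`. The raw header in the CSV is presumably `Pay Rate`, but that is an assumption. The sketch below only demonstrates the conversion rule on literal strings and on a tiny in-memory CSV.

```r
# read.csv(check.names = TRUE) applies make.names() to the header row;
# characters that are invalid in R names, such as spaces, become dots
raw_headers <- c("Pay Rate", "Age", "Department")  # assumed raw header text
print(make.names(raw_headers))

# The same conversion, end to end, on a small hypothetical CSV
csv_text <- "Pay Rate,Age\n15,26\n22,31"
df <- read.csv(text = csv_text)
print(names(df))
summary(df$Pay.Rate)
```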

Next, create a plot with plotly.

## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter

## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
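The code that produced the plot is not shown in this output. The messages above indicate that `plot_ly()` was called without `type` or `mode`, so plotly defaulted to a scatter trace drawn as markers. Below is a minimal sketch of a comparable call. It assumes pay rate is plotted against age and colored by a `Department` column, but the axis choices and the column name are assumptions, not the original code. Setting `type = "scatter"` and `mode = "markers"` explicitly yields the same kind of trace without the messages. The values are invented so the sketch is self-contained; with the real data, pass `hr` instead of `toy`.

```r
library(plotly)

# Invented stand-in for hr; the Department column name is an assumption
toy <- data.frame(Age        = c(26, 31, 38, 45, 60, 29),
                  Pay.Rate   = c(15, 18, 50, 62, 70, 16),
                  Department = c("Production", "Production", "IT/IS",
                                 "Sales", "Admin Offices", "Production"))

p <- plot_ly(toy,
             x = ~Age, y = ~Pay.Rate,
             color = ~Department,           # one color per department
             type = "scatter", mode = "markers",
             text = ~Department,            # department name in the hover tooltip
             hoverinfo = "text+x+y") %>%
  layout(title = "Pay rate by age and department",
         xaxis = list(title = "Age"),
         yaxis = list(title = "Pay rate"))

# Print p (or leave it as the last expression of an R Markdown chunk)
# to render the interactive widget
```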

The plotly graph above shows that most production staff are paid less than staff in the other departments. Is this to be expected? Perhaps, since these are production/manufacturing workers in an assembly-line setting.
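To back up the visual impression with numbers, one option is to compare pay rates by department. The sketch below computes the median pay rate per department with base R's `tapply()` and sorts the result from lowest to highest. The helper name `pay_by_department` is hypothetical, the toy values are invented for illustration, and the `Department` column name is an assumption about the data set. On the real data, `pay_by_department(hr)` would give the corresponding table. The sketch uses medians rather than means because the summary above shows a mean pay rate (31.28) well above the median (24.00), which suggests a right-skewed distribution.

```r
# Hypothetical helper: median pay rate per department, lowest first
pay_by_department <- function(df) {
  med <- tapply(df$Pay.Rate, df$Department, median)
  # Convert the 1-d array from tapply() into a plain named vector, then sort
  sort(setNames(as.vector(med), names(med)))
}

# Invented toy data for illustration only
toy <- data.frame(Pay.Rate   = c(15, 18, 16, 50, 62, 55),
                  Department = c("Production", "Production", "Production",
                                 "IT/IS", "Sales", "IT/IS"))

medians <- pay_by_department(toy)
print(medians)
```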