Data found via R-Data: 2008-2009 Academic Salaries for Professors of All Levels - 397 Observations and 6 Variables

Introduction

For this project, I, Lee Joey Noel, use the statistical methods and tools I learned in LIS4317 - Visual Analytics to explore a dataset I selected from R-Data as my final project for the course. The dataset is Salaries, which contains the salaries of professors.

Project Goals

My goal for this project is to produce multiple data visualizations from the same dataset. Each visualization corresponds to a module lesson from this semester.

Projected visual approaches are: multivariate analysis, correlation analysis, and animation.

Loading Selected Packages

library(ggplot2) # Main package used for most of the visualizations
library(gganimate) # For animation
library(hrbrthemes) # Provides theme_ipsum(), used in the animation analysis
library(corrgram) # Used in the correlation analysis
library(ggcorrplot) # Used in the correlation analysis
library(reshape2) # Used in the correlation analysis, specifically to melt the correlation matrix

Loading The Dataset

Before conducting any analysis, I must load the dataset. Luckily, no heavy data cleaning is needed beyond converting sex, discipline, and rank to factors, so the first step is simply to read the Salaries dataset from CSV.

# read.csv() already returns a data frame, so no data.frame() wrapper is needed
SalariesData <- read.csv('E:/Visual Analytics/Final Project/data/Salaries.csv')

# Convert the selected columns to factors
SalariesData$rank <- as.factor(SalariesData$rank)
SalariesData$discipline <- as.factor(SalariesData$discipline)
SalariesData$sex <- as.factor(SalariesData$sex)

head(SalariesData)
##        rank discipline yrs.since.phd yrs.service  sex salary
## 1      Prof          B            19          18 Male 139750
## 2      Prof          B            20          16 Male 173200
## 3  AsstProf          B             4           3 Male  79750
## 4      Prof          B            45          39 Male 115000
## 5      Prof          B            40          41 Male 141500
## 6 AssocProf          B             6           6 Male  97000

Dataset Details
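
A quick way to confirm the structure described in the title (6 variables, three of them factors) is to inspect the data frame after loading. The sketch below rebuilds only the six rows printed by head() above so that it runs on its own. The name SalariesHead and the explicit level order chosen for rank are assumptions made for illustration, not part of the original analysis.

```r
# Rebuild the six rows shown by head(SalariesData) so the example is self-contained.
SalariesHead <- data.frame(
  rank          = c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  discipline    = rep("B", 6),
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  yrs.service   = c(18, 16, 3, 39, 41, 6),
  sex           = rep("Male", 6),
  salary        = c(139750, 173200, 79750, 115000, 141500, 97000),
  stringsAsFactors = FALSE
)

# as.factor() orders levels alphabetically: AssocProf, AsstProf, Prof.
default_levels <- levels(as.factor(SalariesHead$rank))

# Assumption: career order (Assistant -> Associate -> full Professor) is a
# more natural legend order, so the levels are set explicitly here.
SalariesHead$rank <- factor(SalariesHead$rank,
                            levels = c("AsstProf", "AssocProf", "Prof"))
SalariesHead$discipline <- as.factor(SalariesHead$discipline)
SalariesHead$sex <- as.factor(SalariesHead$sex)

str(SalariesHead)         # column types and factor levels
summary(SalariesHead)     # counts for factors, quartiles for numeric columns
table(SalariesHead$rank)  # number of rows per rank
```

Run against the full SalariesData, the same str(), summary(), and table() calls would describe all 397 observations.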

Multivariate Analysis

This is a more straightforward visualization that uses several variables from the dataset at once. First, I plot academic salary by rank, years of service, and years since obtaining the PhD. Then I produce one colored by sex, and finally one colored by the professor's discipline, either theoretical or applied.

SalaryPlotRank <- ggplot(SalariesData,
                     aes(x = yrs.since.phd,
                         y = salary,
                         color = rank,
                         size = yrs.service)) +
  geom_point(alpha = .7) +
  labs(title = "Salaries of Professors split by rank, years of service, and years since obtaining PhD.")+
  xlab("Years Since Obtaining PhD") +
  ylab("Salary") 
SalaryPlotRank + theme_dark()

1st Multivariate Analysis (By Rank)

This visualization is straightforward to read. Most professors at the lower end of the salary range recently obtained their PhD and hold the rank of Assistant Professor. As the years since the PhD increase, they gain more experience, which is why the points get bigger (point size encodes years of service), and they eventually become full Professors with higher salaries.
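
The visual impression that salary rises with rank can be checked numerically with a grouped summary. This is a minimal sketch that uses only the six rows printed by head() earlier; the helper name median_salary_by is hypothetical, and medians from six rows say nothing about the full 397-row dataset.

```r
# Rank and salary columns from the six rows printed by head(SalariesData).
SalariesHead <- data.frame(
  rank   = factor(c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
                  levels = c("AsstProf", "AssocProf", "Prof")),
  salary = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Hypothetical helper: median salary for each level of one grouping column.
median_salary_by <- function(data, group) {
  tapply(data$salary, data[[group]], median)
}

rank_medians <- median_salary_by(SalariesHead, "rank")
print(rank_medians)
```

On the full data frame the equivalent call would be median_salary_by(SalariesData, "rank"), and passing "sex" or "discipline" gives the matching summaries for the other two plots in this section.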

SalaryPlotSex <- ggplot(SalariesData,
                     aes(x = yrs.since.phd,
                         y = salary,
                         color = sex,
                         size = yrs.service)) +
  geom_point(alpha = .7) +
  labs(title = "Salaries of Professors split by sex, years of service, and years since obtaining PhD.") +
  xlab("Years Since Obtaining PhD") +
  ylab("Salary")
SalaryPlotSex + theme_dark()

2nd Multivariate Analysis (By Sex)

This one is surprising to see. I would have expected more women professors at the higher end of the salary range. The plot also shows that at that time (2008-2009) academia was heavily male dominated. There are also a lot of big red dots (female professors with more than 30 years of service) that are still at the lower end of the salary range even though it has been 20-30+ years since they obtained their PhD. This makes me assume that at that time few female professors were working in universities, and most were likely in community or state colleges.

SalaryPlotDept <- ggplot(SalariesData,
                     aes(x = yrs.since.phd,
                         y = salary,
                         color = discipline,
                         size = yrs.service)) +
  geom_point(alpha = .7) +
  labs(title = "Salaries of Professors split by Department (A = Theoretical, B = Applied), years of service, and years since obtaining PhD.") +
  xlab("Years Since Obtaining PhD") +
  ylab("Salary")
SalaryPlotDept + theme_dark()

3rd Multivariate Analysis (By Department)

I found this one to be the most interesting of the three. It shows that professors in the applied departments tended to make more than professors in the theoretical departments. I assume this is because of the intensity of study and research compared to the theoretical departments, or because most courses in the applied departments deal with more math- and science-related subjects.

Correlation Analysis

For this section, I conduct a correlation analysis. I wanted to use three separate correlation packages to display each result and explain the approach behind each visualization.

I had a bit of trouble figuring out how to melt and correlate my dataset, but I found that you can select the subset of all numeric variables in the Salaries dataset using the unlist, lapply, and is.numeric functions, as shown in my code below:

corsSalaries <- melt(cor(SalariesData[, unlist(lapply(SalariesData, is.numeric))]))
ggplot(corsSalaries, aes(x = Var1, y = Var2, fill = value)) + 
  geom_tile() + 
  ggtitle("Heatmap of Professor Salaries") +
  scale_fill_gradient(low = "white", high = "purple")
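
To make the reshaping step concrete, the sketch below runs the same numeric-column selection on the six rows printed by head() earlier and then builds the long table that geom_tile() needs. It swaps reshape2's melt() for base R's as.data.frame(as.table()) so that it runs without extra packages. That is a stand-in for illustration only, and the third column has to be renamed from Freq to value to match what melt() produces.

```r
# Four columns from the six rows printed by head(SalariesData).
SalariesHead <- data.frame(
  rank          = factor(c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf")),
  yrs.since.phd = c(19, 20, 4, 45, 40, 6),
  yrs.service   = c(18, 16, 3, 39, 41, 6),
  salary        = c(139750, 173200, 79750, 115000, 141500, 97000)
)

# Logical vector marking the numeric columns; sapply() gives the same
# result as unlist(lapply(...)) here.
is_num <- sapply(SalariesHead, is.numeric)

# Wide format: square correlation matrix, one row and column per variable.
cor_wide <- cor(SalariesHead[, is_num])

# Long format: one row per (variable, variable) pair.
cor_long <- as.data.frame(as.table(cor_wide))
names(cor_long)[3] <- "value"  # match the column name melt() would produce

print(cor_wide)
print(cor_long)
```

Three numeric variables give a 3 x 3 matrix and therefore nine rows in the long table, one per heatmap tile.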

corsSalaries2 <- cor(SalariesData[, unlist(lapply(SalariesData, is.numeric))])
corrgram(corsSalaries2, order = TRUE,
         lower.panel = panel.shade,
         upper.panel = panel.pie,
         text.panel = panel.txt,
         main = "Correlogram of Professor Salaries")

ggcorrplot(corsSalaries2, type = "upper",
           method = "square",
           lab = TRUE,
           title = "ggcorrplot chart of Professor Salaries")

Correlation Analysis Reflection

After looking through these three correlation visualizations, you can see that they agree: years since PhD and years of service have an extremely high correlation coefficient (0.91). That makes perfect sense, because years of service grow over time just as the years since you first obtained your PhD do, and your pay would most likely increase along the way. Furthermore, the results suggest that salary increases only gradually with years since PhD and years of service, probably because it takes a year or a performance review to get a raise. My favorite method was the ggcorrplot() function, which made it easiest for me to read and understand what the data was showing.
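
For readers who want to see what a coefficient such as the 0.91 above measures, Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand and compares the result with cor(). It uses only the six rows printed by head() earlier, so the value it prints is not the 0.91 from the full dataset. The helper name pearson_r is hypothetical.

```r
# yrs.since.phd and yrs.service from the six rows printed by head(SalariesData).
yrs_since_phd <- c(19, 20, 4, 45, 40, 6)
yrs_service   <- c(18, 16, 3, 39, 41, 6)

# Hypothetical helper: Pearson correlation from centered values.
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations from the mean of x
  dy <- y - mean(y)  # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(yrs_since_phd, yrs_service)
r_builtin <- cor(yrs_since_phd, yrs_service)  # Pearson is the default method
print(c(manual = r_manual, builtin = r_builtin))
```

If a confidence interval is also wanted, cor.test() from the stats package reports one alongside the estimate.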

Animation with gganimate package

AnimatedPlotRank <- ggplot(data = SalariesData, aes(yrs.since.phd, salary, colour = rank)) +
  geom_line() +
  scale_color_manual(values = c("saddlebrown","royalblue1","violetred")) +
  theme_ipsum() +
  ggtitle("Salary Growth of professors of different rank over the \n years since obtaining their PhD") +
  xlab("Years Since Obtaining PhD") +
  ylab("Salary") +
  transition_reveal(yrs.since.phd)

AnimatedPlotRank

AnimatedPlotSex <- ggplot(data = SalariesData, aes(yrs.since.phd, salary, colour = sex)) +
  geom_line() +
  scale_color_manual(values = c("red","royalblue1")) +
  theme_ipsum() +
  ggtitle("Salary Growth of professors \n by sex over the \n years since obtaining their PhD") +
  xlab("Years Since Obtaining PhD") +
  ylab("Salary") +
  transition_reveal(yrs.since.phd)

AnimatedPlotSex

AnimatedPlotDept <- ggplot(data = SalariesData, aes(yrs.since.phd, salary, colour = discipline)) +
  geom_line() +
  scale_color_manual(values = c("Orange","blue")) +
  theme_ipsum() +
  ggtitle("Salary Growth of professors from \n theoretical/applied disciplines over the \n years since obtaining their PhD") +
  xlab("Years Since Obtaining PhD") +
  ylab("Salary") +
  transition_reveal(yrs.since.phd)

AnimatedPlotDept

Animation Analysis Reflection

The interpretation is the same as in the multivariate analysis section of this project. These plots simply display the same data through gganimate with the transition_reveal() function.
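
Printing a gganimate object renders it with default settings. If the animation needs to be written to a file, for example to embed it in a post, a small wrapper around animate() and anim_save() is one option. This is a sketch under assumptions: the helper name save_salary_animation, the frame settings, and the output file name are all hypothetical, and gifski_renderer() requires the gifski package to be installed.

```r
# Hypothetical helper: render a gganimate plot to a GIF file.
# Functions are called with the gganimate:: prefix, so the package only
# needs to be installed when the helper is actually called.
save_salary_animation <- function(plot, file, nframes = 100, fps = 10) {
  anim <- gganimate::animate(
    plot,
    nframes  = nframes,                      # total frames to render
    fps      = fps,                          # playback speed
    renderer = gganimate::gifski_renderer()  # GIF output via the gifski package
  )
  gganimate::anim_save(file, animation = anim)
  invisible(anim)
}

# Example call (not run here; the file name is an assumption):
# save_salary_animation(AnimatedPlotRank, "salary_by_rank.gif")
```

Raising nframes makes the reveal smoother at the cost of a longer render and a larger file.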

Reflection

After conducting multiple visualizations in R, I find it truly amazing how much this language can achieve. When I first began using it, I thought it would only be a simple language for basic command-line statistical computing, but I was most definitely wrong. I am looking forward to what's next in my data science journey, and to taking predictive analytics as well.

-cheers.

Lee J Noel
