PAF 514 Code-Through

Introduction

In this tutorial, we will visualize the Top 10 Departments by Avergae Salary using a dataset that includes salary information by department and salary. We will clean the data and use ggplot2 to create visualizations that compare average salaries by departments.

Step 1: Load Required Libraries and Dataset

First, we need to load the required libraries and the dataset. We are using a sample dataset that includes calendar year, full name, job description, department description, salary, and full-time equivalence (FTE).

# Load required libraries
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data_url <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
salary_data <- read.csv(data_url)

Step 2: Inspect the Dataset

Next, we inspect the dataset to understand its structure. The dataset includes columns such as calendar year, full name, job description, department description, salary, and FTE. This helps us identify if there are any necessary cleaning steps for the Salary column, such as converting it to numeric or normalizing it based on FTE.

head(salary_data)

##   Calendar.Year                           Full.Name            Job.Description
## 1          2020                    ABBASI, Mohammad     Research/Lab Assistant
## 2          2020 ARQUIZA, Jose Maria Reynaldo Apollo                   Lecturer
## 3          2020                      Aaberg, Kelsea Student Support Specialist
## 4          2020                    Abadjivor, Enyah            Project Manager
## 5          2020                   Abayesu, Precious          Management Intern
## 6          2020                        Abbas, James            Assoc Professor
##           Department.Description      Salary FTE
## 1 Sch Biological & Hlth Sys Engr  $35,090.00 100
## 2 Sch Biological & Hlth Sys Engr  $71,400.00 100
## 3             Admission Services  $36,000.00 100
## 4                   CASGE  Tempe  $64,000.00 100
## 5 Health & Clinical Partnerships  $20,800.00  50
## 6 Sch Biological & Hlth Sys Engr $107,195.00 100

We can see that the Salary column contains dollar signs and commas, so we will clean these before proceeding with further analysis.

Step 3: Clean the Data

We will clean the Salary column by removing the dollar signs and commas and then convert it to numeric. We will also normalize salaries by dividing by FTE (full-time equivalence) to account for different working hours.

salary_data <- salary_data %>%
  mutate(Salary = gsub("[\\$,]", "", Salary),  
         Salary = as.numeric(Salary),          
         FTE = as.numeric(FTE),               
         Salary_FTE = Salary / FTE) %>%   
  filter(!is.na(Salary_FTE))        

head(salary_data)

##   Calendar.Year                           Full.Name            Job.Description
## 1          2020                    ABBASI, Mohammad     Research/Lab Assistant
## 2          2020 ARQUIZA, Jose Maria Reynaldo Apollo                   Lecturer
## 3          2020                      Aaberg, Kelsea Student Support Specialist
## 4          2020                    Abadjivor, Enyah            Project Manager
## 5          2020                   Abayesu, Precious          Management Intern
## 6          2020                        Abbas, James            Assoc Professor
##           Department.Description Salary FTE Salary_FTE
## 1 Sch Biological & Hlth Sys Engr  35090 100     350.90
## 2 Sch Biological & Hlth Sys Engr  71400 100     714.00
## 3             Admission Services  36000 100     360.00
## 4                   CASGE  Tempe  64000 100     640.00
## 5 Health & Clinical Partnerships  20800  50     416.00
## 6 Sch Biological & Hlth Sys Engr 107195 100    1071.95

Step 4: Visualize the Top 10 Departments by Average Salary

In this step, we aim to visualize the departments with the highest average salaries (normalized by full-time equivalence or FTE). We’ll first calculate the average salary for each department, then filter the top 10 departments with the highest average salary, and finally create a horizontal bar chart for better readability.

top_10_departments <- salary_data %>%
  group_by(Department.Description) %>%
  summarize(Avg_Salary_FTE = mean(Salary_FTE, na.rm = TRUE)) %>%
  top_n(10, Avg_Salary_FTE)

ggplot(top_10_departments, aes(x = reorder(Department.Description, Avg_Salary_FTE), y = Avg_Salary_FTE)) +
  geom_bar(stat = "identity", fill = "#56B4E9") +
  labs(title = "Top 10 Departments by Average Salary (Normalized by FTE)",
       x = "Department", y = "Average Salary (FTE)") +
  coord_flip() +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8))

If your having trouble always makesure to check your column names to makesure you are inputting the correct names when creating your codes

colnames(salary_data)

## [1] "Calendar.Year"          "Full.Name"              "Job.Description"       
## [4] "Department.Description" "Salary"                 "FTE"                   
## [7] "Salary_FTE"

Step 5: Further Customization of the Bar Chart

In this step, we’ll explore additional customizations to improve the overall presentation of the graph. We will:

1.  Add labels showing the exact salary amounts on the bars.
2.  Customize the color palette for a more visually appealing chart.
3.  Adjust the title, axis labels, and other aesthetics for clarity.

ggplot(top_10_departments, aes(x = reorder(Department.Description, Avg_Salary_FTE), y = Avg_Salary_FTE)) +
  geom_bar(stat = "identity", fill = "#0073C2FF") + 
  geom_text(aes(label = round(Avg_Salary_FTE, 2)), 
            hjust = -0.1, size = 3) +            
  labs(title = "Top 10 Departments by Average Salary (Normalized by FTE)",
       subtitle = "Showing departments with the highest average salaries",
       x = "Department", y = "Average Salary (FTE)") +
  coord_flip() +  
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8), 
        plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(size = 10))

**Here is an Explanation of the Customizations made*:**

1.Custom color: We use a custom blue color (#0073C2FF) to give the chart a more professional appearance.
2.Labels on bars: We add the salary values as text labels on the bars using geom_text(). The labels are rounded to two decimal places and positioned slightly outside the bars (hjust = -0.1).
3.Title and subtitle: We include a subtitle to provide additional context, and we bold the title and adjust its size for emphasis.
4.Readability improvements: The text size for department names (axis.text.y) is adjusted to ensure they are readable, and we make slight adjustments to the title and subtitle text sizes.

PAF 514 Code-Through

Brittany Benally

2024-10-11