Task 1: Reflection

For Data Viz Challenge 7, I picked a dataset about data science salaries. I chose it because I eventually want to work in data science and am curious what the potential salary would be. For the interactive plot, I created a boxplot of salary by experience level; hovering over a point shows the exact job title. For the other two graphs, I went for something simpler: a ridgeline plot showing the density of salary by employment type, and a stacked bar chart showing the remote ratio by company size. Overall, I think this was an interesting dataset to look into. The flexdashboard was a bit confusing to navigate at first, but it was a good skill to practice and a cool tool to know.

Task 2: Interactive plots

library(tidyverse)
library(scales)
library(plotly)

# Load data here
salary <- read_csv("C:/Users/calkz/OneDrive/Documents/A_ISTA320/DVChallenge7/data/DataScience_salaries_2024.csv")
salary

This dataset comes from Kaggle, where it was published by Yusuf Delikkaya. It includes the year, experience level (Junior, Mid-Level, Senior, Expert), employment type (Full-Time, Part-Time, Contract), job title, salary, residence, remote ratio, location, and company size (small, medium, large).

Do the following:

  1. Make a plot. Any kind of plot will do (though it might be easiest to work with geom_point()).

  2. Make the plot interactive with ggplotly().

  3. Make sure the hovering tooltip is more informative than the default.

Good luck and have fun!

p <- ggplot(salary, aes(x = experience_level, y = salary_in_usd)) +
  geom_boxplot() +
  geom_jitter(color = 'black', size = 0.6, alpha = 0.4) +
  labs(title = 'Salary based on Experience Level', subtitle = 'Data Science Positions',
       x = 'Experience Level', y = 'Salary (in USD)') +
  scale_y_continuous(labels = comma) +
  theme_minimal()

p

This first version of the plot has far too many points and looks noisy, so I will restrict the data to 2024 and to full-time employees.

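# Keep only full-time employees from 2024
salary_2024 <- salary %>%
  filter(work_year == 2024, employment_type == 'FT')
salary_2024

# text is mapped only in geom_jitter(): in the global aes() it would act as a
# grouping variable and split each boxplot into one group per job title.
# ggplot2 warns that text is an unknown aesthetic; ggplotly() uses it for the tooltip.
p_2024 <- ggplot(salary_2024, aes(x = experience_level, y = salary_in_usd)) +
  geom_boxplot() +
  geom_jitter(aes(text = paste('Job Title:', job_title)), size = 0.8, alpha = 0.4) +
  labs(title = 'Salary based on Experience Level', subtitle = 'Data Science Positions in 2024',
       x = 'Experience Level', y = 'Salary (in USD)') +
  scale_y_continuous(labels = comma) +
  theme_minimal()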

interactive_plot <- ggplotly(p_2024, tooltip = 'text')
interactive_plot
htmlwidgets::saveWidget(interactive_plot, 'fancy.html')

Click to view Interactive plot

Here is the final boxplot, restricted to 2024 data and full-time employees. Entry-level workers earn the least. What I find interesting is that Senior-level employees have the most observations. Expert-level workers earn the most on average, and Mid-Level employees earn almost the same as entry-level employees.
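The comparisons read off the boxplot could also be checked numerically. The following is a minimal sketch, assuming the column names experience_level and salary_in_usd used in the plotting code; summarise_salary() is a hypothetical helper, and the toy rows (including the level codes EN, MI, SE) are made-up values that only show the shape of the output, not real results.

```r
library(dplyr)

# Hypothetical helper: count, median, and mean salary per experience level,
# sorted from the lowest to the highest median.
summarise_salary <- function(df) {
  df %>%
    group_by(experience_level) %>%
    summarise(
      n = n(),
      median_usd = median(salary_in_usd),
      mean_usd = mean(salary_in_usd),
      .groups = "drop"
    ) %>%
    arrange(median_usd)
}

# Toy rows with made-up values (NOT taken from the Kaggle dataset).
toy <- tibble(
  experience_level = c("EN", "EN", "MI", "SE", "SE", "SE"),
  salary_in_usd = c(50000, 70000, 90000, 120000, 150000, 180000)
)
toy_summary <- summarise_salary(toy)
toy_summary
```

With the real data, the call would be summarise_salary(salary_2024). The n column shows which level has the most observations, and comparing median_usd with mean_usd shows how much the high outliers pull the average up.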

Task 3:

Install the {flexdashboard} package and create a new R Markdown file in your project by going to File > New File… > R Markdown… > From Template > Flexdashboard.

Using the documentation for {flexdashboard} online, create a basic dashboard that shows a plot (static or interactive) in at least three chart areas. Play with the layout if you’re feeling brave.
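As a rough sketch of what such a file might look like, the skeleton below follows the column layout of the standard flexdashboard template, with one wide column for the interactive plot and a second column holding two more charts. The dashboard title, chart headings, and data-width values are placeholders, and the chunk bodies are left as comments because the plot objects have to be built inside the dashboard file itself.

````
---
title: "Data Science Salaries"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(plotly)
# Load the data and build the plot objects here
```

Column {data-width=600}
-----------------------------------------------------------------------

### Salary by experience level

```{r}
# Interactive plot, e.g. ggplotly(p_2024, tooltip = 'text')
```

Column {data-width=400}
-----------------------------------------------------------------------

### Second chart

```{r}
# Static or interactive plot
```

### Third chart

```{r}
# Static or interactive plot
```
````

Each level-3 heading starts a new chart area, and each dashed row starts a new column. Changing orientation to rows would stack the sections vertically instead.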