This case study does not represent any one company; its purpose is to explore salaries for entry-level Data Analysts based in the United States. The pertinent stakeholders are therefore not only prospective junior data scientists, but also business recruiters. The data was pulled from one of RANDOMARNAB’s Kaggle datasets (cited at the end of the study), a list of data science-related positions available worldwide as of March 2023. The dataset contained some information that was not needed for the study, such as jobs located outside the United States and positions that required more experience. The key questions for this project are “What keywords should I use when job-hunting?” and “What kind of salary can I expect from (relatively) market-matching companies?”
As previously mentioned, the data came from a public dataset on Kaggle (cited below). The dataset was created in March 2023, just three months before this study. Admittedly, that is not as current as the author would prefer, but it is current enough to give a gist of the job market and market salary. The data was filtered to contain only information pertinent to the study: US-based, entry-level, and virtual positions. These filters keep the data specific to one country for the stakeholders while leaving it general enough to apply country-wide. The process of cleaning and transforming the data is shown in the R code chunks included below.
Notes: I loaded the necessary packages: “flexdashboard”, “tinytex”, “rmarkdown”, “tidyverse”, “ggplot2”, “tidyr”, “readr”, “dplyr”, “skimr”, “janitor”, “here”, “formatR”, “readxl”, “packrat”, and “rsconnect”.
library("flexdashboard")
library("tinytex")
library("rmarkdown")
library("tidyverse")
library("ggplot2")
library("tidyr")
library("readr")
library("dplyr")
library("skimr")
library("janitor")
library("here")
library("formatR")
library("readxl")
library("packrat")
library("rsconnect")
data_science_salaries <- read_csv(here("ds_salaries.csv"))
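The filtering step described above (US-based, entry-level, virtual positions) could be sketched roughly as follows. This is a minimal illustration, not the exact code used in the study. It assumes the dataset has columns named `company_location`, `experience_level`, and `remote_ratio`, that entry-level rows are coded `"EN"`, and that "virtual" means `remote_ratio == 100`. The helper `filter_entry_level_us_remote()` and the toy rows are hypothetical and stand in for `data_science_salaries`.

```r
library(dplyr)

# Toy rows standing in for ds_salaries.csv; values are made up for illustration.
toy_salaries <- data.frame(
  job_title = c("Data Analyst", "Data Scientist", "Data Engineer",
                "Data Analyst", "Data Scientist"),
  experience_level = c("EN", "SE", "EN", "EN", "EN"),
  company_location = c("US", "US", "GB", "US", "US"),
  remote_ratio = c(100, 100, 100, 0, 100),
  salary_in_usd = c(70000, 150000, 60000, 65000, 90000)
)

# Hypothetical helper: keep only US-based, entry-level, fully remote rows.
# The column names and the "EN" / 100 codes are assumptions about the dataset.
filter_entry_level_us_remote <- function(salaries) {
  salaries %>%
    filter(
      company_location == "US",
      experience_level == "EN",
      remote_ratio == 100
    )
}

entry_level_us <- filter_entry_level_us_remote(toy_salaries)
print(entry_level_us)
```

With the real data, the same helper would be called as `filter_entry_level_us_remote(data_science_salaries)`, provided the column names match the assumptions above.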
Based on this study, a smart move when job-hunting is to search for the job titles “Data Analyst”, “Data Engineer”, “Data Scientist”, and “Machine Learning Engineer”, since those terms are the most common in job listings. Once a candidate has secured an interview, it is wise for them to know that the market salary as of March 2023 is 80,000 USD. These pieces of information will not only help candidates find jobs, but also prevent them from being sold short in salary discussions.
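One way such findings might be derived from the filtered data is to count job titles and summarise salaries. The sketch below is only an illustration under stated assumptions. The columns `job_title` and `salary_in_usd` are assumed names, the toy values are made up and do not reproduce the study's results, and the median is an assumed choice of summary, since the study does not state which statistic the market salary is based on.

```r
library(dplyr)

# Toy entry-level rows; values are made up and do not reproduce the study's figures.
toy_entry_level <- data.frame(
  job_title = c("Data Analyst", "Data Analyst", "Data Engineer",
                "Data Scientist", "Data Analyst"),
  salary_in_usd = c(60000, 70000, 95000, 90000, 65000)
)

# Most frequent job titles, most common first.
title_counts <- toy_entry_level %>%
  count(job_title, sort = TRUE, name = "n_postings")

# A single "market salary" figure; the median is an assumption made here.
median_salary <- toy_entry_level %>%
  summarise(median_usd = median(salary_in_usd)) %>%
  pull(median_usd)

print(title_counts)
print(median_salary)
```

The median is used here because it is less sensitive to a few very high or very low salaries than the mean. Either could be substituted, depending on what the stakeholders want to compare against.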
RANDOMARNAB. (2023). Data Science Salaries 2023. Kaggle. https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023/code