SPS DATA 607 Week 8 Group Project 3
Approach
Introduction
Using data, investigate the question:
Which data science skills are most valued?
About Dataset
Dataset: https://www.kaggle.com/datasets/asaniczka/data-science-job-postings-and-skills?select=job_postings.csv
LinkedIn is a popular professional networking platform with millions of job postings across various industries.
This dataset provides a raw dump of data science-related job postings collected from LinkedIn. It includes information about job titles, companies, locations, search parameters, and other relevant details.
The main objective of this dataset is not only to provide insights into the data science job market and the skills required by professionals in this field but also to offer users an opportunity to practice their data cleaning skills.
Data Acquisition and Storage
The dataset used in this project is the Kaggle dataset "Data Science Job Postings and Skills". It contains job postings collected from LinkedIn and includes information such as job titles, company names, locations, and required skills for data-related positions. The dataset was downloaded as CSV files (e.g., job_postings.csv) and stored locally for analysis in R using RStudio.
Data Preparation and Cleaning
Before analysis, the dataset required preprocessing to ensure data quality. This process included removing missing or duplicate records, standardizing column names, and cleaning text fields such as job titles and skill lists. Additional transformations were performed to convert certain variables into appropriate formats and to extract relevant keywords for skill analysis. These steps ensured the dataset was consistent and ready for further analysis.
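The cleaning steps described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's actual R code. The inline sample rows and the column names (Job Title, Company, Job Location, Job Skills) are assumptions made for the example; the real CSV files may be laid out differently.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV files.
# Column names and values are assumptions for illustration only.
RAW_CSV = """Job Title,Company,Job Location,Job Skills
 Data Scientist ,Acme,"New York, NY","Python, SQL,  Machine Learning"
Data Analyst,Globex,"Austin, TX","sql, Excel, Tableau"
 Data Scientist ,Acme,"New York, NY","Python, SQL,  Machine Learning"
Data Engineer,Initech,"Austin, TX",
"""


def standardize_name(name):
    """Turn a header such as 'Job Title' into 'job_title'."""
    return "_".join(name.strip().lower().split())


def clean_rows(raw_text):
    """Standardize column names, drop rows without skills, and remove duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Standardize column names and trim surrounding whitespace.
        record = {standardize_name(k): (v or "").strip() for k, v in row.items()}

        # Drop records whose skill list is missing.
        if not record["job_skills"]:
            continue

        # Normalize the job title: collapse inner whitespace, lowercase.
        record["job_title"] = " ".join(record["job_title"].split()).lower()

        # Split the comma-separated skill text into a list of clean keywords.
        skills = [" ".join(s.split()).lower() for s in record["job_skills"].split(",")]
        record["job_skills"] = [s for s in skills if s]

        # Skip exact duplicates of a record already kept.
        key = (
            record["job_title"],
            record["company"],
            record["job_location"],
            tuple(record["job_skills"]),
        )
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned = clean_rows(RAW_CSV)
for record in cleaned:
    print(record["job_title"], record["job_skills"])
```

Lowercasing skills before counting means that "SQL" and "sql" are treated as the same keyword, which matters for the frequency analysis that follows.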
Exploratory Data Analysis
Exploratory Data Analysis (EDA) was conducted to better understand the structure and patterns within the dataset. Summary statistics and frequency counts were used to examine common job titles, locations, and skills mentioned in job postings. The analysis also explored relationships between variables, such as the distribution of skills across different data science roles and geographic locations.
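As a rough illustration of the frequency counts mentioned above, the sketch below tallies skills and job titles over a handful of postings. It is written in Python with the standard library rather than the project's R code. The postings list and its field names (job_title, job_skills) are hypothetical placeholders for the cleaned dataset.

```python
from collections import Counter

# Hypothetical cleaned postings; field names and values are assumptions.
postings = [
    {"job_title": "data scientist", "job_skills": ["python", "sql", "machine learning"]},
    {"job_title": "data analyst", "job_skills": ["sql", "excel", "tableau"]},
    {"job_title": "data engineer", "job_skills": ["python", "sql", "spark"]},
    {"job_title": "data scientist", "job_skills": ["python", "statistics", "python"]},
]


def skill_frequencies(rows):
    """Count how many postings mention each skill (once per posting)."""
    counts = Counter()
    for row in rows:
        # set() prevents a skill repeated within one posting from being double counted.
        counts.update(set(row["job_skills"]))
    return counts


def title_frequencies(rows):
    """Count how many postings carry each job title."""
    return Counter(row["job_title"] for row in rows)


skill_counts = skill_frequencies(postings)
title_counts = title_frequencies(postings)

for skill, n in skill_counts.most_common(3):
    share = n / len(postings)
    print(f"{skill}: {n} postings ({share:.0%})")
```

Counting each skill once per posting keeps a posting that repeats a keyword from inflating that skill's total.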
Visualization
Data visualization techniques were used to present insights in a clear and interpretable way. Charts such as bar plots, word frequency charts, and geographic comparisons were used to highlight the most frequently requested skills and job roles. Visualizations helped identify trends in the data science job market and allowed us to compare skill demand across different job categories and locations.
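To illustrate the idea behind a skill-frequency bar plot, the sketch below renders a text-only bar chart in Python using the standard library. It is a stand-in for the charts the project produces in R, and the skill counts are hypothetical numbers chosen for the example, not results from the dataset.

```python
# Hypothetical skill counts; these are not results from the dataset.
example_counts = {"python": 30, "sql": 24, "machine learning": 15, "tableau": 6}


def text_bar_chart(counts, width=30, top_n=5):
    """Return one text line per skill, with bar length proportional to its count."""
    # Sort by descending count, then alphabetically to break ties predictably.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    if not top or top[0][1] <= 0:
        return []
    max_count = top[0][1]
    label_width = max(len(skill) for skill, _ in top)
    lines = []
    for skill, n in top:
        # Scale each bar relative to the most frequent skill.
        bar = "#" * max(1, round(n / max_count * width))
        lines.append(f"{skill.ljust(label_width)} | {bar} {n}")
    return lines


chart = text_bar_chart(example_counts)
print("\n".join(chart))
```

Sorting by count before plotting puts the most requested skills at the top, which is usually the easiest ordering to read in a ranked bar chart.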
Research Questions / Objectives
Which skills are most frequently required in data science job postings?
How do the most in-demand skills vary across different data-related roles?
Is there an association between job location and the skills required for data science positions?
Can we identify patterns or trends in skill demand that could inform career development or hiring strategies?
Expected Findings
The analysis will identify the most in-demand skills by counting keyword frequency across job postings and visualizing the most frequently requested skills for each data-related role. We will also explore whether required skill sets differ by job location, providing insight into geographic trends in data science skill demand.
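One simple way to compare skill demand across roles or locations is to compute, for each group, the share of postings that mention each skill. The sketch below shows this in Python with the standard library; it is an illustration under assumed data, not the project's R implementation. The postings and the field names (job_title, job_location, job_skills) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned postings; field names and values are assumptions.
postings = [
    {"job_title": "data scientist", "job_location": "New York, NY", "job_skills": ["python", "sql"]},
    {"job_title": "data scientist", "job_location": "Austin, TX", "job_skills": ["python", "machine learning"]},
    {"job_title": "data analyst", "job_location": "New York, NY", "job_skills": ["sql", "excel"]},
    {"job_title": "data analyst", "job_location": "Austin, TX", "job_skills": ["sql", "tableau"]},
]


def skill_share_by_group(rows, group_key):
    """For each group, return the fraction of its postings that mention each skill."""
    totals = Counter()              # postings per group
    mentions = defaultdict(Counter)  # postings per group that mention each skill
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        mentions[group].update(set(row["job_skills"]))
    return {
        group: {skill: n / totals[group] for skill, n in counts.items()}
        for group, counts in mentions.items()
    }


by_role = skill_share_by_group(postings, "job_title")
by_location = skill_share_by_group(postings, "job_location")

print(by_role["data scientist"])
print(by_location["Austin, TX"])
```

Using shares rather than raw counts keeps groups with many postings from dominating the comparison. A formal test of association, such as a chi-squared test on the group-by-skill counts, could follow if the differences look substantial.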
Communication
Email
Teams
GitHub
RStudio