Introduction

For this assignment we were asked to use data to answer the question, “Which are the most valued data science skills?” We found a dataset of data science job postings on Kaggle, “Data Science Job Postings & Skills (2024)”, and counted how often each job skill was mentioned, treating the most frequently mentioned skills as the most valued.

Data source: https://www.kaggle.com/datasets/asaniczka/data-science-job-postings-and-skills?select=job_postings.csv

License: Open Data Commons Attribution License (ODC-By) v1.0: https://opendatacommons.org/licenses/by/1-0/index.html

Load in libraries

library(tidyverse)
library(stringr)

Import data

df <- read.csv("https://raw.githubusercontent.com/Andreina-A/Project-3/refs/heads/main/Data_merged.csv")
head(df)
##   X
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
##                                                                                                                                       job_link
## 1                        https://au.linkedin.com/jobs/view/%F0%9F%8C%9F-expression-of-interest-data-scientist-opportunities-at-hyre-3796352718
## 2                                                             https://au.linkedin.com/jobs/view/aml-operations-analyst-at-boq-group-3754582056
## 3                                                             https://au.linkedin.com/jobs/view/aps5-finance-data-analyst-at-talent-3796359082
## 4                                                  https://au.linkedin.com/jobs/view/aps6-data-business-analyst-at-indigeco-pty-ltd-3805248464
## 5 https://au.linkedin.com/jobs/view/associate-professor-professor-artificial-intelligence-and-machine-learning-at-deakin-university-3784491631
## 6                           https://au.linkedin.com/jobs/view/bar-teamleader-full-time-intercontinental-perth-at-ihg-hotels-resorts-3798301068
##             last_processed_time  last_status got_summary got_ner
## 1 2024-01-19 09:45:09.215838+00 Finished NER           t       t
## 2 2024-01-19 09:45:09.215838+00 Finished NER           t       t
## 3 2024-01-19 09:45:09.215838+00 Finished NER           t       t
## 4 2024-01-19 09:45:09.215838+00 Finished NER           t       t
## 5   2024-01-21 03:11:02.4548+00 Finished NER           t       t
## 6 2024-01-19 14:46:20.703535+00 Finished NER           t       t
##   is_being_worked
## 1               f
## 2               f
## 3               f
## 4               f
## 5               f
## 6               f
##                                                                       job_title
## 1                      🌟 Expression of Interest - Data Scientist Opportunities
## 2                                                        AML Operations Analyst
## 3                                                     APS5 Finance Data Analyst
## 4                                                    APS6 Data/Business Analyst
## 5 Associate Professor / Professor, Artificial Intelligence and Machine Learning
## 6                           Bar Teamleader (Full Time) - InterContinental Perth
##                company                                      job_location
## 1                Hyre.                Sydney, New South Wales, Australia
## 2            BOQ Group                    Melbourne, Victoria, Australia
## 3               Talent                     Richmond, Victoria, Australia
## 4     Indigeco Pty Ltd Canberra, Australian Capital Territory, Australia
## 5    Deakin University                      Belmont, Victoria, Australia
## 6 IHG Hotels & Resorts               Perth, Western Australia, Australia
##   first_seen       search_city search_country          search_position
## 1 2024-01-13            Sydney      Australia Maintenance Data Analyst
## 2 2024-01-13          Victoria      Australia        Credit Authorizer
## 3 2024-01-13          Victoria      Australia         Data Entry Clerk
## 4 2024-01-13          Canberra      Australia         Data Entry Clerk
## 5 2024-01-16         Redcliffe      Australia       Instructor Driving
## 6 2024-01-16 Western Australia      Australia                    Finer
##    job_level job_type
## 1 Mid senior   Onsite
## 2 Mid senior   Onsite
## 3 Mid senior   Onsite
## 4 Mid senior   Onsite
## 5 Mid senior   Onsite
## 6 Mid senior   Onsite
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                job_skills
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Data analytics, Machine learning, Predictive modeling, Data visualization, Data pipelines, Tableau, Power BI, Python, MATLAB, R, Machine learning algorithms, Statistical analysis, Cloudbased platforms, AWS, Azure, Google Cloud, Big data technologies, Hadoop, Spark, Natural language processing, Deep learning, Data science certifications, Machine learning certifications, Problemsolving skills, Attention to detail, Communication skills, Teamwork, Bachelor's or Master's in Computer Science Statistics Mathematics or a related field, Proven experience as a Data Scientist
## 2 AML/CTF typologies, Financial Crime Operations, AML Operations Detection, Transaction monitoring alerts, Customer and payment screening alerts, Regulatory report exception handling (TTRs & IFTIs), AML Operations operational obligations, AML/CTF Act 2006, AML/CTF Rules, Financial crime assessment, Due diligence evidence, Banking systems, Threshold Transaction Reporting, International Funds Transfer Instruction Reporting, Internal processes, Policies, Regulatory reporting SLAs, KPIs, Quality targets, Continuous improvement, Project work, Communication, Escalation, Information sharing, Spirited, Optimistic, Curious, Inclusive, Accountable, Lionhearted, Financial crime analysis, Riskbased approach, Interpersonal skills, Written skills, Verbal skills, Financial Crime risk typologies, Indicators, Red flags, Analytical skills, Problemsolving skills, Riskbased decisionmaking skills, Eye for detail, AML/CTF Act and Rules, Financial Crime risk, Money Laundering, Terrorism Financing, Sanctions, Banking codes of practice, Privacy Act, Criminal Code, Netreveal (NROD), Temenos
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Data Analytics, SAS, Statistical Techniques, Data Warehousing, Reporting, Data Interpretation, Actionable Insights, Pattern Identification, Trend Analysis, Communication, Stakeholder Engagement, Australian Citizenship
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Data Analysis, Forecasting, Reporting, Demand and Supply Analysis, Contract Management, Trend Analysis, Data Interpretation, Risk Assessment, Mitigation Activities, Forecasting Accuracy, Internal/External Stakeholder Engagement, Written Communication, Team Collaboration, Microsoft Office Suite, Australian Citizenship, Baseline Clearance
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Generative AI, Deep Learning, Machine Learning, Artificial Intelligence, Teaching, Learning, Assessment, Research
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Food and Beverage Management, Leadership, Customer Service, Beverage Knowledge, Teamwork, Communication Skills, Responsible Service of Alcohol certification, Food Safety Course, Paid Parental Leave, Paid Wellness Days, Employee Discounts, Career Development Programs
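
The X column in the output above looks like a row index that was saved along with the CSV. That is an assumption about how Data_merged.csv was written, not something the source states. As a minimal sketch on toy data (the data frame and column values below are hypothetical), this shows how such a column arises and how it could be dropped before tidying:

```r
# Toy data standing in for the merged postings file (hypothetical values)
toy <- data.frame(
  job_title  = c("Data Analyst", "Data Scientist"),
  job_skills = c("SQL, Excel", "Python, R")
)

# write.csv() writes row names by default, which adds an unnamed index column
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp)

# read.csv() reads that unnamed column back in and names it "X"
toy_in <- read.csv(tmp)
names(toy_in)

# Drop the index column, keeping only the real variables
toy_clean <- toy_in[, names(toy_in) != "X"]
names(toy_clean)
```

Writing the file with write.csv(toy, tmp, row.names = FALSE) would avoid the extra column in the first place.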

Tidy Data

We used str_split_fixed to split the comma-separated job_skills field into individual skills for each job posting, allowing up to 150 skills per posting.

df1 <- data.frame(str_split_fixed(df$job_skills, ",", 150))
head(df1)
##                             X1                          X2
## 1               Data analytics            Machine learning
## 2           AML/CTF typologies  Financial Crime Operations
## 3               Data Analytics                         SAS
## 4                Data Analysis                 Forecasting
## 5                Generative AI               Deep Learning
## 6 Food and Beverage Management                  Leadership
##                          X3                             X4
## 1       Predictive modeling             Data visualization
## 2  AML Operations Detection  Transaction monitoring alerts
## 3    Statistical Techniques               Data Warehousing
## 4                 Reporting     Demand and Supply Analysis
## 5          Machine Learning        Artificial Intelligence
## 6          Customer Service             Beverage Knowledge
##                                       X5
## 1                         Data pipelines
## 2  Customer and payment screening alerts
## 3                              Reporting
## 4                    Contract Management
## 5                               Teaching
## 6                               Teamwork
##                                                     X6
## 1                                              Tableau
## 2  Regulatory report exception handling (TTRs & IFTIs)
## 3                                  Data Interpretation
## 4                                       Trend Analysis
## 5                                             Learning
## 6                                 Communication Skills
##                                              X7                      X8
## 1                                      Power BI                  Python
## 2        AML Operations operational obligations        AML/CTF Act 2006
## 3                           Actionable Insights  Pattern Identification
## 4                           Data Interpretation         Risk Assessment
## 5                                    Assessment                Research
## 6  Responsible Service of Alcohol certification      Food Safety Course
##                       X9                         X10
## 1                 MATLAB                           R
## 2          AML/CTF Rules  Financial crime assessment
## 3         Trend Analysis               Communication
## 4  Mitigation Activities        Forecasting Accuracy
## 5                                                   
## 6    Paid Parental Leave          Paid Wellness Days
##                                         X11                          X12
## 1               Machine learning algorithms         Statistical analysis
## 2                    Due diligence evidence              Banking systems
## 3                    Stakeholder Engagement       Australian Citizenship
## 4  Internal/External Stakeholder Engagement        Written Communication
## 5                                                                       
## 6                        Employee Discounts  Career Development Programs
##                                X13
## 1             Cloudbased platforms
## 2  Threshold Transaction Reporting
## 3                                 
## 4               Team Collaboration
## 5                                 
## 6                                 
##                                                   X14                     X15
## 1                                                 AWS                   Azure
## 2  International Funds Transfer Instruction Reporting      Internal processes
## 3                                                                            
## 4                              Microsoft Office Suite  Australian Citizenship
## 5                                                                            
## 6                                                                            
##                   X16                        X17     X18              X19
## 1        Google Cloud      Big data technologies  Hadoop            Spark
## 2            Policies  Regulatory reporting SLAs    KPIs  Quality targets
## 3                                                                        
## 4  Baseline Clearance                                                    
## 5                                                                        
## 6                                                                        
##                            X20            X21                          X22
## 1  Natural language processing  Deep learning  Data science certifications
## 2       Continuous improvement   Project work                Communication
## 3                                                                         
## 4                                                                         
## 5                                                                         
## 6                                                                         
##                                X23                    X24                  X25
## 1  Machine learning certifications  Problemsolving skills  Attention to detail
## 2                       Escalation    Information sharing             Spirited
## 3                                                                             
## 4                                                                             
## 5                                                                             
## 6                                                                             
##                     X26       X27
## 1  Communication skills  Teamwork
## 2            Optimistic   Curious
## 3                                
## 4                                
## 5                                
## 6                                
##                                                                                     X28
## 1  Bachelor's or Master's in Computer Science Statistics Mathematics or a related field
## 2                                                                             Inclusive
## 3                                                                                      
## 4                                                                                      
## 5                                                                                      
## 6                                                                                      
##                                      X29          X30                       X31
## 1  Proven experience as a Data Scientist                                       
## 2                            Accountable  Lionhearted  Financial crime analysis
## 3                                                                              
## 4                                                                              
## 5                                                                              
## 6                                                                              
##                   X32                   X33             X34            X35
## 1                                                                         
## 2  Riskbased approach  Interpersonal skills  Written skills  Verbal skills
## 3                                                                         
## 4                                                                         
## 5                                                                         
## 6                                                                         
##                                X36         X37        X38                X39
## 1                                                                           
## 2  Financial Crime risk typologies  Indicators  Red flags  Analytical skills
## 3                                                                           
## 4                                                                           
## 5                                                                           
## 6                                                                           
##                      X40                              X41             X42
## 1                                                                        
## 2  Problemsolving skills  Riskbased decisionmaking skills  Eye for detail
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##                      X43                   X44               X45
## 1                                                               
## 2  AML/CTF Act and Rules  Financial Crime risk  Money Laundering
## 3                                                               
## 4                                                               
## 5                                                               
## 6                                                               
##                    X46        X47                        X48          X49
## 1                                                                        
## 2  Terrorism Financing  Sanctions  Banking codes of practice  Privacy Act
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##              X50               X51      X52 X53 X54 X55 X56 X57 X58 X59 X60 X61
## 1                                                                              
## 2  Criminal Code  Netreveal (NROD)  Temenos                                    
## 3                                                                              
## 4                                                                              
## 5                                                                              
## 6                                                                              
##   X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80
## 1                                                                            
## 2                                                                            
## 3                                                                            
## 4                                                                            
## 5                                                                            
## 6                                                                            
##   X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96 X97 X98 X99
## 1                                                                            
## 2                                                                            
## 3                                                                            
## 4                                                                            
## 5                                                                            
## 6                                                                            
##   X100 X101 X102 X103 X104 X105 X106 X107 X108 X109 X110 X111 X112 X113 X114
## 1                                                                           
## 2                                                                           
## 3                                                                           
## 4                                                                           
## 5                                                                           
## 6                                                                           
##   X115 X116 X117 X118 X119 X120 X121 X122 X123 X124 X125 X126 X127 X128 X129
## 1                                                                           
## 2                                                                           
## 3                                                                           
## 4                                                                           
## 5                                                                           
## 6                                                                           
##   X130 X131 X132 X133 X134 X135 X136 X137 X138 X139 X140 X141 X142 X143 X144
## 1                                                                           
## 2                                                                           
## 3                                                                           
## 4                                                                           
## 5                                                                           
## 6                                                                           
##   X145 X146 X147 X148 X149 X150
## 1                              
## 2                              
## 3                              
## 4                              
## 5                              
## 6
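
The wide output above shows two things worth handling before counting: every skill after the first keeps a leading space, and the same skill can appear with different capitalization (for example "Data analytics" and "Data Analytics"). As a rough sketch of one alternative, not the approach used in this report, the skills could be reshaped to one row per skill, trimmed, and lower-cased. The toy tibble and the names max_skills, skills_long, and skill_counts are made up for illustration:

```r
library(tidyverse)

# Toy postings standing in for df (hypothetical values)
toy <- tibble(
  job_title  = c("Data Scientist", "Finance Data Analyst"),
  job_skills = c("Data analytics, Machine learning, Python",
                 "Data Analytics, SAS, Reporting")
)

# Number of commas + 1 is the number of skills in a posting; the maximum
# shows how many columns a fixed-width split would actually need
max_skills <- max(str_count(toy$job_skills, ",")) + 1

# One row per skill, with surrounding whitespace removed and case normalized
skills_long <- toy %>%
  separate_rows(job_skills, sep = ",") %>%
  mutate(skill = str_to_lower(str_trim(job_skills))) %>%
  select(job_title, skill)

# Frequency of each skill across postings, most common first
skill_counts <- skills_long %>%
  count(skill, sort = TRUE)

skill_counts
```

In the long format, empty padding columns never appear, and count() can be applied directly to the skill column.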

We then combined the job link and job title columns from the original dataset with the split skills into one data frame, with the two added columns named job_link and job_title.

# Name the two added columns directly in cbind() instead of renaming afterwards
df2 <- cbind(job_link = df$job_link, job_title = df$job_title, df1)
head(df2)
##                                                                                                                                       job_link
## 1                        https://au.linkedin.com/jobs/view/%F0%9F%8C%9F-expression-of-interest-data-scientist-opportunities-at-hyre-3796352718
## 2                                                             https://au.linkedin.com/jobs/view/aml-operations-analyst-at-boq-group-3754582056
## 3                                                             https://au.linkedin.com/jobs/view/aps5-finance-data-analyst-at-talent-3796359082
## 4                                                  https://au.linkedin.com/jobs/view/aps6-data-business-analyst-at-indigeco-pty-ltd-3805248464
## 5 https://au.linkedin.com/jobs/view/associate-professor-professor-artificial-intelligence-and-machine-learning-at-deakin-university-3784491631
## 6                           https://au.linkedin.com/jobs/view/bar-teamleader-full-time-intercontinental-perth-at-ihg-hotels-resorts-3798301068
##                                                                       job_title
## 1                      🌟 Expression of Interest - Data Scientist Opportunities
## 2                                                        AML Operations Analyst
## 3                                                     APS5 Finance Data Analyst
## 4                                                    APS6 Data/Business Analyst
## 5 Associate Professor / Professor, Artificial Intelligence and Machine Learning
## 6                           Bar Teamleader (Full Time) - InterContinental Perth
##                             X1                          X2
## 1               Data analytics            Machine learning
## 2           AML/CTF typologies  Financial Crime Operations
## 3               Data Analytics                         SAS
## 4                Data Analysis                 Forecasting
## 5                Generative AI               Deep Learning
## 6 Food and Beverage Management                  Leadership
##                          X3                             X4
## 1       Predictive modeling             Data visualization
## 2  AML Operations Detection  Transaction monitoring alerts
## 3    Statistical Techniques               Data Warehousing
## 4                 Reporting     Demand and Supply Analysis
## 5          Machine Learning        Artificial Intelligence
## 6          Customer Service             Beverage Knowledge
##                                       X5
## 1                         Data pipelines
## 2  Customer and payment screening alerts
## 3                              Reporting
## 4                    Contract Management
## 5                               Teaching
## 6                               Teamwork
##                                                     X6
## 1                                              Tableau
## 2  Regulatory report exception handling (TTRs & IFTIs)
## 3                                  Data Interpretation
## 4                                       Trend Analysis
## 5                                             Learning
## 6                                 Communication Skills
##                                              X7                      X8
## 1                                      Power BI                  Python
## 2        AML Operations operational obligations        AML/CTF Act 2006
## 3                           Actionable Insights  Pattern Identification
## 4                           Data Interpretation         Risk Assessment
## 5                                    Assessment                Research
## 6  Responsible Service of Alcohol certification      Food Safety Course
##                       X9                         X10
## 1                 MATLAB                           R
## 2          AML/CTF Rules  Financial crime assessment
## 3         Trend Analysis               Communication
## 4  Mitigation Activities        Forecasting Accuracy
## 5                                                   
## 6    Paid Parental Leave          Paid Wellness Days
##                                         X11                          X12
## 1               Machine learning algorithms         Statistical analysis
## 2                    Due diligence evidence              Banking systems
## 3                    Stakeholder Engagement       Australian Citizenship
## 4  Internal/External Stakeholder Engagement        Written Communication
## 5                                                                       
## 6                        Employee Discounts  Career Development Programs
##                                X13
## 1             Cloudbased platforms
## 2  Threshold Transaction Reporting
## 3                                 
## 4               Team Collaboration
## 5                                 
## 6                                 
##                                                   X14                     X15
## 1                                                 AWS                   Azure
## 2  International Funds Transfer Instruction Reporting      Internal processes
## 3                                                                            
## 4                              Microsoft Office Suite  Australian Citizenship
## 5                                                                            
## 6                                                                            
##                   X16                        X17     X18              X19
## 1        Google Cloud      Big data technologies  Hadoop            Spark
## 2            Policies  Regulatory reporting SLAs    KPIs  Quality targets
## 3                                                                        
## 4  Baseline Clearance                                                    
## 5                                                                        
## 6                                                                        
##                            X20            X21                          X22
## 1  Natural language processing  Deep learning  Data science certifications
## 2       Continuous improvement   Project work                Communication
## 3                                                                         
## 4                                                                         
## 5                                                                         
## 6                                                                         
##                                X23                    X24                  X25
## 1  Machine learning certifications  Problemsolving skills  Attention to detail
## 2                       Escalation    Information sharing             Spirited
## 3                                                                             
## 4                                                                             
## 5                                                                             
## 6                                                                             
##                     X26       X27
## 1  Communication skills  Teamwork
## 2            Optimistic   Curious
## 3                                
## 4                                
## 5                                
## 6                                
##                                                                                     X28
## 1  Bachelor's or Master's in Computer Science Statistics Mathematics or a related field
## 2                                                                             Inclusive
## 3                                                                                      
## 4                                                                                      
## 5                                                                                      
## 6                                                                                      
##                                      X29          X30                       X31
## 1  Proven experience as a Data Scientist                                       
## 2                            Accountable  Lionhearted  Financial crime analysis
## 3                                                                              
## 4                                                                              
## 5                                                                              
## 6                                                                              
##                   X32                   X33             X34            X35
## 1                                                                         
## 2  Riskbased approach  Interpersonal skills  Written skills  Verbal skills
## 3                                                                         
## 4                                                                         
## 5                                                                         
## 6                                                                         
##                                X36         X37        X38                X39
## 1                                                                           
## 2  Financial Crime risk typologies  Indicators  Red flags  Analytical skills
## 3                                                                           
## 4                                                                           
## 5                                                                           
## 6                                                                           
##                      X40                              X41             X42
## 1                                                                        
## 2  Problemsolving skills  Riskbased decisionmaking skills  Eye for detail
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##                      X43                   X44               X45
## 1                                                               
## 2  AML/CTF Act and Rules  Financial Crime risk  Money Laundering
## 3                                                               
## 4                                                               
## 5                                                               
## 6                                                               
##                    X46        X47                        X48          X49
## 1                                                                        
## 2  Terrorism Financing  Sanctions  Banking codes of practice  Privacy Act
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##              X50               X51      X52 X53 X54 X55 X56 X57 X58 X59 X60 X61
## 1                                                                              
## 2  Criminal Code  Netreveal (NROD)  Temenos                                    
## 3                                                                              
## 4                                                                              
## 5                                                                              
## 6                                                                              
## (columns X62 through X150 are blank for all six rows and are omitted here)

This code chunk pivots the job skill columns into a single long-format variable, converts empty strings to NA and drops them, and changes all skills to lowercase so that differently capitalized versions of the same skill are counted together.

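# Stack the X-prefixed skill columns into one row per posting/skill pair
final_df <- pivot_longer(df2, cols = starts_with("X"), names_to = "number", values_to = "skill")
# Treat empty cells as missing, then drop them
final_df[final_df == ""] <- NA
final_df <- subset(final_df, !is.na(skill))
# Lowercase so "Python" and "python" count as one skill
final_df$skill <- tolower(final_df$skill)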
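
Lowercasing alone does not merge labels that differ only in whitespace, and the skills in this dataset carry a leading space (for example " python"). The sketch below, on a small made-up vector rather than the real final_df, shows how stringr's str_squish() could be added to the cleaning step. The toy data and the names toy, lower_only, and cleaned are assumptions for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for final_df$skill
toy <- data.frame(
  skill = c(" Python", "python ", " SQL", "sql", " Machine  Learning"),
  stringsAsFactors = FALSE
)

# Lowercasing only: stray spaces still keep the labels apart
lower_only <- tolower(toy$skill)
n_distinct(lower_only)  # 5 distinct labels

# str_squish() trims both ends and collapses repeated inner spaces
cleaned <- str_squish(lower_only)
n_distinct(cleaned)     # 3 distinct labels
```

Applied to the real data, this would change the printed skill labels (no leading space) and could change some counts, so the outputs shown in this report reflect the untrimmed version.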

Count the number of occurrences

We then counted the number of occurrences for each skill and arranged the count from largest to smallest to pull out the most valued data skills.

occurrences <- final_df %>% count(skill)
occurrences %>% arrange(desc(n))
## # A tibble: 66,556 × 2
##    skill                       n
##    <chr>                   <int>
##  1 " python"                4437
##  2 " sql"                   4275
##  3 " communication"         2501
##  4 " data analysis"         2359
##  5 " data visualization"    2295
##  6 " machine learning"      2045
##  7 " communication skills"  1694
##  8 " tableau"               1673
##  9 " aws"                   1631
## 10 " project management"    1627
## # ℹ 66,546 more rows
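
As a side note, counting and sorting can be done in one step with count(sort = TRUE), which also stores the sorted result instead of only printing it. This is a minimal sketch on made-up data; toy_df, toy_occurrences, and toy_top2 are hypothetical names, not objects from this report.

```r
library(dplyr)

# Hypothetical toy data: one row per posting/skill mention
toy_df <- data.frame(
  skill = c("python", "sql", "python", "r", "sql", "python"),
  stringsAsFactors = FALSE
)

# Count each skill and sort from most to least frequent in one call
toy_occurrences <- toy_df %>% count(skill, sort = TRUE)

# Keep the two most frequent skills (slice_max keeps ties by default)
toy_top2 <- toy_occurrences %>% slice_max(n, n = 2)
```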

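We took the 33 most frequent skills, dropped three rows (7, 18, and 25), including the near-duplicate "communication skills", and manually updated the counts for communication and data analysis. This left 30 observations.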

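# Take the 33 most frequent skills (slice_max keeps ties by default)
df_occur <- occurrences %>% slice_max(n, n=33)
# Drop rows 7, 18, and 25 (row 7 is " communication skills")
df_occur <- df_occur[-c(7, 18, 25), ]
# Manually overwrite two counts; the quoted values coerce n to character
df_occur$n[df_occur$n == "2501"] <- "4200"
df_occur$n[df_occur$n == "2359"] <- "4639"
print(df_occur)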
## # A tibble: 30 × 2
##    skill                 n    
##    <chr>                 <chr>
##  1 " python"             4437 
##  2 " sql"                4275 
##  3 " communication"      4200 
##  4 " data analysis"      4639 
##  5 " data visualization" 2295 
##  6 " machine learning"   2045 
##  7 " tableau"            1673 
##  8 " aws"                1631 
##  9 " project management" 1627 
## 10 " r"                  1532 
## # ℹ 20 more rows
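
Dropping rows by position and typing in new totals works, but it is easy to get out of sync with the data. One possible alternative, sketched here on made-up counts, is to map each variant label to a canonical skill and let summarise() add the counts, which also keeps n numeric. The toy numbers and the mapping itself (for example treating "data analytics" as a variant of "data analysis") are assumptions for illustration, not the mapping used in this report.

```r
library(dplyr)

# Hypothetical counts standing in for the occurrences table
toy_occurrences <- data.frame(
  skill = c("python", "communication", "communication skills",
            "data analysis", "data analytics"),
  n = c(10L, 6L, 4L, 5L, 3L),
  stringsAsFactors = FALSE
)

merged <- toy_occurrences %>%
  # Map variant labels to one canonical label (assumed mapping)
  mutate(skill = case_when(
    skill == "communication skills" ~ "communication",
    skill == "data analytics"       ~ "data analysis",
    TRUE                            ~ skill
  )) %>%
  # Add up the counts of everything that now shares a label
  group_by(skill) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  arrange(desc(n))
```

Because the totals are computed rather than typed in, they stay correct if the underlying data or the list of variants changes.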

Filter separate dataset

We created a variable called group that labels the skill type for each of the 30 observations.

# One label per row of df_occur, in row order
df_occur$group <- c(
  "program_lang", "program_lang", "job_skill", "data_skill", "data_visual",
  "data_skill", "data_visual", "data_tools", "job_skill", "program_lang",
  "data_visual", "job_skill", "data_tools", "data_skill", "program_lang",
  "job_skill", "data_skill", "data_skill", "data_skill", "data_skill",
  "job_skill", "job_skill", "data_tools", "data_skill", "data_skill",
  "data_tools", "data_visual", "data_skill", "data_skill", "data_skill"
)
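
The group labels above are matched to skills purely by row position, so they would silently shift if the row order changed. A more robust option, sketched below on a five-row toy table, is to keep a small lookup table keyed by skill name and join it on. The names toy_occur and skill_groups and the toy counts are hypothetical; the five skill-to-group pairs follow the labels assigned in this report.

```r
library(dplyr)

# Hypothetical subset of the top skills with made-up counts
toy_occur <- data.frame(
  skill = c("python", "sql", "communication", "tableau", "aws"),
  n = c(10, 9, 8, 5, 4),
  stringsAsFactors = FALSE
)

# Lookup table: one row per skill with its group label
skill_groups <- data.frame(
  skill = c("python", "sql", "communication", "tableau", "aws"),
  group = c("program_lang", "program_lang", "job_skill",
            "data_visual", "data_tools"),
  stringsAsFactors = FALSE
)

# Attach the group by name; unmatched skills would show up as NA
toy_grouped <- toy_occur %>% left_join(skill_groups, by = "skill")
```

Any skill missing from the lookup table ends up with NA in group, which makes gaps easy to spot.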

Plot

We plotted the frequency of each skill with ggplot, using facet_wrap to give each skill group its own panel.

# n was stored as character above, so convert it back to numeric for the y-axis
ggplot(df_occur, aes(x = skill, y = as.numeric(n))) +
  geom_bar(stat = "identity", fill = "forestgreen") +
  ggtitle("Most Valued Data Science Skills") +
  ylab("Frequency from Job Postings") + xlab("Skills") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  facet_wrap(~group, scales = "free")
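
Within each panel the bars appear in alphabetical order by skill. If ranking within a group matters, one option is to reorder the x-axis by count, as in this sketch on made-up data. The data frame toy_plot_df and its values are hypothetical, and geom_col() is used here as the equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical data; n starts as character to mirror df_occur
toy_plot_df <- data.frame(
  skill = c("python", "sql", "r", "communication", "teamwork"),
  n = c("10", "9", "4", "8", "3"),
  group = c("program_lang", "program_lang", "program_lang",
            "job_skill", "job_skill"),
  stringsAsFactors = FALSE
)

# Counts must be numeric for a continuous y-axis and for reordering
toy_plot_df$n <- as.numeric(toy_plot_df$n)

# reorder(skill, -n) sorts the bars from largest to smallest count
p <- ggplot(toy_plot_df, aes(x = reorder(skill, -n), y = n)) +
  geom_col(fill = "forestgreen") +
  labs(title = "Most Valued Data Science Skills",
       x = "Skills", y = "Frequency from Job Postings") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  facet_wrap(~group, scales = "free")
```

Since each skill belongs to exactly one group, the global ordering also orders the bars inside every panel.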

Conclusion

Overall findings: Employers looked for data scientists with data skills (statistics, business intelligence, data analysis, machine learning, and data warehousing), data tools (AWS, Spark, Hadoop, and Snowflake), data visualization (data visualization, Tableau, data modeling, and Power BI), job skills (communication, project management, problem solving, teamwork, and attention to detail), and programming languages (Python, SQL, R, and Java). The programming language results came as a bit of a surprise: Python was mentioned more often than R, even though we think of R as heavily used for data analysis and of Python as more of a general-purpose programming language. In the future, we would look at which job titles value which skills and how salaries vary with each skill and job title.