Project 3 - Analyzing Valued Skills in Data Science: An Examination of LinkedIn Job Postings

Objective

We use job posting data to answer the question: “Which are the most valued data science skills?”

Our strategy, generally speaking, is to look at a dataset with thousands of LinkedIn job postings. We then filter this dataset to include only data science (or data science-adjacent) roles. Finally, from the descriptions of these jobs, we extract the skills that employers are looking for.
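As a rough illustration of this filter-then-extract strategy, the sketch below uses a small made-up data frame in place of the real postings. The `postings` data, the `skills` vector, and the matching rule are assumptions for illustration only, not the method used in the analysis itself.

```r
# Toy stand-in for the job postings table (made-up rows).
postings <- data.frame(
  title = c("Data Scientist", "Data Engineer", "Payroll Clerk", "Senior Data Analyst"),
  description = c(
    "Experience with Python, SQL and machine learning",
    "Strong understanding of SQL; Hadoop or Spark a plus",
    "Proficiency in Microsoft Excel and Outlook",
    "Advanced SQL and Tableau; Python preferred"
  ),
  stringsAsFactors = FALSE
)

# Step 1: keep data-related roles (case-insensitive match on the title).
ds <- postings[grepl("data", postings$title, ignore.case = TRUE), ]

# Step 2: count how many of the remaining descriptions mention each skill.
# The skill list is a hypothetical example.
skills <- c("Python", "SQL", "Tableau", "Excel")
skill_counts <- vapply(
  skills,
  function(s) sum(grepl(s, ds$description, fixed = TRUE)),
  integer(1)
)

# Most frequently mentioned skills first.
sort(skill_counts, decreasing = TRUE)
```

Fixed-string matching is only safe for distinctive names. A very short skill name such as "R" would need a word-boundary pattern to avoid matching inside other words.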

To be a bit more specific, we will attempt to answer two questions:

Overall, what skills are employers looking for in their data scientists?
What skills do the most desirable data science jobs require or prefer?

We originally wanted to look at another question as well, namely how the salaries of different jobs might be associated with different skills. However, too much of the salary data was missing for this approach to be feasible.
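One way to judge whether a salary analysis is feasible is to measure how much of the salary information is missing. The sketch below does this on a made-up data frame. The column names `min_salary`, `max_salary`, and `med_salary` are assumptions, and the numbers are illustrative rather than taken from the real table.

```r
# Toy stand-in for the salary columns of the postings table (made-up values).
salaries <- data.frame(
  min_salary = c(50000, NA, NA, 90000, NA),
  max_salary = c(70000, NA, NA, 120000, NA),
  med_salary = c(NA, NA, 65000, NA, NA)
)

# Share of missing values in each salary column.
missing_share <- colMeans(is.na(salaries))

# Share of postings that carry no salary information at all.
no_salary <- mean(rowSums(!is.na(salaries)) == 0)

missing_share
no_salary
```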

Setting up

library(DBI)
library(RMySQL)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)

We start by establishing a connection to the Amazon RDS database:

con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "linkedin_job_postings_2023", 
                 host = "database-1-instance-1.cngfftmjinac.us-east-2.rds.amazonaws.com", 
                 port = 3306, 
                 user = "admin", 
                 password = "DDbna}rt2R0h")
dbListTables(con)

## [1] "benefits"             "cleaned"              "companies"           
## [4] "company_industries"   "company_specialities" "employee_counts"     
## [7] "job_industries"       "job_postings"         "job_skills"
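The connection call above embeds the credentials directly in the script. A possible alternative, sketched here under assumptions, is to read them from environment variables. The variable names (`LINKEDIN_DB_HOST`, `LINKEDIN_DB_USER`, `LINKEDIN_DB_PASSWORD`) and the helper `db_settings()` are hypothetical and would need to be defined by whoever runs the report, for example in `~/.Renviron`.

```r
# Read connection settings from environment variables instead of the script.
# The LINKEDIN_DB_* names are hypothetical; choose any names you like.
db_settings <- function() {
  settings <- list(
    host     = Sys.getenv("LINKEDIN_DB_HOST"),
    user     = Sys.getenv("LINKEDIN_DB_USER"),
    password = Sys.getenv("LINKEDIN_DB_PASSWORD")
  )
  # Sys.getenv() returns "" for unset variables, so flag those explicitly.
  missing <- names(settings)[!nzchar(unlist(settings))]
  if (length(missing) > 0) {
    stop("Missing database settings: ", paste(missing, collapse = ", "))
  }
  settings
}

# Usage with the connection from this report (not run here):
# s <- db_settings()
# con <- dbConnect(RMySQL::MySQL(),
#                  dbname = "linkedin_job_postings_2023",
#                  host = s$host, port = 3306,
#                  user = s$user, password = s$password)
```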

Loading, cleaning the tables

suppressWarnings({
  query <- "SELECT * FROM benefits"
  benefits <- dbGetQuery(con, query)

  query <- "SELECT * FROM cleaned"
  cleaned <- dbGetQuery(con, query)

  query <- "SELECT * FROM companies"
  companies <- dbGetQuery(con, query)

  query <- "SELECT * FROM company_industries"
  company_industries <- dbGetQuery(con, query)

  query <- "SELECT * FROM company_specialities"
  company_specialities <- dbGetQuery(con, query)

  query <- "SELECT * FROM employee_counts"
  employee_counts <- dbGetQuery(con, query)

  query <- "SELECT * FROM job_industries"
  job_industries <- dbGetQuery(con, query)

  query <- "SELECT * FROM job_postings"
  job_postings <- dbGetQuery(con, query)

  query <- "SELECT * FROM job_skills"
  job_skills <- dbGetQuery(con, query)

  # closing connection
  dbDisconnect(con)
})

## [1] TRUE
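The nine near-identical queries above could also be written as a loop over the table names. The following is a minimal sketch of that idea, not the code used in this report. The helper `load_tables()` and the stand-in `fake_db` are hypothetical, and the stand-in exists only so the example runs without a database connection.

```r
# Generic loader: `fetch` is any function that maps a SQL string to a data frame,
# e.g. function(q) dbGetQuery(con, q) with an open DBI connection `con`.
load_tables <- function(table_names, fetch) {
  tables <- lapply(table_names, function(tbl) fetch(paste("SELECT * FROM", tbl)))
  names(tables) <- table_names
  tables
}

# Stand-in for the database so the sketch runs offline (made-up rows).
fake_db <- list(
  benefits   = data.frame(job_id = c(1, 2), type = c("401K", "Medical insurance")),
  job_skills = data.frame(job_id = c(1, 2), skill = c("IT", "ANLS"))
)
fake_fetch <- function(query) fake_db[[sub("^SELECT \\* FROM ", "", query)]]

tables <- load_tables(names(fake_db), fake_fetch)
names(tables)

# With a live connection this would become (not run here):
# tables <- load_tables(dbListTables(con), function(q) dbGetQuery(con, q))
```

Keeping the tables in one named list means a newly added table is picked up automatically, at the cost of writing `tables$job_postings` instead of `job_postings`.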

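The job_postings table is of special interest. However, we are only interested in the postings for data science (or at least data-related) roles. We’ll use the grepl function to keep every title that contains the string “data”, ignoring case. Because this is a substring match, it also keeps titles in which “data” is only part of a longer word, such as “Teradata Developer” and “Datacenter Technician”.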
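To show how the substring filter used in this report behaves, and how a stricter variant would differ, the sketch below compares the two on a handful of titles. The whole-word pattern is an assumption offered for comparison only. It is not the filter applied in this analysis.

```r
# A few titles of the kind found in job_postings.
titles <- c("Teradata Developer", "Datacenter Technician", "Data Engineer",
            "Data Scientist/ Product Analyst ", "Seasonal Payroll/Data Entry Clerk")

# Substring match, as in this report: keeps every title above.
loose <- grepl("data", titles, ignore.case = TRUE)

# Whole-word match: \\b marks a word boundary, so "Teradata" and
# "Datacenter" no longer qualify.
strict <- grepl("\\bdata\\b", titles, ignore.case = TRUE, perl = TRUE)

titles[strict]
```

Even the whole-word version keeps the data entry role, so separating data science postings from other data-related work would require a more specific list of role keywords.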

ds_jobs <- job_postings %>%
  filter(grepl("Data", title, ignore.case = TRUE))

head(ds_jobs)

##       job_id company_id                             title
## 1 3586162459   69642092                Teradata Developer
## 2 3690692186      61242 Seasonal Payroll/Data Entry Clerk
## 3 3691795980    7573454                     Data Engineer
## 4 3692302089      37768  Data Scientist/ Product Analyst 
## 5 3692363778    2474970         Data Analytics Consultant
## 6 3693028967   89721371             Datacenter Technician
##   description
## 1 Months Overview of should be able to develop and support application solutions with a focus on Teradata for a Financial conceptual understanding of the context and business Should be able to understand the business produce Data and design and implement code following the best to perform data quality checks methodically to understand how to accurately utilize client to communicate results and methodology with the project team and Should be able to work in to meet deadlines and thrive in a banking solutions for applications involving large and complex data and provides reconciliation and test Required SkillsShould be and be able to well under stringent deadlinesFinancial Domain KnowledgeExperience in Mainframes and interacting with business understanding requirements and providing quick have excellent exposure to project to learn and adapt to changes Desired Certifications
## 2 Universal Screen specialty is marketing home and much more direct to the consumer through our websites and social This results in a massive holiday rush and hundreds of seasonal jobs to be filled in our warehouse operations very We are looking for a Team Member who is enthusiastic and dynamic to help meet our HRIS needs during this This position will be responsible for updating multiple HR systems with a high volume of employment status including new and Additional responsibilities include weekly review and update of time processing of internal and temporary and attendance tracking and This position is seasonal in We are looking for the right candidate to start in August with an anticipated end to the assignment in in high volume data entryAccuracy and attention to detailExceptional communication skillsCreative problem solving skillsA Professional and Positive attitudeAbility to and adjust to frequently changing prioritiesExperience in payroll processing and HRIS systems preferredHighly organizedEffectively interact with people at all levelsThe ability to work in Microsoft Excel and Outlook Apply today for immediate
## 3 Job and launch extremely efficient and reliable data pipelines to move data and to provide intuitive analytics to our partner data more discoverable and easy to use for Data Scientists and Analysts across the service operations with other engineers and Data Scientists to discover the best and solve issues in our existing data pipelines and envision and build their understanding of one or more of the or JavaStrong understanding of SQLBroad knowledge of the data infrastructure ecosystemExperience with Hadoop or other working with large data volumesExperience in building Data Warehouses and data with any of the following is a or HiveInterest in learning Data Bricks
## 4 Looking for candidates with experience in data science or product analyst roles with a strong background in data statistical modelling and specific experience working with Note Need some one who can work on Data Product Analyst San CA to Months Project Overview Drive analysis for Fitbit product teams focused on and Analyses include optimizing user lifetime improving usage and evaluating content and suggesting future service innovations for Deliver effective presentations of findings and recommendations to multiple levels of creating visual displays of quantitative Overall Responsibilities Use causal inferential methods to quantify impact on product deliveries when experimentation is not Collaborate with stakeholders to formulate and complete full cycle analysis that includes data ongoing scaled deliverables and Help Google focus on key decisions to improve products and Top Daily Responsibilities Deep dive on core product user and relation to user satisfaction and Conduct data analysis to make business recommendations impact Develop and automate iteratively build and prototype dashboards to provide insights at solving for business Mandatory Skills Degree or equivalent experience years of experience Business acumen Knowledge of structured communicating risk considers business leverages for consumer Ability to extract relevant information from reading code in one or more core languages and frameworks and ability to leverage the code as a resource to create work output for users or Experience working with highly unstructured messy datasets and ability to clean and derive Communication and active Ability to clearly explain stats or domain knowledge to people not familiar with the subject matter or who lack a quant This includes the ability to explain reasons in a logical way that is very easy to understand by Leverage communication skills and active listening to manage stakeholders and to set proper technical direction for teams or Data analysis and Ability to 
analyze draw generate alternatives and and evaluate This includes the ability to use data to add value to business planning and Data and Ability to extract data and validate raw data to ensure it is valid and reliable and ability to clean data based on validation criteria and prepare for further Ability to define and rationalize appropriate create pipelines and dashboards that tell a Ability to measure the success of a given Knowledge of different ability to select the appropriate approach for the problem understanding of the broader context of and increasing the value of the business and revenue Modeling Ability to apply multiple approaches and select the right analysis for the Understanding of the mathematical and statistical concepts underpinning and Knowledge of essential statistical methods used to analyze data descriptive ability to identify and conduct appropriate basic statistical analysis to determine the basic parameters of a set of data and solve Product analysis Ability to interact and respectfully with especially senior to and clarify concerns or issues regarding an existing or ability to effectively address difficult handle pushback from a and maintain a professional demeanor while engaging in challenging or sometimes In addition to influencing this includes actively managing priorities across and Project execution and Ability to proactively communicate insights and influence stakeholders and subject matter experts to inform Ability to convert and uncover real world problems within the business context into trackable metrics or a structured as well as the ability to generate insights from data analysis in a way that is meaningful to the Ability to prepare effective presentations in content and and to speak competently to the level of the Ability to identify and debug product issues and user including the ability to carry out root cause analysis and quantitatively assess critical user think about and Please rate yourself on a scale of being the Please 
list the of years of experience you have with that particular Skills Rating Years of Experience with Skill Business acumen intuition extraction Communication and active listening Data analysis and synthesis Data and cleaning Analysis Modeling design Product analysis leadership Project execution and influence Provide me below information Name of the Candidate Location Address Number ID Rate on or Offers in Pipeline Availability Availability Status Warm Zainab Saba Talent Acquisition Specialist US XTN
## 5 About the CompanyDiLytics is a leading Information Technology Services Provider completely focused on Business Data ETL Data Integration and Enterprise Performance Management We have been growing for years with global offices in the Canada and Data Analytics CA implementing Analytics in one or more of the following Supply Human Sales Order implementing Analytics on one or more of the following data sources Oracle JD implementing Analytics on one or more of the following tools Power Oracle Analytics implementing Analytics on one or more of the Data Warehouse platforms Oracle Azure Data implementing ETL Data Integration in one or more of the following tools Oracle Data Azure Data years of IT to do offshore coordination working during hours that overlap with India time zone to a reasonable Travel whenever
## 6 GA About the We are looking for a dedicated Datacenter Technician to join our dynamic This is a contract position with the potential to transition to a role based on The ideal candidate is expected to be available at all times of the Key and unracking servers and other equipment in the data remote hands for troubleshooting and maintaining the networking tasks 
to ensure smooth communication within the with the IT team for the effective functioning of servers and promptly to any technical Qualifications and be based in or willing to relocate to experience as a Datacenter with racking and unracking understanding of networking protocols and experience with Integrated Lights Out is a to work flexibly throughout the communication both written and Contract This position starts as a contract with the opportunity for placement based on performance and mutual
##   max_salary med_salary min_salary pay_period formatted_work_type
## 1          0          0          0                       Contract
## 2          0          0          0                      Temporary
## 3          0          0          0                       Contract
## 4         80          0         70     HOURLY            Contract
## 5          0          0          0                      Full-time
## 6         50          0         15     HOURLY            Contract
##            location applies original_listed_time remote_allowed views
## 1     United States       1                    0              1    56
## 2        Hudson, OH       5                    0                  325
## 3     United States       5                    0              1   101
## 4 San Francisco, CA       7                    0                   37
## 5    Sacramento, CA       7                    0              1    26
## 6    Alpharetta, GA       1                    0                   34
##                                                         job_posting_url
## 1 https://www.linkedin.com/jobs/view/3586162459/?trk=jobs_biz_prem_srch
## 2 https://www.linkedin.com/jobs/view/3690692186/?trk=jobs_biz_prem_srch
## 3 https://www.linkedin.com/jobs/view/3691795980/?trk=jobs_biz_prem_srch
## 4 https://www.linkedin.com/jobs/view/3692302089/?trk=jobs_biz_prem_srch
## 5 https://www.linkedin.com/jobs/view/3692363778/?trk=jobs_biz_prem_srch
## 6 https://www.linkedin.com/jobs/view/3693028967/?trk=jobs_biz_prem_srch
##                                                                                                               application_url
## 1                                                                                                                            
## 2 https://recruiting.paylocity.com/recruiting/jobs/Details/1859929/Universal-Screen-Arts-Inc/Seasonal-PayrollData-Entry-Clerk
## 3                                                                                                                            
## 4                                                                                                                            
## 5                                                                                                                            
## 6                                                                                                                            
##     application_type expiry closed_time formatted_experience_level skills_desc
## 1 ComplexOnsiteApply                  0                                       
## 2       OffsiteApply                  0                                       
## 3 ComplexOnsiteApply                  0                                       
## 4 ComplexOnsiteApply                  0           Mid-Senior level            
## 5 ComplexOnsiteApply                  0                                       
## 6 ComplexOnsiteApply                  0                                       
##   listed_time posting_domain sponsored work_type currency compensation_type
## 1           0                        0  CONTRACT                           
## 2           0                        0 TEMPORARY                           
## 3           0                        0  CONTRACT                           
## 4           0                        0  CONTRACT      USD       BASE_SALARY
## 5           0                        0 FULL_TIME                           
## 6           0                        0  CONTRACT      USD       BASE_SALARY

Analysis

Part 1: A look at Views

Soon, we will take a look at all data science-related roles. In this section, we take a slightly different approach. After all, we are looking for the most desirable data science skills, so we can look at the most desirable jobs and see which skills are associated with them. There are different ways to rank the desirability of jobs. We opted to use views, on the assumption that a posting is more desirable if it received more views.

The LinkedIn dataset is sparse; despite the “skills_desc” column, it is not obvious which skills are associated with most jobs. We therefore extract the skills from the postings’ descriptions by matching words in the descriptions against a list of known skill terms. For that list, we took part of an HTML document from a website that discusses resume skills: https://enhancv.com/resume-skills/

library(rvest)

html_content <- read_html('https://raw.githubusercontent.com/hbedros/data607_prj3/gss/enhancv_excerpt.html')
li_items <- html_content %>%
  html_nodes("li") %>%
  html_text() %>% 
  str_squish()   # collapse whitespace; items otherwise end in "\n\n"

li_items <- tolower(li_items)   # lowercase so matching is case-insensitive

sample(li_items, 10)

##  [1] "goal setting"                                 
##  [2] "leadership"                                   
##  [3] "understanding different perspectives"         
##  [4] "angular"                                      
##  [5] "preparation"                                  
##  [6] "fostering a sense of ownership and confidence"
##  [7] "cloud infrastructure"                         
##  [8] "influence"                                    
##  [9] "critical thinking:"                           
## [10] "confidence"
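
The sample above includes an entry with a trailing colon (“critical thinking:”), which suggests that some list items carry leftover punctuation. A minimal cleanup sketch follows, assuming that only trailing colons, blanks, and duplicates need handling; raw_items and clean_skill_list are hypothetical names used for illustration, not part of the original analysis.

```r
# Hypothetical sample of scraped entries; the real vector is li_items
raw_items <- c("critical thinking:", "leadership", "Leadership ", "", "sql")

clean_skill_list <- function(items) {
  items <- tolower(trimws(items))   # normalize case and surrounding whitespace
  items <- sub(":+$", "", items)    # drop trailing colons only, so "c++" or "c#" stay intact
  items <- trimws(items)
  unique(items[nzchar(items)])      # remove blanks and duplicates
}

clean_skill_list(raw_items)
```

Only trailing colons are stripped here; removing all trailing punctuation would also alter legitimate entries that end in a symbol.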

Now we have this collection of skills. We need a function to extract matches from individual job postings:

extract_skills <- function(description) {
  description <- tolower(description)              # match the lowercase li_items
  words <- unlist(strsplit(description, "\\s+"))   # split on whitespace into a vector of words
  skills <- words[words %in% li_items]             # keep only words that appear in li_items
  return(paste(unique(skills), collapse = ", "))   # combine unique matches into one string
}
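
Because extract_skills splits each description on whitespace, it can only match single-word entries: a phrase such as “goal setting” is never found, and a word followed directly by punctuation (for example “sql,”) does not match either. The sketch below shows one possible phrase-aware alternative in base R. It is an illustration under stated assumptions, not the method used to produce the results in this report; skills_list, normalize_text, and extract_skills_phrases are hypothetical names, and skills_list stands in for li_items.

```r
# Hypothetical stand-in for the scraped skills vector (li_items in the text)
skills_list <- c("sql", "python", "machine learning", "critical thinking", "go")

# Lowercase, turn punctuation into spaces, and collapse runs of whitespace
normalize_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9+#]+", " ", x)   # keep + and # so "c++" and "c#" survive
  trimws(gsub("\\s+", " ", x))
}

# Return the skills whose normalized form appears as a whole word or phrase
extract_skills_phrases <- function(description, skills) {
  text <- paste0(" ", normalize_text(description), " ")   # pad so edges count as boundaries
  keys <- normalize_text(skills)
  found <- vapply(
    keys,
    function(k) nzchar(k) && grepl(paste0(" ", k, " "), text, fixed = TRUE),
    logical(1)
  )
  paste(unique(skills[found]), collapse = ", ")
}

example <- "Strong SQL, Python, and machine learning skills; critical thinking is a must."
extract_skills_phrases(example, skills_list)
```

Unlike extract_skills, this version returns matches in the order of the skills list rather than the order in which they appear in the description. Padding with spaces means a short entry such as “go” does not match inside a longer word such as “google”.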

Now we can apply this function, and create a new column in the dataframe with the extracted skills. While we’re at it, we will remove columns from the dataframe that aren’t as relevant to our analysis and add the company name for clarity.

ds_jobs$skills <- sapply(ds_jobs$description, extract_skills, USE.NAMES = FALSE)
ds_jobs_clean <- ds_jobs[c("job_id", "company_id", "title", "views", "skills")] %>%
  left_join(select(companies, company_id, name), by = "company_id") %>% 
  relocate(name, .after = 'company_id') %>% 
  rename(company = name) %>% 
  select(-company_id)

head(ds_jobs_clean, 10)

##        job_id                     company
## 1  3586162459                Sapience Inc
## 2  3690692186  Universal Screen Arts Inc.
## 3  3691795980                        <NA>
## 4  3692302089 Milestone Technologies Inc.
## 5  3692363778                    DiLytics
## 6  3693028967              CatalystVM LLC
## 7  3693040943       TekValue IT Solutions
## 8  3693041960            Publicis Sapient
## 9  3693043664       Premier International
## 10 3693043742            Publicis Sapient
##                                             title views
## 1                              Teradata Developer    56
## 2               Seasonal Payroll/Data Entry Clerk   325
## 3                                   Data Engineer   101
## 4                Data Scientist/ Product Analyst     37
## 5                       Data Analytics Consultant    26
## 6                           Datacenter Technician    34
## 7                             Azure Data Engineer   534
## 8                            Senior Data Engineer    86
## 9  Senior Oracle HCM Data & Technology Consultant   116
## 10                           Senior Data Engineer    86
##                                                                     skills
## 1                                                            understanding
## 2                                                     communication, excel
## 3                                                            understanding
## 4                      communication, understanding, influence, leadership
## 5                                                                         
## 6                                 networking, communication, understanding
## 7                                                                       go
## 8                                                                      sql
## 9  respect, coaching, leadership, excel, communication, go, accountability
## 10                                                                     sql

A peek at the dataframe reveals a wide range of skills. Again, however, we are only interested in the postings that had a lot of views, since these reflect the most desirable roles, which plausibly demand the most desirable skills. Let’s look at only the top 25% of listings by views:

upper_quart <- quantile(ds_jobs_clean$views, 0.75, na.rm = TRUE)

many_views <- ds_jobs_clean %>% 
  filter(views >= upper_quart)

many_views

##        job_id                         company
## 1  3690692186      Universal Screen Arts Inc.
## 2  3693040943           TekValue IT Solutions
## 3  3693043839            Publix Super Markets
## 4  3693044837                  Insight Global
## 5  3693045284                   Roth Staffing
## 6  3693045603      Vision Technology Services
## 7  3693045756   Irvine Technology Corporation
## 8  3693045914                 The Judge Group
## 9  3693046487                      SSi People
## 10 3693046569                The CARIAN Group
## 11 3693046734                        LCG Inc.
## 12 3693047014                       MethodHub
## 13 3693047232                     LTIMindtree
## 14 3693047544                 Ztek Consulting
## 15 3693047702                  Sud Recruiting
## 16 3693048239                  Insight Global
## 17 3693048249                            Noom
## 18 3693049654                  Insight Global
## 19 3693049706                       LS Direct
## 20 3693050917                         Akkodis
## 21 3693051072                The CARIAN Group
## 22 3693051129                         Maxonic
## 23 3693051482                       Sterrofox
## 24 3693052150                         Amentum
## 25 3693053211                      AccessHope
## 26 3693053263               US Tech Solutions
## 27 3693055046                     Healthfirst
## 28 3693055168       Zillion Technologies Inc.
## 29 3693056161                 The Judge Group
## 30 3693056244                          Prosum
## 31 3693056342                         AnyRoad
## 32 3693062969                  Insight Global
## 33 3693063985                  ASK Consulting
## 34 3693065764                      HomeSphere
## 35 3693067708                     Solari Inc.
## 36 3693069014             Prime Team Partners
## 37 3693069309              Signify Technology
## 38 3693070035                         AMBRION
## 39 3693070192                    ROR Partners
## 40 3693071184                     Cost.U.Less
## 41 3693073454                   Dynamite Jobs
## 42 3693074204                   Community.com
## 43 3693586591             The Intersect Group
## 44 3694103473                 Multi Media LLC
## 45 3697342482                 Backpack Talent
## 46 3697353487                 City of Atlanta
## 47 3697356241                 Performant Corp
## 48 3697363377                     Crunchyroll
## 49 3697381530                            Lyft
## 50 3697382316                            Lyft
## 51 3697386529                        Phreesia
## 52 3697388794             Booz Allen Hamilton
## 53 3697390350                      Mastercard
## 54 3697394852                  Frontdoor Inc.
## 55 3697395737                        Spectrum
## 56 3697397232                      GE Digital
## 57 3699063216                            ALKU
## 58 3699074702                          Google
## 59 3699075688                          Google
## 60 3699075689                          Google
## 61 3699077610                          Google
## 62 3699077628                          Google
## 63 3699078392                          Google
## 64 3699079423                          Google
## 65 3701154588                           GRAIL
## 66 3701198908              Veracity Solutions
## 67 3701300757        Software Technology Inc.
## 68 3701300819                 YUPRO Placement
## 69 3701302398     Lendmark Financial Services
## 70 3701304425                          Huxley
## 71 3701306205                  LeadStack Inc.
## 72 3701307307 Quantum World Technologies Inc.
## 73 3701308059                          Phaxis
## 74 3701308803                 Geomagical Labs
## 75 3701309803                  ActiveSoft Inc
## 76 3701312798                    DISH Network
## 77 3701313682                         Equifax
## 78 3701314004                   Agile Datapro
## 79 3701315406       Davis Polk & Wardwell LLP
## 80 3701315556        Solve IT Strategies Inc.
## 81 3701315560                       Snowflake
## 82 3701318091                       Snowflake
## 83 3701318959                     Levy Search
## 84 3701318976          springheadtechnologies
## 85 3701319581                        Randstad
## 86 3701320040                  AccruePartners
## 87 3701321316                    ConnectiCare
## 88 3701322385             Pinnacle Group Inc.
## 89 3701323706                 NBC Sports Next
## 90 3701323737             Pinnacle Group Inc.
## 91 3701325020                        DocuSign
## 92 3701325300                  RBW Consulting
## 93 3701369746                        LinQuest
##                                                                                                                                 title
## 1                                                                                                   Seasonal Payroll/Data Entry Clerk
## 2                                                                                                                 Azure Data Engineer
## 3                                                                                Data Architect – Enterprise Architecture Team-REMOTE
## 4                                                                                                               Business Data Analyst
## 5                                                                                                 Project Manager (Data Center Move) 
## 6                                                                                                                      Data Scientist
## 7                                                                                                          Data Governance Specialist
## 8                                                                                                                 DataStage Engineer 
## 9                                                                                                                        Data Analyst
## 10                                                                                                          Data Scientist – (Remote)
## 11                                                                                                                 Database Developer
## 12                                                                                                                       Data Analyst
## 13                                                                                                                 Big Data Developer
## 14                                                                                                    Data Scientists / AIML Engineer
## 15                                                                                                 Marketing Sciences, Data Scientist
## 16                                                                                                             Data Analytics Manager
## 17                                                                                                                Lead Data Scientist
## 18                                                                                                                      Data Engineer
## 19                                                                                                                       Data Analyst
## 20                                                                                                                       Data Analyst
## 21                                                                                                   Power BI Data Analyst – (Remote)
## 22                                                                                             Global Data Insights Analyst (Only W2)
## 23                                                                                                                 AWS Data Engineer 
## 24                                                                                                                     Data Scientist
## 25                                                                                                           Oncology Data Abstractor
## 26                                                                                                                      Data Engineer
## 27                                                                                                          IT Data Analytics Analyst
## 28                                                    Text Data Labeling Analyst - Machine Learning (Hybrid Role 1 Day a week Onsite)
## 29                                                                                                            Data Analytics Engineer
## 30                                                                                                UX/UI Designer - Data Intelligence 
## 31                                                                                             Senior Backend Software Engineer, Data
## 32                                                                                                                   Sr Data Engineer
## 33                                                                                    Associate Data Analyst - Analytics and Insights
## 34                                                                                                     Senior Azure Database Engineer
## 35                                                                                                        Manager, Data and Reporting
## 36                                                                                                          Data Analytics Specialist
## 37                                                                                                 Senior Data Engineer (Scala/Spark)
## 38                                                                                                                Senior Data Analyst
## 39                                                                                                            Principal Data Engineer
## 40                                                                                                                       Data Analyst
## 41                                                                                                                Senior Data Analyst
## 42                                                                                                               Senior Data Engineer
## 43                                                                                                                      Data Engineer
## 44                                                                                                               Product Data Analyst
## 45                                                                                            Environmental Engineer & Data Scientist
## 46                                                                                                             Data/Reporting Analyst
## 47                                                                                                   Database Administrator II (MSDA)
## 48                                                                                                    Senior Data Analyst - eCommerce
## 49                                                                                                 Senior Data Scientist - Algorithms
## 50                                                                                 Data Analyst, Real-Time Supply Management (Hybrid)
## 51                                                                                              Senior Product Manager, Data Products
## 52                                                                                                                     Data Scientist
## 53                                                                                       Director, Data Strategy, Retail and Commerce
## 54                                                                                                                Senior Data Analyst
## 55                                                                                                          Supply Chain Data Analyst
## 56                                                                                                       Implementation Data Engineer
## 57 Data Privacy analyst (NOT ATTORNEY) with 1-3 years of experience ONLY not a BI/technical role please no data architects/developers
## 58                                                                                                              Senior Data Scientist
## 59                                                                                                              Senior Data Scientist
## 60                                                                                                              Senior Data Scientist
## 61                                                                                                              Senior Data Scientist
## 62                                                                                                         Data Engineer, Google Nest
## 63                                                                                                              Senior Data Scientist
## 64                                                                                                              Senior Data Scientist
## 65                                                                                                    Eligibility Data Analyst # 3219
## 66                                                                                                                  Database Engineer
## 67                                                                                                                      Data Engineer
## 68                                                                                                          Data Optimization Analyst
## 69                                                                                                             Credit Data Analyst II
## 70                                                                                              Software Data Engineer (Scala/Python)
## 71                                                                                                                Azure Data Engineer
## 72                                                                      100% Remote _ Azure Data Architect  _ Immediate Interview !!!
## 73                                                                                                  Data Integration Engineer- REMOTE
## 74                                                                                                   Data Analyst - Remote / Contract
## 75                                                                                                                      Data Engineer
## 76                                                                                                         New Grad - Data Analyst II
## 77                                                                                                                      Data Engineer
## 78                                                                                                  Data Science & AI - ML Internship
## 79                                                                                RPA Developer, Business Operations & Data Analytics
## 80                                                                                                            Data Engineer (W2 ONLY)
## 81                                                                               Software Engineer - Database Engineering (San Mateo)
## 82                                                                                 Software Engineer - Database Engineering (Seattle)
## 83                                                                                                              Senior Data Scientist
## 84                                                                                                       Senior Data Analyst (Onsite)
## 85                                                                                                  Call Center Data Entry Specialist
## 86                                                                                                               Pricing Data Analyst
## 87                                                                                                    Healthcare Business Data Leader
## 88                                                                                                                      Data Analyst 
## 89                                                                                                 Data Engineer II - NBC Sports Next
## 90                                                                                                  Business Analyst/Data Remediation
## 91                                                                                                                      Data Engineer
## 92                                                                                                       Senior Clinical Data Manager
## 93                                                                                       Data Scientist / Operations Research Analyst
##    views
## 1    325
## 2    534
## 3    291
## 4    974
## 5    337
## 6    347
## 7    311
## 8    273
## 9    350
## 10  1334
## 11   199
## 12   193
## 13   209
## 14   333
## 15   525
## 16   196
## 17   956
## 18   568
## 19   724
## 20   271
## 21  1459
## 22   411
## 23   378
## 24   215
## 25   235
## 26   612
## 27   308
## 28   195
## 29   508
## 30   386
## 31   496
## 32   289
## 33   256
## 34   283
## 35   458
## 36   237
## 37   232
## 38   216
## 39   271
## 40   307
## 41   462
## 42   323
## 43   213
## 44   297
## 45   272
## 46   241
## 47   214
## 48   386
## 49   514
## 50  1038
## 51   293
## 52   248
## 53   330
## 54   197
## 55   260
## 56   276
## 57   234
## 58   258
## 59   224
## 60   279
## 61   219
## 62   945
## 63   299
## 64   256
## 65   301
## 66   559
## 67   516
## 68   237
## 69   244
## 70   275
## 71   906
## 72   263
## 73   630
## 74  1138
## 75   362
## 76   822
## 77   316
## 78   566
## 79   196
## 80   249
## 81   507
## 82   790
## 83   305
## 84   365
## 85   408
## 86   195
## 87   375
## 88   514
## 89   301
## 90   214
## 91   419
## 92   323
## 93   215
##                                                                                            skills
## 1                                                                            communication, excel
## 2                                                                                              go
## 3                                                                        mentoring, understanding
## 4                                                                                   mentoring, go
## 5                                                        leadership, communication, understanding
## 6                                                       vision, leadership, understanding, python
## 7                                                               python, leadership, communication
## 8                                                                                     python, sql
## 9                                                                                   documentation
## 10                                                                                  understanding
## 11                                                                                            sql
## 12                                                              sql, understanding, communication
## 13                                                                                               
## 14                                                   containerization, understanding, reliability
## 15                                                                           influence, curiosity
## 16                                                                                          excel
## 17                                                                                         python
## 18                                                                             understanding, sql
## 19                                                                   documentation, communication
## 20                                                                                            sql
## 21                                                                             understanding, sql
## 22                                                     research, sql, communication, storytelling
## 23                                                                          understanding, python
## 24                                                                  presentation, tableau, matlab
## 25                                        leadership, documentation, integrity, honesty, research
## 26                                                                                            sql
## 27           communication, understanding, sql, tableau, excel, preparation, presentation, vision
## 28                                                                                               
## 29                                          python, sql, understanding, networking, communication
## 30                                                                   research, figma, prototyping
## 31                                                             understanding, communication, ruby
## 32                                                                             understanding, sql
## 33                                                                           sql, python, tableau
## 34                                              sql, influence, prototyping, presentation, vision
## 35                                                   tableau, sql, compassion, leadership, vision
## 36                                                  vision, sql, python, integrity, communication
## 37                                                                                               
## 38                                                      tableau, presentation, preparation, excel
## 39                                              understanding, documentation, sql, python, docker
## 40                                                             organization, excel, communication
## 41                                                                     research, clarity, respect
## 42 organization, leadership, integrity, confidence, python, understanding, tableau, communication
## 43                                                                                  communication
## 44                                                                          research, sql, python
## 45                                                                                               
## 46                                                                organization, leadership, excel
## 47                                       sql, understanding, integrity, leadership, communication
## 48                                             reasoning, understanding, sql, excel, transparency
## 49         organization, understanding, collaboration, python, communication, vision, flexibility
## 50                                   influence, research, sql, communication, vision, flexibility
## 51                            understanding, leadership, vision, sql, communication, organization
## 52                                               research, leadership, understanding, flexibility
## 53                                          organization, understanding, communication, integrity
## 54                                      understanding, sql, python, communication, creativity, go
## 55                                                                                          excel
## 56                                  sql, tableau, leadership, communication, initiative, research
## 57                                                                   understanding, communication
## 58                                                                                     evaluation
## 59                                                                                     evaluation
## 60                                                                                     evaluation
## 61                                                                                     evaluation
## 62                                                      communication, presentation, organization
## 63                                                                                     evaluation
## 64                                                                                     evaluation
## 65                           understanding, communication, evaluation, agility, influence, vision
## 66                                                                             sql, python, excel
## 67                                                                                               
## 68                                                                   communication, collaboration
## 69                                                         preparation, excel, communication, sas
## 70                                                                           organization, python
## 71                                                      python, sql, understanding, communication
## 72                                                                                  communication
## 73                                                                                 integrity, sql
## 74                                                   integrity, sql, understanding, communication
## 75                                                                                            sql
## 76                                                 organization, communication, creativity, excel
## 77                                   organization, vision, understanding, sas, sql, communication
## 78                                                                          python, understanding
## 79                                                                     sql, python, communication
## 80                                                                                            sql
## 81                                                                                            sql
## 82                                                                                            sql
## 83                                                                                         python
## 84                                                                                   sql, tableau
## 85                                                            excel, communication, collaboration
## 86                                                      confidence, excel, communication, tableau
## 87                                      leadership, documentation, sql, excel, sas, communication
## 88                                                    research, understanding, leadership, vision
## 89                                              influence, python, understanding, sql, initiative
## 90                    research, communication, presentation, collaboration, documentation, vision
## 91                                                                    communication, transparency
## 92                                                                        research, documentation
## 93                                                   research, understanding, flexibility, vision

Let’s see which skills appear the most using tidytext:

library(tidytext)
skills_count <- many_views %>%
  unnest_tokens(word, skills) %>%
  count(word, sort = TRUE)

skills_count

##                word  n
## 1     communication 36
## 2               sql 36
## 3     understanding 31
## 4            python 20
## 5             excel 14
## 6        leadership 13
## 7            vision 13
## 8          research 12
## 9      organization 10
## 10          tableau  9
## 11    documentation  7
## 12       evaluation  7
## 13        integrity  7
## 14     presentation  6
## 15        influence  5
## 16    collaboration  4
## 17      flexibility  4
## 18               go  3
## 19      preparation  3
## 20              sas  3
## 21       confidence  2
## 22       creativity  2
## 23       initiative  2
## 24        mentoring  2
## 25      prototyping  2
## 26     transparency  2
## 27          agility  1
## 28          clarity  1
## 29       compassion  1
## 30 containerization  1
## 31        curiosity  1
## 32           docker  1
## 33            figma  1
## 34          honesty  1
## 35           matlab  1
## 36       networking  1
## 37        reasoning  1
## 38      reliability  1
## 39          respect  1
## 40             ruby  1
## 41     storytelling  1

The results are telling. In the most viewed (and therefore, perhaps, desirable) data science-related postings, many of the skills are not technical at all! In fact, of the top 3 skills, 2 are “communication” and “understanding.” We can visualize this nicely with a wordcloud as well.

library(wordcloud)

## Loading required package: RColorBrewer

wordcloud(words = skills_count$word, freq = skills_count$n, min.freq = 2, scale = c(3, .67))
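To put a rough number on the soft-versus-technical split described above, the sketch below labels each skill and sums the mentions per group. It uses only the top six rows of skills_count as printed earlier, so it is not the full picture. The `technical` vector is a hand-made labelling assumed for illustration; words such as "excel" can also be ordinary English verbs, so the split is approximate.

```r
# Top six rows of skills_count, copied from the output above
top_skills <- data.frame(
  word = c("communication", "sql", "understanding", "python", "excel", "leadership"),
  n    = c(36, 36, 31, 20, 14, 13),
  stringsAsFactors = FALSE
)

# Hypothetical hand labelling of which words count as technical skills
technical <- c("sql", "python", "excel")

# Tag each skill, then total the mentions in each group
top_skills$type <- ifelse(top_skills$word %in% technical, "technical", "soft")
mentions_by_type <- tapply(top_skills$n, top_skills$type, sum)

# Share of mentions per group among these six skills
share_by_type <- mentions_by_type / sum(mentions_by_type)
print(round(share_by_type, 2))
```

The same steps would apply to the full skills_count table once every word has been labelled.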

In the next section, we’ll confirm this is no anomaly; soft skills are extremely desirable for data science-related jobs.

Part 2: Considering all postings

In this section, we are identifying the most frequently occurring skills across all data science jobs.
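As a minimal sketch of the counting step used in both parts, the toy example below tokenizes a comma-separated skills column and tallies each word. The data frame `toy_jobs` and its values are made up for illustration; only the `unnest_tokens()` and `count()` calls mirror the ones in this report.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per posting, skills as a comma-separated string
toy_jobs <- data.frame(
  job_id = 1:3,
  skills = c("python, sql", "sql, communication", "communication, sql"),
  stringsAsFactors = FALSE
)

# Split the skills column into one lowercase word per row, then count each word
toy_counts <- toy_jobs %>%
  unnest_tokens(word, skills) %>%
  count(word, sort = TRUE)

print(toy_counts)
```

With the default word tokenizer, a multi-word skill would be split into separate tokens. If the extracted skills ever include such entries, passing token = "regex" with pattern = ", " to unnest_tokens() is one way to keep them whole.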

# Apply the function to all jobs, creating a new column with the extracted skills
ds_jobs$skills <- sapply(ds_jobs$description, extract_skills, USE.NAMES = FALSE)

# Create a dataset with skills from all data science jobs
all_skills <- ds_jobs %>%
  unnest_tokens(word, skills) %>%
  count(word, sort = TRUE)

head(all_skills, 20)

##             word   n
## 1  communication 164
## 2  understanding 142
## 3            sql 101
## 4     leadership  89
## 5       research  69
## 6   organization  60
## 7         python  57
## 8  documentation  49
## 9         vision  43
## 10         excel  35
## 11     integrity  34
## 12  presentation  26
## 13 collaboration  23
## 14     influence  23
## 15   flexibility  22
## 16            go  18
## 17    initiative  17
## 18    evaluation  16
## 19    networking  16
## 20       tableau  16

The output reveals the top skills sought in data science job postings. Across all data science jobs, non-technical skills are highly emphasized: ‘communication’ tops the list, followed by ‘understanding’. Once again, we can visualize this with a wordcloud.

library(wordcloud)
# Plot the counts for all data science postings (all_skills), not the most viewed subset
wordcloud(words = all_skills$word, freq = all_skills$n, min.freq = 2, scale = c(3, .67))
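To compare the two rankings directly, one option is to join the two count tables on the skill name. The sketch below does this for four skills whose counts appear in both outputs above; the data frame names `most_viewed` and `all_posts` are hypothetical stand-ins for skills_count and all_skills.

```r
# Counts copied from the two outputs above (four skills only)
most_viewed <- data.frame(
  word = c("communication", "sql", "understanding", "python"),
  n    = c(36, 36, 31, 20)
)
all_posts <- data.frame(
  word = c("communication", "understanding", "sql", "python"),
  n    = c(164, 142, 101, 57)
)

# Join on the skill name; suffixes keep the two count columns apart
comparison <- merge(most_viewed, all_posts, by = "word",
                    suffixes = c("_viewed", "_all"))

# Ratio of the count in the most viewed postings to the count in all postings
comparison$ratio <- comparison$n_viewed / comparison$n_all
comparison <- comparison[order(-comparison$ratio), ]
print(comparison)
```

A higher ratio means the skill shows up relatively more often in the most viewed postings than in the full set.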

Graphs

Add a bar chart of the skill counts in the most viewed postings:

library(ggplot2)
ggplot(skills_count, aes(x = word, y = n)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Graph 1: Frequency count for each word")

Order the factor levels by their order in the data frame (that is, by descending count):

skills_count$word <- factor(skills_count$word, levels = unique(skills_count$word))
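The reason this step matters is that ggplot2 places a character column on the axis in alphabetical order, while a factor is drawn in the order of its levels. The sketch below shows the effect on a three-row toy data frame (`toy` is a hypothetical name; the counts are taken from the table above).

```r
library(ggplot2)

# Three skills in descending order of count
toy <- data.frame(
  word = c("sql", "python", "excel"),
  n    = c(36, 20, 14),
  stringsAsFactors = FALSE
)

# Fix the level order to the row order, so the bars follow the counts
toy$word <- factor(toy$word, levels = unique(toy$word))
print(levels(toy$word))

# The bars now appear as sql, python, excel rather than alphabetically
p <- ggplot(toy, aes(x = word, y = n)) +
  geom_col()
```

Running the factor() line before the bar chart, rather than after it, would give a chart sorted by frequency.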

Check the dataframe

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

head(skills_count)

##            word  n
## 1 communication 36
## 2           sql 36
## 3 understanding 31
## 4        python 20
## 5         excel 14
## 6    leadership 13

tail(skills_count)

##            word n
## 36   networking 1
## 37    reasoning 1
## 38  reliability 1
## 39      respect 1
## 40         ruby 1
## 41 storytelling 1

str(skills_count)

## 'data.frame':    41 obs. of  2 variables:
##  $ word: Factor w/ 41 levels "communication",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ n   : int  36 36 31 20 14 13 13 12 10 9 ...

summary(skills_count)

##             word          n         
##  communication: 1   Min.   : 1.000  
##  sql          : 1   1st Qu.: 1.000  
##  understanding: 1   Median : 2.000  
##  python       : 1   Mean   : 6.585  
##  excel        : 1   3rd Qu.: 7.000  
##  leadership   : 1   Max.   :36.000  
##  (Other)      :35

library(lattice)
histogram(~ n | word, data = skills_count, layout= c(1,41))

This is a histogram of the count for each word, shown side by side but horizontal instead of vertical. Visually, we can see which words have a high count versus a low count, and which have similar counts. This is an alternate way for us to visualize the distribution of the soft skills mentioned.

library(lattice)
histogram(~ n | word, data = skills_count, layout= c(1,10))
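Because skills_count has a single count per word, each histogram panel holds only one observation. As an alternative sketch (not the method used above), lattice's barchart() can draw one horizontal bar per word in a single panel. The toy data frame below reuses counts from the table above; `toy_counts` is a hypothetical name.

```r
library(lattice)

# A few rows of skills_count, copied from the output above
toy_counts <- data.frame(
  word = factor(c("communication", "sql", "understanding", "python", "excel")),
  n    = c(36, 36, 31, 20, 14)
)

# One horizontal bar per word, ordered by count
bar_plot <- barchart(reorder(word, n) ~ n, data = toy_counts,
                     xlab = "Count", origin = 0)
print(bar_plot)
```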

Conclusion and recommendations

Summary:

The most frequently listed skills in the most viewed data science postings are mostly soft skills, such as communication and understanding, followed at a distance by skills such as leadership, vision, integrity, and evaluation. A few technical skills also ranked high, notably SQL, which tied with communication for first place, and Python. The size differences in the word cloud indicate a difference in importance, and the bar graph and histograms confirm that the frequency of each word is not uniform. However, a little over 50% of the words (21 of 41) had a count of 2 or less. The data as graphed are not valid for us to perform a Poisson regression test.

Recommendations: A larger sample that cross-analyzes the count of each word with the average salary for all the jobs that contained that skill would be interesting. However, a simpler version would just be getting a bigger sample and cross-analyzing low-level versus high-level positions. We could probably run a chi-square test to see whether there are observed differences in the skills needed for low-level positions versus more experienced, high-level positions. In terms of the field of data science, the graphs suggest that there is a desire for growth [within companies] and for self-improvement overall. Passing down knowledge in a field that is very creative requires more than just basic memory skills. Data science is not a one-track field. We conclude that training or classes in social psychology would help current and upcoming data scientists gain and refine the skills needed for the working environment and collaborative projects.
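As a rough sketch of the chi-square test suggested above, the example below tests whether mentioning a skill is independent of position level. The counts are invented purely for illustration and do not come from the LinkedIn data; `skill_by_level` is a hypothetical name.

```r
# Hypothetical 2x2 table: postings that do or do not mention a skill,
# split by position level. These counts are made up for illustration.
skill_by_level <- matrix(
  c(40, 60,
    70, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(level = c("junior", "senior"),
                  mentions_skill = c("yes", "no"))
)

# Pearson chi-square test of independence (base R, stats package)
result <- chisq.test(skill_by_level)
print(result)
```

A small p-value would suggest that the skill is mentioned at different rates for the two levels. With real data, one such table could be built per skill from the posting descriptions.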