W. Edwards Deming said, “In God we trust, all others must bring data.” The assignment: use data to answer the question, “Which are the most valued data science skills?” The work is an exploration; there is not necessarily a “right answer.” In that spirit, I explore employment opportunities in data science using job listings scraped from Indeed and Glassdoor, which I clean and consolidate into a single, consistently formatted data set. The goal is to identify the skills employers most often ask for and, through them, to shed light on the ever-evolving landscape of data science.
I ran into several challenges while using Octoparse to scrape job data from different platforms. LinkedIn in particular was blocked by security measures, login requirements, and other restrictions, so Indeed and Glassdoor became my primary sources. Both proved better suited to extracting the information I needed to identify the most valued data science skills in the job market.
# Loading necessary libraries
suppressWarnings({
library(tidyverse) # attaches ggplot2, dplyr, tidyr, and stringr
library(openintro)
library(shiny)
# Load other packages here
})
Loading database from Indeed
Indeed_DSjobs <- read.csv("https://raw.githubusercontent.com/lburenkov/Project-3/main/data%20scientist%20indeed.csv", stringsAsFactors = FALSE, encoding = "UTF-8")
Indeed_DSjobs$Description <- str_replace_all(Indeed_DSjobs$Description, "[\n]", "")
head(Indeed_DSjobs)
## Title
## 1 Applied Research Mathematician/ Mathematical Statistician
## 2 Data Scientist
## 3 Senior Predictive Modeler
## 4 AI Developer - Contractor
## 5 Lead Data Scientist
## 6 College Graduate - Data Science (BS/MS)
## Title_URL
## 1 https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0AC5S5KfpcrE62cRuYLg6qW_HWiPjKHP06qk-AGfbwYtOHGk5utQAbhlcSYjmxeK_H4ERVvxIPtHkE9zWaQvmAC-NE9HRKwI58atZmJrUp4jvS88Xb03eYvXFmYqoJtOazozh1wXSFRmw4eF8FyqmcJS02HjBNdkt5N4gT2mJCyScR0S4jIprTiRB5AkjNbKx7pIrazawcRLw5vOvhGGF4I_VWdbcbxP__W8YBHag4H5vKRELDYM3CcmUC1RRcWIw-dliQjdIGB_kj9JmeXiE6CVEL4QT0KxCssdP5h-V1xKWgzNa_hlT5s6S3n2DYJQ-I3_4pqnzMzsqiQLErRZl7L8MvI8RtQsr2fPX7cxNkzyayXym0EQt8nrVkqGvs5QyQ91k1LByFPjHC42Nn9ut4kB6JtVV3vcPEe5Uv7CjOyNrwcyyKnHbAN5ctcnu_TDzcWAKqsJUX4BeHzxV5NyNMR33A8C_5GLRl9LbjgWnul0wN23mnGVR8usG_BANOG7HvnyMgp2Xz6S98bhOpViW5CKORHMAsY3NdKvHqp9xH9SYiwAnR_8wH6D3qBU8gOtXQ6hmf0hEEBtCe3BGIw356cVsDS-ByvVt78XHdNfIQxlP3_jDZFZ1Wj&xkcb=SoDe-_M3JtNQXpRyp50LbzkdCdPP&p=0&fvj=0&vjs=3
## 2 https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0AC5S5KfpcrE62cRuYLg6qW_HWiPjKHP06qk-AGfbwYtGlr3wcSMURH9oqKq1q2FCfUEVrCkUcUvYkaqaQqQ4KG5_TQBvz1oR33pkfr9ubYgwx2u-0l-BfYDz--nCzoVP35cPIFb5Il-B9gGtxFg2TyxVM2s7ixtrfbjhGiCmW1OKXShjyPdVv-f1bnUy-4FT12QDM8rt08rOStJFhfnKuBgaaxIDy3-q9hTBd5WAIbjqayHffzv0auWEaoMnENAAkMihXv2Pw5NF5nFC1KqgqiGmSR5Rme1SHbjMQOkr9lhUYAzWufiiwVbl5_7Pyknf1Ah1UcJoe5BfDhkveEnQPVEUFZ8wmTGWbHhkkCMfTAQrLcgxgCUXk-l4CTzQjwE9iCcPluyO4SkVOcs4_Syjm1rgTJfQkanfRi3uwSWehKMWeO1X1OAsG4gGnJLLVnzEP1mQVcLbQoz2sA-urETV0Kxob-Xr7EJYbR9fN0muOuihdTaZA5Pp3oD8Vq_fdiXm3BmlbjiLx33IGOuLYGpkmNLKTnbByfejr-E_ISz7TK7v83_lYpX_TmrPo5VZRJym3owhp7EGUNI0u3RFfpb3RSxK7ULpVjzXs=&xkcb=SoBq-_M3JtNQXpRyp50KbzkdCdPP&p=1&fvj=0&vjs=3
## 3 https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0C2pP6k9OuB9dVVqjF_PdujusYNLdeVulQvAlytZ7WHPLDUR8wUESqpcdgIq0kFHzHwAmhYBtIxYfzjD7L6Edx9ayLlFNoeQY5NHi5n9DqYViHfYFh1me92fmGT3A0fx0tvZvFbukpezZHVEsjCXUavn0wq9_zWiOvv1ODTalxjQ0BhS756ICL306U6NqBWk0EVckZucDflEVg3MqZ_E5Da7ql1FMOkggbCQ50Cz-KxCql07sRrWG3ybrJnuURLBvbA9dtC1D2UZ8rgYn4AqiDD4DZHyHfEMMa2DGza4KiAnD3FAXaeyusSYnh3j4Aov2tSS8Aq2At47Gkny6uICCKb28RpSi8NqgQ9ILHaeHo-36fqd7Q1VksUeRNJH95UR3UNKYOmciyYAmeiMOjevDQHz1VESZyblvlb7pZ-sBGFo6116Iul4OlEOAgbT5v7nc3x-sw_fAFN_vvFzrdgjbVflQRlLw_ueaTO39SdaAaGyMcJYqlBpWB342rq2XybmclAwEqXjrauYdS0KhGXmuX0V-xjhszq1ZhAaoL5qZyjffdH27ZcChSswgK74HdToxtR0G9m9lqqxOZ2cPQKduLLpMazLFnaWK_r0B5A-rWdIw==&xkcb=SoD3-_M3JtNQXpRyp50JbzkdCdPP&p=2&fvj=1&vjs=3
## 4 https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0DY0o1vBkDF7ijZdphmN8ZpJjzNfoJ8hIIfocg_Q3_4rrU9MZaBYUA8inFsKV3suctbWkEVESkYcNgP7wRdHhF7Bq29ekffgM_hjURF1_Sv1RpsQIkEkeWDWLoEjO-_N-LrOTZHE_I2WrnlUvus4pSU6jSPzJ9CZMa7zMGWPwY1MMl21SPcmZNdAiLBKBSopf2Z-E51te9enSz_iJHeY2GYJK0paJcDuOKM2iItBYfI2E8kqU_8671HFQ8vYAXvynwzhmHowIZnvvOH_WGxs41TvDT30urLX9Da9lhB613D2jr6ik2Gv7Ag28mRQvugbqeePfgbH-wFNoJ2ydjLMnSxFj4C9BfhBIW8WJotlD56eQa5qTaB2T59q3kBKIGtLGEPrV2CiOkOFYmBHH3Tf9yWM8YwcUUZojl8MqU8Eh_Kusi2br2yloNxSWmgpdflVUotrqXWiLbVxYzgg82MAo-OzUbuOeBc_jxKm82pNE-k-SsXG8RLd0zb6gOr2EbgKFnpV-ldGTd9-OFDm4y7mMtdES8YB7lNLNGQ8Fm0NyM2ugVVt7gX24JRJhTAFYcacgcVs-_8lSTfSf1DlCpdwJ7-Y7sxTo3v1bdQpr2qCEAGAT3B4gSj0d06alxyJWvS81IyyfUmg2NI0Kvkvpc0BkJPXUzRtw1Og8Tmx6UpDaDpK1fwneciKsHs9HuxLsJvVIELnlfOXjQRSWCTIfFDHMiaFNiVJO5DfTNhg6BJizRxTAnc1N1PjNLC8WTLe3okhPnByrEk7j6vFw==&xkcb=SoBD-_M3JtNQXpRyp50IbzkdCdPP&p=3&fvj=0&vjs=3
## 5 https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0Cj-KmZPsf9w80C8b1WzNVrlanjD2SXJjxuCbUWHsXPZkFBy4Qr63BQKSyytxWB3SgLm-f0nIvEw1WHNc8pJffuu7UFqUfiETsyrpSNbs-G1mW8HQdUS02o1vSsV1T29ezpc7JQQ9W_096tXWKfLtqsRBEJyDD5OSK6bQtwnMwA4jiAkjmj8TvwD_Fc9m4vqnZteHxPNpx5cNFhYtnp2zJXN3Nfv8gJtfrszdVqIUxj2hHbBlBozOKNss685ZhNXruTaZdFM95BJm9zOYbwR0Twtj65U_aiODnnqsVeBfuPSkbgKxI0WfWNAuopE3N5-LslF77BgAzwe__gGx1h7Hb_k38gzKvbmP9xfYFym3HPmJ6kKoGZgiuXqXxbe_9I4yYV5gGWqu0ieEctsR4Bhof_5SQ4TgNUkUuGFTv9OnLJ2v96Mfx2hObaayaF-ABhbY9uj2JiLtLQvpp9Diiktr21Ph4rd_Pbr4K9M7Sd-AWp5dN-60gQf26B2I0CUZNm4LMpXQUgjIVVLOSg9rm2bbY0WId_GlEagHqL1p381xmu3c-BMStjKUQ4oNmDmxyiOXrtS3M7uKnlDd1rCkJ2sb3LzQAVf1DzDlQF0WgZLy_29NkcvysdYfMS_MGCCCGt2_KwtdK38_qfi-P3pJ5-8OaKSxFt0L_oFA2p-qJXnVtxKlcFveG27bNMRKYaPPK4mbtnGMxVMoEvXjSN28O-WC06SJLPNG_c3VXS8lkzvwbdwzR6Uw4K0yrbcyQMHj74G7gZXooEdu8PnOO6Zzhe8fRDF3dkTbwKYy4bi62uEF9tdVgCJuw50ztvwg5R4ON0IsMWvb4RwIrcYld0dpwaPuIrFa5AsUEXa2Kj9SBk8X0mzkGmDhSmUrbOos0KoT6SVm5GhEaW-MR3ihY43rTrqxcBrqpWFZ2UeqSazCT0u1j4fCNGOxQVBw9xMDcJszavqEQwf3FZiXvKNe9ZTLaw6x4o8sERlkayr8xu5QGyR6m-WKAW2Yt0Fr2usLQkwLgWWWJUXjv4YvFKmWgpTUf2YZlCFMOPCBUPpkfeiCgS7KuhpHbaIVbudejTHS9Rh7XvjpPTgM32PnL0fox_oLqbLwhlceMztOltqrJGKUZyaWYxQkg0CYVVKwP7yPvyEr_snb9xskVuc9Xe649seNGTNRPkthpJ0NHvqFn9FgGQfZWeFXpl1jTWDBpC2gZ3EguDtXKmDlnzmZdoTsdAV5Rm-hVeW-MLjCQuXNxKnSCqW1KFLv4835CJ2oXZ&xkcb=SoDN-_M3JtNQXpRyp50PbzkdCdPP&p=4&fvj=0&vjs=3
## 6 https://www.indeed.com/rc/clk?jk=3c21a0e654ace006&fccid=936367796261bd6e&vjs=3
## Company_name Location
## 1 National Security Agency Fort Meade, MD
## 2 National Security Agency Fort Meade, MD
## 3 Sitewise Analytics Dallas, TX
## 4 EY North Palm Beach, FL
## 5 Comcast Philadelphia, PA 19148 \n(Stadium District area)
## 6 INTEL Santa Clara, CA
## Salary metadata_divcontainsclass_css1ihavw2 Description
## 1 $81,233 - $142,341 a year Monday to Friday\n+2
## 2 $81,233 - $183,500 a year Full-time
## 3 From $90,000 a year Full-time
## 4 $100 - $130 an hour Full-time
## 5 Full-time Full-time
## 6 $105,930 - $158,890 a year Full-time
## Description1 Description2 date Location3
## 1 Posted\nPosted 28 days ago
## 2 Posted\nPosted 21 days ago
## 3 Employer\nActive 3 days ago
## 4 Posted\nPosted 4 days ago
## 5 Posted\nPosted 30+ days ago (Stadium District area)
## 6 Posted\nPosted 30+ days ago
## more_loc_URL
## 1
## 2
## 3
## 4
## 5
## 6 https://www.indeed.com/addlLoc/redirect?tk=1hdd2nrugi47d803&jk=3c21a0e654ace006&dest=%2Fjobs%3Fq%3Ddata%2Bscientist%26l%3DUnited%2BStates%26grpKey%3D8gcGdG5mdGNsuA_7Y6oQGwoJbm9ybXRpdGxlGg5kYXRhIHNjaWVudGlzdA%253D%253D
## more_loc Title1 Title_URL1 Titl1 Location1 css1ihav1
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 View all 20 available locations NA NA NA NA NA
## metadata_divcontainsclass_css1ihav1 Description3 Descriptio1 date1
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## more_loc_URL1 more_loc1 Locatio1 Description4
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
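The output above shows embedded line breaks in more than one column (for example `Location` and `Description1`), not only in `Description`. As a minimal sketch on toy data, the same kind of cleanup can be applied to every character column at once. The helper names `squish_text` and `clean_character_columns` are hypothetical, and unlike the `str_replace_all()` call above, this version replaces each whitespace run with a single space rather than deleting it, so adjacent words are not fused.

```r
# Collapse any run of whitespace (spaces, tabs, newlines) to one space and trim the ends.
squish_text <- function(x) trimws(gsub("[[:space:]]+", " ", x))

# Apply the cleanup to every character column; other columns are left untouched.
clean_character_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], squish_text)
  df
}

# Toy rows modeled on the printed Indeed output (not the real data set).
toy <- data.frame(
  Location = c("Philadelphia, PA 19148 \n(Stadium District area)", "Dallas, TX"),
  Posted = c("Posted\nPosted 28 days ago", "Employer\nActive 3 days ago"),
  n = c(1, 2),
  stringsAsFactors = FALSE
)

cleaned <- clean_character_columns(toy)
print(cleaned)
```

Whether a space or an empty string is the better replacement depends on how the scraper split the text, so it is worth spot-checking a few rows before and after.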
Loading database from Glassdoor
Glassdoor_DSjobs <- read.csv("https://raw.githubusercontent.com/lburenkov/Project-3/main/data%20scientist%20glassdor.csv", encoding = "UTF-8")
head(Glassdoor_DSjobs)
## Company
## 1 Millennial Software5 ★
## 2 Quality Insights,Inc3.5 ★
## 3 PL Consulting, Inc.
## 4 innoVet Health, LLC4 ★
## 5 CAPITAL Services4.1 ★
## 6 Quiet Professionals LLC4.5 ★
## Image
## 1 https://media.glassdoor.com/sql/8113263/millennial-software-squareLogo-1682522759057.png
## 2 https://media.glassdoor.com/sql/429733/wvmi-squarelogo-1432625370254.png
## 3 https://media.glassdoor.com/sql/2192416/pl-consulting-va-squarelogo-1639126522050.png
## 4 https://media.glassdoor.com/sql/7575446/innovet-health-squarelogo-1660730851088.png
## 5 https://media.glassdoor.com/sql/2092289/capital-services-squarelogo-1525932299764.png
## 6 https://media.glassdoor.com/sql/1174877/quiet-professionals-squarelogo-1560931964135.png
## employerprofile_employerrating_lq_zl
## 1 5 ★
## 2 3.5 ★
## 3
## 4 4 ★
## 5 4.1 ★
## 6 4.5 ★
## jobcard_seolink_r4hue_URL
## 1 https://www.glassdoor.com/job-listing/machine-learning-data-scientist-ts-sci-with-poly-100k-200k-15-401k-millennial-software-JV_KO0,66_KE67,86.htm?jl=1008841021673
## 2 https://www.glassdoor.com/job-listing/senior-statistician-quality-insights-inc-JV_KO0,19_KE20,40.htm?jl=1008868902322
## 3 https://www.glassdoor.com/job-listing/cyber-data-scientist-pl-consulting-inc-JV_KO0,20_KE21,38.htm?jl=1008869819432
## 4 https://www.glassdoor.com/job-listing/senior-data-scientist-with-ai-and-ml-experience-innovet-health-llc-JV_KO0,47_KE48,66.htm?jl=1008843882991
## 5 https://www.glassdoor.com/job-listing/data-scientist-capital-services-JV_KO0,14_KE15,31.htm?jl=1008882782096
## 6 https://www.glassdoor.com/job-listing/data-scientist-quiet-professionals-llc-JV_KO0,14_KE15,38.htm?jl=1008728306030
## Title
## 1 Machine Learning Data Scientist TS/SCI with Poly $100K- $200k + 15% 401k
## 2 Senior Statistician
## 3 Cyber Data Scientist
## 4 Senior Data Scientist with AI and ML experience
## 5 Data Scientist
## 6 Data Scientist
## Location Salary
## 1 McLean, VA $100K - $200K (Employer est.)
## 2 Charleston, WV $66K - $98K (Glassdoor est.)
## 3 Chantilly, VA $180K - $195K (Employer est.)
## 4 Remote $140K (Employer est.)
## 5 Sioux Falls, SD $86K - $125K (Glassdoor est.)
## 6 Doral, FL $160K (Employer est.)
## Description
## 1 Work will include high technical skill in programming, primarily python, and common data engineering tools, as well as high levels of collaboration,……
## 2 A bachelor’s degree from accredited college or university in statistics or relevant discipline plus 10 years of full-time paid experience in statistical……
## 3 Import and transform data into usable sets for analysis tools used by the customer (e.g., Tableau). Collaborate with RMG teams to develop means to integrate,……
## 4 Please answer 2 if you are a US citizen, 1 if you have a valid green card, 0 if neither. Lead team in the application of operations research techniques and in……
## 5 Effective written and oral communication to both technical and non-technical teams. Completed a four-year degree in a quantitative field (Mathematics, Data……
## 6 Design, develop, and modify software systems, using scientific analysis and mathematical models to predict and measure outcome and consequences of design.…
## Salary1 mlxsm Description2 dflex
## 1 (Employer est.) Easy Apply data 30d+
## 2 (Glassdoor est.) Easy Apply 30d+
## 3 (Employer est.) Easy Apply data 30d+
## 4 (Employer est.) Easy Apply 30d+
## 5 (Glassdoor est.) Easy Apply 30d+
## 6 (Employer est.) Easy Apply 30d+
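In the Glassdoor output above, the `Company` values carry the employer rating as a suffix (for example `Millennial Software5 ★`). A rough sketch of separating the two with base R regular expressions follows. The function name `split_company_rating` is hypothetical, and the pattern assumes the rating is always a number directly followed by a star at the end of the string, so a company whose name itself ends in digits would be split incorrectly.

```r
# Split "Name<rating> ★" into a clean name and a numeric rating (NA when no rating is present).
split_company_rating <- function(x) {
  pattern <- "[0-9]+(\\.[0-9]+)?[[:space:]]*\u2605[[:space:]]*$"
  m <- regexpr(pattern, x)
  matched <- !is.na(m) & m > 0

  # Pull out the matched suffix, e.g. "3.5 ★", for the rows that have one.
  rating_text <- rep(NA_character_, length(x))
  rating_text[matched] <- regmatches(x, m)

  data.frame(
    name = trimws(sub(pattern, "", x)),
    rating = as.numeric(sub("[[:space:]]*\u2605[[:space:]]*$", "", rating_text)),
    stringsAsFactors = FALSE
  )
}

# Toy values copied from the printed output above.
companies <- c("Millennial Software5 \u2605", "Quality Insights,Inc3.5 \u2605", "PL Consulting, Inc.")
parsed <- split_company_rating(companies)
print(parsed)
```

Since the output also shows a separate rating column (`employerprofile_employerrating_lq_zl`), comparing it against the parsed rating is a cheap sanity check.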
After loading both datasets from Indeed and Glassdoor, I proceeded to merge them into a single dataset.
combined_data <- bind_rows(Glassdoor_DSjobs, Indeed_DSjobs)
Cleaning our data
After unifying the dataset, I tidied it by removing the columns that appeared unnecessary for the analysis.
# Dropping unnecessary columns
columns_to_keep <- c("Title", "Company", "Salary", "Location", "Description", "Description2")
new_data <- combined_data %>%
  select(all_of(columns_to_keep))
head(new_data)
## Title
## 1 Machine Learning Data Scientist TS/SCI with Poly $100K- $200k + 15% 401k
## 2 Senior Statistician
## 3 Cyber Data Scientist
## 4 Senior Data Scientist with AI and ML experience
## 5 Data Scientist
## 6 Data Scientist
## Company Salary Location
## 1 Millennial Software5 ★ $100K - $200K (Employer est.) McLean, VA
## 2 Quality Insights,Inc3.5 ★ $66K - $98K (Glassdoor est.) Charleston, WV
## 3 PL Consulting, Inc. $180K - $195K (Employer est.) Chantilly, VA
## 4 innoVet Health, LLC4 ★ $140K (Employer est.) Remote
## 5 CAPITAL Services4.1 ★ $86K - $125K (Glassdoor est.) Sioux Falls, SD
## 6 Quiet Professionals LLC4.5 ★ $160K (Employer est.) Doral, FL
## Description
## 1 Work will include high technical skill in programming, primarily python, and common data engineering tools, as well as high levels of collaboration,……
## 2 A bachelor’s degree from accredited college or university in statistics or relevant discipline plus 10 years of full-time paid experience in statistical……
## 3 Import and transform data into usable sets for analysis tools used by the customer (e.g., Tableau). Collaborate with RMG teams to develop means to integrate,……
## 4 Please answer 2 if you are a US citizen, 1 if you have a valid green card, 0 if neither. Lead team in the application of operations research techniques and in……
## 5 Effective written and oral communication to both technical and non-technical teams. Completed a four-year degree in a quantitative field (Mathematics, Data……
## 6 Design, develop, and modify software systems, using scientific analysis and mathematical models to predict and measure outcome and consequences of design.…
## Description2
## 1 data
## 2
## 3 data
## 4
## 5
## 6
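One thing worth checking in the selection above: the printed Indeed output lists the employer under `Company_name`, while Glassdoor uses `Company`. Because `bind_rows()` matches columns by name, the Indeed rows would then carry `NA` in `Company`. The sketch below, on hypothetical toy rows, shows one way to coalesce the two columns before selecting. It is an illustration of the idea, not the author's pipeline.

```r
library(dplyr)

# Toy stand-ins for the two scraped tables (hypothetical rows).
glassdoor_toy <- data.frame(Title = "Data Scientist", Company = "CAPITAL Services",
                            stringsAsFactors = FALSE)
indeed_toy <- data.frame(Title = "Lead Data Scientist", Company_name = "Comcast",
                         stringsAsFactors = FALSE)

# Stacking by column name leaves NA wherever a source lacks the column.
stacked <- bind_rows(glassdoor_toy, indeed_toy)

# Take Company where present, otherwise fall back to Company_name.
unified <- stacked %>%
  mutate(Company = coalesce(Company, Company_name)) %>%
  select(-Company_name)

print(unified)
```

Running `colSums(is.na(new_data))` on the real data would show quickly whether this applies.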
# Data set rows and columns
dim(new_data)
## [1] 3615 6
After removing the unnecessary columns, the new dataset has 3615 rows and 6 columns. It also contains duplicate entries, which was somewhat disappointing but not surprising: recruiters post jobs regularly on the websites I scraped, so duplicates were highly likely.
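As a minimal sketch on toy data (the rows below are hypothetical), base R's `duplicated()` flags every repeat of an earlier row. Summing the flags gives the duplicate count, and negating them keeps only the first copy of each listing.

```r
# Toy listings; rows 2 and 4 repeat row 1.
listings <- data.frame(
  Title = c("Data Scientist", "Data Scientist", "Senior Statistician", "Data Scientist"),
  Company = c("Comcast", "Comcast", "EY", "Comcast"),
  stringsAsFactors = FALSE
)

is_dup <- duplicated(listings)  # TRUE for each repeat after the first occurrence
n_dup <- sum(is_dup)            # number of duplicate rows

# Keep the first occurrence of every distinct row.
deduped <- listings[!is_dup, , drop = FALSE]

print(table(is_dup))
print(deduped)
```

For a data set with thousands of rows, `sum()` or `table()` on the flags is easier to read than the full logical vector.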
# Looking for duplicates
duplicates <- duplicated(new_data)
num_duplicates <- sum(duplicates)
duplicates
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [361] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [373] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [409] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [421] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [445] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [457] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [493] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [505] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## [517] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [541] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [553] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [565] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [589] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [601] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [613] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [625] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE
## [637] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [649] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [661] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [673] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [685] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [697] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [709] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [721] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [733] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [745] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
## [757] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [769] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [781] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [793] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [805] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
## [817] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [829] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [841] TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [853] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [865] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [877] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [889] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [901] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [913] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [925] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [937] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [949] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [961] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [973] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [985] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [997] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1009] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1021] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1033] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1045] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1057] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1069] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1081] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1093] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1105] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1117] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1129] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1141] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1153] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1165] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1177] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1189] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1201] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1213] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1225] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1237] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1249] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1261] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1273] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1285] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1297] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1309] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1321] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1333] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1345] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1357] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1369] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1381] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1393] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1405] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1417] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1429] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1441] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1453] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1465] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1477] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1489] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1501] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1513] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1525] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1537] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1549] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1561] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1573] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1585] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1597] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1609] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1621] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1633] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1645] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1657] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1669] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1681] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1693] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1705] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1717] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1729] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1741] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1753] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1765] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1777] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1789] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1801] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1813] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1825] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1837] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [1849] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1861] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1873] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1885] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1897] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [1909] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1921] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1933] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1945] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [1957] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [1969] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1981] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1993] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2005] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [2017] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
## [2029] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2041] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2053] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2065] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [2077] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [2089] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2101] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2113] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2125] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [2137] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## [2149] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [2173] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2185] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [2197] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
## [2209] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2233] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2245] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
## [2257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [2269] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
## [2281] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
## [2293] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [2305] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [2317] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [2329] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2341] TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
## [2353] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [2365] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2377] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2389] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2401] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## [2413] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2425] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2437] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2449] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2461] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2473] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [2485] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [2497] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2509] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2521] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
## [2533] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2545] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2557] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2569] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2581] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2593] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [2605] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [2617] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [2629] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2641] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2653] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2665] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2677] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2689] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2701] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2713] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2725] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2737] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2749] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2761] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2773] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2785] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2797] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2809] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2821] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2833] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2845] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2857] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2869] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2881] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## [2893] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2905] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2917] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2929] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2941] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2953] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2965] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2977] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2989] TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
## [3001] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3013] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3025] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3037] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3049] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3061] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3073] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3085] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [3097] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3109] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3145] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3157] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3169] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3193] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3205] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3217] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3229] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3253] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3265] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3277] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3289] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [3301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3313] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3325] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3337] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3349] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3361] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3373] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3385] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3397] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3409] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [3421] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3433] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3445] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3457] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3469] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3481] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3493] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3505] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3517] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3529] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3541] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3553] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3565] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3577] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3589] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3601] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3613] TRUE TRUE TRUE
# There are plenty of duplicates from the web scraping.
data <- unique(new_data)
head(data)
## Title
## 1 Machine Learning Data Scientist TS/SCI with Poly $100K- $200k + 15% 401k
## 2 Senior Statistician
## 3 Cyber Data Scientist
## 4 Senior Data Scientist with AI and ML experience
## 5 Data Scientist
## 6 Data Scientist
## Company Salary Location
## 1 Millennial Software5 ★ $100K - $200K (Employer est.) McLean, VA
## 2 Quality Insights,Inc3.5 ★ $66K - $98K (Glassdoor est.) Charleston, WV
## 3 PL Consulting, Inc. $180K - $195K (Employer est.) Chantilly, VA
## 4 innoVet Health, LLC4 ★ $140K (Employer est.) Remote
## 5 CAPITAL Services4.1 ★ $86K - $125K (Glassdoor est.) Sioux Falls, SD
## 6 Quiet Professionals LLC4.5 ★ $160K (Employer est.) Doral, FL
## Description
## 1 Work will include high technical skill in programming, primarily python, and common data engineering tools, as well as high levels of collaboration,……
## 2 A bachelor’s degree from accredited college or university in statistics or relevant discipline plus 10 years of full-time paid experience in statistical……
## 3 Import and transform data into usable sets for analysis tools used by the customer (e.g., Tableau). Collaborate with RMG teams to develop means to integrate,……
## 4 Please answer 2 if you are a US citizen, 1 if you have a valid green card, 0 if neither. Lead team in the application of operations research techniques and in……
## 5 Effective written and oral communication to both technical and non-technical teams. Completed a four-year degree in a quantitative field (Mathematics, Data……
## 6 Design, develop, and modify software systems, using scientific analysis and mathematical models to predict and measure outcome and consequences of design.…
## Description2
## 1 data
## 2
## 3 data
## 4
## 5
## 6
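As a quick check on the deduplication step, the number of dropped rows can be counted before calling `unique()`. The sketch below uses a small made-up data frame (`toy_jobs` and its values are hypothetical, not taken from the scraped data). `duplicated()` flags every repeat of an earlier row, so the number of `TRUE` values equals the number of rows that `unique()` removes.

```r
# Hypothetical toy data: rows 1 and 3 are identical
toy_jobs <- data.frame(
  Title    = c("Data Scientist", "Senior Statistician", "Data Scientist"),
  Location = c("Remote", "Charleston, WV", "Remote"),
  stringsAsFactors = FALSE
)

# duplicated() marks a row TRUE when an identical row appeared earlier
dup_flags <- duplicated(toy_jobs)
n_dropped <- sum(dup_flags)

# unique() keeps the first occurrence of each distinct row
toy_unique <- unique(toy_jobs)

cat("rows before:", nrow(toy_jobs), "\n")
cat("duplicates :", n_dropped, "\n")
cat("rows after :", nrow(toy_unique), "\n")
```

Comparing `nrow()` before and after in this way makes it easy to confirm that only exact repeats were dropped.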
After removing duplicates, the data set has 1153 unique rows. The next step was to clean it further: I split the “Location” column into two new columns, “City” and “State,” so that I could explore which skills are in higher demand in specific locations, such as New York.
# Split the "Location" column into "City" and "State"
data1 <- data %>%
  separate(Location, into = c("City", "State"), sep = ", ", remove = FALSE)
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 293 rows [4, 8, 13, 21,
## 25, 26, 28, 29, 30, 46, 47, 48, 49, 51, 59, 61, 63, 64, 65, 72, ...].
# Keep only rows that have a Description
data1 <- data1[complete.cases(data1$Description), ]
head(data1)
## Title
## 1 Machine Learning Data Scientist TS/SCI with Poly $100K- $200k + 15% 401k
## 2 Senior Statistician
## 3 Cyber Data Scientist
## 4 Senior Data Scientist with AI and ML experience
## 5 Data Scientist
## 6 Data Scientist
## Company Salary Location
## 1 Millennial Software5 ★ $100K - $200K (Employer est.) McLean, VA
## 2 Quality Insights,Inc3.5 ★ $66K - $98K (Glassdoor est.) Charleston, WV
## 3 PL Consulting, Inc. $180K - $195K (Employer est.) Chantilly, VA
## 4 innoVet Health, LLC4 ★ $140K (Employer est.) Remote
## 5 CAPITAL Services4.1 ★ $86K - $125K (Glassdoor est.) Sioux Falls, SD
## 6 Quiet Professionals LLC4.5 ★ $160K (Employer est.) Doral, FL
## City State
## 1 McLean VA
## 2 Charleston WV
## 3 Chantilly VA
## 4 Remote <NA>
## 5 Sioux Falls SD
## 6 Doral FL
## Description
## 1 Work will include high technical skill in programming, primarily python, and common data engineering tools, as well as high levels of collaboration,……
## 2 A bachelor’s degree from accredited college or university in statistics or relevant discipline plus 10 years of full-time paid experience in statistical……
## 3 Import and transform data into usable sets for analysis tools used by the customer (e.g., Tableau). Collaborate with RMG teams to develop means to integrate,……
## 4 Please answer 2 if you are a US citizen, 1 if you have a valid green card, 0 if neither. Lead team in the application of operations research techniques and in……
## 5 Effective written and oral communication to both technical and non-technical teams. Completed a four-year degree in a quantitative field (Mathematics, Data……
## 6 Design, develop, and modify software systems, using scientific analysis and mathematical models to predict and measure outcome and consequences of design.…
## Description2
## 1 data
## 2
## 3 data
## 4
## 5
## 6
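The warning from `separate()` comes from locations that contain no `", "` delimiter, such as "Remote" or a bare state name. A minimal sketch on made-up rows (the data frame `toy_loc` is hypothetical) shows the behavior. Passing `fill = "right"` is one way to state explicitly that missing pieces should become `NA` on the right, which also silences the warning.

```r
library(tidyr)

# Hypothetical locations: one "City, ST" value and two without a delimiter
toy_loc <- data.frame(
  Location = c("McLean, VA", "Remote", "California"),
  stringsAsFactors = FALSE
)

# fill = "right" pads missing pieces with NA on the right without a warning
toy_split <- separate(
  toy_loc, Location,
  into = c("City", "State"), sep = ", ",
  remove = FALSE, fill = "right"
)

print(toy_split)

# Rows whose State is NA are the ones that need manual cleanup
needs_cleanup <- toy_split$Location[is.na(toy_split$State)]
print(needs_cleanup)
```

Listing the affected `Location` values this way gives a compact view of what still has to be recoded by hand.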
I also decided to keep remote positions in the analysis, coding them as “RE”.
# Set "City" and "State" to "RE" when "Location" is "Remote"
data1$City[data1$Location == "Remote"] <- "RE"
data1$State[data1$Location == "Remote"] <- "RE"
# Find rows with NA values in the "State" column
na_rows <- data1[is.na(data1$State), ]
# Display rows with missing "State" values
print(na_rows)
## Title
## 25 E-commerce Data Scientist - Health and Wellness - California REMOTE
## 28 Spatial Data Scientist
## 83 Data Scientist
## 89 Senior Data Analyst, Regulatory Relations
## 91 Data Scientist
## 103 Data Scientist
## 121 Data Scientist
## 131 Data Scientist
## 212 Data Scientist | Analytics
## 220 Staff Data Scientist
## 224 Data Analytics Program Management Support
## 226 Senior Data Scientist, Platform
## 236 Data Engineer
## 266 Senior Data Scientist
## 301 data scientist - multiple openings
## 305 Consumer Marketing Data Analyst
## 345 Sr. Manager, Biomarker Data Scientist
## 380 Data Scientist
## 384 Senior Data Analyst SME
## 419 Data Scientist
## 422 TSS Data Scientist
## 428 AI Data Scientist
## 463 Senior Data Scientist, Product Growth
## 470 Senior Manager, Statistical Programming
## 489 ESG Jr. Data Management Consultant - Computer Scientist (PT or FT REMOTE)
## 520 Information Systems, IT, Cyber Engineer & Data Science (Recent Grad/Full Time)
## 558 Data Scientist
## 571 Data Scientist
## 579 Data Scientist Engineering Modeler
## 624 Analyst - Analytics and Data Science
## 635 Data Scientist
## 672 Data Scientist
## 687 Data Scientist
## 692 Big Data & AI/ML Engineer
## 696 Sr. Data Scientist
## 705 Senior Data Scientist
## 726 Data Scientist
## 787 Data Scientist
## 806 Associate Decision Scientist, Client Analytics
## 808 Data Scientist II
## 810 Sr. Data Scientist
## 817 Data Scientist 1
## 820 Machine Learning Engineer, Ranking
## 823 Data Scientist, Industry Solutions Engineering
## 847 Data Scientist / Applied Mathematician
## 879 Senior Data Scientist
## 1863 Data Scientist
## 1866 Data Scientist
## 1896 Data Scientist
## 1927 Spatial Data Scientist
## 1930 TSS Data Scientist
## 1934 Staff Data Scientist
## 2013 Data Scientist
## 2017 Senior Associate, Data Scientist
## 2049 AI Data Scientist
## 2102 E-commerce Data Scientist - Health and Wellness - California REMOTE
## 2103 Senior Data Analyst
## 2111 Senior Data Scientist
## 2137 Senior Data Scientist
## 2193 Healthcare Data Analyst Sr.
## 2265 E-commerce Data Scientist - Health and Wellness - California REMOTE
## Company
## 25 Stingray Direct4.5 ★
## 28 The Nature Conservancy4.4 ★
## 83 Frankenmuth Insurance Company4.4 ★
## 89 Osaic3.7 ★
## 91 GloComms4.1 ★
## 103 Global Information Technology3.9 ★
## 121 Peraton3.6 ★
## 131 AIVantage INC
## 212 Machinify3.7 ★
## 220 Linktree3.2 ★
## 224 Modern Technology Solutions, Inc.4.8 ★
## 226 Grammarly, Inc.4.3 ★
## 236 SOSi3.6 ★
## 266 Brigit3.8 ★
## 301 Starbucks3.7 ★
## 305 FICO4 ★
## 345 Bristol Myers Squibb3.8 ★
## 380 Blu Omega LLC4.6 ★
## 384 Modern Technology Solutions, Inc.4.8 ★
## 419 IT Consulting Services4.5 ★
## 422 General Dynamics Information Technology4 ★
## 428 11:59
## 463 Jerry3.7 ★
## 470 Bristol Myers Squibb3.8 ★
## 489 Montrose Environmental Group, Inc.3.4 ★
## 520 Honeywell4 ★
## 558 eimagine3.9 ★
## 571 Peraton3.6 ★
## 579 Capgemini3.7 ★
## 624 Blue Cross Blue Shield of Arizona4 ★
## 635 NYC Careers3.6 ★
## 672 Simulmedia3.7 ★
## 687 Pull Systems3.4 ★
## 692 Mark James Search Ltd4.9 ★
## 696 New Wave Staffing
## 705 McAfee, LLC3.9 ★
## 726 High Tech High3.3 ★
## 787 Endeavor Communications4.5 ★
## 806 Ibotta3.7 ★
## 808 Chatlayer2 ★
## 810 INPOSIA Solutions GmbH3.8 ★
## 817 Public Consulting Group3.8 ★
## 820 Grammarly, Inc.4.3 ★
## 823 Microsoft4.3 ★
## 847 Johns Hopkins Applied Physics Laboratory (APL)4.3 ★
## 879 Microsoft4.3 ★
## 1863 <NA>
## 1866 <NA>
## 1896 <NA>
## 1927 <NA>
## 1930 <NA>
## 1934 <NA>
## 2013 <NA>
## 2017 <NA>
## 2049 <NA>
## 2102 <NA>
## 2103 <NA>
## 2111 <NA>
## 2137 <NA>
## 2193 <NA>
## 2265 <NA>
## Salary Location
## 25 $110K - $130K (Employer est.) California
## 28 $75K - $96K (Employer est.) California
## 83 Michigan
## 89 United States
## 91 Texas
## 103 Texas
## 121 $51K - $82K (Employer est.) Florida
## 131 $100K - $150K (Employer est.) Virginia
## 212 $170K - $230K (Employer est.) California
## 220 $160K - $220K (Employer est.) California
## 224 United States
## 226 $226K - $281K (Employer est.) United States
## 236 $65K - $112K (Glassdoor est.) Redstone Arsenal
## 266 $100K - $170K (Employer est.) California
## 301 $116K - $157K (Employer est.) United States
## 305 $67K - $105K (Employer est.) California
## 345 New Jersey
## 380 $114K - $124K (Employer est.) United States
## 384 United States
## 419 Pennsylvania
## 422 Texas
## 428 $95K - $150K (Employer est.) California
## 463 Georgia
## 470 New Jersey
## 489 $35.00 - $38.00 Per Hour (Employer est.) Texas
## 520 United States
## 558 United States
## 571 $86K - $138K (Employer est.) United States
## 579 Texas
## 624 $70K (Employer est.) United States
## 635 $61K - $85K (Employer est.) Manhattan
## 672 United States
## 687 California
## 692 $65K - $101K (Glassdoor est.) Virgin Island
## 696 Illinois
## 705 $132K - $217K (Employer est.) United States
## 726 $83K - $95K (Employer est.) Point Loma
## 787 Indiana
## 806 $60K - $80K (Employer est.) Colorado
## 808 $125K - $138K (Employer est.) United States
## 810 North Carolina
## 817 $100K - $120K (Employer est.) United States
## 820 $271K - $337K (Employer est.) United States
## 823 $112K - $218K (Employer est.) United States
## 847 $80K - $120K (Glassdoor est.)
## 879 $112K - $218K (Employer est.) United States
## 1863 Pennsylvania
## 1866 Full-time Remote in Michigan
## 1896 $100,000 - $150,000 a year Virginia
## 1927 $75,000 - $96,000 a year California
## 1930 Remote in Louisiana
## 1934 $160,000 - $220,000 a year Hybrid remote in California
## 2013 Full-time Hybrid remote in California
## 2017 $178,200 a year Remote in New York State
## 2049 $95,000 - $150,000 a year Remote in California
## 2102 $110,000 - $130,000 a year California
## 2103 Full-time Hybrid remote
## 2111 $100,000 - $170,000 a year Remote in California
## 2137 Pennsylvania
## 2193 $114,000 - $145,000 a year Remote in Virginia
## 2265 $110,000 - $130,000 a year California
## City State
## 25 California <NA>
## 28 California <NA>
## 83 Michigan <NA>
## 89 United States <NA>
## 91 Texas <NA>
## 103 Texas <NA>
## 121 Florida <NA>
## 131 Virginia <NA>
## 212 California <NA>
## 220 California <NA>
## 224 United States <NA>
## 226 United States <NA>
## 236 Redstone Arsenal <NA>
## 266 California <NA>
## 301 United States <NA>
## 305 California <NA>
## 345 New Jersey <NA>
## 380 United States <NA>
## 384 United States <NA>
## 419 Pennsylvania <NA>
## 422 Texas <NA>
## 428 California <NA>
## 463 Georgia <NA>
## 470 New Jersey <NA>
## 489 Texas <NA>
## 520 United States <NA>
## 558 United States <NA>
## 571 United States <NA>
## 579 Texas <NA>
## 624 United States <NA>
## 635 Manhattan <NA>
## 672 United States <NA>
## 687 California <NA>
## 692 Virgin Island <NA>
## 696 Illinois <NA>
## 705 United States <NA>
## 726 Point Loma <NA>
## 787 Indiana <NA>
## 806 Colorado <NA>
## 808 United States <NA>
## 810 North Carolina <NA>
## 817 United States <NA>
## 820 United States <NA>
## 823 United States <NA>
## 847 <NA>
## 879 United States <NA>
## 1863 Pennsylvania <NA>
## 1866 Remote in Michigan <NA>
## 1896 Virginia <NA>
## 1927 California <NA>
## 1930 Remote in Louisiana <NA>
## 1934 Hybrid remote in California <NA>
## 2013 Hybrid remote in California <NA>
## 2017 Remote in New York State <NA>
## 2049 Remote in California <NA>
## 2102 California <NA>
## 2103 Hybrid remote <NA>
## 2111 Remote in California <NA>
## 2137 Pennsylvania <NA>
## 2193 Remote in Virginia <NA>
## 2265 California <NA>
## Description
## 25 Bachelor’s degree in Mathematics, Statistics, or a related field. Develop processes and tools to monitor and analyze model performance and data accuracy.…
## 28 Master's Degree in science related field and 2 years of experience or equivalent combination of education and experience.…
## 83 Summary: Under direct supervision and following standard procedures, applies advanced analytical techniques to complex data sets to develop distinctive……
## 89 Bachelor’s degree is required, preferably in Finance, Business Administration, Computer Science, or another related field. Analyze and report data trends (20%).…
## 91 STEM Degree holder with 2+ years of Data Science work experience (background in energy or manufacturing preferred).…
## 103 Master's degree in computer science, mathematics, physics, applied science, engineering or similar disciplines with demonstrated research capability with 5+……
## 121 The Defense Mission and Health Solutions team is seeking a Data Scientist to join the FPS2 program to fulfill a high-visibility and critical role to support……
## 131 Works on complex technical projects or business issues requiring state of the art technical or industry knowledge.…
## 212 Help the cross-functional engineering and data science teams prioritize work and make technical decisions based on measured real-world value.…
## 220 Utilize data analysis and modeling techniques to gain deep insights into Visitor and Linker content and behavior, thereby identifying opportunities for……
## 224 Document results of analytic efforts for both technical and senior level decision makers. Strong background in Data fusion, statistical analysis, advanced……
## 226 Continuous Learning: Stay updated with the latest trends and advancements in data science and platform engineering, bringing innovation to the team's……
## 236 In this role, you will construct data analytical infrastructure, data engineering, data mining, exploratory analysis, predictive analysis, and statistical……
## 266 Advanced degree in data science, statistics, computer science, or related field. Brigit is committed to providing equal employment opportunities for all……
## 301 Assist technical and non-technical end users in how to leverage and interpret the analysis. Proficient in communicating effectively with both technical and……
## 305 Design and develop advanced Tableau reports with appropriate KPIs and visualizations from our data warehouse using SQL.…
## 345 Develop, implement, and apply state-of-the-art algorithms to address key business problems and drive the implementation of innovative statistical methods in……
## 380 We are currently seeking a Data Scientist to develop advanced analytics in support of an Intelligence Community client, by providing data science and machine……
## 384 Summarize and Communicate analysis results to senior leaders and technical SME's. The Senior Data Analyst SME will perform data analyses using a wide variety of……
## 419 IT Consulting Services Inc seeks Data Scientist for Fairless, PA office to apply algorithms to models predict outcomes of interest such as health,……
## 422 Leverage large sets of structured and unstructured data to develop tools and techniques for robust data analysis. HOW A DATA SCIENTIST WILL MAKE AN IMPACT.…
## 428 A bachelor's degree in computer science, engineering, or a related field. Communicate effectively with both technical and non-technical stakeholders,……
## 463 You will perform analytical deep dives, develop and analyze experiments, build predictive models, and make recommendations that inform our product roadmap.…
## 470 Provides comprehensive programming leadership and support to clinical project teams and vendors, including deployment of programming strategies, standards,……
## 489 Use advanced database & technical skills for integration to other systems, data load tools & data collection apps. Compensation is a competitive hourly wage.…
## 520 Honeywell is hiring Recent Graduates from bachelor’s and master’s degree programs to join our technology teams. Excellent oral and written communication skills.…
## 558 An ideal candidate will be comfortable working on analysis in ambiguous conditions, enjoy working across disparate questions and data sets, and identify……
## 571 Experience in Business objects or similar tools. Hands-on modeling and design experience. Analyze monthly snapshot data available in ERIS Sybase IQ database to……
## 579 Experience with a wide variety of data science tools ranging from Data Science platforms to specialized tools (Gurobi, Local Solver, CPLex).…
## 624 Blue Cross Blue Shield of Arizona (BCBSAZ) is urgently hiring entry, mid, and senior-level Analytics and Data Science Analysts to provide strategic analysis……
## 635 Conduct research projects with internal stakeholders and external partners; develop and apply post-street improvement project evaluation processes; develop……
## 672 You will spend lots of time doing exploratory data analysis, as well as collaborating with teammates to deliver top-notch forecasting models and optimization……
## 687 Experience working with engineering teams generalizing and integrating statistical models into software applications at scale, using standard engineering……
## 692 4+ years of experience in applying Statistics along with end-to-end ML engineering (design, development & implementation of end-to-end AI/ML models).…
## 696 Substantial experience of data analysis, having good knowledge of relevant reporting and analytical tools (i.e. data visualization and visual data exploration……
## 705 You possess a Master’s or PhD in Statistics, Data Science, Computer Science or other quantitative field. You have strong problem-solving skills that can be……
## 726 Ability to express technical concepts to a non-technical audienceAbility to collaborate with a broad audience to help simply answer complex questions using data……
## 787 Bachelor’s degree in Data Science, Computer Science, Information Systems, or a related field. Microsoft Certified: Data Analyst Associate (Power BI……
## 806 Advanced degree in analytics/mathematics/statistics/data science/computer science/economics or related field. Support projects to build causal inference models ……
## 808 Solve complex business problems using data science approaches to modeling and analytics in order to aid business leaders from groups such as sales, marketing,……
## 810 Model Development: Design, develop, and implement machine learning models and algorithms to solve complex business problems, improve product functionality, and……
## 817 The Health Data Scientist is the team steward of health policy focused data collection, visualizations and reports; and will provide consultative guidance and……
## 820 Promote excellence and best practices across the Machine Learning team with regard to research, implementation, tooling, and system design.…
## 823 3+ years of data science related experience including, but not limited to, any of the following: training models for computer vision, recommendation systems,……
## 847 To address emerging national challenges, we develop software systems that enable the application of data science and artificial intelligence (AI) to a variety……
## 879 OR Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data……
## 1863 Clean and manipulate raw data using statistical software for better and more efficient health data and aid innovations in the pharmaceutical industry.
## 1866 Summary: Under direct supervision and following standard procedures, applies advanced analytical techniques to complex data sets to develop distinctive…
## 1896 Results from multiple sources using a variety of techniques, ranging from simple data aggregation via statistical analysis to complex data mining.
## 1927 As a part of the Conservation Technology team, spatial data scientists are responsible for designing and implementing various analyses (often spatial, but not…
## 1930 Contribute to a variety of complex data-related projects, including machine learning, predictive modeling, and analytics.
## 1934 Utilize data analysis and modeling techniques to gain deep insights into Visitor and Linker content and behavior, thereby identifying opportunities for…
## 2013 Experience in two or more applicable data science disciplines: statistical modeling, machine learning, data mining, time series data analysis, or data…
## 2017 (5) Developing and deploying supervised and unsupervised machine-learning models leveraging Random Forest, XGBoost, and GBM tree models, deep learning, and k…
## 2049
## 2102 Utilize data modeling and machine learning to optimize marketing campaigns, improve customer experience, and increase operational efficiency.
## 2103 Stay current with industry trends, best practices, and advancements in data visualization, DAX language, and program management techniques.
## 2111 We have access to rich, structured data that we can use to derive insights and build complex models.
## 2137 The person needs to have used advanced data mining and modeling to support/drive business decisions and then implemented the findings within a business process…
## 2193 The data to be analyzed is both clinical data and operational data.
## 2265
## Description2
## 25 data
## 28
## 83 data
## 89 data
## 91 Data
## 103
## 121 Data
## 131
## 212 data
## 220 data
## 224 Data
## 226 data
## 236 data
## 266 data
## 301
## 305 data
## 345
## 380 Data
## 384 Data
## 419 Data
## 422 data
## 428
## 463
## 470
## 489 data
## 520
## 558 data
## 571 data
## 579 data
## 624 Data
## 635
## 672 data
## 687
## 692
## 696 data
## 705 Data
## 726
## 787 Data
## 806
## 808 data
## 810
## 817 Data
## 820
## 823 data
## 847 data
## 879 Data
## 1863
## 1866
## 1896
## 1927
## 1930 2+ years of relevant experience.
## 1934
## 2013
## 2017
## 2049
## 2102
## 2103
## 2111 Help our existing engineering and business teams track…
## 2137
## 2193 Be part of a project team influencing the future of clinical data management by helping us…
## 2265
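Many of the `City` values printed above are really full state names, and the assignments that follow handle them one value at a time. As an alternative, base R ships the vectors `state.name` and `state.abb` (in the `datasets` package), which can serve as a lookup table. The sketch below is only an assumption about how that could look, run on hypothetical values rather than on `data1`. Places that are not state names (for example "Manhattan" or "Redstone Arsenal") would still need a manual rule.

```r
# Hypothetical City/State values mimicking rows with a missing State
toy <- data.frame(
  City  = c("California", "Texas", "Manhattan", "McLean"),
  State = c(NA, NA, NA, "VA"),
  stringsAsFactors = FALSE
)

# match() returns the position of each City in state.name, or NA if absent
idx <- match(toy$City, state.name)

# Fill State only where it is missing and the City is a known state name
fill_me <- is.na(toy$State) & !is.na(idx)
toy$State[fill_me] <- state.abb[idx[fill_me]]

print(toy)
```

The lookup leaves "Manhattan" untouched, which keeps the cases needing a hand-written rule visible.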
# Replace "State" with "CA" when "City" is "California"
data1$State[data1$City == "California"] <- "CA"
# Replace "State" with "MI" when "City" is "Michigan"
data1$State[data1$City == "Michigan"] <- "MI"
# Replace "State" with "TX" when "City" is "Texas"
data1$State[data1$City == "Texas"] <- "TX"
# Replace "State" with "FL" when "City" is "Florida"
data1$State[data1$City == "Florida"] <- "FL"
# Replace "State" with "VA" when "City" is "Virginia"
data1$State[data1$City == "Virginia"] <- "VA"
# Replace "State" with "AL" when "City" is "Redstone Arsenal"
data1$State[data1$City == "Redstone Arsenal"] <- "AL"
# Replace "State" with "RE" (remote) when "City" is "United States"
data1$State[data1$City == "United States"] <- "RE"
# Replace "State" with "NJ" when "City" is "New Jersey"
data1$State[data1$City == "New Jersey"] <- "NJ"
# Replace "State" with "AK" when "City" is "Anchorage"
data1$State[data1$City == "Anchorage"] <- "AK"
# Replace "State" with "PA" when "City" is "Pennsylvania"
data1$State[data1$City == "Pennsylvania"] <- "PA"
# Replace "State" with "GA" when "City" is "Georgia"
data1$State[data1$City == "Georgia"] <- "GA"
# Replace "State" with "NY" when "City" is "Manhattan"
data1$State[data1$City == "Manhattan"] <- "NY"
# Replace "State" with "VI" when "City" is "Virgin Island"
data1$State[data1$City == "Virgin Island"] <- "VI"
# Replace "State" with "IL" when "City" is "Illinois"
data1$State[data1$City == "Illinois"] <- "IL"
# Replace "State" with "CA" when "City" is "Point Loma"
data1$State[data1$City == "Point Loma"] <- "CA"
# Replace "State" with "IN" when "City" is "Indiana"
data1$State[data1$City == "Indiana"] <- "IN"
# Replace "State" with "CO" when "City" is "Colorado"
data1$State[data1$City == "Colorado"] <- "CO"
# Replace "State" with "NC" when "City" is "North Carolina"
data1$State[data1$City == "North Carolina"] <- "NC"
# Set "State" to "RE" when "City" mentions remote work
data1$State <- ifelse(grepl("remote", data1$City, ignore.case = TRUE), "RE", data1$State)
# Keep only the first two characters of "State"
data1$State <- substr(data1$State, 1, 2)
# Treat a missing or empty "City" as remote
data1$City[is.na(data1$City) | data1$City == ""] <- "Remote"
data1$State[is.na(data1$State) | data1$State == ""] <- "RE"
# Replace empty strings with NA in the "Description" column
data1$Description[data1$Description == ""] <- NA
Only one row had no description, so I decided to drop it.
# Drop rows where "Description" is NA
df <- data1[complete.cases(data1$Description), ]
For the preliminary analysis, I compiled a list of skills and keywords relevant to data science positions, including “Machine learning,” “Data visualization,” “Statistics,” “SQL,” and “Deep learning,” among others. Checking how often each of these appears in the job descriptions helps identify the skills most sought after in the job market.
# List of skills
skills <- c("Machine learning", "Data visualization", "Research", "Data analysis",
"Statistics", "SQL", "Deep learning", "Programming language", "Math",
"Business intelligence", "Coding", "Business", "Analytics", "Problem solving",
"Big data", "Critical thinking", "Databases", "Natural language processing",
"Computing", "Data mining", "Exploratory data analysis", "Decision-making",
"Data warehouse", "Data wrangling")
# Initialize one indicator column per skill with 0
for (skill in skills) {
  df[[skill]] <- 0
}
# Set the indicator to 1 when the skill appears in the Description (case-insensitive)
for (i in seq_len(nrow(df))) {
  for (skill in skills) {
    if (grepl(skill, df$Description[i], ignore.case = TRUE)) {
      df[i, skill] <- 1
    }
  }
}
# Copy the result; the skill columns are already populated, so no second pass is needed
new_df <- df
# Create a new data frame with the desired columns and counts
skill_counts <- new_df %>%
  select(all_of(skills)) %>%
  summarise(across(everything(), sum))
# Reshape the data to make it tidy
skill_counts <- skill_counts %>%
gather(key = "Skill", value = "Number of Jobs")
# Print the resulting table
print(skill_counts)
## Skill Number of Jobs
## 1 Machine learning 117
## 2 Data visualization 29
## 3 Research 88
## 4 Data analysis 63
## 5 Statistics 148
## 6 SQL 17
## 7 Deep learning 10
## 8 Programming language 10
## 9 Math 132
## 10 Business intelligence 7
## 11 Coding 6
## 12 Business 117
## 13 Analytics 120
## 14 Problem solving 2
## 15 Big data 8
## 16 Critical thinking 2
## 17 Databases 8
## 18 Natural language processing 5
## 19 Computing 3
## 20 Data mining 23
## 21 Exploratory data analysis 4
## 22 Decision-making 2
## 23 Data warehouse 2
## 24 Data wrangling 0
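The nested loops above can also be written without explicit row iteration. The following is a minimal sketch in base R, assuming a toy data frame (`toy_df`) and a shortened skill list (`toy_skills`); both are hypothetical stand-ins and not the scraped data.

```r
# Toy stand-in for the scraped listings (hypothetical data, not the real set)
toy_df <- data.frame(
  Description = c("Machine learning and SQL required",
                  "Strong statistics and math background",
                  "SQL, data mining, statistics"),
  stringsAsFactors = FALSE
)
toy_skills <- c("Machine learning", "SQL", "Statistics", "Math", "Data mining")

# One logical column per skill: TRUE when the skill string occurs in the description.
# grepl() is vectorised over the descriptions, so no row loop is needed.
flags <- sapply(toy_skills, function(skill) {
  grepl(skill, toy_df$Description, ignore.case = TRUE)
})

# Column sums give the number of listings that mention each skill
toy_counts <- data.frame(
  Skill = toy_skills,
  `Number of Jobs` = unname(colSums(flags)),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(toy_counts)
```

On the real data, the same idea would apply with `df$Description` and the full `skills` vector, and should give the same counts as the loops because it uses the same `grepl()` call.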
Top 15 skills
# Load the necessary library
library(dplyr)
# Create a new dataframe with the desired columns and counts
skill_counts <- new_df %>%
  select(all_of(skills)) %>%
  summarise(across(everything(), sum))
# Reshaping the data to make it tidy
skill_counts <- skill_counts %>%
gather(key = "Skill", value = "Number of Jobs")
# Arranging the skills in descending order
top_skills <- skill_counts %>%
arrange(desc(`Number of Jobs`))
# Selecting the top 15 skills
top_15_skills <- top_skills[1:15, ]
# Printing the resulting table
print(top_15_skills)
## Skill Number of Jobs
## 1 Statistics 148
## 2 Math 132
## 3 Analytics 120
## 4 Machine learning 117
## 5 Business 117
## 6 Research 88
## 7 Data analysis 63
## 8 Data visualization 29
## 9 Data mining 23
## 10 SQL 17
## 11 Deep learning 10
## 12 Programming language 10
## 13 Big data 8
## 14 Databases 8
## 15 Business intelligence 7
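One caveat when reading these counts: the matching is by substring, so "Math" also matches "mathematics", and "Business" also matches every listing that mentions "Business intelligence". The sketch below illustrates the difference on a few made-up sentences (`toy_text` is hypothetical). It shows one possible stricter pattern and is not the method used for the tables above.

```r
# Made-up sentences for illustration only
toy_text <- c("Degree in mathematics or statistics",
              "Strong math skills",
              "Business intelligence tooling",
              "Understands the business")

# Plain substring match, as used for the tables above
loose_math <- grepl("Math", toy_text, ignore.case = TRUE)
# Whole-word match: \\b marks a word boundary
strict_math <- grepl("\\bmath\\b", toy_text, ignore.case = TRUE, perl = TRUE)

loose_business <- grepl("Business", toy_text, ignore.case = TRUE)
# Negative lookahead: skip "business" when it is followed by "intelligence"
strict_business <- grepl("\\bbusiness\\b(?!\\s+intelligence)", toy_text,
                         ignore.case = TRUE, perl = TRUE)

sum(loose_math)       # 2: "mathematics" and "math"
sum(strict_math)      # 1: only the standalone word
sum(loose_business)   # 2
sum(strict_business)  # 1
```

Whether the stricter count is preferable depends on the question: treating "mathematics" as evidence of the skill "Math" is arguably correct, while counting "Business intelligence" toward "Business" double-counts the same phrase.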
Narrowing the ranking further gives the ten most frequently mentioned skills in this data set.
# Creating a new data frame with the desired columns and counts
skill_counts <- new_df %>%
  select(all_of(skills)) %>%
  summarise(across(everything(), sum))
# Reshaping the data to make it tidy
skill_counts <- skill_counts %>%
gather(key = "Skill", value = "Number of Jobs")
# Arranging the skills in descending order
top_skills <- skill_counts %>%
arrange(desc(`Number of Jobs`))
# Selecting the top 10 skills
top_10_skills <- top_skills[1:10, ]
# Print the resulting table
print(top_10_skills)
## Skill Number of Jobs
## 1 Statistics 148
## 2 Math 132
## 3 Analytics 120
## 4 Machine learning 117
## 5 Business 117
## 6 Research 88
## 7 Data analysis 63
## 8 Data visualization 29
## 9 Data mining 23
## 10 SQL 17
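The count, sort, and truncate steps are repeated for each table and again for each state subset. A small helper could wrap them once. The sketch below uses base R to stay dependency-free; `top_n_skills()` and the toy data frame are hypothetical names introduced for illustration.

```r
# Hypothetical helper: rank 0/1 skill indicator columns by how many rows are flagged
top_n_skills <- function(data, skill_cols, n = 10) {
  counts <- colSums(data[, skill_cols, drop = FALSE])
  ranked <- head(sort(counts, decreasing = TRUE), n)
  data.frame(Skill = names(ranked),
             `Number of Jobs` = unname(ranked),
             check.names = FALSE,
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Toy data with the same shape as new_df: a State column plus indicator columns
toy <- data.frame(State = c("NY", "NY", "RE", "RE"), stringsAsFactors = FALSE)
toy[["SQL"]] <- c(1, 0, 1, 1)
toy[["Math"]] <- c(1, 1, 0, 0)
toy[["Machine learning"]] <- c(0, 0, 1, 0)
toy_cols <- c("SQL", "Math", "Machine learning")

top_all <- top_n_skills(toy, toy_cols, n = 2)                      # all rows
top_re  <- top_n_skills(toy[toy$State == "RE", ], toy_cols, n = 2) # one state
print(top_all)
print(top_re)
```

With the real data, a call such as `top_n_skills(new_df, skills, 10)` would be the analogue of the table above, and passing a subset of `new_df` would give the per-state rankings.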
# Creating a horizontal bar plot
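# reorder() sorts the bars by count so the ranking is readable from the plot
ggplot(top_10_skills, aes(y = reorder(Skill, `Number of Jobs`), x = `Number of Jobs`)) +
  geom_col(fill = "skyblue") +
  labs(title = "Top 10 Skills in Job Listings",
       y = "Skill",
       x = "Number of Jobs") +
  theme_minimal()
# Filter the dataset to select rows with "RE" in the "State" column
# ("RE" appears to mark remote listings; the concluding discussion treats them as remote positions)
re_data <- subset(new_df, State == "RE")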
# Create a new dataframe with the desired columns and counts
re_skill_counts <- re_data %>%
  select(all_of(skills)) %>%
  summarise(across(everything(), sum))
# Reshape the data to make it tidy
re_skill_counts <- re_skill_counts %>%
gather(key = "Skill", value = "Number of Jobs")
# Arrange the skills in descending order
re_top_skills <- re_skill_counts %>%
arrange(desc(`Number of Jobs`))
# Select the top 10 skills
re_top_10_skills <- re_top_skills[1:10, ]
# Create a horizontal bar plot for the top 10 skills
library(ggplot2)
ggplot(re_top_10_skills, aes(x = `Number of Jobs`, y = reorder(Skill, `Number of Jobs`))) +
  geom_col(fill = "skyblue") +
  labs(title = "Top 10 Skills in Job Listings (State: RE)",
       x = "Number of Jobs",
       y = "Skill") +
  theme_minimal()
# Filter the dataset to select rows with "NY" in the "State" column
ny_data <- subset(new_df, State == "NY")
# Create a new dataframe with the desired columns and counts
ny_skill_counts <- ny_data %>%
  select(all_of(skills)) %>%
  summarise(across(everything(), sum))
# Reshape the data to make it tidy
ny_skill_counts <- ny_skill_counts %>%
gather(key = "Skill", value = "Number of Jobs")
# Arrange the skills in descending order
ny_top_skills <- ny_skill_counts %>%
arrange(desc(`Number of Jobs`))
# Select the top 10 skills
ny_top_10_skills <- ny_top_skills[1:10, ]
# Create a horizontal bar plot for the top 10 skills
library(ggplot2)
ggplot(ny_top_10_skills, aes(x = `Number of Jobs`, y = reorder(Skill, `Number of Jobs`))) +
  geom_col(fill = "darkblue") +
  labs(title = "Top 10 Skills in Job Listings (State: NY)",
       x = "Number of Jobs",
       y = "Skill") +
  theme_minimal()
To identify the most sought-after data science skills in the job market, this analysis combined web scraping with a keyword search of job descriptions. Collecting the data was the main difficulty: several websites impose security restrictions, and there is no standardized source of job-listing data. Octoparse proved a useful tool for collecting listings from Indeed and Glassdoor.
Among the listings collected for the United States, the most frequently mentioned skills are “Statistics” (148 listings), “Math” (132), “Analytics” (120), “Machine learning” (117), and “Business” (117). The skill rankings for New York State and for remote positions differ from this overall picture. In remote jobs, Math, Machine learning, and Research appear to predominate. The project could be extended, for example by adding salary indicators. Its main limitations were website security measures, which restricted the available sources, and duplicate job postings across similar websites.
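The duplicate-posting limitation could be reduced before counting skills. The following is a minimal sketch, assuming each listing has a `Title` and a `Company` column; `Company`, `Source`, and the toy rows are assumptions made for illustration, and the real column names may differ.

```r
# Toy listings (hypothetical): the first two rows are the same job on two sites
toy_jobs <- data.frame(
  Title   = c("Data Scientist", "data scientist ", "Lead Data Scientist"),
  Company = c("Acme", "ACME", "Acme"),
  Source  = c("Indeed", "Glassdoor", "Indeed"),
  stringsAsFactors = FALSE
)

# Normalise case and surrounding whitespace so near-identical postings compare equal
job_key <- paste(tolower(trimws(toy_jobs$Title)),
                 tolower(trimws(toy_jobs$Company)),
                 sep = "|")

# Keep the first occurrence of each title/company pair
deduped <- toy_jobs[!duplicated(job_key), ]
nrow(deduped)  # 2
```

Matching on title and company alone is a judgment call: it would also merge genuinely distinct openings that share both, so a location or posting-date column, if available, could be added to the key.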