Create a short document with the names of group members. You should briefly describe the collaboration tool(s) you'll use as a group, including for communication, code sharing, and project documentation. You should have identified your data sources, where the data can be found, and how to load it. And you should have created at least a logical model for your normalized database, and produced an Entity-Relationship (ER) diagram documenting your database design.
Packages needed:
Communication tools:
Slack channel specifically for project 3
Zoom
In addition to Slack, we have met over Zoom to discuss the direction we want to take with the project and our immediate plans for getting it moving.
Zoom and Slack will likely be our main channels of communication as we start data collection and coding.
Code sharing:
GitHub
For this project we intend to use a GitHub repository: a personal repo with one of us added as a collaborator.
This will let us share data sources, code, and anything else that comes up during the process.
Slack
Slack will also be used for sharing links to methods, data sources, and helpful videos.
Project Documentation:
Kaggle Link 1: https://www.kaggle.com/datasets/discdiver/data-scientist-general-skills-2018-revised
Kaggle Link 2: https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023/
Google Docs where our raw csv data originated: https://docs.google.com/spreadsheets/d/1lac1H2IgCDCs9LLTQL6yb6MUPN1u4C5fJv_6YjipIaM/edit#gid=1072460513
ZipRecruiter - Data Scientist Must-Have Resume Skills and Keywords: https://www.ziprecruiter.com/career/Data-Scientist/Resume-Keywords-and-Skills
library(dplyr)  # provides glimpse()
skills <- read.csv("https://raw.githubusercontent.com/jonburns2454/Project-3-DATA607/main/ds_general_skills_revised.csv")
software <- read.csv("https://raw.githubusercontent.com/jonburns2454/Project-3-DATA607/main/Data%20Science%20Career%20Terms%20-%20ds%20software.csv")
glimpse(skills)
## Rows: 30
## Columns: 5
## $ Keyword <chr> "machine learning", "analysis", "statistics", "computer sc…
## $ LinkedIn <chr> "5,701", "5,168", "4,893", "4,517", "3,404", "2,605", "1,8…
## $ Indeed <chr> "3,439", "3,500", "2,992", "2,739", "2,344", "1,961", "1,4…
## $ SimplyHired <chr> "2,561", "2,668", "2,308", "2,093", "1,791", "1,497", "1,1…
## $ Monster <chr> "2,340", "3,306", "2,399", "1,900", "2,053", "1,815", "1,2…
glimpse(software)
## Rows: 42
## Columns: 12
## $ Keyword <chr> "Python", "R", "SQL", "Spark", "Hadoop"…
## $ LinkedIn <chr> "6,347", "4,553", "3,879", "2,169", "2,…
## $ Indeed <chr> "3,818", "3,106", "2,628", "1,551", "1,…
## $ SimplyHired <chr> "2,888", "2,393", "2,056", "1,167", "1,…
## $ Monster <chr> "2,544", "2,365", "1,841", "1,062", "1,…
## $ LinkedIn.. <chr> "74%", "53%", "45%", "25%", "25%", "23%…
## $ Indeed.. <chr> "74%", "60%", "51%", "30%", "31%", "27%…
## $ SimplyHired.. <chr> "75%", "62%", "54%", "30%", "30%", "28%…
## $ Monster.. <chr> "68%", "63%", "49%", "28%", "32%", "27%…
## $ Avg.. <chr> "73%", "60%", "50%", "29%", "30%", "26%…
## $ GlassDoor.Self.Reported...2017 <chr> "72%", "64%", "51%", "27%", "39%", "33%…
## $ Difference <chr> "1%", "-4%", "-1%", "2%", "-9%", "-7%",…
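The output above shows every count and percentage stored as `<chr>`, because the values carry thousands separators ("5,701") or percent signs ("74%"). The doubled dots in names such as `LinkedIn..` are also consistent with `read.csv()` and its default `check.names = TRUE`, which replaces characters that are not valid in R names. Before any analysis these columns would need converting to numbers. The following is only a minimal sketch in base R, not our final cleaning code: `skills_sample` is a small stand-in built from the first three values shown above, and `to_number()` is a hypothetical helper name.

```r
# Stand-in for `skills`, using the first three rows visible in the glimpse output.
skills_sample <- data.frame(
  Keyword  = c("machine learning", "analysis", "statistics"),
  LinkedIn = c("5,701", "5,168", "4,893"),
  Indeed   = c("3,439", "3,500", "2,992"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop thousands separators and percent signs,
# then convert the remaining text to numeric.
to_number <- function(x) as.numeric(gsub("[,%]", "", x))

# Convert every count column in place; Keyword stays as character.
count_cols <- c("LinkedIn", "Indeed")
skills_sample[count_cols] <- lapply(skills_sample[count_cols], to_number)

str(skills_sample)
```

The same helper should work for the percentage columns in `software`, since it strips "%" as well; negative values such as "-4%" keep their sign.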
The plan for this project is to combine a few different data sources to build a fuller picture of data science skills. The first Kaggle link points to a 2018 data set of general data science skills, with both skills and software keywords scraped from Monster, Indeed, SimplyHired, and LinkedIn. This first data set, from Jeff Hale, will support exploratory data analysis (EDA) focused specifically on data science skills. The second Kaggle link covers data science salaries and will give us a more up-to-date view of the industry. It combines experience level and job title data, which will help us examine how skills and seniority translate to salary level.
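Because the assignment asks for a normalized database, one possible shape for the skills data is a long table with one row per keyword and job site, rather than one column per site. The sketch below is an assumption about how that reshaping could look in base R, not our final design. The data frame `wide` is a stand-in that uses two rows from the `skills` output after conversion to numeric, and the column names `keyword`, `site`, and `count` are illustrative.

```r
# Stand-in for the cleaned wide table: one column per job site.
wide <- data.frame(
  Keyword  = c("machine learning", "analysis"),
  LinkedIn = c(5701, 5168),
  Indeed   = c(3439, 3500),
  stringsAsFactors = FALSE
)

sites <- c("LinkedIn", "Indeed")

# Stack the site columns: unlist() walks the columns in order, so the
# keyword vector repeats once per site and each site name repeats once per row.
long <- data.frame(
  keyword = rep(wide$Keyword, times = length(sites)),
  site    = rep(sites, each = nrow(wide)),
  count   = unlist(wide[sites], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

In this layout, adding another job site means adding rows instead of a new column, which is what makes it a reasonable candidate for a table in the ER diagram.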