PART 1 - Project 3 Description

Create a short document with the names of group members. You should briefly describe the collaboration tool(s) you’ll use as a group, including for communication, code sharing, and project documentation. You should have identified your data sources, where the data can be found, and how to load it. You should also have created at least a logical model for your normalized database and produced an Entity-Relationship (ER) diagram documenting your database design.


Project 3 Team Members

  • Sanielle Worrell
  • Vladimir Nimchenko
  • Jose Rodriguez
  • Johnny Rodriguez


Project Tools

The team is using RStudio Cloud (https://rstudio.cloud) for collaboration and code development, which allows us to view and share code within the project. We are using R Markdown within RStudio Cloud for project documentation, which we publish through RPubs (https://rpubs.com). To create the ERD, the team used Quick Database Diagrams (https://www.quickdatabasediagrams.com). Source CSV and RMD files are saved in a GitHub repo so files can be accessed centrally (https://github.com/johnnydrodriguez/data607_project3). In addition, the team communicates over Zoom and Slack.


Project 3 Data Sources

The data used to answer the project question is taken from Glassdoor via Kaggle. The data was scraped from Glassdoor and posted to Kaggle.

Source: https://www.kaggle.com/datasets/nikhilbhathi/data-scientist-salary-us-glassdoor?resource=download

Source CSV (Uncleaned for Project): https://raw.githubusercontent.com/johnnydrodriguez/data607_project3/main/glassdoor_2021.csv
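
One possible way to load the uncleaned CSV is sketched below using base R only. The URL is the one listed above; the helper name load_glassdoor is hypothetical, and the options passed to read.csv are assumptions rather than the team's settings.

```r
# URL copied from the "Source CSV" line above
csv_url <- "https://raw.githubusercontent.com/johnnydrodriguez/data607_project3/main/glassdoor_2021.csv"

# Hypothetical helper: read a CSV from a URL or a local path.
# Strings stay as character vectors and column names are kept as written in the file.
load_glassdoor <- function(source) {
  read.csv(source, stringsAsFactors = FALSE, check.names = FALSE)
}

# Usage (needs network access, so it is left commented out):
# glassdoor_raw <- load_glassdoor(csv_url)
# str(glassdoor_raw, list.len = 5)
```

The same helper also accepts a local file path, which is convenient if the CSV has already been downloaded from the repo.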


PART 2 - Which are the most valued data science skills?


Prepping the Environment: Project Packages and Libraries

# Run these once if the packages are not installed yet
#install.packages("tidyverse")
#install.packages("RMySQL")
#install.packages("ggplot2")

library(tidyverse)  # also attaches ggplot2
library(RMySQL)
library(ggplot2)
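
Since the install lines above are commented out, a fresh environment may be missing one of the packages. A minimal sketch of a check is shown below; missing_packages is a hypothetical helper, not part of the project code, and it only reports what is missing instead of installing anything.

```r
# Packages named in the install/library lines above
required <- c("tidyverse", "RMySQL", "ggplot2")

# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Install before knitting: ", paste(to_install, collapse = ", "))
}
```

Passing to_install to install.packages() would then install only what is absent.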


Connecting to the SQL database and loading data
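
The database details are not given here, so the following is only a sketch of how a connection could be opened with DBI and the RMySQL driver. The environment variable names, the default database name, and both helper functions are hypothetical and would need to be replaced with the team's own settings.

```r
# Hypothetical helper: collect connection settings from environment variables.
# Variable names and defaults are assumptions, not the team's real configuration.
mysql_settings <- function() {
  list(
    host     = Sys.getenv("P3_DB_HOST", "localhost"),
    port     = as.integer(Sys.getenv("P3_DB_PORT", "3306")),
    dbname   = Sys.getenv("P3_DB_NAME", "project3"),
    username = Sys.getenv("P3_DB_USER"),
    password = Sys.getenv("P3_DB_PASSWORD")
  )
}

# Hypothetical helper: open a connection through DBI using the RMySQL driver
connect_project_db <- function(settings = mysql_settings()) {
  do.call(DBI::dbConnect, c(list(drv = RMySQL::MySQL()), settings))
}

# Usage (needs a running MySQL server, so it is left commented out):
# con <- connect_project_db()
# DBI::dbListTables(con)
# DBI::dbDisconnect(con)
```

Reading credentials from environment variables keeps passwords out of the RMD file that is shared through the repo.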


Data clean up and transformation


Analysis



Conclusion