Our team decided to use Slack for written communication, Zoom for team meetings, and Asana for assigning tasks to individuals and setting deadlines.
Code and documents are shared through two GitHub repositories. Currently, https://github.com/baruab/msdsrepo/tree/main/Project_3_607 holds the data sources for the project, and https://github.com/baruab/Team2_Project_3_607 is used for team collaboration. We may shift to a single GitHub repository, but for the moment collaboration takes place across both.
Our data source is the ‘2018 Kaggle Machine Learning & Data Science Survey’, which we retrieved from Kaggle.com (https://www.kaggle.com/kaggle/kaggle-survey-2018). It contains three separate datasets: SurveySchema.csv, freeFormResponses.csv, and multipleChoiceResponses.csv.
After some time exploring the data, the team decided to focus our analysis on the multipleChoiceResponses.csv file.
We stored the data in the GitHub repository so that it can be accessed easily in R from any machine. To load the data, run the following lines of code:
## survey schema
survey.schema <- read.csv("https://raw.githubusercontent.com/baruab/msdsrepo/main/Project_3_607/kaggle-survey-2018/SurveySchema.csv")
## freeform responses
free.form <- read.csv("https://raw.githubusercontent.com/baruab/msdsrepo/main/Project_3_607/kaggle-survey-2018/freeFormResponses.csv")
## multiple choice
multiple.choice <- read.csv("https://raw.githubusercontent.com/baruab/msdsrepo/main/Project_3_607/kaggle-survey-2018/multipleChoiceResponses.csv")
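The three read.csv calls above share the same base URL, so the loading step could also be written with a small helper. This is only a sketch: build.url, load.survey, and the survey.files vector are hypothetical names introduced here for illustration, and stringsAsFactors = FALSE is an assumed preference (it keeps text columns as character on older R versions), not something the original code sets.

```r
# Base URL of the raw data folder, taken from the read.csv calls above
base.url <- "https://raw.githubusercontent.com/baruab/msdsrepo/main/Project_3_607/kaggle-survey-2018"

# File names of the three datasets, keyed by the object names used above
survey.files <- c(
  survey.schema   = "SurveySchema.csv",
  free.form       = "freeFormResponses.csv",
  multiple.choice = "multipleChoiceResponses.csv"
)

# Hypothetical helper: build the full raw URL for one file
build.url <- function(file.name, base = base.url) {
  paste(base, file.name, sep = "/")
}

# Named character vector of the three URLs
survey.urls <- vapply(survey.files, build.url, character(1))

# Hypothetical helper: load one dataset by name (requires network access)
load.survey <- function(name) {
  read.csv(survey.urls[[name]], stringsAsFactors = FALSE)
}

# Example call, commented out because it downloads the file:
# multiple.choice <- load.survey("multiple.choice")
```

Keeping the base URL in one place means only a single line has to change if the data moves to a different repository.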
Project 3: Entity Relationship Diagram