Part 1 of Project 3

Members

Nfn Tenzin Dakar
Nwe Oo Mon (Nina)
Crystal Quezada

Description

Our group, DATA DOMINATION, will use Zoom and text messages for communication. For code sharing and documentation, we’re using RStudio Cloud with R Markdown published on RPubs. All files, including data and documentation, are stored in a GitHub repo. We’ve created an entity-relationship diagram (ERD) using Mysql DataBase and identified our data sources, which will be loaded via CSV files from GitHub.

Collaboration Tools

Our team leverages several tools to streamline collaboration and project development:

• RStudio Cloud (rstudio.cloud): This platform facilitates collaborative code development, allowing all team members to view, edit, and share code seamlessly in real-time.

• R Markdown in RStudio Cloud: We use R Markdown to document our project, and publish the results through RPubs (rpubs.com), making our work easily accessible.

• MySQL : This tool was employed to design the entity-relationship diagram (ERD) for the project.

• GitHub Repository : All source CSV files and R Markdown (RMD) files are stored centrally on GitHub, allowing the team to access and manage files efficiently.

• Communication: Our discussions and updates happen over Zoom for meetings and text messages for daily communication, ensuring smooth coordination.

Sources

Kaggle ML & DS Survey: Our dataset is job listings based on software, which can be found on Kaggle, it is also stored in our collaborative GitHub repository.

The dataset presents job market data related to various programming and data tools, comparing their demand across different job platforms: LinkedIn, Indeed, SimplyHired, Monster, etc. The table shows the number of job postings mentioning each keyword (e.g., Python, R, SQL) on different platforms and provides the percentage of the total postings each keyword represents on those platforms.

Entity Relationship Diagram

• Our diagram can be found on our shared GitHub Repository.

Job_listing_ERD In our ER diagram, we created three tables: Keyword, JobSource, and Metrics.

The Keyword table stores the list of programming languages or technologies (e.g., Python, SQL, R), while the JobSource table holds different job platforms (e.g., LinkedIn, Indeed, Monster). The Metrics table acts as an associative entity that breaks down the many-to-many relationship between Keyword and Job Source into two one-to-many relationships.

One-to-many between Keyword and Metrics:

For instance, “Python” can appear in multiple metrics entries, each representing a different job source (e.g., LinkedIn, Indeed).

One-to-many between Job Source and Metrics:

For example, LinkedIn can have metrics for various keywords (e.g., Python, SQL, R).

Metrics as the Associative Entity

Metrics has a foreign key to Keyword (keyword_id) and a foreign key to JobSource (source_id).

• Our sql codes can be found on our shared GitHub Repository.

MySQL Data Import

Data from MySQL will be imported to R in following steps:

Install and Load Necessary Packages: First, R packages DBI and RMariaDB are installed and loaded to connect and interact with the MySQL database.

Store the Database Password Securely: To avoid exposing the database password in the code, the Password will be stored in an environment file (.Renviron).

Connect to the MySQL Database: Using dbConnect(), a connection will established to the MySQL database by providing the necessary credentials (user, password, database name, host, and port).

Load Data from the Database: Once connected, SQL queries will be used to fetch data from specific tables. The result is stored in an R dataframe.

Close the Connection: After loading the data, the database will be disconnected using dbDisconnect() to free up resources.

Part 1 of Project 3

Team - Data Domination

Nfn Tenzin Dakar, Nwe Oo Mon, Crystal Quezada

2024-10-07

Table of Contents

Members

Description

Collaboration Tools

Sources

Entity Relationship Diagram

MySQL Data Import

Data Clean Up and Transformation

Data Analysis

Conclusion