Introduction:
In this weeks assignment I aim to create a table using SQL workbench and load it to the class Azure server. Then I want to utilize that data stored in the class SQL azure database using RStudio.
Creating an azure MySQL Database, Installing MySQL and MySQL Workbench using the resources provided in the class’s content.
Connecting MySQL Workbench to class Azure Database.
## The following script was provided via email and run to create a hash to use as my password for the connections made from SQL workbench to Azure.
# Load necessary libraries
library(digest) # For generating the hash
library(clipr) # For copying to clipboard
## Welcome to clipr. See ?write_clip for advisories on writing to the clipboard in R.
# The string to hash
input_string <- '24610585'
# Generate the hash using SHA-256
hash_hex <- digest(input_string, algo = "sha256", serialize = FALSE)
# Declare a variable with only a quarter of the resulting hash
quarter_hash <- substr(hash_hex, 1, nchar(hash_hex) %/% 4)
## First create connection using the server name, port, and my user and password found above.
## Then connect to database using the URL, cunydata607sql.mysql.database.azure.com, and the user and password from above.
Creating in SQL, on both the class azure database and my own azure database, the table of 6x6 for 1-5 movie ratings by five of my friends to six movies.
Connecting R to the class Azure SQL database.
## Loading required packages for connection
library("RMySQL")
## Loading required package: DBI
## Connecting to the SQL database
mydb <- dbConnect(MySQL(),user= 'brandon.chung85',password= '704c7720437dbcfb',dbname= 'brandon.chung85',host= 'cunydata607sql.mysql.database.azure.com')
dbListTables(mydb)
## [1] "movieratings"
## Importing data table from SQL to R as a dataframe
data <- dbReadTable(mydb, "movieratings")
## Previewing data
head(data)
## Movie.Name Claudia Raj Ryan Kelly Mark
## 1 A Quiet Place: Day One 3 3 2 5 2
## 2 Avengers End Game 5 5 5 4 3
## 3 Avengers Infinity War 5 5 4 4 3
## 4 Deadpool and Wolverine 4 3 5 2 4
## 5 Joker 4 4 3 4 5
## 6 The Arrival 5 2 3 3 3
summary(data)
## Movie.Name Claudia Raj Ryan
## Length:6 Min. :3.000 Min. :2.000 Min. :2.000
## Class :character 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.000
## Mode :character Median :4.500 Median :3.500 Median :3.500
## Mean :4.333 Mean :3.667 Mean :3.667
## 3rd Qu.:5.000 3rd Qu.:4.750 3rd Qu.:4.750
## Max. :5.000 Max. :5.000 Max. :5.000
## Kelly Mark
## Min. :2.000 Min. :2.000
## 1st Qu.:3.250 1st Qu.:3.000
## Median :4.000 Median :3.000
## Mean :3.667 Mean :3.333
## 3rd Qu.:4.000 3rd Qu.:3.750
## Max. :5.000 Max. :5.000
## A quick summary of the dataframe shows us some descriptive statistics of the data. Claudia has the highest median score out of the group, she may be a more avid movie lover than the others. Other call outs can include that all participants scored a 5 out of 5 rating on at least one of the movies listed, and non of the participants rated any movie below a 2 score.
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- data %>%
mutate(total_score = Claudia + Raj + Ryan + Kelly + Mark)
ggplot (data = data, aes (x = Movie.Name, y = total_score)) +
geom_point() +
scale_x_discrete(guide = guide_axis(angle = 90))
## Here we can see each movie total score (a sum of the scores given by each participant). Avengers end game ranked the highest and A Quiet Place: Day One ranked the lowest in the group.