Data 607 Assignment 2 SQL/R

#Introduction

I created a Google Forms survey and collected a few responses. I uploaded the resulting CSV file to GitHub, cleaned up the data, and then started on this assignment.

library(RMySQL)

## Loading required package: DBI

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──

## ✔ ggplot2 3.4.0     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.2
## ✔ readr   2.1.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

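#Converting CSV file to R

I exported the survey responses as a CSV file and worked with the data in MySQL Workbench. I then uploaded the exported data to GitHub so that R can read it directly from the raw file URL.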

mysql <- read.csv("https://raw.githubusercontent.com/AldataSci/Assignment2-607-/main/Homework2%5B607%5D.csv",header=TRUE,sep=",")
head(mysql)

##                    movie Fri_name Stars
## 1               Eternals    Ahmed     4
## 2              Shang-Chi    Ahmed     4
## 3 Spider-Man No Way Home    Ahmed     5
## 4                   Dune    Ahmed     3
## 5                 Venom     Ahmed     1
## 6         No Time to Die    Ahmed  NULL

#Figuring out the average

To compute the average rating for each movie, I filtered out the rows where Stars is "NULL" and then converted the remaining character values under Stars into integers so that the mean could be calculated.

# Drop unrated rows ("NULL"), then average the remaining star ratings per movie
avg_mov <- mysql %>%
  filter(Stars != "NULL") %>%
  group_by(movie) %>%
  summarise(avg = mean(as.integer(Stars)))
avg_mov

## # A tibble: 6 × 2
##   movie                      avg
##   <chr>                    <dbl>
## 1 "Dune"                     3  
## 2 "Eternals"                 2.8
## 3 "No Time to Die"           3  
## 4 "Shang-Chi"                3.6
## 5 "Spider-Man No Way Home"   4.8
## 6 "Venom "                   2.2
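The filtering and conversion steps above could also be handled at import time. The following is a minimal sketch in base R, assuming a small made-up CSV (`csv_text` and its rows are illustrative and are not the real survey responses): `na.strings = "NULL"` turns the "NULL" entries into `NA` so that Stars is read as an integer column, and `strip.white = TRUE` trims stray spaces such as the one after "Venom " in the output above.

```r
# Illustrative toy data standing in for the survey CSV (not the real responses)
csv_text <- "movie,Fri_name,Stars
Dune,A,3
Dune,B,NULL
Venom ,A,1
Venom ,B,3"

# Treat the literal string "NULL" as missing and trim whitespace around fields
ratings <- read.csv(text = csv_text, na.strings = "NULL", strip.white = TRUE)

# aggregate() with a formula drops rows with NA by default (na.action = na.omit)
avg_base <- aggregate(Stars ~ movie, data = ratings, FUN = mean)
avg_base
```

With this approach there is no need for a separate `filter()` or `as.integer()` step, although the dplyr pipeline used in this assignment gives the same averages.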

Once the data was cleaned, I visualized the average rating for each movie.

library(ggplot2)
ggplot(data=avg_mov, aes(x=movie,y=avg , fill=movie)) + 
  coord_flip() +
  geom_bar(stat="identity")

From this survey, Spider-Man No Way Home had the highest average rating (4.8), which makes it the best option to watch.
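One optional refinement is to order the bars by average rating so the ranking can be read straight off the chart. The sketch below rebuilds `avg_mov` from the averages printed earlier and uses base R's `reorder()` to sort the movie factor by `avg`. The plotting part is guarded with `requireNamespace()` and uses `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`; this is a suggestion rather than the chart produced in the original run.

```r
# Averages copied from the summary table above
avg_mov <- data.frame(
  movie = c("Dune", "Eternals", "No Time to Die",
            "Shang-Chi", "Spider-Man No Way Home", "Venom"),
  avg   = c(3, 2.8, 3, 3.6, 4.8, 2.2)
)

# Turn movie into a factor whose levels are sorted by average rating (ascending)
avg_mov$movie <- reorder(avg_mov$movie, avg_mov$avg)

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(avg_mov, aes(x = movie, y = avg, fill = movie)) +
    geom_col() +
    coord_flip()
  # print(p) draws the chart; after coord_flip() the highest-rated movie is at the top
}
```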

#Movies not recommended

nul <- mysql %>%
  group_by(movie) %>%
  filter(Stars=="NULL") %>%
  count(Stars,sort=TRUE)
nul

## # A tibble: 2 × 3
## # Groups:   movie [2]
##   movie          Stars     n
##   <chr>          <chr> <int>
## 1 No Time to Die NULL      4
## 2 Dune           NULL      1
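The same kind of count can be cross-checked without dplyr. This is a minimal base R sketch on made-up rows (`toy` is a hypothetical data frame, not the survey data): `tapply()` sums, per movie, how many Stars entries equal "NULL".

```r
# Hypothetical rows for illustration only
toy <- data.frame(
  movie = c("Dune", "Dune", "No Time to Die", "No Time to Die", "Eternals"),
  Stars = c("3", "NULL", "NULL", "NULL", "4"),
  stringsAsFactors = FALSE
)

# Count the "NULL" ratings per movie
null_counts <- tapply(toy$Stars == "NULL", toy$movie, sum)

# Keep only movies with at least one "NULL", most frequent first
null_counts <- sort(null_counts[null_counts > 0], decreasing = TRUE)
null_counts
```

Unlike the grouped `count()` call, this returns a named vector rather than a tibble, so it would need to be converted with `as.data.frame()` before being passed to `ggplot()`.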

library(ggplot2)
ggplot(data=nul,aes(x=movie,y=n,fill=movie)) +
  geom_bar(stat="identity")

Wilson Chau

2022-12-05