#Introduction
I created a Google Forms survey and collected a small set of responses. I uploaded the resulting CSV file to GitHub, cleaned up the data, and then started on this assignment.
library(RMySQL)
## Loading required package: DBI
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ stringr 1.4.0
## ✔ tidyr 1.2.0 ✔ forcats 0.5.2
## ✔ readr 2.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
#Converting CSV file to R
I used MySQL Workbench to export the survey data as a CSV file, then uploaded that file to GitHub so it could be read directly into R.
mysql <- read.csv("https://raw.githubusercontent.com/AldataSci/Assignment2-607-/main/Homework2%5B607%5D.csv",header=TRUE,sep=",")
head(mysql)
## movie Fri_name Stars
## 1 Eternals Ahmed 4
## 2 Shang-Chi Ahmed 4
## 3 Spider-Man No Way Home Ahmed 5
## 4 Dune Ahmed 3
## 5 Venom Ahmed 1
## 6 No Time to Die Ahmed NULL
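The NULL in the last row of the preview is a literal string, which is why the Stars column is read as character rather than integer. As a minimal sketch, assuming a small inline sample (the `sample_csv` text and the `ratings` name are made up for illustration and are not the survey data), the `na.strings` argument of `read.csv` could mark those entries as missing at import time:

```r
# Hypothetical inline sample that mimics the survey layout (not the real data)
sample_csv <- "movie,Fri_name,Stars
Eternals,Ahmed,4
Dune,Ahmed,3
No Time to Die,Ahmed,NULL"

# na.strings tells read.csv to treat the literal text "NULL" as NA
ratings <- read.csv(text = sample_csv, header = TRUE, na.strings = "NULL")

str(ratings$Stars)         # Stars is parsed as an integer column
sum(is.na(ratings$Stars))  # number of missing ratings in the sample
```

With this approach the later conversion with `as.integer()` would not be needed, although the missing values would then have to be handled as NA instead of being filtered by the string "NULL".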
#Figuring out the average
To compute the average rating for each movie, I filtered out the NULL values and then converted the remaining Stars values from character to integer.
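avg_mov <- mysql %>%
  group_by(movie) %>%
  # drop rows where the rating is the literal string "NULL"
  filter(Stars != "NULL") %>%
  # Stars is character, so convert to integer before averaging
  summarise(avg = mean(as.integer(Stars)))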
avg_mov
## # A tibble: 6 × 2
## movie avg
## <chr> <dbl>
## 1 "Dune" 3
## 2 "Eternals" 2.8
## 3 "No Time to Die" 3
## 4 "Shang-Chi" 3.6
## 5 "Spider-Man No Way Home" 4.8
## 6 "Venom " 2.2
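The quotes around the movie names in this output, and the trailing space in "Venom ", suggest that at least some names carry extra whitespace. Here is a rough sketch in base R of how the names could be trimmed before averaging. The `ratings` data frame below is a hypothetical sample, not the survey data, and the mix of "Venom " and "Venom" is an assumption made only to show the effect:

```r
# Hypothetical sample rows (not the survey data); note the trailing space in "Venom "
ratings <- data.frame(
  movie = c("Venom ", "Venom", "Dune", "Dune"),
  Stars = c("1", "3", "4", "NULL"),
  stringsAsFactors = FALSE
)

# Remove leading/trailing whitespace so "Venom " and "Venom" form one group
ratings$movie <- trimws(ratings$movie)

# Turn the literal "NULL" into NA before converting to integer
ratings$Stars[ratings$Stars == "NULL"] <- NA
ratings$Stars <- as.integer(ratings$Stars)

# Average per movie; the formula interface of aggregate drops NA rows by default
avg_by_movie <- aggregate(Stars ~ movie, data = ratings, FUN = mean)
avg_by_movie
```

Without the `trimws()` step, the two spellings of the movie name would be averaged as separate groups.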
Once the data was cleaned, I visualized the average ratings with a bar chart.
library(ggplot2)
ggplot(data = avg_mov, aes(x = movie, y = avg, fill = movie)) +
  coord_flip() +
  geom_bar(stat = "identity")
From this survey, Spider-Man No Way Home had the highest average rating (4.8), which makes it the best option to watch.
#Movies not recommended
nul <- mysql %>%
  group_by(movie) %>%
  filter(Stars == "NULL") %>%
  count(Stars, sort = TRUE)
nul
## # A tibble: 2 × 3
## # Groups: movie [2]
## movie Stars n
## <chr> <chr> <int>
## 1 No Time to Die NULL 4
## 2 Dune NULL 1
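The same kind of count can be done on NA values if the NULL strings have already been converted to missing values. This is only a sketch in base R under that assumption. The `ratings` data frame and the `missing` column are hypothetical names with made-up rows, not the survey data:

```r
# Hypothetical sample (not the survey data); NA marks a movie that was not rated
ratings <- data.frame(
  movie = c("No Time to Die", "No Time to Die", "Dune", "Dune", "Venom"),
  Stars = c(NA, NA, NA, 4L, 1L),
  stringsAsFactors = FALSE
)

# Flag missing ratings, then sum the flags per movie
flagged <- transform(ratings, missing = is.na(Stars))
missing_by_movie <- aggregate(missing ~ movie, data = flagged, FUN = sum)

# Keep only movies with at least one missing rating, most missing first
missing_by_movie <- missing_by_movie[missing_by_movie$missing > 0, ]
missing_by_movie <- missing_by_movie[order(-missing_by_movie$missing), ]
missing_by_movie
```

Counting on `is.na()` avoids comparing against the string "NULL", so it keeps working if the column is numeric.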
library(ggplot2)
ggplot(data = nul, aes(x = movie, y = n, fill = movie)) +
  geom_bar(stat = "identity")