I read a CSV file into R to learn some basic information about the data I collected from my friends. I was curious about the average rating of each movie my friends had watched and which movies they had not watched. After some data cleaning, I graphed the results as bar graphs.
I exported my SQL data to a CSV file and placed the file on GitHub. I then had R read the CSV file from GitHub and stored the data in a data frame called mysql. Finally, I looked at the first few rows to make sure I had read the correct file.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
mysql <- read.csv("https://raw.githubusercontent.com/AldataSci/Assignment2-607-/main/Homework2%5B607%5D.csv",header=TRUE,sep=",")
head(mysql)
##                    movie Fri_name Stars
## 1               Eternals    Ahmed     4
## 2              Shang-Chi    Ahmed     4
## 3 Spider-Man No Way Home    Ahmed     5
## 4                   Dune    Ahmed     3
## 5                  Venom    Ahmed     1
## 6         No Time to Die    Ahmed  NULL
I was curious about the average rating of each movie my friends had watched. I first filtered out the NULL values and then converted the chr values under Stars to integers so that I could calculate the average.
avg_mov <- mysql %>%
  group_by(movie) %>%
  filter(Stars != "NULL") %>%
  summarise(avg = mean(as.integer(Stars)))
avg_mov
## # A tibble: 6 x 2
##   movie                      avg
##   <chr>                    <dbl>
## 1 "Dune"                     3
## 2 "Eternals"                 2.8
## 3 "No Time to Die"           3
## 4 "Shang-Chi"                3.6
## 5 "Spider-Man No Way Home"   4.8
## 6 "Venom "                   2.2
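As a side note, the filter-and-convert step could also be handled when the file is read. The sketch below is only an illustration under assumptions: the rows are made-up sample data in the same shape as my file (the friend "Sara" and her ratings are hypothetical), and the names csv_text, ratings and avg_sketch are invented for this example. It uses the na.strings argument of read.csv so that the text NULL becomes NA, and strip.white to drop the trailing space that shows up in "Venom " in the output above.

```r
library(dplyr)

# Hypothetical sample rows shaped like the real file (not the actual data)
csv_text <- "movie,Fri_name,Stars
Dune,Ahmed,3
Dune,Sara,NULL
Venom ,Ahmed,1
Venom ,Sara,3"

# na.strings turns the literal text NULL into NA, so Stars is read as integer;
# strip.white removes the stray trailing space after Venom
ratings <- read.csv(text = csv_text, na.strings = "NULL", strip.white = TRUE)

# na.rm = TRUE skips the missing ratings, so no filter or as.integer is needed
avg_sketch <- ratings %>%
  group_by(movie) %>%
  summarise(avg = mean(Stars, na.rm = TRUE))
avg_sketch
```

With this approach a movie that nobody rated would show NaN instead of disappearing from the table, which may or may not be what you want.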
Here I graphed my data. Not surprisingly, Spider-Man was watched by all of my friends, and they rated it very highly, with an average rating of 4.8.
library(ggplot2)
ggplot(data = avg_mov, aes(x = movie, y = avg, fill = movie)) +
  coord_flip() +
  geom_bar(stat = "identity")
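One possible variation on this graph, shown only as a sketch: geom_col() is the ggplot2 shorthand for geom_bar(stat = "identity"), and reorder() sorts the bars by average rating so the highest-rated movie is easy to spot. The data frame below is rebuilt by hand from the averages printed above so the example runs on its own; the axis labels and the object name p are my own choices for illustration.

```r
library(ggplot2)

# Averages copied from the summary table printed above
# (the trailing space in "Venom " is dropped here)
avg_mov <- data.frame(
  movie = c("Dune", "Eternals", "No Time to Die", "Shang-Chi",
            "Spider-Man No Way Home", "Venom"),
  avg = c(3, 2.8, 3, 3.6, 4.8, 2.2)
)

# reorder() sorts the movies by their average; geom_col() plots the values as-is
p <- ggplot(avg_mov, aes(x = reorder(movie, avg), y = avg, fill = movie)) +
  geom_col() +
  coord_flip() +
  labs(x = "Movie", y = "Average stars") +
  theme(legend.position = "none")  # the legend repeats the axis labels
p
```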
I was also curious to see which movies my friends did not watch. I grouped the data by movie, kept only the rows where Stars was NULL, and counted how many people had not seen each movie. Four of my friends did not watch No Time to Die, and one of my friends did not watch Dune.
nul <- mysql %>%
  group_by(movie) %>%
  filter(Stars == "NULL") %>%
  count(Stars, sort = TRUE)
nul
## # A tibble: 2 x 3
## # Groups:   movie [2]
##   movie          Stars     n
##   <chr>          <chr> <int>
## 1 No Time to Die NULL      4
## 2 Dune           NULL      1
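The count above only lists movies that at least one friend skipped. If I wanted every movie in the table, including the ones everybody watched, one way (a sketch, not what I ran) would be to sum a logical test inside summarise() instead of filtering first. The rows below are made-up sample data, with a hypothetical friend "Sara", and not_seen is a name invented for this example.

```r
library(dplyr)

# Hypothetical sample rows shaped like the real data (not the actual ratings)
ratings <- data.frame(
  movie    = c("Dune", "Dune", "Venom", "Venom"),
  Fri_name = c("Ahmed", "Sara", "Ahmed", "Sara"),
  Stars    = c("3", "NULL", "1", "3")
)

# Stars == "NULL" is TRUE/FALSE per row; sum() counts the TRUEs in each group,
# so a movie that everyone watched still appears with n = 0
not_seen <- ratings %>%
  group_by(movie) %>%
  summarise(n = sum(Stars == "NULL")) %>%
  arrange(desc(n))
not_seen
```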
Here I graphed my results in another bar graph.
library(ggplot2)
ggplot(data = nul, aes(x = movie, y = n, fill = movie)) +
  geom_bar(stat = "identity")