Data 607 Hw 2

Intro:

I read a CSV file into R to learn some basic facts about the movie ratings I collected from my friends. I wanted to know the average rating of each movie my friends watched and which movies they had not watched. After some data cleaning, I plotted the results as bar graphs.

Reading a csv file to R

I exported my SQL data to a CSV file and uploaded the file to GitHub. I then had R read the file from GitHub into a data frame called mysql, and I looked at the first rows to confirm that the correct file had been read.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

mysql <- read.csv("https://raw.githubusercontent.com/AldataSci/Assignment2-607-/main/Homework2%5B607%5D.csv",header=TRUE,sep=",")
head(mysql)

##                    movie Fri_name Stars
## 1               Eternals    Ahmed     4
## 2              Shang-Chi    Ahmed     4
## 3 Spider-Man No Way Home    Ahmed     5
## 4                   Dune    Ahmed     3
## 5                 Venom     Ahmed     1
## 6         No Time to Die    Ahmed  NULL
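The NULL entries above arrive as text, which is why Stars is read as a character column. A minimal sketch of one alternative, assuming the same three-column layout: pass na.strings and strip.white to read.csv so that missing ratings become NA and stray spaces, such as the one after "Venom", are trimmed. The inline text simply reuses the six rows printed by head(mysql); the names csv_text and ahmed are only for illustration.

```r
# The six rows shown by head(mysql), typed inline so the sketch runs offline
csv_text <- "movie,Fri_name,Stars
Eternals,Ahmed,4
Shang-Chi,Ahmed,4
Spider-Man No Way Home,Ahmed,5
Dune,Ahmed,3
Venom ,Ahmed,1
No Time to Die,Ahmed,NULL"

ahmed <- read.csv(text = csv_text,
                  na.strings = "NULL",       # treat the text NULL as a missing value
                  strip.white = TRUE,        # trim spaces around unquoted fields
                  stringsAsFactors = FALSE)  # keep movie names as character
str(ahmed)
```

Read this way, Stars is an integer column with one NA, so no later conversion from character is needed.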

Average Rating of Movies My Friends Watched

I was curious about the average rating of each movie my friends had watched. I first filtered out the NULL values and then converted the character values under Stars to integers so that the average could be calculated.

avg_mov <- mysql %>%
  group_by(movie) %>%
  filter(Stars != "NULL") %>%
  summarise(avg= mean(as.integer(Stars)))
avg_mov

## # A tibble: 6 x 2
##   movie                      avg
##   <chr>                    <dbl>
## 1 "Dune"                     3  
## 2 "Eternals"                 2.8
## 3 "No Time to Die"           3  
## 4 "Shang-Chi"                3.6
## 5 "Spider-Man No Way Home"   4.8
## 6 "Venom "                   2.2
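The filter-then-convert step can also be written so that missing ratings are turned into NA and skipped by mean itself. The following is a sketch under stated assumptions, not the code used for the results above: the data frame toy and its ratings are made up for illustration, and n_rated is a hypothetical extra column showing how many ratings went into each average.

```r
library(dplyr)

# Made-up ratings for illustration only (not the survey data)
toy <- data.frame(
  movie = c("Dune", "Dune", "Dune", "Venom", "Venom"),
  Stars = c("3", "NULL", "5", "2", "4"),
  stringsAsFactors = FALSE
)

toy_avg <- toy %>%
  mutate(Stars = as.integer(na_if(Stars, "NULL"))) %>%  # "NULL" becomes NA, the rest integers
  group_by(movie) %>%
  summarise(avg = mean(Stars, na.rm = TRUE),            # average over actual ratings only
            n_rated = sum(!is.na(Stars)))               # number of ratings behind each average
toy_avg
```

Reporting the count next to the average makes it easier to see when a high or low score rests on only a few ratings.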

Here I graphed the averages. Not surprisingly, Spider-Man was watched by all my friends, and they rated it very highly, with an average of 4.8.

library(ggplot2)
ggplot(data=avg_mov, aes(x=movie,y=avg , fill=movie)) + 
  coord_flip() +
  geom_bar(stat="identity")
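A possible variation on this plot, sketched under the assumption that sorted bars are wanted: reorder arranges the movies by their average, and geom_col is the shorthand for geom_bar(stat="identity"). The averages are copied from the avg_mov output above, with the trailing space in "Venom " dropped; avg_tbl and p are illustrative names.

```r
library(ggplot2)

# Averages copied from the avg_mov output
avg_tbl <- data.frame(
  movie = c("Dune", "Eternals", "No Time to Die", "Shang-Chi",
            "Spider-Man No Way Home", "Venom"),
  avg = c(3, 2.8, 3, 3.6, 4.8, 2.2)
)

p <- ggplot(avg_tbl, aes(x = reorder(movie, avg), y = avg)) +
  geom_col() +                            # bar heights taken directly from avg
  coord_flip() +                          # horizontal bars keep long titles readable
  labs(x = "Movie", y = "Average stars")
p
```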

Movies my Friends did not watch and how many didn’t watch

I was curious to see which movies my friends did not watch, so I grouped the data by movie, kept the rows where Stars was NULL, and counted how many people had not seen each movie. Four of my friends did not watch No Time to Die and one of my friends did not watch Dune.

nul <- mysql %>%
  group_by(movie) %>%
  filter(Stars=="NULL") %>%
  count(Stars,sort=TRUE)
nul

## # A tibble: 2 x 3
## # Groups:   movie [2]
##   movie          Stars     n
##   <chr>          <chr> <int>
## 1 No Time to Die NULL      4
## 2 Dune           NULL      1
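The same count can be obtained without the grouping step or the constant Stars column in the result. This is a minimal sketch, assuming unrated movies are stored as the text "NULL" as in the data above; the data frame toy is made up for illustration and missing_by_movie is a hypothetical name.

```r
library(dplyr)

# Made-up rows for illustration only (not the survey data)
toy <- data.frame(
  movie = c("Dune", "Dune", "No Time to Die", "No Time to Die", "No Time to Die"),
  Stars = c("3", "NULL", "NULL", "NULL", "4"),
  stringsAsFactors = FALSE
)

missing_by_movie <- toy %>%
  filter(Stars == "NULL") %>%   # keep only the unrated rows
  count(movie, sort = TRUE)     # one row per movie, largest count first
missing_by_movie
```

Because count groups by movie internally, the result is an ungrouped table with just the movie and n columns, which can be passed straight to ggplot.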

Here I graphed my results in another bar graph.

library(ggplot2)
ggplot(data=nul,aes(x=movie,y=n,fill=movie)) +
  geom_bar(stat="identity")

Al Haque

2/9/2022
