Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R data frame.
For this assignment you first have to collect the data and then input it into the SQL database. I used imaginary friends data for this assignment since it was allowable.
Next is to find a way to export to the data. There are many ways to export the data. I chose to export the data and converting it into a .csv file, which then converts it into a .txt file that can be then uploaded into github repository to make it into a URL link. The data is read with the read.csv command. (One thing that I noticed while trying to export the data, if the Title of the columns does not have "" it doesn’t recognize all of the columns in the table)
The SQL code that was used to collect the data from the table was in MYSQL:
SELECT * FROM movie_ratings.ratings;
Using SELECT * selects all of the data that is in the table.
movie_ratings <- read.csv("https://raw.githubusercontent.com/Luz917/moviesrated/master/movratings.txt", stringsAsFactors = FALSE)
str(movie_ratings)
## 'data.frame': 10 obs. of 8 variables:
## $ idfriends: int 1 2 3 4 5 6 7 8 9 10
## $ N_F : chr "Stacy" "Claribel" "Matt" "Corey" ...
## $ H_a_S : int 5 5 4 4 3 3 5 5 4 2
## $ L_K : int 5 3 4 4 4 3 4 4 3 2
## $ G_B : int 2 3 3 3 3 2 4 3 3 2
## $ SFFH : int 5 5 5 4 5 4 2 5 5 5
## $ D_T_F : int 5 2 5 4 5 4 4 4 5 5
## $ A_B : int 2 4 2 2 2 1 2 2 3 1
dim(movie_ratings)
## [1] 10 8
movie_ratings
## idfriends N_F H_a_S L_K G_B SFFH D_T_F A_B
## 1 1 Stacy 5 5 2 5 5 2
## 2 2 Claribel 5 3 3 5 2 4
## 3 3 Matt 4 4 3 5 5 2
## 4 4 Corey 4 4 3 4 4 2
## 5 5 Bailey 3 4 3 5 5 2
## 6 6 Alex 3 3 2 4 4 1
## 7 7 Angel 5 4 4 2 4 2
## 8 8 Chris 5 4 3 5 4 2
## 9 9 Cynthia 4 3 3 5 5 3
## 10 10 Martha 2 2 2 5 5 1
colnames(movie_ratings) <- c("ID","Name", "Hobbs_and_Shaw","Lion_King", "Good_Boys","Spiderman_Far_From_Home", "Dora_the_Explorer","Angry_Birds_2" )
movie_ratings
## ID Name Hobbs_and_Shaw Lion_King Good_Boys Spiderman_Far_From_Home
## 1 1 Stacy 5 5 2 5
## 2 2 Claribel 5 3 3 5
## 3 3 Matt 4 4 3 5
## 4 4 Corey 4 4 3 4
## 5 5 Bailey 3 4 3 5
## 6 6 Alex 3 3 2 4
## 7 7 Angel 5 4 4 2
## 8 8 Chris 5 4 3 5
## 9 9 Cynthia 4 3 3 5
## 10 10 Martha 2 2 2 5
## Dora_the_Explorer Angry_Birds_2
## 1 5 2
## 2 2 4
## 3 5 2
## 4 4 2
## 5 5 2
## 6 4 1
## 7 4 2
## 8 4 2
## 9 5 3
## 10 5 1
I chose to create the table this way because it would be easier to find information for each movie this way like if you wanted to find the mean for Lion King:
mean(movie_ratings$Lion_King)
## [1] 3.6
Summary of the Data
summary(movie_ratings)
## ID Name Hobbs_and_Shaw Lion_King
## Min. : 1.00 Length:10 Min. :2.00 Min. :2.0
## 1st Qu.: 3.25 Class :character 1st Qu.:3.25 1st Qu.:3.0
## Median : 5.50 Mode :character Median :4.00 Median :4.0
## Mean : 5.50 Mean :4.00 Mean :3.6
## 3rd Qu.: 7.75 3rd Qu.:5.00 3rd Qu.:4.0
## Max. :10.00 Max. :5.00 Max. :5.0
## Good_Boys Spiderman_Far_From_Home Dora_the_Explorer Angry_Birds_2
## Min. :2.00 Min. :2.00 Min. :2.0 Min. :1.0
## 1st Qu.:2.25 1st Qu.:4.25 1st Qu.:4.0 1st Qu.:2.0
## Median :3.00 Median :5.00 Median :4.5 Median :2.0
## Mean :2.80 Mean :4.50 Mean :4.3 Mean :2.1
## 3rd Qu.:3.00 3rd Qu.:5.00 3rd Qu.:5.0 3rd Qu.:2.0
## Max. :4.00 Max. :5.00 Max. :5.0 Max. :4.0
Replace some values with NA to make the data more realistic to show that not everyone watches the same movies. I decided to make the 1’s and 2’s to be NA.
(df[df == 0] <- NA)
movie_ratings[movie_ratings==1]<-NA
movie_ratings[movie_ratings==2]<-NA
movie_ratings
## ID Name Hobbs_and_Shaw Lion_King Good_Boys Spiderman_Far_From_Home
## 1 NA Stacy 5 5 NA 5
## 2 NA Claribel 5 3 3 5
## 3 3 Matt 4 4 3 5
## 4 4 Corey 4 4 3 4
## 5 5 Bailey 3 4 3 5
## 6 6 Alex 3 3 NA 4
## 7 7 Angel 5 4 4 NA
## 8 8 Chris 5 4 3 5
## 9 9 Cynthia 4 3 3 5
## 10 10 Martha NA NA NA 5
## Dora_the_Explorer Angry_Birds_2
## 1 5 NA
## 2 NA 4
## 3 5 NA
## 4 4 NA
## 5 5 NA
## 6 4 NA
## 7 4 NA
## 8 4 NA
## 9 5 3
## 10 5 NA
summary(movie_ratings)
## ID Name Hobbs_and_Shaw Lion_King
## Min. : 3.00 Length:10 Min. :3.000 Min. :3.000
## 1st Qu.: 4.75 Class :character 1st Qu.:4.000 1st Qu.:3.000
## Median : 6.50 Mode :character Median :4.000 Median :4.000
## Mean : 6.50 Mean :4.222 Mean :3.778
## 3rd Qu.: 8.25 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :10.00 Max. :5.000 Max. :5.000
## NA's :2 NA's :1 NA's :1
## Good_Boys Spiderman_Far_From_Home Dora_the_Explorer Angry_Birds_2
## Min. :3.000 Min. :4.000 Min. :4.000 Min. :3.00
## 1st Qu.:3.000 1st Qu.:5.000 1st Qu.:4.000 1st Qu.:3.25
## Median :3.000 Median :5.000 Median :5.000 Median :3.50
## Mean :3.143 Mean :4.778 Mean :4.556 Mean :3.50
## 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:3.75
## Max. :4.000 Max. :5.000 Max. :5.000 Max. :4.00
## NA's :3 NA's :1 NA's :1 NA's :8
Subset of Data
Only including even rows, and only 3 movie columns
mov_ratings <- movie_ratings[c(2,4,6,8,10),c(1,2,4,6,7)]
mov_ratings
## ID Name Lion_King Spiderman_Far_From_Home Dora_the_Explorer
## 2 NA Claribel 3 5 NA
## 4 4 Corey 4 4 4
## 6 6 Alex 3 4 4
## 8 8 Chris 4 5 4
## 10 10 Martha NA 5 5
dim(mov_ratings)
## [1] 5 5
summary(mov_ratings)
## ID Name Lion_King Spiderman_Far_From_Home
## Min. : 4.0 Length:5 Min. :3.0 Min. :4.0
## 1st Qu.: 5.5 Class :character 1st Qu.:3.0 1st Qu.:4.0
## Median : 7.0 Mode :character Median :3.5 Median :5.0
## Mean : 7.0 Mean :3.5 Mean :4.6
## 3rd Qu.: 8.5 3rd Qu.:4.0 3rd Qu.:5.0
## Max. :10.0 Max. :4.0 Max. :5.0
## NA's :1 NA's :1
## Dora_the_Explorer
## Min. :4.00
## 1st Qu.:4.00
## Median :4.00
## Mean :4.25
## 3rd Qu.:4.25
## Max. :5.00
## NA's :1
In conclusion, not sure if this is the best method of exporting the MySQL database into R Studio but it exports the data and it gets read into R Studio and you are still able to manipulate the data and create a subset in R Studio. And the other thing is that not much sql is used in order to do this method.