HW 2 Data 607

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R data frame.

For this assignment you first have to collect the data and then input it into the SQL database. I used imaginary friends data for this assignment since it was allowable.

Next is to find a way to export to the data. There are many ways to export the data. I chose to export the data and converting it into a .csv file, which then converts it into a .txt file that can be then uploaded into github repository to make it into a URL link. The data is read with the read.csv command. (One thing that I noticed while trying to export the data, if the Title of the columns does not have "" it doesn’t recognize all of the columns in the table)

The SQL code that was used to collect the data from the table was in MYSQL:

SELECT * FROM movie_ratings.ratings;

Using SELECT * selects all of the data that is in the table.

movie_ratings <-   read.csv("https://raw.githubusercontent.com/Luz917/moviesrated/master/movratings.txt", stringsAsFactors = FALSE)
str(movie_ratings)

## 'data.frame':    10 obs. of  8 variables:
##  $ idfriends: int  1 2 3 4 5 6 7 8 9 10
##  $ N_F      : chr  "Stacy" "Claribel" "Matt" "Corey" ...
##  $ H_a_S    : int  5 5 4 4 3 3 5 5 4 2
##  $ L_K      : int  5 3 4 4 4 3 4 4 3 2
##  $ G_B      : int  2 3 3 3 3 2 4 3 3 2
##  $ SFFH     : int  5 5 5 4 5 4 2 5 5 5
##  $ D_T_F    : int  5 2 5 4 5 4 4 4 5 5
##  $ A_B      : int  2 4 2 2 2 1 2 2 3 1

dim(movie_ratings)

## [1] 10  8

movie_ratings

##    idfriends      N_F H_a_S L_K G_B SFFH D_T_F A_B
## 1          1    Stacy     5   5   2    5     5   2
## 2          2 Claribel     5   3   3    5     2   4
## 3          3     Matt     4   4   3    5     5   2
## 4          4    Corey     4   4   3    4     4   2
## 5          5   Bailey     3   4   3    5     5   2
## 6          6     Alex     3   3   2    4     4   1
## 7          7    Angel     5   4   4    2     4   2
## 8          8    Chris     5   4   3    5     4   2
## 9          9  Cynthia     4   3   3    5     5   3
## 10        10   Martha     2   2   2    5     5   1

colnames(movie_ratings) <- c("ID","Name", "Hobbs_and_Shaw","Lion_King", "Good_Boys","Spiderman_Far_From_Home", "Dora_the_Explorer","Angry_Birds_2" )
movie_ratings

##    ID     Name Hobbs_and_Shaw Lion_King Good_Boys Spiderman_Far_From_Home
## 1   1    Stacy              5         5         2                       5
## 2   2 Claribel              5         3         3                       5
## 3   3     Matt              4         4         3                       5
## 4   4    Corey              4         4         3                       4
## 5   5   Bailey              3         4         3                       5
## 6   6     Alex              3         3         2                       4
## 7   7    Angel              5         4         4                       2
## 8   8    Chris              5         4         3                       5
## 9   9  Cynthia              4         3         3                       5
## 10 10   Martha              2         2         2                       5
##    Dora_the_Explorer Angry_Birds_2
## 1                  5             2
## 2                  2             4
## 3                  5             2
## 4                  4             2
## 5                  5             2
## 6                  4             1
## 7                  4             2
## 8                  4             2
## 9                  5             3
## 10                 5             1

I chose to create the table this way because it would be easier to find information for each movie this way like if you wanted to find the mean for Lion King:

mean(movie_ratings$Lion_King)

## [1] 3.6

Summary of the Data

summary(movie_ratings)

##        ID            Name           Hobbs_and_Shaw   Lion_King  
##  Min.   : 1.00   Length:10          Min.   :2.00   Min.   :2.0  
##  1st Qu.: 3.25   Class :character   1st Qu.:3.25   1st Qu.:3.0  
##  Median : 5.50   Mode  :character   Median :4.00   Median :4.0  
##  Mean   : 5.50                      Mean   :4.00   Mean   :3.6  
##  3rd Qu.: 7.75                      3rd Qu.:5.00   3rd Qu.:4.0  
##  Max.   :10.00                      Max.   :5.00   Max.   :5.0  
##    Good_Boys    Spiderman_Far_From_Home Dora_the_Explorer Angry_Birds_2
##  Min.   :2.00   Min.   :2.00            Min.   :2.0       Min.   :1.0  
##  1st Qu.:2.25   1st Qu.:4.25            1st Qu.:4.0       1st Qu.:2.0  
##  Median :3.00   Median :5.00            Median :4.5       Median :2.0  
##  Mean   :2.80   Mean   :4.50            Mean   :4.3       Mean   :2.1  
##  3rd Qu.:3.00   3rd Qu.:5.00            3rd Qu.:5.0       3rd Qu.:2.0  
##  Max.   :4.00   Max.   :5.00            Max.   :5.0       Max.   :4.0

Replace some values with NA to make the data more realistic to show that not everyone watches the same movies. I decided to make the 1’s and 2’s to be NA.

(df[df == 0] <- NA)

movie_ratings[movie_ratings==1]<-NA
movie_ratings[movie_ratings==2]<-NA
movie_ratings

##    ID     Name Hobbs_and_Shaw Lion_King Good_Boys Spiderman_Far_From_Home
## 1  NA    Stacy              5         5        NA                       5
## 2  NA Claribel              5         3         3                       5
## 3   3     Matt              4         4         3                       5
## 4   4    Corey              4         4         3                       4
## 5   5   Bailey              3         4         3                       5
## 6   6     Alex              3         3        NA                       4
## 7   7    Angel              5         4         4                      NA
## 8   8    Chris              5         4         3                       5
## 9   9  Cynthia              4         3         3                       5
## 10 10   Martha             NA        NA        NA                       5
##    Dora_the_Explorer Angry_Birds_2
## 1                  5            NA
## 2                 NA             4
## 3                  5            NA
## 4                  4            NA
## 5                  5            NA
## 6                  4            NA
## 7                  4            NA
## 8                  4            NA
## 9                  5             3
## 10                 5            NA

summary(movie_ratings)

##        ID            Name           Hobbs_and_Shaw    Lion_King    
##  Min.   : 3.00   Length:10          Min.   :3.000   Min.   :3.000  
##  1st Qu.: 4.75   Class :character   1st Qu.:4.000   1st Qu.:3.000  
##  Median : 6.50   Mode  :character   Median :4.000   Median :4.000  
##  Mean   : 6.50                      Mean   :4.222   Mean   :3.778  
##  3rd Qu.: 8.25                      3rd Qu.:5.000   3rd Qu.:4.000  
##  Max.   :10.00                      Max.   :5.000   Max.   :5.000  
##  NA's   :2                          NA's   :1       NA's   :1      
##    Good_Boys     Spiderman_Far_From_Home Dora_the_Explorer Angry_Birds_2 
##  Min.   :3.000   Min.   :4.000           Min.   :4.000     Min.   :3.00  
##  1st Qu.:3.000   1st Qu.:5.000           1st Qu.:4.000     1st Qu.:3.25  
##  Median :3.000   Median :5.000           Median :5.000     Median :3.50  
##  Mean   :3.143   Mean   :4.778           Mean   :4.556     Mean   :3.50  
##  3rd Qu.:3.000   3rd Qu.:5.000           3rd Qu.:5.000     3rd Qu.:3.75  
##  Max.   :4.000   Max.   :5.000           Max.   :5.000     Max.   :4.00  
##  NA's   :3       NA's   :1               NA's   :1         NA's   :8

Subset of Data

Only including even rows, and only 3 movie columns

mov_ratings <- movie_ratings[c(2,4,6,8,10),c(1,2,4,6,7)]
mov_ratings

##    ID     Name Lion_King Spiderman_Far_From_Home Dora_the_Explorer
## 2  NA Claribel         3                       5                NA
## 4   4    Corey         4                       4                 4
## 6   6     Alex         3                       4                 4
## 8   8    Chris         4                       5                 4
## 10 10   Martha        NA                       5                 5

dim(mov_ratings)

## [1] 5 5

summary(mov_ratings)

##        ID           Name             Lion_King   Spiderman_Far_From_Home
##  Min.   : 4.0   Length:5           Min.   :3.0   Min.   :4.0            
##  1st Qu.: 5.5   Class :character   1st Qu.:3.0   1st Qu.:4.0            
##  Median : 7.0   Mode  :character   Median :3.5   Median :5.0            
##  Mean   : 7.0                      Mean   :3.5   Mean   :4.6            
##  3rd Qu.: 8.5                      3rd Qu.:4.0   3rd Qu.:5.0            
##  Max.   :10.0                      Max.   :4.0   Max.   :5.0            
##  NA's   :1                         NA's   :1                            
##  Dora_the_Explorer
##  Min.   :4.00     
##  1st Qu.:4.00     
##  Median :4.00     
##  Mean   :4.25     
##  3rd Qu.:4.25     
##  Max.   :5.00     
##  NA's   :1

In conclusion, not sure if this is the best method of exporting the MySQL database into R Studio but it exports the data and it gets read into R Studio and you are still able to manipulate the data and create a subset in R Studio. And the other thing is that not much sql is used in order to do this method.

HW 2 Data 607

Maryluz Cruz

9/6/2019