Introduction

The objective is to build a survey covering six recent popular movies. Several people are asked to rate each of the movies they have seen on a scale of 1 to 5. The results are stored in a MySQL database and then loaded into an R data frame for analysis.

Library Requirements

The DBI and RMySQL packages are used to connect to the database and run the SELECT query; ggplot2 is used to plot the resulting data frame.

library('DBI')
library('RMySQL')
library('ggplot2')

Import Data

Connect R to the database using the RMySQL library, then run a SELECT query and store the result in a data frame.
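The connection code itself is not shown in this report, so the following is only a minimal sketch of how the step could look. The database name, user, table names (`participant`, `movie`, `rating`) and join keys are assumptions made for illustration; only the four selected column names come from the output below.

```r
# Hypothetical query: the table names and join keys are assumptions;
# only the four selected columns match the data frame printed in this report.
ratings_query <- "
  SELECT p.Participantid, p.ParticipantName, m.movieTitle, r.rate
  FROM rating r
  JOIN participant p ON p.Participantid = r.Participantid
  JOIN movie m ON m.movieid = r.movieid
  ORDER BY m.movieid, p.Participantid"

# Run the query on an open DBI connection and return a data frame.
load_ratings <- function(con, query = ratings_query) {
  DBI::dbGetQuery(con, query)
}

# Usage (not run here, since it needs a live MySQL server).
# The connection settings are placeholders:
# con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "movie_survey",
#                       host = "localhost", user = "survey_user",
#                       password = Sys.getenv("MYSQL_PASSWORD"))
# mydata <- load_ratings(con)
# DBI::dbDisconnect(con)
# mydata
```

Reading the password from an environment variable keeps credentials out of the report source; any other secret store would work equally well.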

##    Participantid ParticipantName               movieTitle rate
## 1              1          Zahara            DON'T BREATHE    3
## 2              2         Mohamad            DON'T BREATHE    5
## 3              3          Zeinab            DON'T BREATHE    4
## 4              4           Hadee            DON'T BREATHE    5
## 5              5           Salma            DON'T BREATHE    5
## 6              6         Anthony            DON'T BREATHE    5
## 7              1          Zahara            SUICIDE SQUAD    4
## 8              2         Mohamad            SUICIDE SQUAD    4
## 9              3          Zeinab            SUICIDE SQUAD    5
## 10             4           Hadee            SUICIDE SQUAD    4
## 11             5           Salma            SUICIDE SQUAD    4
## 12             6         Anthony            SUICIDE SQUAD    5
## 13             1          Zahara KUBO AND THE TWO STRINGS    5
## 14             2         Mohamad KUBO AND THE TWO STRINGS    3
## 15             3          Zeinab KUBO AND THE TWO STRINGS    3
## 16             4           Hadee KUBO AND THE TWO STRINGS    3
## 17             5           Salma KUBO AND THE TWO STRINGS    4
## 18             6         Anthony KUBO AND THE TWO STRINGS    3
## 19             1          Zahara            SAUSAGE PARTY    2
## 20             2         Mohamad            SAUSAGE PARTY    4
## 21             3          Zeinab            SAUSAGE PARTY    3
## 22             4           Hadee            SAUSAGE PARTY    4
## 23             5           Salma            SAUSAGE PARTY    5
## 24             6         Anthony            SAUSAGE PARTY    2
## 25             1          Zahara   MECHANIC: RESURRECTION    4
## 26             2         Mohamad   MECHANIC: RESURRECTION    4
## 27             3          Zeinab   MECHANIC: RESURRECTION    2
## 28             4           Hadee   MECHANIC: RESURRECTION    4
## 29             5           Salma   MECHANIC: RESURRECTION    4
## 30             6         Anthony   MECHANIC: RESURRECTION    1
## 31             1          Zahara            PETE'S DRAGON    1
## 32             2         Mohamad            PETE'S DRAGON    2
## 33             3          Zeinab            PETE'S DRAGON    2
## 34             4           Hadee            PETE'S DRAGON    2
## 35             5           Salma            PETE'S DRAGON    5
## 36             6         Anthony            PETE'S DRAGON    1

Data Summary

Inspect the data types and some basic summary statistics.

str(mydata)
## 'data.frame':    36 obs. of  4 variables:
##  $ Participantid  : int  1 2 3 4 5 6 1 2 3 4 ...
##  $ ParticipantName: chr  "Zahara" "Mohamad" "Zeinab" "Hadee" ...
##  $ movieTitle     : chr  "DON'T BREATHE" "DON'T BREATHE" "DON'T BREATHE" "DON'T BREATHE" ...
##  $ rate           : int  3 5 4 5 5 5 4 4 5 4 ...
summary(mydata)
##  Participantid ParticipantName     movieTitle             rate     
##  Min.   :1.0   Length:36          Length:36          Min.   :1.00  
##  1st Qu.:2.0   Class :character   Class :character   1st Qu.:2.75  
##  Median :3.5   Mode  :character   Mode  :character   Median :4.00  
##  Mean   :3.5                                         Mean   :3.50  
##  3rd Qu.:5.0                                         3rd Qu.:4.25  
##  Max.   :6.0                                         Max.   :5.00

Plot Analysis

Plot rating against movie as boxplots to inspect the results visually, and mark the mean rating on each movie's boxplot.

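# Per-movie mean and median rating
means <- aggregate(rate ~ movieTitle, data = mydata, FUN = mean)
medians <- aggregate(rate ~ movieTitle, data = mydata, FUN = median)

p <- ggplot(mydata, aes(factor(movieTitle), rate))
p + geom_boxplot(aes(fill = movieTitle)) +
  # Mark each movie's mean with a dark red diamond
  # (`fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0)
  stat_summary(fun = mean, colour = "darkred",
               geom = "point", shape = 18, size = 3) +
  # Print the movie title just above each mean point
  geom_text(data = means, aes(label = factor(movieTitle),
                              y = rate + 0.08))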

the_means <- means[order(means$rate, decreasing = TRUE), ]
the_medians <- medians[order(medians$rate, decreasing = TRUE), ]

Based on the mean values, the following table presents the ratings from highest to lowest.

the_means
##                 movieTitle     rate
## 1            DON'T BREATHE 4.500000
## 6            SUICIDE SQUAD 4.333333
## 2 KUBO AND THE TWO STRINGS 3.500000
## 5            SAUSAGE PARTY 3.333333
## 3   MECHANIC: RESURRECTION 3.166667
## 4            PETE'S DRAGON 2.166667

Based on the median values, the following table presents the ratings from highest to lowest.

the_medians
##                 movieTitle rate
## 1            DON'T BREATHE  5.0
## 3   MECHANIC: RESURRECTION  4.0
## 6            SUICIDE SQUAD  4.0
## 5            SAUSAGE PARTY  3.5
## 2 KUBO AND THE TWO STRINGS  3.0
## 4            PETE'S DRAGON  2.0

Conclusion

In conclusion, the plot shows that opinions on Sausage Party and Mechanic: Resurrection are diverse, as illustrated by their large interquartile ranges. The other movies have fairly small, similar interquartile ranges, which indicates that people tend to share a similar opinion of each of these movies regardless of how highly it was rated. Based on the median, Don't Breathe gets the highest rating (5.0), followed by Mechanic: Resurrection and Suicide Squad at 4.0. Sausage Party gets 3.5, a little higher than Kubo and the Two Strings at 3.0. Pete's Dragon is last on the list with a median rating of 2.0.
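The interquartile ranges referred to above are read off the boxplot rather than computed. As a rough check, they can be calculated directly; the sketch below retypes the ratings from the table printed earlier into a new data frame (named `ratings` here, an assumption made so the snippet runs without the database) and uses R's default quantile method.

```r
# Ratings retyped from the printed table, six participants per movie,
# in the same participant order (Zahara ... Anthony).
movies <- c("DON'T BREATHE", "SUICIDE SQUAD", "KUBO AND THE TWO STRINGS",
            "SAUSAGE PARTY", "MECHANIC: RESURRECTION", "PETE'S DRAGON")
ratings <- data.frame(
  movieTitle = rep(movies, each = 6),
  rate = c(3, 5, 4, 5, 5, 5,   # Don't Breathe
           4, 4, 5, 4, 4, 5,   # Suicide Squad
           5, 3, 3, 3, 4, 3,   # Kubo and the Two Strings
           2, 4, 3, 4, 5, 2,   # Sausage Party
           4, 4, 2, 4, 4, 1,   # Mechanic: Resurrection
           1, 2, 2, 2, 5, 1)   # Pete's Dragon
)

# Interquartile range of the ratings for each movie, widest first
iqr_by_movie <- aggregate(rate ~ movieTitle, data = ratings, FUN = IQR)
names(iqr_by_movie)[2] <- "iqr"
iqr_by_movie[order(iqr_by_movie$iqr, decreasing = TRUE), ]
```

With these numbers the interquartile range is 1.75 for Sausage Party and 1.50 for Mechanic: Resurrection, against 0.75 for each of the other four movies, which agrees with the reading of the plot.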
