Load Required Libraries or packages for Code
library(RODBC)
## Warning: package 'RODBC' was built under R version 3.3.2
require(ggplot2)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
Load the SQL table into an R Dataframe To Set up SQL Server Database ODBC connection Follow the below steps as prescribed in https://technet.microsoft.com/en-us/library/cc879308(v=sql.105).aspx
To run the SQL Server DSN Configuration Wizard Open the ODBC Data Source Administrator as described in the topic, How to: Open the ODBC Data Source Administrator. On the User DSN tab or the System DSN tab, click Add to add a user data source or a system data source. Click SQL Native Client or SQL Server Native Client 10.0 to select the driver, and then click Finish. Follow the instructions in the Wizard.
SQL_Server_connection <- odbcConnect("SQL_MEZUE-PC")
# SELECT * FROM Movie Reviews table into a Data Frame
# Check that connection is working
odbcGetInfo(SQL_Server_connection)
## DBMS_Name DBMS_Ver Driver_ODBC_Ver
## "Microsoft SQL Server" "12.00.2569" "03.80"
## Data_Source_Name Driver_Name Driver_Ver
## "SQL_MEZUE-PC" "msodbcsql11.dll" "12.00.2000"
## ODBC_Ver Server_Name
## "03.80.0000" "MEZUE-PC"
# Query the database and put the results into the data frame "dataframe"
Movie_Review_Df <- sqlQuery(SQL_Server_connection, "
SELECT *
FROM
MyDb.[dbo].[Movie_Reviews_Stage]")
Movie_Review_Df
## ID Movie_Title Reviewer_Name Ratings
## 1 1 Rogue One: A Star Wars Story Bryan Obieyisi 3
## 2 2 Passengers Charles Blake 5
## 3 3 Hidden Figures Christoper Izu 1
## 4 4 Sully Mbata Kurt 5
## 5 5 Moonlight Zara Suet 4
## 6 6 La La Land Stuart Chinza 4
## 7 7 Rogue One: A Star Wars Story Harry Trump 2
## 8 8 Passengers Jack Daniel 5
## 9 9 Hidden Figures Lisa Conway 3
## 10 10 Sully Jeff Desi 1
## 11 11 Moonlight Bryan Obieyisi 5
## 12 12 La La Land Champli Klurt 4
## 13 13 Rogue One: A Star Wars Story Christoper Izu 4
## 14 14 Passengers Xavi Hendo 5
## 15 15 Hidden Figures Leo Messi 2
## 16 16 Sully Ronaldo Cristiano 3
## 17 17 Moonlight Felipe Luiz 3
## 18 18 La La Land Honda Terry 5
## 19 19 Rogue One: A Star Wars Story Mathew Johnny 1
## 20 20 Passengers Elvis Duke 5
## 21 21 Hidden Figures Prince Jorge 4
## 22 22 Sully Leo Messi 4
## 23 23 Moonlight Luis Suarez 2
## 24 24 La La Land Mark Zuckberg 5
## 25 25 Rogue One: A Star Wars Story Elin Page 3
## 26 26 Passengers Andrew Cooper 1
## 27 27 Hidden Figures Chidera Obieyisi 5
## 28 28 Sully Machina Conjukita 4
## 29 29 Moonlight Felix Neptune 4
## 30 30 La La Land Davinci Raneri 5
Normalized version of the data frame is below
Reviewer_df <- sqlQuery(SQL_Server_connection, "
SELECT *
FROM
MyDb.[dbo].[Reviewer]")
Reviewer_df
## Reviewer_ID Reviewer_Name
## 1 1 Andrew Cooper
## 2 2 Bryan Obieyisi
## 3 3 Champli Klurt
## 4 4 Charles Blake
## 5 5 Chidera Obieyisi
## 6 6 Christoper Izu
## 7 7 Davinci Raneri
## 8 8 Elin Page
## 9 9 Elvis Duke
## 10 10 Felipe Luiz
## 11 11 Felix Neptune
## 12 12 Harry Trump
## 13 13 Honda Terry
## 14 14 Jack Daniel
## 15 15 Jeff Desi
## 16 16 Leo Messi
## 17 17 Lisa Conway
## 18 18 Luis Suarez
## 19 19 Machina Conjukita
## 20 20 Mark Zuckberg
## 21 21 Mathew Johnny
## 22 22 Mbata Kurt
## 23 23 Prince Jorge
## 24 24 Ronaldo Cristiano
## 25 25 Stuart Chinza
## 26 26 Xavi Hendo
## 27 27 Zara Suet
Movies_Title_df <- sqlQuery(SQL_Server_connection, "
SELECT *
FROM
MyDb.[dbo].[Movies]")
Movies_Title_df
## Movie_ID Movie_Title
## 1 1 Hidden Figures
## 2 2 La La Land
## 3 3 Moonlight
## 4 4 Passengers
## 5 5 Rogue One: A Star Wars Story
## 6 6 Sully
Movie_Review_Xref_df <- sqlQuery(SQL_Server_connection, "
SELECT *
FROM
MyDb.[dbo].[Movie_Review_Xref]")
Movie_Review_Xref_df
## ID Reviewer_ID Movie_ID Ratings
## 1 1 1 4 1
## 2 2 2 5 3
## 3 3 2 3 5
## 4 4 3 2 4
## 5 5 4 4 5
## 6 6 5 1 5
## 7 7 6 1 1
## 8 8 6 5 4
## 9 9 7 2 5
## 10 10 8 5 3
## 11 11 9 4 5
## 12 12 10 3 3
## 13 13 11 3 4
## 14 14 12 5 2
## 15 15 13 2 5
## 16 16 14 4 5
## 17 17 15 6 1
## 18 18 16 1 2
## 19 19 16 6 4
## 20 20 17 1 3
## 21 21 18 3 2
## 22 22 19 6 4
## 23 23 20 2 5
## 24 24 21 5 1
## 25 25 22 6 5
## 26 26 23 1 4
## 27 27 24 6 3
## 28 28 25 2 4
## 29 29 26 4 5
## 30 30 27 3 4
Perform few data analysis on the Average Ratings from Reviews
hist(Movie_Review_Df$Ratings,main ="Movie_Review", xlab = "Ratings")
Perform few data analysis on the Average Ratings from Reviews
ggplot(data = Movie_Review_Df) +geom_histogram(aes(x=Ratings))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Display box plot of the Ratings per Title
boxplot(Ratings~ Movie_Title, data = Movie_Review_Df)
Plot points of Movie Ratings
p1 <- ggplot(Movie_Review_Df, aes(x = Movie_Title, y = Ratings))
p1 + geom_point(color="red")
Violin Plot of the Average Ratings per movie
ggplot(Movie_Review_Df, aes(x=Movie_Title,y=Ratings)) + geom_violin() + stat_summary(fun.y=mean, geom="point", shape=23, size=2)