Choose six recent popular movies.
Movies: Mean Girls, Wonka, Aquaman And The Last Kingdom, The Beekeeper, Night Swim and Migration
Person1 ratings:5,3,5,3,4,4 Person2 ratings:3,3,4,4,4,3 Person3 ratings:4,3,4,3,3,3 Person4 ratings:3,4,4,3,3,4 Person5 ratings:3,3,3,3,2,3
CREATE TABLE movies ( popular_movie VARCHAR(200) NULL, person1_rating INT NULL, person2_rating INT NULL, person3_rating INT NULL, person4_rating INT NULL, person5_rating INT NULL);
INSERT INTO movies(popular_movie, person1_rating, person2_rating, person3_rating, person4_rating, person5_rating)
VALUES(‘Mean girls’, 5,3,4,3,3), (‘Wonka’, 3,3,3,4,3), (‘Aquaman and the last kingdom’, 5,4,4,4,3), (‘The beekeeper’, 3,4,3,3,3), (‘Night swim’, 4,4,3,3,2), (‘Migration’,4,3,3,4,3);
select * from movies;
library(“RMySQL”)
mydb<-dbConnect(MySQL(),user=‘anna.moy40’,password=‘XXXX’,dbname=‘anna.moy40’,host =‘cunydata607sql.mysql.database.azure.com’)
dbListTables(mydb)
My approach to missing data is to remove the missing data in the file to perform calculations such as mean, and median. I decided to take this approach because it allows you to continue to perform your calculation if you remove the missing data. If missing data is in your data it may cause errors and not perform the calculation you want.