## Warning: package 'dplyr' was built under R version 4.4.1
## Warning: package 'openintro' was built under R version 4.4.1
library(rstudioapi)
# Prompt for database credentials so they are not hard-coded in the script
username <- showPrompt(title = "username", message = "Enter username:", default = "")
pw <- askForPassword(prompt = "Enter password:")
### Forming DB connection
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 4.4.1
## Loading required package: DBI
mydb <- dbConnect(MySQL(), user=username, password=pw, dbname=username, host="cunydata607sql.mysql.database.azure.com")
dbListTables(mydb)
## [1] "airlines" "airports" "flights" "movie" "movies" "planes" "weather"
##           Movies reviewer 1 reviewer 2 reviewer 3 reviewer 4 reviewer 5 reviewer 6
## 1   Interstellar          5          2          2          5          5          4
## 2 Shutter Island          1          4          1          3          1          4
## 3  Spirited Away          2          1          2          5          3          5
## 4    The Minions          2          5          2          1          4          2
## 5      Toy Story          2          2          4          5          4          2
## 6      Your Name          5          2          4          5          4          4
##           Movies reviewer 1 reviewer 2 reviewer 3 reviewer 4 reviewer 5 reviewer 6 averageScore
## 1   Interstellar          5          2          2          5          5          4            3
## 2 Shutter Island          1          4          1          3          1          4            2
## 3  Spirited Away          2          1          2          5          3          5            3
## 4    The Minions          2          5          2          1          4          2            2
## 5      Toy Story          2          2          4          5          4          2            3
## 6      Your Name          5          2          4          5          4          4            4
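The code that produced `averageScore` is not shown in this output. The printed values match each movie's mean rating rounded down (Interstellar's six ratings sum to 23, so its mean is about 3.83, but the table shows 3). The sketch below is one plausible reconstruction under that assumption, not necessarily the original code. It rebuilds the data frame from the printed ratings instead of reading it from the database, and the helper names `reviewer_cols` and `exact_mean` are illustrative.

```r
# Ratings copied from the printed table; check.names = FALSE keeps the
# column names that contain spaces ("reviewer 1", ...).
df <- data.frame(
  Movies = c("Interstellar", "Shutter Island", "Spirited Away",
             "The Minions", "Toy Story", "Your Name"),
  `reviewer 1` = c(5, 1, 2, 2, 2, 5),
  `reviewer 2` = c(2, 4, 1, 5, 2, 2),
  `reviewer 3` = c(2, 1, 2, 2, 4, 4),
  `reviewer 4` = c(5, 3, 5, 1, 5, 5),
  `reviewer 5` = c(5, 1, 3, 4, 4, 4),
  `reviewer 6` = c(4, 4, 5, 2, 2, 4),
  check.names = FALSE
)

# Select the rating columns by name (assumes they all start with "reviewer")
reviewer_cols <- grep("^reviewer", names(df), value = TRUE)

# Per-movie mean across the six reviewers
exact_mean <- unname(rowMeans(df[reviewer_cols]))

# Assumption: the report rounded the mean down to a whole number
df$averageScore <- floor(exact_mean)

print(df[c("Movies", "averageScore")])
```

Storing `round(exact_mean, 2)` instead of the floored value would keep the fractional part, which separates movies that otherwise share the same whole-number score.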
library(ggplot2)
# Bar chart of the average score per movie; the legend is hidden because the x-axis already names each movie
ggplot(df, aes(x = Movies, y = averageScore, color = Movies, fill = Movies)) +
  geom_col() +
  labs(title = "Average rating of six popular movies across six reviewers",
       x = "Movie name", y = "Average score") +
  theme(legend.position = "none")
There are a few approaches I would take, depending on the nature of the missing values. First, if more than 50% of a column's values are missing, I might exclude the column from my analysis, because a column that is mostly missing can be more misleading than informative. Second, if the share of missing data is small (less than 20%) and the column is numerical, I would consider simple mean imputation, filling in each missing value with the column's mean. If the missing data are categorical, additional analysis or even a predictive model may be needed to fill in the missing values.
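The first two rules can be sketched in base R as follows. This is a minimal illustration, not a definitive implementation: the `ratings` data frame is hypothetical (it is not the table read from the database), and the 50% and 20% cut-offs are the ones stated in the paragraph above.

```r
# Hypothetical ratings with missing values (NA); for illustration only
ratings <- data.frame(
  Movies     = c("A", "B", "C", "D", "E", "F"),
  reviewer_1 = c(5, NA, 2, 3, 4, 1),       # 1 of 6 missing (about 17%)
  reviewer_2 = c(NA, NA, NA, 4, NA, 2),    # 4 of 6 missing (about 67%)
  reviewer_3 = c(1, 2, 3, 4, 5, 5)         # nothing missing
)

# Share of missing values in each numeric column
num_cols <- names(ratings)[sapply(ratings, is.numeric)]
missing_share <- colMeans(is.na(ratings[num_cols]))

# Rule 1: drop columns where more than 50% of the values are missing
drop_cols <- names(missing_share)[missing_share > 0.5]
cleaned <- ratings[, !(names(ratings) %in% drop_cols), drop = FALSE]

# Rule 2: mean-impute numeric columns with less than 20% missing
impute_cols <- names(missing_share)[missing_share > 0 & missing_share < 0.2]
for (col in impute_cols) {
  col_mean <- mean(cleaned[[col]], na.rm = TRUE)
  cleaned[[col]][is.na(cleaned[[col]])] <- col_mean
}

print(cleaned)
```

Columns with a missing share between 20% and 50% are left untouched here, since the paragraph does not prescribe a rule for them.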