#library(tidyverse)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(DBI)
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 4.3.2
lamadb <- dbConnect(RMySQL::MySQL(),
user = "chhiring.lama65",
password = password,
dbname = "chhiring.lama65",
host = "cunydata607sql.mysql.database.azure.com")
dbListTables(lamadb)
## [1] "movie_ratings"
table_name <- "movie_ratings"
column_names <- dbListFields(lamadb, table_name)
query <- sprintf("SELECT * FROM %s", table_name)
table_data <- dbGetQuery(lamadb, query)
I have chosen to make null for one of Paricipants’ values. If it was an integer value, one of the approaches to handle it would be taking mean value from the specific field and replace the null value with it. The impact of a null value in the Participants field depends on the specific context and the importance of that field in the analysis. If the Participants field is not critical for the analysis or does not significantly affect the interpretation of the data, we might choose to leave the null value as it is. However, if the Participants field is relevant and its absence could lead to incomplete or inaccurate insights, imputing the missing value using relevant information ensures a more comprehensive dataset for analysis. Ultimately, the decision should align with the goals and requirements of our analysis.
table_data
## Movie_ID Movies Participants Rating Rating_Score
## 1 111 The Creator Janesh Kaur Awful 1
## 2 34 The Holdovers Nancy Jain Not bad 3
## 3 65 The Marvels Jaspreet Devi Boring 2
## 4 89 Wonka David Goldsmith Great 4
## 5 105 Society of the Snow Super 5
## 6 124 Napoleon Putar Junee Boring 2