library(RMySQL)
## Loading required package: DBI
con <- dbConnect(
MySQL(),
user = "yourlagnevargas87",
password = "Bella520!",
dbname = "data607sql",
host = "data607sql.mysql.database.azure.com"
)
data <- dbGetQuery(con, "SELECT * FROM movie_ratings")
dbDisconnect(con)
## [1] TRUE
print(data)
## movie Bella Anabelle Leila Nipsey Kaiden
## 1 Toy Story 4 3 5 4 4
## 2 Titanic 5 4 3 5 3
## 3 Forest Gump 3 4 5 2 4
## 4 Scarface 4 2 4 3 5
## 5 Goodfellas 5 4 3 4 5
## 6 Lion King 3 5 4 4 3
if (is.data.frame(data) | is.matrix(data)) {
numeric_columns <- sapply(data, is.numeric)
if (any(numeric_columns)) {
avg_ratings <- colMeans(data[, numeric_columns], na.rm = TRUE)
data_imputed <- data
data_imputed[, numeric_columns][is.na(data_imputed[, numeric_columns])] <- avg_ratings[is.na(data_imputed[, numeric_columns])]
} else {
}
}
if (any(is.na(data_imputed))) {
print(data_imputed)
} else {
print("No missing values found in the data frame.")
}
## [1] "No missing values found in the data frame."
Bonus Challenge Questions:
1. Using survey software to gather information can make the data
collection process more organized and efficient. Platforms like Google
Forms can be used to create surveys and collect responses
automatically.
2. To share code without revealing the password, you can use
environment variables or store the password in a separate configuration
file that is not shared publicly.
3. Exploring different approaches to data cleaning and analysis can
improve the quality and accuracy of the results. For example, using data
transformations, outlier detection, or advanced imputation methods can
enhance the data analysis process.
4. Creating a normalized set of tables that correspond to the
relationship between movie viewing friends and the movies being rated
can improve data organization and reduce data redundancy. This can be
achieved by dividing the data into separate tables for movies, friends,
and ratings with appropriate foreign key relationships.
5. Standardizing ratings can be beneficial in certain scenarios,
especially when comparing ratings across multiple movies or aggregating
ratings from different sources. This can be done by converting ratings
to a common scale, such as a range from 0 to 1 or using z-scores.
Implementing standardization in MySQL and R will depend on the specific
approach chosen for standardization.