The following data export was brought into R through a CSV file produced by a SQL SELECT statement that joined a person table and a movie preferences table.
movie_data <- read.table('/Users/Michele/Desktop/Movie_Data.csv', header = TRUE, sep = ",")
is.na(movie_data) <- movie_data == "NULL"
movie_data
## ID Age Favorite_Genre Get_Out Interstellar Mad_Max_Fury_Road Her X13th
## 1 1 22 Sci-Fi 5 5 3 3 5
## 2 2 21 Documentary 5 3 5 <NA> 5
## 3 3 22 Anime 5 3 5 4 5
## 4 4 22 Drama 3 5 <NA> <NA> 3
## 5 5 34 Comedy 3 4 5 5 <NA>
## Ex_Machina
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 4
## 5 3
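The `<NA>` entries in the printout above suggest that the columns containing the string "NULL" were read as factors rather than numbers. A minimal sketch of an alternative, assuming the CSV holds exactly the values printed above: passing `na.strings = "NULL"` at read time keeps the rating columns numeric. The inline `csv_text` and the name `movie_data_num` are introduced here for illustration and stand in for the real file.

```r
# Inline copy of the table printed above; stands in for Movie_Data.csv.
csv_text <- "ID,Age,Favorite_Genre,Get_Out,Interstellar,Mad_Max_Fury_Road,Her,X13th,Ex_Machina
1,22,Sci-Fi,5,5,3,3,5,NULL
2,21,Documentary,5,3,5,NULL,5,NULL
3,22,Anime,5,3,5,4,5,NULL
4,22,Drama,3,5,NULL,NULL,3,4
5,34,Comedy,3,4,5,5,NULL,3"

# Treat the literal string "NULL" as missing while reading,
# so rating columns are parsed as numbers instead of factors.
movie_data_num <- read.csv(text = csv_text, na.strings = "NULL")

# Inspect the column types; the rating columns should be integer.
str(movie_data_num)
```

With the real file, the same argument could be added to the `read.table` call in place of the separate `is.na<-` step.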
Age_Get_Out <- cor.test(movie_data$Age, movie_data$Get_Out, method = "pearson")
Age_Get_Out
##
## Pearson's product-moment correlation
##
## data: movie_data$Age and movie_data$Get_Out
## t = -1.4097, df = 3, p-value = 0.2534
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9721139 0.5665594
## sample estimates:
## cor
## -0.6312327
Age_Interstellar <- cor.test(movie_data$Age, movie_data$Interstellar, method = "pearson")
Age_Interstellar
##
## Pearson's product-moment correlation
##
## data: movie_data$Age and movie_data$Interstellar
## t = 0.078876, df = 3, p-value = 0.9421
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8717635 0.8919587
## sample estimates:
## cor
## 0.04549216
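As a check on how `cor.test` arrives at these numbers, the sketch below recomputes the Age and Get_Out result by hand from the five rows printed earlier. It uses the standard relationship t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the vector names `age` and `get_out` are introduced here for illustration.

```r
# Values copied from the Age and Get_Out columns of the table above.
age     <- c(22, 21, 22, 22, 34)
get_out <- c(5, 5, 5, 3, 3)

n <- length(age)
r <- cor(age, get_out)                      # Pearson correlation

# t statistic and two-sided p-value with n - 2 degrees of freedom.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# These agree with the cor.test output above
# (cor = -0.6312, t = -1.4097, p-value = 0.2534).
round(c(r = r, t = t_stat, p = p_value), 4)
```

With only three degrees of freedom, the confidence intervals span almost the whole range from -1 to 1, so neither correlation can be distinguished from zero.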
Person5 <- lm(X13th ~ Age + Get_Out, data=movie_data)
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
Person5
##
## Call:
## lm(formula = X13th ~ Age + Get_Out, data = movie_data)
##
## Coefficients:
## (Intercept) Age Get_Out
## -5.000e-01 -2.564e-16 5.000e-01
Person_5_13th_Rating <- -0.5 + (-2.564e-16 * 34) + (0.5 * 3)
Person_5_13th_Rating
## [1] 1
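Using linear regression, the model returns a value of 1 for person 5 and the documentary “13th”. The warnings on the lm() call matter here: X13th was read as a factor because of the "NULL" strings, so the fit was made on the factor's internal level codes rather than on the ratings themselves. The observed ratings 3 and 5 are stored as level codes 1 and 2, so a fitted value of 1 corresponds to a rating of 3, not 1. Again, we take all of this analysis with caution, since five respondents are far too few to support reliable conclusions.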
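A minimal sketch of the same regression with the ratings kept numeric, assuming the values printed in the table at the top; the names `ratings`, `fit`, and `person5` are introduced here for illustration. Because the response is numeric, the fitted value is on the rating scale directly, with no factor level codes involved.

```r
# Age, Get_Out and X13th copied from the table above; person 5 has no X13th rating.
ratings <- data.frame(
  Age     = c(22, 21, 22, 22, 34),
  Get_Out = c(5, 5, 5, 3, 3),
  X13th   = c(5, 5, 5, 3, NA)
)

# lm drops the row with the missing response, leaving four observations.
fit <- lm(X13th ~ Age + Get_Out, data = ratings)

# Predict the missing rating for person 5 (Age 34, Get_Out rating 3).
person5 <- unname(predict(fit, newdata = data.frame(Age = 34, Get_Out = 3)))
person5   # approximately 3
```

In these four rows the X13th rating equals the Get_Out rating exactly, so the fit is perfect and the prediction simply mirrors person 5's Get_Out rating. Person 5's age of 34 also lies well outside the 21 to 22 range used to fit the model, so the result is an extrapolation.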