This system recommends Fantasy books to readers. Book recommendation is difficult because of the large number of available titles, so the system focuses on Fantasy books specifically. The dataset is a .csv file built from survey results I asked a few of my friends to fill out. There are 5 users and 9 books, and I want to acknowledge that all of the respondents come from the same social group, so their ratings cannot be treated as a random sample.

Libraries:

library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(magrittr)    # %>% pipe operator

1. Reading in the Data.

# Read the survey results and keep the five reader rows
BRData <- read.csv(file="https://raw.githubusercontent.com/che10vek/Data612/master/FantasyBookRatings.csv", header=TRUE, sep=",")
BRData <- BRData[1:5,]
BRData %>% kable(caption = "Fantasy Book Ratings") %>% kable_styling("striped", full_width = TRUE)
Fantasy Book Ratings

| Readers | Wizards_First_Rule | Harry_Potter | Twilight | Hitchhikers_Guide_to_the_Galaxy | Jonathan_Strange_and_MR_Norrell | Master_and_Margarita | Lord_of_the._Rings | Song_of_Ice_and_Fire | Hunger_Games |
|---------|--------------------|--------------|----------|---------------------------------|---------------------------------|----------------------|--------------------|----------------------|--------------|
| Lina    | 5  | 4  | 5  | 3  | NA | 4  | NA | NA | 3 |
| Lilya   | 4  | 5  | 5  | 4  | 5  | 3  | NA | 4  | 3 |
| Mariya  | 4  | NA | 5  | NA | 2  | 5  | 2  | NA | 3 |
| Yuliya  | NA | 5  | NA | 4  | 2  | 5  | 3  | 1  | 3 |
| Irene   | 4  | 5  | 4  | 4  | 3  | 4  | NA | 5  | 3 |
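As a quick structural check (optional), the loaded data should contain the five readers and ten columns (Readers plus the nine books):

dim(BRData)   # expect: 5 rows, 10 columns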

2. Creating a User-Item Matrix and a blank data frame for testing.

BRDataMatrix <- BRData               # full user-item rating matrix
BRDataTrain <- BRDataMatrix          # training copy; held-out cells are blanked in the next step
BRDataTest <- data.frame()[6:10, ]   # five blank (all-NA) rows for the held-out ratings
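data.frame()[6:10, ] produces five all-NA rows with no columns, which the loop in the next step fills in one held-out rating at a time. A drop-in alternative (just a sketch; BRDataTestAlt is an illustrative name and is not used elsewhere) would be to start from an all-NA copy of BRData, which also keeps the original column names:

BRDataTestAlt <- BRData
BRDataTestAlt[] <- NA   # same shape and column names as BRData, every value NA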

3. Splitting the data into Test and Train Matrices.

The training matrix was created by randomly selecting one position in each book column and replacing it with NA; the test set was created by copying those held-out values into a new matrix:

set.seed(3)
# For each book column, pick one random reader (row) whose rating will be held out
n <- sample(1:5, ncol(BRDataMatrix), replace = TRUE)
# The nine book columns sit in positions 2:10, so loop over the offsets 1:9
x <- 1:(ncol(BRDataMatrix) - 1)
for (i in x){
  # Blank the chosen cell in the training set and copy its value into the test set
  BRDataTrain[n[i], (i + 1)] <- NA
  BRDataTest[n[i], (i + 1)] <- BRDataMatrix[n[i], (i + 1)]
}
# Drop the Readers column so both sets are 5 readers by 9 books
BRDataTest <- BRDataTest[1:5, 2:10]
BRDataTrain <- BRDataTrain[1:5, 2:10]

BRDataTrain %>% kable(caption = "Train Data Set") %>% kable_styling("striped", full_width = TRUE)
Train Data Set

| Wizards_First_Rule | Harry_Potter | Twilight | Hitchhikers_Guide_to_the_Galaxy | Jonathan_Strange_and_MR_Norrell | Master_and_Margarita | Lord_of_the._Rings | Song_of_Ice_and_Fire | Hunger_Games |
|--------------------|--------------|----------|---------------------------------|---------------------------------|----------------------|--------------------|----------------------|--------------|
| NA | 4  | 5  | 3  | NA | 4  | NA | NA | 3  |
| 4  | 5  | NA | NA | 5  | 3  | NA | NA | 3  |
| 4  | NA | 5  | NA | 2  | 5  | 2  | NA | NA |
| NA | 5  | NA | 4  | NA | NA | 3  | 1  | 3  |
| 4  | NA | 4  | 4  | 3  | 4  | NA | 5  | 3  |
BRDataTest %>% kable(caption = "Test Data Set") %>% kable_styling("striped", full_width = TRUE)
Test Data Set

|      | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|------|----|----|----|----|----|----|----|----|-----|
| NA   | 5  | NA | NA | NA | NA | NA | NA | NA | NA  |
| NA.1 | NA | NA | 5  | 4  | NA | NA | NA | 4  | NA  |
| NA.2 | NA | NA | NA | NA | NA | NA | NA | NA | 3   |
| NA.3 | NA | NA | NA | NA | 2  | 5  | NA | NA | NA  |
| NA.4 | NA | 5  | NA | NA | NA | NA | NA | NA | NA  |
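As a quick sanity check (a sketch; held_out is only an illustrative name), every rating that was moved into the test set should now be missing from the training set at the same position:

held_out <- !is.na(as.matrix(BRDataTest))        # positions of the held-out ratings
all(is.na(as.matrix(BRDataTrain)[held_out]))     # expect TRUE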

4. Calculating the raw average (mean) rating for every user-item combination in the training data, and the RMSE of that raw average on the training and test sets.

#Convert data to numeric
BRDataTrain<-sapply(BRDataTrain, as.numeric)
BRDataTest<-sapply(BRDataTest, as.numeric)

RawAverage<-mean(BRDataTrain, na.rm=TRUE)

#Calculating RMSE of the Train set
errortrain <- RawAverage-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)
## [1] 1.05
#Calculating RMSE of the Test set
errortest <- RawAverage-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)
## [1] 1.13
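Both RMSE values follow the usual definition, where $K$ is the set of observed ratings in the relevant matrix and the raw-average predictor uses the same value $\mu$ (the mean of all observed training ratings) for every cell:

$$\mathrm{RMSE}=\sqrt{\frac{1}{|K|}\sum_{(u,i)\in K}\left(\hat{r}_{ui}-r_{ui}\right)^{2}}$$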

5. Using training data, let’s calculate the bias for each user and each item.

# User bias = each reader's mean training rating minus the raw average
UserBias <- round(((rowMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
y <- cbind(BRData, UserBias)
y <- y[-(2:10)]   # keep only the Readers and UserBias columns
y %>% kable(caption = "User Bias Calculations") %>% kable_styling("striped", full_width = TRUE)
User Bias Calculations

| Readers | UserBias |
|---------|----------|
| Lina    |  0.096   |
| Lilya   |  0.296   |
| Mariya  | -0.104   |
| Yuliya  | -0.504   |
| Irene   |  0.153   |
# Book bias = each book's mean training rating minus the raw average
BookBias <- round(((colMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
BookBias %>% kable(caption = "Book Bias Calculations") %>% kable_styling("striped", full_width = TRUE)
Book Bias Calculations

|                                 | x      |
|---------------------------------|--------|
| Wizards_First_Rule              |  0.296 |
| Harry_Potter                    |  0.963 |
| Twilight                        |  0.963 |
| Hitchhikers_Guide_to_the_Galaxy | -0.037 |
| Jonathan_Strange_and_MR_Norrell | -0.370 |
| Master_and_Margarita            |  0.296 |
| Lord_of_the._Rings              | -1.204 |
| Song_of_Ice_and_Fire            | -0.704 |
| Hunger_Games                    | -0.704 |
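In formula form, with $\mu$ the raw average of the observed training ratings, $\bar{r}_{u\cdot}$ a reader's mean rating, and $\bar{r}_{\cdot i}$ a book's mean rating:

$$b_u=\bar{r}_{u\cdot}-\mu,\qquad b_i=\bar{r}_{\cdot i}-\mu$$

As a quick check against the tables above, Lina's five observed training ratings (4, 5, 3, 4, 3) average to 3.8, and the raw average of all 27 observed training ratings is 100/27 ≈ 3.704, so her user bias is about 3.8 - 3.704 ≈ 0.096.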

6. Calculating the baseline predictors for every user-item combination.

#Repeat the book bias across five rows to populate a 5x9 matrix
y <- t(BookBias)
y <- rbind(y, y, y, y, y)
#Repeat the user bias across nine columns to populate a 5x9 matrix
z <- cbind(UserBias, UserBias, UserBias, UserBias, UserBias, UserBias, UserBias, UserBias, UserBias)
#Sum both bias matrices with the raw average to calculate the Baseline Predictor
BRBaseLinePred <- round((z + y + RawAverage), 2)

#Adding Column Names
BookNames <- c("Wizard's First Rule", "Harry Potter", "Twilight", "Hitchhiker's Guide to the Galaxy", "Jonathan Strange", "Master and Margarita", "Lord of the Rings", "Song of Ice and Fire", "Hunger Games")
colnames(BRBaseLinePred) <- BookNames

BRBaseLinePred %>% kable(caption = "Baseline Predictor Calculations") %>% kable_styling("striped", full_width = TRUE)
Baseline Predictor Calculations

| Wizard’s First Rule | Harry Potter | Twilight | Hitchhiker’s Guide to the Galaxy | Jonathan Strange | Master and Margarita | Lord of the Rings | Song of Ice and Fire | Hunger Games |
|---------------------|--------------|----------|----------------------------------|------------------|----------------------|-------------------|----------------------|--------------|
| 4.10 | 4.76 | 4.76 | 3.76 | 3.43 | 4.10 | 2.60 | 3.10 | 3.10 |
| 4.30 | 4.96 | 4.96 | 3.96 | 3.63 | 4.30 | 2.80 | 3.30 | 3.30 |
| 3.90 | 4.56 | 4.56 | 3.56 | 3.23 | 3.90 | 2.40 | 2.90 | 2.90 |
| 3.50 | 4.16 | 4.16 | 3.16 | 2.83 | 3.50 | 2.00 | 2.50 | 2.50 |
| 4.15 | 4.82 | 4.82 | 3.82 | 3.49 | 4.15 | 2.65 | 3.15 | 3.15 |
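Each cell of this table is the baseline predictor

$$\hat{r}_{ui}=\mu+b_u+b_i$$

so, for example, the prediction for Lina and Wizard's First Rule is roughly 3.704 + 0.096 + 0.296 ≈ 4.10, matching the first cell above. The same 5x9 matrix could also be built more compactly with outer() (a sketch; BRBaseLinePredAlt is only an illustrative name):

BRBaseLinePredAlt <- round(RawAverage + outer(UserBias, BookBias, "+"), 2)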

7. Calculating the RMSE for the baseline predictors for both training data and test data.

#Calculating RMSE of the Train set
errortrain <- BRBaseLinePred-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)
## [1] 0.8
#Calculating RMSE of the Test set
errortest <- BRBaseLinePred-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)
## [1] 0.73
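To make the comparison easier to scan, the four RMSE values reported above can be collected into a single small table (a sketch; RMSEComparison is only an illustrative name):

RMSEComparison <- data.frame(Method = c("Raw Average", "Baseline Predictor"),
                             Train_RMSE = c(1.05, 0.80),
                             Test_RMSE = c(1.13, 0.73))
RMSEComparison %>% kable(caption = "RMSE Comparison") %>% kable_styling("striped", full_width = TRUE)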

Summary

The results above show that the Root Mean Square Error (RMSE) is noticeably lower for the baseline predictors than for the raw average: 0.80 versus 1.05 on the training set and 0.73 versus 1.13 on the test set. The baseline predictors are therefore the more accurate of the two methods for predicting user ratings for this list of Fantasy books.

Predictions

#Adding User Names to Results (cbind with a character vector converts the matrix to character)
Users <- c("Lina", "Lilya", "Mariya", "Yuliya", "Irene")
BRBaseLinePred <- cbind(Users, BRBaseLinePred)

BRBaseLinePred %>% kable(caption = "Predictions") %>% kable_styling("striped", full_width = TRUE)
Predictions

| Users  | Wizard’s First Rule | Harry Potter | Twilight | Hitchhiker’s Guide to the Galaxy | Jonathan Strange | Master and Margarita | Lord of the Rings | Song of Ice and Fire | Hunger Games |
|--------|---------------------|--------------|----------|----------------------------------|------------------|----------------------|-------------------|----------------------|--------------|
| Lina   | 4.1  | 4.76 | 4.76 | 3.76 | 3.43 | 4.1  | 2.6  | 3.1  | 3.1  |
| Lilya  | 4.3  | 4.96 | 4.96 | 3.96 | 3.63 | 4.3  | 2.8  | 3.3  | 3.3  |
| Mariya | 3.9  | 4.56 | 4.56 | 3.56 | 3.23 | 3.9  | 2.4  | 2.9  | 2.9  |
| Yuliya | 3.5  | 4.16 | 4.16 | 3.16 | 2.83 | 3.5  | 2    | 2.5  | 2.5  |
| Irene  | 4.15 | 4.82 | 4.82 | 3.82 | 3.49 | 4.15 | 2.65 | 3.15 | 3.15 |
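Finally, to turn these predictions into actual recommendations, one option is to suggest to each reader the unread book with the highest baseline prediction. A minimal sketch (pred, rated, and Recommendation are only illustrative names; the baseline matrix is rebuilt with outer() so that it stays numeric):

pred <- round(RawAverage + outer(UserBias, BookBias, "+"), 2)   # numeric baseline predictions
rated <- !is.na(as.matrix(BRData[, 2:10]))                      # books each reader has already rated
pred[rated] <- NA                                               # consider unread books only
Recommendation <- BookNames[apply(pred, 1, which.max)]
data.frame(Readers = BRData$Readers, Recommendation) %>%
  kable(caption = "Top Recommendation per Reader") %>% kable_styling("striped", full_width = TRUE)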