This system recommends fantasy books to readers. Book recommendation is difficult because of the sheer number of available books, so the system focuses on the fantasy genre specifically. The dataset is a .csv file built from survey results that I asked a few of my friends to fill out. There are 5 users and 9 books. All of the respondents come from the same social group, so their ratings cannot be treated as a random sample.
Libraries:
library(kableExtra)
1. Reading in the Data.
BRData <- read.csv(file="https://raw.githubusercontent.com/che10vek/Data612/master/FantasyBookRatings.csv", header=TRUE, sep=",")
BRData <- BRData[1:5,]
BRData %>% kable(caption = "Fantasy Book Ratings") %>% kable_styling("striped", full_width = TRUE)
Fantasy Book Ratings
| Readers | Wizards_First_Rule | Harry_Potter | Twilight | Hitchhikers_Guide_to_the_Galaxy | Jonathan_Strange_and_MR_Norrell | Master_and_Margarita | Lord_of_the._Rings | Song_of_Ice_and_Fire | Hunger_Games |
|---|---|---|---|---|---|---|---|---|---|
| Lina | 5 | 4 | 5 | 3 | NA | 4 | NA | NA | 3 |
| Lilya | 4 | 5 | 5 | 4 | 5 | 3 | NA | 4 | 3 |
| Mariya | 4 | NA | 5 | NA | 2 | 5 | 2 | NA | 3 |
| Yuliya | NA | 5 | NA | 4 | 2 | 5 | 3 | 1 | 3 |
| Irene | 4 | 5 | 4 | 4 | 3 | 4 | NA | 5 | 3 |
2. Creating a User Item Matrix and a blank dataframe for testing.
BRDataMatrix <- BRData
BRDataTrain <- BRDataMatrix
BRDataTest<-data.frame()[6:10, ]
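3. Splitting the data into Test and Train Matrices.
The Train set was created by picking, for each book, one reader at random and replacing that reader's rating with NA. The removed values were copied into the same positions of a separate Test set. When the selected rating was already missing, nothing is held out for that book, which is why the Test set contains 8 ratings rather than 9: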
set.seed(3)
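#For each column, draw the row (reader) whose rating will be held out
n <- sample(1:5, ncol(BRDataMatrix), replace = TRUE)
#Note: ':' binds tighter than '-', so x is 0:9; for i = 0, n[0] is empty and no rating is selected
x <- c(1:ncol(BRDataMatrix)-1)
for (i in x){
  #Column 1 holds the reader names, so book i sits in column i+1
  BRDataTrain[n[i],(i+1)] <- NA
  BRDataTest[n[i],(i+1)] <- BRDataMatrix[n[i],(i+1)]
}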
BRDataTest <- BRDataTest[1:5,2:10]
BRDataTrain <- BRDataTrain[1:5,2:10]
BRDataTrain %>% kable(caption = "Train Data Set") %>% kable_styling("striped", full_width = TRUE)
Train Data Set
| Wizards_First_Rule | Harry_Potter | Twilight | Hitchhikers_Guide_to_the_Galaxy | Jonathan_Strange_and_MR_Norrell | Master_and_Margarita | Lord_of_the._Rings | Song_of_Ice_and_Fire | Hunger_Games |
|---|---|---|---|---|---|---|---|---|
| NA | 4 | 5 | 3 | NA | 4 | NA | NA | 3 |
| 4 | 5 | NA | NA | 5 | 3 | NA | NA | 3 |
| 4 | NA | 5 | NA | 2 | 5 | 2 | NA | NA |
| NA | 5 | NA | 4 | NA | NA | 3 | 1 | 3 |
| 4 | NA | 4 | 4 | 3 | 4 | NA | 5 | 3 |
BRDataTest %>% kable(caption = "Test Data Set") %>% kable_styling("striped", full_width = TRUE)
Test Data Set
|  | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|---|---|---|---|---|---|---|---|---|---|
| NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA |
| NA.1 | NA | NA | 5 | 4 | NA | NA | NA | 4 | NA |
| NA.2 | NA | NA | NA | NA | NA | NA | NA | NA | 3 |
| NA.3 | NA | NA | NA | NA | 2 | 5 | NA | NA | NA |
| NA.4 | NA | 5 | NA | NA | NA | NA | NA | NA | NA |
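As a point of comparison, the hold-out step can also be written with linear indices so that only observed ratings are ever sampled. This is a minimal sketch, not the split used in this report: the helper `split_ratings`, its `n_test` and `seed` arguments, and the literal `ratings` matrix (typed in from the Fantasy Book Ratings table) are all assumptions made for illustration, and it will not reproduce the exact cells held out above.

```r
# Original ratings typed in from the "Fantasy Book Ratings" table (rows = readers).
ratings <- matrix(
  c( 5,  4,  5,  3, NA,  4, NA, NA,  3,
     4,  5,  5,  4,  5,  3, NA,  4,  3,
     4, NA,  5, NA,  2,  5,  2, NA,  3,
    NA,  5, NA,  4,  2,  5,  3,  1,  3,
     4,  5,  4,  4,  3,  4, NA,  5,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Lina", "Lilya", "Mariya", "Yuliya", "Irene"), NULL)
)

# Hypothetical helper: hold out `n_test` of the observed ratings.
split_ratings <- function(m, n_test, seed = 1) {
  set.seed(seed)
  observed <- which(!is.na(m))          # linear indices of known ratings
  held_out <- sample(observed, n_test)  # never lands on a missing cell
  train <- m
  train[held_out] <- NA                 # hide the held-out ratings
  test <- m
  test[-held_out] <- NA                 # keep only the held-out ratings
  list(train = train, test = test)
}

parts <- split_ratings(ratings, n_test = 8)
sum(!is.na(parts$test))   # 8 held-out ratings
sum(!is.na(parts$train))  # 27 ratings left for training
```

Sampling from the observed cells guarantees the requested number of test ratings, whereas drawing one row per column can select a cell that is already NA.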
4. Calculating the raw average (mean) rating for every user-item combination in my Train data.
#Convert data to numeric
BRDataTrain<-sapply(BRDataTrain, as.numeric)
BRDataTest<-sapply(BRDataTest, as.numeric)
RawAverage<-mean(BRDataTrain, na.rm=TRUE)
#Calculating RMSE of the Train set
errortrain <- RawAverage-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)
## [1] 1.05
#Calculating RMSE of the Test set
errortest <- RawAverage-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)
## [1] 1.13
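For reference, the two quantities computed in this step can be written out as follows. The notation is an assumption introduced here: T is the set of observed (user, book) pairs in the Train set and r_ui is the rating user u gave book i. The sums were tallied by hand from the Train Data Set table, which has 27 observed ratings that add up to 100 and whose squares add up to 400.

```latex
\mu = \frac{1}{|T|}\sum_{(u,i)\in T} r_{ui} = \frac{100}{27} \approx 3.704

\mathrm{RMSE}_{\text{train}}
  = \sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\mu\right)^{2}}
  = \sqrt{\frac{400}{27}-\left(\frac{100}{27}\right)^{2}}
  \approx 1.05
```

The test RMSE uses the same raw average from the Train set, but averages the squared errors over the 8 held-out ratings instead.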
5. Using training data, let’s calculate the bias for each user and each item.
UserBias <- round(((rowMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
y<-cbind(BRData,UserBias)
y <- y[-(2:10)]
y %>% kable(caption = "User Bias Calculations") %>% kable_styling("striped", full_width = TRUE)
User Bias Calculations
| Readers | UserBias |
|---|---|
| Lina | 0.096 |
| Lilya | 0.296 |
| Mariya | -0.104 |
| Yuliya | -0.504 |
| Irene | 0.153 |
BookBias <- round(((colMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
BookBias %>% kable(caption = "Book Bias Calculations") %>% kable_styling("striped", full_width = TRUE)
Book Bias Calculations
|  | x |
|---|---|
| Wizards_First_Rule | 0.296 |
| Harry_Potter | 0.963 |
| Twilight | 0.963 |
| Hitchhikers_Guide_to_the_Galaxy | -0.037 |
| Jonathan_Strange_and_MR_Norrell | -0.370 |
| Master_and_Margarita | 0.296 |
| Lord_of_the._Rings | -1.204 |
| Song_of_Ice_and_Fire | -0.704 |
| Hunger_Games | -0.704 |
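To make the bias numbers easy to check without the CSV, here is a small self-contained sketch that recomputes them from a literal copy of the Train Data Set table. The object names `train`, `raw_average`, `user_bias` and `book_bias` are chosen for this example and are not the ones used in the report.

```r
# Training ratings typed in from the "Train Data Set" table (rows = readers).
train <- matrix(
  c(NA,  4,  5,  3, NA,  4, NA, NA,  3,
     4,  5, NA, NA,  5,  3, NA, NA,  3,
     4, NA,  5, NA,  2,  5,  2, NA, NA,
    NA,  5, NA,  4, NA, NA,  3,  1,  3,
     4, NA,  4,  4,  3,  4, NA,  5,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Lina", "Lilya", "Mariya", "Yuliya", "Irene"),
    c("Wizards_First_Rule", "Harry_Potter", "Twilight",
      "Hitchhikers_Guide_to_the_Galaxy", "Jonathan_Strange_and_MR_Norrell",
      "Master_and_Margarita", "Lord_of_the._Rings",
      "Song_of_Ice_and_Fire", "Hunger_Games")
  )
)

raw_average <- mean(train, na.rm = TRUE)  # 100 / 27

# Bias = how far a reader's (or a book's) mean rating sits from the overall mean.
user_bias <- round(rowMeans(train, na.rm = TRUE) - raw_average, 3)
book_bias <- round(colMeans(train, na.rm = TRUE) - raw_average, 3)

user_bias[["Lina"]]                # mean of 4, 5, 3, 4, 3 is 3.8; 3.8 - 3.704 = 0.096
book_bias[["Lord_of_the._Rings"]]  # mean of 2, 3 is 2.5; 2.5 - 3.704 = -1.204
```

A positive user bias marks a reader who rates more generously than the group, and a negative book bias marks a book rated below the overall average.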
6. Calculating the baseline predictors for every user-item combination.
#Duplicate book bias (one row per user) to populate a 5x9 matrix
y<-t(BookBias)
y<-rbind(y,y,y,y,y)
#Duplicate user bias (one column per book) to populate a 5x9 matrix
z<-cbind(UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias)
#Sum both bias matrices with raw average to calculate Baseline Predictor
BRBaseLinePred <- round((z+y+RawAverage),2)
#Adding Column Names
BookNames <- c("Wizard's First Rule", "Harry Potter", "Twilight", "Hitchhiker's Guide to the Galaxy", "Jonathan Strange", "Master and Margarita", "Lord of the Rings", "Song of Ice and Fire", "Hunger Games")
colnames(BRBaseLinePred) <- BookNames
BRBaseLinePred %>% kable(caption = "Baseline Predictor Calculations") %>% kable_styling("striped", full_width = TRUE)
Baseline Predictor Calculations
| Wizard’s First Rule | Harry Potter | Twilight | Hitchhiker’s Guide to the Galaxy | Jonathan Strange | Master and Margarita | Lord of the Rings | Song of Ice and Fire | Hunger Games |
|---|---|---|---|---|---|---|---|---|
| 4.10 | 4.76 | 4.76 | 3.76 | 3.43 | 4.10 | 2.60 | 3.10 | 3.10 |
| 4.30 | 4.96 | 4.96 | 3.96 | 3.63 | 4.30 | 2.80 | 3.30 | 3.30 |
| 3.90 | 4.56 | 4.56 | 3.56 | 3.23 | 3.90 | 2.40 | 2.90 | 2.90 |
| 3.50 | 4.16 | 4.16 | 3.16 | 2.83 | 3.50 | 2.00 | 2.50 | 2.50 |
| 4.15 | 4.82 | 4.82 | 3.82 | 3.49 | 4.15 | 2.65 | 3.15 | 3.15 |
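The same 5x9 grid can be built without stacking copies of the bias vectors by hand. The sketch below uses base R's `outer()` and is offered as one possible alternative, not as the method used in this report. The bias values are typed in from the User Bias and Book Bias tables, the raw average is taken as 100/27 (27 training ratings summing to 100), and the names `raw_average`, `user_bias`, `book_bias` and `baseline` are chosen for this example.

```r
# Inputs copied from the bias tables; raw_average is 100 / 27 (about 3.704).
raw_average <- 100 / 27
user_bias <- c(Lina = 0.096, Lilya = 0.296, Mariya = -0.104,
               Yuliya = -0.504, Irene = 0.153)
book_bias <- c(Wizards_First_Rule = 0.296, Harry_Potter = 0.963,
               Twilight = 0.963, Hitchhikers_Guide_to_the_Galaxy = -0.037,
               Jonathan_Strange_and_MR_Norrell = -0.370,
               Master_and_Margarita = 0.296, Lord_of_the._Rings = -1.204,
               Song_of_Ice_and_Fire = -0.704, Hunger_Games = -0.704)

# outer() adds every user bias to every book bias in one step, giving a
# 5 x 9 matrix with readers as rows and books as columns.
baseline <- round(raw_average + outer(user_bias, book_bias, "+"), 2)

baseline["Lina", "Wizards_First_Rule"]    # 3.704 + 0.096 + 0.296, shown as 4.10 in the table
baseline["Yuliya", "Lord_of_the._Rings"]  # 3.704 - 0.504 - 1.204, shown as 2.00 in the table
```

Because `outer()` carries the vector names over as row and column names, individual predictions can be looked up by reader and book rather than by position.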
7. Calculating the RMSE for the baseline predictors for both training data and test data.
#Calculating RMSE of the Train set
errortrain <- BRBaseLinePred-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)
## [1] 0.8
#Calculating RMSE of the Test set
errortest <- BRBaseLinePred-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)
## [1] 0.73
Summary
The results above show that the Root Mean Square Error (RMSE) is lower with Baseline Predictors than with the Raw Average: it falls from 1.05 to 0.80 on the Train set and from 1.13 to 0.73 on the Test set. On this data, Baseline Predictors are therefore the more accurate method of predicting user ratings for the Fantasy Book list.
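To put a number on the size of the improvement, the reported RMSE values can be turned into a relative reduction. This is a small arithmetic sketch that uses only the four rounded figures printed above; the data frame `rmse` and its column names are made up for this example. With only 8 held-out ratings, the Test figure should be read as a rough indication.

```r
# RMSE values as printed in the output above.
rmse <- data.frame(
  set      = c("train", "test"),
  raw      = c(1.05, 1.13),
  baseline = c(0.80, 0.73)
)

# Relative reduction in RMSE, as a percentage of the raw-average RMSE.
rmse$reduction_pct <- round(100 * (rmse$raw - rmse$baseline) / rmse$raw, 1)
rmse$reduction_pct  # 23.8 for train, 35.4 for test
```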
Predictions
#Adding User Names to Results
Users <- c("Lina", "Lilya", "Mariya", "Yulia", "Irene")
BRBaseLinePred<-cbind(Users,BRBaseLinePred)
BRBaseLinePred %>% kable(caption = "Predictions") %>% kable_styling("striped", full_width = TRUE)
Predictions
| Users | Wizard’s First Rule | Harry Potter | Twilight | Hitchhiker’s Guide to the Galaxy | Jonathan Strange | Master and Margarita | Lord of the Rings | Song of Ice and Fire | Hunger Games |
|---|---|---|---|---|---|---|---|---|---|
| Lina | 4.1 | 4.76 | 4.76 | 3.76 | 3.43 | 4.1 | 2.6 | 3.1 | 3.1 |
| Lilya | 4.3 | 4.96 | 4.96 | 3.96 | 3.63 | 4.3 | 2.8 | 3.3 | 3.3 |
| Mariya | 3.9 | 4.56 | 4.56 | 3.56 | 3.23 | 3.9 | 2.4 | 2.9 | 2.9 |
| Yulia | 3.5 | 4.16 | 4.16 | 3.16 | 2.83 | 3.5 | 2 | 2.5 | 2.5 |
| Irene | 4.15 | 4.82 | 4.82 | 3.82 | 3.49 | 4.15 | 2.65 | 3.15 | 3.15 |
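One way these predictions could be turned into actual recommendations is to rank, for a given reader, only the books she has not rated yet. The sketch below does this for Lina. The helper `recommend` and its argument `n` are hypothetical and not part of the analysis above, and the two vectors are typed in from the Fantasy Book Ratings and Predictions tables.

```r
# Lina's original ratings and baseline predictions, copied from the tables above.
books <- c("Wizard's First Rule", "Harry Potter", "Twilight",
           "Hitchhiker's Guide to the Galaxy", "Jonathan Strange",
           "Master and Margarita", "Lord of the Rings",
           "Song of Ice and Fire", "Hunger Games")
lina_rated <- setNames(c(5, 4, 5, 3, NA, 4, NA, NA, 3), books)
lina_pred  <- setNames(c(4.10, 4.76, 4.76, 3.76, 3.43, 4.10, 2.60, 3.10, 3.10), books)

# Hypothetical helper: rank the books a reader has not rated by predicted rating.
recommend <- function(pred, rated, n = 3) {
  unread <- pred[is.na(rated)]              # drop books the reader already rated
  head(sort(unread, decreasing = TRUE), n)  # highest predicted rating first
}

top2 <- recommend(lina_pred, lina_rated, n = 2)
names(top2)  # "Jonathan Strange" (3.43), then "Song of Ice and Fire" (3.10)
```

Under the baseline model every reader's ranking of the books is the same, since the user bias only shifts a whole row up or down. The lists differ between readers only through which books each has already rated.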