1. Reading in the Data.
2. Creating a User Item Matrix and a blank dataframe for testing.
3. Splitting the data into Test and Train Matrixes.
4. Calculating the raw average (mean) rating for every user-item combination in my Train data.
5. Using training data, let’s calculate the bias for each user and each item.
6. Calculating the baseline predictors for every user-item combination.
7. Calculating the RMSE for the baseline predictors for both training data and test data.
Summary
Predictions

This system will recommend Fantasy Books to readers. Book recommendations are difficult because of the large number of available books, so the system will focus on Fantasy books specifically. The dataset I am using is a .csv file that was built from survey results I asked a few of my friends to fill out. There are 5 users and 9 books, and I want to acknowledge that the respondents all come from the same social group, so their results cannot be classified as random.

Libraries:

library(kableExtra)

1. Reading in the Data.

BRData <- read.csv(file="https://raw.githubusercontent.com/che10vek/Data612/master/FantasyBookRatings.csv", header=TRUE, sep=",")
BRData <- BRData[1:5,]
BRData %>% kable(caption = "Fantasy Book Ratings") %>% kable_styling("striped", full_width = TRUE)

Fantasy Book Ratings
Readers	Wizards_First_Rule	Harry_Potter	Twilight	Hitchhikers_Guide_to_the_Galaxy	Jonathan_Strange_and_MR_Norrell	Master_and_Margarita	Lord_of_the._Rings	Song_of_Ice_and_Fire	Hunger_Games
Lina	5	4	5	3	NA	4	NA	NA	3
Lilya	4	5	5	4	5	3	NA	4	3
Mariya	4	NA	5	NA	2	5	2	NA	3
Yuliya	NA	5	NA	4	2	5	3	1	3
Irene	4	5	4	4	3	4	NA	5	3

2. Creating a User Item Matrix and a blank dataframe for testing.

BRDataMatrix <- BRData
BRDataTrain <- BRDataMatrix
BRDataTest<-data.frame()[6:10, ]

3. Splitting the data into Test and Train Matrixes.

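The Train matrix was created by randomly selecting one reader (row) for each book column and replacing that rating with “NA”; the Test data set was created by putting those removed values into a new matrix. If the selected position was already “NA” (as happened for Lord of the Rings), nothing is held out for that book: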

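set.seed(3)
# Draw one random row index (1-5) per column of BRDataMatrix
n <- sample(1:5,ncol(BRDataMatrix),replace=TRUE)
# Note: `:` binds tighter than `-`, so this evaluates to 0:9, not 1:9
x <- c(1:ncol(BRDataMatrix)-1)
for (i in x){
# Remove the selected rating from Train and copy it into Test
BRDataTrain[n[i],(i+1)]<-NA
BRDataTest[n[i],(i+1)]<-BRDataMatrix[n[i],(i+1)]
}
# Drop the Readers column, keeping only the 9 book columns
BRDataTest <- BRDataTest[1:5,2:10]
BRDataTrain <- BRDataTrain[1:5,2:10]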

BRDataTrain %>% kable(caption = "Train Data Set") %>% kable_styling("striped", full_width = TRUE)

Train Data Set
Wizards_First_Rule	Harry_Potter	Twilight	Hitchhikers_Guide_to_the_Galaxy	Jonathan_Strange_and_MR_Norrell	Master_and_Margarita	Lord_of_the._Rings	Song_of_Ice_and_Fire	Hunger_Games
NA	4	5	3	NA	4	NA	NA	3
4	5	NA	NA	5	3	NA	NA	3
4	NA	5	NA	2	5	2	NA	NA
NA	5	NA	4	NA	NA	3	1	3
4	NA	4	4	3	4	NA	5	3

BRDataTest %>% kable(caption = "Test Data Set") %>% kable_styling("striped", full_width = TRUE)

Test Data Set
	V2	V3	V4	V5	V6	V7	V8	V9	V10
NA	5	NA	NA	NA	NA	NA	NA	NA	NA
NA.1	NA	NA	5	4	NA	NA	NA	4	NA
NA.2	NA	NA	NA	NA	NA	NA	NA	NA	3
NA.3	NA	NA	NA	NA	2	5	NA	NA	NA
NA.4	NA	5	NA	NA	NA	NA	NA	NA	NA
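As a point of comparison, here is a minimal sketch of a variation on this split that only samples among ratings that are actually present, so every book with at least two ratings contributes exactly one test value. This is not the method used above; the helper name holdout_one_per_column and the toy matrix are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: hold out one observed rating per column of a numeric matrix.
holdout_one_per_column <- function(ratings, seed = 3) {
  set.seed(seed)
  train <- ratings
  test <- matrix(NA_real_, nrow(ratings), ncol(ratings),
                 dimnames = dimnames(ratings))
  for (j in seq_len(ncol(ratings))) {
    observed <- which(!is.na(ratings[, j]))
    # Skip columns that would be left with no training rating
    if (length(observed) < 2) next
    # sample.int avoids the sample(x, 1) pitfall when x has length 1
    i <- observed[sample.int(length(observed), 1)]
    test[i, j] <- ratings[i, j]
    train[i, j] <- NA
  }
  list(train = train, test = test)
}

# Toy 3x3 ratings matrix (made-up values, filled column by column)
toy <- matrix(c(5, 4, NA, 3, NA, 2, 4, 4, 5), nrow = 3,
              dimnames = list(c("u1", "u2", "u3"), c("b1", "b2", "b3")))
parts <- holdout_one_per_column(toy)
parts$train
parts$test
```

Working on a numeric matrix rather than a data frame also avoids the placeholder row and column names (NA.1, V2, ...) that appear in the Test table above.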

4. Calculating the raw average (mean) rating for every user-item combination in my Train data.

#Convert data to numeric
BRDataTrain<-sapply(BRDataTrain, as.numeric)
BRDataTest<-sapply(BRDataTest, as.numeric)

RawAverage<-mean(BRDataTrain, na.rm=TRUE)

#Calculating RMSE of the Train set
errortrain <- RawAverage-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)

## [1] 1.05

#Calculating RMSE of the Test set
errortest <- RawAverage-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)

## [1] 1.13
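In symbols, this step can be sketched as follows. The notation is introduced here for illustration and does not appear in the original code: r_ui is the rating reader u gave book i, K_train is the set of observed positions in the Train matrix, and K is whichever set (Train or Test) the error is measured on. The Train table contains 27 observed ratings that sum to 100, which gives the raw average.

```latex
\mu = \frac{1}{|K_{\mathrm{train}}|} \sum_{(u,i) \in K_{\mathrm{train}}} r_{ui}
    = \frac{100}{27} \approx 3.70
\qquad
\mathrm{RMSE}(K) = \sqrt{\frac{1}{|K|} \sum_{(u,i) \in K} \left( \mu - r_{ui} \right)^{2}}
```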

5. Using training data, let’s calculate the bias for each user and each item.

UserBias <- round(((rowMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
y<-cbind(BRData,UserBias)
y <- y[-(2:10)]
y %>% kable(caption = "User Bias Calculations") %>% kable_styling("striped", full_width = TRUE)

User Bias Calculations
Readers	UserBias
Lina	0.096
Lilya	0.296
Mariya	-0.104
Yuliya	-0.504
Irene	0.153

BookBias <- round(((colMeans(BRDataTrain, na.rm=TRUE))-RawAverage),3)
BookBias %>% kable(caption = "Book Bias Calculations") %>% kable_styling("striped", full_width = TRUE)

Book Bias Calculations
	x
Wizards_First_Rule	0.296
Harry_Potter	0.963
Twilight	0.963
Hitchhikers_Guide_to_the_Galaxy	-0.037
Jonathan_Strange_and_MR_Norrell	-0.370
Master_and_Margarita	0.296
Lord_of_the._Rings	-1.204
Song_of_Ice_and_Fire	-0.704
Hunger_Games	-0.704
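Written out, each bias is a mean over the observed training ratings minus the raw average. The symbols below are illustrative notation, not names from the code: b_u is the bias of reader u, b_i the bias of book i, and the bars denote the row and column means of the Train matrix with missing values ignored. Two entries of the tables above serve as a check.

```latex
b_u = \bar{r}_u - \mu, \qquad b_i = \bar{r}_i - \mu

b_{\mathrm{Lina}} = \frac{19}{5} - \frac{100}{27} \approx 0.096,
\qquad
b_{\mathrm{Lord\ of\ the\ Rings}} = \frac{2 + 3}{2} - \frac{100}{27} \approx -1.204
```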

6. Calculating the baseline predictors for every user-item combination.

#Duplicate book bias to populate a 5x9 matrix 
y<-t(BookBias)
y<-rbind(y,y,y,y,y)
#Duplicate user bias to populate a 5x9 matrix 
z<-cbind(UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias,UserBias)
#Sum both bias matrixes with raw average to calculate Baseline Predictor
BRBaseLinePred=round((z+y+RawAverage),2)

#Adding Column Names
BookNames <- c("Wizard's First Rule", "Harry Potter", "Twilight", "Hitchhiker's Guide to the Galaxy", "Jonathan Strange", "Master and Margarita", "Lord of the Rings", "Song of Ice and Fire", "Hunger Games")
colnames(BRBaseLinePred) <- BookNames

BRBaseLinePred %>% kable(caption = "Baseline Predictor Calculations") %>% kable_styling("striped", full_width = TRUE)

Baseline Predictor Calculations
Wizard’s First Rule	Harry Potter	Twilight	Hitchhiker’s Guide to the Galaxy	Jonathan Strange	Master and Margarita	Lord of the Rings	Song of Ice and Fire	Hunger Games
4.10	4.76	4.76	3.76	3.43	4.10	2.60	3.10	3.10
4.30	4.96	4.96	3.96	3.63	4.30	2.80	3.30	3.30
3.90	4.56	4.56	3.56	3.23	3.90	2.40	2.90	2.90
3.50	4.16	4.16	3.16	2.83	3.50	2.00	2.50	2.50
4.15	4.82	4.82	3.82	3.49	4.15	2.65	3.15	3.15
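Each entry in the table is the raw average plus the reader's bias plus the book's bias. As an alternative to stacking copies with rbind and cbind, the same sums can be sketched with base R's outer(). The bias values below are copied from the tables above, the name BaselineSketch is hypothetical, and the final clipping to the 1-5 rating scale is an added assumption that is not part of the original calculation (it changes nothing for these values).

```r
# Raw average of the Train data: 27 observed ratings summing to 100
RawAverage <- 100 / 27

# Bias values copied from the User Bias and Book Bias tables
UserBias <- c(Lina = 0.096, Lilya = 0.296, Mariya = -0.104,
              Yuliya = -0.504, Irene = 0.153)
BookBias <- c(Wizards_First_Rule = 0.296, Harry_Potter = 0.963,
              Twilight = 0.963, Hitchhikers_Guide_to_the_Galaxy = -0.037,
              Jonathan_Strange_and_MR_Norrell = -0.370,
              Master_and_Margarita = 0.296, Lord_of_the._Rings = -1.204,
              Song_of_Ice_and_Fire = -0.704, Hunger_Games = -0.704)

# outer() builds the 5x9 matrix of UserBias[u] + BookBias[i] in one call
BaselineSketch <- round(RawAverage + outer(UserBias, BookBias, "+"), 2)

# Assumption: keep predictions inside the 1-5 rating scale
BaselineSketch <- pmin(pmax(BaselineSketch, 1), 5)

BaselineSketch["Lina", "Harry_Potter"]
BaselineSketch["Yuliya", "Lord_of_the._Rings"]
```

Because the vectors are named, the result carries reader and book names as row and column names without a separate labelling step.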

7. Calculating the RMSE for the baseline predictors for both training data and test data.

#Calculating RMSE of the Train set
errortrain <- BRBaseLinePred-BRDataTrain
RMSETrain <- sqrt(mean((errortrain^2), na.rm=TRUE))
round(RMSETrain,2)

## [1] 0.8

#Calculating RMSE of the Test set
errortest <- BRBaseLinePred-BRDataTest
RMSETest <- sqrt(mean((errortest^2), na.rm=TRUE))
round(RMSETest,2)

## [1] 0.73
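The same RMSE expression is used four times in this report, so it could be wrapped in a small function. The sketch below uses a hypothetical helper named rmse and checks it against the test-set results, with the 8 held-out ratings read row by row from the Test Data Set table and the matching predictions read from the Baseline Predictor Calculations table.

```r
# Hypothetical helper: RMSE over the positions where both values are present
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2, na.rm = TRUE))
}

# Held-out ratings from the Test Data Set table, row by row
actual <- c(5, 5, 4, 4, 3, 2, 5, 5)
# Baseline predictions for the same reader-book positions
baseline <- c(4.10, 4.96, 3.96, 3.30, 2.90, 2.83, 3.50, 4.82)
# Raw average of the Train data: 27 observed ratings summing to 100
raw_average <- 100 / 27

round(rmse(raw_average, actual), 2)  # 1.13
round(rmse(baseline, actual), 2)     # 0.73
```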

Summary

The results above show that Root Mean Square Error (RMSE) is lower when we use Baseline Predictors rather than the Raw Average: it drops from 1.05 to 0.80 on the training data and from 1.13 to 0.73 on the test data. On this small data set, Baseline Predictors are therefore the more accurate method of predicting user ratings for the Fantasy Book list.

Predictions

#Adding User Names to Results
Users <- c("Lina", "Lilya", "Mariya", "Yuliya", "Irene")
BRBaseLinePred<-cbind(Users,BRBaseLinePred)

BRBaseLinePred %>% kable(caption = "Predictions") %>% kable_styling("striped", full_width = TRUE)

Predictions
Users	Wizard’s First Rule	Harry Potter	Twilight	Hitchhiker’s Guide to the Galaxy	Jonathan Strange	Master and Margarita	Lord of the Rings	Song of Ice and Fire	Hunger Games
Lina	4.1	4.76	4.76	3.76	3.43	4.1	2.6	3.1	3.1
Lilya	4.3	4.96	4.96	3.96	3.63	4.3	2.8	3.3	3.3
Mariya	3.9	4.56	4.56	3.56	3.23	3.9	2.4	2.9	2.9
Yuliya	3.5	4.16	4.16	3.16	2.83	3.5	2	2.5	2.5
Irene	4.15	4.82	4.82	3.82	3.49	4.15	2.65	3.15	3.15

Elina Azrilyan - Data 612 - Project 1