To predict a rating for each user/president pair, we first calculated the mean of all ratings in the training set. We measured how generous or harsh each rater was as the difference between that user's average rating and the overall average (the user bias). In the same way, we calculated each president's bias as the difference between that president's average rating and the overall average. The predicted rating is the sum of the overall average, the user bias, and the president bias.
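Written out as a formula, the predictor described above looks as follows. The symbols are introduced here only for illustration and do not come from the original analysis: $\mu$ is the overall training mean, $\bar{r}_{u\cdot}$ is the mean rating given by user $u$, $\bar{r}_{\cdot i}$ is the mean rating received by president $i$, and $\hat{r}_{ui}$ is the predicted rating.

$$
b_u = \bar{r}_{u\cdot} - \mu, \qquad
b_i = \bar{r}_{\cdot i} - \mu, \qquad
\hat{r}_{ui} = \mu + b_u + b_i
$$

For user 1 and Ronald Reagan, using the values reported in this document, this gives 3.030612 + 0.2193878 - 0.3306122 = 2.919388.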
## Ronald Reagan Abraham Lincoln John Quincy Adams Gerald Ford John F Kennedy Franklin Pierce Herbert Hoover
## 1 NA 5 NA 3 NA NA NA
## 2 2 NA 3.0 3 2.0 NA NA
## 3 3 NA 1.5 NA 3.5 1 1
## 4 NA 5 NA NA NA NA NA
## 5 3 4 2.0 2 4.0 NA NA
## 6 4 5 NA 4 4.0 3 NA
## 7 NA 4 4.0 NA 5.0 3 4
## 8 3 5 1.0 2 NA NA 1
## 9 NA 5 2.0 2 2.0 1 NA
## 10 2 5 4.0 NA 3.0 NA NA
## 11 2 NA NA 3 3.0 NA NA
## 12 NA NA 4.0 3 3.0 2 NA
## 13 2 5 4.0 2 NA NA NA
## 14 3 5 3.0 NA NA NA 1
## 15 3 5 3.0 3 4.0 NA NA
## William McKinley Andrew Jackson Thomas Jefferson
## 1 NA 1 4
## 2 2 1 4
## 3 2 1 3
## 4 4 1 3
## 5 NA 1 5
## 6 NA 4 4
## 7 2 NA 4
## 8 3 NA 4
## 9 3 3 4
## 10 NA 1 5
## 11 2 3 NA
## 12 2 1 4
## 13 2 2 5
## 14 4 5 NA
## 15 2 2 5
The mean rating for the training set, by itself, is
## [1] 3.030612
The root mean square error for the training set, by itself, is
## [1] 1.61641
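The mean rating for each column (president) and each row (user) in the training set is shown below. There are only 10 presidents, so column entries 11 to 15 are padding rather than real means.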
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
## columns 2.70 4.818182 2.863636 2.70 3.35 2 1.750000 2.545455 2.00 4.153846 0.0 0.000000 0.000000 0.0 0.000
## rows 3.25 2.428571 2.000000 3.25 3.00 4 3.714286 2.714286 2.75 3.333333 2.6 2.714286 3.142857 3.5 3.375
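The bias for each column (president) and each row (user) in the training set, that is, each mean minus the overall mean, is shown below. Column entries 11 to 15 are NA because there are only 10 presidents.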
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## columns -0.3306122 1.7875696 -0.1669759 -0.3306122 0.31938776 -1.0306122 -1.2806122 -0.4851577 -1.0306122 1.1232339
## rows 0.2193878 -0.6020408 -1.0306122 0.2193878 -0.03061224 0.9693878 0.6836735 -0.3163265 -0.2806122 0.3027211
## [,11] [,12] [,13] [,14] [,15]
## columns NA NA NA NA NA
## rows -0.4306122 -0.3163265 0.1122449 0.4693878 0.3443878
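The means and biases above can be computed along the following lines. This is a minimal sketch on a made-up 3 x 3 matrix, not the survey data, and the variable names (`ratings`, `user_bias`, `president_bias`) are assumptions chosen for illustration.

```r
# Toy ratings matrix: rows are users, columns are presidents, NA = not rated.
# These values are invented for illustration only.
ratings <- matrix(
  c( 5, NA,  3,
     4,  2, NA,
    NA,  1,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("A", "B", "C"))
)

# Overall mean over the observed (non-NA) entries only.
overall_mean <- mean(ratings, na.rm = TRUE)

# Per-user (row) and per-president (column) means, ignoring NAs.
user_means      <- rowMeans(ratings, na.rm = TRUE)
president_means <- colMeans(ratings, na.rm = TRUE)

# Bias = how far each mean sits from the overall mean.
user_bias      <- user_means - overall_mean
president_bias <- president_means - overall_mean

print(overall_mean)
print(user_bias)
print(president_bias)
```

Keeping the row and column results in two separate named vectors, rather than binding them into one matrix, avoids the padded entries that appear in columns 11 to 15 of the output above.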
The baseline predictors for each user/president pair are
## Ronald Reagan Abraham Lincoln John Quincy Adams Gerald Ford John F Kennedy Franklin Pierce Herbert Hoover
## 1 2.919388 5.037570 3.083024 2.919388 3.569388 2.2193878 1.9693878
## 2 2.097959 4.216141 2.261596 2.097959 2.747959 1.3979592 1.1479592
## 3 1.669388 3.787570 1.833024 1.669388 2.319388 0.9693878 0.7193878
## 4 2.919388 5.037570 3.083024 2.919388 3.569388 2.2193878 1.9693878
## 5 2.669388 4.787570 2.833024 2.669388 3.319388 1.9693878 1.7193878
## 6 3.669388 5.787570 3.833024 3.669388 4.319388 2.9693878 2.7193878
## 7 3.383673 5.501855 3.547310 3.383673 4.033673 2.6836735 2.4336735
## 8 2.383673 4.501855 2.547310 2.383673 3.033673 1.6836735 1.4336735
## 9 2.419388 4.537570 2.583024 2.419388 3.069388 1.7193878 1.4693878
## 10 3.002721 5.120903 3.166357 3.002721 3.652721 2.3027211 2.0527211
## 11 2.269388 4.387570 2.433024 2.269388 2.919388 1.5693878 1.3193878
## 12 2.383673 4.501855 2.547310 2.383673 3.033673 1.6836735 1.4336735
## 13 2.812245 4.930427 2.975881 2.812245 3.462245 2.1122449 1.8622449
## 14 3.169388 5.287570 3.333024 3.169388 3.819388 2.4693878 2.2193878
## 15 3.044388 5.162570 3.208024 3.044388 3.694388 2.3443878 2.0943878
## William McKinley Andrew Jackson Thomas Jefferson
## 1 2.764842 2.2193878 4.373234
## 2 1.943414 1.3979592 3.551805
## 3 1.514842 0.9693878 3.123234
## 4 2.764842 2.2193878 4.373234
## 5 2.514842 1.9693878 4.123234
## 6 3.514842 2.9693878 5.123234
## 7 3.229128 2.6836735 4.837520
## 8 2.229128 1.6836735 3.837520
## 9 2.264842 1.7193878 3.873234
## 10 2.848176 2.3027211 4.456567
## 11 2.114842 1.5693878 3.723234
## 12 2.229128 1.6836735 3.837520
## 13 2.657699 2.1122449 4.266091
## 14 3.014842 2.4693878 4.623234
## 15 2.889842 2.3443878 4.498234
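A table like the one above can be built in a single step with `outer()`. The sketch below uses small made-up bias vectors rather than the survey values. The clamping step at the end is not part of the analysis described here: it is an optional addition that assumes a 1 to 5 rating scale, shown because a purely additive predictor can fall outside that range (for example 5.787570 and 0.7193878 in the table above).

```r
# Hypothetical inputs, invented for illustration only.
overall_mean   <- 3
user_bias      <- c(u1 = 1, u2 = 0, u3 = -1)
president_bias <- c(A = 1.5, B = -1.5, C = 0)

# outer() adds every user bias to every president bias, giving a
# users x presidents matrix; adding the overall mean completes the predictor.
baseline <- overall_mean + outer(user_bias, president_bias, "+")

# Optional, and an assumption: clamp predictions to a 1..5 rating scale.
baseline_clamped <- pmin(pmax(baseline, 1), 5)

print(baseline)
print(baseline_clamped)
```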
The root mean square error of the baseline predictor on the training set is
## [1] 1.130327
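For reference, a root mean square error over a sparse ratings matrix can be computed as sketched below. The helper name `rmse` and the toy matrices are assumptions for illustration; the code behind the figures in this document is not shown, so this is not necessarily how they were produced.

```r
# RMSE over the cells where a rating was actually observed.
# NA cells in `actual` yield NA differences, which na.rm = TRUE drops.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Toy data, invented for illustration only.
actual    <- matrix(c(5, NA, 3,
                      4,  2, NA), nrow = 2, byrow = TRUE)
predicted <- matrix(c(4,  3, 3,
                      4,  4, 1), nrow = 2, byrow = TRUE)

result <- rmse(actual, predicted)
print(result)
```

The square root is the last step of the function. Leaving it out gives the mean squared error instead, which is on a different scale and should not be compared directly with an RMSE.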
After predicting a rating for each user/president pair, we evaluate the predictions on our test data. The root mean square error of the test data, without using a prediction model, is
## [1] 1.941389
The matrix for our testing data is:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## 1 2 NA NA NA NA NA NA NA NA NA
## 2 NA 4.0 NA NA NA NA 1 NA NA NA
## 3 NA 4.5 NA 1 NA NA NA NA NA NA
## 4 3 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA NA
## 6 NA NA 2 NA NA NA 4 2 NA NA
## 7 1 NA NA 5 NA NA NA NA 4 NA
## 8 NA NA NA NA NA NA NA NA 3 NA
## 9 3 NA NA NA NA NA 2 NA NA NA
## 10 NA NA NA NA NA NA NA NA NA NA
## 11 NA 5.0 NA NA NA 1 1 NA NA 4
## 12 3 5.0 NA NA NA NA 1 NA NA NA
## 13 NA NA NA NA 3 NA 3 NA NA NA
## 14 NA NA NA 3 4 3 NA NA NA 5
## 15 NA NA NA NA NA 1 1 NA NA NA
The root mean square error for the testing data, taking the recommender model into account, is
## [1] 0.9413736
The ratio of the raw RMSE to the modeled RMSE is:
## [1] 2.062294
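The comparison between the raw and modeled errors can be sketched as follows. All values are made up. The sketch also assumes that "raw" means predicting one constant, here a hypothetical training mean of 3, for every test rating. The document does not say how the raw figure was computed, so treat that choice as an assumption.

```r
# RMSE over the observed (non-NA) test cells.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Hypothetical training results, invented for illustration only.
overall_mean   <- 3
user_bias      <- c(1, 0, -1)
president_bias <- c(1.5, -1.5, 0)
baseline <- overall_mean + outer(user_bias, president_bias, "+")

# Toy held-out ratings: only three cells are observed.
test <- matrix(c(NA,  2, NA,
                  5, NA, NA,
                 NA, NA,  2), nrow = 3, byrow = TRUE)

# Raw error: predict the overall mean everywhere (assumption, see above).
rmse_raw <- rmse(test, overall_mean)

# Modeled error: predict with the baseline matrix.
rmse_model <- rmse(test, baseline)

# A ratio above 1 means the model reduced the error.
ratio <- rmse_raw / rmse_model
print(c(raw = rmse_raw, model = rmse_model, ratio = ratio))
```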
Our model reduced the root mean square error on the test data to roughly half of its value without a model (a ratio of about 2.06).