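- Package overview
- Data Preparation
- Data Cleaning & Exploration
- Recommenderlab
- Data processing
- recommender
- Overview of recommendation methods
- Parameter meanings, using UBCF as an example
- Building a recommendation model, using UBCF as an example
- Evaluating the recommender system
- Evaluating the prediction model
- Evaluating the recommendation results
- Reference
- IBCF code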
2017-03-03
The recommenderlab package stores its data in S4 classes and uses the abstract ratingMatrix class to represent the data to be analyzed.
The image() function is dedicated to drawing heatmaps.
GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities, specializing in recommender systems, online communities, mobile and ubiquitous technologies, digital libraries, and local geographic information systems.
# u.data holds one rating per row: user id, item id, rating, timestamp
movie <- read.table("u.data", header = FALSE, stringsAsFactors = TRUE)
head(movie)
   V1  V2 V3        V4
1 196 242  3 881250949
2 186 302  3 891717742
3  22 377  1 878887116
4 244  51  2 880606923
5 166 346  1 886397596
6 298 474  4 884182806
library(dplyr)
library(tidyr)
# Keep user, item, and rating; spread the items into columns (one row per user); drop the user id column
temp = movie %>% select(1:3) %>% spread(V2, V3) %>% select(-1)
temp[1:10, 1:10]
    1  2  3  4  5  6  7  8  9 10
1   5  3  4  3  3  5  4  1  5  3
2   4 NA NA NA NA NA NA NA NA  2
3  NA NA NA NA NA NA NA NA NA NA
4  NA NA NA NA NA NA NA NA NA NA
5   4  3 NA NA NA NA NA NA NA NA
6   4 NA NA NA NA NA  2  4  4 NA
7  NA NA NA  5 NA NA  5  5  5  4
8  NA NA NA NA NA NA  3 NA NA NA
9  NA NA NA NA NA  5  4 NA NA NA
10  4 NA NA  4 NA NA  4 NA  4 NA
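The long-to-wide step above can also be sketched in base R, which makes it easier to see where the NA cells come from. The toy data frame and its column names below are made up for illustration and are not part of the MovieLens file.

```r
# Toy long-format ratings (hypothetical values): one row per (user, item) pair.
long <- data.frame(
  user   = c(1, 1, 2, 3),
  item   = c(10, 20, 10, 30),
  rating = c(5, 3, 4, 2)
)

users <- sort(unique(long$user))
items <- sort(unique(long$item))

# Start from an all-NA matrix so that unrated (user, item) pairs stay missing.
wide <- matrix(NA_real_, nrow = length(users), ncol = length(items),
               dimnames = list(as.character(users), as.character(items)))

# A two-column (row, column) index fills exactly one cell per observed rating.
wide[cbind(match(long$user, users), match(long$item, items))] <- long$rating

print(wide)
```

Only four of the nine cells are filled; the rest stay NA, which is the same pattern seen in the 10 x 10 excerpt above.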
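realRatingMatrix is the data structure that the recommenderlab package uses for numeric ratings such as the 1-to-5 scores here. The data frame has to be converted to a matrix first and then coerced to this class.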
class(temp)
[1] "data.frame"
library("recommenderlab")
temp_mov = temp %>%
as.matrix() %>%
as("realRatingMatrix")
class(temp_mov)
[1] "realRatingMatrix"
attr(,"package")
[1] "recommenderlab"
temp_mov
943 x 1682 rating matrix of class 'realRatingMatrix' with 100000 ratings.
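The summary line above implies a very sparse matrix. A quick arithmetic check, using only the three numbers printed in that line, shows how sparse:

```r
# Numbers taken from the printed summary of temp_mov.
n_users   <- 943
n_items   <- 1682
n_ratings <- 100000

# Share of (user, item) cells that actually hold a rating.
density <- n_ratings / (n_users * n_items)

# Average number of ratings per user and per item.
ratings_per_user <- n_ratings / n_users
ratings_per_item <- n_ratings / n_items

round(c(density = density,
        per_user = ratings_per_user,
        per_item = ratings_per_item), 3)
```

Roughly 6% of the cells are filled, so any method used here has to cope with mostly missing data.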
# Show only two of the registered methods to save space
recommenderRegistry$get_entries(dataType = "realRatingMatrix")[c(3,9)]
$IBCF_realRatingMatrix
Recommender method: IBCF for realRatingMatrix
Description: Recommender based on item-based collaborative filtering.
Reference: NA
Parameters:
k method normalize normalize_sim_matrix alpha na_as_zero
1 30 "Cosine" "center" FALSE 0.5 FALSE
$UBCF_realRatingMatrix
Recommender method: UBCF for realRatingMatrix
Description: Recommender based on user-based collaborative filtering.
Reference: NA
Parameters:
method nn sample normalize
1 "cosine" 25 FALSE "center"
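To make the UBCF parameters listed above more concrete, here is a simplified base R sketch of user-based collaborative filtering on a toy matrix. It is not recommenderlab's implementation: the data are invented, the cosine similarity is computed only over co-rated items, and the neighbours' ratings are combined with a plain unweighted mean, all of which are assumptions made for illustration.

```r
# Toy ratings (hypothetical): rows are users, columns are items, NA = not rated.
ratings <- rbind(
  u1 = c(5, 3, 4, 2),
  u2 = c(4, 2, 5, 1),
  u3 = c(1, 5, 2, 4),
  u4 = c(5, 3, NA, NA)   # the active user we want predictions for
)
colnames(ratings) <- paste0("M", 1:4)

# normalize = "center": subtract each user's own mean rating.
user_means <- rowMeans(ratings, na.rm = TRUE)
centered   <- ratings - user_means

# method = "cosine": cosine similarity, here restricted to co-rated items.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

active <- "u4"
others <- setdiff(rownames(ratings), active)
sims   <- sapply(others, function(u) cosine_sim(centered[active, ], centered[u, ]))

# nn: number of nearest neighbours to use (25 by default in the registry; 2 here).
nn <- 2
neighbours <- names(sort(sims, decreasing = TRUE))[seq_len(nn)]

# Average the neighbours' centered ratings for items the active user has not
# rated, then add the active user's mean back. The result is not clipped to
# the 1-5 scale.
unrated <- is.na(ratings[active, ])
pred <- colMeans(centered[neighbours, unrated, drop = FALSE], na.rm = TRUE) +
  unname(user_means[active])

print(round(sims, 3))
print(neighbours)
print(pred)
```

In this toy case u3 has tastes opposite to u4 and is excluded, so the prediction for M3 comes out higher than the one for M4.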
Recommender() is the function in the recommenderlab package that builds a model.
colnames(temp_mov) <- paste("M", 1:1682, sep = "")
as(temp_mov[1,1:10], "list")
$`1`
 M1  M2  M3  M4  M5  M6  M7  M8  M9 M10
  5   3   4   3   3   5   4   1   5   3
# Build a user-based recommendation model on the first 700 users
temp_mov.recommModel <- Recommender(temp_mov[1:700], method = "UBCF")
temp_mov.recommModel
Recommender of type 'UBCF' for 'realRatingMatrix' learned using 700 users.
# Top-N recommendation; n = 5 requests the top 5 items
temp_mov.predict1 <- predict(temp_mov.recommModel, temp_mov[701:703], n = 5)
temp_mov.predict1
Recommendations as 'topNList' with n = 5 for 3 users.
$`701`
[1] "M302" "M268" "M258" "M126" "M475"

$`702`
[1] "M50"  "M272" "M172" "M302" "M174"

$`703`
[1] "M313" "M98"  "M174" "M427" "M125"
temp_mov.predict2 <- predict(temp_mov.recommModel, temp_mov[701:703], type = "ratings")
temp_mov.predict2
3 x 1682 rating matrix of class 'realRatingMatrix' with 4935 ratings.
     M1  M2  M3  M4  M5  M6  M7  M8  M9 M10 M11 M12 M13 M14 M15 M16
701  NA 4.2 4.1 4.2 4.2 4.3 4.4 4.2 4.4 4.2 4.2 4.3 4.2 4.4 4.3 4.2
702 2.6 2.4 2.5 2.4 2.5 2.5 2.3 2.5 2.4 2.5 2.4 2.5 2.4 2.5 2.5 2.5
703  NA 3.6 3.6 3.5 3.6 3.5  NA 3.6  NA 3.5 3.7 3.7 3.5 3.6  NA 3.6
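The two prediction types are closely related: a top-N list can be read as the N items with the highest predicted ratings among those the user has not rated yet. A minimal sketch of that idea in base R, with invented predicted ratings (the exact ranking and tie handling inside recommenderlab may differ):

```r
# Hypothetical predicted ratings for one user; NA marks an item the user
# has already rated, so it is not a candidate for recommendation.
predicted <- c(M1 = NA, M2 = 4.2, M3 = 4.1, M4 = 4.6, M5 = 3.9, M6 = 4.4)

n <- 3

# sort() drops NA values by default, so already-rated items fall out here.
top_n <- names(sort(predicted, decreasing = TRUE))[seq_len(n)]

print(top_n)
```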
To evaluate models, you need to build them with some data and test them on other data. This section shows how to prepare the two sets of data; the recommenderlab package contains prebuilt tools that help with this task.
evaluationScheme()
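# Evaluation scheme: of the 943 users, 80% are used for training and 20% for testing
# For each test user, 15 items are given to the recommender; the remaining items are used to compute the error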
model.eval <- evaluationScheme(temp_mov[1:943], method = "split",
train = 0.8, given = 15 , goodRating = 5)
model.eval
Evaluation scheme with 15 items given
Method: 'split' with 1 run(s).
Training set proportion: 0.800
Good ratings: >=5.000000
Data set: 943 x 1682 rating matrix of class 'realRatingMatrix' with 100000 ratings.
How the data sets returned by evaluationScheme() differ:
getData(model.eval, "train")
754 x 1682 rating matrix of class 'realRatingMatrix' with 80899 ratings.
getData(model.eval, "known")
189 x 1682 rating matrix of class 'realRatingMatrix' with 2835 ratings.
getData(model.eval, "unknown")
189 x 1682 rating matrix of class 'realRatingMatrix' with 16266 ratings.
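The three outputs above fit together: 189 test users with 15 given items each account for the 2835 known ratings, and the train, known, and unknown counts add up to the full 100000. The sketch below checks that arithmetic and illustrates the given-items idea for a single made-up test user. The sampling shown is only an illustration of the protocol, not recommenderlab's internal code.

```r
# Consistency check on the counts printed above.
n_test_users <- 189
given        <- 15
known_total  <- n_test_users * given          # ratings shown to the model
all_ratings  <- 80899 + 2835 + 16266          # train + known + unknown

# One hypothetical test user with six rated items; withhold all but `given_toy`.
set.seed(1)
user_ratings <- c(M1 = 5, M3 = 4, M7 = 2, M9 = 3, M12 = 5, M20 = 1)
given_toy    <- 2

known_idx <- sample(length(user_ratings), given_toy)
known     <- user_ratings[known_idx]    # passed to predict()
unknown   <- user_ratings[-known_idx]   # held back to measure the error

print(known_total)
print(all_ratings)
print(names(known))
print(names(unknown))
```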
model.ubcf <- Recommender(getData(model.eval, "train"), method = "UBCF")
predict.ubcf <- predict(model.ubcf, getData(model.eval, "known"), type = "ratings")
error_ubcf = calcPredictionAccuracy(predict.ubcf, getData(model.eval, "unknown"))
error_ubcf
RMSE  MSE  MAE
1.05 1.10 0.85
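The three error measures reported above can be computed by hand on a small example. The predicted and actual values below are invented; only the formulas are the point.

```r
# Hypothetical predicted and withheld (actual) ratings for four items.
predicted <- c(4.2, 3.1, 2.5, 4.8)
actual    <- c(5.0, 3.0, 1.0, 4.0)

err <- predicted - actual

mse  <- mean(err^2)      # mean squared error
rmse <- sqrt(mse)        # root mean squared error, in rating units
mae  <- mean(abs(err))   # mean absolute error, less sensitive to large misses

round(c(RMSE = rmse, MSE = mse, MAE = mae), 3)
```

RMSE is always the square root of MSE, which is consistent with the output above (1.05 squared is about 1.10).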
Evaluating the recommendations
results_I <- evaluate(x = model.eval, method = "IBCF", n = seq(10, 100, 10) )
IBCF run fold/sample [model time/prediction time]
1 [52sec/0.26sec]
results_U <- evaluate(x = model.eval, method = "UBCF", n = seq(10, 100, 10) )
UBCF run fold/sample [model time/prediction time]
1 [0.01sec/2.2sec]
Using getConfusionMatrix(), we can extract a list of confusion matrices. Each row corresponds to one value of n, the number of recommended items.
library(knitr)
head(getConfusionMatrix(results_U)[[1]]) %>% kable
| n | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|
| 10 | 1.9 | 8.1 | 17 | 1640 | 0.19 | 0.18 | 0.18 | 0.00 |
| 20 | 3.0 | 17.0 | 16 | 1631 | 0.15 | 0.25 | 0.25 | 0.01 |
| 30 | 3.9 | 26.1 | 15 | 1622 | 0.13 | 0.30 | 0.30 | 0.02 |
| 40 | 4.7 | 35.3 | 14 | 1613 | 0.12 | 0.34 | 0.34 | 0.02 |
| 50 | 5.4 | 44.6 | 14 | 1603 | 0.11 | 0.37 | 0.37 | 0.03 |
| 60 | 5.9 | 54.1 | 13 | 1594 | 0.10 | 0.40 | 0.40 | 0.03 |
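The columns of this table follow the usual confusion-matrix definitions. The values shown appear to be averages over the test users, which is why TP + FP equals n in every row (for example 1.9 + 8.1 = 10) while recall cannot be recomputed exactly from the averaged counts. A sketch for one hypothetical user, with invented item names and a made-up catalogue size:

```r
# Hypothetical single-user example.
recommended <- c("M1", "M2", "M3", "M4", "M5")   # top-5 list
liked       <- c("M2", "M5", "M8")               # withheld items rated >= goodRating
n_items     <- 20                                # candidate items for this user

TP <- length(intersect(recommended, liked))  # recommended and liked
FP <- length(recommended) - TP               # recommended but not liked
FN <- length(liked) - TP                     # liked but not recommended
TN <- n_items - TP - FP - FN                 # neither recommended nor liked

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)   # same as TPR
FPR       <- FP / (FP + TN)

round(c(TP = TP, FP = FP, FN = FN, TN = TN,
        precision = precision, recall = recall, FPR = FPR), 3)
```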
# Item-based recommendation
model.ibcf <- Recommender(temp_mov[1:700], method = "IBCF")
model.ibcf
Recommender of type 'IBCF' for 'realRatingMatrix' learned using 700 users.
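# Top-N recommendation; n = 5 requests the top 5 items
# Note: this call passes temp_mov.recommModel, the UBCF model built earlier, so the
# output repeats the UBCF results; pass model.ibcf instead to get IBCF predictions.
pre_model_ibcf <- predict(temp_mov.recommModel, temp_mov[701:703], n = 5)
pre_model_ibcf
Recommendations as 'topNList' with n = 5 for 3 users.
# This prints the UBCF top-N object from earlier, not pre_model_ibcf
as(temp_mov.predict1, "list")
$`701`
[1] "M302" "M268" "M258" "M126" "M475"

$`702`
[1] "M50"  "M272" "M172" "M302" "M174"

$`703`
[1] "M313" "M98"  "M174" "M427" "M125"
# Same caveat as above: temp_mov.recommModel is the UBCF model
pre_model_ibcf_rating <- predict(temp_mov.recommModel,
temp_mov[701:703], type = "ratings")
pre_model_ibcf_rating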
3 x 1682 rating matrix of class 'realRatingMatrix' with 4935 ratings.
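# Evaluation scheme: of the 943 users, 80% are used for training and 20% for testing
# For each test user, 15 items are given to the recommender; the remaining items are used to compute the error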
model.eval <- evaluationScheme(temp_mov[1:943], method = "split",
train = 0.8, given = 15 , goodRating = 5)
model.eval
Evaluation scheme with 15 items given
Method: 'split' with 1 run(s).
Training set proportion: 0.800
Good ratings: >=5.000000
Data set: 943 x 1682 rating matrix of class 'realRatingMatrix' with 100000 ratings.
# Build the model on the training data
model.ibcf <- Recommender(getData(model.eval, "train"), method = "IBCF")
# Predict ratings for the test users from their known items
predict.ibcf <- predict(model.ibcf,
getData(model.eval, "known"), type = "ratings")
# Compute the test error of the predicted ratings
error_ibcf = calcPredictionAccuracy(predict.ibcf,
getData(model.eval, "unknown"))
error_ibcf
RMSE  MSE  MAE
1.19 1.40 0.85