- Basic idea: use different data sources to infer customer interests.
- Basic principle: significant dependencies exist between user-centric and item-centric activity.
- User: the person receiving recommendations; \(m\) is the number of users.
- Item: the product being recommended; \(n\) is the number of items.
\(r_{ij}\): the observed rating of user \(i\) for item \(j\).
Note: similar to the response matrix in DCM (diagnostic classification models).
| Respondent/User | Item1 | Item2 | Item3 | Item4 | Item5 | Item6 | Item7 | Item8 |
|---|---|---|---|---|---|---|---|---|
| Respondent1/User1 | 1 | 0 | na | 0 | na | 1 | 1 | na |
| Respondent2/User2 | 0 | 1 | 1 | na | 0 | na | 1 | 1 |
| Respondent3/User3 | na | 0 | 0 | 1 | 1 | 1 | 0 | na |
| Respondent4/User4 | 0 | 1 | 1 | na | 0 | 1 | 0 | na |
Note: DCM uses the following \(Q\)-matrix design, pre-defined by content experts. The \(Q\)-matrix can have simple or complex structures.
| Item | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 |
|---|---|---|---|---|---|---|
| Item1 | 1 | 0 | 1 | 1 | 0 | 1 |
| Item2 | 0 | 1 | 1 | 1 | 0 | 1 |
| Item3 | 1 | 0 | 0 | 0 | 0 | 1 |
| Item4 | 0 | 1 | 1 | 0 | 1 | 0 |
| Item5 | 0 | 1 | 0 | 1 | 1 | 0 |
| Item6 | 0 | 1 | 1 | 1 | 0 | 1 |
| Item7 | 1 | 0 | 0 | 0 | 1 | 1 |
| Item8 | 1 | 1 | 1 | 0 | 1 | 0 |
Collaborative filtering: using ratings from multiple users in a collaborative way to predict missing ratings.
DCM: using responses from multiple respondents to infer attribute mastery and predict missing responses.
Main challenge: the underlying ratings matrices are sparse (but can be imputed).
user-based: similarity functions are computed between rows of the ratings matrix to discover similar users.
respondent-based: to discover similar respondents with the same mastery profiles.
item-based: similarity functions are computed between the columns of the ratings matrix to discover similar items.
attribute-based: to discover similar attributes between the columns.
Memory-based methods. Advantages: simple to implement; easy to explain. Disadvantages: do not work well with sparse ratings matrices.
Model-based methods. Advantages: work well with sparse ratings matrices.
Combinations of memory-based and model-based methods provide very accurate results.
Types of Ratings
interval-based: {-2, -1, 0, 1, 2}, ranging from extreme dislike to extreme like.
ordinal-based: {1, 2, 3, 4, 5}, ranging from poor to excellent.
binary: {0, 1}, frequently used in psychometric models.
unary: {1} specifies a liking but gives no way to express dislike; not recommended.
| | Movie1 | Movie2 | Movie3 | Movie4 | Movie5 | Movie6 |
|---|---|---|---|---|---|---|
| User1 | 1 | 1 | 1 | | | |
| User2 | 1 | 1 | | | | |
| User3 | 1 | 1 | 1 | | | |
| User4 | 1 | 1 | | | | |
| User5 | 1 | 1 | | | | |
| User6 | 1 | 1 | | | | |
Figure 1.3: Examples of utility matrices: Unary rating (Aggarwal, 2016; p.12)
utility matrix: utility refers to the profit earned by recommending that item to that particular user.
unique attribute matrix: the most difficult item correctly answered by a particular respondent, which gives the respondent a unique mastery profile??
Note: substituting missing entries with any fixed value leads to a significant amount of bias.
Aggarwal, 2016; p. 13
Collaborative filtering models are closely related to missing value analysis.
Collaborative filtering as a generalization of classification and regression modeling.
Aggarwal, 2016; p. 13
Accuracy
Coverage
Confidence and Trust
Novelty
Serendipity
Diversity
Robustness and Stability
Scalability
Interpretability (my thoughts)
a. Context-Sensitive (multidimensional?? multi-level??)
b. Time-Sensitive (process data??)
c. Location-Based (online vs offline??)
d. Social
Neighborhood-based collaborative filtering algorithms: memory-based algorithms.
user-based: the ratings provided by users similar to a target user A are used to make recommendations for A. The predicted ratings of A are computed as the weighted averages of these “peer group” ratings for each item.
respondent-based: the assumption here is that respondents respond to each item independently, with no dependency, i.e., independent and identically distributed (IID).
item-based: determine a set S of items similar to target item B; the weighted average of user A's ratings on the items in S is used to compute the predicted rating of A for B.
item-based assumption: IID for items.
Conclusion: the assumptions here differ from those in educational assessments, where we assume conditional independence (local independence), i.e., the responses to an item are independent of the responses to any other item conditional on the respondent's ability \(\theta\). In cases of interdependency, additional latent variables (e.g., for gender, ethnicity, speededness) are needed.
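For reference, local independence in standard psychometric notation (my addition, not from Aggarwal): conditional on ability \(\theta\), the joint distribution of the \(J\) item responses factorizes,
\[
P\left(X_{1}=x_{1}, \ldots, X_{J}=x_{J} \mid \theta\right)=\prod_{j=1}^{J} P\left(X_{j}=x_{j} \mid \theta\right)
\]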
Continuous ratings: any values
Interval-based ratings: 5-point or N-point scales, equidistant.
Ordinal ratings: similar to interval-based and also ordered categorical, but the differences between pairs of adjacent rating values are not assumed to be equal.
Binary ratings: two options only, e.g., 0/1 or M/F.
Unary ratings: one option only.
1. User-based Neighborhood Models: similarity functions are computed between rows of the ratings matrix to discover similar users.
respondent-based: IID assumptions.
Step One: Compute the mean rating \(\mu_u\) for each user \(u\) using her specified ratings:
\(\mu_{u}=\frac{\sum_{k \in I_{u}} r_{u k}}{\left|I_{u}\right|} \quad \forall u \in\{1 \ldots m\}\) (2.1; p. 35)
Cosine function on the raw ratings: is it the right one?? (2.5; p.37)
\(\operatorname{Raw} \operatorname{Cosine}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}} r_{u k} \cdot r_{v k}}{\sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{u k}^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{v k}^{2}}}\)
Step Two: Pearson correlation coefficient between rows (users) \(u\) and \(v\):
\(\operatorname{Sim}(u, v)=\operatorname{Pearson}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right) \cdot\left(r_{v k}-\mu_{v}\right)}{\sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right)^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{v k}-\mu_{v}\right)^{2}}}\) (2.2; p. 35)
i.e., the covariance divided by the product of the two variables’ standard deviations.
The Pearson correlation coefficient is preferable to the raw cosine because of the bias-adjustment effect of mean-centering.
Step Three: mean-centered rating \(s_{uj}\):
\(s_{u j}=r_{u j}-\mu_{u} \quad \forall u \in\{1 \ldots m\}\) (2.3; p. 35)
Step Four: overall neighborhood-based prediction function (p.36):
\(\hat{r}_{u j}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot s_{v j}}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot\left(r_{v j}-\mu_{v}\right)}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}\) (2.4; p.36)
Example: (p.36)
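A minimal sketch of Steps One through Four (Eqs. 2.1-2.4), assuming a small NumPy ratings matrix with np.nan for unspecified entries; `predict_user_based` and its arguments are illustrative names, not from Aggarwal (2016).

```python
import numpy as np

def predict_user_based(R, u, j, k=2):
    """Predict the rating of user u for item j from the k most similar users."""
    m, n = R.shape
    mu = np.nanmean(R, axis=1)  # Eq. 2.1: mean rating per user over specified ratings

    def pearson(a, b):
        # Eq. 2.2: Pearson correlation over the items co-rated by users a and b
        mask = ~np.isnan(R[a]) & ~np.isnan(R[b])
        if mask.sum() < 2:
            return 0.0
        da, db = R[a, mask] - mu[a], R[b, mask] - mu[b]
        denom = np.sqrt((da**2).sum()) * np.sqrt((db**2).sum())
        return float((da * db).sum() / denom) if denom > 0 else 0.0

    # P_u(j): the k users most similar to u who have rated item j
    peers = sorted(((pearson(u, v), v) for v in range(m)
                    if v != u and not np.isnan(R[v, j])), reverse=True)[:k]

    # Eqs. 2.3-2.4: mean-centered, similarity-weighted average of peer ratings
    num = sum(sim * (R[v, j] - mu[v]) for sim, v in peers)
    den = sum(abs(sim) for sim, v in peers)
    return mu[u] + num / den if den > 0 else mu[u]
```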
Similarity function variants
Variants of the prediction function
Variations in filtering peer groups
Impact of the long tail
Aggarwal, 2016; p. 33
2. Item-based Neighborhood Models: similarity functions are computed between the columns of the ratings matrix to discover similar items. (p.40)
item-based: IID assumptions.
The adjusted cosine similarity between the items (columns) \(i\) and \(j\) is defined as follows:
\(\operatorname{AdjustedCosine}(i, j)=\frac{\sum_{u \in U_{i} \cap U_{j}} s_{u i} \cdot s_{u j}}{\sqrt{\sum_{u \in U_{i} \cap U_{j}} s_{u i}^{2}} \cdot \sqrt{\sum_{u \in U_{i} \cap U_{j}} s_{u j}^{2}}}\) (2.14; p.40)
The predicted rating \(\hat{r}_{ut}\) of user \(u\) for target item \(t\) is as follows:
\(\hat{r}_{u t}=\frac{\sum_{j \in Q_{t}(u)} \operatorname{AdjustedCosine}(j, t) \cdot r_{u j}}{\sum_{j \in Q_{t}(u)}|\operatorname{AdjustedCosine}(j, t)|}\) (2.15; p.40)
Example (p.40-p.41)
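A matching sketch for the item-based case (Eqs. 2.14-2.15), under the same assumptions as the user-based snippet above (NaN-coded ratings matrix; illustrative names). Note that the prediction uses the target user's own raw ratings on the neighboring items.

```python
import numpy as np

def predict_item_based(R, u, t, k=2):
    """Predict the rating of user u for target item t from the k most similar items."""
    m, n = R.shape
    mu = np.nanmean(R, axis=1)
    S = R - mu[:, None]  # row-wise mean-centered ratings s_uj (NaNs stay NaN)

    def adjusted_cosine(i, j):
        # Eq. 2.14: cosine on mean-centered columns, over co-rating users only
        mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
        if mask.sum() < 2:
            return 0.0
        si, sj = S[mask, i], S[mask, j]
        denom = np.sqrt((si**2).sum()) * np.sqrt((sj**2).sum())
        return float((si * sj).sum() / denom) if denom > 0 else 0.0

    # Q_t(u): the k items most similar to t that user u has rated
    nbrs = sorted(((adjusted_cosine(j, t), j) for j in range(n)
                   if j != t and not np.isnan(R[u, j])), reverse=True)[:k]

    # Eq. 2.15: similarity-weighted average of u's own raw ratings
    num = sum(sim * R[u, j] for sim, j in nbrs)
    den = sum(abs(sim) for sim, j in nbrs)
    return num / den if den > 0 else mu[u]
```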
3. Efficient Implementation and Computational Complexity
Neighborhood-based methods are used to determine the best item recommendations for a target user or the best user recommendations for a target item.
In real settings, the number of users is typically much larger than the number of items.
4. Comparing User-Based and Item-Based Methods (p.42-p.44)
Item-based Neighborhood Models: more relevant recommendations, better accuracy, more robust, concrete reasons for a recommendation, more stable with changes to the ratings. How about diversity, serendipity, novelty, etc.?
User-based Neighborhood Models: offer a different style of explanation than item-based models.
5. Strengths and Weaknesses of Neighborhood-Based Methods (p.44)
simple and intuitive;
easy to implement and debug;
easy to justify;
these advantages are not easily available in model-based methods;
impractical in large-scale settings.
6. Unified View
Main idea of clustering-based methods: replace the offline nearest-neighbor computation phase with clustering.
the trade-off between accuracy and efficiency.
the ratings matrix is incomplete: massive incomplete data sets.
clusters are dependent on the representatives.
purpose of clustering here: to reduce computational complexity by dividing the original matrix into smaller matrices.
Dimensionality Reduction: improve neighborhood-based methods in quality and efficiency. (p.47)
Dimensionality Reduction: latent factor models.
singular value decomposition (SVD)-like methods: an approach for dimensionality reduction. Netflix?? SVD for large-scale recommendations?? reduce the rank??
principal component analysis (PCA)-like methods.
Bias (p.49-p.51)
Bias: caused by filling the unspecified entries with average values along either the rows or columns.
Maximum likelihood estimation: the covariance matrix is estimated from the observed entries.
Aggarwal, 2016; p.50
Direct matrix factorization of incomplete data (p.50)
\(R=Q\Sigma P^T\) (2.18; p.51)
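A minimal sketch of Eq. 2.18 as a truncated SVD, assuming unspecified entries are first filled with row means — precisely the imputation step the bias discussion above warns about; `low_rank_approx` is an illustrative name.

```python
import numpy as np

def low_rank_approx(R, rank=2):
    """Rank-k reconstruction R ~ Q Sigma P^T of a mean-filled ratings matrix."""
    mu = np.nanmean(R, axis=1)
    R_filled = np.where(np.isnan(R), mu[:, None], R)  # naive row-mean imputation (biased)
    Q, sigma, Pt = np.linalg.svd(R_filled, full_matrices=False)
    return Q[:, :rank] @ np.diag(sigma[:rank]) @ Pt[:rank, :]  # keep top-`rank` latent factors
```

The reconstructed values at the missing positions serve as the imputed ratings.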
1. User-Based Nearest Neighbor Regression (p.53)
The predicted rating of target user \(u\) for item \(j\) is as follows:
\(\hat{r}_{u j}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot\left(r_{v j}-\mu_{v}\right)}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}\) (2.22; p. 53)
The least-squares objective function is as follows:
\(\begin{aligned} \text{Minimize } J_{u} &=\sum_{j \in I_{u}}\left(r_{u j}-\hat{r}_{u j}\right)^{2} \\ &=\sum_{j \in I_{u}}\left(r_{u j}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2} \end{aligned}\) (p.55)
The consolidated optimization problem is as follows:
\(\operatorname{Minimize} \sum_{u=1}^{m} J_{u}=\sum_{u=1}^{m} \sum_{j \in I_{u}}\left(r_{u j}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2}\) (2.23; p.54)
The User-Based Nearest Neighbor Regression Model: (2.26; p.55)
\(\hat{r}_{u j}=b_{u}^{\text{user}}+b_{j}^{\text{item}}+\frac{\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-b_{v}^{\text{user}}-b_{j}^{\text{item}}\right)}{\sqrt{\left|P_{u}(j)\right|}}\) (2.26; p.55)
Sparsity and bias
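A hedged sketch of learning the weights \(w_{vu}^{\text{user}}\) in Eq. 2.23 for a single target user, with two simplifications relative to Aggarwal's formulation: the peer set is fixed in advance rather than re-derived as \(P_u(j)\) per item, and a peer's missing mean-centered rating is treated as 0. All names are illustrative.

```python
import numpy as np

def fit_user_weights(R, u, peers):
    """Least-squares interpolation weights for predicting user u from fixed peers."""
    mu = np.nanmean(R, axis=1)
    items = np.where(~np.isnan(R[u]))[0]          # I_u: items rated by u
    y = R[u, items] - mu[u]                       # targets r_uj - mu_u
    X = np.zeros((len(items), len(peers)))
    for c, v in enumerate(peers):
        rated = ~np.isnan(R[v, items])
        X[rated, c] = R[v, items[rated]] - mu[v]  # r_vj - mu_v; 0 where v is unrated
    w, *_ = np.linalg.lstsq(X, y, rcond=None)     # minimize ||y - Xw||^2 (Eq. 2.23)
    return w
```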
2. Item-Based Nearest Neighbor Regression (p.55)
The predicted rating of target user \(u\) for item \(t\) is as follows:
\(\hat{r}_{u t}=\sum_{j \in Q_{t}(u)} w_{j t}^{\text{item}} \cdot r_{u j}\) (2.27; p.55)
The least-squares objective function is as follows:
\(\begin{aligned} \text{Minimize } J_{t} &=\sum_{u \in U_{t}}\left(r_{u t}-\hat{r}_{u t}\right)^{2} \\ &=\sum_{u \in U_{t}}\left(r_{u t}-\sum_{j \in Q_{t}(u)} w_{j t}^{\text{item}} \cdot r_{u j}\right)^{2} \end{aligned}\) (p.56)
The consolidated optimization problem is as follows:
\(\operatorname{Minimize} \sum_{t=1}^{n} \sum_{u \in U_{t}}\left(r_{u t}-\sum_{j \in Q_{t}(u)} w_{j t}^{\text{item}} \cdot r_{u j}\right)^{2}\) (2.28; p.56)
The Item-Based Nearest Neighbor Regression Model: (2.30; p.56)
\(\hat{r}_{u t}=b_{u}^{\text{user}}+b_{t}^{\text{item}}+\frac{\sum_{j \in Q_{t}(u)} w_{j t}^{\text{item}} \cdot\left(r_{u j}-B_{u j}\right)}{\sqrt{\left|Q_{t}(u)\right|}}\) (2.30; p.56)
3. Combining the Two Methods (p.57)
The combined model can be expressed as follows:
\(\hat{r}_{u j}=b_{u}^{\text{user}}+b_{j}^{\text{item}}+\frac{\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-B_{v j}\right)}{\sqrt{\left|P_{u}(j)\right|}}+\frac{\sum_{s \in Q_{j}(u)} w_{s j}^{\text{item}} \cdot\left(r_{u s}-B_{u s}\right)}{\sqrt{\left|Q_{j}(u)\right|}}\) (2.31; p.57)
4. Joint Interpolation with Similarity Weighting (p.57)
The objective function:
\(\begin{aligned} \text{Minimize } & \sum_{s:(u, s) \in S} \sum_{j: j \neq s} \operatorname{AdjustedCosine}(j, s) \cdot\left(r_{u s}-\hat{r}_{u j}\right)^{2} \\ &=\sum_{s:(u, s) \in S} \sum_{j: j \neq s} \operatorname{AdjustedCosine}(j, s) \cdot\left(r_{u s}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2} \end{aligned}\) (p.57)
5. Sparse Linear Models (SLIM) (p.58)
The prediction function in SLIM is:
\(\hat{r}_{u t}=\sum_{j=1}^{n} w_{j t}^{\text{item}} \cdot r_{u j} \quad \forall u \in\{1 \ldots m\},\ \forall t \in\{1 \ldots n\}\) (2.33; p.58)
The objective function is:
\[
\begin{aligned} \text{Minimize } J_{t}^{s} &=\sum_{u=1}^{m}\left(r_{u t}-\hat{r}_{u t}\right)^{2}+\lambda \cdot \sum_{j=1}^{n}\left(w_{j t}^{\text{item}}\right)^{2}+\lambda_{1} \cdot \sum_{j=1}^{n}\left|w_{j t}^{\text{item}}\right| \\ &=\sum_{u=1}^{m}\left(r_{u t}-\sum_{j=1}^{n} w_{j t}^{\text{item}} \cdot r_{u j}\right)^{2}+\lambda \cdot \sum_{j=1}^{n}\left(w_{j t}^{\text{item}}\right)^{2}+\lambda_{1} \cdot \sum_{j=1}^{n}\left|w_{j t}^{\text{item}}\right| \end{aligned}
\]
subject to:
\[
w_{j t}^{\text{item}} \geq 0\ \ \forall j \in\{1 \ldots n\}, \qquad w_{t t}^{\text{item}}=0
\]
(p.59)
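A sketch of SLIM using scikit-learn's ElasticNet, which supports the non-negativity constraint via positive=True; zeroing column \(t\) of the predictors enforces \(w_{tt}^{\text{item}}=0\). Mapping alpha/l1_ratio onto \((\lambda, \lambda_1)\) holds only up to scaling — an assumption of this sketch, not Aggarwal's parameterization.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_slim(R, alpha=0.1, l1_ratio=0.5):
    """Learn the n x n non-negative item-item weight matrix W, column by column."""
    R = np.nan_to_num(R)                  # SLIM treats unspecified entries as 0
    n = R.shape[1]
    W = np.zeros((n, n))
    for t in range(n):
        X = R.copy()
        X[:, t] = 0.0                     # exclude item t from predicting itself
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True, fit_intercept=False)
        model.fit(X, R[:, t])
        W[:, t] = model.coef_             # w_tt stays 0 by construction
    return W
```

Predictions then follow Eq. 2.33 as \(\hat{R}=R W\).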
1. User-Item Graphs (p.61-p.63)
2. User-User Graphs (p.63-p.65)
Aggarwal, 2016; p. 43
3. Item-Item Graphs (p.66-p.67)
Aggarwal, 2016; p. 43