Chapter One: An Introduction to Recommender Systems

ONE: Introduction

-Basic Idea: using different data sources to infer customer interests.

-Basic principle: significant dependencies exist between user- and item-centric activity

  • \(m \times n\) rating matrix denoted by \(R\).

-User: the entity receiving recommendations; \(m\) is the number of users.

-Item: the product being recommended; \(n\) is the number of items.

\(r_{ij}\): the observed rating of user \(i\) for item \(j\).

Note: \(R\) is similar to the response matrix in diagnostic classification models (DCM); a small code sketch follows the table below.

Respondent/User Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8
Respondent1/User1 1 0 na 0 na 1 1 na
Respondent2/User2 0 1 1 na 0 na 1 1
Respondent3/User3 na 0 0 1 1 1 0 na
Respondent4/User4 0 1 1 na 0 1 0 na
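As a minimal illustration (my own, not from the source), the response matrix above can be stored as a NumPy array with np.nan for the "na" entries; later formulas (user means, similarities) then operate only on the observed entries.

```python
import numpy as np

# Response/rating matrix from the table above: rows = respondents/users (m = 4),
# columns = items (n = 8); np.nan marks the "na" (unobserved) entries.
R = np.array([
    [1,      0,      np.nan, 0,      np.nan, 1,      1,      np.nan],
    [0,      1,      1,      np.nan, 0,      np.nan, 1,      1],
    [np.nan, 0,      0,      1,      1,      1,      0,      np.nan],
    [0,      1,      1,      np.nan, 0,      1,      0,      np.nan],
])

observed = ~np.isnan(R)                                  # mask of observed entries
print("fraction missing:", 1 - observed.mean())
print("per-user mean of observed ratings:", np.nanmean(R, axis=1))
```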

Note: we have the following \(Q\)-matrix design in DCM, which is pre-defined by content experts. The \(Q\)-matrix can have simple or complex structures.

Item Attribute1 Attribute2 Attribute3 Attribute4 Attribute5 Attribute6
Item1 1 0 1 1 0 1
Item2 0 1 1 1 0 1
Item3 1 0 0 0 0 1
Item4 0 1 1 0 1 0
Item5 0 1 0 1 1 0
Item6 0 1 1 1 0 1
Item7 1 0 0 0 1 1
Item8 1 1 1 0 1 0

Collaborative filtering: using ratings from multiple users in a collaborative way to predict missing ratings.

DCM: using responses from different respondents to estimate attribute mastery and to predict missing responses.

TWO: Basic Models

a. Collaborative filtering models

Main challenge: underlying ratings matrices are sparse (but can be imputed)

  1. memory-based methods: neighborhood-based algorithms; regression-based models; similarity-based models.
  • user-based: similarity functions are computed between rows of rating matrix to discover similar users

  • respondent-based (DCM analogy): to discover similar respondents with the same mastery profiles.

  • item-based: Similarity functions are computed between the columns of the ratings matrix to discover similar items.

  • attribute-based (DCM analogy): to discover similar attributes between the columns.

  • Advantages: simple to implement; easy to explain.

  • Disadvantages: do not work well with sparse ratings matrices.

  2. model-based methods: predictive parameterized models, such as decision trees, rule-based models, Bayesian methods, and latent factor models.
  • Advantages: work well even with sparse ratings matrices.

  • Combinations of memory-based and model-based methods can provide very accurate results.

Types of Ratings

  • interval-based: {-2, -1, 0, 1, 2}, from strong dislike to strong like

  • ordinal-based: {1, 2, 3, 4, 5} from poor to excellent.

  • binary: {0, 1} frequently used in psychometric models.

  • unary: {1}, specifies a liking but provides no way to express dislike (not recommended)

[Unary utility matrix over six users and six movies: each user has a 1 for the two or three movies she liked; all other entries are unspecified.]

Figure 1.3: Examples of utility matrices: Unary rating (Aggarwal, 2016; p.12)

  • utility matrix: utility refers to the amount of profit incurred by recommending that item to the particular user.

  • unique attribute matrix: the most difficult item answered correctly by the particular respondent, which gives the respondent a unique mastery profile??

  • Note: substituting missing entries with any fixed value leads to a significant amount of bias.
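A tiny numerical illustration of this point (my own example, not from the source): filling a user's unobserved entries with a fixed value such as 0 shifts the user's apparent mean rating, and hence anything computed from mean-centered ratings.

```python
import numpy as np

# One user's ratings on a 1-5 scale; np.nan marks items the user never rated.
r = np.array([5.0, 4.0, np.nan, np.nan, 3.0, np.nan])

mean_observed = np.nanmean(r)               # mean over the 3 observed ratings -> 4.0
mean_zero_filled = np.nan_to_num(r).mean()  # missing treated as 0 -> 2.0 (strongly biased downward)
print(mean_observed, mean_zero_filled)
```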

Aggarwal, 2016; p. 13

  1. Collaborative filtering models with missing value analysis
  • Collaborative filtering models are closely related to missing value analysis.

  • Collaborative filtering can be viewed as a generalization of classification and regression modeling: instead of predicting a single target column from fully specified feature columns, any entry of the ratings matrix may be missing and must be predicted from the others.

b. Content-Based Recommender Systems

c. Knowledge-Based Recommender Systems

d. Demographic Recommender Systems

e. Hybrid and Ensemble-Based Recommender Systems

Aggarwal, 2016; p. 13

f. Evaluation: the following criteria are largely orthogonal to each other

  • Accuracy

  • Coverage

  • Confidence and Trust

  • Novelty

  • Serendipity

  • Diversity

  • Robustness and Stability

  • Scalability

  • Interpretability (my thoughts)

THREE: Domain-Specific Challenges

  • a. Context-Sensitive (multidimensional?? multi-level??)

  • b. Time-Sensitive (process data??)

  • c. Location-Based (online vs offline??)

  • d. Social

FOUR: Advanced Topics and Applications

FIVE: Summary

Chapter Two: Neighborhood-Based Collaborative Filtering

ONE: Introduction (p. 29-p.31)

Neighborhood-based collaborative filtering algorithms: also known as memory-based algorithms.

  • user-based: the ratings provided by similar users to a target user A are used to make recommendations for A. The predicted ratings of A are computed as the weighted average values of these “peer group” ratings for each item.

  • respondent-based (DCM analogy): the assumption here is that respondents respond to each item independently of one another, i.e., respondents are treated as independent and identically distributed (IID).

  • item-based: a set S of items similar to the target item B is determined first; the weighted average of user A's ratings on the items in S is used to compute the predicted rating of user A for item B.

  • item-based: the analogous assumption is that items are IID.

  • Conclusion: the assumptions here differ from those in educational assessments, where we assume conditional independence or local independence, i.e., the response to an item is independent of the responses to any other item conditional on the respondent’s ability \(\theta\). In cases of interdependency, additional latent variables such as gender, ethnicity, speededness, etc. are needed.
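  • For reference (my notation), local independence states that respondent \(i\)'s item responses \(X_{i1}, \ldots, X_{in}\) are conditionally independent given the ability \(\theta_i\):

\(P\left(X_{i 1}=x_{1}, \ldots, X_{i n}=x_{n} \mid \theta_{i}\right)=\prod_{j=1}^{n} P\left(X_{i j}=x_{j} \mid \theta_{i}\right)\)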

TWO: Key Properties of Rating Matrices (p. 31-p.33)

  • Continuous ratings: any values

  • Interval-based ratings: 5-point or N-point scales, equidistant.

  • Ordinal ratings: similar to interval-based ratings, also ordered categorical, but the differences between pairs of adjacent rating values are not assumed to be equal.

  • Binary ratings: two options (0/1) (M/F)

  • Unary rating: one option only

THREE: Predicting Ratings with Neighborhood-Based Methods (p. 33-p.45)

1. User-based Neighborhood Models: similarity functions are computed between rows of rating matrix to discover similar users.

  • respondent-based: IID assumptions.

  • Step One: Compute the mean rating \(\mu_u\) for each user \(u\) using her specified ratings:

  • \(\mu_{u}=\frac{\sum_{k \in I_{u}} r_{u k}}{\left|I_{u}\right|} \quad \forall u \in\{1 \ldots m\}\) (2.1; p. 35)

  • Cosine function on the raw ratings: is it the right one?? (2.5, p. 37)

  • \(\operatorname{Raw} \operatorname{Cosine}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}} r_{u k} \cdot r_{v k}}{\sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{u k}^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{v k}^{2}}}\)

  • Step Two: Pearson correlation coefficient between rows (users) \(u\) and \(v\):

  • \(\operatorname{Sim}(u, v)=\operatorname{Pearson}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right) \cdot\left(r_{v k}-\mu_{v}\right)}{\sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right)^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{v k}-\mu_{v}\right)^{2}}}\) (2.2; p. 35)

    i.e., the covariance divided by the product of the two users’ standard deviations.

  • The Pearson correlation coefficient is preferable to the raw cosine because of the bias-adjustment effect of mean-centering.

  • Step Three: mean-centered rating \(s_{uj}\):

  • \(s_{u j}=r_{u j}-\mu_{u} \quad \forall u \in\{1 \ldots m\}\) (2.3; p. 35)

  • Step Four: overall neighborhood-based prediction function (p.36):

\(\hat{r}_{u j}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot s_{v j}}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot\left(r_{v j}-\mu_{v}\right)}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}\) (2.4; p.36)
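Below is a compact sketch of Steps One through Four (my own illustration, not code from Aggarwal), assuming a NumPy rating matrix R with np.nan for missing entries; here the peer group \(P_u(j)\) is simply all users with positive Pearson similarity to \(u\) who rated item \(j\), rather than a fixed top-\(k\).

```python
import numpy as np

def predict_user_based(R, u, j):
    """Predict r_{uj} via Eq. (2.4): mean-centered, similarity-weighted average over peers."""
    m, n = R.shape
    mu = np.nanmean(R, axis=1)                      # Step One: mean rating per user (Eq. 2.1)
    num, den = 0.0, 0.0
    for v in range(m):
        if v == u or np.isnan(R[v, j]):
            continue                                # a peer must have rated item j
        common = ~np.isnan(R[u]) & ~np.isnan(R[v])  # items rated by both u and v
        if common.sum() < 2:
            continue
        du, dv = R[u, common] - mu[u], R[v, common] - mu[v]
        denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
        if denom == 0:
            continue
        sim = (du * dv).sum() / denom               # Step Two: Pearson(u, v) on co-rated items (Eq. 2.2)
        if sim <= 0:
            continue                                # simple peer-group filter: positive similarities only
        num += sim * (R[v, j] - mu[v])              # Step Three: mean-centered rating s_{vj} (Eq. 2.3)
        den += abs(sim)
    return mu[u] if den == 0 else mu[u] + num / den # Step Four: Eq. (2.4)

# Example (using the small matrix R sketched after the table in Chapter One):
# print(predict_user_based(R, u=0, j=2))
```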

Aggarwal, 2016; p. 34

Example (p.36)

  • Similarity function variants

  • Variants of the prediction function

  • Variations in filtering peer groups

  • Impact of the long tail

Aggarwal, 2016; p. 33

2. Item-based Neighborhood Models: Similarity functions are computed between the columns of the ratings matrix to discover similar items. (p.40)

  • item-based: IID assumptions.

  • The adjusted cosine similarity between the items (columns) \(i\) and \(j\) is defined as follows:

  • \(\operatorname{Adjusted} \operatorname{Cosine}(i, j)=\frac{\sum_{u \in U_{i} \cap U_{j}} s_{u i} \cdot s_{u j}}{\sqrt{\sum_{u \in U_{i} \cap U_{j}} s_{u i}^{2}} \cdot \sqrt{\sum_{u \in U_{i} \cap U_{j}} s_{u j}^{2}}}\) (2.14, p.40)

  • The predicted rating \(\hat{r}_{ut}\) of user \(u\) for target item \(t\) is as follows:

  • \(\hat{r}_{u t}=\frac{\sum_{j \in Q_{t}(u)} \operatorname{Adjusted} \operatorname{Cosine}(j, t) \cdot r_{u j}}{\sum_{j \in Q_{t}(u)}|\operatorname{Adjusted} \operatorname{Cosine}(j, t)|}\) (2.15, p.40)
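A corresponding sketch for the item-based prediction (again my own illustration under the same NumPy conventions; the neighborhood \(Q_t(u)\) is taken to be all items rated by \(u\) with positive adjusted cosine to \(t\)):

```python
import numpy as np

def predict_item_based(R, u, t):
    """Predict r_{ut} via Eqs. (2.14)-(2.15): adjusted cosine over row-centered columns."""
    mu = np.nanmean(R, axis=1)
    S = R - mu[:, None]                                   # mean-centered ratings s_{uj}
    num, den = 0.0, 0.0
    for j in range(R.shape[1]):
        if j == t or np.isnan(R[u, j]):
            continue                                      # candidate item j must be rated by user u
        common = ~np.isnan(R[:, j]) & ~np.isnan(R[:, t])  # users who rated both j and t
        if common.sum() < 2:
            continue
        sj, st = S[common, j], S[common, t]
        denom = np.sqrt((sj ** 2).sum()) * np.sqrt((st ** 2).sum())
        if denom == 0:
            continue
        sim = (sj * st).sum() / denom                     # AdjustedCosine(j, t), Eq. (2.14)
        if sim <= 0:
            continue
        num += sim * R[u, j]                              # weighted average of u's raw ratings, Eq. (2.15)
        den += abs(sim)
    return np.nan if den == 0 else num / den
```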

Example (p.40-p.41)

3. Efficient Implementation and Computational Complexity

  • Neighborhood-based methods are used to determine the best item recommendations for a target user or the best user recommendations for a target item.

  • In real situations, the number of users is typically larger than the number of items.

4. Comparing User-Based and Item-Based Methods (p.42-p.44)

  • Item-based Neighborhood Models: more relevant recommendations, better accuracy, more robustness, concrete reasons for a recommendation, and more stability as the ratings change. How about diversity, serendipity, novelty, etc.?

  • User-based Neighborhood Models: provide a different (user-centric) style of explanation.

Aggarwal, 2016; p. 43

5. Strengths and Weaknesses of Neighborhood-Based Methods (p.44)

  • simple and intuitive;

  • easy to implement and debug;

  • easy to justify;

  • these advantages are often not easily available in model-based methods;

  • main weakness: impractical in large-scale settings.

6. Unified View

FOUR: Clustering and Neighborhood-Based Methods (p. 45-p.47)

  • Main idea of clustering-based methods: replace the offline nearest-neighbor computation phase with an offline clustering phase.

  • the trade-off between accuracy and efficiency.

  • the rating matrix is incomplete, so the clustering method must handle massive, incomplete data sets.

  • the clusters depend on the chosen representatives.

  • purpose of clustering here: to reduce the computational complexity by dividing the original matrix into smaller matrices (one per cluster), as in the sketch below.
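A rough sketch of this idea (my own, assuming scikit-learn is available): missing entries are mean-filled only to give k-means complete vectors; any subsequent prediction would still use only the observed ratings of users within the same cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_users(R, n_clusters=2, seed=0):
    """Assign users to clusters so that neighbors are later searched only within a cluster."""
    mu = np.nanmean(R, axis=1)
    filled = np.where(np.isnan(R), mu[:, None], R)   # mean-fill ONLY for the clustering step
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(filled)

# labels = cluster_users(R, n_clusters=2)
# peers_of_user_0 = np.where(labels == labels[0])[0]   # restrict the neighbor search to this subset
```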

FIVE: Dimensionality Reduction and Neighborhood-Based Methods (p. 47-p.48)

  • Dimensionality reduction: improves neighborhood-based methods in both quality and efficiency. (p.47)

  • Dimensionality Reduction: latent factor models.

  • singular value decomposition (SVD)-like methods: an approach for dimensionality reduction. Netflix??? SVD for large-scale recommendations??? reduce the rank???

  • principal component analysis (PCA)-like methods.

  • Bias (p.49-p.51)

  • Bias: caused by filling the unspecified entries with average values along either the rows or columns.

  • Maximum Likelihood Estimation: the covariance matrix is estimated from the observed entries.

Aggarwal, 2016; p.50

  • Direct matrix factorization of incomplete data (p.50)

  • \(R=Q\Sigma P^T\) (2.18; p.51)
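A minimal sketch of the simplest fill-then-SVD variant (my own illustration; as noted above, mean-filling introduces bias): fill missing entries with row means, center the rows, take a rank-\(k\) truncated SVD \(R \approx Q \Sigma P^T\), and read predictions off the low-rank reconstruction.

```python
import numpy as np

def svd_predict(R, k=2):
    """Rank-k approximation of a mean-filled, row-centered rating matrix."""
    mu = np.nanmean(R, axis=1)
    X = np.where(np.isnan(R), mu[:, None], R) - mu[:, None]   # fill with row means, then center rows
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]                # rank-k reconstruction (Q, Sigma, P^T)
    return X_k + mu[:, None]                                   # add the row means back

# R_hat = svd_predict(R, k=2)   # the prediction for a missing (u, j) is read from R_hat[u, j]
```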

SIX: Regression Modeling View of Neighborhood-Based Methods (p. 51-p.60)

1. User-Based Nearest Neighbor Regression (p.53)

  • The predicted rating of target user \(u\) for item \(j\) is as follows:

  • \(\hat{r}_{u j}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot\left(r_{v j}-\mu_{v}\right)}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}\) (2.22; p. 53)

  • The least-squares objective function is as follows:

  • \(\begin{aligned} \text { Minimize } J_{u} &=\sum_{j \in I_{u}}\left(r_{u j}-\hat{r}_{u j}\right)^{2} \\ &=\sum_{j \in I_{u}}\left(r_{u j}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{u s e r} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2} \end{aligned}\) (p.55)

  • The consolidated optimization problem is as follows:

  • \(\operatorname{Minimize} \sum_{u=1}^{m} J_{u}=\sum_{u=1}^{m} \sum_{j \in I_{u}}\left(r_{u j}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{u s e r} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2}\) (2.23, p.54)

  • The User-Based Nearest Neighbor Regression Model: (2.26; p.55)

  • \(\hat{r}_{u j}=b_{u}^{\text {user }}+b_{j}^{i t e m}+\frac{\sum_{v \in P_{u}(j)} w_{v u}^{\text {user }} \cdot\left(r_{v j}-b_{v}^{\text {user }}-b_{j}^{\text {item }}\right)}{\sqrt{\left|P_{u}(j)\right|}}\) (2.26; p.55)

  • Sparsity and bias

2. Item-Based Nearest Neighbor Regression (p.55)

  • The predicted rating of target user \(u\) for item \(t\) is as follows:

  • \(\hat{r}_{u t}=\sum_{j \in Q_{t}(u)} w_{j t}^{i t e m} \cdot r_{u j}\) (2.27, p.55)

  • The least-squares objective function is as follows:

  • \(\begin{aligned} \text { Minimize } J_{t} &=\sum_{u \in U_{t}}\left(r_{u t}-\hat{r}_{u t}\right)^{2} \\ &=\sum_{u \in U_{t}}\left(r_{u t}-\sum_{j \in Q_{t}(u)} w_{j t}^{i t e m} \cdot r_{u j}\right)^{2} \end{aligned}\) (p.56)

  • The consolidated optimization problem is as follows:

  • \(\operatorname{Minimize} \sum_{t=1}^{n} \sum_{u \in U_{t}}\left(r_{u t}-\sum_{j \in Q_{t}(u)} w_{j t}^{i t e m} \cdot r_{u j}\right)^{2}\) (2.28; p.56)

  • The Item-Based Nearest Neighbor Regression Model: (2.30; p.56)

  • \(\hat{r}_{u t}=b_{u}^{u s e r}+b_{t}^{i t e m}+\frac{\sum_{j \in Q_{t}(u)} w_{j t}^{i t e m} \cdot\left(r_{u j}-B_{u j}\right)}{\sqrt{\left|Q_{t}(u)\right|}}\) (2.30; p.56)

3. Combining the Two Methods (p.57)

  • The combined model can be expressed as follows:

  • \(\hat{r}_{u j}=b_{u}^{\text{user}}+b_{j}^{\text{item}}+\frac{\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-B_{v j}\right)}{\sqrt{\left|P_{u}(j)\right|}}+\frac{\sum_{i \in Q_{j}(u)} w_{i j}^{\text{item}} \cdot\left(r_{u i}-B_{u i}\right)}{\sqrt{\left|Q_{j}(u)\right|}}\) (2.31, p.57)

4. Joint Interpolation with Similarity Weighting (p.57)

  • The objective function:

  • \(\begin{aligned} \text{Minimize } & \sum_{s:(u, s) \in S} \sum_{j: j \neq s} \operatorname{AdjustedCosine}(j, s) \cdot\left(r_{u s}-\hat{r}_{u j}\right)^{2} \\ &=\sum_{s:(u, s) \in S} \sum_{j: j \neq s} \operatorname{AdjustedCosine}(j, s) \cdot\left(r_{u s}-\left[\mu_{u}+\sum_{v \in P_{u}(j)} w_{v u}^{\text{user}} \cdot\left(r_{v j}-\mu_{v}\right)\right]\right)^{2} \end{aligned}\) (p.57)

5. Sparse Linear Models (SLIM) (p.58)

  • the prediction function in SLIM is:

  • \(\hat{r}_{u t}=\sum_{j=1}^{n} w_{j t}^{\text{item}} \cdot r_{u j} \quad \forall u \in\{1 \ldots m\}, \forall t \in\{1 \ldots n\}\) (2.33; p.58)

  • the objective function is:

  • \(\begin{aligned} \text{Minimize } J_{t}^{s} &=\sum_{u=1}^{m}\left(r_{u t}-\hat{r}_{u t}\right)^{2}+\lambda \cdot \sum_{j=1}^{n}\left(w_{j t}^{\text{item}}\right)^{2}+\lambda_{1} \cdot \sum_{j=1}^{n}\left|w_{j t}^{\text{item}}\right| \\ &=\sum_{u=1}^{m}\left(r_{u t}-\sum_{j=1}^{n} w_{j t}^{\text{item}} \cdot r_{u j}\right)^{2}+\lambda \cdot \sum_{j=1}^{n}\left(w_{j t}^{\text{item}}\right)^{2}+\lambda_{1} \cdot \sum_{j=1}^{n}\left|w_{j t}^{\text{item}}\right| \end{aligned}\)

    subject to: \(w_{j t}^{\text{item}} \geq 0 \quad \forall j \in\{1 \ldots n\}\), and \(w_{t t}^{\text{item}}=0\) (p.59)
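A per-item sketch of the SLIM optimization (my own, assuming scikit-learn; ElasticNet's alpha and l1_ratio play the role of \(\lambda\) and \(\lambda_1\) up to scaling, positive=True enforces \(w \geq 0\), and zeroing out the target column lets the penalty drive \(w_{tt}\) to 0).

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_weights_for_item(R, t, alpha=0.1, l1_ratio=0.5):
    """Learn the t-th column of the SLIM item-item weight matrix (weights of Eq. 2.33)."""
    X = np.nan_to_num(R, nan=0.0)        # SLIM treats unspecified entries as 0 (implicit-feedback style)
    y = X[:, t].copy()
    X[:, t] = 0.0                        # exclude the target item itself, so w_tt ends up at 0
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, positive=True,
                       fit_intercept=False, max_iter=10000)
    model.fit(X, y)
    return model.coef_                   # nonnegative, sparse weights w_{jt}^{item}

# w_t = slim_weights_for_item(R, t=0)
# predicted ratings for item t: np.nan_to_num(R, nan=0.0) @ w_t   (Eq. 2.33)
```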

SEVEN: Graph Models for Neighborhood-Based Methods (p. 60-p.67)

1. User-Item Graphs (p.61-p.63)
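A small sketch of constructing such a user-item graph (my own, using networkx): nodes are users and items, with an edge wherever a rating \(r_{uj}\) is observed; neighborhoods can then be defined via short paths or random walks on this graph.

```python
import numpy as np
import networkx as nx

def build_user_item_graph(R):
    """Bipartite graph: one node per user, one per item, one edge per observed rating."""
    G = nx.Graph()
    m, n = R.shape
    G.add_nodes_from([f"u{u}" for u in range(m)], bipartite=0)   # user nodes
    G.add_nodes_from([f"i{j}" for j in range(n)], bipartite=1)   # item nodes
    for u, j in zip(*np.where(~np.isnan(R))):
        G.add_edge(f"u{u}", f"i{j}", rating=float(R[u, j]))      # edge = observed rating r_{uj}
    return G

# G = build_user_item_graph(R)
```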

2. User-User Graphs (p.63-p.65)


3. Item-Item Graphs (p.66-p.67)


EIGHT: Summary (p. 67)