---
title: "A Review of Recommender System by Aggarwal (2016)"
author: Mingying Zheng
output: html_notebook
---

####################################################################################################################
####################################################################################################################

## <span style="color:blue"> Chapter One: An Introduction to Recommender Systems </span>

### <span style="color:blue"> ONE: Introduction </span>

- *Basic idea*: use different data sources to infer customer interests.

- *Basic principle*: significant dependencies exist between user-centric and item-centric activity.

- $m \times n$ ratings matrix, denoted by $R$.

- *User*: the entity to which a recommendation is provided; $m$ is the number of users.

- *Item*: the product being recommended; $n$ is the number of items.

- $r_{ij}$: the observed rating of user $i$ for item $j$.

<span style="color:green"> **Note**: similar to response matrix in DCM. </span>

| Respondent/User|Item1|Item2|Item3|Item4|Item5|Item6|Item7|Item8|
| :-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Respondent1/User1 | 1  | 0  | na | 0  | na | 1  | 1  | na |
| Respondent2/User2 | 0  | 1  | 1  | na | 0  | na | 1  | 1  |
| Respondent3/User3 | na | 0  | 0  | 1  | 1  | 1  | 0  | na |
| Respondent4/User4 | 0  | 1  | 1  | na | 0  | 1  | 0  | na |
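
As a minimal sketch (Python with NumPy is an assumption here, and the variable names are illustrative), the response/ratings matrix above can be stored with `na` encoded as `NaN`, so that missing entries stay distinct from observed 0s. Only the eight item columns named in the table header are used.

```python
import numpy as np

na = np.nan  # marks an unobserved entry, not a rating of 0

# Rows = respondents/users, columns = Item1..Item8 (values from the table above).
R = np.array([
    [1,  0,  na, 0,  na, 1,  1,  na],
    [0,  1,  1,  na, 0,  na, 1,  1],
    [na, 0,  0,  1,  1,  1,  0,  na],
    [0,  1,  1,  na, 0,  1,  0,  na],
])

m, n = R.shape                   # m users, n items
observed = ~np.isnan(R)          # True where r_ij is observed
sparsity = 1 - observed.mean()   # fraction of missing entries

print(m, n, int(observed.sum()), round(float(sparsity), 3))
```

Keeping an explicit mask such as `observed` makes it easy to restrict later computations (means, similarities) to specified ratings only.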

<span style="color:green"> **Note**: DCM uses the following $Q$-matrix design, which is pre-defined by content experts. The $Q$-matrix can have a simple or a complex structure. </span>

| Item  |Attribute1|Attribute2|Attribute3|Attribute4|Attribute5|Attribute6|
| :-|:-:|:-:|:-:|:-:|:-:|:-:|
| Item1 | 1 | 0 |1 | 1 |0 | 1 |
| Item2 | 0 | 1 |1 | 1 |0 | 1 |
| Item3 | 1 | 0 |0 | 0 |0 | 1 |
| Item4 | 0 | 1 |1 | 0 |1 | 0 |
| Item5 | 0 | 1 |0 | 1 |1 | 0 |
| Item6 | 0 | 1 |1 | 1 |0 | 1 |
| Item7 | 1 | 0 |0 | 0 |1 | 1 |
| Item8 | 1 | 1 |1 | 0 |1 | 0 |
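
A rough sketch of how the $Q$-matrix above could be inspected in Python (NumPy assumed). The check for "simple structure" uses the common convention that every item measures exactly one attribute; that definition is an assumption here, not something stated in these notes.

```python
import numpy as np

# Rows = Item1..Item8, columns = Attribute1..Attribute6 (values from the table above).
Q = np.array([
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 0],
])

attrs_per_item = Q.sum(axis=1)   # number of attributes each item measures
items_per_attr = Q.sum(axis=0)   # number of items measuring each attribute

# Assumed definition: simple structure = every item measures exactly one attribute.
is_simple = bool((attrs_per_item == 1).all())

print(attrs_per_item.tolist(), items_per_attr.tolist(), is_simple)
```

Under that assumed definition, this particular design is complex, since every item loads on at least two attributes.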

*Collaborative filtering*: using ratings from multiple users in a collaborative way to predict missing ratings. 

<span style="color:green">*DCM*: using responses from different respondents to predict attribute mastery, including **missing** attributes.</span>

### <span style="color:blue"> TWO: Basic Models </span>

####  <span style="color:blue"> a. Collaborative filtering models </span>

*Main challenge*: underlying ratings matrices are sparse (but can be imputed)

1. **Memory-based methods**: neighborhood-based algorithms; regression-based models; similarity-based models.

    - *user-based*: similarity functions are computed between the *rows* of the ratings matrix to discover similar users.

    - <span style="color:green"> *respondent-based*: to discover similar respondents with the same mastery profiles.</span>

    - *item-based*: similarity functions are computed between the *columns* of the ratings matrix to discover similar items.

    - <span style="color:green"> *attribute-based*: to discover similar attributes between the *columns*.</span>

    - **Advantages**: simple to implement; easy to explain.

    - **Disadvantages**: does not work well with <span style="color:red">sparse</span> ratings matrices.

2. **Model-based methods**: predictive parameterized models, such as decision trees, rule-based models, Bayesian methods, and latent factor models.

    - **Advantages**: works well with <span style="color:red">sparse</span> ratings matrices.

    - <span style="color:green"> **Combinations of memory-based and model-based methods provide very accurate results.** </span>

*Types of Ratings*

- **interval-based**: {-2, -1, 0, 1, 2} from extremely dislike to extremely like

- **ordinal-based**: {1, 2, 3, 4, 5} from poor to excellent.

- **binary**: {0, 1} <span style="color:green"> frequently used in psychometric models</span>.

- **unary**: {1} <span style="color:red"> lets a user specify a liking but not a disliking. **Not recommended.** </span>

|   |Movie1|Movie2|Movie3|Movie4|Movie5|Movie6|
| :-|:-:|:-:|:-:|:-:|:-:|:-:|
| User1 |1 |  |  |1 |  |1 |
| User2 |  |1 |  |  |1 |  |
| User3 |1 |1 |  |1 |  |  |
| User4 |  |  |1 |  |  |1 |
| User5 |  |  |  |1 |1 |  |
| User6 |1 |  |1 |  |  |  |

<span style="color:black">**Figure 1.3: Examples of utility matrices: Unary rating** (Aggarwal, 2016; p.12)</span>

- *utility matrix*: utility refers to the amount of profit incurred by recommending that item to the particular user. 

- *unique attribute matrix*: <span style="color:green"> the most difficult item answered correctly by the particular respondent, which gives the respondent a unique mastery profile??</span>

- <span style="color:red">**Note: substitution of missing entries with any value leads to a significant amount of bias**</span>
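
To make the bias concrete, here is a small sketch with made-up numbers (one hypothetical user on a 1-5 scale; Python with NumPy is assumed). It compares the user's mean over observed ratings with the mean after filling missing entries with 0 or with the scale midpoint.

```python
import numpy as np

# Hypothetical ratings of one user on a 1-5 scale; NaN = not rated.
r = np.array([5.0, np.nan, 4.0, np.nan, np.nan])

# Mean over the two observed ratings only.
observed_mean = float(np.nanmean(r))

# Mean after substituting a constant for every missing entry.
zero_filled = float(np.where(np.isnan(r), 0.0, r).mean())  # missing -> 0
mid_filled = float(np.where(np.isnan(r), 3.0, r).mean())   # missing -> midpoint 3

print(observed_mean, zero_filled, mid_filled)
```

Both substitutions pull the mean away from the value implied by the observed ratings, and the distortion grows with the share of missing entries.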

![Aggarwal, 2016; p. 13](./ch1_Fig1.4.png)

3. Collaborative filtering models with missing value analysis

- Collaborative filtering models are closely related to missing value analysis. 

- <span style="color:green">**Collaborative filtering can be viewed as a generalization of classification and regression modeling.** </span>


#### <span style="color:grey"> b. Content-Based Recommender Systems </span>

#### <span style="color:grey"> c. Knowledge-Based Recommender Systems </span>

#### <span style="color:grey"> d. Demographic Recommender Systems </span>

#### <span style="color:red"> e. **Hybrid and Ensemble-Based Recommender Systems** </span>

![Aggarwal, 2016; p. 13](./ch1_Fig1.5.png)


#### <span style="color:blue"> f. Evaluation -**orthogonal to each other** </span>

- <span style="color:green"> **Accuracy** </span>

- Coverage

- Confidence and Trust

- Novelty

- Serendipity

- Diversity

- <span style="color:green"> **Robustness and Stability** </span>

- Scalability

- <span style="color:green"> **Interpretability (my thoughts)** </span>

### <span style="color:blue"> THREE: Domain-Specific Challenges </span>

- <span style="color:green"> a. **Context-Sensitive (multidimensional?? multi-level??)** </span>

- <span style="color:green"> b. **Time-Sensitive (process data??)** </span>

- <span style="color:green"> c. **Location-Based (online vs offline??)** </span>

- <span style="color:grey"> d. Social </span>



### <span style="color:grey"> FOUR:  Advanced Topics and Applications </span>

### <span style="color:grey"> FIVE: Summary </span>


####################################################################################################################
####################################################################################################################

## <span style="color:blue"> Chapter Two: Neighborhood-Based Collaborative Filtering </span>

### <span style="color:blue"> ONE: Introduction </span> *(pp. 29-31)*

**Neighborhood-based collaborative filtering algorithms**: *memory-based algorithms*

- *user-based*: the ratings provided by users similar to *a target* user **A** are used to make recommendations for **A**. The predicted ratings of **A** are computed as the weighted averages of these "peer group" ratings for each item.

- <span style="color:green"> *respondent-based*: the assumption here is that respondents respond to each item independently, with no dependency, i.e., independent and identically distributed (IID).</span>

- *item-based*: first determine a set **S** of items most similar to the *target* item **B**. The weighted average of the ratings that user **A** specified for the items in **S** is used to compute the predicted rating of user **A** for item **B**.

- <span style="color:green"> *item-based*: assumption: IID for items.</span>

- <span style="color:green"> **Conclusion**: The assumptions here differ from those in educational assessment, where we assume conditional (local) independence, i.e., the responses to an item are independent of the responses to any other item conditional on the respondent's ability $\theta$. In cases of interdependency, additional latent variables such as gender, ethnicity, speededness, etc. are needed.</span>
  
### <span style="color:blue"> TWO: Key Properties of Rating Matrices </span> *(pp. 31-33)*
  
  - *Continuous ratings*: any value within a continuous range.

  - *Interval-based ratings*: 5-point or N-point scales with equidistant rating values.

  - *Ordinal ratings*: ordered categorical values, as with interval-based ratings, but the differences between adjacent rating values are not assumed to be equal.

  - *Binary ratings*: two options (0/1) (M/F)

  - *Unary ratings*: one option only.
  

### <span style="color:blue">THREE: Predicting Ratings with Neighborhood-Based Methods </span> *(pp. 33-45)*

1.  *User-based neighborhood models*: similarity functions are computed between the rows of the ratings matrix to discover similar users.
   
  - <span style="color:green"> *respondent-based*: IID assumptions </span>.
  
  - Step One: Compute the mean rating $\mu_u$ for each user $u$ from her specified ratings, where $I_u$ is the set of items rated by user $u$:
  
  - $\mu_{u}=\frac{\sum_{k \in I_{u}} r_{u k}}{\left|I_{u}\right|} \quad \forall u \in\{1 \ldots m\}$  (2.1; p. 35)
  
  <span style="color:red"> **Cosine Formula is missing??** </span>
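
For reference, a raw cosine similarity restricted to the items rated by both users is commonly written as follows. This is offered as an assumption about the formula intended here, using the same symbols as Eq. 2.1; unlike the Pearson coefficient, it does not subtract the user means:

$$
\operatorname{RawCosine}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}} r_{u k} \cdot r_{v k}}{\sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{u k}^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}} r_{v k}^{2}}}
$$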
  
 - Step Two: Pearson correlation coefficient between rows (users) $u$ and $v$:
   
- $\operatorname{Sim}(u, v)=\operatorname{Pearson}(u, v)=\frac{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right) \cdot\left(r_{v k}-\mu_{v}\right)}{\sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{u k}-\mu_{u}\right)^{2}} \cdot \sqrt{\sum_{k \in I_{u} \cap I_{v}}\left(r_{v k}-\mu_{v}\right)^{2}}}$ (2.2; p. 35)
   
  <span style="color:white"> The Pearson coefficient is the covariance of the two users' ratings divided by the product of their standard deviations.</span>
  
 - Step Three: mean-centered rating $s_{uj}$:
 
 - $s_{u j}=r_{u j}-\mu_{u} \quad \forall u \in\{1 \ldots m\}$ (2.3; p. 35)
  
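 - Step Four: overall neighborhood-based prediction function, where $P_u(j)$ is the set of the $k$ closest users to target user $u$ who have specified ratings for item $j$: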
 
 $\hat{r}_{u j}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot s_{v j}}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}=\mu_{u}+\frac{\sum_{v \in P_{u}(j)} \operatorname{Sim}(u, v) \cdot\left(r_{v j}-\mu_{v}\right)}{\sum_{v \in P_{u}(j)}|\operatorname{Sim}(u, v)|}$
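
Steps One to Four can be sketched in Python as follows (NumPy assumed). The function names, the toy matrix, and the fallback to $\mu_u$ when no usable peer exists are illustrative assumptions, not taken from the book; the toy data is not the book's example table.

```python
import numpy as np

def user_means(R):
    """Eq. 2.1: mean of each user's specified ratings (NaN = unspecified)."""
    return np.nanmean(R, axis=1)

def pearson(R, mu, u, v):
    """Eq. 2.2: Pearson similarity over items rated by both u and v."""
    common = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if not common.any():
        return 0.0
    du = R[u, common] - mu[u]
    dv = R[v, common] - mu[v]
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

def predict(R, u, j, k=1):
    """Prediction function: mu_u plus the similarity-weighted mean-centered peer ratings."""
    mu = user_means(R)
    # Candidate peers: other users who specified a rating for item j.
    cands = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, j])]
    sims = {v: pearson(R, mu, u, v) for v in cands}
    peers = sorted(cands, key=lambda v: sims[v], reverse=True)[:k]  # P_u(j)
    den = sum(abs(sims[v]) for v in peers)
    if den == 0:
        return float(mu[u])  # assumed fallback: no informative peers
    num = sum(sims[v] * (R[v, j] - mu[v]) for v in peers)  # Sim(u, v) * s_vj
    return float(mu[u] + num / den)

# Hypothetical 3 x 4 ratings matrix; user 0 has not rated item 2.
R = np.array([
    [4.0, 2.0, np.nan, 3.0],
    [5.0, 1.0, 4.0,    2.0],
    [1.0, 5.0, 2.0,    4.0],
])

print(pearson(R, user_means(R), 0, 1), predict(R, 0, 2, k=1))
```

In this toy case user 1 is the closest peer and rated item 2 one point above her own mean, so the prediction for user 0 is one point above user 0's mean of 3.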
 
 Example: (p.36)
 ![Aggarwal, 2016; p. 34](./ch2_Table2.1.png)
 - <span style="color:grey"> Similarity function variants </span>
 
 - <span style="color:grey"> variants of the prediction function </span>
 
 - <span style="color:grey"> Variations in Filtering peer groups </span>
 
 - <span style="color:red"> **impact of the long tail** </span>
 
 ![Aggarwal, 2016; p. 13](./ch2_Fig2.1.png)
 
  
2.  *Item-based neighborhood models*: similarity functions are computed between the *columns* of the ratings matrix to discover similar items.
  
  - <span style="color:green"> *item-based*: IID assumptions </span>.
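
A comparable sketch for the item-based case (Python with NumPy assumed). These notes do not state which column similarity is used; the adjusted cosine on user-mean-centered ratings is assumed here as one common choice, and the function names, toy matrix, and fallback are hypothetical.

```python
import numpy as np

def adjusted_cosine(S, i, j):
    """Cosine between columns i and j of the mean-centered matrix S,
    over the users who rated both items."""
    both = ~np.isnan(S[:, i]) & ~np.isnan(S[:, j])
    if not both.any():
        return 0.0
    a, b = S[both, i], S[both, j]
    denom = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def predict_item_based(R, u, t, k=1):
    """Predict r_ut as the similarity-weighted average of user u's own
    ratings on the k items most similar to the target item t."""
    S = R - np.nanmean(R, axis=1, keepdims=True)  # subtract each user's mean
    rated = [j for j in range(R.shape[1]) if j != t and not np.isnan(R[u, j])]
    sims = {j: adjusted_cosine(S, j, t) for j in rated}
    top = sorted(rated, key=lambda j: sims[j], reverse=True)[:k]
    den = sum(abs(sims[j]) for j in top)
    if den == 0:
        return float(np.nanmean(R[u]))  # assumed fallback: user's mean rating
    return float(sum(sims[j] * R[u, j] for j in top) / den)

# Hypothetical 3 x 4 ratings matrix; user 0 has not rated item 2.
R = np.array([
    [4.0, 2.0, np.nan, 3.0],
    [5.0, 1.0, 4.0,    2.0],
    [1.0, 5.0, 2.0,    4.0],
])

print(predict_item_based(R, 0, 2, k=1))
```

Here item 0 is the item most similar to item 2, so the prediction reuses user 0's own rating of item 0. This is the key contrast with the user-based approach, which borrows other users' ratings on the same item.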
  
 
  












