rm(list=ls(all=T))
options(digits=3, scipen=40)
library(caTools)
library(magrittr)



1. 價量關係

\[ \sigma = \sqrt{\frac{s^2 \pi^2}{3}} \quad ;\quad s = \sqrt{\frac{3\sigma^2}{\pi^2}}\]

par(cex=0.7, mfcol=c(2,2), mar=c(4,4,4,2))
mu=100; sigma=10; s = sqrt(3 * sigma^2 / pi^2)
curve(dlogis(x,mu,s),60,140,col='blue',lwd=2,xlab="",ylab="",
      main="Logistic Dist. over WTP")
curve(1-plogis(x,mu,s),60,140,col='cyan',lwd=2,xlab="",ylab="",
      main="Logistic Price Response Function") 
curve(dnorm(x,mu,sigma),60,140,col='blue',lwd=2,xlab="",ylab="",
      main="Normal Dist. over WTP")
curve(1-pnorm(x,mu,sigma),60,140,col='cyan',lwd=2,xlab="",ylab="",
      main="Probit Price Response Function")


2. 顧客價值與顧客選擇

\(value = utility - price = V\)

\(V \sim \mathit{logistic} \rightarrow Pr[buy] \sim \mathit{Logistic}\)

\(Pr[buy] = \frac{1}{1+exp(-V)}\)

par(cex=0.8, mar=c(4,4,4,2))
curve(plogis(x,0,4),-30,30,col='cyan',lwd=3,xlab="Value",ylab="Prob[buy]",
      main="Logistic Function")
abline(v=seq(-30,30,5), h=seq(0,1,0.1), col='lightgray',lty=3)
points(0,0.5,pch=20,cex=2.5,col='orange')


2.1 Where does the Value come from?

Value as a function of product attribures - \(V(x_1, x_2, ..., price)\) where \(x\)’s are product option items - - -

3. 顧客選擇模型

1.將資料中要加入的變數先做線性迴歸模型,得出linear function: \(f(x)=y=b_0+b_1 x_1+⋯+b_k x_k\) 這裡的概念是透過一些屬性\((x_1,x_2,x_3…,x_k)\)來得出顧客對這次訂單的效用(y)

2.由剛剛線性迴歸得出的效用(y)代入logistic function可轉換出「購買機率(p)」 p= 1/(1+Exp(-f(X)))

接下來以一份顧客資料來實際操作模型。

bd <- read.csv("RawShort1.csv")
head(bd)

3.1 Modeling

mod1 <- glm(Order ~ Time+Quantity+PricePerLb, data=bd, family=binomial)
summary(mod1)

Call:
glm(formula = Order ~ Time + Quantity + PricePerLb, family = binomial, 
    data = bd)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.332  -1.083  -0.805   1.221   3.175  

Coefficients:
            Estimate Std. Error z value            Pr(>|z|)    
(Intercept)  0.64640    0.05509    11.7 <0.0000000000000002 ***
Time        -0.03106    0.00259   -12.0 <0.0000000000000002 ***
Quantity    -0.48567    0.02522   -19.3 <0.0000000000000002 ***
PricePerLb  -0.19365    0.01514   -12.8 <0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 21532  on 15886  degrees of freedom
Residual deviance: 20702  on 15883  degrees of freedom
AIC: 20710

Number of Fisher Scoring iterations: 5

3.2 Regression Coefficient 迴歸係數

coef(mod1)
(Intercept)        Time    Quantity  PricePerLb 
     0.6464     -0.0311     -0.4857     -0.1936 


\[V(x) = 0.646 - 0.031 Time - 0.4857 Qty - 0.1937 Price\] \[\text{Logit (log odd):} \quad V(x) = b_0 + b_1 x_1 + ... + b_k x_k = logit = log(odd) = log(\frac{p}{1-p})\]

\[\text{Prob. of Buying:} \quad p = \mathfrak{L}(logit) = \frac{1}{1+exp(-V)}\]

  • 觀念複習:迴歸結果的係數會如何影Prob. Logit與Odd呢?

3.3 Prediction

bd$logOdd = predict(mod1, bd)
bd$ProbBuy = predict(mod1, bd, type='response')
head(bd)
  • 若在指令中未鍵入”type=’response’”,R會計算出顧客價值(V)。
  • 反之,若在指令中鍵入”type=’response’”則會轉化成購買機率。
1/(1+exp(-bd$logOdd[1:6]))
[1] 0.149 0.367 0.472 0.362 0.505 0.387
  • Q: 這行的目的與上面的response有何差異?

3.4 Marginal Effect of Predictors

從迴歸係數估計預測變數對購買機率的邊際效果

\[ \text{odd} = e^{b_0+b_1x_1} \\ e^{b_0+b_1(x_1+1)} = e^{b_0+b_1x_1+b_1} = e^{b_0+b_1x_1} \times e^{b_1}\]

迴歸係數可以透過指數函數轉換為勝率比

  • 結論:logit的邊際效果不用代入指數函數,以變數前的係數即可衡量;odd的邊際效果則要代入指數函數。
coef(mod1) %>% exp
(Intercept)        Time    Quantity  PricePerLb 
      1.909       0.969       0.615       0.824 

注意:勝率和機率之間沒有固定的比率關係

\[ o = \frac{p}{1-p} \quad ; \quad p = \frac{o}{1+o} \]

par(mfcol= c(1,2), cex=0.8)
curve(x/(1-x), 0.01, 0.99, xlab="prob", ylab="odd")
curve(x/(1+x), 0.01, 100, xlab="odd", ylab="prob")

k = 2
p = seq(0.01, 0.99, 0.01)
o = p/(1-p)
pko = (k*o)/(1+k*o)
plot(p, pko - p, type='l', xlab="prob", ylab="delta prob", lwd=2, col="cyan")

par(cex=0.8)
plot(0.5,0,pch=20,col='gray',xlim=c(0,1),ylim=c(-0.8,0.8),
     xlab="base prob.",ylab="prob. increment",
     main = "Odd Ratio = 50, 10, 5, 2, 1, 1/2, 1/5, 1/10, 1/50")
abline(h=seq(-0.8,0.8,0.1), v=seq(0,1,0.05), col='lightgray', lty=3 )
p = seq(0.01, 0.99, 0.01)
for(k in c(1/50, 1/10, 1/5, 1/2, 1, 2, 5, 10, 50)) {
  o = p/(1-p)
  pko = (k*o)/(1+k*o)
  lines(p, pko - p, lwd=2)  
}

  • 給定每個不同機率p,在odd比值的改變下,所影響p的效果也會不一樣。
par(cex=0.8)
plot(0.5,0.5,pch=20,col='gray',xlim=c(0,1),ylim=c(0,1),
     xlab="base prob.",ylab="resultant prob.",
     main = "Odd Ratio = 50, 10, 5, 2, 1, 1/2, 1/5, 1/10, 1/50")
abline(h=seq(0,1,0.1), v=seq(0,1,0.1), col='lightgray', lty=3 )
p = seq(0.01, 0.99, 0.01)
for(k in c(1/50, 1/10, 1/5, 1/2, 1, 2, 5, 10, 50)) {
  o = p/(1-p)
  pko = (k*o)/(1+k*o)
  lines(p, pko, lwd=2)  
}


4. 客製化報價

expProfit =  function(price, mod, data) {
  data$PricePerLb = price
  (price - data$CostPerLb) * predict(mod, data, type="response")
}
expProfit(4.5, mod1, bd[1,])
   1 
0.29 
mean(bd$PricePerLb)
[1] 3.15
optim(3.15, expProfit, method="BFGS", control=list(fnscale=-1), 
      mod=mod1, data=bd[1,] )
$par
[1] 7.09

$value
[1] 0.345

$counts
function gradient 
      11        9 

$convergence
[1] 0

$message
NULL
bd[1,]
price = seq(4, 10, 0.2)
prob = predict(mod1, data.frame(Time=9, Quantity=3.5, PricePerLb=price), type="response")
expReturn = prob * (price - 1.58)
plot(price, expReturn , type='l')

5. 行為經濟模型(是否刪除?)

5.1 Modeling

bd$gain = (bd$LagPrice - bd$PricePerLb)*(bd$LagPrice > bd$PricePerLb)
bd$loss = (bd$PricePerLb - bd$LagPrice)*(bd$PricePerLb > bd$LagPrice)
mod2 = glm(Order ~ Time+Quantity+gain+loss+Quantity:gain+Quantity:loss, 
            data=bd, family=binomial)
summary(mod2)

Call:
glm(formula = Order ~ Time + Quantity + gain + loss + Quantity:gain + 
    Quantity:loss, family = binomial, data = bd)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.544  -1.081  -0.751   1.206   3.176  

Coefficients:
              Estimate Std. Error z value             Pr(>|z|)    
(Intercept)    0.13223    0.02789    4.74           0.00000213 ***
Time          -0.03248    0.00261  -12.43 < 0.0000000000000002 ***
Quantity      -0.50230    0.03079  -16.31 < 0.0000000000000002 ***
gain           0.10646    0.02124    5.01           0.00000054 ***
loss          -0.32350    0.02513  -12.87 < 0.0000000000000002 ***
Quantity:gain  0.05545    0.01818    3.05               0.0023 ** 
Quantity:loss -0.13806    0.07437   -1.86               0.0634 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 21532  on 15886  degrees of freedom
Residual deviance: 20475  on 15880  degrees of freedom
AIC: 20489

Number of Fisher Scoring iterations: 5

5.2 Model Comparison

par(cex=0.8)
data.frame(
  model1 = predict(mod1, bd, type="response"),
  model2 = predict(mod2, bd, type="response")
  ) %>% colAUC(bd$Order, plotROC=TRUE)
        model1 model2
0 vs. 1  0.614  0.633

5.3 Optimization/Customization

expProfit2 =  function(price, mod, data) {
  data$gain = (price - data$PricePerLb)*(data$LagPrice > price)
  data$loss = (price - data$LagPrice)*(price > data$LagPrice)
  (price - data$CostPerLb) * predict(mod, data, type="response")
}
expProfit2(4.5, mod2, bd[1,])
     1 
0.0655 
optim(4, expProfit2, method="BFGS", control=list(fnscale=-1), 
      mod=mod2, data=bd[1,] )
$par
[1] 2.93

$value
[1] 0.101

$counts
function gradient 
      10        7 

$convergence
[1] 0

$message
NULL





---
title: "顧客價值、顧客選擇、客製化報價"
author: "tonychuo@mail.nsysu.edu.tw"
date: "`r Sys.time()`"
output: html_notebook
---
+ 對於顧客願付價格之pdf，為何在normal dist.與logistic dist.中選擇後者呢?
+ 前情提要: 上次我們針對市場之價格反應函數，推估出最高利潤之定價，但這是對市場所有消費者統一定價。
+ 如果可以根據每個消費者所感覺的"價值"，對他們制定不同價格，相對於統一價格有更高利益。

<br>
```{r set-options, echo=FALSE, cache=FALSE}
library(knitr)
options(width=100)
opts_chunk$set(comment = NA)
```

```{r  warning=F, message=F, cache=F, error=F}
rm(list=ls(all=T))
options(digits=3, scipen=40)
library(caTools)
library(magrittr)
```
<br>

- - -

### 1. 價量關係

$$ \sigma = \sqrt{\frac{s^2 \pi^2}{3}} \quad ;\quad 
s = \sqrt{\frac{3\sigma^2}{\pi^2}}$$

```{r fig.width=7, fig.height=6}
par(cex=0.7, mfcol=c(2,2), mar=c(4,4,4,2))
mu=100; sigma=10; s = sqrt(3 * sigma^2 / pi^2)
curve(dlogis(x,mu,s),60,140,col='blue',lwd=2,xlab="",ylab="",
      main="Logistic Dist. over WTP")
curve(1-plogis(x,mu,s),60,140,col='cyan',lwd=2,xlab="",ylab="",
      main="Logistic Price Response Function") 
curve(dnorm(x,mu,sigma),60,140,col='blue',lwd=2,xlab="",ylab="",
      main="Normal Dist. over WTP")
curve(1-pnorm(x,mu,sigma),60,140,col='cyan',lwd=2,xlab="",ylab="",
      main="Probit Price Response Function")
```

+ 上面WTP與價格反應函數之圖中，X軸與Y軸分別代表什麼呢?
+ pdf與cdf之差別? 如何在R中畫出這些分布?

- - -

### 2. 顧客價值與顧客選擇

+ 什麼是顧客價值(Value)? 每個顧客價值的分布會長什麼樣子?
+ 我們有辦法知道每個顧客價值嗎?
+ 使用邏輯式迴歸可依據顧客購買之機率(Probability)，預測到最後顧客有沒有買

$value = utility - price = V$

$V \sim \mathit{logistic} \rightarrow Pr[buy] \sim \mathit{Logistic}$

$Pr[buy] = \frac{1}{1+exp(-V)}$

```{r}
par(cex=0.8, mar=c(4,4,4,2))
curve(plogis(x,0,4),-30,30,col='cyan',lwd=3,xlab="Value",ylab="Prob[buy]",
      main="Logistic Function")
abline(v=seq(-30,30,5), h=seq(0,1,0.1), col='lightgray',lty=3)
points(0,0.5,pch=20,cex=2.5,col='orange')
```
<br>

+ 實務上，因為無法取得顧客價值(Value)的分布，但我們可以知道當一個產品的價格和效用一樣的時候，顧客選擇購買的機率正是0.5。(圖中的橘點)
+ 而將顧客價值與購買機率的關係是透過Logistic function轉換得知，而此圖則是Logistic function的函數圖形。


#### 2.1 Where does the Value come from?
Value as a function of product attribures - $V(x_1, x_2, ..., price)$
where $x$'s are product option items
- - -

### 3. 顧客選擇模型

1.將資料中要加入的變數先做線性迴歸模型，得出linear function:
 $f(x)=y=b_0+b_1 x_1+⋯+b_k x_k$
這裡的概念是透過一些屬性$(x_1,x_2,x_3…,x_k)$來得出顧客對這次訂單的效用(y)

2.由剛剛線性迴歸得出的效用(y)代入logistic function可轉換出「購買機率(p)」
p= 1/(1+Exp(-f(X)))

+ 補充說明：
    +	p為購買機率；odd為勝算(odd=p/(1-p))；Logit是勝算取log的值。(所以邏輯式迴歸跟邏輯完全無關！)
    + 這三者(p,odd,logit)事實上皆是機率的概念，但所要表達的本質不同。
    + 效用value(y)其實剛好是ln(odd)的結果。若我們能觀察到value，則能知道其分布，但要如何觀察到呢?

接下來以一份顧客資料來實際操作模型。


```{r}
bd <- read.csv("RawShort1.csv")
head(bd)
```



#### 3.1 Modeling
```{r}
mod1 <- glm(Order ~ Time+Quantity+PricePerLb, data=bd, family=binomial)
summary(mod1)
```

#### 3.2 Regression Coefficient 迴歸係數

```{r}
coef(mod1)
```
<br>

$$V(x) = 0.646 - 0.031 Time - 0.4857 Qty - 0.1937 Price$$
$$\text{Logit (log odd):} \quad 
V(x) = b_0 + b_1 x_1 + ... + b_k x_k = logit = log(odd) = log(\frac{p}{1-p})$$

$$\text{Prob. of Buying:} \quad p = \mathfrak{L}(logit) = \frac{1}{1+exp(-V)}$$

+ 觀念複習:迴歸結果的係數會如何影Prob. Logit與Odd呢?

#### 3.3 Prediction
```{r}
bd$logOdd = predict(mod1, bd)
bd$ProbBuy = predict(mod1, bd, type='response')
head(bd)
```
+ 若在指令中未鍵入”type=’response’”，R會計算出顧客價值(V)。
+ 反之，若在指令中鍵入”type=’response’”則會轉化成購買機率。

```{r}
1/(1+exp(-bd$logOdd[1:6]))
```
+ Q: 這行的目的與上面的response有何差異？

#### 3.4 Marginal Effect of Predictors 

從迴歸係數估計預測變數對購買機率的邊際效果

$$ \text{odd} = e^{b_0+b_1x_1} \\
e^{b_0+b_1(x_1+1)} = e^{b_0+b_1x_1+b_1} = e^{b_0+b_1x_1} \times e^{b_1}$$

迴歸係數可以透過指數函數轉換為勝率比

+ 結論：logit的邊際效果不用代入指數函數，以變數前的係數即可衡量；odd的邊際效果則要代入指數函數。

```{r}
coef(mod1) %>% exp
```

注意：勝率和機率之間沒有固定的比率關係

$$ o = \frac{p}{1-p} \quad ; \quad p = \frac{o}{1+o} $$ 

```{r fig.height=3, fig.width=7}
par(mfcol= c(1,2), cex=0.8)
curve(x/(1-x), 0.01, 0.99, xlab="prob", ylab="odd")
curve(x/(1+x), 0.01, 100, xlab="odd", ylab="prob")
```

```{r}
k = 2
p = seq(0.01, 0.99, 0.01)
o = p/(1-p)
pko = (k*o)/(1+k*o)
plot(p, pko - p, type='l', xlab="prob", ylab="delta prob", lwd=2, col="cyan")
```

```{r fig.height=4.5, fig.width=6}
par(cex=0.8)
plot(0.5,0,pch=20,col='gray',xlim=c(0,1),ylim=c(-0.8,0.8),
     xlab="base prob.",ylab="prob. increment",
     main = "Odd Ratio = 50, 10, 5, 2, 1, 1/2, 1/5, 1/10, 1/50")
abline(h=seq(-0.8,0.8,0.1), v=seq(0,1,0.05), col='lightgray', lty=3 )
p = seq(0.01, 0.99, 0.01)
for(k in c(1/50, 1/10, 1/5, 1/2, 1, 2, 5, 10, 50)) {
  o = p/(1-p)
  pko = (k*o)/(1+k*o)
  lines(p, pko - p, lwd=2)  
}
```
+ 給定每個不同機率p，在odd比值的改變下，所影響p的效果也會不一樣。

```{r fig.height=4.5, fig.width=6}
par(cex=0.8)
plot(0.5,0.5,pch=20,col='gray',xlim=c(0,1),ylim=c(0,1),
     xlab="base prob.",ylab="resultant prob.",
     main = "Odd Ratio = 50, 10, 5, 2, 1, 1/2, 1/5, 1/10, 1/50")
abline(h=seq(0,1,0.1), v=seq(0,1,0.1), col='lightgray', lty=3 )
p = seq(0.01, 0.99, 0.01)
for(k in c(1/50, 1/10, 1/5, 1/2, 1, 2, 5, 10, 50)) {
  o = p/(1-p)
  pko = (k*o)/(1+k*o)
  lines(p, pko, lwd=2)  
}
```

- - -

### 4. 客製化報價

```{r}
expProfit =  function(price, mod, data) {
  data$PricePerLb = price
  (price - data$CostPerLb) * predict(mod, data, type="response")
}
expProfit(4.5, mod1, bd[1,])
```
+ 先用第一筆顧客來做demo…

```{r}
mean(bd$PricePerLb)
optim(3.15, expProfit, method="BFGS", control=list(fnscale=-1), 
      mod=mod1, data=bd[1,] )
```
+ optim的參數要先給一個初始值，這邊以資料price平均值為參考。
+ 因optim函數預設為找出最小值，因此在fnscale=-1使他找出最大值。
+ 從結果得知對第一個顧客的最適價格為7.09元，獲利為0.345，但除了數據給我們的資訊以外，我們還是必須參考其他數據或個人的主觀知識來做定價。
```{r}
bd[1,]
```


```{r}
price = seq(4, 10, 0.2)
prob = predict(mod1, data.frame(Time=9, Quantity=3.5, PricePerLb=price), type="response")
expReturn = prob * (price - 1.58)
plot(price, expReturn , type='l')
```

+ 迴歸模型幫助我們了解X與Y之關聯
+ optimization幫助我們由迴歸結果找出最適Y
- - -

### 5. 行為經濟模型(是否刪除?)

#### 5.1 Modeling
```{r}
bd$gain = (bd$LagPrice - bd$PricePerLb)*(bd$LagPrice > bd$PricePerLb)
bd$loss = (bd$PricePerLb - bd$LagPrice)*(bd$PricePerLb > bd$LagPrice)
mod2 = glm(Order ~ Time+Quantity+gain+loss+Quantity:gain+Quantity:loss, 
            data=bd, family=binomial)
summary(mod2)
```

#### 5.2 Model Comparison
```{r fig.height=4, fig.width=4}
par(cex=0.8)
data.frame(
  model1 = predict(mod1, bd, type="response"),
  model2 = predict(mod2, bd, type="response")
  ) %>% colAUC(bd$Order, plotROC=TRUE)
```

#### 5.3 Optimization/Customization
```{r}
expProfit2 =  function(price, mod, data) {
  data$gain = (price - data$PricePerLb)*(data$LagPrice > price)
  data$loss = (price - data$LagPrice)*(price > data$LagPrice)
  (price - data$CostPerLb) * predict(mod, data, type="response")
}
expProfit2(4.5, mod2, bd[1,])
```


```{r}
optim(4, expProfit2, method="BFGS", control=list(fnscale=-1), 
      mod=mod2, data=bd[1,] )
```

- - -



<br><br><br><br>

<style>
.caption {
  color: #777;
  margin-top: 10px;
}
p code {
  white-space: inherit;
}
pre {
  word-break: normal;
  word-wrap: normal;
  line-height: 1;
}
pre code {
  white-space: inherit;
}

p,li {
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

.r{
  line-height: 1.2;
}

.qiz {
  line-height: 1.75;
  background: #f0f0f0;
  border-left: 12px solid #ccffcc;
  padding: 4px;
  padding-left: 10px;
  color: #009900;
}

title{
  color: #cc0000;
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

body{
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

h1,h2,h3,h4,h5{
  color: #0066ff;
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}


h3{
  color: #008800;
  background: #e6ffe6;
  line-height: 2;
  font-weight: bold;
}

h5{
  color: #006000;
  background: #f8f8f8;
  line-height: 1.5;
  font-weight: bold;
}
</style>
