ln(odds) = -6.3896 + 0.0266 * conserv - 0.0208 * age + 1.0790 * knowledge
logodds <- -6.3896 + 0.0266*10 + 1.0790*5
logodds
## [1] -0.7286
x <- exp(logodds)
probability <- x/(1+x)
probability
## [1] 0.325502
the probability of voting=1 for 69-year old person with a conservativeness level of 10 and a knowledge score of 5 is 32.55%
Conservativeness level When a person’s convservativeness level increases by 1 unit, the log odds of voting 1 increases by .0266
age age doesn’t have any significant effect on the probability of voting 1
knowledge when a person’s knowledgelevel increases by 1 unit, the log odds of voting 1 increases by 1.0790
odds_ratio <- exp(0.0266)
odds_ratio
## [1] 1.026957
lower_CI <- 0.0266-1.96*0.00894
upper_CI <- 0.0266+1.96*0.00894
exp(lower_CI)
## [1] 1.009119
exp(upper_CI)
## [1] 1.04511
The odds ratio for conservativeness is 1.026957. confidence interval is [1.009119, 1.04511]
pU <- 145/(145+47)
odds <- pU/(1-pU)
odds
The odds that Ulsan experience retaurant closure are 3.0851.
pB <- 89/(89+15)
odds <- pB/(1-pB)
odds
## [1] 5.933333
The odds that Bsan experience retaurant closure are 5.9333.
The odds ratio is a measure of association between an exposure and an outcome. The odds ratio means the odds that an outcome with a particular exposure, compared to the odds of the coutcome occurring without that exposure.
pU <- 145/(145+47)
pB <- 89/(89+15)
odds_ratio <- pU/pB
odds_ratio
## [1] 0.8824906
The odds ratio of that Ulsan would experience closure compared to Busan is 0.8825. Which means that Ulsan is less likely to experience restaurant closure by 88.25% compared to Busan.
hb <- read.csv("hbat_200.csv")
perception <- hb$X23
customer_type <- as.factor(hb$X1)
distribution_system <- as.factor(hb$X5)
e_commerce <- hb$X7
new_products <- hb$X15
hb_model <- glm(perception ~ customer_type + distribution_system + e_commerce + new_products)
summary(hb_model)
##
## Call:
## glm(formula = perception ~ customer_type + distribution_system +
## e_commerce + new_products)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.94674 -0.32568 -0.00016 0.30571 0.78254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.599364 0.176627 -3.393 0.000837 ***
## customer_type2 0.387044 0.072824 5.315 2.92e-07 ***
## customer_type3 0.558696 0.072077 7.751 4.99e-13 ***
## distribution_system1 0.137172 0.062248 2.204 0.028727 *
## e_commerce 0.160231 0.038293 4.184 4.33e-05 ***
## new_products 0.009439 0.019234 0.491 0.624147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.156117)
##
## Null deviance: 49.020 on 199 degrees of freedom
## Residual deviance: 30.287 on 194 degrees of freedom
## AIC: 204.05
##
## Number of Fisher Scoring iterations: 2
exp(coef(hb_model))
## (Intercept) customer_type2 customer_type3
## 0.549161 1.472622 1.748392
## distribution_system1 e_commerce new_products
## 1.147025 1.173782 1.009484
customer type Customers who have been buying products from HBAT for between 1 andd 5 years are 1.47times more likely to buy products from HBAT in the future than those who have been buying products from HBAT for less than 1 year.(p < .01) Customers who have been buying products from HBAT for more than 5 years are 1.75times more likely to buy products from HBAT in the future than those who have been buying products from HBAT for less than 1 year.(p < .01)
Distribution system When papaer products are sold to customers directly the likelihood of future purchase of products from HBAT is 1.14times higher than when paper products are sold to customers indirectly through a broker. (p = .02)
E-commerce activities and Web site. When the overall image of HBAT’s Website, especially user-friendliness increases by one-unit, the likelihood of future purchase of products from HBAT increases by 1.17times. (p < .01)
New products New products doesn’t have any significant effect on future purchase of products from HBAT. (p = .62)
perception<- as.numeric(hb$X23)
diff <- aov(perception~customer_type)
summary(diff)
## Df Sum Sq Mean Sq F value Pr(>F)
## customer_type 2 13.94 6.968 39.12 4.92e-15 ***
## Residuals 197 35.08 0.178
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Considering the result of ANOVA, yes, the three categorical levels are significantly different. (F = 39.12, p < .01)
pairwise.t.test(perception, customer_type, "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: perception and customer_type
##
## 1 2
## 2 8.4e-09 -
## 3 1.1e-14 0.092
##
## P value adjustment method: bonferroni
I examined if each of the three categorical variables are significantly different by using Bonferroni test. As we can see from the result, customer group 1 and 2 are significantly different from each other(p < .01), likewise customer grou 1 and 3 are significantly different from each other(p <.01). However, group 2 and 3 are not significantly different from each other(p = .09)
Pseudo R squared is a way to evaluate the goodness-of-fit of logistic regression model. There are several pseudo R squared developed. It is called ‘pseudo’ R squared because they are look like R squared in the sense that they are on a similar scale, ranging from 0 to 1 with higher values indicating better model fit, but they cannot be interpreted as one would interpret an OLS R squared.
LLF <- hb_model$deviance
LL0 <- hb_model$null.deviance
Pseudo_R2 <- as.vector(1 - (LLF/LL0))
Pseudo_R2
## [1] 0.3821562
The pseudo R squared, in logistic regression, is defined as 1 - LLF/LL0, where LL0 represents the log likelihood for the “constant-only” model and LLF is the log likelihood for the full model with constant and predictors.(McFadden) So, the pseudo R squared calculated with McFadden method is .3821.
logits <- array(perception)
yhat <- predict(hb_model, newdata=hb, type="response")
for(i in 1:200) logits[i] <- log((yhat[i])/(1-yhat[i]))
plot(logits,perception, col="red")
points(logits, yhat, col="blue")
mean((perception-yhat)^2)
## [1] 0.1514335
Mean Squared Error of fitted value and observed value is .1514.
product_quality <- hb$X6
hb_model2 <- glm(perception ~ customer_type + distribution_system + product_quality + e_commerce + new_products)
summary(hb_model2)
##
## Call:
## glm(formula = perception ~ customer_type + distribution_system +
## product_quality + e_commerce + new_products)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.96753 -0.32068 0.01801 0.34194 0.72566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.074600 0.301656 -3.562 0.000463 ***
## customer_type2 0.398525 0.072556 5.493 1.24e-07 ***
## customer_type3 0.451031 0.090632 4.977 1.43e-06 ***
## distribution_system1 0.076204 0.069369 1.099 0.273338
## product_quality 0.060376 0.031180 1.936 0.054285 .
## e_commerce 0.175944 0.038881 4.525 1.05e-05 ***
## new_products 0.009525 0.019099 0.499 0.618532
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1539354)
##
## Null deviance: 49.02 on 199 degrees of freedom
## Residual deviance: 29.71 on 193 degrees of freedom
## AIC: 202.21
##
## Number of Fisher Scoring iterations: 2
The AIC of the frist model(which excludes X6) is 204.05, while the AIC of the second moedl(which includes X6) is 202.21. The difference between these two model is less than 2. For this reason, we need to choose the model which is simpler. The first model has 4 significant variables(customer type2, customer type3, distribution system1, e-commerce), while the second model has 3 significant variables(customer type2, customer type3, e-commerce). consequently, the simpler model which excludes X6 is chosen.
customer_type <- as.factor(hb$X1)
firm_size <- as.factor(hb$X3)
distribution_system <- as.factor(hb$X5)
technical_support <- hb$X8
price_flexibility <- hb$X17
delivery_speed <- hb$X18
library(nnet)
multi_hb <- multinom(customer_type ~ firm_size + distribution_system + technical_support + price_flexibility + delivery_speed)
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 125.281776
## iter 20 value 116.891137
## final value 116.891131
## converged
summary(multi_hb)
## Call:
## multinom(formula = customer_type ~ firm_size + distribution_system +
## technical_support + price_flexibility + delivery_speed)
##
## Coefficients:
## (Intercept) firm_size1 distribution_system1 technical_support
## 2 -12.77404 -0.003256774 1.462977 0.09371343
## 3 -12.39581 0.218619653 -1.219667 0.14772130
## price_flexibility delivery_speed
## 2 0.09139939 3.051464
## 3 -2.28367864 5.873471
##
## Std. Errors:
## (Intercept) firm_size1 distribution_system1 technical_support
## 2 2.101651 0.5650909 0.5911559 0.1509950
## 3 2.268935 0.6105969 0.6902757 0.1697809
## price_flexibility delivery_speed
## 2 0.3548687 0.7316407
## 3 0.4475277 0.8762036
##
## Residual Deviance: 233.7823
## AIC: 257.7823
calculating p-value
library(AER)
coeftest(multi_hb)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## 2:(Intercept) -12.7740370 2.1016508 -6.0781 1.216e-09 ***
## 2:firm_size1 -0.0032568 0.5650909 -0.0058 0.99540
## 2:distribution_system1 1.4629770 0.5911559 2.4748 0.01333 *
## 2:technical_support 0.0937134 0.1509950 0.6206 0.53484
## 2:price_flexibility 0.0913994 0.3548687 0.2576 0.79675
## 2:delivery_speed 3.0514641 0.7316407 4.1707 3.036e-05 ***
## 3:(Intercept) -12.3958071 2.2689354 -5.4633 4.674e-08 ***
## 3:firm_size1 0.2186197 0.6105969 0.3580 0.72031
## 3:distribution_system1 -1.2196671 0.6902757 -1.7669 0.07724 .
## 3:technical_support 0.1477213 0.1697809 0.8701 0.38426
## 3:price_flexibility -2.2836786 0.4475277 -5.1029 3.345e-07 ***
## 3:delivery_speed 5.8734710 0.8762036 6.7033 2.037e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef(multi_hb))
## (Intercept) firm_size1 distribution_system1 technical_support
## 2 2.833389e-06 0.9967485 4.3187974 1.098245
## 3 4.135894e-06 1.2443579 0.2953285 1.159190
## price_flexibility delivery_speed
## 2 1.0957065 21.14628
## 3 0.1019086 355.48070
Firm size Firm size doesn’t have any significant effect on promoting customers to buy products from HBAT for between 1 and 5 years rather than less than 1 year.(p = .99) Neither has firm size any significant effect on promoting customers to buy products from HBAT for more than 5 years rather than less than 1 year.(p = .72)
Distribution system The likelihood of being customers who buy HBAT’s products for between 1 and 5 year is 4.32 times larger than that of customers who buy HBAT’s products for less than 1 year, if moving from the case of selling paper products indirectly through brokers to the case of selling paper products directly.(p = .01) However, the distribution system has no significant effect on the likelihood of being customers who buy HBAT’s products for more than 5 years. (p = .08)
Technical support Technical support doesn’t have any significant effect on promoting customers to buy products from HBAT for between 1 and 5 years rather than less than 1 year.(p = .53) Neither has firm size any significant effect on promoting customers to buy products from HBAT for more than 5 years rather than less than 1 year.(p = .38)
Price flexibility Price flexibility doesn’t have any significant effect on promoting customers to buy products from HBAT for between 1 and 5 years rather than less than 1 year.(p = .79). However, one unit increase in price flexibility is associated with 89.80% decrease in the likelihood of being customers who buy HBAT products for more than 5 years compared to being those who buy HBAT products for less than 1 year. (p < .01)
Delivery speed One unit increase in delivery speed means 21.14 times increase in the likelihood of being customers who buy products from HBAT for between 1 and 5 years, compared to being those who buy products from HBAT for less than 1 year.(p < .01) While one unit increase in delivery speed means 355.48 times increase in the likelihood of beting customers who buy products from HBAT for more than 5 years, compared to being those who buy products from HBAT for less than 1 year.(p < .01)
require(rms)
modfp <- lrm(customer_type ~ firm_size + distribution_system + technical_support + price_flexibility + delivery_speed)
modfp
##
## Logistic Regression Model
##
## lrm(formula = customer_type ~ firm_size + distribution_system +
## technical_support + price_flexibility + delivery_speed)
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 200 LR chi2 140.46 R2 0.568 C 0.862
## 1 68 d.f. 5 g 2.450 Dxy 0.723
## 2 64 Pr(> chi2) <0.0001 gr 11.584 gamma 0.725
## 3 68 gp 0.343 tau-a 0.484
## max |deriv| 2e-08 Brier 0.122
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=2 -6.1336 1.0327 -5.94 <0.0001
## y>=3 -8.5305 1.1448 -7.45 <0.0001
## firm_size=1 -0.1396 0.3416 -0.41 0.6829
## distribution_system=1 -0.5343 0.3621 -1.48 0.1401
## technical_support 0.0782 0.0969 0.81 0.4196
## price_flexibility -1.3305 0.1989 -6.69 <0.0001
## delivery_speed 3.4452 0.3892 8.85 <0.0001
The model is significant considering the likelihood ratio. (chi-square = 140.46, df = 5, p < .01)
require(VGAM)
vglmfit <- vglm(customer_type ~ firm_size + distribution_system + technical_support + price_flexibility + delivery_speed, family=multinomial, data=hb)
vglm0 <- vglm(customer_type ~ 1, family=multinomial(refLevel=1), data=hb)
LLf <- VGAM::logLik(vglmfit)
LL0 <- VGAM::logLik(vglm0)
as.vector(1 - (LLf / LL0))
## [1] 0.4678104
Pseudo R squared is a way to evaluate the goodness-of-fit of logistic regression model. There are several pseudo R squared developed. It is called ‘pseudo’ R squared because they are look like R squared in the sense that they are on a similar scale, ranging from 0 to 1 with higher values indicating better model fit, but they cannot be interpreted as one would interpret an OLS R squared. The pseudo R squared, in logistic regression, is defined as 1 - LLf/LL0, where LL0 represents the log likelihood for the “constant-only” model and LLf is the log likelihood for the full model with constant and predictors.(McFadden) So, the pseudo R squared calculated with McFadden method is .4678.