library(tidyr)
## Warning: package 'tidyr' was built under R version 4.4.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(leaps)
## Warning: package 'leaps' was built under R version 4.4.3
setwd("C:\\Users\\srini\\OneDrive\\Desktop\\Regression Analysis\\HW 8")
product_data = read.csv("6414-hw8-s25-ProductData.csv")
product_data$Age <- as.factor(product_data$Age)
product_data$Region <- as.factor(product_data$Region)
logit_model <- glm(
Purchase / Count ~ PriorWeekPurchase + LastWeekCompPurchase + Age + Region,
family = binomial(),
weights = Count,
data = product_data
)
summary(logit_model)
##
## Call:
## glm(formula = Purchase/Count ~ PriorWeekPurchase + LastWeekCompPurchase +
## Age + Region, family = binomial(), data = product_data, weights = Count)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.32746 0.12373 -2.646 0.00813 **
## PriorWeekPurchase 0.46121 0.08509 5.421 5.94e-08 ***
## LastWeekCompPurchase -0.23903 0.08544 -2.797 0.00515 **
## Age25 to 34 0.13297 0.11715 1.135 0.25637
## Age35 to 44 -0.18594 0.12070 -1.540 0.12346
## Age45+ -0.28506 0.12182 -2.340 0.01929 *
## Regionb 0.17474 0.12030 1.453 0.14634
## Regionc 0.19731 0.11939 1.653 0.09840 .
## Regiond 0.23061 0.12101 1.906 0.05669 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 139.720 on 63 degrees of freedom
## Residual deviance: 82.611 on 55 degrees of freedom
## AIC: 353.52
##
## Number of Fisher Scoring iterations: 4
The logistic regression model shows that PriorWeekPurchase has a strong and significant positive effect on the likelihood of purchasing the new product (p < 0.001). LastWeekCompPurchase has a significant negative effect (p ≈ 0.005), suggesting competitor purchases reduce the likelihood. Age and Region effects are not all significant, though Age45+ shows a significant negative impact (p ≈ 0.013).
null_model <- glm(
Purchase / Count ~ 1,
family = binomial(),
weights = Count,
data = product_data
)
anova(null_model, logit_model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: Purchase/Count ~ 1
## Model 2: Purchase/Count ~ PriorWeekPurchase + LastWeekCompPurchase + Age +
## Region
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 63 139.720
## 2 55 82.611 8 57.109 1.714e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Hypothesis (H0): None of the predictors (PriorWeekPurchase, LastWeekCompPurchase, Age, Region) are associated with the likelihood of purchasing the new product — i.e., the intercept-only model fits as well as the full model. Alternative Hypothesis (H1): At least one predictor is significantly associated with the likelihood of purchasing the new product.
From the likelihood ratio test output:
Deviance difference: 57.109
Degrees of freedom: 8
p-value: 1.714e-09
Since the p-value is far below 0.05, we reject the null hypothesis.
deviance_val <- deviance(logit_model)
df_resid <- df.residual(logit_model)
pval_deviance <- 1 - pchisq(deviance_val, df_resid)
c(Deviance = deviance_val, DF = df_resid, p_Deviance = pval_deviance)
## Deviance DF p_Deviance
## 82.610828697 55.000000000 0.009396075
The deviance goodness-of-fit test gives a p-value of 0.0094 with 55 degrees of freedom. At α = 0.05, we reject the null hypothesis, indicating some lack of fit. However, at α = 0.01, we fail to reject the null, suggesting the model fits reasonably well at a stricter significance level.
pearson_resid <- residuals(logit_model, type = "pearson")
pearson_chisq <- sum(pearson_resid^2)
pval_pearson <- 1 - pchisq(pearson_chisq, df_resid)
c(Pearson_ChiSq = pearson_chisq, p_Pearson = pval_pearson)
## Pearson_ChiSq p_Pearson
## 81.10777954 0.01256744
The Pearson goodness-of-fit test yields a chi-square value of 81.11 with a p-value of 0.0126. At α = 0.05, we reject the null hypothesis, suggesting some evidence of lack of fit. However, at α = 0.01, we fail to reject the null, indicating acceptable model fit under a stricter criterion.
poisson_model <- glm(
PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase + Age + Region,
family = poisson(),
offset = log(Count),
data = product_data
)
summary(poisson_model)
##
## Call:
## glm(formula = PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase +
## Age + Region, family = poisson(), data = product_data, offset = log(Count))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.753456 0.039968 18.852 < 2e-16 ***
## PriorWeekPurchase 0.321762 0.026777 12.016 < 2e-16 ***
## LastWeekCompPurchase -0.242606 0.026709 -9.083 < 2e-16 ***
## Age25 to 34 0.029001 0.035961 0.806 0.41998
## Age35 to 44 0.007244 0.036836 0.197 0.84410
## Age45+ -0.193149 0.039293 -4.916 8.85e-07 ***
## Regionb 0.121861 0.039135 3.114 0.00185 **
## Regionc 0.255937 0.037592 6.808 9.87e-12 ***
## Regiond 0.175835 0.038765 4.536 5.73e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 438.37 on 63 degrees of freedom
## Residual deviance: 108.55 on 55 degrees of freedom
## AIC: 529.01
##
## Number of Fisher Scoring iterations: 4
The model indicates that prior company purchases significantly increase purchase rates, while competitor purchases and being older (45+) reduce them. Region also plays a role, with regions b, c, and d showing higher purchase rates than region a.
Intercept = 0.7535
PriorWeekPurchase = 0.3218
LastWeekCompPurchase = -0.2426
Age25 to 34 = 0.0290
Age35 to 44 = 0.0246
Age45+ = -0.1931
Regionb = 0.1219
Regionc = 0.2559
Regiond = 0.1758
poisson_null <- glm(
PurchaseCount ~ 1,
family = poisson(),
offset = log(Count),
data = product_data
)
anova(poisson_null, poisson_model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: PurchaseCount ~ 1
## Model 2: PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase + Age +
## Region
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 63 438.37
## 2 55 108.55 8 329.83 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Hypothesis (H0): None of the predictors (PriorWeekPurchase, LastWeekCompPurchase, Age, Region) are associated with the rate of purchasing the new product.
Alternative Hypothesis (H1): At least one predictor is significantly associated with the purchase rate.
From the likelihood ratio test:
Deviance difference: 329.83
Degrees of freedom: 8
p-value: < 2.2e-16
Since the p-value is far less than 0.05, we reject the null hypothesis.
poisson_deviance <- deviance(poisson_model)
poisson_df_resid <- df.residual(poisson_model)
pval_poisson_deviance <- 1 - pchisq(poisson_deviance, poisson_df_resid)
poisson_pearson <- sum(residuals(poisson_model, type = "pearson")^2)
pval_poisson_pearson <- 1 - pchisq(poisson_pearson, poisson_df_resid)
# Print values
c(Deviance = poisson_deviance, DF = poisson_df_resid, p_Deviance = pval_poisson_deviance,
Pearson_ChiSq = poisson_pearson, p_Pearson = pval_poisson_pearson)
## Deviance DF p_Deviance Pearson_ChiSq p_Pearson
## 1.085466e+02 5.500000e+01 2.262380e-05 1.089460e+02 2.035885e-05
To evaluate the fit of the Poisson regression model, we performed two goodness-of-fit tests. The deviance test produced a deviance value of 108.55 with 55 degrees of freedom, resulting in a p-value of 0.0000226. The Pearson chi-square test yielded a Pearson statistic of 108.95 with 55 degrees of freedom, and a corresponding p-value of 0.0000204. Both tests assess how well the model fits the observed data compared to what would be expected under the model.
Since the p-values from both tests are less than 0.01, we reject the null hypothesis at both the 0.05 and 0.01 significance levels. This indicates that the Poisson model does not provide a good fit for the data.
poisson_no_region <- glm(
PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase + Age,
family = poisson(),
offset = log(Count),
data = product_data
)
anova(poisson_no_region, poisson_model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase + Age
## Model 2: PurchaseCount ~ PriorWeekPurchase + LastWeekCompPurchase + Age +
## Region
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 58 157.55
## 2 55 108.55 3 49.003 1.302e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To test whether the coefficients associated with Region are jointly equal to zero, we performed a likelihood ratio test by comparing the full Poisson model (including Region) with a reduced model that excludes Region. The test yielded a deviance difference of 49.003 with 3 degrees of freedom, and a p-value of 1.302e-10.
Since the p-value is far below both 0.05 and 0.01, we reject the null hypothesis. This indicates that the Region variable significantly improves the model and that at least one Region coefficient is not equal to zero.
In the Poisson regression model, the coefficient for PriorWeekPurchase is 0.32176. This represents the natural logarithm of the rate ratio. By exponentiating the coefficient, we obtain a rate ratio of approximately 1.38. This means that, holding all other variables constant, customers who made a purchase in the prior week are expected to purchase the new product at a rate that is 38% higher than those who did not make a prior-week purchase.