library(readxl)
library(DT)
library(pander)
CountedData <- read_excel("C:/Users/blake/Desktop/Math 325 Notebook/Math 325 Notebook/Data/CountedData.xlsx")
CountedData <- CountedData[complete.cases(CountedData),]
WorkableData <- read_excel("C:/Users/blake/Desktop/Math 325 Notebook/Math 325 Notebook/Data/WorkableData.xlsx")
JM Photography is a photography studio located in Rigby, Idaho. It offers a range of portrait types, sizes, and collections that can be purchased for you and your family. The studio's analyst noticed that 8 \(\times\) 10 sized photos were the most purchased item in the last sales period (see Table 1 below). He wants to know the probability that a customer includes this item in their order, given their order size (see Table 2 below). He suspects that as a customer’s order size goes up, the customer is more likely to buy an 8 \(\times\) 10 sized photo.
To answer this question, this study will perform a logistic regression analysis. The mathematical model for this logistic regression is as follows:
\(P(Y_i = 1|\, x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i\)
where \(x_i\) is the number of items customer \(i\) included in their order and \(P(Y_i = 1|\, x_i)\) is the probability of that customer including an 8 \(\times\) 10 sized photo. Note that if \(\beta_1\) equals zero, order size gives no insight into whether a person buys an 8 \(\times\) 10 sized photo.
Formally, the null and alternative hypotheses are as follows:
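As a restatement of the model rather than a new assumption, solving the equation above for the linear predictor gives the log-odds form:

\(\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i\)

In this form, \(\beta_1\) is the change in the log-odds of a purchase for each additional item in the order. If \(\beta_1 = 0\), the log-odds, and therefore the probability \(\pi_i\), do not change with order size.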
\(H_0:\beta_1 = 0\) \(H_a: \beta_1 \neq 0\)
For this analysis \(\alpha\) will be set at 0.1.
datatable(CountedData, options=list(lengthMenu = c(5, 21)))
datatable(WorkableData, options=list(lengthMenu = c(5, 18)), caption = "Note: 1 signifies an item was purchased at least once, while 0 means there was no purchase")
Listed below is the summary of the fitted logistic regression. These estimates define the best-fitting logistic regression curve and help us determine whether to reject or fail to reject the null hypothesis.
| | Estimate | Std. Error | z value | P-value |
|---|---|---|---|---|
| Intercept | -8.319 | 4.483 | -1.856 | 0.06351 |
| Order Size | 1.777 | 0.9464 | 1.877 | 0.06049 |
Since our p-value (0.06049) is less than our \(\alpha\) (0.1), we reject the null hypothesis. In other words, there is sufficient evidence that order size is associated with how likely a customer is to purchase an 8 \(\times\) 10 sized photo.
From this output we can observe that \(\beta_1\) is estimated as 1.777 and \(\beta_0\) as -8.319. That means that our fitted logistic equation is as follows:
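The decision above can be checked by hand from the table. The sketch below recomputes the z value and a two-sided p-value from the rounded estimate and standard error in the Order Size row. The variable names are illustrative, and because the inputs are rounded, the result may differ from the table in the last digit or two.

```r
# Values copied from the "Order Size" row of the coefficient table (rounded).
estimate  <- 1.777
std_error <- 0.9464

# Wald z statistic: the estimate divided by its standard error.
z_value <- estimate / std_error

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z_value))

# Significance level chosen for this analysis.
alpha <- 0.1

print(round(c(z = z_value, p = p_value), 4))
print(p_value < alpha)  # TRUE means we reject the null hypothesis
```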
\(P(Y_i = 1|x_i) \approx \frac{e^{-8.319+1.777 x_i}}{1+e^{-8.319+ 1.777 x_i}} = \hat{\pi}_i\)
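To make the fitted equation concrete, here is a minimal sketch that evaluates it at a few order sizes. The coefficients come from the table above. The function name `predict_prob` and the range of order sizes (1 through 8) are assumptions chosen for illustration, not values taken from the data.

```r
# Coefficients copied from the coefficient table.
b0 <- -8.319
b1 <- 1.777

# plogis(t) computes exp(t) / (1 + exp(t)), the right-hand side of the model.
predict_prob <- function(order_size) {
  plogis(b0 + b1 * order_size)
}

# Illustrative order sizes (assumed range, not taken from the data).
order_sizes <- 1:8
print(data.frame(
  order_size  = order_sizes,
  probability = round(predict_prob(order_sizes), 3)
))
```

For a fitted `glm` object, `predict(model, newdata, type = "response")` would give the same kind of probabilities without retyping the coefficients.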
Below is a visualization of the fitted model listed above:
It is important that we determine whether this logistic regression is a good fit for our data. This analysis conducts a deviance goodness-of-fit test, which compares the residual deviance to a chi-squared distribution. We use this test instead of the Hosmer-Lemeshow goodness-of-fit test because there were many ties (repeated values) in the data. The residual deviance for this data was 10.04 on 16 residual degrees of freedom. Based on these two statistics, the goodness-of-fit test gives a p-value of approximately 0.86. Since that p-value is greater than 0.05, there is insufficient evidence of a lack of fit, so we conclude that logistic regression is a good fit for this data.
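The goodness-of-fit p-value can be reproduced from the two reported statistics alone. This sketch assumes the usual deviance test, in which the residual deviance is compared to a chi-squared distribution with the residual degrees of freedom. The variable names are illustrative.

```r
# Statistics reported for the fitted model.
residual_deviance <- 10.04
residual_df       <- 16

# Upper-tail chi-squared probability: a large p-value means
# there is no evidence of a lack of fit.
gof_p_value <- pchisq(residual_deviance, df = residual_df, lower.tail = FALSE)

print(round(gof_p_value, 2))
```

With a fitted `glm` object, the same two inputs are available as `model$deviance` and `model$df.residual`.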
There are a couple of points worth discussing. If you look at the plot of the model, you will see that after an order size of 4 the probability of a customer buying an 8 \(\times\) 10 sized photo increases dramatically. Another point worth noting is that each time one of your customers adds another item to their order, the odds of them buying an 8 \(\times\) 10 sized photo are multiplied by approximately 5.9 (since \(e^{1.777} \approx 5.9\)). In my personal opinion that is a sizable effect! Just for some food for thought, you might consider performing this same analysis with the Wallet Brag Book. That was your second most purchased item within this last sales period, so it might be worth exploring this same question with that item.
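The 5.9 figure comes from exponentiating the Order Size coefficient, and it describes a multiplicative change in the odds rather than in the probability itself. The sketch below illustrates the difference. The helper names and the comparison of order sizes 4 and 5 are assumptions made for illustration.

```r
# Coefficients copied from the coefficient table.
b0 <- -8.319
b1 <- 1.777

# Multiplicative change in the odds for each additional item.
odds_ratio <- exp(b1)

# Helpers: fitted probability and fitted odds at a given order size.
prob <- function(x) plogis(b0 + b1 * x)
odds <- function(x) prob(x) / (1 - prob(x))

print(round(odds_ratio, 2))          # change in odds per added item
print(round(odds(5) / odds(4), 2))   # same ratio, computed from the fitted odds
print(round(prob(5) / prob(4), 2))   # the probability ratio is smaller
```

The odds ratio is the same between any two consecutive order sizes, while the ratio of probabilities depends on where on the curve the comparison is made.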