your_name <- readline(prompt = "What is your name?      ")
## What is your name?
# Go to the Console pane, type your name, and press the Enter key

your_name <- "Blake Gamber"

print(your_name)
## [1] "Blake Gamber"
# (2) Date Function
#     Run line 32 below (click Run button or Ctrl + Enter)
date()
## [1] "Wed May  8 12:57:13 2024"
# (3) IP Address for Windows (may not work on Mac)
#     Getting and printing the IP address of your computer
#     This IP address will be unique.
#     If there are errors related to the commands below,
#     just include all the error messages in the output.
#     Run the following

x <- system("ipconfig", intern=TRUE)
x[grep("IPv4", x)]
## [1] "   IPv4 Address. . . . . . . . . . . : 192.168.86.194"
z <- x[grep("IPv4", x)]
gsub(".*? ([[:digit:]])", "\\1", z)
## [1] "192.168.86.194"
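
The gsub() call above works by deleting everything before the first digit that follows a space. A minimal alternative sketch, assuming the line has the same shape as the ipconfig output shown above, extracts the dotted address directly with regmatches(). The sample string is copied from that output, and ip_pattern is an illustrative name, not part of the original script.

```r
# Sample line copied from the ipconfig output above (assumed format)
z <- "   IPv4 Address. . . . . . . . . . . : 192.168.86.194"

# Four groups of 1-3 digits separated by dots
ip_pattern <- "[0-9]{1,3}(\\.[0-9]{1,3}){3}"

# regexpr() locates the first match; regmatches() extracts it
ip <- regmatches(z, regexpr(ip_pattern, z))
print(ip)
```

Matching the address itself, rather than removing what precedes it, keeps the result unchanged if the label text in front of the address differs.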

Introduction

In this analysis, I am using logistic regression to examine how various attributes influence the decision to choose Office Star. The attributes considered are ‘Expensive’, ‘Convenient’, ‘Service’, and ‘Largechoice’.

Load and Prepare the Data

First, I load the data from a CSV file and examine its structure to ensure all variables are correctly formatted for analysis.

# Load the data
data <- read.csv("OfficeChoice2.csv")

# Check the structure of the data
str(data)
## 'data.frame':    30 obs. of  6 variables:
##  $ Choice       : int  0 1 1 0 0 1 0 1 1 1 ...
##  $ Pastpurchases: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Expensive    : int  2 3 3 1 5 7 5 1 4 3 ...
##  $ Convenient   : int  1 2 3 7 2 6 2 7 4 2 ...
##  $ Service      : int  3 5 3 7 2 4 7 3 4 3 ...
##  $ Largechoice  : int  4 6 6 4 4 3 4 6 4 2 ...
summary(data)
##      Choice    Pastpurchases   Expensive       Convenient      Service     
##  Min.   :0.0   Min.   :0     Min.   :1.000   Min.   :1.00   Min.   :1.000  
##  1st Qu.:0.0   1st Qu.:0     1st Qu.:2.000   1st Qu.:2.00   1st Qu.:2.000  
##  Median :0.5   Median :0     Median :3.000   Median :4.00   Median :3.000  
##  Mean   :0.5   Mean   :0     Mean   :3.467   Mean   :4.00   Mean   :3.533  
##  3rd Qu.:1.0   3rd Qu.:0     3rd Qu.:5.000   3rd Qu.:5.75   3rd Qu.:4.750  
##  Max.   :1.0   Max.   :0     Max.   :7.000   Max.   :7.00   Max.   :7.000  
##   Largechoice 
##  Min.   :1.0  
##  1st Qu.:4.0  
##  Median :4.0  
##  Mean   :4.5  
##  3rd Qu.:6.0  
##  Max.   :7.0
# View the first few rows of the data
head(data)
##   Choice Pastpurchases Expensive Convenient Service Largechoice
## 1      0             0         2          1       3           4
## 2      1             0         3          2       5           6
## 3      1             0         3          3       3           6
## 4      0             0         1          7       7           4
## 5      0             0         5          2       2           4
## 6      1             0         7          6       4           3
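
Before fitting, a few quick checks can confirm that the outcome is binary, that nothing is missing, and that no column is constant. The sketch below runs on the six rows printed by head(data) above so that it is self-contained; first_rows and the other object names are illustrative, and the same checks could be applied to the full data object.

```r
# First six rows as printed by head(data) above
first_rows <- data.frame(
  Choice        = c(0, 1, 1, 0, 0, 1),
  Pastpurchases = c(0, 0, 0, 0, 0, 0),
  Expensive     = c(2, 3, 3, 1, 5, 7),
  Convenient    = c(1, 2, 3, 7, 2, 6),
  Service       = c(3, 5, 3, 7, 2, 4),
  Largechoice   = c(4, 6, 6, 4, 4, 3)
)

# The outcome of a binomial glm should contain only 0 and 1
outcome_is_binary <- all(first_rows$Choice %in% c(0, 1))

# Columns with a single distinct value cannot explain any variation
is_constant <- sapply(first_rows, function(col) length(unique(col)) == 1)
constant_cols <- names(first_rows)[is_constant]

# Count missing values across the whole data frame
n_missing <- sum(is.na(first_rows))

print(outcome_is_binary)
print(constant_cols)
print(n_missing)
```

In the summary above, Pastpurchases has a minimum and maximum of 0, so it is constant across all 30 observations and cannot contribute to a model.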

Logistic Regression Model

Here, I fit the logistic regression model using the attributes mentioned. I then display the summary to view the coefficients and statistical significance of each predictor.

# Fit the logistic regression model
model <- glm(Choice ~ Expensive + Convenient + Service + Largechoice, data = data, family = binomial())

# View the summary of the model
summary(model)
## 
## Call:
## glm(formula = Choice ~ Expensive + Convenient + Service + Largechoice, 
##     family = binomial(), data = data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.19797    1.63231  -0.121    0.903
## Expensive   -0.28365    0.21803  -1.301    0.193
## Convenient   0.11877    0.22059   0.538    0.590
## Service      0.23079    0.21920   1.053    0.292
## Largechoice -0.02011    0.27728  -0.073    0.942
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41.589  on 29  degrees of freedom
## Residual deviance: 38.038  on 25  degrees of freedom
## AIC: 48.038
## 
## Number of Fisher Scoring iterations: 4
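
The null and residual deviances in the summary can be used to check whether the four predictors jointly improve on an intercept-only model. The following is a sketch of that likelihood-ratio test, using the deviance values copied from the output above; the object names are illustrative.

```r
# Deviances and degrees of freedom copied from summary(model) above
null_dev  <- 41.589
null_df   <- 29
resid_dev <- 38.038
resid_df  <- 25

# The drop in deviance is compared against a chi-squared distribution
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p_value = lr_p))
```

The statistic is 3.551 on 4 degrees of freedom, which gives a p-value of about 0.47, so the model as a whole is not a significant improvement over the intercept-only model. With the fitted object available, the same comparison could be made with anova(update(model, . ~ 1), model, test = "Chisq").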

Interpretation of the Results

Coefficients Interpretation

Each coefficient from the logistic regression model gives the change in the log-odds of choosing Office Star for a one-unit increase in that predictor, holding the other predictors fixed. Here’s my interpretation:

  • Expensive: The negative coefficient (-0.284) suggests that as the Expensive rating increases, the likelihood of choosing Office Star decreases, although this predictor is not statistically significant (p = 0.193).
  • Convenient: The positive coefficient (0.119) implies an increase in the likelihood of choosing Office Star with higher convenience ratings, though it is not statistically significant (p = 0.590).
  • Service: The positive coefficient (0.231) indicates that better service ratings increase the preference for Office Star, albeit non-significantly (p = 0.292).
  • Largechoice: The slight negative coefficient (-0.020) indicates a marginally lower choice likelihood as the Largechoice rating increases, yet this is also not significant (p = 0.942).
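
The significance statements above come from Wald tests: each estimate is divided by its standard error, and the resulting z value is compared against a standard normal distribution. As a sketch, the z values and p-values in the summary can be reproduced from the printed estimates and standard errors (est and se are illustrative names holding values copied from the output above):

```r
# Estimates and standard errors copied from summary(model) above
est <- c("(Intercept)" = -0.19797, Expensive = -0.28365, Convenient = 0.11877,
         Service = 0.23079, Largechoice = -0.02011)
se  <- c("(Intercept)" = 1.63231, Expensive = 0.21803, Convenient = 0.22059,
         Service = 0.21920, Largechoice = 0.27728)

# Wald z statistic and two-sided p-value for each coefficient
z_value <- est / se
p_value <- 2 * pnorm(-abs(z_value))

print(round(cbind(z_value, p_value), 3))
```

All p-values are well above 0.05, which is why none of the predictors is described as statistically significant.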

Odds Ratios

Odds ratios transform our logistic regression coefficients into a more interpretable format, showing how the odds of the outcome change with a one-unit increase in the predictor.

# Calculate odds ratios
odds_ratios <- exp(coef(model))
odds_ratios
## (Intercept)   Expensive  Convenient     Service Largechoice 
##   0.8203975   0.7530339   1.1261151   1.2595989   0.9800928
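
A point estimate alone does not show how uncertain each odds ratio is. The sketch below builds approximate 95% Wald confidence intervals by exponentiating estimate ± 1.96 standard errors, using the values printed in the model summary; est, se, and or_table are illustrative names.

```r
# Estimates and standard errors copied from summary(model) above
est <- c("(Intercept)" = -0.19797, Expensive = -0.28365, Convenient = 0.11877,
         Service = 0.23079, Largechoice = -0.02011)
se  <- c("(Intercept)" = 1.63231, Expensive = 0.21803, Convenient = 0.22059,
         Service = 0.21920, Largechoice = 0.27728)

# Build the interval on the log-odds scale, then exponentiate
z_crit <- qnorm(0.975)
or_table <- cbind(
  odds_ratio = exp(est),
  lower      = exp(est - z_crit * se),
  upper      = exp(est + z_crit * se)
)

print(round(or_table, 3))
```

Every predictor's interval contains 1, which matches the non-significant p-values in the summary. With the fitted object available, exp(confint.default(model)) should give the same Wald intervals at full precision.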

Interpretation of Odds Ratios

  • Expensive: An odds ratio less than one (0.75) indicates that a higher Expensive rating is associated with lower odds of choosing Office Star.
  • Convenient: An odds ratio greater than one (1.13) suggests that higher convenience ratings are associated with higher odds of choosing Office Star.
  • Service: Similarly, an odds ratio greater than one (1.26) implies that better service quality may increase the odds of selection.
  • Largechoice: The odds ratio close to one (0.98) reflects minimal influence of a larger selection on the decision.
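
Odds ratios are often easier to read as percentage changes in the odds per one-point increase in a rating. A small sketch of that conversion, using the odds ratios printed above (pct_change is an illustrative name):

```r
# Odds ratios copied from the output above (intercept omitted)
odds_ratios <- c(Expensive = 0.7530339, Convenient = 1.1261151,
                 Service = 1.2595989, Largechoice = 0.9800928)

# Percentage change in the odds for a one-unit increase in each predictor
pct_change <- (odds_ratios - 1) * 100

print(round(pct_change, 1))
```

This gives roughly -24.7% for Expensive, +12.6% for Convenient, +26.0% for Service, and -2.0% for Largechoice. These are changes in odds, not in probability.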

Most Important Variable

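No variable is statistically significant, so any ranking is tentative. ‘Service’ shows the highest odds ratio among the predictors (1.26), but an odds ratio below one has to be inverted before its size can be compared with one above one. For ‘Expensive’, 1/0.75 ≈ 1.33, which is a slightly larger effect per rating point, and ‘Expensive’ also has the smallest p-value (0.193). ‘Expensive’ and ‘Service’ are therefore the two strongest candidates, and a larger dataset would be needed to tell whether either is truly influential.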
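
Odds ratios above and below one are not directly comparable by raw size, since 0.5 and 2 describe effects of equal strength in opposite directions. One way to put them on a common scale is to rank predictors by the absolute value of the log odds ratio. The sketch below does this with the odds ratios printed earlier; it assumes the four ratings share a comparable scale (all range from 1 to 7 in the summary), and ranking is an illustrative name.

```r
# Odds ratios copied from the output above (intercept omitted)
odds_ratios <- c(Expensive = 0.7530339, Convenient = 1.1261151,
                 Service = 1.2595989, Largechoice = 0.9800928)

# Absolute log odds ratio measures effect size regardless of direction
log_or  <- log(odds_ratios)
ranking <- sort(abs(log_or), decreasing = TRUE)

print(round(ranking, 3))
```

On this scale the order is Expensive, Service, Convenient, Largechoice. Since none of the coefficients is statistically significant, this ordering is descriptive only.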

Conclusion

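In conclusion, this analysis indicates the direction of each attribute’s association with choosing Office Star: lower perceived expense and better service and convenience go with higher odds of choosing it. However, none of these effects is statistically significant in this sample of 30 observations, so the findings are suggestive only. Increasing the sample size or revising the model specification could provide clearer results in future studies.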