Introduction

As the regional sales manager for ABB Electric, my goal is to understand the factors that most influence our customers’ choices. Using logistic regression, I analyzed customer choice based on eight variables: Price, Energy losses, Maintenance requirements, Warranty, Availability of spare parts, Ease of installation, Salesperson problem solving support, and Perceived product quality. This analysis helps us to tailor our marketing and sales strategies effectively.

Data Loading and Preparation

# Load necessary libraries
library(aod)

library(ggplot2)

# Load the dataset
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata1 <- read.csv("ABB Electric Data Binary Logit (Customer Choice).csv")

# Display the first few rows to check the data
head(mydata)

##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2

names(mydata)

## [1] "admit" "gre"   "gpa"   "rank"

summary(mydata)

##      admit             gre             gpa             rank      
##  Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000  
##  Median :0.0000   Median :580.0   Median :3.395   Median :2.000  
##  Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485  
##  3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000  
##  Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000

# Calculate standard deviation for each variable in mydata
sapply(mydata, sd)

##       admit         gre         gpa        rank 
##   0.4660867 115.5165364   0.3805668   0.9444602

# Create a two-way contingency table to examine the relationship between 'admit' and 'rank'
xtabs(~admit + rank, data = mydata)

##      rank
## admit  1  2  3  4
##     0 28 97 93 55
##     1 33 54 28 12

Here, I load two datasets: the graduate admissions data (mydata) and our ABB Electric customer choice data (mydata1). I display the first few rows, variable names, and summary statistics of mydata to confirm that it has been read into R correctly and that the variables are formatted as expected. I then calculate the standard deviation of each variable in mydata to understand the variability of each feature, such as GRE scores, GPA, and the rank of the undergraduate institution.
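The load check described here can be wrapped in a small helper. The following is a minimal sketch, not part of the original analysis: the function name check_columns is hypothetical, and the demo data frame simply re-enters the six rows printed by head(mydata) so that the example runs without the source files.

```r
# Hypothetical helper (not from the original analysis): report each column's
# class, number of missing values, and standard deviation.
check_columns <- function(df) {
  data.frame(
    column    = names(df),
    class     = vapply(df, function(x) class(x)[1], character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    sd        = vapply(df, function(x) if (is.numeric(x)) sd(x) else NA_real_,
                       numeric(1)),
    row.names = NULL
  )
}

# The six rows shown by head(mydata), typed in by hand for illustration
demo <- data.frame(
  admit = c(0, 1, 1, 1, 0, 1),
  gre   = c(380, 660, 800, 640, 520, 760),
  gpa   = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  rank  = c(3, 3, 1, 4, 4, 2)
)

col_report <- check_columns(demo)
col_report
```

Calling check_columns(mydata1) would give the same overview for the ABB Electric file.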

Following this, I generate a two-way contingency table of admit status against the rank of the undergraduate institution to confirm that there are no zero cells (combinations of admit and rank that never occur). Empty cells matter because a factor level with no admitted cases, or no rejected cases, can lead to unstable coefficient estimates and very large standard errors for that level. Here every cell is populated, and the smallest count is 12.
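The zero-cell check can also be done programmatically rather than by eye. This is a minimal sketch that re-enters the cell counts from the xtabs() output by hand so it runs on its own; with the data loaded, the same test could be applied directly to the result of xtabs(~admit + rank, data = mydata).

```r
# Cell counts copied from the xtabs(~admit + rank) output
tab <- matrix(c(28, 97, 93, 55,
                33, 54, 28, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(admit = c("0", "1"),
                              rank  = c("1", "2", "3", "4")))

# TRUE would signal at least one empty admit/rank combination
has_zero_cells <- any(tab == 0)
has_zero_cells

# Smallest cell count and total number of observations
min(tab)
sum(tab)
```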

Logistic Regression Model Fitting

# Convert 'rank' to a factor because it is a categorical variable
mydata$rank <- factor(mydata$rank)

# Fit the logistic regression model for the admissions data
admit_logit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")

# Fit the logistic regression model for the ABB Electric dataset
# (kept as 'mylogit' because the summary and odds-ratio steps use that name)
mylogit <- glm(Choice..0.1. ~ Price + Energy.Loss + Maintenance + Warranty +
                 Spare.Parts + Ease.of.Install + Prob.Solver + Quality,
               data = mydata1, family = "binomial")

Here, I first fit a logistic regression model to predict the probability of admission (admit) based on GRE scores (gre), GPA (gpa), and the rank of the undergraduate institution (rank). The rank variable is converted to a factor so that it is modeled as a categorical variable rather than as a numeric score.

The logistic regression model is fitted using the glm function with a binomial family, which is appropriate for binary response variables like admission status.
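To illustrate the glm call pattern without the source files, here is a minimal sketch on simulated data. The data, the seed, and the names sim, sim_fit, and sim_prob are all made up for illustration and have nothing to do with the admissions or ABB Electric datasets.

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

# Simulated data (illustrative only): one predictor and a 0/1 response
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x))
sim <- data.frame(x = x, y = y)

# Same call pattern as in the analysis: binomial family, default logit link
sim_fit <- glm(y ~ x, data = sim, family = "binomial")

# Coefficients are on the log-odds scale
coef(sim_fit)

# Fitted values on the probability scale, strictly between 0 and 1
sim_prob <- predict(sim_fit, type = "response")
range(sim_prob)
```

The binomial family uses the logit link by default, which is why the coefficients are read as changes in log odds.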

Additionally, I apply a similar modeling approach to mydata1, which pertains to ABB Electric. This model aims to predict the choice of ABB Electric based on variables such as Price, Energy Loss, Maintenance, etc. The goal is to understand which factors are most influential in customer decision-making.

# Display the summary of the logistic regression model for ABB Electric
summary(mylogit)

## 
## Call:
## glm(formula = Choice..0.1. ~ Price + Energy.Loss + Maintenance + 
##     Warranty + Spare.Parts + Ease.of.Install + Prob.Solver + 
##     Quality, family = "binomial", data = mydata1)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)     -8.26146    2.57843  -3.204  0.00135 **
## Price            0.64432    0.44887   1.435  0.15116   
## Energy.Loss      0.12947    0.47179   0.274  0.78376   
## Maintenance      0.63138    0.39354   1.604  0.10864   
## Warranty         0.15164    0.25315   0.599  0.54917   
## Spare.Parts      0.06815    0.20571   0.331  0.74041   
## Ease.of.Install -0.06841    0.18225  -0.375  0.70739   
## Prob.Solver      0.30174    0.43175   0.699  0.48463   
## Quality         -0.68642    0.48880  -1.404  0.16022   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 89.169  on 87  degrees of freedom
## Residual deviance: 73.606  on 79  degrees of freedom
## AIC: 91.606
## 
## Number of Fisher Scoring iterations: 5

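The output from summary(mylogit) provides detailed statistics about the logistic regression model for ABB Electric, including the estimated coefficients, their standard errors, z-values, and associated p-values for each predictor. In this fit, only the intercept is statistically significant at the 0.05 level; none of the eight predictors reaches significance (the smallest p-value is 0.109, for Maintenance), so the signs and sizes of the coefficients should be read as tentative. With that caveat, the output indicates the direction in which each variable moves the likelihood of choosing ABB Electric, which can inform decisions about which product and service attributes to examine further.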
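The columns of the summary table are related by simple arithmetic, and the deviance lines support an overall test of the model. The following is a sketch using numbers copied by hand from the printed output; the variable names are illustrative, and the AIC identity used here assumes a 0/1 response, as in this model.

```r
# Values copied from the summary(mylogit) output
est_price <- 0.64432
se_price  <- 0.44887

# Wald z statistic and two-sided p-value for Price
z_price <- est_price / se_price
p_price <- 2 * pnorm(-abs(z_price))

# Likelihood-ratio (deviance) test of the full model against the null model
null_dev  <- 89.169
null_df   <- 87
resid_dev <- 73.606
resid_df  <- 79
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For 0/1 data, AIC = residual deviance + 2 * number of coefficients (9 here)
aic <- resid_dev + 2 * 9

round(c(z = z_price, p = p_price, lr_stat = lr_stat, lr_p = lr_p, aic = aic), 4)
```

The reconstructed z and p for Price agree with the printed 1.435 and 0.151, and the AIC agrees with the printed 91.606. The deviance drop of 15.563 on 8 degrees of freedom gives a p-value just under 0.05, so the eight predictors are jointly borderline-significant even though none is individually significant.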

Interpretation of Logistic Regression Results

GRE Score Impact

In the admissions model, for every one-unit increase in GRE score, the log odds of admission to graduate school (as opposed to not being admitted) increase by 0.002. This suggests a positive relationship between GRE scores and the likelihood of admission. The per-point effect is small because GRE is measured on a scale that runs from 220 to 800 in this dataset; a 100-point increase corresponds to an increase of about 0.2 in the log odds.

GPA Impact

For every one unit increase in GPA, the log odds of being admitted to graduate school increase significantly by 0.804. This substantial increase implies that GPA is a strong predictor of graduate school admission. A higher GPA substantially improves the chances of admission, reflecting the importance of academic performance in the admission process.

Impact of Undergraduate Institution Rank

The rank of the undergraduate institution also plays a role. Because rank is modeled as a factor, each rank coefficient compares one rank level with the reference level, rank 1. Moving from an institution ranked 1 to one ranked 2 changes the log odds of admission by -0.675, so attending a rank 2 rather than a rank 1 institution is associated with lower odds of admission, holding GRE and GPA constant.
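Log-odds coefficients are easier to read once exponentiated. As a minimal sketch, the three admissions coefficients quoted in this section (rounded as in the text) can be converted to odds ratios as follows; the vector name log_odds is illustrative.

```r
# Admissions-model coefficients as quoted in the text (rounded)
log_odds <- c(gre = 0.002, gpa = 0.804, rank2 = -0.675)

# Odds ratio for a one-unit increase in each predictor
admit_or <- exp(log_odds)
round(admit_or, 3)

# Odds ratio for a 100-point increase in GRE
gre_100_or <- exp(100 * log_odds[["gre"]])
round(gre_100_or, 3)
```

An odds ratio above 1 means higher odds of admission and a value below 1 means lower odds, so the rank 2 coefficient roughly halves the odds relative to rank 1.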

Odds Ratio Calculation and Interpretation

# Calculate and display the odds ratios from the logistic regression coefficients
odds_ratios <- exp(coef(mylogit))
odds_ratios

##     (Intercept)           Price     Energy.Loss     Maintenance        Warranty 
##    0.0002582818    1.9046966198    1.1382252785    1.8802083677    1.1637362151 
##     Spare.Parts Ease.of.Install     Prob.Solver         Quality 
##    1.0705299671    0.9338792341    1.3522075162    0.5033734282

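Odds ratios, obtained by exponentiating the logistic regression coefficients, are a more intuitive measure of each predictor's effect than log odds. The values printed above come from the ABB Electric model: for example, a one-unit increase in Price is associated with the odds of choosing ABB Electric being multiplied by about 1.90, while a one-unit increase in Quality multiplies them by about 0.50. Because none of these predictors is statistically significant at the 0.05 level, these ratios are indicative rather than conclusive. For the admissions model, the GPA coefficient of 0.804 corresponds to an odds ratio of exp(0.804) ≈ 2.23, so the odds of admission more than double with each one-unit increase in GPA.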
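A point estimate of an odds ratio is more informative alongside an interval. The sketch below builds approximate 95% Wald intervals for the ABB Electric odds ratios from the estimates and standard errors printed in the summary output, re-entered by hand; the names est, se, and or_table are illustrative.

```r
# Estimates and standard errors copied from the summary(mylogit) output
est <- c(Price = 0.64432, Energy.Loss = 0.12947, Maintenance = 0.63138,
         Warranty = 0.15164, Spare.Parts = 0.06815, Ease.of.Install = -0.06841,
         Prob.Solver = 0.30174, Quality = -0.68642)
se  <- c(0.44887, 0.47179, 0.39354, 0.25315,
         0.20571, 0.18225, 0.43175, 0.48880)

z_crit <- qnorm(0.975)  # about 1.96 for a 95% interval

# Exponentiate the estimate and the interval limits to get odds ratios
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z_crit * se),
  upper      = exp(est + z_crit * se)
)
round(or_table, 3)
```

Every interval contains 1, which is consistent with none of the predictors being significant at the 0.05 level. With the fitted model available, exp(confint.default(mylogit)) should give the same kind of Wald interval directly.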

Conclusion

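In conclusion, the admissions analysis suggests that GPA is a strong predictor of the likelihood of graduate school admission, with GRE scores and the rank of the undergraduate institution also contributing. These findings underscore the importance of academic excellence and the potential influence of the reputation of undergraduate institutions on admission decisions. As a result, students aiming for graduate school should focus on maintaining high GPAs and consider the potential impact of their undergraduate institution’s rank on their future academic endeavors. For ABB Electric, the fitted model does not single out any one attribute: none of the eight predictors is statistically significant at the 0.05 level in this sample of 88 observations, so the estimated effects should be treated as tentative when tailoring our marketing and sales strategies.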

Assignment 6

Blake Gamber

2024-04-24