As the regional sales manager for ABB Electric, my goal is to understand the factors that most influence our customers’ choices. Using logistic regression, I analyzed customer choice based on eight variables: Price, Energy losses, Maintenance requirements, Warranty, Availability of spare parts, Ease of installation, Salesperson problem solving support, and Perceived product quality. This analysis helps us to tailor our marketing and sales strategies effectively.
# Load necessary libraries
library(aod)
## Warning: package 'aod' was built under R version 4.3.3
library(ggplot2)
# Load the datasets: the UCLA admissions example (mydata) and the ABB Electric customer choice data (mydata1)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata1 <- read.csv("ABB Electric Data Binary Logit (Customer Choice).csv")
# Display the first few rows to check the data
head(mydata)
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
names(mydata)
## [1] "admit" "gre" "gpa" "rank"
summary(mydata)
##      admit             gre             gpa             rank
##  Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000
##  Median :0.0000   Median :580.0   Median :3.395   Median :2.000
##  Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485
##  3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000
##  Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000
# Calculate standard deviation for each variable in mydata
sapply(mydata, sd)
##       admit         gre         gpa        rank
##   0.4660867 115.5165364   0.3805668   0.9444602
# Create a two-way contingency table to examine the relationship between 'admit' and 'rank'
xtabs(~admit + rank, data = mydata)
##      rank
## admit  1  2  3  4
##     0 28 97 93 55
##     1 33 54 28 12
Here, I load both datasets and display the first few rows of mydata
to confirm it has been read correctly into R. mydata is the UCLA
graduate admissions example (admit, gre, gpa, rank); the ABB Electric
customer choice data is stored separately in mydata1. This initial
check confirms that the variables are correctly formatted and that the
data looks as expected. I then calculate the standard deviation of each
variable in mydata to understand the variability of each feature, such
as GRE scores, GPA, and the rank of undergraduate institutions.
Following this, I generate a two-way contingency table of admit status
against the rank of the undergraduate institution to check for zero
cells (combinations of admit and rank that do not occur). This matters
because empty or very sparse cells can make the logistic regression
estimates unstable or prevent the model from converging.
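The zero-cell check can also be done programmatically rather than by eye. The sketch below rebuilds the admit-by-rank counts from the table printed above, so it runs without the CSV file; the small-cell threshold of 5 is an illustrative assumption, not a rule taken from this analysis.

```r
# Rebuild the admit-by-rank counts reported in the contingency table.
counts <- matrix(c(28, 97, 93, 55,
                   33, 54, 28, 12),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(admit = c("0", "1"),
                                 rank  = c("1", "2", "3", "4")))
tab <- as.table(counts)

# TRUE if any admit/rank combination never occurs.
has_empty_cells <- any(tab == 0)

# Row/column positions of sparse cells (threshold of 5 is an assumption).
small_cells <- which(tab < 5, arr.ind = TRUE)

has_empty_cells
nrow(small_cells)
min(tab)
```

With the real data loaded, the same checks should apply directly to xtabs(~admit + rank, data = mydata).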
# Convert 'rank' to a factor because it is a categorical variable
mydata$rank <- factor(mydata$rank)
# Fit the logistic regression model for the admissions example.
# It is stored under its own name so the ABB Electric fit does not overwrite it.
admit_logit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
# Fit another logistic regression model for the ABB Electric dataset
mylogit <- glm(Choice..0.1. ~ Price + Energy.Loss + Maintenance + Warranty +
                 Spare.Parts + Ease.of.Install + Prob.Solver + Quality,
               data = mydata1, family = "binomial")
Here, I first fit a logistic regression on the admissions example data
to predict the probability of admission (admit) from GRE scores (gre),
GPA (gpa), and the rank of the undergraduate institution (rank). The
rank variable is converted to a factor so that it is modeled as
categorical rather than numeric. The model is fitted using the glm
function with a binomial family, which is appropriate for binary
response variables like admission status.
I then apply the same modeling approach to mydata1, the ABB Electric
data. This model predicts whether a customer chooses ABB Electric
(Choice..0.1.) from eight attributes: Price, Energy Loss, Maintenance,
Warranty, Spare Parts, Ease of Install, Problem Solving support, and
Quality. The goal is to understand which factors are most influential
in customer decision-making.
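For reference, the ABB Electric model specified above can be written out as an equation. The symbols are notation introduced here for illustration: p_i is the probability that customer i chooses ABB Electric (assuming Choice..0.1. is coded 1 for choosing ABB and 0 otherwise), and the betas are the coefficients that glm estimates.

```latex
\log\frac{p_i}{1 - p_i}
  = \beta_0
  + \beta_1\,\mathrm{Price}_i
  + \beta_2\,\mathrm{EnergyLoss}_i
  + \beta_3\,\mathrm{Maintenance}_i
  + \beta_4\,\mathrm{Warranty}_i
  + \beta_5\,\mathrm{SpareParts}_i
  + \beta_6\,\mathrm{EaseOfInstall}_i
  + \beta_7\,\mathrm{ProbSolver}_i
  + \beta_8\,\mathrm{Quality}_i
```

Each coefficient is therefore a change in the log odds of choosing ABB Electric per one-unit change in that attribute, holding the other attributes fixed.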
# Display the summary of the logistic regression model for ABB Electric
summary(mylogit)
##
## Call:
## glm(formula = Choice..0.1. ~ Price + Energy.Loss + Maintenance +
##     Warranty + Spare.Parts + Ease.of.Install + Prob.Solver +
##     Quality, family = "binomial", data = mydata1)
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)     -8.26146    2.57843  -3.204  0.00135 **
## Price            0.64432    0.44887   1.435  0.15116
## Energy.Loss      0.12947    0.47179   0.274  0.78376
## Maintenance      0.63138    0.39354   1.604  0.10864
## Warranty         0.15164    0.25315   0.599  0.54917
## Spare.Parts      0.06815    0.20571   0.331  0.74041
## Ease.of.Install -0.06841    0.18225  -0.375  0.70739
## Prob.Solver      0.30174    0.43175   0.699  0.48463
## Quality         -0.68642    0.48880  -1.404  0.16022
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 89.169  on 87  degrees of freedom
## Residual deviance: 73.606  on 79  degrees of freedom
## AIC: 91.606
##
## Number of Fisher Scoring iterations: 5
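The output from summary(mylogit) reports, for each predictor in the ABB
Electric model, the estimated coefficient (on the log-odds scale), its
standard error, the z-value, and the associated p-value. In this fit,
only the intercept is statistically significant at the 5% level; none
of the eight predictors is individually significant (the smallest
p-values are 0.109 for Maintenance and 0.151 for Price). The residual
deviance of 73.606 on 79 degrees of freedom is 15.563 below the null
deviance of 89.169 on 87 degrees of freedom. The estimates indicate the
direction and size of each attribute's association with choosing ABB
Electric, but with 88 observations they are imprecise, so they are best
treated as tentative guidance when targeting improvements in product
and service attributes.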
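One way to judge whether the eight predictors jointly improve on an intercept-only model is a likelihood-ratio (deviance) test. This test is not part of the original analysis; the sketch below uses only the deviances and degrees of freedom printed in the summary output, so it runs without the data file.

```r
# Deviances and degrees of freedom copied from the summary(mylogit) output.
null_dev  <- 89.169
null_df   <- 87
resid_dev <- 73.606
resid_df  <- 79

# Likelihood-ratio statistic and its chi-squared reference distribution.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Consistency check: for a 0/1 response, AIC equals the residual deviance
# plus twice the number of estimated coefficients (intercept + 8 predictors).
n_coef    <- 9
aic_check <- resid_dev + 2 * n_coef

c(lr_stat = lr_stat, lr_df = lr_df, lr_p = lr_p, aic_check = aic_check)
```

The resulting p-value can be compared against the 0.05 threshold used elsewhere in the output; a joint test can come out differently from the individual z-tests because it pools the evidence across all predictors.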
For the admissions example model (admit ~ gre + gpa + rank), the coefficients are interpreted as follows. For every one-unit increase in GRE score, the log odds of admission to graduate school (as opposed to not being admitted) increase by 0.002. The relationship is positive, and the per-point effect is small because GRE is measured on a scale of several hundred points: a 100-point increase corresponds to an increase of about 0.2 in the log odds.
For every one-unit increase in GPA, the log odds of being admitted to graduate school increase by 0.804. This makes GPA a strong predictor of admission in this model: a higher GPA substantially improves the chances of admission, reflecting the importance of academic performance in the admission process.
The rank of the undergraduate institution also matters. Because rank is modeled as a factor, each rank coefficient compares one rank against rank 1. Moving from an institution of rank 1 to one of rank 2 changes the log odds of admission by -0.675, so attending a rank 2 institution rather than a rank 1 institution is associated with lower odds of admission.
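Log-odds changes are easier to read once converted to probabilities. The sketch below applies the three coefficients quoted in the text to a hypothetical applicant; the baseline admission probability of 0.30 and the helper shift_prob are assumptions chosen purely for illustration.

```r
# Coefficients quoted in the text for the admissions example.
b_gre   <- 0.002
b_gpa   <- 0.804
b_rank2 <- -0.675

# Hypothetical baseline admission probability (an assumption, not an estimate).
p0 <- 0.30

# Move a probability by `delta` on the log-odds scale and convert back.
shift_prob <- function(p, delta) plogis(qlogis(p) + delta)

p_gpa    <- shift_prob(p0, b_gpa)        # GPA one unit higher
p_gre100 <- shift_prob(p0, 100 * b_gre)  # GRE 100 points higher
p_rank2  <- shift_prob(p0, b_rank2)      # rank 2 instead of rank 1

round(c(baseline = p0, gpa_plus_1 = p_gpa,
        gre_plus_100 = p_gre100, rank2_vs_rank1 = p_rank2), 3)
```

The same log-odds shift produces a different change in probability depending on the baseline, which is why the coefficients are usually reported on the log-odds or odds-ratio scale.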
# Calculate and display the odds ratios from the logistic regression coefficients
odds_ratios <- exp(coef(mylogit))
odds_ratios
##     (Intercept)           Price     Energy.Loss     Maintenance        Warranty
##    0.0002582818    1.9046966198    1.1382252785    1.8802083677    1.1637362151
##     Spare.Parts Ease.of.Install     Prob.Solver         Quality
##    1.0705299671    0.9338792341    1.3522075162    0.5033734282
Exponentiating the logistic regression coefficients gives odds ratios, a more intuitive measure of each predictor's effect. For the ABB Electric model shown above, a one-unit increase in Price multiplies the odds of choosing ABB Electric by about 1.90 and a one-unit increase in Quality multiplies them by about 0.50, although neither estimate is statistically significant. In the admissions example, the GPA coefficient of 0.804 corresponds to an odds ratio of exp(0.804) ≈ 2.23: the odds of being admitted more than double for each one-unit increase in GPA.
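Point estimates of odds ratios say nothing about their precision. As a rough sketch, Wald 95% intervals can be computed from the estimates and standard errors printed in the ABB Electric summary; this is an added illustration rather than a step in the original analysis, and the numbers are copied from that output so the code runs without the data file.

```r
# Estimates and standard errors copied from the summary(mylogit) output.
est <- c(Price = 0.64432, Energy.Loss = 0.12947, Maintenance = 0.63138,
         Warranty = 0.15164, Spare.Parts = 0.06815, Ease.of.Install = -0.06841,
         Prob.Solver = 0.30174, Quality = -0.68642)
se  <- c(0.44887, 0.47179, 0.39354, 0.25315,
         0.20571, 0.18225, 0.43175, 0.48880)

# Wald interval on the log-odds scale, then exponentiated to odds ratios.
z <- qnorm(0.975)
or_table <- data.frame(odds_ratio = exp(est),
                       lower      = exp(est - z * se),
                       upper      = exp(est + z * se))

round(or_table, 3)
```

An interval that contains 1 is consistent with no effect at the 5% level. With the fitted object available, exp(confint.default(mylogit)) should give the same Wald limits.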
In conclusion, for the admissions example, this analysis indicates that GPA is the most influential factor in predicting the likelihood of graduate school admission, followed by GRE scores and the rank of the undergraduate institution. These findings underscore the importance of academic excellence and the potential influence of the reputation of undergraduate institutions on admission decisions. As a result, students aiming for graduate school should focus on maintaining high GPAs and consider the potential impact of their undergraduate institution’s rank on their future academic endeavors.