Most of the following text is from the Wikipedia page on Logistic Regression, (https://en.wikipedia.org/wiki/Logistic_regression). The R code in this document was authored by John Akwei, Data Scientist, in order to demonstrate R programming for the Logistic Regression results, and graphs.

Logistic Regression

Example: Probability of passing an exam versus hours of study

A group of 20 students spend between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability that the student will pass the exam?

Hours <- c(0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25,
           2.50, 2.75, 3.00, 3.25, 3.50,    4.00,   4.25,   4.50,   4.75,
           5.00, 5.50)
Pass    <- c(0, 0, 0, 0, 0, 0, 1,   0, 1, 0, 1, 0, 1, 0, 1, 1, 1,   1, 1, 1)

HrsStudying <- data.frame(Hours, Pass)

The table shows the number of hours each student spent studying, and whether they passed (1) or failed (0).

HrsStudying_Table <- t(HrsStudying); HrsStudying_Table
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## Hours  0.5 0.75    1 1.25  1.5 1.75 1.75    2 2.25   2.5  2.75     3  3.25
## Pass   0.0 0.00    0 0.00  0.0 0.00 1.00    0 1.00   0.0  1.00     0  1.00
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## Hours   3.5     4  4.25   4.5  4.75     5   5.5
## Pass    0.0     1  1.00   1.0  1.00     1   1.0

The graph shows the probability of passing the exam versus the number of hours studying, with the logistic regression curve fitted to the data.

library(ggplot2)
ggplot(HrsStudying, aes(Hours, Pass)) +
  geom_point(aes()) +
  geom_smooth(method='glm', family="binomial", se=FALSE) +
  labs (x="Hours Studying", y="Probability of Passing Exam",
        title="Probability of Passing Exam vs Hours Studying")

Graph of a logistic regression curve showing probability of passing an exam versus hours studying

The logistic regression analysis gives the following output.

model <- glm(Pass ~.,family=binomial(link='logit'),data=HrsStudying)
model$coefficients
## (Intercept)       Hours 
##   -4.077713    1.504645

Coefficient Std.Error z-value P-value (Wald)
Intercept -4.0777 1.7610 -2.316 0.0206
Hours 1.5046 0.6287 2.393 0.0167

The output indicates that hours studying is significantly associated with the probability of passing the exam (p=0.0167, Wald test). The output also provides the coefficients for Intercept = -4.0777 and Hours = 1.5046. These coefficients are entered in the logistic regression equation to estimate the probability of passing the exam:

Probability of passing exam =1/(1+exp(-(-4.0777+1.5046* Hours)))

For example, for a student who studies 2 hours, entering the value Hours =2 in the equation gives the estimated probability of passing the exam of p=0.26:

StudentHours <- 2
ProbabilityOfPassingExam <- 1/(1+exp(-(-4.0777+1.5046*StudentHours)))
ProbabilityOfPassingExam
## [1] 0.2556884

Probability of passing exam =1/(1+exp(-(-4.0777+1.5046*2))) = 0.26.

Similarly, for a student who studies 4 hours, the estimated probability of passing the exam is p=0.87:
Probability of passing exam =1/(1+exp(-(-4.0777+1.5046*4))) = 0.87.

StudentHours <- 4
ProbabilityOfPassingExam <- 1/(1+exp(-(-4.0777+1.5046*StudentHours)))
ProbabilityOfPassingExam
## [1] 0.874429

This table shows the probability of passing the exam for several values of hours studying.

ExamPassTable <- data.frame(column1=c(1, 2, 3, 4, 5),
                            column2=c(1/(1+exp(-(-4.0777+1.5046*1))),
                                      1/(1+exp(-(-4.0777+1.5046*2))),
                                      1/(1+exp(-(-4.0777+1.5046*3))),
                                      1/(1+exp(-(-4.0777+1.5046*4))),
                                      1/(1+exp(-(-4.0777+1.5046*5)))))
names(ExamPassTable) <- c("Hours of study", "Probability of passing exam")
ExamPassTable
##   Hours of study Probability of passing exam
## 1              1                  0.07088985
## 2              2                  0.25568845
## 3              3                  0.60732935
## 4              4                  0.87442903
## 5              5                  0.96909067