The following is based on an R tutorial at:
http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html
This is a very simple example of logistic regression, using a single feature.
The menarche data set has 25 data points, each representing a group of girls in a particular age range. The Age field is the average age for that group, the Total field is the total number of girls in the group, and the Menarche field is the number of girls who have reached menarche.
Here is a plot of the data.
library(MASS)
data(menarche)
plot(Menarche/Total ~ Age, data=menarche)
The S-shaped curve suggests that the proportion of girls who have reached menarche follows a sigmoid (logistic) function of age, so logistic regression is an appropriate model.
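One rough way to check the sigmoid claim is to plot the empirical log-odds against Age: if a logistic model is reasonable, the points should fall close to a straight line. The sketch below is not part of the original tutorial. The name emp.logit is made up for illustration, and the 0.5 added to each count is an assumed continuity correction that keeps groups with 0% or 100% finite.

```r
library(MASS)   # provides the menarche data set
data(menarche)

# Empirical log-odds per age group. The 0.5 added to each count is an
# assumed continuity correction so groups at 0% or 100% stay finite.
emp.logit <- with(menarche, log((Menarche + 0.5) / (Total - Menarche + 0.5)))

# Roughly linear points support modelling the log-odds as linear in Age
plot(menarche$Age, emp.logit, xlab="Age", ylab="Empirical log-odds")
```

A clearly curved pattern here would suggest that a plain logistic model in Age is a poor fit.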
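glm.out is a generalized linear model fitted to the menarche data set; it estimates the probability that a girl of a given age has reached menarche. The response is given as cbind(Menarche, Total-Menarche), that is, the number of girls in each group who have and have not reached menarche. Specifying family="binomial", whose default link is the logit, makes the generalized linear model a logistic regression.
The fitted values of the model (the fitted.values component, which glm.out$fitted reaches through partial name matching) are the predicted probabilities for each row of the original data set. The following shows each value of the explanatory variable (Age) alongside its fitted probability.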
glm.out <- glm(cbind(Menarche, Total-Menarche) ~ Age, family="binomial", data=menarche)
# Print out predictions for each group
cbind(Age=menarche$Age, Prob=glm.out$fitted)
##      Age     Prob
## 1   9.21 0.002033
## 2  10.21 0.010313
## 3  10.58 0.018703
## 4  10.83 0.027864
## 5  11.08 0.041321
## 6  11.33 0.060871
## 7  11.58 0.088814
## 8  11.83 0.127838
## 9  12.08 0.180610
## 10 12.33 0.248949
## 11 12.58 0.332648
## 12 12.83 0.428435
## 13 13.08 0.529902
## 14 13.33 0.628957
## 15 13.58 0.718237
## 16 13.83 0.793102
## 17 14.08 0.852170
## 18 14.33 0.896573
## 19 14.58 0.928754
## 20 14.83 0.951464
## 21 15.08 0.967191
## 22 15.33 0.977940
## 23 15.58 0.985221
## 24 15.83 0.990123
## 25 17.58 0.999427
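To make the link between the model coefficients and these probabilities concrete, the following sketch recomputes the fitted values by hand from the intercept and slope, using the logistic function 1 / (1 + exp(-x)). The names b, manual.prob and age.50 are illustrative and do not appear in the original tutorial.

```r
library(MASS)   # provides the menarche data set
data(menarche)
glm.out <- glm(cbind(Menarche, Total-Menarche) ~ Age, family="binomial", data=menarche)

b <- coef(glm.out)   # b[1] is the intercept, b[2] is the slope for Age

# Apply the logistic (inverse logit) function to the linear predictor
manual.prob <- 1 / (1 + exp(-(b[1] + b[2] * menarche$Age)))

# Should agree with the model's fitted values up to floating-point error
all.equal(unname(manual.prob), unname(fitted(glm.out)))

# Age at which the model predicts a probability of 0.5 (linear predictor = 0)
age.50 <- unname(-b[1] / b[2])
age.50
```

The value of age.50 should fall between the 12.83 and 13.08 rows of the table above, where the fitted probability crosses 0.5.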
The following plots the data again, then overlays the model's fitted values as a red line. It then uses the model to predict the probability for a few arbitrary values of Age and plots those predictions as blue points.
plot(Menarche/Total ~ Age, data=menarche)
lines(menarche$Age, glm.out$fitted, type="l", col="red")
# Now predict the value for a set of ages, plot as blue points
glm.ages <- c(14.2, 15.3, 16.1)
# predict() handles every row of the new data frame in one call
glm.predict <- predict(glm.out, data.frame(Age=glm.ages), type="response")
points(glm.ages, glm.predict, type="p", pch=21, col="blue", bg=rgb(0,0,1,0.5))
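The red line above connects predictions only at the 25 observed ages, so it is drawn as straight segments between them. A smoother curve can be sketched by predicting over a dense grid of ages, as below. The names age.grid and grid.prob and the choice of 200 grid points are assumptions made for illustration, not part of the original tutorial.

```r
library(MASS)   # provides the menarche data set
data(menarche)
glm.out <- glm(cbind(Menarche, Total-Menarche) ~ Age, family="binomial", data=menarche)

# Evenly spaced ages across the observed range (200 points is arbitrary)
age.grid <- seq(min(menarche$Age), max(menarche$Age), length.out=200)

# Predicted probability of having reached menarche at each grid age
grid.prob <- predict(glm.out, data.frame(Age=age.grid), type="response")

plot(Menarche/Total ~ Age, data=menarche)
lines(age.grid, grid.prob, col="red")
```

Because the grid stays within the observed age range, the curve interpolates between the data and does not extrapolate beyond it.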