Installing the package from GitHub into RStudio

To download the package from GitHub, open RStudio and create a new script (File > New File > R Script). Then type the following into the RStudio editor:

## Downloads the package into R for use (requires the devtools package)
# suppressWarnings(devtools::install_github("StatsGary/OddsPlotty",
#                                            dependencies = TRUE,
#                                            force = TRUE))

This package creates odds plots from the results of a logistic regression.

The package uses caret (https://github.com/topepo) to train the model, and the finalModel element of the trained model is passed to odds_plot to generate the plot.
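As background, an odds plot typically shows each predictor's odds ratio (the exponentiated logistic regression coefficient) together with a confidence interval. The sketch below computes those quantities in base R on the built-in mtcars data. It illustrates the idea only and is not taken from OddsPlotty's source; the use of Wald intervals via confint.default is an assumption made for simplicity.

```r
# Illustrative sketch only: not OddsPlotty's internal code.
# Fit a logistic regression on the built-in mtcars data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios are the exponentiated coefficients.
odds_ratio <- exp(coef(fit))

# Wald 95% confidence intervals, moved onto the odds scale
# (assumption: Wald intervals, chosen for simplicity).
ci <- exp(confint.default(fit, level = 0.95))

or_table <- data.frame(
  term       = names(odds_ratio),
  odds_ratio = unname(odds_ratio),
  lower      = ci[, 1],
  upper      = ci[, 2],
  row.names  = NULL
)

# Drop the intercept, which is not an odds ratio for a predictor.
or_table <- or_table[or_table$term != "(Intercept)", ]
print(or_table)
```

An odds ratio above 1 means the odds of the outcome rise as the predictor increases; a value below 1 means they fall. An interval that crosses 1 suggests the direction of the effect is uncertain.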

Loading OddsPlotty

To use the odds_plot function, first load the package:

library(OddsPlotty)

Training a GLM to use with odds plot

First we load the required packages. The example dataset we will use with OddsPlotty is the BreastCancer data from the mlbench package:

#install.packages("mlbench")
#install.packages("caret")
library(mlbench)
## Warning: package 'mlbench' was built under R version 3.6.1
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.6.1

Then we load the BreastCancer data, keep only the complete cases, and drop the ID column:

data("BreastCancer", package = "mlbench")
# Keep only complete cases (rows with no missing values) in a working copy
breast <- BreastCancer[complete.cases(BreastCancer), ]
# Drop the first column (the sample Id), which is not a predictor
breast <- breast[, -1]
head(breast, 10)
##    Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
## 1             5         1          1             1            2           1
## 2             5         4          4             5            7          10
## 3             3         1          1             1            2           2
## 4             6         8          8             1            3           4
## 5             4         1          1             3            2           1
## 6             8        10         10             8            7          10
## 7             1         1          1             1            2          10
## 8             2         1          2             1            2           1
## 9             2         1          1             1            2           1
## 10            4         2          1             1            2           1
##    Bl.cromatin Normal.nucleoli Mitoses     Class
## 1            3               1       1    benign
## 2            3               2       1    benign
## 3            3               1       1    benign
## 4            3               7       1    benign
## 5            3               1       1    benign
## 6            9               7       1 malignant
## 7            3               1       1    benign
## 8            3               1       1    benign
## 9            1               1       5    benign
## 10           2               1       1    benign
# Ensure the class is a factor: benign (reference level, 0) and malignant (1)
breast$Class <- factor(breast$Class)
str(breast)
## 'data.frame':    683 obs. of  10 variables:
##  $ Cl.thickness   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
##  $ Cell.size      : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
##  $ Cell.shape     : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
##  $ Marg.adhesion  : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
##  $ Epith.c.size   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
##  $ Bare.nuclei    : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
##  $ Bl.cromatin    : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
##  $ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
##  $ Mitoses        : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
##  $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...

This takes care of the class encoding, but the predictors are still factors and need to be converted to numeric values:

# Loop through columns 1 to 9 and convert them from factors to numeric values.
# as.character() is applied first so the level labels are converted,
# not the factor's internal integer codes.
for (i in 1:9) {
  breast[, i] <- as.numeric(as.character(breast[, i]))
}
str(breast)
## 'data.frame':    683 obs. of  10 variables:
##  $ Cl.thickness   : num  5 5 3 6 4 8 1 2 2 4 ...
##  $ Cell.size      : num  1 4 1 8 1 10 1 1 1 2 ...
##  $ Cell.shape     : num  1 4 1 8 1 10 1 2 1 1 ...
##  $ Marg.adhesion  : num  1 5 1 1 3 8 1 1 1 1 ...
##  $ Epith.c.size   : num  2 7 2 3 2 7 2 2 2 2 ...
##  $ Bare.nuclei    : num  1 10 2 4 1 10 10 1 1 1 ...
##  $ Bl.cromatin    : num  3 3 3 3 3 9 3 3 1 2 ...
##  $ Normal.nucleoli: num  1 2 1 7 1 7 1 1 1 1 ...
##  $ Mitoses        : num  1 1 1 1 1 1 1 1 5 1 ...
##  $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...

The predictors are now numeric and can be used in the GLM.
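The conversion goes through as.character because calling as.numeric directly on a factor returns its internal level codes rather than the labelled values. The toy factor below is made up purely to show the difference:

```r
# A made-up factor whose labels are numbers stored as text.
f <- factor(c("10", "2", "5"))

# as.numeric() on a factor returns the internal level codes,
# which follow the sorted level order ("10" < "2" < "5" as text).
codes <- as.numeric(f)

# Converting to character first recovers the labelled values.
values <- as.numeric(as.character(f))

print(codes)   # 1 2 3
print(values)  # 10 2 5
```

Whenever a factor's levels are not exactly the integers 1, 2, 3, ... in order, the two results differ, so the as.character step is the safer habit.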

Training the GLM using caret

I will use caret to train a Generalised Linear Model (GLM) with a binomial family, i.e. a logistic regression, as this is the package that best supports the odds plot statistics. Please note: I am training on the full dataset rather than partitioning it into training and test sets, as is often done when building a logistic regression model.
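If a hold-out set is wanted, the data can be split before training. The following is a minimal base R sketch on the built-in mtcars data; the 70/30 ratio and the seed are arbitrary assumptions, and caret's createDataPartition is an alternative when a split stratified by the outcome is preferred.

```r
# Illustrative sketch: a simple 70/30 train/test split in base R.
set.seed(123)  # arbitrary seed, for reproducibility only

n <- nrow(mtcars)

# Randomly pick 70% of the row indices for training.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# Report how many rows landed in each set.
cat("Training rows:", nrow(train_set), "\n")
cat("Test rows:", nrow(test_set), "\n")
```

The model would then be trained on train_set only, keeping test_set aside for evaluation.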

library(caret)
glm_model <- train(Class ~ Cl.thickness + Cell.size + Cell.shape + Marg.adhesion + Bare.nuclei + Normal.nucleoli,
                   data = breast,
                   method = "glm",
                   family = "binomial")

Once the model is trained we can inspect the results with OddsPlotty:

library(OddsPlotty)
OddsPlotty::odds_plot(glm_model$finalModel,
                      title = "Odds Plot",
                      subtitle = "Showing odds of cancer based on various factors")
## Waiting for profiling to be done...

Additional parameters for the plot can be fed in:

library(OddsPlotty)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.1
OddsPlotty::odds_plot(glm_model$finalModel, 
                      title = "Odds Plot with ggthemes economist",
                      subtitle = "Showing odds of cancer based on various factors",
                      point_col = "#00f2ff",
                      error_bar_colour = "black",
                      point_size = .5,
                      error_bar_width = .8,
                      h_line_color = "red") +
  ggthemes::theme_economist() +
  theme(legend.position = "none")
## Waiting for profiling to be done...

Another example of how to use a different theme:

library(OddsPlotty)
library(ggthemes)
OddsPlotty::odds_plot(glm_model$finalModel, 
                      title = "Odds Plot with ggthemes Tufte Theme",
                      subtitle = "Showing odds of cancer based on various factors",
                      point_col = "#00f2ff",
                      error_bar_colour = "black",
                      point_size = .5,
                      error_bar_width = .8,
                      h_line_color = "red") + ggthemes::theme_tufte()
## Waiting for profiling to be done...

This package was created by Gary Hutson (https://twitter.com/StatsGary) and is part of his work.