One of the measures I wanted to plot was the mode. Base R has no function for the statistical mode (the built-in mode() returns an object's storage mode instead), so one needed to be created:
getMode <- function(v, na.rm = FALSE) {
  if (na.rm) { # if na.rm is TRUE, remove NA values from input v
    v <- v[!is.na(v)]
  }
  uniqv <- unique(v) # get unique values from v
  # first gets the positions of uniqv which match values in v
  # then counts the number of times each position shows up
  # then gets the position that shows up the most
  # (on a tie, which.max keeps the first one, i.e. the value seen earliest in v)
  # then subsets uniqv by the position that shows up the most
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
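To see how the function behaves, here is a small sketch on made-up vectors (the values are illustrative only, not from any dataset used in this post). It repeats the definition so that it runs on its own, and it checks the NA handling and what happens when two values are tied.

```r
# Same definition as above, repeated so this snippet is self-contained
getMode <- function(v, na.rm = FALSE) {
  if (na.rm) {
    v <- v[!is.na(v)]
  }
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Illustrative input: 7 appears three times, 2 twice, 3 once, plus one NA
x <- c(2, 7, 7, 3, 2, 7, NA)
getMode(x, na.rm = TRUE)   # 7

# Tie between 5 and 1 (two each): the value that appears first wins
getMode(c(5, 1, 1, 5))     # 5

# With na.rm = FALSE, NA is counted like any other value
getMode(c(NA, NA, 1))      # NA
```

Note that for a multimodal variable this returns only one of the modes, which is worth keeping in mind when reading the plot.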
The next step was to create the plot itself:
centralTendencyPlot <- function(data, variable) {
  # load required library
  library(ggplot2)
  # set the aesthetic to use the variable as defining the x axis
  ggplot(data, aes(x = .data[[variable]])) +
    # plot the density
    geom_density() +
    # add a line for the mean
    geom_vline(aes(xintercept = mean(.data[[variable]], na.rm = TRUE),
                   linetype = "Mean"), colour = "red") +
    # add a line for the median
    geom_vline(aes(xintercept = median(.data[[variable]], na.rm = TRUE),
                   linetype = "Median"), colour = "blue") +
    # add a line for the mode
    geom_vline(aes(xintercept = getMode(.data[[variable]], na.rm = TRUE),
                   linetype = "Mode"), colour = "green") +
    # create the legend
    scale_linetype_manual(name = "Measures of Central Tendency",
                          # solid lines for all three measures
                          values = c("Mean" = 1, "Median" = 1, "Mode" = 1),
                          # colour the legend keys to match the lines
                          guide = guide_legend(
                            override.aes = list(
                              colour = c("red", "blue", "green"))))
}
This then works as follows:
centralTendencyPlot(airquality, "Solar.R")
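The three vertical lines in the plot are just three numbers, so it can be reassuring to look at them directly. The sketch below does that with a small helper, centralTendencyTable, which is a hypothetical name introduced here for illustration and is not part of the function above. It uses only base R and repeats getMode so that it runs on its own.

```r
# Same definition as earlier in the post, repeated so this snippet is self-contained
getMode <- function(v, na.rm = FALSE) {
  if (na.rm) {
    v <- v[!is.na(v)]
  }
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical helper (an assumption for illustration, not from the original code):
# returns the three measures for one column as a small data frame
centralTendencyTable <- function(data, variable) {
  x <- data[[variable]]
  data.frame(
    measure = c("Mean", "Median", "Mode"),
    value = c(mean(x, na.rm = TRUE),
              median(x, na.rm = TRUE),
              getMode(x, na.rm = TRUE))
  )
}

# airquality ships with base R, so no extra packages are needed
solarStats <- centralTendencyTable(airquality, "Solar.R")
print(solarStats)
```

Each value should line up with where the corresponding line sits on the x axis of the plot from centralTendencyPlot.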
The function is just as easy to use with other data frames:
library(readxl)
df <- read_xlsx("C:\\Users\\Pedro Henrique\\Desktop\\L8_dataset.xlsx")
centralTendencyPlot(df, "ertime")
# plot the same variable on the log scale
ertimelog <- log(df$ertime)
centralTendencyPlot(as.data.frame(ertimelog), "ertimelog")
In summary, this was a fun function to construct, and useful at that.