If you have ever come across violin plots, you may be wondering whether, like boxplots, they display outliers or extreme values.

In short, by default they usually don’t, but violin plots can be customised to display outliers or extreme values. What is displayed by default, and which customisation options are available, may also depend on the software being used. To demonstrate, we will consider an example using both jamovi and R. The example data set contains the average income per person in 163 surveyed countries for 2019, i.e. the GDP (gross domestic product) per person, adjusted for purchasing power differences (Gapminder 2021).

1 Violin plot example using jamovi

In the following plot, we have created a violin plot of the Income variable using jamovi. Although this variable contains some outliers, they are not displayed in the violin plot by default. The long, thin tail of the density at the upper end of the violin, however, gives us a clue that these outliers are present:

Although the outliers are not displayed by default, it is simple to customise our violin plot to display them: we just select the Box plot option, as shown in the image below. This overlays a boxplot on the violin plot, and 4 outliers are identified. If desired, we could also select the Label outliers option, which tells us which specific countries are outliers in terms of their average income per person (for interest, the four countries are Ireland, Singapore, Luxembourg, and Qatar).

2 Violin plot example using R

In the following plot, we have produced a violin plot in R using the vioplot package. By default, the violin plot does not display outliers; however, it does display a simple boxplot. We can also see that the density extends much further than the end of the upper whisker of the boxplot, indicating that outliers are present.

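library(vioplot) # load the vioplot package
# world.data is assumed to be already loaded, with an income_2019 column
par(cex = 0.8, mex = 0.9) # slightly shrink text size and margin line spacing
vioplot(world.data$income_2019, las = 1, main = "Income 2019",
        xaxt = "n", col = 3) # horizontal axis labels, no x-axis, palette colour 3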
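The observation that the density extends beyond the upper whisker can also be checked numerically. The sketch below is a hypothetical helper, `whisker_check()`, that is not part of vioplot; it uses base R's `boxplot.stats()`, which applies the usual 1.5 × IQR rule with hinges from `fivenum()`. vioplot computes its own whiskers, so the values may differ slightly from what is drawn.

```r
# Hypothetical helper (not part of vioplot): compare the largest value with
# the end of the upper whisker, using base R's boxplot.stats() (1.5 x IQR rule).
whisker_check <- function(x, coef = 1.5) {
  bs <- boxplot.stats(x, coef = coef)   # NA values are omitted automatically
  upper_whisker <- bs$stats[5]          # end of the upper whisker
  max_value <- max(x, na.rm = TRUE)
  list(
    upper_whisker = upper_whisker,
    max_value     = max_value,
    beyond        = max_value > upper_whisker,  # TRUE if points lie past the whisker
    n_outliers    = length(bs$out)              # values beyond either whisker
  )
}

# Usage, assuming world.data has been loaded as above:
# whisker_check(world.data$income_2019)
```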

It is possible to display the outliers on the violin plot by using the add_outliers function from the vioplot package, as follows:

library(vioplot) # load the vioplot package
par(cex = 0.8, mex = 0.9)
vioplot(world.data$income_2019, las = 1, main = "Income 2019", xaxt = "n", col = 3)
# threshold passed to add_outliers: 2 x IQR, expressed in standard deviations
cutoff <- 2 * IQR(world.data$income_2019, na.rm = TRUE) / sd(world.data$income_2019, na.rm = TRUE)
# add the outliers to the existing violin plot
add_outliers(world.data$income_2019, categories = "", bars = "white", cutoff = cutoff)
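To find out which countries are outliers in R, similar to jamovi's Label outliers option, one possibility is a small helper like the sketch below. `boxplot_outliers()` is hypothetical, and the column name `country` in the usage line is an assumption about how world.data is structured. The helper flags values with base R's `boxplot.stats()`, so its list may not exactly match jamovi's or add_outliers' choice of outliers.

```r
# Hypothetical helper (not part of vioplot): return the label and value of
# every row that base R's boxplot rule (1.5 x IQR beyond the hinges) flags.
boxplot_outliers <- function(df, value_col, label_col, coef = 1.5) {
  x <- df[[value_col]]
  out <- boxplot.stats(x, coef = coef)$out          # NA values are ignored
  res <- df[x %in% out, c(label_col, value_col), drop = FALSE]
  res[order(res[[value_col]], decreasing = TRUE), , drop = FALSE]  # largest first
}

# Usage, assuming world.data has a column of country names called "country":
# boxplot_outliers(world.data, "income_2019", "country")
```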

3 Conclusion

As we have seen, although violin plots generally do not display outliers or extreme values by default, it is possible to customise violin plots to display them. Whether you are learning to use jamovi or R, you can have a go at producing and customising violin plots for yourself in STM1001 Computer Lab 3.


References

Gapminder. 2021. “Income Per Person [.csv File].” 2021. http://gapm.io/dgdppc.


These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.