3/30/2022

Motivation

A standard normal distribution, or z-distribution, has a mean of 0 and a standard deviation of 1. Any normal distribution can be standardized by subtracting the mean from each value in the data and dividing the result by the standard deviation. The result is a distribution of “Z-scores”, where each \(Z\) corresponds to a value \(x\) from the original data with mean \(\mu\) and standard deviation \(\sigma\): \[Z=\frac{x-\mu}{\sigma}\] It is often necessary to calculate the likelihood of a certain outcome within a normally distributed data set. Calculating the Z-score of an outcome \(x\) is one way of doing this: it expresses how many standard deviations that outcome lies from the mean.
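As a minimal sketch of the formula above, the following R snippet computes a Z-score and converts it to a tail probability with `pnorm`. The helper name `z_score` and the parameter values (mean 170, standard deviation 10) are assumptions chosen for illustration; they are not taken from the app or the data set.

```r
# Z-score of x under a normal distribution with mean mu and standard deviation sigma
# (hypothetical helper, not part of the app)
z_score <- function(x, mu, sigma) {
  stopifnot(sigma > 0)  # a standard deviation must be positive
  (x - mu) / sigma
}

# Hypothetical parameters, chosen for illustration only
mu <- 170
sigma <- 10

# 193 lies 2.3 standard deviations above the mean
z <- z_score(193, mu, sigma)

# P(Z > z) under the standard normal distribution
upper_tail <- pnorm(z, lower.tail = FALSE)

# The same probability computed on the original scale, without standardizing
same_tail <- pnorm(193, mean = mu, sd = sigma, lower.tail = FALSE)
```

Both probabilities agree, which is the point of standardizing: once a value is expressed as a Z-score, a single reference distribution answers the question for any normal data set.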

Web App Function

This application standardizes a normal distribution given the mean and standard deviation of the data set. It draws the density curve centered at the mean, with markers at one, two, and three standard deviations on either side. Given a value, it marks that value's position with a red line and reports its Z-score, that is, its distance from the mean measured in standard deviations. Let’s demonstrate on a German student dataset [1], looking at heights:

students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")
hist(students$height, xlab = "Student Height (cm)", main = "Normality of Height Data")
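The calculator takes a mean and a standard deviation as inputs, and both can be estimated from a sample. The sketch below shows one way to do that; the helper `describe_value` and the three-element `heights` vector are hypothetical stand-ins so the example runs without downloading the data.

```r
# Estimate the calculator's inputs from a numeric sample and score one value
# (hypothetical helper, not part of the app)
describe_value <- function(values, x) {
  m <- mean(values, na.rm = TRUE)  # sample mean
  s <- sd(values, na.rm = TRUE)    # sample standard deviation
  list(mean = m, dev = s, z = (x - m) / s)
}

# Hypothetical toy sample: mean 170, standard deviation 10
heights <- c(160, 170, 180)
res <- describe_value(heights, 190)
res$z  # 2
```

With the data loaded as above, the same call would be `describe_value(students$height, 193)`.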

Example Usage

Using the mean and standard deviation of the German height data, observe how the calculator visualizes and calculates the Z-score for an observation of 193 cm (which happens to be the height of the app's author):

Server Code and References

[1] German Student Data: https://userpage.fu-berlin.de/soga/200/2010_data_sets/

Server code for Z-score calculation and plot generation:

# Runs inside the Shiny server function; requires the shiny and ggplot2 packages.

# Z-score of the entered value
output$text1 <- renderText((input$val - input$mean) / input$dev)

# Normal density curve with vertical markers
output$plot <- renderPlot({
    mu <- input$mean
    sigma <- input$dev
    # Plot range: 3.2 standard deviations on either side of the mean
    ggplot(data.frame(x = c(mu - 3.2 * sigma, mu + 3.2 * sigma)), aes(x)) +
        stat_function(fun = dnorm, n = 101, args = list(mean = mu, sd = sigma)) +
        ylab("") +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous() +
        # Black line at the mean, red line at the entered value
        geom_vline(xintercept = mu, color = "black") +
        geom_vline(xintercept = input$val, color = "red") +
        # Grey lines at 1, 2, and 3 standard deviations on both sides
        geom_vline(xintercept = mu + c(-3, -2, -1, 1, 2, 3) * sigma, color = "grey")
})
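The server code reads `input$mean`, `input$dev`, and `input$val` and writes `output$text1` and `output$plot`, so the user interface must define controls with those ids. The app's actual UI is not shown here; the following is only a sketch of a layout that would match those ids, with labels, default values, and page structure all assumed.

```r
library(shiny)

# Hypothetical UI: only the input/output ids are taken from the server code;
# the title, labels, defaults, and layout are assumptions.
ui <- fluidPage(
  titlePanel("Z-score calculator"),
  sidebarLayout(
    sidebarPanel(
      numericInput("mean", "Mean", value = 0),
      numericInput("dev", "Standard deviation", value = 1, min = 0),
      numericInput("val", "Value", value = 0)
    ),
    mainPanel(
      plotOutput("plot"),   # filled by output$plot
      textOutput("text1")   # filled by output$text1
    )
  )
)
```

Wrapping the server code in `server <- function(input, output) { ... }` and calling `shinyApp(ui, server)` would then launch the app.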