Lecture 10

Measuring Dispersion

Assume numList is a list of numerical values. A simple measure of dispersion is the difference between the largest and smallest values in the list, a statistic known as the range. The example below computes it with the built-in functions max() and min().

numList = [1,2,3,4,5]
myRange = max(numList) - min(numList)
print(myRange)

## 4

Create a Function

Write a function myRange() that accepts a list of numerical values and returns the range. Test your function with the list given above. Do this in CoCalc.

Answer

def myRange(nl):
    # Named mrange rather than range to avoid hiding the built-in range()
    mrange = max(nl) - min(nl)
    return mrange
    
numList = [1,2,3,4,5]
r = myRange(numList)
print(r)

## 4

Exercise

Run the following code in CoCalc. Explain why it does not print the range.

def myRange(nl):
    mrange = max(nl) - min(nl)

numList = [1,2,3,4,5]
r = myRange(numList)
print(r)

Answer

The function has no return statement, so it returns None by default. That None is what gets assigned to r, and print(r) displays None rather than the range.

Exercise

Run the following code in CoCalc and explain why it fails.

def myRange(nl):
    mrange = max(nl) - min(nl)

numList = [1,2,3,4,5]
r = myRange(numList)
print(mrange)

Answer

The variable mrange is local to the function: it exists only while myRange() runs. Outside the function the name is undefined, so print(mrange) raises a NameError.
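
The scoping rule behind this answer can be seen in a minimal sketch. The try/except wrapper is an addition for illustration only, so that the error message is printed instead of stopping the program.

```python
def myRange(nl):
    # mrange is a local variable: it exists only while myRange() runs
    mrange = max(nl) - min(nl)

numList = [1, 2, 3, 4, 5]
myRange(numList)

try:
    print(mrange)  # mrange was never defined at this level
except NameError as err:
    message = str(err)
    print("NameError:", message)
```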

The bottom line is that the return statement is crucial to get a value out of a function.

If a function only needs to print a value, no return statement is required.

def myRange(nl):
    mrange = max(nl) - min(nl)
    print(mrange)

numList = [1,2,3,4,5]
myRange(numList)

## 4
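
A function that only prints still hands something back to the caller: the special value None. The sketch below uses the hypothetical names showRange() and getRange() to contrast the two styles. A printed value cannot be reused in later calculations, while a returned value can.

```python
def showRange(nl):
    # Prints the range but has no return statement
    print(max(nl) - min(nl))

def getRange(nl):
    # Returns the range so the caller can keep using it
    return max(nl) - min(nl)

numList = [1, 2, 3, 4, 5]
shown = showRange(numList)     # prints 4; shown is None
returned = getRange(numList)   # prints nothing; returned is 4

print(shown)
print(returned * 2)
```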

Central Tendency (Location)

A good measure of central tendency answers a particular question: what single number best describes all of the numbers in the list?

The most commonly used measure of location is the mean. It is defined as the sum of the numbers in the list divided by the length of the list. The following code fragment uses the accumulator pattern to compute the mean of a list without reference to any other functions.

numList = [1,2,3,4,5]
n = 0
total = 0
for x in numList:
    n = n + 1
    total = total + x
mean = total/n   
print(mean)

## 3.0

Exercise

Convert the code fragment above to a function mean(). The function has a single argument, a list of numbers. It returns the mean value.

Test the function using the numList from above.

Answer

def mean(nl):
    n = 0
    total = 0
    for x in nl:
        n = n + 1
        total = total + x
    result = total/n
    return result
 
numList = [1,2,3,4,5]
mu = mean(numList)
print(mu)

## 3.0
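
For comparison, the loop in mean() does the same work as the built-in functions sum() and len(). The version below is a sketch of that shortcut. The name mean2() and the empty-list check are additions for illustration and are not part of the exercise.

```python
def mean2(nl):
    # An empty list has no mean; fail with a clear message
    # instead of a ZeroDivisionError.
    if len(nl) == 0:
        raise ValueError("mean2() requires at least one number")
    return sum(nl) / len(nl)

numList = [1, 2, 3, 4, 5]
print(mean2(numList))
```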

More on Dispersion (Variation)

The range considers only the two most extreme values in the list of numbers.
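
As an illustration of this limitation, the two lists below (made up for this sketch) share the same range even though the second list is clustered far more tightly around its middle value.

```python
listA = [1, 2, 3, 4, 5]
listB = [1, 3, 3, 3, 5]

rangeA = max(listA) - min(listA)
rangeB = max(listB) - min(listB)

# Both lists have the same extremes, so the range cannot tell them apart.
print(rangeA, rangeB)
```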

Other measures of variation consider all of the numbers.

These measures are naturally based on the distances between the individual numbers and the measure of location. These distances are called deviations. The following code displays the deviations from the mean for the list of numbers we have been using.

# In this environment, we need to repeat
# the definition of the mean function.

def mean(nl):
    n = 0
    total = 0
    for x in nl:
        n = n + 1
        total = total + x
    result = total/n
    return result
 
numList = [1,2,3,4,5]
mu = mean(numList)

# Build a list of the deviations

devList = []
for x in numList:
    dev = x - mu
    devList.append(dev)
    print("A deviation",dev)
    
# How do we describe the deviations with a single number?
# It seems natural to take the mean of the deviations.

meanDev = mean(devList)
print(" ")
print("The mean of the deviations", meanDev)

# Note that the simple mean of the deviations
# from the mean will always be 0.

# Thinking more carefully, we don't want the
# signed differences, just the size of each
# difference. We can capture this idea by
# taking the absolute values of the
# deviations. The mean of the absolute values
# of the deviations is known as the mean
# absolute deviation (MAD).

print(" ")
absDevList = []
for x in devList:
    absDevList.append(abs(x))
    print("An absolute deviation",abs(x))
print(" ")
MAD = mean(absDevList)
print("MAD", MAD)

## A deviation -2.0
## A deviation -1.0
## A deviation 0.0
## A deviation 1.0
## A deviation 2.0
##  
## The mean of the deviations 0.0
##  
## An absolute deviation 2.0
## An absolute deviation 1.0
## An absolute deviation 0.0
## An absolute deviation 1.0
## An absolute deviation 2.0
##  
## MAD 1.2
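
The claim that the mean of the deviations is always 0 follows from a short calculation. Writing the numbers as \(x_1, \dots, x_n\) and their mean as \(\bar{x}\) (notation introduced here for the derivation, not taken from the lecture):

\[
\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})
= \frac{1}{n}\sum_{i=1}^{n}x_i - \frac{1}{n}\,n\,\bar{x}
= \bar{x} - \bar{x}
= 0
\]

With floating-point numbers, the computed value can differ from 0 by a tiny rounding error.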

Variance and Standard Deviation (Section 4.7)

We solved the problem of the deviations cancelling out by making every deviation non-negative with the absolute value function. Squaring would do the same job, and for deep theoretical reasons statistical work almost always takes the squaring path. The variance might be defined as the average squared deviation from the mean. However, again for theoretical reasons, the divisor is \(n-1\) rather than \(n\) (with this divisor, the sample variance is an unbiased estimate of the population variance). To illustrate a difficulty with the variance, we’ll use a slightly different list with larger numbers.
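
In symbols, using \(x_1, \dots, x_n\) for the numbers and \(\bar{x}\) for their mean (notation assumed here for illustration), the variance described above is:

\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
\]

As a small worked case, the list 1, 2, 3, 4, 5 used earlier has mean 3 and deviations -2, -1, 0, 1, 2, so

\[
s^2 = \frac{4 + 1 + 0 + 1 + 4}{5 - 1} = \frac{10}{4} = 2.5
\]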

The following code calculates the variance of a list of numbers. Note that we use the mean() function from the numpy module, which we have given the alias np.

import numpy as np
import math
numList = [10,20,30,40,50]
mu = np.mean(numList)
print(mu)

sumSqDev = 0
n = 0
for x in numList:
    dev = x - mu
    devSq = dev**2
    sumSqDev = sumSqDev + devSq
    n = n + 1
variance =  sumSqDev/(n-1) 
print(" ")
print("Variance", variance)

# The variance, 250, is hard to interpret in terms of the original list
# because it is measured in squared units.

# We normally report the square root of the variance, known as the
# standard deviation, which is in the same units as the data.

stdDev = math.sqrt(variance)
print(" ")
print("Standard Deviation", stdDev)

## 30.0
##  
## Variance 250.0
##  
## Standard Deviation 15.811388300841896

Exercise

Use the code fragments above to create a function variance() that accepts a list of numbers as its only argument and returns the variance.

Then use that function to create a function stdDev() that returns the standard deviation instead of the variance. Hint: The second function can call the first. Test the second function on the list of larger numbers used above.

Answer

import numpy as np
import math
def variance(numList):
    mu = np.mean(numList)
    sumSqDev = 0
    n = 0
    for x in numList:
        dev = x - mu
        devSq = dev**2
        sumSqDev = sumSqDev + devSq
        n = n + 1
    var = sumSqDev/(n-1)
    return var

def stdDev(numList):
    return math.sqrt(variance(numList))
    
sd = stdDev([10,20,30,40,50]) 
print(sd)

## 15.811388300841896
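
As a cross-check, Python's standard statistics module provides variance() and stdev(), which also use the \(n-1\) divisor. The sketch below compares them with a compact restatement of the functions from the answer. The restatement, which uses sum() and len() instead of numpy and an explicit loop, is an assumption made to keep the example self-contained.

```python
import math
import statistics

def variance(numList):
    # Sample variance: squared deviations summed, divided by n - 1
    mu = sum(numList) / len(numList)
    sumSqDev = sum((x - mu) ** 2 for x in numList)
    return sumSqDev / (len(numList) - 1)

def stdDev(numList):
    return math.sqrt(variance(numList))

numList = [10, 20, 30, 40, 50]

# Each comparison should print True if the two approaches agree
print(math.isclose(variance(numList), statistics.variance(numList)))
print(math.isclose(stdDev(numList), statistics.stdev(numList)))
```

Both functions divide by \(n-1\), so a list with a single number causes a ZeroDivisionError. When using numpy instead, note that np.var() and np.std() divide by \(n\) by default; passing ddof=1 gives the \(n-1\) divisor used here.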