Steve Hoffman
Measurement and Evaluation in Psychology and Education
8th Edition (2010)
Robert M. Thorndike & Tracy Thorndike-Christ
Pages 39 - 57
# A tibble: 52 × 7
   First    Last     Gender Class   Reading Spelling  Math
   <chr>    <chr>    <chr>  <chr>     <dbl>    <dbl> <dbl>
 1 Aaron    Andrews  male   Johnson      32       64    43
 2 Byron    Biggs    male   Johnson      40       64    37
 3 Charles  Cowen    male   Johnson      36       40    38
 4 Donna    Davis    female Johnson      41       74    40
 5 Erin     Edwards  female Johnson      36       69    28
 6 Fernando Franco   male   Johnson      41       67    42
 7 Gail     Galaraga female Johnson      40       71    37
 8 Harpo    Henry    male   Johnson      30       51    34
 9 Irrida   Ignacio  female Johnson      37       68    35
10 Jack     Johanson male   Johnson      26       56    26
# … with 42 more rows
The Thorndikes explain three different measures of the center of a group of scores (measures of central tendency): the mode, the median, and the mean.
Let’s look at the scores on the spelling test for Johnson’s and Cordero’s classes.
Sort the data set by spelling scores from lowest to highest.
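Here is a minimal sketch of the sorting step in base R. The full Table.2.1 data is not reproduced in this handout, so the data frame `first_ten` (a made-up name) holds only the ten rows printed above, with three of the seven columns. With the full dataset the same `order()` call would apply.

```r
# Only the ten rows shown in the first printout; `first_ten` is an
# illustrative name and not part of the original analysis.
first_ten <- data.frame(
  First    = c("Aaron", "Byron", "Charles", "Donna", "Erin",
               "Fernando", "Gail", "Harpo", "Irrida", "Jack"),
  Last     = c("Andrews", "Biggs", "Cowen", "Davis", "Edwards",
               "Franco", "Galaraga", "Henry", "Ignacio", "Johanson"),
  Spelling = c(64, 64, 40, 74, 69, 67, 71, 51, 68, 56)
)

# order() returns the row positions that put Spelling in ascending order.
sorted_ten <- first_ten[order(first_ten$Spelling), ]
print(sorted_ten)
```

Because this uses only ten of the 52 students, its output is shorter than the full sorted table.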
# A tibble: 52 × 7
   First     Last     Gender Class   Reading Spelling  Math
   <chr>     <chr>    <chr>  <chr>     <dbl>    <dbl> <dbl>
 1 Bellinda  Brown    female Cordero      33       38    41
 2 Charles   Cowen    male   Johnson      36       40    38
 3 Larry     Lewis    male   Cordero      29       40    34
 4 Thelma    Thwaites female Cordero      38       43    45
 5 Quadra    Quickly  female Johnson      21       44    19
 6 Nancy     Nowits   female Cordero      28       44    44
 7 Nathan    Natts    male   Johnson      22       47    22
 8 Charlotta Cowen    female Cordero      33       47    50
 9 Zephina   Zoro     female Cordero      30       47    38
10 Quincy    Quirn    male   Cordero      33       48    33
# … with 42 more rows
Which score occurred most frequently?
[1] 38 40 40 43 44 44 47 47 47 48 49 50 51 51 51 52 52 53 53 53 54 54 55 55 56
[26] 57 57 58 59 59 59 61 61 61 63 64 64 64 64 64 65 65 66 67 68 68 68 69 71 73
[51] 74 76
# Base R has no built-in function for the statistical mode
# (mode() reports an object's storage type instead).
# Create a user function to calculate the mode of a data set in R.
Mode <- function(x) {
  uniqx <- unique(x)                            # the distinct values in x
  uniqx[which.max(tabulate(match(x, uniqx)))]   # the most frequent one (first if tied)
}
Mode(sorted$Spelling)
[1] 64
So in this case, the mode is 64: it appears five times in the list above, more often than any other score.
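As a cross-check, `table()` tallies how often each score occurs. This sketch re-enters the 52 sorted spelling scores printed above as a plain vector called `spelling` (a name chosen for this example, not one from the original analysis).

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

counts <- table(spelling)               # frequency of each distinct score
top <- counts[counts == max(counts)]    # keeps every score tied for most frequent
print(top)
```

If two scores were tied for most frequent, this version would list both, whereas a `which.max()` approach reports only the first one it meets.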
No need to build a function for median (the middle value), as this is already built into R.
Note that I’m asking R to look at the dataset called Table.2.1 and then look specifically at the column called Spelling (written Table.2.1$Spelling).
Calculate the median value – 57 in this case.
How does R calculate it? “Under the hood” R sorts the values from smallest to largest and then finds the value that is halfway down the list. If there is an even number of values, R takes the average of the middle two. With 52 spelling scores, the middle two are the 26th and 27th, and both are 57.
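The sort-and-take-the-middle procedure can be sketched by hand and compared with the built-in `median()`. The vector `spelling` below simply re-enters the 52 scores from the sorted printout; the variable names are illustrative.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

n <- length(spelling)                         # 52, an even number
ordered <- sort(spelling)                     # smallest to largest
middle_two <- ordered[c(n / 2, n / 2 + 1)]    # the 26th and 27th scores
by_hand <- mean(middle_two)                   # average of the middle two

print(c(by_hand = by_hand, built_in = median(spelling)))
```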
The mean, also known as the average, is calculated by adding up all of the values in the column (e.g. all 52 of the spelling scores: 64 + 64 + 40 + 74 + …) and dividing this sum by the number of scores (52 in this case).
The mean is just a fraction over 57 in this case (about 57.15).
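The add-then-divide arithmetic can be checked directly against `mean()`. As before, `spelling` is just the 52 scores from the sorted printout typed in as a vector, and the other names are made up for this sketch.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

total <- sum(spelling)                  # add up all 52 scores
by_hand <- total / length(spelling)     # divide by the number of scores

print(total)
print(round(by_hand, 2))
print(all.equal(by_hand, mean(spelling)))   # the built-in agrees
```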
When the distribution of a variable is NOT symmetrical, it is described as skewed.
Negative skewness indicates that the distribution is left skewed, and the mean of the data (like Spelling scores) is typically less than the median. Positive skewness means the opposite: the distribution is right skewed, with the mean typically higher than the median.
There’s a package called “moments” that calculates this statistic. If you want to calculate the value of skewness, install it on your computer: install.packages("moments")
This value is pretty small (and only a touch negative), indicating that our Spelling scores are fairly symmetrical.
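If you would rather not install a package, here is a base-R sketch of the moment-based skewness coefficient: the third central moment divided by the second central moment raised to the power 1.5. The helper name `skew` is made up for this example. I believe this is the same formula the moments package uses, but treat that as an assumption rather than a guarantee.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

# Hypothetical helper: moment-based skewness (divides by n, not n - 1).
skew <- function(x) {
  d <- x - mean(x)              # deviations from the mean
  mean(d^3) / mean(d^2)^1.5     # third moment over second moment^(3/2)
}

skew_spelling <- skew(spelling)
print(skew_spelling)            # negative means left skewed, near 0 means symmetrical
```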
An example of a positively skewed variable is the engine displacement (displ) from the mpg dataset used in our R for Data Science book.
Here the median is 3.3 while the mean is a bit higher at 3.47.
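A quick way to see those two numbers for yourself, assuming the ggplot2 package (which ships the mpg data) is installed. The sketch does nothing if it is not.

```r
# mpg comes with ggplot2; having that package installed is an assumption here.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  displ <- ggplot2::mpg$displ                            # engine displacement, in litres
  print(c(median = median(displ), mean = mean(displ)))   # mean sits above the median
}
```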
Our authors identify three different ways to describe how spread out a set of scores (like our spelling scores) is.
The first is the range. If you needed to put the range in a single statistic, you might subtract the minimum value from the maximum value (for Spelling, 76 - 38 = 38).
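In R, `range()` returns the minimum and maximum together, and `diff()` turns that pair into a single number. The vector `spelling` is the 52 sorted scores from earlier, re-entered for this sketch.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

lo_hi <- range(spelling)    # c(minimum, maximum)
spread <- diff(lo_hi)       # maximum minus minimum

print(lo_hi)                # 38 76
print(spread)               # 38
```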
The interquartile range (IQR) covers the middle half of the scores, from the 25th percentile to the 75th percentile.
25% 50% 75%
51 57 64
[1] 13
In this case, the 25th percentile is 51 (roughly a quarter of the scores are below 51). The 50th percentile (the median) is 57. And the 75th percentile is 64.
The IQR in this case is 13 points, the distance between the 25th percentile (51) and the 75th percentile (64).
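The quartiles and the IQR printed above can be reproduced with `quantile()` and `IQR()`. This sketch uses the 52 sorted scores typed in as `spelling` and relies on R's default quantile method.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

q <- quantile(spelling, probs = c(0.25, 0.50, 0.75))   # 25th, 50th, 75th percentiles
iqr <- IQR(spelling)                                   # 75th minus 25th percentile

print(q)
print(iqr)
```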
Boxplots (also called box-and-whisker plots) are constructed by drawing a box from the lower quartile, Q1 (the point below which 25% of the scores fall), to the upper quartile, Q3 (the point below which 75% of the scores fall). The median is then marked with a line inside the box, showing the middle score (for spelling scores, in our case).
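A base-R sketch of such a plot for the spelling scores. `boxplot.stats()` returns the five numbers the plot is built from (lower whisker, lower hinge, median, upper hinge, upper whisker). R's hinges can differ slightly from `quantile()` quartiles in general, though for these scores they coincide.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

# Draw the box (Q1 to Q3), the median line, and the whiskers.
boxplot(spelling, horizontal = TRUE,
        main = "Spelling scores", xlab = "Score")

# The five numbers behind the picture.
five <- boxplot.stats(spelling)$stats
print(five)
```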
On page 48, “There are also measures of variability that belong to the family of the arithmetic mean and are based on score deviations from the mean. The most commonly used one is called the standard deviation.”
The mean and standard deviation of our Spelling scores from the Table 2.1 data can be calculated this way:
Easy enough to find out that the mean (the average) Spelling score is 57.15 and the standard deviation of these spelling scores is 9.35. But what does this mean?
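To show what `sd()` is doing, this sketch computes the standard deviation by hand from the score deviations from the mean and compares it with the built-in. It uses the sample formula (dividing by n - 1), which is what R's `sd()` uses. The vector `spelling` is the 52 sorted scores re-entered for the example.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

m <- mean(spelling)
deviations <- spelling - m                                     # each score minus the mean
by_hand <- sqrt(sum(deviations^2) / (length(spelling) - 1))    # sample standard deviation

print(round(c(mean = m, sd_by_hand = by_hand, sd_built_in = sd(spelling)), 2))
```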
In the book they talked about Math scores (Figure 2-8 on page 51), but we were already talking about Spelling, so let’s produce a similar plot.
The superimposed normal distribution is meant to demonstrate that roughly two-thirds (68%) of the spelling scores fall between one standard deviation below the mean and one standard deviation above the mean.
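One way to draw a plot like this in base R is a density-scaled histogram with a normal curve laid over it. This is only a sketch: the bin count, titles, and dashed lines are my own choices, not taken from the book's figure.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

m <- mean(spelling)
s <- sd(spelling)

# freq = FALSE puts the bars on a density scale so the curve is comparable.
h <- hist(spelling, freq = FALSE, breaks = 10,
          main = "Spelling scores with normal curve", xlab = "Score")

# Normal distribution with the same mean and standard deviation.
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)

# Dashed lines one standard deviation below and above the mean.
abline(v = c(m - s, m + s), lty = 2)
```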
In our case, one standard deviation either side of the mean runs from 47.8 to 66.5.
Line up our spelling scores in order from lowest to highest.
How many are below 47.8? I see nine (six scores below 47, plus the three scores of 47, which fall just under the cutoff).
And how many are above 66.5? I see nine.
That leaves 34 spelling scores (out of 52) within one standard deviation (9.35) of the mean.
Is 34 out of 52 about two-thirds? Yes: 34/52 = 65%
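The same count can be done in R rather than by eye. Note that this sketch counts against the unrounded bounds (mean minus sd and mean plus sd, about 47.8 and 66.5), so the three scores of 47 land below the lower bound. The variable names are illustrative.

```r
# The 52 spelling scores, copied from the sorted printout above.
spelling <- c(38, 40, 40, 43, 44, 44, 47, 47, 47, 48, 49, 50, 51, 51, 51, 52,
              52, 53, 53, 53, 54, 54, 55, 55, 56, 57, 57, 58, 59, 59, 59, 61,
              61, 61, 63, 64, 64, 64, 64, 64, 65, 65, 66, 67, 68, 68, 68, 69,
              71, 73, 74, 76)

m <- mean(spelling)
s <- sd(spelling)
lower <- m - s      # about 47.8
upper <- m + s      # about 66.5

below  <- sum(spelling < lower)                        # more than one sd below the mean
above  <- sum(spelling > upper)                        # more than one sd above the mean
within <- sum(spelling >= lower & spelling <= upper)   # within one sd of the mean

print(c(below = below, within = within, above = above))
print(round(within / length(spelling), 2))             # share within one sd
```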