I wanted to look at the F2 values for each vowel and see how much they vary. First, I loaded the necessary packages and read in the data.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
data <- read.csv('Peterson-Barney.csv')
Then, I used dplyr to group the data rows by speaker.
by_speaker <- data %>%
  group_by(Speaker)
I used ggplot to make a box plot with the vowels along the x-axis and their recorded F2 values on the y-axis. (I had to look up how to order the x-values by another variable using reorder(), which sorts by the mean of that variable by default.) I also color-coded by sex. I did not extend the y-axis down to 0, because I was only interested in the relative F2 values anyway.
ggplot(by_speaker, aes(x = reorder(Vowel, F2.hz), y = F2.hz, color = Sex)) +
  geom_boxplot() +
  ggtitle("Boxplot of F2 Values per Speaker")
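To make the reorder() step above more concrete, here is a small sketch of what it does to the vowel order. The `toy` data frame is made up purely for illustration and is not taken from Peterson-Barney.csv; only the column names match the real data.

```r
# Toy data, invented for illustration only (not from Peterson-Barney.csv)
toy <- data.frame(
  Vowel = c("iy", "iy", "uw", "uw", "aa", "aa"),
  F2.hz = c(2300, 2500, 900, 1100, 1100, 1300)
)

# reorder() sorts the factor levels by a summary of the second argument.
# The default summary is the mean: uw = 1000, aa = 1200, iy = 2400.
by_mean <- levels(reorder(toy$Vowel, toy$F2.hz))
print(by_mean)  # "uw" "aa" "iy"

# Passing FUN = median sorts by the median instead, which is the line
# drawn inside each box of a box plot.
by_median <- levels(reorder(toy$Vowel, toy$F2.hz, FUN = median))
print(by_median)
```

In the plot above, the vowels are therefore ordered by mean F2 over all speakers. Ordering by the median may be a slightly better match for a box plot, although for this data the two orders are probably similar.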
It looks like some of the vowels vary more than others. For example, the box and whiskers for “ao” are smaller than those for “uw”. I also noticed that vowels with higher average F2 generally have more outliers, and that almost no outliers fall below the lower whisker. Gender-wise, there does not seem to be a consistent difference in F2 variation between the male and female speakers (the boxes and whiskers are about the same size for each vowel), but interestingly, there appear to be noticeably more male outliers for every vowel.
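These observations are read off the plot by eye, so a numeric summary would be a useful check. The sketch below is one way to do that, assuming the same column names as above (Vowel, Sex, F2.hz). The helper names `count_outliers` and `summarise_f2` are my own, and the outlier rule is the 1.5 × IQR convention that geom_boxplot() uses by default, so the counts should be close to what the plot shows but are not guaranteed to match it point for point.

```r
library(dplyr)

# Count values beyond 1.5 * IQR from the quartiles (hypothetical helper).
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  c(low  = sum(x < q[1] - fence, na.rm = TRUE),
    high = sum(x > q[2] + fence, na.rm = TRUE))
}

# One row per vowel and sex: group size, spread, and outlier counts
# (hypothetical helper; expects columns Vowel, Sex, F2.hz).
summarise_f2 <- function(df) {
  df %>%
    group_by(Vowel, Sex) %>%
    summarise(
      n             = n(),
      iqr           = IQR(F2.hz),
      sd            = sd(F2.hz),
      low_outliers  = count_outliers(F2.hz)[["low"]],
      high_outliers = count_outliers(F2.hz)[["high"]],
      .groups = "drop"
    )
}

# Usage with the data read in earlier:
# summarise_f2(data)
```

The `n` column is included because outlier counts are only comparable between groups of similar size. If one sex has more recordings per vowel, it would be expected to show more outliers even with the same underlying spread.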