As a music connoisseur, I would like to determine whether there is a difference between the average song lengths of two bands, Metallica and Pink Floyd, and to quantify that difference if one exists.
library(s20x)
songlengths.df = read.table('SongLengths.txt', header = TRUE)
layout20x(1,2)
boxplot(Length ~ Band, data = songlengths.df, main = "Boxplot of Song Lengths")
twosampPlot(Length ~ Band, data = songlengths.df)
Centre: Song lengths for Metallica are typically higher than those for Pink Floyd.
Spread: Pink Floyd has more spread in song length than Metallica.
Skew: Metallica’s distribution of song lengths is fairly symmetrical, whereas Pink Floyd’s is more right-skewed.
modcheck(lm(Length ~ Band, data = songlengths.df))
# Refit without observation 24 (the last row) to see how much it influences the model
songlengths2.df = songlengths.df[1:23, ]
modcheck(lm(Length ~ Band, data = songlengths2.df))
summary(lm(songlengths.df))
##
## Call:
## lm(formula = songlengths.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4992 -1.9217 -0.3325 1.1275 10.4708
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.2575 0.8819 7.096 4.07e-07 ***
## BandPinkFloyd -0.2083 1.2472 -0.167 0.869
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.055 on 22 degrees of freedom
## Multiple R-squared: 0.001267, Adjusted R-squared: -0.04413
## F-statistic: 0.0279 on 1 and 22 DF, p-value: 0.8689
summary(lm(songlengths2.df))
##
## Call:
## lm(formula = songlengths2.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5473 -1.1674 -0.4473 1.4376 3.8027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.2575 0.5832 10.729 5.57e-10 ***
## BandPinkFloyd -1.1602 0.8433 -1.376 0.183
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.02 on 21 degrees of freedom
## Multiple R-squared: 0.08268, Adjusted R-squared: 0.03899
## F-statistic: 1.893 on 1 and 21 DF, p-value: 0.1834
t.test(Length ~ Band, var.equal = TRUE, data = songlengths.df)
##
## Two Sample t-test
##
## data: Length by Band
## t = 0.16704, df = 22, p-value = 0.8689
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.378182 2.794848
## sample estimates:
## mean in group Metallica mean in group PinkFloyd
## 6.257500 6.049167
The p-value of 0.87 means that there is no evidence against the null hypothesis that the mean song lengths of the two bands are equal, i.e. they may be the same.
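As a cross-check, the t-statistic, p-value and confidence interval reported by t.test can be recovered from the coefficient table of the full linear model. The sketch below uses only base R and values copied from the output above; the variable names are illustrative, and small discrepancies are expected because the standard error is printed to four decimal places.

```r
# Values copied from the t.test and summary(lm(...)) output above
estimate <- 6.257500 - 6.049167   # Metallica mean minus Pink Floyd mean
std_err  <- 1.2472                # Std. Error of BandPinkFloyd (full data)
df       <- 22                    # residual degrees of freedom

# Two-sided equal-variance t-test, rebuilt from the estimate and its SE
t_stat  <- estimate / std_err
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means
ci <- estimate + c(-1, 1) * qt(0.975, df) * std_err

round(c(t = t_stat, p = p_value), 4)
round(ci, 2)
```

The sign of the t-statistic differs between the two outputs only because t.test reports Metallica minus Pink Floyd, while the BandPinkFloyd coefficient is Pink Floyd minus Metallica.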
Methods and Assumptions Check
We have a numerical measurement on two distinct groups. Therefore, we should do a two-sample t-test to compare the average length of songs between them.
We assume that the songs of the two bands are independent of each other, i.e. the bands aren’t collaborating on or ‘plagiarising’ each other’s songs. The equality of variance assumption is reasonable (apart from one point). The normality assumption looks good: most of the data lie on the line of the Q-Q plot, and the histogram is fairly normal, with only a very slight right skew.
The Cook’s distance plot suggests we consider removing point 24, as it has a Cook’s distance over 0.4. However, upon removing the point, we can see that it is not unduly influential, as it does not change either coefficient by more than one standard error. (The intercept is unchanged at 6.2575. B1 changes by -0.2083 - (-1.1602) = 0.9519, which is less than its standard error of 1.2472 in the full model.)
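The influence check can be reproduced from the two coefficient tables above. The following is a minimal sketch using base R and the printed estimates; the variable names are illustrative, and the comparison is made against the standard errors from the full-data fit, which is one reasonable convention rather than the only one.

```r
# Estimates copied from the two summary(lm(...)) outputs above
full    <- c(intercept = 6.2575, slope = -0.2083)   # all 24 songs
reduced <- c(intercept = 6.2575, slope = -1.1602)   # observation 24 removed
se_full <- c(intercept = 0.8819, slope = 1.2472)    # standard errors, full data

# Absolute change in each coefficient when observation 24 is dropped
change <- abs(full - reduced)

# TRUE where the change is smaller than one standard error
within_one_se <- change < se_full

change
within_one_se
```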
Since all assumption checks are satisfactory (we retain the one unusual point, as it does not materially change the fit), we will carry out a standard two-sample t-test.
And so our fitted model is: Length[i,j] = μ + α[i] + ε[i,j], where α[i] is the effect of band i (either Metallica or Pink Floyd) and ε[i,j] ~ iid N(0,σ^2).
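To see how this model relates to the coefficients R prints, the sketch below simulates data from it and fits the same kind of lm. Everything here is hypothetical: the parameter values, the group sizes and the simulated lengths are chosen only for illustration and are not the SongLengths data. With R's default treatment contrasts, the intercept is the mean of the first level (Metallica) and the second coefficient is the difference between the group means.

```r
set.seed(1)

# Hypothetical parameter values, for illustration only
mu    <- 6
alpha <- c(Metallica = 0.5, PinkFloyd = -0.5)
sigma <- 2

# Twelve simulated songs per band
band <- factor(rep(c("Metallica", "PinkFloyd"), each = 12))
length_sim <- mu + unname(alpha[as.character(band)]) + rnorm(24, mean = 0, sd = sigma)

fit <- lm(length_sim ~ band)
group_means <- tapply(length_sim, band, mean)

# Intercept = Metallica mean; bandPinkFloyd = Pink Floyd mean - Metallica mean
coef(fit)
group_means
```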
Executive Summary
This random sample was taken to assess whether there was a difference in average song length (in minutes) between the bands Metallica and Pink Floyd, as requested by a music connoisseur.
There is no evidence that the average song lengths of Metallica and Pink Floyd differ.
With 95% confidence, we estimate the mean song length of Metallica to be somewhere between 2 minutes and 23 seconds shorter and 2 minutes and 48 seconds longer than that of Pink Floyd.
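The confidence limits are reported by t.test in decimal minutes, so they need converting to minutes and seconds. A minimal sketch of that conversion follows; the helper to_min_sec is a hypothetical name introduced here, not part of s20x or base R.

```r
# 95% confidence limits from the t.test output, in decimal minutes
ci_minutes <- c(-2.378182, 2.794848)

# Hypothetical helper: convert decimal minutes to "M min SS s" (sign dropped)
to_min_sec <- function(x) {
  total_sec <- as.integer(round(abs(x) * 60))
  sprintf("%d min %02d s", total_sec %/% 60L, total_sec %% 60L)
}

to_min_sec(ci_minutes)
```

The negative limit corresponds to Metallica being shorter than Pink Floyd and the positive limit to Metallica being longer.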