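library(magrittr)  # provides the %<>% assignment pipe
library(dplyr)     # provides mutate()
pulitzer.data <- read.csv('https://raw.githubusercontent.com/WigodskyD/data-sets/master/pulitzer-circulation-data.csv', stringsAsFactors = FALSE)
# Columns 2 and 3 hold the 2004 and 2013 daily circulation figures as text with
# thousands separators; strip the commas and convert them to numbers.
pulitzer.data[[2]] <- as.numeric(gsub(",", "", pulitzer.data[[2]]))
pulitzer.data[[3]] <- as.numeric(gsub(",", "", pulitzer.data[[3]]))
# Average the two circulation snapshots into a single predictor.
pulitzer.data %<>%
  mutate(average.circulation = 0.5 * (Daily.Circulation..2004 + Daily.Circulation..2013))
linear.model <- lm(pulitzer.data$Pulitzer.Prize.Winners.and.Finalists..2004.2014 ~ pulitzer.data$average.circulation)
summary(linear.model)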
##
## Call:
## lm(formula = pulitzer.data$Pulitzer.Prize.Winners.and.Finalists..2004.2014 ~
## pulitzer.data$average.circulation)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -27.768  -3.784  -1.949   0.766  39.626
##
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                       7.405e-01  2.155e+00   0.344 0.732671
## pulitzer.data$average.circulation 1.450e-05  3.723e-06   3.894 0.000304 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.69 on 48 degrees of freedom
## Multiple R-squared: 0.2401, Adjusted R-squared: 0.2242
## F-statistic: 15.16 on 1 and 48 DF, p-value: 0.0003045
plot(resid(linear.model))
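Our residuals suggest that a simple linear model is not the most appropriate: their spread starts out large and then tapers off, which points to non-constant error variance (heteroskedasticity). Note that plot(resid(linear.model)) draws the residuals in row order, so the shape of this pattern depends on how the rows of the data set are sorted.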
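Since plot(resid(...)) draws residuals against their row index, a residuals-versus-fitted plot is the more conventional way to look for the fan shape that signals heteroskedasticity. The sketch below is illustrative only: it runs on simulated data, not the Pulitzer data set, and the names x, y and sim.model are hypothetical. The noise standard deviation is deliberately made proportional to x so the fan is visible.

```r
# Minimal sketch: residuals vs. fitted values on SIMULATED data
# (hypothetical values, not the Pulitzer data set).
set.seed(42)
n <- 50
x <- runif(n, min = 1e5, max = 2e6)             # hypothetical circulation values
y <- 1 + 1.5e-5 * x + rnorm(n, sd = 2e-6 * x)   # noise sd grows with x
sim.model <- lm(y ~ x)

# Plotting against fitted values ties the residual spread to the size of
# the prediction, instead of to the order of the rows.
plot(fitted(sim.model), resid(sim.model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted (simulated data)")
abline(h = 0, lty = 2)  # reference line at zero
```

With the real model, the same call would be plot(fitted(linear.model), resid(linear.model)).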
library(lmtest)  # provides bptest()
bptest(linear.model)
##
## studentized Breusch-Pagan test
##
## data: linear.model
## BP = 16.463, df = 1, p-value = 4.962e-05
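The Breusch-Pagan test rejects the null hypothesis of constant error variance (BP = 16.463 on 1 df, p = 4.962e-05), confirming that heteroskedasticity is indeed a problem.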
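The studentized Breusch-Pagan statistic can be reproduced by hand: regress the squared OLS residuals on the predictor, multiply the R-squared of that auxiliary regression by n, and compare the result against a chi-squared distribution whose degrees of freedom equal the number of predictors (one here). A minimal sketch, run on simulated data rather than the Pulitzer data; all names are hypothetical:

```r
# Minimal sketch of the studentized (Koenker) Breusch-Pagan statistic,
# computed by hand on SIMULATED data (hypothetical, not the Pulitzer data).
set.seed(1)
n <- 50
x <- runif(n, min = 1e5, max = 2e6)
y <- 1 + 1.5e-5 * x + rnorm(n, sd = 2e-6 * x)   # noise sd grows with x
fit <- lm(y ~ x)

# Auxiliary regression: squared OLS residuals on the original predictor.
u2  <- resid(fit)^2
aux <- lm(u2 ~ x)

# Statistic n * R^2, asymptotically chi-squared with df = number of
# predictors in the auxiliary model (the intercept is not counted).
bp.stat <- n * summary(aux)$r.squared
bp.df   <- 1
bp.p    <- pchisq(bp.stat, df = bp.df, lower.tail = FALSE)
c(BP = bp.stat, df = bp.df, p.value = bp.p)
```

If lmtest is installed, lmtest::bptest(fit) on the same simulated model should report the same BP value, since its default (studentize = TRUE) uses this n times R-squared form.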
caret::BoxCoxTrans(pulitzer.data$Pulitzer.Prize.Winners.and.Finalists..2004.2014)
## Box-Cox Transformation
##
## 50 data points used to estimate Lambda
##
## Input data summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00    1.00    3.00    6.72    6.75   62.00
##
## Lambda could not be estimated; no transformation is applied
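The attempt at a Box-Cox transformation has failed. Box-Cox requires strictly positive values, and the response here contains zeros (its minimum is 0), which is most likely why no lambda could be estimated. A more appropriate regression would require another method, such as weighted least squares or a log transformation (for example log(y + 1), since log(0) is undefined).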
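Both alternatives can be sketched in a few lines. The example below runs on simulated count data that may include zeros; it is not the Pulitzer data set, and all names are hypothetical. The log model uses log1p(y), i.e. log(y + 1), to sidestep log(0). The WLS weights of 1/x rest on an assumption made purely for illustration, namely that the error variance is proportional to x; that would need to be checked against the real data before use.

```r
# Minimal sketch on SIMULATED count data (hypothetical, not the Pulitzer data).
set.seed(7)
n <- 50
x <- runif(n, min = 5e4, max = 2e6)   # hypothetical average circulation
y <- rpois(n, lambda = 1.5e-5 * x)    # hypothetical prize counts, may include 0

# Box-Cox needs strictly positive values, which counts with zeros violate.
any(y <= 0)

# Option 1: log transformation; log1p(y) = log(y + 1) is finite at y = 0.
log.model <- lm(log1p(y) ~ x)
summary(log.model)$coefficients

# Option 2: weighted least squares. ASSUMPTION for illustration only:
# error variance proportional to x, so each observation gets weight 1/x.
ols.model <- lm(y ~ x)
wls.model <- lm(y ~ x, weights = 1 / x)
rbind(OLS = coef(ols.model), WLS = coef(wls.model))
```

Down-weighting observations with larger expected variance is what lets WLS address the heteroskedasticity flagged by the Breusch-Pagan test. The log transform instead compresses the large counts.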