Load www.playbill.com, data for week of October 11-17, 2004. Data is downloaded from http://www.stat.tamu.edu/~sheather/book/data_sets.php website.
library(knitr) # Report display, table format
library(kableExtra)
#Load data
playbill <- read.csv("https://raw.githubusercontent.com/akulapa/Data621-Week02-Discussion/master/playbill.csv", header= TRUE, stringsAsFactors = F)
(a) Find a 95% confidence interval for the slope of the regression model, \(\beta_1\). Is 1 a plausible value for \(\beta_1\) ? Give a reason to support your answer.
options("scipen"=100, "digits"=4)
#Generate linear model
#We will be predicting current week results using last week results
play.lm <- lm(data = playbill, CurrentWeek ~ LastWeek)
summary(play.lm)
##
## Call:
## lm(formula = CurrentWeek ~ LastWeek, data = playbill)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36926 -7525 -2581 7782 35443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6804.8860 9929.3178 0.69 0.5
## LastWeek 0.9821 0.0144 68.07 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18000 on 16 degrees of freedom
## Multiple R-squared: 0.997, Adjusted R-squared: 0.996
## F-statistic: 4.63e+03 on 1 and 16 DF, p-value: <0.0000000000000002
Linear model in the form of \(Y = \beta_0 + \beta_1X\) is \(CurrentWeek = 6804.886 + 0.9821LastWeek\).
Slope of the line is \(\beta_1 = 0.98\). 95% confidence interval for the slope = \(\beta_1 \pm (t^* \times SE(\beta_1))\), where \(t^*\) is t-distribution for 16 degrees of freedom.
#t-distribution for 16 DF. 95% is 2.5% and 97.5% for 2 tail
tdis <- qt(0.975, 16)
b1 <- play.lm$coefficients[2]
se <- coef(summary(play.lm))[, "Std. Error"][2]
CI <- b1 + c(-1,1) * tdis * se
95% confidence interval is [0.9515, 1.0127]
Confidence interval can directly obtained using confint() R function.
CI <- confint(play.lm)["LastWeek", ]
95% confidence interval is [0.9515, 1.0127]
Yes, 1 is a plausible value as it falls between lower and upper values of 95% confidence interval.
(b) Test the null hypothesis \(H_0\) : \(\beta_0\) = 10000 against a two-sided alternative. Interpret your result.
Using regression model, \(\beta_0 = 6804.886\)
Null hypothesis \(H_0\) : \(\beta_0 = 10000\) Alternative hypothesis \(H_A\) : \(\beta_0 \ne 10000\)
Lets derive values for 95% confidence interval for \(\beta_0\) = \(\beta_0 \pm (t^* \times SE(\beta_0))\). Where \(t^*\) is t-distribution for 16 degrees of freedom.
#t-distribution for 16 DF. 95% is 2.5% and 97.5% for 2 tail
tdis <- qt(0.975, 16)
b0 <- play.lm$coefficients[1]
se <- coef(summary(play.lm))[, "Std. Error"][1]
CI <- b0 + c(-1,1) * tdis * se
95% confidence interval for \(\beta_0\) is [-14244.3274, 27854.0994].
Using confint() R function
CI <- confint(play.lm)["(Intercept)", ]
95% confidence interval for \(\beta_0\) is [-14244.3274, 27854.0994].
Since \(\beta_0 = 10000\) falls between upper and lower values of 95% confidence interval, we accept null hypothesis \(H_0\). In other words, we are 95% confident that the intercept \(\beta_0\) can assume a value of \(10000\).
(c) Use the fitted regression model to estimate the gross box office results for the current week (in $) for a production with $400,000 in gross box office the previous week. Find a 95% prediction interval for the gross box office results for the current week (in $) for a production with $400,000 in gross box office the previous week. Is $450,000 a feasible value for the gross box office results in the current week, for a production with $400,000 in gross box office the previous week? Give a reason to support your answer.
To answer above question, I will be using predict() R function.
newdata = data.frame(LastWeek=400000)
PI <- predict(play.lm, newdata, interval="predict", level=.95)
The prediction interval for current week for a production with $400,000 in gross box office the previous week is [399637.4777, 359832.7517, 439442.2036]. Fitted value: 399637.4777, Lower value: 359832.7517 and Upper value: 439442.2036. Since $450,000 does not fall between lower and upper values of prediction interval, it is not feasible.
(d) Some promoters of Broadway plays use the prediction rule that next week’s gross box office results will be equal to this week’s gross box office results. Comment on the appropriateness of this rule.
#Create data frame with current week values as last week
newdata = data.frame(LastWeek=playbill$CurrentWeek)
#Generate confidence inverval
PI <- predict(play.lm, newdata, interval="confidence", level=.95)
PI <- data.frame(PI)
PI$LastWeek <- newdata$LastWeek
PI$Play <- playbill$Production
#Display data
PI %>%
kable(format="html", caption = "Confidence Invervals") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left")
| fit | lwr | upr | LastWeek | Play |
|---|---|---|---|---|
| 679497 | 670297 | 688698 | 684966 | 42nd Street |
| 500170 | 490455 | 509886 | 502367 | Avenue Q |
| 590627 | 581589 | 599664 | 594474 | Beauty and Beast |
| 526619 | 517183 | 536054 | 529298 | Bombay Dreams |
| 566841 | 557704 | 575978 | 570254 | Chicago |
| 321031 | 308131 | 333930 | 319959 | Dracula |
| 575554 | 566460 | 584647 | 579126 | Fiddler on the Roof |
| 138445 | 121014 | 155876 | 134042 | Forever Tango |
| 110761 | 92586 | 128936 | 105853 | Golda’s Balcony |
| 814837 | 803947 | 825727 | 822775 | Hairspray |
| 939254 | 925795 | 952713 | 949462 | Mamma Mia! |
| 605881 | 596876 | 614887 | 610007 | Movin’ Out |
| 386671 | 375148 | 398194 | 386797 | Rent |
| 1119537 | 1101507 | 1137566 | 1133034 | The Lion King |
| 623168 | 614169 | 632167 | 627609 | The Phantom of the Opera |
| 902195 | 889571 | 914820 | 911727 | The Producers |
| 1165922 | 1146627 | 1185217 | 1180266 | Wicked |
| 477374 | 467369 | 487379 | 479155 | Wonderful Town |
I do not 100% agree with promoters because next week’s gross box office results could be close to the lower value of confidence interval, which is less than current week’s gross box office results.