Load www.playbill.com, data for week of October 11-17, 2004. Data is downloaded from http://www.stat.tamu.edu/~sheather/book/data_sets.php website.

library(knitr)    # Report display, table format
library(kableExtra)

#Load data
playbill <- read.csv("https://raw.githubusercontent.com/akulapa/Data621-Week02-Discussion/master/playbill.csv", header= TRUE, stringsAsFactors = F)

(a) Find a 95% confidence interval for the slope of the regression model, \(\beta_1\). Is 1 a plausible value for \(\beta_1\) ? Give a reason to support your answer.

options("scipen"=100, "digits"=4)
#Generate linear model
#We will be predicting current week results using last week results 
play.lm <- lm(data = playbill, CurrentWeek ~ LastWeek)
summary(play.lm)
## 
## Call:
## lm(formula = CurrentWeek ~ LastWeek, data = playbill)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36926  -7525  -2581   7782  35443 
## 
## Coefficients:
##              Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) 6804.8860  9929.3178    0.69                 0.5    
## LastWeek       0.9821     0.0144   68.07 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18000 on 16 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.996 
## F-statistic: 4.63e+03 on 1 and 16 DF,  p-value: <0.0000000000000002

Linear model in the form of \(Y = \beta_0 + \beta_1X\) is \(CurrentWeek = 6804.886 + 0.9821LastWeek\).

Slope of the line is \(\beta_1 = 0.98\). 95% confidence interval for the slope = \(\beta_1 \pm (t^* \times SE(\beta_1))\), where \(t^*\) is t-distribution for 16 degrees of freedom.

#t-distribution for 16 DF. 95% is 2.5% and 97.5% for 2 tail

tdis <- qt(0.975, 16)
b1 <- play.lm$coefficients[2]
se <- coef(summary(play.lm))[, "Std. Error"][2]
CI <- b1 + c(-1,1) * tdis * se

95% confidence interval is [0.9515, 1.0127]

Confidence interval can directly obtained using confint() R function.

CI <- confint(play.lm)["LastWeek", ]

95% confidence interval is [0.9515, 1.0127]

Yes, 1 is a plausible value as it falls between lower and upper values of 95% confidence interval.

(b) Test the null hypothesis \(H_0\) : \(\beta_0\) = 10000 against a two-sided alternative. Interpret your result.

Using regression model, \(\beta_0 = 6804.886\)

Null hypothesis \(H_0\) : \(\beta_0 = 10000\) Alternative hypothesis \(H_A\) : \(\beta_0 \ne 10000\)

Lets derive values for 95% confidence interval for \(\beta_0\) = \(\beta_0 \pm (t^* \times SE(\beta_0))\). Where \(t^*\) is t-distribution for 16 degrees of freedom.

#t-distribution for 16 DF. 95% is 2.5% and 97.5% for 2 tail

tdis <- qt(0.975, 16)
b0 <- play.lm$coefficients[1]
se <- coef(summary(play.lm))[, "Std. Error"][1]
CI <- b0 + c(-1,1) * tdis * se

95% confidence interval for \(\beta_0\) is [-14244.3274, 27854.0994].

Using confint() R function

CI <- confint(play.lm)["(Intercept)", ]

95% confidence interval for \(\beta_0\) is [-14244.3274, 27854.0994].

Since \(\beta_0 = 10000\) falls between upper and lower values of 95% confidence interval, we accept null hypothesis \(H_0\). In other words, we are 95% confident that the intercept \(\beta_0\) can assume a value of \(10000\).

(c) Use the fitted regression model to estimate the gross box office results for the current week (in $) for a production with $400,000 in gross box office the previous week. Find a 95% prediction interval for the gross box office results for the current week (in $) for a production with $400,000 in gross box office the previous week. Is $450,000 a feasible value for the gross box office results in the current week, for a production with $400,000 in gross box office the previous week? Give a reason to support your answer.

To answer above question, I will be using predict() R function.

newdata = data.frame(LastWeek=400000)
PI <- predict(play.lm, newdata, interval="predict", level=.95)

The prediction interval for current week for a production with $400,000 in gross box office the previous week is [399637.4777, 359832.7517, 439442.2036]. Fitted value: 399637.4777, Lower value: 359832.7517 and Upper value: 439442.2036. Since $450,000 does not fall between lower and upper values of prediction interval, it is not feasible.

(d) Some promoters of Broadway plays use the prediction rule that next week’s gross box office results will be equal to this week’s gross box office results. Comment on the appropriateness of this rule.

#Create data frame with current week values as last week
newdata = data.frame(LastWeek=playbill$CurrentWeek)
#Generate confidence inverval

PI <- predict(play.lm, newdata, interval="confidence", level=.95)
PI <- data.frame(PI)
PI$LastWeek <- newdata$LastWeek
PI$Play <- playbill$Production
#Display data
PI %>% 
  kable(format="html", caption = "Confidence Invervals") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left")
Confidence Invervals
fit lwr upr LastWeek Play
679497 670297 688698 684966 42nd Street
500170 490455 509886 502367 Avenue Q
590627 581589 599664 594474 Beauty and Beast
526619 517183 536054 529298 Bombay Dreams
566841 557704 575978 570254 Chicago
321031 308131 333930 319959 Dracula
575554 566460 584647 579126 Fiddler on the Roof
138445 121014 155876 134042 Forever Tango
110761 92586 128936 105853 Golda’s Balcony
814837 803947 825727 822775 Hairspray
939254 925795 952713 949462 Mamma Mia!
605881 596876 614887 610007 Movin’ Out
386671 375148 398194 386797 Rent
1119537 1101507 1137566 1133034 The Lion King
623168 614169 632167 627609 The Phantom of the Opera
902195 889571 914820 911727 The Producers
1165922 1146627 1185217 1180266 Wicked
477374 467369 487379 479155 Wonderful Town

I do not 100% agree with promoters because next week’s gross box office results could be close to the lower value of confidence interval, which is less than current week’s gross box office results.

References