This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
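library(s20x)  # provides modcheck() for the assumption checks below
# Read the data and treat Format ("Hard"/"Paper") as a factor
bookcost.df = read.table("bookcost.txt", header = TRUE)
bookcost.df$Format = factor(bookcost.df$Format)
# Pages * Format fits a separate intercept and slope for each format
format.lm = lm(Cost ~ Pages * Format, data = bookcost.df)
# Scatter plot with a different colour and symbol for each format
plot(Cost ~ Pages, data = bookcost.df, main = "The number of Pages vs Cost",
     col = ifelse(Format == "Hard", "blue", "green"),
     pch = ifelse(Format == "Hard", 1, 2))
legend("topright", legend = c("Hard", "Paper"), pch = c(1, 2), col = c("blue", "green"))
# Fitted line for hard covers (the baseline level)
abline(format.lm$coef[1], format.lm$coef[2], col = "blue")
# Fitted line for paper covers: baseline plus the FormatPaper adjustments
abline(format.lm$coef[1] + format.lm$coef[3], format.lm$coef[2] + format.lm$coef[4], col = "green")
modcheck(format.lm)  # model assumption checks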
summary(format.lm)
##
## Call:
## lm(formula = Cost ~ Pages * Format, data = bookcost.df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.4766  -2.2143  -0.8453   1.0037  19.4456
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       19.500428   0.913658  21.343  < 2e-16 ***
## Pages              0.016468   0.002734   6.023 7.93e-09 ***
## FormatPaper       -7.921170   1.386921  -5.711 3.95e-08 ***
## Pages:FormatPaper -0.009543   0.004072  -2.344   0.0201 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.549 on 203 degrees of freedom
## Multiple R-squared: 0.6109, Adjusted R-squared: 0.6051
## F-statistic: 106.2 on 3 and 203 DF, p-value: < 2.2e-16
confint(format.lm)
##                          2.5 %       97.5 %
## (Intercept)        17.69895189 21.301904283
## Pages               0.01107683  0.021859512
## FormatPaper       -10.65578735 -5.186551842
## Pages:FormatPaper  -0.01757139 -0.001514958
# Refit with Paper as the baseline level (factor rotation)
bookcost.df$Formatpaper = factor(bookcost.df$Format, levels = c("Paper", "Hard"))
Format.lm2 = lm(Cost ~ Pages * Formatpaper, data = bookcost.df)
summary(Format.lm2)
##
## Call:
## lm(formula = Cost ~ Pages * Formatpaper, data = bookcost.df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.4766  -2.2143  -0.8453   1.0037  19.4456
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)           11.579258   1.043446  11.097  < 2e-16 ***
## Pages                  0.006925   0.003017   2.295   0.0227 *
## FormatpaperHard        7.921170   1.386921   5.711 3.95e-08 ***
## Pages:FormatpaperHard  0.009543   0.004072   2.344   0.0201 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.549 on 203 degrees of freedom
## Multiple R-squared: 0.6109, Adjusted R-squared: 0.6051
## F-statistic: 106.2 on 3 and 203 DF, p-value: < 2.2e-16
confint(Format.lm2)
##                              2.5 %      97.5 %
## (Intercept)           9.5218772013 13.63663978
## Pages                 0.0009764142  0.01287359
## FormatpaperHard       5.1865518420 10.65578735
## Pages:FormatpaperHard 0.0015149581  0.01757139
Methods and Assumptions check: To explain the book’s cost, we fitted a linear regression line through the data with explanatory variables Format, Pages and their interaction.
We were interested in the relationship between book cost, the number of pages in the book and the book format (hard vs. paper). The interaction term allows the relationship between cost and number of pages to differ between the two formats. The interaction was significant (p-value = 0.02), so we reject the null hypothesis of no interaction and keep the term in the model. Therefore, the relationship between book cost and the number of pages depends on the book’s format.
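The interaction p-value can be reproduced from the reported estimate and standard error. The following is a minimal sketch, assuming the rounded values printed in the first summary output; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the Pages:FormatPaper row of the first summary output
estimate  <- -0.009543
std_error <- 0.004072
df_resid  <- 203          # residual degrees of freedom

# t statistic for H0: the interaction coefficient equals 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(c(t_value = t_value, p_value = p_value))
```

A p-value below 0.05 is evidence against the null hypothesis of no interaction, which is why the term stays in the model.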
In the equality-of-variance and normality checks, the residuals show no quadratic or linear trend and appear to be randomly scattered around 0.
With the hard cover as the initial baseline, we performed a factor rotation (refitting the model with the paper cover as the baseline) to obtain the intercept and slope estimates for the paper cover directly.
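The factor rotation amounts to changing which level R treats as the baseline. Here is a minimal sketch on a small hypothetical vector (not the book data), using base R's `relevel()` as an alternative to the `factor(..., levels = ...)` call used in the analysis:

```r
# Hypothetical toy factor standing in for bookcost.df$Format
fmt <- factor(c("Hard", "Paper", "Paper", "Hard"))

# Levels are sorted alphabetically by default, so "Hard" is the baseline
print(levels(fmt))

# Move "Paper" to the front so it becomes the baseline level
fmt_paper <- relevel(fmt, ref = "Paper")
print(levels(fmt_paper))
```

Either approach leaves the fitted values unchanged. Only the level that the intercept and `Pages` coefficients describe is different.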
Our final model is therefore \(\text{Cost}_i=\beta_0+\beta_1\times\text{Pagenumber}_i+\beta_2\times\text{format}_i+\beta_3\times\text{Pagenumber}_i\times\text{format}_i+\epsilon_i\), where \(\epsilon_i\stackrel{\text{iid}}{\sim}N(0,\sigma^2)\). Here \(\text{format}_i\) is a dummy variable equal to 0 for a hard cover and 1 for a paper cover.
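As a worked sketch, substituting the estimates from the first summary output into this model gives one fitted line per format (the hat notation for fitted values is added here for illustration):

\[
\widehat{\text{Cost}}_i = 19.500428 + 0.016468\times\text{Pagenumber}_i \qquad (\text{hard cover, } \text{format}_i = 0)
\]

\[
\widehat{\text{Cost}}_i = (19.500428 - 7.921170) + (0.016468 - 0.009543)\times\text{Pagenumber}_i = 11.579258 + 0.006925\times\text{Pagenumber}_i \qquad (\text{paper cover, } \text{format}_i = 1)
\]

The paper-cover intercept and slope agree with the (Intercept) and Pages estimates in the second summary output, where paper is the baseline.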
Executive Summary: We were interested in comparing the cost of printing a book with a paper cover versus a hard cover and wanted to check whether the cost of the book varied by book format.
From our analysis, we found that the cost of producing a book depends on the number of pages and the book’s format, with hardcover books being more expensive than paperbacks.
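We are 95% confident that, for hard covers, the cost increases by between $1.11 and $2.19 for every additional 100 pages, while for paper covers it increases by between $0.10 and $1.29 for every additional 100 pages. Comparing the intercepts of the two fitted lines (that is, before accounting for the number of pages), hard covers cost between $5.19 and $10.66 more than paper covers.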
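The interval endpoints for the cost per additional page can be rescaled directly from the `Pages` rows of the two `confint()` outputs. This is a minimal sketch, assuming the printed confidence limits; the object names are illustrative.

```r
# 95% limits for the per-page slope, copied from the confint() outputs
slope_ci <- rbind(
  Hard  = c(lower = 0.01107683,   upper = 0.021859512),  # Pages row, hard baseline
  Paper = c(lower = 0.0009764142, upper = 0.01287359)    # Pages row, paper baseline
)

# Rescale from dollars per page to dollars per 10 and per 100 pages
per_10_pages  <- round(slope_ci * 10, 2)
per_100_pages <- round(slope_ci * 100, 2)

print(per_10_pages)
print(per_100_pages)
```

Reporting the change per 100 pages keeps the dollar amounts large enough to read comfortably at two decimal places.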
Our multiple \(R^2\) of 0.61 shows that the model explains 61% of the variation in book cost.