This is a draft.

Assignment

  1. Polynomial regression model
  2. Truncated polynomial splines of degree 2 (consider k=2, 3 and 5 knots)
  3. B-splines of degree 2 (consider m=3, 5 and 8 knots)
  4. Cubic P-splines (consider k=5, 8 and 20 knots)
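
For item 2, a truncated polynomial spline of degree 2 can be fitted with an ordinary lm() once the basis columns are built by hand. The sketch below is only an illustration under assumptions: trunc_basis is a hypothetical helper (not from any package), the data are simulated stand-ins for the movie data, and placing the knots at the terciles of budget is just one possible choice.

```r
# Truncated power basis of the given degree:
# columns x, ..., x^degree, then (x - knot)_+^degree for each knot.
# trunc_basis is a hypothetical helper, not part of any package.
trunc_basis <- function(x, knots, degree = 2) {
  poly_part  <- outer(x, seq_len(degree), `^`)
  trunc_part <- outer(x, knots, function(x, k) pmax(x - k, 0)^degree)
  cbind(poly_part, trunc_part)
}

# Simulated stand-in for the training data (assumption: columns budget, profit)
set.seed(1)
train <- data.frame(budget = runif(200, 8, 300))
train$profit <- 20 + 0.5 * train$budget - 0.001 * train$budget^2 +
  rnorm(200, sd = 30)

# k = 2 knots placed at the terciles of budget (one possible choice)
kn <- quantile(train$budget, probs = c(1/3, 2/3))
tp.2 <- lm(profit ~ trunc_basis(budget, kn), data = train)
length(coef(tp.2))  # intercept + 2 polynomial terms + 2 truncated terms
```

The same helper should work for k = 3 and k = 5 by passing a longer vector of knots.
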

Descriptive

Should we add a dummy to know if the movie turned a profit or not?
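
If we go for the dummy, it could be built directly from the sign of profit. This is a minimal sketch on made-up numbers; the data frame movies and the column name profitable are assumptions, and only budget and profit are names taken from the code in this draft.

```r
# Hypothetical stand-in for the movie data; the real columns are assumed
# to be called budget and profit, as in the plotting code.
movies <- data.frame(
  budget = c(8, 32, 50, 80, 300),
  profit = c(-3, 12, -20, 45, 150)
)

# 1 if the movie made money, 0 otherwise
movies$profitable <- as.integer(movies$profit > 0)

# Count of loss-making vs profitable movies
table(movies$profitable)
```
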

Distribution of Budgets

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8.00   32.00   50.00   65.96   80.00  300.00

Models

Budget only

For help with poly() in R I used this link.
For help with the cubic splines I used this link.

plot(budget, profit, col = "grey", xlab = "Budget", ylab = "Profit",
     main = "We want to predict profit by only using budget")
abline(h = 0, col = "green")  # break-even line

# Note: `knots` gives knot locations on the budget scale, not the number of knots
b.3 <- lm(profit ~ bs(budget, knots = 2), data = train)
b.5 <- lm(profit ~ bs(budget, knots = 5), data = train)
b.8 <- lm(profit ~ bs(budget, knots = 8), data = train)

# Knots at the quartiles and the mean of budget (from the summary above)
b.donaldcuts <- lm(profit ~ bs(budget, knots = c(32, 50, 65.96, 80)), data = train)
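
Instead of typing the cut points by hand, the knot locations could be computed from the data with quantile(). A minimal sketch on simulated data, as an assumption-laden illustration: the simulated train and the model name b.quartiles are hypothetical, and this version uses only the three quartiles (not the mean).

```r
library(splines)

# Simulated stand-in for the training data (assumption: columns budget, profit)
set.seed(1)
train <- data.frame(budget = runif(200, 8, 300))
train$profit <- 20 + 0.5 * train$budget - 0.001 * train$budget^2 +
  rnorm(200, sd = 30)

# Interior knots at the quartiles of budget in the training data
cuts <- quantile(train$budget, probs = c(0.25, 0.5, 0.75))

# Cubic B-spline (the bs() default degree) with those knot locations
b.quartiles <- lm(profit ~ bs(budget, knots = cuts), data = train)
AIC(b.quartiles)
```
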

AIC(poly.3, b.5, b.8, b.donaldcuts)

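All of these models have the same AIC, except the one where I used the distribution of budget to choose the knot locations, and even that value is quite close. This looks strange. A likely cause: in bs(), the knots argument gives knot positions on the budget scale, not the number of knots, so knots at 2, 5 and 8 sit at or below the minimum budget of 8.00 reported above and add no flexibility inside the range of the data.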
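
To ask bs() for a number of knots rather than knot positions, one option is the df argument: without an intercept, bs() places df - degree interior knots at quantiles of x. The sketch below is an illustration on simulated data, not the final analysis; the model names bs2.k3, bs2.k5 and bs2.k8 are hypothetical, and degree = 2 follows the assignment.

```r
library(splines)

# Simulated stand-in for the training data (assumption: columns budget, profit)
set.seed(1)
train <- data.frame(budget = runif(200, 8, 300))
train$profit <- 20 + 0.5 * train$budget - 0.001 * train$budget^2 +
  rnorm(200, sd = 30)

# Degree-2 B-splines with m = 3, 5 and 8 interior knots: df = m + degree
bs2.k3 <- lm(profit ~ bs(budget, df = 3 + 2, degree = 2), data = train)
bs2.k5 <- lm(profit ~ bs(budget, df = 5 + 2, degree = 2), data = train)
bs2.k8 <- lm(profit ~ bs(budget, df = 8 + 2, degree = 2), data = train)

# Check where bs() actually put the knots for the m = 3 model
attr(bs(train$budget, df = 3 + 2, degree = 2), "knots")

# The models now differ in their number of parameters, so AIC can differ too
AIC(bs2.k3, bs2.k5, bs2.k8)
```
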
Let’s look at some residuals
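
A residuals-versus-fitted plot is one simple way to do this. The sketch below uses simulated data and a hypothetical model called fit, so it only shows the shape of the code, in the same base-graphics style as the scatter plot above.

```r
library(splines)

# Simulated stand-in for the training data (assumption: columns budget, profit)
set.seed(1)
train <- data.frame(budget = runif(200, 8, 300))
train$profit <- 20 + 0.5 * train$budget - 0.001 * train$budget^2 +
  rnorm(200, sd = 30)

# Hypothetical model: degree-2 B-spline with 5 interior knots
fit <- lm(profit ~ bs(budget, df = 5 + 2, degree = 2), data = train)

# Residuals against fitted values; a visible pattern would suggest misfit
plot(fitted(fit), resid(fit), col = "grey",
     xlab = "Fitted profit", ylab = "Residual")
abline(h = 0, col = "green")
```
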