In this assignment, you will create your own RMarkdown document and your own code. This code should be closely based on code we have recently used in class, but don’t hesitate to use the internet and other sources as well. All code and answers should be in an RMarkdown document.

1. Load the small dataset Fertilizer Experiment.csv and print it out. There are three columns:

Nitrogen - This shows the pounds per acre of nitrogen applied in an experiment to judge the effect of fertilizer on crop yields.

Nit_by_20 - This shows the same data as Column 1 but in units of 20 pounds per acre.

Yield - My source just says ‘bushels’ but I suspect the true unit is bushels per acre.
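A minimal sketch of the loading step, assuming the CSV sits in the working directory and that its header row uses the three column names listed above. `load_fertilizer` is a hypothetical helper introduced only for illustration; a plain `read.csv()` call followed by `print()` would do the same job.

```r
# Hypothetical helper (not part of the assignment): read the CSV and confirm
# that the three columns described above are present.
load_fertilizer <- function(path = "Fertilizer Experiment.csv") {
  dat <- read.csv(path)
  expected <- c("Nitrogen", "Nit_by_20", "Yield")
  missing_cols <- setdiff(expected, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Assumption: the file is in the working directory. If it is not, the block
# is skipped rather than failing.
if (file.exists("Fertilizer Experiment.csv")) {
  Nitrogen <- load_fertilizer()
  print(Nitrogen)
}
```

The data frame is named `Nitrogen` here only because the code later in this document refers to `Nitrogen$Nitrogen` and `Nitrogen$Yield`.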

library("car")

2. Create a linear model to predict crop yield as a function of amount of fertilizer applied. Include …

a. Exploratory plots and relevant comments.



b. The linear model, including summary output, a statement of the model, diagnostic plots and your comments.

The model is:

residual = resid(linear_fit)
plot(Nitrogen$Nitrogen, residual, main = "residuals vs x")
plot(linear_fit, which = 1:2)
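The call that creates `linear_fit` does not appear in this document, so the following is only a sketch of how such an object could be built. The data frame `demo` holds made-up stand-in values, not the contents of Fertilizer Experiment.csv; with the real data, `demo` would be replaced by the loaded data frame.

```r
# Made-up stand-in data, NOT the values in Fertilizer Experiment.csv.
demo <- data.frame(
  Nitrogen = seq(0, 200, by = 20),
  Yield    = c(52, 60, 67, 72, 77, 80, 82, 84, 84, 83, 81)
)

# Naming the columns in the formula and passing data = keeps the coefficient
# labels short and lets predict() accept new data later.
linear_fit <- lm(Yield ~ Nitrogen, data = demo)
summary(linear_fit)

# Statement of the model: yield_hat = b0 + b1 * Nitrogen
b <- coef(linear_fit)
cat(sprintf("yield_hat = %.3f + %.4f * Nitrogen\n", b[1], b[2]))

# Residuals, ready to be plotted against the predictor as in the code above.
residual <- resid(linear_fit)
```

When the true relationship is curved, a plot of these residuals against the predictor usually shows a systematic pattern, which is what the diagnostic plots are meant to reveal.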
c. Include a plot of the regression line with the data, with comments.
comment???

3. Create a quadratic model to predict crop yield as a function of amount of fertilizer applied. Include …

a. The quadratic model, including summary output, a statement of the model, diagnostic plots and your comments.

The model is:

comment?
b. Include a plot of the fitted regression curve with the data, with comments.
xvalues = seq(0, 200, 1)
newNitrogen = data.frame(Nitrogen=xvalues, Nitrogen2=xvalues^2)
head(newNitrogen)
predYields = predict(quad_fit, newNitrogen)

plot(Nitrogen$Nitrogen, Nitrogen$Yield)
lines(xvalues, predYields)
Explain what this code is doing: 
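One way to read the prediction code is sketched here. It assumes `quad_fit` was fit with a `data =` argument and a squared column named `Nitrogen2`; the fitting call is not shown in this document, so that naming is a guess based on the columns of `newNitrogen`. The `demo` values are made up for illustration.

```r
# Made-up stand-in data, NOT the values in Fertilizer Experiment.csv.
demo <- data.frame(
  Nitrogen = seq(0, 200, by = 20),
  Yield    = c(52, 60, 67, 72, 77, 80, 82, 84, 84, 83, 81)
)
demo$Nitrogen2 <- demo$Nitrogen^2

# Assumed form of the quadratic fit: column names in the formula must match
# the column names of the new data passed to predict().
quad_fit <- lm(Yield ~ Nitrogen + Nitrogen2, data = demo)

# A fine grid of x values, with the squared term computed to match.
xvalues <- seq(0, 200, 1)
newNitrogen <- data.frame(Nitrogen = xvalues, Nitrogen2 = xvalues^2)

# One predicted yield per grid value; joining these with lines() over the
# scatterplot draws a smooth fitted curve instead of 11 connected points.
predYields <- predict(quad_fit, newNitrogen)
length(predYields) == length(xvalues)
```

If the model had instead been fit as `lm(Nitrogen$Yield ~ Nitrogen$Nitrogen + ...)`, `predict()` could not match the grid columns to the model terms, which is one reason to prefer the `data =` form.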
c. Include a VIF diagnostic. If necessary, center the X-values and re-run the regression.
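To see what the VIF measures, it can be computed by hand in base R. With exactly two predictors, the VIF of each equals 1 / (1 - r^2), where r is the correlation between them. The grid of rates below is an assumption (evenly spaced from 0 to 200), and `vif_two` is a hypothetical helper; on a fitted model, `vif()` from the car package reports the same quantity.

```r
# Assumed, evenly spaced application rates; not the real data.
x <- seq(0, 200, by = 20)

# Hypothetical helper: VIF when the model has exactly two predictors.
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

# Uncentered: x and x^2 rise together, so they are strongly correlated.
vif_raw <- vif_two(x, x^2)

# Centered: subtract the mean before squaring.
x_c <- x - mean(x)
vif_centered <- vif_two(x_c, x_c^2)

c(raw = vif_raw, centered = vif_centered)
```

On a grid that is symmetric about its mean, the centered linear and squared terms are uncorrelated, so the centered VIF is 1. Real data that are not evenly spaced will generally land somewhere above that.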

d. State clearly why you think the model is or is not a good fit.

    answer
e. Run the model again with centered X-values. Write down the model and include a VIF diagnostic.

Did we just do this?
f. The models produced by Parts (a) and (e) are algebraically equivalent. Why might it be better nonetheless to use a model with centered values?
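The equivalence can be checked numerically. The sketch below uses made-up stand-in data and hypothetical names (`demo`, `N_c`, `raw_fit`, `centered_fit`); it fits both forms, compares fitted values, and expands the centered equation to recover the uncentered coefficients.

```r
# Made-up stand-in data, NOT the values in Fertilizer Experiment.csv.
demo <- data.frame(
  Nitrogen = seq(0, 200, by = 20),
  Yield    = c(52, 60, 67, 72, 77, 80, 82, 84, 84, 83, 81)
)
x_bar <- mean(demo$Nitrogen)
demo$N_c <- demo$Nitrogen - x_bar

raw_fit      <- lm(Yield ~ Nitrogen + I(Nitrogen^2), data = demo)
centered_fit <- lm(Yield ~ N_c + I(N_c^2), data = demo)

# Both parameterizations describe the same curve, so fitted values agree.
all.equal(fitted(raw_fit), fitted(centered_fit))

# Expanding c0 + c1 (x - xbar) + c2 (x - xbar)^2 and collecting powers of x
# gives the uncentered intercept, linear, and quadratic coefficients.
cc <- unname(coef(centered_fit))
recovered <- c(cc[1] - cc[2] * x_bar + cc[3] * x_bar^2,
               cc[2] - 2 * cc[3] * x_bar,
               cc[3])
all.equal(recovered, unname(coef(raw_fit)))

# What does change between the two fits: the standard errors of the
# lower-order coefficients.
summary(raw_fit)$coefficients[, "Std. Error"]
summary(centered_fit)$coefficients[, "Std. Error"]
```

Because the residuals are identical, residual and QQ plots are the same for both fits; only the coefficient estimates and their standard errors are affected by the choice of parameterization.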



4. Create a cubic model

a. First create the model using an uncentered X-variable. Include summary output, the written regression function, diagnostics and a VIF analysis.


comments? QQ plot??
#vif(cubic_fit)
b. Repeat Part (a) using a centered polynomial.
Aha moment here
comments???
QQ plot always the same?

c. Did centering the polynomial have an appreciable effect on VIF?

    Answer
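With three polynomial terms, the VIF of each term is 1 / (1 - R^2), where R^2 comes from regressing that term on the other two. The base R sketch below computes this on an assumed evenly spaced grid; `vif_by_hand` is a hypothetical helper and the grid is not the real data.

```r
# Assumed, evenly spaced application rates; not the real data.
x <- seq(0, 200, by = 20)

# Hypothetical helper: regress each column on the remaining columns and
# convert the resulting R^2 into a VIF.
vif_by_hand <- function(X) {
  out <- sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
  setNames(out, colnames(X))
}

raw <- cbind(x = x, x2 = x^2, x3 = x^3)

x_c <- x - mean(x)
centered <- cbind(x = x_c, x2 = x_c^2, x3 = x_c^3)

vif_raw <- vif_by_hand(raw)
vif_centered <- vif_by_hand(centered)

rbind(raw = vif_raw, centered = vif_centered)
```

On a symmetric grid like this one, centering removes the correlation between the squared term and the odd-powered terms, but the linear and cubic terms remain correlated with each other, so their VIFs fall without dropping all the way to 1.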

5. Re-run the quadratic model from Number 3 using the smaller ‘units of 20 pounds’ variable as X.

a. Write the equation of the quadratic model. Include diagnostics as you wish.

The model is:
yield_hat = 
residual patterns?
QQ plot?
b. Compare the VIF of this model to the VIF obtained using the larger values for X (uncentered) in Part 3(c).
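A quick way to anticipate this comparison is to check what dividing X by a constant does to the correlation between X and X squared. The sketch below uses an assumed evenly spaced grid and the hypothetical helper `vif_two`, which is the two-predictor VIF formula 1 / (1 - r^2).

```r
# Assumed, evenly spaced application rates in pounds per acre; not the real data.
x   <- seq(0, 200, by = 20)
x20 <- x / 20                    # the same rates in units of 20 pounds

# Hypothetical helper: VIF when the model has exactly two predictors.
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

vif_pounds  <- vif_two(x, x^2)
vif_units20 <- vif_two(x20, x20^2)

# Correlation is unchanged when a variable is multiplied by a positive
# constant, so the two VIFs agree up to rounding.
all.equal(vif_pounds, vif_units20)
```

Rescaling changes the size of the coefficients and their standard errors, but not the collinearity between the linear and squared terms.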

c. Some statisticians say that centering is most important when the x-values are large (far from zero). Use centered values to run the quadratic model one more time and note the new VIF.

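# Center the rescaled predictor, then square the centered values.
xbar_20 = mean(Nitrogen$Nit_by_20)
Nitrogen$Nit_by_20_C = Nitrogen$Nit_by_20 - xbar_20
Nitrogen$Nit_by_20_C2 = Nitrogen$Nit_by_20_C^2
Nitrogen

# Passing data = keeps coefficient names short and lets predict() take new data.
quad_fit_20_C = lm(Yield ~ Nit_by_20_C + Nit_by_20_C2, data = Nitrogen)
summary(quad_fit_20_C)

vif(quad_fit_20_C)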
Why? What does 'far from zero' mean?
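One possible reading of 'far from zero' is the distance of the mean of X from zero relative to the spread of X. The sketch below takes a hypothetical grid centered at zero, slides it away from zero by increasing amounts, and recomputes the two-predictor VIF each time. The grid, the shift values, and the helper `vif_two` are all illustrative assumptions.

```r
# Hypothetical helper: VIF when the model has exactly two predictors.
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

# Hypothetical grid centered at zero; its spread stays fixed throughout.
base <- seq(-5, 5, by = 1)

# Slide the whole grid away from zero without changing its spread.
shifts <- c(0, 5, 50, 500)
vifs <- sapply(shifts, function(s) vif_two(base + s, (base + s)^2))

data.frame(shift = shifts, vif = vifs)
```

Over a range that sits well away from zero, the squared term is close to a straight-line function of X, so the two predictors carry nearly the same information and the VIF grows. Centering moves the grid back to the zero-shift case.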