This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Discussion instructor: Bowen Hu, 1335 I
Email: bhu35@wisc.edu
Office hours: MF 11-12
Data: we are studying the influence of soil temperature on the early growth of red oak seedlings. The following data were collected: x = daily degree hours of soil heat, y = shoot elongation (cm per seedling).
discx <- c(344, 494, 612, 467, 358, 587, 335, 499, 560, 682, 307, 562, 436, 635, 322, 532)
discy <- c(4.7, 9.3, 12.8, 9.8, 3.7, 11.5, 3.3, 8.2, 10.3, 13.0, 4.9, 10.7, 6.0, 11.1, 3.4, 10.7)
Creating a data frame out of the two vectors above
heat_growth <- data.frame(discx, discy)
Naming the columns of the data frame
names(heat_growth) <- c("degree_hours", "elongation")
Optionally, print the data frame that we just created
heat_growth
Loading the ggplot2 package into the current R session (you might need to install it first with install.packages("ggplot2"))
library(ggplot2)
Creating scatter plot using the ggplot2 R package
disc_scatter_plot <- ggplot(heat_growth, aes(x = degree_hours, y = elongation)) +
  geom_point() +
  labs(x = "Daily degree hours of soil heat", y = "Shoot elongation (cm)") +
  geom_smooth(method = "lm", se = FALSE)  # least-squares line, no confidence band
disc_scatter_plot
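Using base R functions to plot
# Plotting with base R functions
plot(heat_growth$degree_hours, heat_growth$elongation, xlab = "Daily degree hours of soil heat", ylab = "Shoot elongation (cm)")
# Adding the fitted line; the model is fitted inline here because fit_model is only created in the model-fitting step
abline(lm(elongation ~ degree_hours, data = heat_growth))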
Fitting a model. The elongation ~ degree_hours statement in the lm() call is called a formula; it indicates that we use degree_hours to explain elongation.
fit_model <- lm(elongation ~ degree_hours, data = heat_growth)
summary(fit_model)  # summary of the fitted model
Call:
lm(formula = elongation ~ degree_hours, data = heat_growth)
Residuals:
Min 1Q Median 3Q Max
-1.37300 -0.81473 0.01413 0.74061 1.90535
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.832051 1.086548 -4.447 0.000552 ***
degree_hours 0.027252 0.002185 12.474 5.68e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.028 on 14 degrees of freedom
Multiple R-squared: 0.9175, Adjusted R-squared: 0.9116
F-statistic: 155.6 on 1 and 14 DF, p-value: 5.678e-09
The least squares estimates of the intercept and slope are given in the Estimate column. Each t statistic is obtained by dividing the coefficient by its standard error. The p-values in the Pr(>|t|) column correspond to tests of the null hypothesis that the coefficient is 0 against the two-sided alternative. The (Intercept) row corresponds to β0; the row labeled with the name of the x variable (here degree_hours) corresponds to the slope β1.
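As a quick check of the description above, the t statistics and two-sided p-values can be recomputed by hand from the Estimate and Std. Error columns. This is a minimal sketch; the names coef_table, t_manual and p_manual are made up for this illustration, and the data are repeated only so that the chunk runs on its own.

```r
# Same data and model as above, repeated so this chunk is self-contained
discx <- c(344, 494, 612, 467, 358, 587, 335, 499, 560, 682, 307, 562, 436, 635, 322, 532)
discy <- c(4.7, 9.3, 12.8, 9.8, 3.7, 11.5, 3.3, 8.2, 10.3, 13.0, 4.9, 10.7, 6.0, 11.1, 3.4, 10.7)
heat_growth <- data.frame(degree_hours = discx, elongation = discy)
fit_model <- lm(elongation ~ degree_hours, data = heat_growth)

# Coefficient table as a numeric matrix (illustrative name)
coef_table <- summary(fit_model)$coefficients

# t statistic = estimate / standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 = 14 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit_model), lower.tail = FALSE)

# Side-by-side comparison with the values reported by summary()
cbind(t_manual,
      t_reported = coef_table[, "t value"],
      p_manual,
      p_reported = coef_table[, "Pr(>|t|)"])
```

The manual and reported columns should agree, which confirms that summary() is doing nothing more than the division and t-distribution lookup described above.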
Constructing a QQ plot of the residuals and a residuals vs. fitted values plot, and commenting on them.
The function residuals(fit_model) extracts the residuals from the model, and fitted(fit_model) extracts the fitted values.
The QQ plot can be drawn with base R functions using qqnorm(residuals(fit_model)).
The residuals vs. fitted values plot can be drawn with plot(residuals(fit_model) ~ fitted(fit_model), ylab = "Residuals", xlab = "Fitted values") followed by abline(h = 0). Because the formula is residuals ~ fitted, the residuals go on the y axis and the fitted values on the x axis.
residuals(fit_model)
1 2 3 4 5 6 7 8 9 10
0.1573474 0.6695405 0.9537991 1.9053457 -1.2241813 0.3351003 -0.9973842 -0.5667197 -0.1290945 -0.7538441
11 12 13 14 15 16
1.3656730 0.2164014 -1.0498408 -1.3729979 -0.5431076 1.0339628
qqnorm(residuals(fit_model), main = "QQplot of residuals")
plot(residuals(fit_model) ~ fitted(fit_model), ylab = "Residuals", xlab = "Fitted values", main = "Residuals vs. fitted values")
abline(h=0)
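Since ggplot2 is already used for the scatter plot, the QQ plot of the residuals can also be drawn with it. The sketch below assumes a ggplot2 version that provides stat_qq_line(); the names resid_df and qq_plot are made up for this illustration.

```r
library(ggplot2)

# Same data and model as above, repeated so this chunk is self-contained
discx <- c(344, 494, 612, 467, 358, 587, 335, 499, 560, 682, 307, 562, 436, 635, 322, 532)
discy <- c(4.7, 9.3, 12.8, 9.8, 3.7, 11.5, 3.3, 8.2, 10.3, 13.0, 4.9, 10.7, 6.0, 11.1, 3.4, 10.7)
heat_growth <- data.frame(degree_hours = discx, elongation = discy)
fit_model <- lm(elongation ~ degree_hours, data = heat_growth)

# ggplot2 needs a data frame, so the residuals go into one (illustrative name)
resid_df <- data.frame(residual = residuals(fit_model))

# stat_qq() plots sample quantiles against normal quantiles;
# stat_qq_line() adds a reference line through the quartiles
qq_plot <- ggplot(resid_df, aes(sample = residual)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "QQ plot of residuals",
       x = "Theoretical quantiles", y = "Sample quantiles")
qq_plot
```

Points that stay close to the reference line are consistent with approximately normal residuals; systematic curvature at the ends would suggest heavy or light tails.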
Constructing the ANOVA table for this dataset and testing whether the slope is significantly different from zero
anova(fit_model)
Analysis of Variance Table
Response: elongation
Df Sum Sq Mean Sq F value Pr(>F)
degree_hours 1 164.369 164.369 155.61 5.678e-09 ***
Residuals 14 14.788 1.056
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
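To see where the F value in the table comes from, it can be rebuilt from the sums of squares. In simple linear regression the F statistic equals the square of the slope's t statistic, so both tests give the same p-value. This is a minimal sketch; names such as ss_reg, ss_res and f_manual are made up for this illustration.

```r
# Same data and model as above, repeated so this chunk is self-contained
discx <- c(344, 494, 612, 467, 358, 587, 335, 499, 560, 682, 307, 562, 436, 635, 322, 532)
discy <- c(4.7, 9.3, 12.8, 9.8, 3.7, 11.5, 3.3, 8.2, 10.3, 13.0, 4.9, 10.7, 6.0, 11.1, 3.4, 10.7)
heat_growth <- data.frame(degree_hours = discx, elongation = discy)
fit_model <- lm(elongation ~ degree_hours, data = heat_growth)

anova_table <- anova(fit_model)

# Regression and residual sums of squares, and residual degrees of freedom
ss_reg <- anova_table["degree_hours", "Sum Sq"]
ss_res <- anova_table["Residuals", "Sum Sq"]
df_res <- anova_table["Residuals", "Df"]

# F = mean square for regression / mean square for residuals
f_manual <- (ss_reg / 1) / (ss_res / df_res)
p_manual <- pf(f_manual, df1 = 1, df2 = df_res, lower.tail = FALSE)

# The slope's t statistic squared should match F
t_slope <- summary(fit_model)$coefficients["degree_hours", "t value"]

# R-squared is the share of total variation explained by the regression
r_squared <- ss_reg / (ss_reg + ss_res)

c(F = f_manual, t_squared = t_slope^2, p_value = p_manual, R2 = r_squared)
```

With a p-value this small, the null hypothesis of a zero slope is rejected at any conventional significance level.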
Predicting. We can use the predict() function and the fitted model object fit_model to make predictions of y for new values of x.
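A minimal sketch of such a prediction follows. The new degree-hour values (400 and 600) are hypothetical, chosen only because they lie inside the observed range of x; the names new_heat, point_pred and pred_int are also made up for this illustration.

```r
# Same data and model as above, repeated so this chunk is self-contained
discx <- c(344, 494, 612, 467, 358, 587, 335, 499, 560, 682, 307, 562, 436, 635, 322, 532)
discy <- c(4.7, 9.3, 12.8, 9.8, 3.7, 11.5, 3.3, 8.2, 10.3, 13.0, 4.9, 10.7, 6.0, 11.1, 3.4, 10.7)
heat_growth <- data.frame(degree_hours = discx, elongation = discy)
fit_model <- lm(elongation ~ degree_hours, data = heat_growth)

# New data must be a data frame whose column name matches the x variable in the formula
new_heat <- data.frame(degree_hours = c(400, 600))  # hypothetical values

# Point predictions of shoot elongation (cm)
point_pred <- predict(fit_model, newdata = new_heat)
point_pred

# Point predictions with 95% prediction intervals for a single new seedling
pred_int <- predict(fit_model, newdata = new_heat,
                    interval = "prediction", level = 0.95)
pred_int
```

Using interval = "confidence" instead would give the narrower interval for the mean elongation at each x value. Predictions far outside the observed range of degree hours should be treated with caution.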