Read up on the data.
The CO2 data frame has 84 rows and 5 columns of data
from an experiment on the cold tolerance of the grass species
Echinochloa crus-galli.
The CO2​ uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2​ concentration.
conc - a numeric vector of ambient carbon dioxide
concentrations (mL/L).
uptake - a numeric vector of carbon dioxide uptake
rates
?datasets
library(help = "datasets")
help(CO2)
head(CO2)
## Grouped Data: uptake ~ conc | Plant
## Plant Type Treatment conc uptake
## 1 Qn1 Quebec nonchilled 95 16.0
## 2 Qn1 Quebec nonchilled 175 30.4
## 3 Qn1 Quebec nonchilled 250 34.8
## 4 Qn1 Quebec nonchilled 350 37.2
## 5 Qn1 Quebec nonchilled 500 35.3
## 6 Qn1 Quebec nonchilled 675 39.2
tail(CO2)
## Grouped Data: uptake ~ conc | Plant
## Plant Type Treatment conc uptake
## 79 Mc3 Mississippi chilled 175 18.0
## 80 Mc3 Mississippi chilled 250 17.9
## 81 Mc3 Mississippi chilled 350 17.9
## 82 Mc3 Mississippi chilled 500 17.9
## 83 Mc3 Mississippi chilled 675 18.9
## 84 Mc3 Mississippi chilled 1000 19.9
str(CO2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of 5 variables:
## $ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
## $ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
## $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
## $ conc : num 95 175 250 350 500 675 1000 95 175 250 ...
## $ uptake : num 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
## - attr(*, "formula")=Class 'formula' language uptake ~ conc | Plant
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "outer")=Class 'formula' language ~Treatment * Type
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Ambient carbon dioxide concentration"
## ..$ y: chr "CO2 uptake rate"
## - attr(*, "units")=List of 2
## ..$ x: chr "(uL/L)"
## ..$ y: chr "(umol/m^2 s)"
describe(CO2)
## vars n mean sd median trimmed mad min max range skew
## Plant* 1 84 6.50 3.47 6.5 6.50 4.45 1.0 12.0 11.0 0.00
## Type* 2 84 1.50 0.50 1.5 1.50 0.74 1.0 2.0 1.0 0.00
## Treatment* 3 84 1.50 0.50 1.5 1.50 0.74 1.0 2.0 1.0 0.00
## conc 4 84 435.00 295.92 350.0 408.53 259.46 95.0 1000.0 905.0 0.72
## uptake 5 84 27.21 10.81 28.3 27.33 14.83 7.7 45.5 37.8 -0.10
## kurtosis se
## Plant* -1.26 0.38
## Type* -2.02 0.05
## Treatment* -2.02 0.05
## conc -0.68 32.29
## uptake -1.35 1.18
We will estimate the following regression.
\[ CO2_i = \beta_0 + \beta_1 Uptake_i + \epsilon_i \]
lm(data = CO2,
formula = conc ~ uptake)
##
## Call:
## lm(formula = conc ~ uptake, data = CO2)
##
## Coefficients:
## (Intercept) uptake
## 73.71 13.28
\(\beta_1\) is 13.28 in the chart above.
An interesting interpretation of the slope parameter.
beta1 <- cov(CO2$conc, CO2$uptake)/var(CO2$uptake)
round(beta1, digits = 2)
## [1] 13.28
NOT A CO-INCIDENCE THAT THE TWO NUMBERS ARE THE SAME !
Formal Proof.
# Plot
?plot
## Help on topic 'plot' was found in the following packages:
##
## Package Library
## graphics /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## base /Library/Frameworks/R.framework/Resources/library
##
##
## Using the first match ...
plot(x = CO2$uptake,
y = CO2$conc,
xlab = "CO2 Uptake",
ylab = "C02 Concentration",
main = "Best Fit Line",
sub = "",
bg = "lightblue", # a vector of background colors (Graphical Parameters)
col = "black", # the colors for lines and points (Graphical Parameters)
cex = .9, # a numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default = 1 (Graphical Parameters)
pch = 21, # a vector of plotting characters or symbols (Graphical Parameters) {triangle, empty circle, filled circle, square,...}
frame = TRUE # frame.plot - a logical indicating whether a box should be drawn around the plot.
)
?abline
abline(reg = lm(data = CO2,
formula = conc ~ uptake),
lwd = 2, # line width, default = 1
col = "blue"
)
We know that \[\begin{align*} y_i & = \beta_0 + \beta_1 x_i \\ <=> \bar{y} & = \beta_0 + \beta_1 \bar{x} \\ \end{align*}\]
Rearrange the terms and solve for \(\beta_0\):
\[ \beta_0 = \bar{y}-\bar{x}*\beta_1 \]
beta0 <- mean(CO2$conc) - beta1 * mean(CO2$uptake)
beta0 # print the value of beta_0
## [1] 73.71
# beta_0 value should match the intercept value from the lm command
lm(data = CO2,
formula = conc ~ uptake)
##
## Call:
## lm(formula = conc ~ uptake, data = CO2)
##
## Coefficients:
## (Intercept) uptake
## 73.71 13.28
Intercept values are also the same (as expected) !!!
We will plot the data and best fit line below.
?plot
## Help on topic 'plot' was found in the following packages:
##
## Package Library
## graphics /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## base /Library/Frameworks/R.framework/Resources/library
##
##
## Using the first match ...
plot(x = CO2$uptake,
y = CO2$conc,
xlab = "CO2 Uptake",
ylim = c(0, 1000),
xlim = c(0,50),
ylab = "C02 Concentration",
main = "Best Fit Line",
sub = "",
bg = "lightblue", # a vector of background colors (Graphical Parameters)
col = "black", # the colors for lines and points (Graphical Parameters)
cex = .9, # a numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default = 1 (Graphical Parameters)
pch = 21, # a vector of plotting characters or symbols (Graphical Parameters) {triangle, empty circle, filled circle, square,...}
frame = TRUE # frame.plot - a logical indicating whether a box should be drawn around the plot.
)
?abline
abline(reg = lm(data = CO2,
formula = conc ~ uptake),
lwd = 2, # line width, default = 1
col = "blue"
)
This formula works for simple linear regression, where you have one independent variable (X) and one dependent variable (Y). However, when you move to multivariate regression, which involves multiple independent variables, the formula for calculating the coefficients becomes more complex.