The following data set contains carbon dioxide levels for the years from 1990 to 2010. The two variables are years and CO\( _2 \) levels. The question is: Is there a relationship between year and CO\( _2 \) level?
library(resampledata)
head(Maunaloa)
  ID Year  Level
1  1 1990 357.08
2  2 1991 359.00
3  3 1992 359.45
4  4 1993 360.07
5  5 1994 361.48
6  6 1995 363.62
Plot CO\( _2 \) as a function of year:
plot(Maunaloa$Level~Maunaloa$Year)
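There appears to be a strong positive linear relationship.
For comparison, take the basketball data set NBA1617 and plot field-goal percentage (PercFG) as a function of offensive rebounds (OffReb):
plot(NBA1617$PercFG~NBA1617$OffReb)
Here the relationship is also positive, but it is not strong.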
The correlation measures the strength of a linear relationship. For year and CO\( _2 \) level:
cor(Maunaloa$Year, Maunaloa$Level)
[1] 0.9951825
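So this correlation is quite strong: it is close to 1, the largest possible value.
The correlation is usually denoted by \( r \).
For the basketball data, the correlation is noticeably weaker: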
cor(NBA1617$PercFG, NBA1617$OffReb)
[1] 0.4898055
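To make the meaning of \( r \) more concrete, here is a minimal sketch that computes the correlation from its definition (the average product of standardized values, using \( n-1 \)) and compares it with cor(). It assumes only base R and uses just the six rows shown by head(Maunaloa) above, so the value it prints is not the 0.9951825 obtained from the full data set. The names year, level, r_manual and r_builtin are made up for this illustration.

```r
# First six rows of Maunaloa, copied from the head() output above.
# This is NOT the full data set, so r differs from 0.9951825.
year  <- c(1990, 1991, 1992, 1993, 1994, 1995)
level <- c(357.08, 359.00, 359.45, 360.07, 361.48, 363.62)

# Standardize each variable, multiply pairwise, and divide the sum by n - 1.
n  <- length(year)
zx <- (year - mean(year)) / sd(year)
zy <- (level - mean(level)) / sd(level)
r_manual <- sum(zx * zy) / (n - 1)

# The built-in function should agree up to floating-point error.
r_builtin <- cor(year, level)
print(c(manual = r_manual, builtin = r_builtin))
```

Because the values are standardized first, \( r \) has no units and does not change if year or level is rescaled.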
Fit the least-squares regression line of CO\( _2 \) level on year:
lm(Level~Year, data=Maunaloa)
Call:
lm(formula = Level ~ Year, data = Maunaloa)
Coefficients:
(Intercept) Year
-3279.593 1.826
So the least-squares regression line is \[ \text{Level} = -3279.6 + 1.826 \cdot \text{Year}. \]
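The coefficients reported by lm() can also be obtained from the standard closed-form formulas: the slope is \( r \cdot s_y / s_x \) and the intercept is \( \bar{y} - \text{slope} \cdot \bar{x} \). The sketch below checks this in base R. It assumes only the six rows shown by head(Maunaloa), so its coefficients will differ from the -3279.593 and 1.826 fitted on the full data set. All variable names are illustrative.

```r
# First six rows of Maunaloa, copied from the head() output above
# (an illustration only; the full data set gives different coefficients).
year  <- c(1990, 1991, 1992, 1993, 1994, 1995)
level <- c(357.08, 359.00, 359.45, 360.07, 361.48, 363.62)

# Closed-form least-squares estimates.
slope     <- cor(year, level) * sd(level) / sd(year)
intercept <- mean(level) - slope * mean(year)

# Fit the same line with lm() and compare.
fit <- lm(level ~ year)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

The intercept formula also shows that the fitted line always passes through the point of means \( (\bar{x}, \bar{y}) \).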
Find the regression line for the basketball example.
lm(PercFG~OffReb, data=NBA1617)
Call:
lm(formula = PercFG ~ OffReb, data = NBA1617)
Coefficients:
(Intercept) OffReb
42.82337 0.05762
So, the least-squares regression line is \[ \text{PercFG} = 42.823 + 0.058 \cdot \text{OffReb}. \]
Store the fitted model and use it to predict the CO\( _2 \) level in 2015:
levelregression <- lm(Level~Year, data=Maunaloa)
predict(levelregression, newdata=data.frame(Year=2015))
1
400.6084
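As a rough check, the prediction can be reproduced by hand by substituting Year = 2015 into the fitted line. Using the rounded coefficients printed by lm() gives
\[ -3279.593 + 1.826 \cdot 2015 = -3279.593 + 3679.390 = 399.797, \]
which is close to, but not the same as, the 400.6084 returned by predict(). The gap arises because predict() uses the unrounded coefficients, and a rounding error of a few ten-thousandths in the slope is multiplied by 2015. Note also that 2015 lies outside the observed years 1990 to 2010, so this prediction is an extrapolation and assumes the linear trend continues.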
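List the five players with the most offensive rebounds:
library(dplyr)
NBA1617 %>%
  select(Name, PercFG, OffReb) %>%   # keep only the columns of interest
  top_n(n=5, OffReb) %>%             # rows with the five largest OffReb values (ties included)
  arrange(desc(OffReb))              # sort from most to fewest rebounds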
                    Name PercFG OffReb
1       Tristan Thompson   60.0    286
2      LaMarcus Aldridge   47.7    172
3 Michael Kidd-Gilchrist   47.7    156
4              David Lee   59.0    149
5             Kevin Love   42.7    148
Predict the field-goal percentage of a player with 300 offensive rebounds:
bbregression <- lm(PercFG~OffReb, data=NBA1617)
predict(bbregression, newdata=data.frame(OffReb=300))
1
60.1099
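The same value follows, up to rounding of the printed coefficients, from substituting OffReb = 300 into the fitted line:
\[ 42.82337 + 0.05762 \cdot 300 = 42.82337 + 17.286 = 60.10937 \approx 60.11. \]
Since the largest value in the table above is 286 offensive rebounds, 300 lies just beyond the observed data, so this prediction is a mild extrapolation. With a correlation of only about 0.49, individual players can also fall well away from the line.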