---
title: "Regression_Correlation"
output: html_document
---
To create a scatter plot: See Graphing with R file.
To calculate the regression equation:
lm(dependent variable ~ independent variable, data=Data Frame)
You can also save the linear model and then print it:
lm.out = lm(dependent variable ~ independent variable, data=Data Frame) - calculates the linear model and stores it (you can call it anything you want; it doesn't have to be lm.out)
lm.out - prints out the linear model
Example: Find the linear model for the amount of gas used based on temperature.
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm(gas_consumed~temperature, data=Gas)
##
## Call:
## lm(formula = gas_consumed ~ temperature, data = Gas)
##
## Coefficients:
## (Intercept) temperature
## 4.571 -0.223
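The coefficients printed above define the regression line, y-hat = 4.571 + (-0.223)x. As a minimal sketch, the line can be written as an R function and evaluated by hand. The function name predict_gas and the temperature 3.5 are illustrative choices, not part of the original example.

```r
# Coefficients copied from the lm() output above (rounded as printed).
b0 <- 4.571   # (Intercept): the y-intercept
b1 <- -0.223  # temperature: the slope

# Hypothetical helper: evaluates y-hat = b0 + b1 * x.
predict_gas <- function(temperature) {
  b0 + b1 * temperature
}

# Predicted gas consumption at a temperature of 3.5.
y_hat <- predict_gas(3.5)
y_hat
```

When the model has been saved (for example as lm.out), coef(lm.out) returns the same two numbers without rounding.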
To plot the linear model on the scatter plot (gf_point and gf_lm come from the ggformula package, which is loaded with library(mosaic)):
gf_point(dependent variable~independent variable, data=Data Frame, title="type a title for the graph")%>%
gf_lm() - plots the linear model on the scatter plot
Example: Draw the scatter plot with the linear model for gas consumed versus temperature.
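library(mosaic) # loads ggformula, which provides gf_point() and gf_lm()
Gas <- read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out <- lm(gas_consumed~temperature, data=Gas)
# Scatter plot first, then the regression line layered on top
gf_point(gas_consumed~temperature, data=Gas, title="Gas Consumed vs Temperature") %>%
  gf_lm()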
To find and plot residuals:
residuals(lm.out) - calculates the residuals
gf_point(residuals(lm.out)~independent variable, data=Data Frame)%>% - plots the residuals against the independent variable
gf_hline(yintercept = 0) - adds a horizontal line at residual = 0
Example: Find and plot the residuals for gas consumed vs temperature.
library(mosaic) # loads ggformula, which provides gf_point() and gf_hline()
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out<-lm(gas_consumed~temperature, data=Gas)
residuals(lm.out)
## 1 2 3 4 5 6
## 0.07256170 0.20706857 0.35166949 -0.25912868 -0.03682822 -0.01452777
## 7 8 9 10 11 12
## 0.04157544 -0.01382365 -0.51382365 -0.68002090 0.19838276 0.82068322
## 13 14 15 16 17 18
## 0.02068322 -0.13471586 -0.11241541 0.15448597 -0.02321357 -0.07861266
gf_point(residuals(lm.out)~temperature, data=Gas)%>%
gf_hline(yintercept = 0)
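Each residual is the observed value minus the value the regression line predicts. The sketch below checks this using R's built-in cars data set (stopping distance versus speed) as a stand-in, so that it runs without downloading the gas data. The names fit, manual, same, and total are illustrative.

```r
# 'cars' ships with R; it is used here only as a stand-in data set.
fit <- lm(dist ~ speed, data = cars)

# Residual = observed value - fitted (predicted) value.
manual <- cars$dist - fitted(fit)

# TRUE when the hand-computed residuals match residuals().
same <- isTRUE(all.equal(unname(manual), unname(residuals(fit))))
same

# For a model with an intercept, least-squares residuals sum to zero
# (up to floating-point rounding).
total <- sum(residuals(fit))
total
```

This is why the horizontal line at 0 is a useful reference: the points in a residual plot should scatter evenly above and below it.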
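To find the correlation coefficient:
cor(dependent variable~independent variable, data=Data Frame)
Example: Find the correlation coefficient for the amount of gas used based on temperature.
library(mosaic) # the formula form of cor() comes from the mosaic package
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")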
cor(gas_consumed~temperature, data= Gas)
## [1] -0.7484644
To find the coefficient of determination:
lm.out<-lm(dependent variable ~ independent variable, data=Data Frame)
summary(lm.out) - this gives more information than you need, but the coefficient of determination is there (Multiple R-squared)
Example: Find the coefficient of determination for the amount of gas used based on temperature
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out<-lm(gas_consumed~temperature, data=Gas)
summary(lm.out) # look for Multiple R-squared
##
## Call:
## lm(formula = gas_consumed ~ temperature, data = Gas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68002 -0.10396 -0.01418 0.13400 0.82068
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5713 0.1592 28.719 3.41e-15 ***
## temperature -0.2230 0.0494 -4.514 0.000353 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3301 on 16 degrees of freedom
## Multiple R-squared: 0.5602, Adjusted R-squared: 0.5327
## F-statistic: 20.38 on 1 and 16 DF, p-value: 0.0003529
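With a single independent variable, Multiple R-squared is the square of the correlation coefficient. A quick check, using the value r = -0.7484644 from the cor() output earlier in this document (the variable names are illustrative):

```r
r <- -0.7484644     # correlation coefficient from cor()
r_squared <- r^2    # coefficient of determination
round(r_squared, 4) # compare with Multiple R-squared in summary(lm.out)
```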
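If you are testing for a correlation, you can use the command cor.test(dependent variable~independent variable, data=Data Frame). With one independent variable, the t value and p-value in the independent variable's row of summary(lm.out) give the same test result: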
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out<-lm(gas_consumed~temperature, data=Gas)
summary(lm.out)
##
## Call:
## lm(formula = gas_consumed ~ temperature, data = Gas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68002 -0.10396 -0.01418 0.13400 0.82068
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5713 0.1592 28.719 3.41e-15 ***
## temperature -0.2230 0.0494 -4.514 0.000353 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3301 on 16 degrees of freedom
## Multiple R-squared: 0.5602, Adjusted R-squared: 0.5327
## F-statistic: 20.38 on 1 and 16 DF, p-value: 0.0003529
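The t value in the temperature row can be reproduced from the correlation coefficient alone, which is why the slope test and the correlation test agree when there is one independent variable. A minimal sketch, assuming r = -0.7484644 (from the cor() output) and n = 18 observations (16 residual degrees of freedom plus 2). The variable names are illustrative.

```r
r <- -0.7484644   # correlation coefficient from cor()
n <- 18           # number of observations
df <- n - 2       # degrees of freedom for the test

# Test statistic for the null hypothesis of no linear correlation.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)    # compare with the t value in the temperature row
signif(p_value, 3)  # compare with Pr(>|t|) in the temperature row
```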
You can also find the standard error of the estimate by using the command summary(lm.out). The output shows the coefficients of the linear model, the t value and p-value of the hypothesis test, the coefficient of determination, and the standard error of the estimate. The t value and p-value are in the row labeled with your independent variable's name. The standard error of the estimate is labeled Residual standard error, and the coefficient of determination is labeled Multiple R-squared.
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out<-lm(gas_consumed~temperature, data=Gas)
summary(lm.out)
##
## Call:
## lm(formula = gas_consumed ~ temperature, data = Gas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68002 -0.10396 -0.01418 0.13400 0.82068
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5713 0.1592 28.719 3.41e-15 ***
## temperature -0.2230 0.0494 -4.514 0.000353 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3301 on 16 degrees of freedom
## Multiple R-squared: 0.5602, Adjusted R-squared: 0.5327
## F-statistic: 20.38 on 1 and 16 DF, p-value: 0.0003529
This output gives you the regression line, the t value and p-value of the hypothesis test, the coefficient of determination, and the standard error of the estimate. The regression line is formed using the numbers in the Estimate column: the number next to (Intercept) is the y-intercept, and the number next to temperature is the slope. This gives y-hat=4.5713+(-0.2230)x. The t value and p-value of the hypothesis test come from the temperature row: the t value is -4.514 and the p-value is 0.000353. The coefficient of determination is 0.5602. The standard error of the estimate is 0.3301.
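The standard error of the estimate is the square root of the sum of squared residuals divided by n - 2. As a sketch, it can be recomputed from the 18 residuals printed earlier for this model. The variable names are illustrative.

```r
# Residuals copied from the residuals(lm.out) output earlier in this document.
res <- c( 0.07256170,  0.20706857,  0.35166949, -0.25912868, -0.03682822,
         -0.01452777,  0.04157544, -0.01382365, -0.51382365, -0.68002090,
          0.19838276,  0.82068322,  0.02068322, -0.13471586, -0.11241541,
          0.15448597, -0.02321357, -0.07861266)

n <- length(res)                      # 18 observations
se_est <- sqrt(sum(res^2) / (n - 2))  # n - 2 = 16 degrees of freedom
round(se_est, 4)                      # compare with Residual standard error
```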
To calculate a C% prediction interval, perform the commands:
lm.out = lm(dependent variable ~ independent variable, data=Data Frame)
predict(lm.out, newdata=list(independent variable=value), interval="prediction", level=C) - computes a prediction interval for the dependent variable when the independent variable is set to a particular value (put that value in place of the word value), at a particular C level (given as a decimal)
Example: Find the 95% prediction interval for the amount of gas consumed when the temperature is 3.5 degrees.
Gas <-read.csv("https://krkozak.github.io/MAT160/consumption.csv")
lm.out <-lm(gas_consumed ~ temperature, data=Gas)
predict(lm.out, newdata=list(temperature=3.5), interval="prediction", level=0.95)
## fit lwr upr
## 1 3.790819 3.06823 4.513408
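A prediction interval has the form fit plus or minus (critical t value) times (standard error of the prediction), so fit sits at the midpoint of lwr and upr. The sketch below takes the three numbers from the output above and backs out the margin of error and the implied standard error. The 16 degrees of freedom come from the summary(lm.out) output, and the variable names are illustrative.

```r
# Values copied from the predict() output above.
fit <- 3.790819
lwr <- 3.06823
upr <- 4.513408
df <- 16                         # residual degrees of freedom

midpoint <- (lwr + upr) / 2      # should equal fit
margin <- (upr - lwr) / 2        # margin of error
t_star <- qt(1 - 0.05 / 2, df)   # critical value for a 95% interval
se_pred <- margin / t_star       # implied standard error of the prediction

c(midpoint = midpoint, margin = margin, t_star = t_star, se_pred = se_pred)
```

The implied standard error is slightly larger than the standard error of the estimate (0.3301) because a prediction for a single new observation also carries the uncertainty in the fitted line.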