Running Multiple Regression in R: Example

Jose M. Fernandez

Introduction to Econometrics

Example

In this presentation we will learn how to load data into R, summarize and plot it, and run simple and multiple linear regressions.

Preparation

If this is your first time using R, there are some useful packages you should consider installing.

Loading data

R can import data from many different file formats, but the easiest is a comma-separated text file (.csv).

I have provided a dataset called Grades.

Download the file here: https://dl.dropboxusercontent.com/u/25290502/Grades.csv

Then replace the file path in the command below with the location where you saved the file.

Grades <- read.csv("C:/Users/Jose/Dropbox/Public/Grades.csv")
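
Before pointing read.csv() at the real file, it can help to try the same call on a small file you control. The sketch below writes a few made-up rows to a temporary file, reads them back, and inspects the result. The column names match the Grades file, but the values are purely illustrative and are not the real data; `toy`, `csv_path`, and `toy_in` are names chosen for this example.

```r
# Toy rows for illustration only; these are NOT the real Grades data.
toy <- data.frame(BOOKS  = c(0, 2, 4),
                  ATTEND = c(8, 15, 20),
                  GRADE  = c(45, 62, 90))

# Write the toy data to a temporary .csv file.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read it back the same way the Grades file is read.
toy_in <- read.csv(csv_path)

str(toy_in)    # variable names and types
head(toy_in)   # first few rows
```

If str() shows the expected column names and numeric types, the same checks can be run on Grades after it is loaded.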

Summary of the Data

An easy way to become familiar with the data is to use the summary() function.

summary(Grades)
##      BOOKS       ATTEND         GRADE     
##  Min.   :0   Min.   : 6.0   Min.   :37.0  
##  1st Qu.:1   1st Qu.:10.0   1st Qu.:51.0  
##  Median :2   Median :15.0   Median :60.5  
##  Mean   :2   Mean   :14.1   Mean   :63.5  
##  3rd Qu.:3   3rd Qu.:18.2   3rd Qu.:73.0  
##  Max.   :4   Max.   :20.0   Max.   :97.0

This command reports the minimum, quartiles, median, mean, and maximum of each variable.

We can also calculate the mean and standard deviation of individual variables

mean(Grades$BOOKS) #Mean number of Books
## [1] 2
sd(Grades$BOOKS) #Standard Deviation of Books
## [1] 1.432
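
Rather than repeating mean() and sd() for each variable, the same statistics can be computed for every column at once with sapply(). This is a minimal sketch on a made-up data frame with the same column names as Grades; the values are illustrative only, and `toy`, `col_means`, and `col_sds` are names chosen for this example.

```r
# Toy data for illustration only; replace with the real Grades data frame.
toy <- data.frame(BOOKS  = c(0, 1, 2, 3, 4),
                  ATTEND = c(6, 10, 15, 18, 20),
                  GRADE  = c(40, 55, 60, 75, 95))

# Apply mean() and sd() to every column at once.
col_means <- sapply(toy, mean)
col_sds   <- sapply(toy, sd)

# Combine the two into one small table, rounded to 2 decimals.
round(cbind(mean = col_means, sd = col_sds), 2)
```

With the real data loaded, replacing `toy` with `Grades` should give one row per variable.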

Boxplots (1)

A boxplot is useful to look at the relative distribution of each variable.

boxplot(Grades, col="lightgray")


Plot Y and X

A simple two dimensional plot can tell us a little about the relationship between the variables.

plot(Grades$ATTEND, Grades$GRADE, ylab = "Course Grades", xlab = "Classes Attended")


Simple Linear Regression

In class, I will show you how to run a regression from R Commander, but here is the command-line code for our example:

regress.results <- lm(GRADE ~ ATTEND, data = Grades)
summary(regress.results)
## 
## Call:
## lm(formula = GRADE ~ ATTEND, data = Grades)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -27.78 -10.90   2.02  12.43  31.76 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   36.998      8.169    4.53  5.7e-05 ***
## ATTEND         1.883      0.555    3.39   0.0016 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.8 on 38 degrees of freedom
## Multiple R-squared:  0.233,  Adjusted R-squared:  0.212 
## F-statistic: 11.5 on 1 and 38 DF,  p-value: 0.00163
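
To read this output: the fitted line is GRADE = 36.998 + 1.883 × ATTEND, so each additional class attended is associated with about 1.9 more grade points. The sketch below is a rough check that uses only the rounded estimates printed above. The attendance value of 15 is an arbitrary choice for illustration, and the variable names are made up for this example.

```r
# Rounded estimates copied from the summary() output above.
b0    <- 36.998   # intercept
b1    <- 1.883    # slope on ATTEND
se_b1 <- 0.555    # standard error of the slope

# Predicted grade for a student who attended 15 classes (arbitrary example value).
attend  <- 15
pred_15 <- b0 + b1 * attend
pred_15

# The t value is the estimate divided by its standard error.
t_b1 <- b1 / se_b1
round(t_b1, 2)
```

With the fitted model available, predict(regress.results, newdata = data.frame(ATTEND = 15)) gives the same prediction from the unrounded coefficients.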

Plot Part 2

We can now graph a line through our scatter plot

plot(Grades$ATTEND, Grades$GRADE, ylab = "Course Grades", xlab = "Classes Attended")
abline(coef = coef(regress.results))
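
As an alternative sketch, plot() also accepts a formula, and abline() accepts a fitted lm object directly, which avoids extracting the coefficients by hand. The data frame below is made up for illustration and is not the real Grades data; `toy` and `fit` are names chosen for this example.

```r
# Toy data for illustration only; replace with the real Grades data frame.
toy <- data.frame(ATTEND = c(6, 10, 15, 18, 20),
                  GRADE  = c(40, 55, 60, 75, 95))

fit <- lm(GRADE ~ ATTEND, data = toy)

# Formula interface: y ~ x, with the data frame supplied once.
plot(GRADE ~ ATTEND, data = toy,
     ylab = "Course Grades", xlab = "Classes Attended")

# abline() accepts the fitted lm object directly.
abline(fit, col = "red", lwd = 2)
```

Both forms draw the same line; passing the model object is simply shorter when the regression has a single regressor.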


Multiple Regression

As before, R Commander can run this regression, but here is the command-line code. Adding BOOKS to the right-hand side of the formula turns the simple regression into a multiple regression.

regress.results <- lm(GRADE ~ ATTEND + BOOKS, data = Grades)
summary(regress.results)
## 
## Call:
## lm(formula = GRADE ~ ATTEND + BOOKS, data = Grades)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20.80 -13.37   0.06   9.17  32.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   37.379      7.745    4.83  2.4e-05 ***
## ATTEND         1.283      0.587    2.19    0.035 *  
## BOOKS          4.037      1.753    2.30    0.027 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.1 on 37 degrees of freedom
## Multiple R-squared:  0.329,  Adjusted R-squared:  0.292 
## F-statistic: 9.06 on 2 and 37 DF,  p-value: 0.000628
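
Holding attendance fixed, each additional book is associated with about 4 more grade points, and the ATTEND coefficient falls from 1.883 to 1.283 once BOOKS is included. The sketch below reuses only the rounded numbers printed above to compute a prediction and to check the adjusted R-squared. The input values (15 classes, 2 books) are arbitrary illustrations, and n = 40 is inferred from the 37 residual degrees of freedom plus 3 estimated coefficients.

```r
# Rounded estimates copied from the summary() output above.
b0       <- 37.379   # intercept
b_attend <- 1.283    # coefficient on ATTEND
b_books  <- 4.037    # coefficient on BOOKS

# Predicted grade for 15 classes attended and 2 books (arbitrary example values).
pred <- b0 + b_attend * 15 + b_books * 2
pred

# Adjusted R-squared from R-squared, sample size n, and number of regressors k.
r2 <- 0.329
n  <- 40   # inferred: 37 residual df + 3 estimated coefficients
k  <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_r2   # close to the reported 0.292; any small gap comes from rounding r2
```

The adjusted R-squared rises from 0.212 in the simple regression to 0.292 here, so BOOKS adds explanatory power beyond the penalty for an extra regressor.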