Jose M. Fernandez
Introduction to Econometrics
In this presentation we will learn how to do the following in R:
plot data
summarize data
estimate a linear regression
If this is your first time using R, there are some useful packages you should consider installing:
R Commander (the Rcmdr package)
AER package
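One way to check for and install both packages from the R console is sketched below. It assumes the CRAN package names `Rcmdr` (for R Commander) and `AER`, and it only installs when R is running interactively. The variable names are illustrative, so treat this as a starting point rather than a required procedure.

```r
# CRAN names of the two suggested packages (Rcmdr is R Commander).
pkgs <- c("Rcmdr", "AER")

# Which of them are not yet in the local library?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Install only what is missing, and only from an interactive session.
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Once installed, a package is loaded in each new session with `library()`, for example `library(AER)`.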
R can import data from many different file formats, but the easiest is the comma-separated text file, or .csv.
I have provided a dataset called Grades.
Download the file from “https://dl.dropboxusercontent.com/u/25290502/Grades.csv”
Then replace the file path in the command below with the location where you saved the file.
Grades <- read.csv("C:/Users/Jose/Dropbox/Public/Grades.csv")
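Before moving on, it can help to confirm that an import worked. The sketch below writes a small made-up file to a temporary location so that it runs without the real download. The three rows of values are invented for illustration and are not the Grades data, and the names `toy`, `tmp`, and `Check` are hypothetical.

```r
# Made-up stand-in for Grades.csv (values are illustrative only).
toy <- data.frame(BOOKS  = c(0, 2, 4),
                  ATTEND = c(6, 15, 20),
                  GRADE  = c(40, 60, 90))
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Import with read.csv(), as in the command above, but from the temporary path.
Check <- read.csv(tmp)

str(Check)    # column names and types
head(Check)   # first rows of the data
nrow(Check)   # number of observations
```

With the real file, the same checks would be `str(Grades)`, `head(Grades)`, and `nrow(Grades)`.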
An easy way to become familiar with the data is to use the summary() command:
summary(Grades)
##      BOOKS       ATTEND         GRADE
##  Min.   :0   Min.   : 6.0   Min.   :37.0
##  1st Qu.:1   1st Qu.:10.0   1st Qu.:51.0
##  Median :2   Median :15.0   Median :60.5
##  Mean   :2   Mean   :14.1   Mean   :63.5
##  3rd Qu.:3   3rd Qu.:18.2   3rd Qu.:73.0
##  Max.   :4   Max.   :20.0   Max.   :97.0
This command reports the minimum, maximum, mean, median, and first and third quartiles of each variable.
We can also calculate the mean and standard deviation of individual variables:
mean(Grades$BOOKS) # Mean number of books
## [1] 2
sd(Grades$BOOKS) # Standard deviation of books
## [1] 1.432
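Note that `sd()` reports the sample standard deviation, which divides by n - 1 rather than n. A short check on a made-up vector (hypothetical values, not the BOOKS column) reproduces the calculation by hand; the names `x` and `manual_sd` are illustrative.

```r
x <- c(0, 1, 2, 3, 4)   # hypothetical values for illustration
n <- length(x)

# Sample standard deviation: squared deviations from the mean, divided by n - 1.
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

manual_sd
sd(x)
all.equal(manual_sd, sd(x))   # TRUE when the two calculations agree
```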
A boxplot is useful for comparing the distribution of each variable.
boxplot(Grades, col = "lightgray")
A simple two-dimensional scatter plot can tell us a little about the relationship between two variables.
plot(Grades$ATTEND, Grades$GRADE, ylab = "Course Grades", xlab = "Classes Attended")
In class, I will show you how to run a regression from R Commander, but here is the command-line code for our example:
regress.results <- lm(GRADE ~ ATTEND, data = Grades)
summary(regress.results)
##
## Call:
## lm(formula = GRADE ~ ATTEND, data = Grades)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -27.78 -10.90   2.02  12.43  31.76
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   36.998      8.169    4.53  5.7e-05 ***
## ATTEND         1.883      0.555    3.39   0.0016 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.8 on 38 degrees of freedom
## Multiple R-squared: 0.233, Adjusted R-squared: 0.212
## F-statistic: 11.5 on 1 and 38 DF, p-value: 0.00163
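To read this output, note that the fitted line is GRADE = 36.998 + 1.883 × ATTEND, so each additional class attended is associated with about 1.88 more grade points. The sketch below plugs the rounded coefficients printed above into that equation. The attendance values 10, 15, and 20 are arbitrary choices inside the observed range of ATTEND, and the variable names are illustrative. It also shows that a t value is the estimate divided by its standard error.

```r
b0 <- 36.998   # intercept, copied from the output above
b1 <- 1.883    # slope on ATTEND, copied from the output above

attend <- c(10, 15, 20)            # arbitrary attendance levels
predicted <- b0 + b1 * attend      # fitted grades from the regression line
predicted                          # 55.828 65.243 74.658

t_attend <- b1 / 0.555             # estimate divided by its standard error
round(t_attend, 2)                 # 3.39, matching the t value column
```

With the fitted model in memory, `predict(regress.results, newdata = data.frame(ATTEND = c(10, 15, 20)))` should give nearly the same predictions, differing only because it uses the unrounded coefficients.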
We can now draw the fitted regression line through our scatter plot:
plot(Grades$ATTEND, Grades$GRADE, ylab = "Course Grades", xlab = "Classes Attended")
abline(coef = coef(regress.results))
We can also add a second explanatory variable, BOOKS, and estimate a multiple regression:
regress.results <- lm(GRADE ~ ATTEND + BOOKS, data = Grades)
summary(regress.results)
##
## Call:
## lm(formula = GRADE ~ ATTEND + BOOKS, data = Grades)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -20.80 -13.37   0.06   9.17  32.29
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   37.379      7.745    4.83  2.4e-05 ***
## ATTEND         1.283      0.587    2.19    0.035 *
## BOOKS          4.037      1.753    2.30    0.027 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.1 on 37 degrees of freedom
## Multiple R-squared: 0.329, Adjusted R-squared: 0.292
## F-statistic: 9.06 on 2 and 37 DF, p-value: 0.000628
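Adding BOOKS raises R-squared from 0.233 to 0.329, but R-squared cannot fall when a regressor is added, so the adjusted R-squared is the fairer comparison between the two models. The sketch below recomputes it from the rounded R-squared values printed above, using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of regressors. The helper name `adj_r2` is hypothetical, and n = 40 is inferred from the 38 residual degrees of freedom plus 2 coefficients in the first model. Because the inputs are rounded, the results agree with the reported 0.212 and 0.292 only to within about 0.001.

```r
# Adjusted R-squared from R-squared, sample size n, and number of regressors k.
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 40   # inferred: 38 residual df + 2 estimated coefficients

adj_simple   <- adj_r2(0.233, n, k = 1)   # GRADE ~ ATTEND
adj_multiple <- adj_r2(0.329, n, k = 2)   # GRADE ~ ATTEND + BOOKS

round(c(simple = adj_simple, multiple = adj_multiple), 3)
```

The adjusted value is higher for the second model, which suggests that BOOKS adds explanatory power beyond what the extra parameter would contribute by chance.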