Start Up, Import, and Regress
In this short exercise, you will be introduced to installing and loading R packages. You will also import a dataset, estimate a regression model, and create a graphic. This tutorial is intended to accompany the getting started video on the course site. Try to follow along, but don’t worry if you don’t understand every step; we will learn more about each of them later in the course. If you can get the code below to run, you are good to go for the first day of class!
Installing and loading packages
After you have installed R and RStudio, install the tidyverse package. To do so, open RStudio, type the following code in the R console, and press Enter to run it:
install.packages('tidyverse')
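Running `install.packages('tidyverse')` again on a machine that already has the package simply reinstalls it. As a minimal sketch (not part of the original exercise), the install step can be wrapped so that it only runs when needed. The helper name `install_if_missing` is hypothetical and made up for this illustration; `requireNamespace()` and `install.packages()` are standard R functions.

```r
# Hypothetical helper (not part of the tidyverse): install a package only
# when it is not already available on this machine.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)             # an install was attempted
}
```

With the helper defined, typing `install_if_missing("tidyverse")` in the console would take the place of the plain install call.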
Next, try loading some of the tidyverse libraries (there are many!). Tip: it is best to keep track of your code in an R script (File > New File > R Script). Highlight the lines of code you want to run and select “Run” (or press Ctrl + Enter).
library(tidyverse)
library(broom)
library(readxl)
Importing data
Import the sample Excel file, ansc_quart1.xlsx. First, save the file from the course website to your machine. Next, use the read_excel function to import the file into your R environment, making sure to reference the file path on your machine.
Make sure to use “/” rather than “\” when specifying a file path. Alternatively, in RStudio, you can select File > Import Dataset > From Excel (make sure to copy the code preview to your script for future reference).
ansc_quart1 <- read_excel('FILEPATH/ansc_quart1.xlsx')
ansc_quart1

| x | y |
|---|---|
| 10 | 8.04 |
| 8 | 6.95 |
| 13 | 7.58 |
| 9 | 8.81 |
| 11 | 8.33 |
| 14 | 9.96 |
| 6 | 7.24 |
| 4 | 4.26 |
| 12 | 10.84 |
| 7 | 4.82 |
| 5 | 5.68 |
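
If the Excel file is not at hand, the eleven rows shown above can be typed in directly. The following is a minimal sketch rather than part of the original tutorial: `data_path` is an illustrative variable name, `FILEPATH` is the same placeholder used in the import step, and the fallback data frame simply copies the values from the table.

```r
# Build the path with forward slashes; "FILEPATH" is still a placeholder.
data_path <- file.path("FILEPATH", "ansc_quart1.xlsx")

if (file.exists(data_path)) {
  # Normal route: read the spreadsheet (requires the readxl package).
  ansc_quart1 <- readxl::read_excel(data_path)
} else {
  # Fallback: re-enter the values printed in the table above.
  ansc_quart1 <- data.frame(
    x = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
    y = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
  )
}

dim(ansc_quart1)  # number of rows, then number of columns
```

Checking `file.exists()` first is also a quick way to find out whether a file path has been typed correctly before trying to import anything.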
Regression
Take a quick look at the dataset above. Let’s use a statistical technique called regression to predict Y using the values in X. In regression parlance, we call this “a regression of Y on X.”
lin_mod <- lm(y ~ x, data = ansc_quart1)
summary(lin_mod)
Call:
lm(formula = y ~ x, data = ansc_quart1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.92127 -0.45577 -0.04136  0.70941  1.83882

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0001     1.1247   2.667  0.02573 *
x             0.5001     0.1179   4.241  0.00217 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared:  0.6665,	Adjusted R-squared:  0.6295
F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217
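
The numbers in the summary can also be pulled out of the model and checked by hand. The sketch below is an illustration, not part of the original exercise: it re-enters the data from the table so that it runs on its own, and the names `b`, `pred_by_hand`, `pred_by_fun`, and `r_squared` are made up for this example.

```r
# Rebuild the data from the table so this block runs on its own.
ansc_quart1 <- data.frame(
  x = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)
lin_mod <- lm(y ~ x, data = ansc_quart1)

# Named vector holding the intercept and the slope on x.
b <- coef(lin_mod)

# Prediction for x = 10, first by hand and then with predict().
pred_by_hand <- unname(b["(Intercept)"] + b["x"] * 10)
pred_by_fun  <- unname(predict(lin_mod, newdata = data.frame(x = 10)))

# R-squared: share of the variation in y explained by the fitted line.
ss_resid  <- sum(residuals(lin_mod)^2)
ss_total  <- sum((ansc_quart1$y - mean(ansc_quart1$y))^2)
r_squared <- 1 - ss_resid / ss_total

round(c(prediction = pred_by_hand, r_squared = r_squared), 4)
```

Both prediction routes should agree with the intercept plus ten times the slope (roughly 3.0001 + 0.5001 × 10 ≈ 8.001), and `r_squared` should match the Multiple R-squared of 0.6665 reported in the summary.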
Output predictions and residuals for each observation:
augment(lin_mod)

| y | x | .fitted | .resid | .hat | .sigma | .cooksd | .std.resid |
|---|---|---|---|---|---|---|---|
| 8.04 | 10 | 8.001000 | 0.0390000 | 0.1000000 | 1.311535 | 0.0000614 | 0.0332440 |
| 6.95 | 8 | 7.000818 | -0.0508182 | 0.1000000 | 1.311479 | 0.0001042 | -0.0433179 |
| 7.58 | 13 | 9.501273 | -1.9212727 | 0.2363636 | 1.056460 | 0.4892093 | -1.7779327 |
| 8.81 | 9 | 7.500909 | 1.3090909 | 0.0909091 | 1.218483 | 0.0616370 | 1.1102882 |
| 8.33 | 11 | 8.501091 | -0.1710909 | 0.1272727 | 1.310017 | 0.0015993 | -0.1481007 |
| 9.96 | 14 | 10.001364 | -0.0413636 | 0.3181818 | 1.311496 | 0.0003829 | -0.0405092 |
| 7.24 | 6 | 6.000636 | 1.2393636 | 0.1727273 | 1.219936 | 0.1267565 | 1.1019046 |
| 4.26 | 4 | 5.000455 | -0.7404545 | 0.3181818 | 1.272721 | 0.1226999 | -0.7251598 |
| 10.84 | 12 | 9.001182 | 1.8388182 | 0.1727273 | 1.099742 | 0.2790296 | 1.6348730 |
| 4.82 | 7 | 6.500727 | -1.6807273 | 0.1272727 | 1.147055 | 0.1543412 | -1.4548813 |
| 5.68 | 5 | 5.500546 | 0.1794545 | 0.2363636 | 1.309605 | 0.0042680 | 0.1660660 |
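
The `.fitted` and `.resid` columns are linked by simple arithmetic: each residual is the observed y minus the fitted value. Here is a minimal base R sketch of that relationship, assuming the same eleven rows as the table (they are re-entered so the block is self-contained, and `check` is an illustrative name, not something from the tutorial).

```r
# Rebuild the data from the table so this block runs on its own.
ansc_quart1 <- data.frame(
  x = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)
lin_mod <- lm(y ~ x, data = ansc_quart1)

# Collect observed values, fitted values, and residuals side by side.
check <- data.frame(
  y      = ansc_quart1$y,
  fitted = unname(fitted(lin_mod)),
  resid  = unname(residuals(lin_mod))
)

# Residual = observed value minus fitted value.
check$resid_by_hand <- check$y - check$fitted

# TRUE when the hand-computed residuals match those stored in the model.
all.equal(check$resid, check$resid_by_hand)
```

Printing `check` should show the same fitted values and residuals as the first columns of the `augment()` output.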
To better understand regression, we can visualize with a scatter plot.
ansc_quart1 %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
`geom_smooth()` using formula = 'y ~ x'
Bam! Congrats on your first R code! You just cleared your first hurdle to getting into the world of R!