Using RStudio to Learn R

*with a nod to Stata users

This example taken from a Stata demo [http://www.stata.com/capabilities/session.html] [Stata Website 9/6/2012]. That was re-written in R [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r][R Wiki]. It has been modified for use with RStudio.

The 4 RStudio panes

Upper-left (UL) pane – Source where you edit your code
Lower-left (LL) pane – Console where you enter commands and view results.
Upper-right (UR) pane – Workspace,History where you can view the objects in your workspace and access previously run commands
Lower-right (LR) pane – Files, Plots, Packages, Help

A Sample Session

Here, we load the “foreign” library, which is used to read and write data sets in formats used by SAS, SPSS, Stata and others.
The package “ggplot2” is an alternate to default graphic commands in R. “hmisc” has some utilities that are handy. [Note: lines in the shaded boxes on this page are the actual R code.]

library(foreign)
library(ggplot2)
library(Hmisc)

Alternatively you could load the packages using the point-and-click tools in RStudio

In the LR pane, click the Packages tab
You will see a list of packages installed, if foreign, ggplot2 and Hmisc are not there, click Install packages to download and install them.
Packages that are installed may still not be loaded. To load, simply 'check' them.

Thousands of user written commands are available in R. For more information visit the [http://cran.r-project.org/web/packages/][R-Project website]
and check out the CRAN TASK VIEWS.

Getting the Data

The dataset we will use for this session is about vintage 1978 automobiles sold in the United States.

The next line downloads the “auto” dataset from the Stata website, converts it from Stata to R and stores it in a temporary R dataframe named “auto”.

auto <- read.dta("http://www.stata-press.com/data/r9/auto.dta")

In the UR pane of the RStudio screen you will see the “auto” dataframe listed with 74 obs of 12 variables. Click on it and a data browser will appear in the UL pane.

browser

The History tab in the UR pane keeps track of commands R has run, successfully and unsuccessfully. The commands can then easily be rerun. After opening the auto dataframe, you can see that the command to browse the data is: View(auto)

History

Note: If a command intrigues you, you can precede any command with a question mark on the command line in Console pane to find help e.g. ?View This will open the help tab in the LR quadrant. Or you can begin typing the command and press 'tab' to view syntax.

Help

Exploring the Data Further

Attributes

We can get more information about the structure of the “auto” dataframe including the type of variables and general information on the dataframe itself.

str(auto)

## 'data.frame':    74 obs. of  12 variables:
##  $ make        : chr  "AMC Concord" "AMC Pacer" "AMC Spirit" "Buick Century" ...
##  $ price       : int  4099 4749 3799 4816 7827 5788 4453 5189 10372 4082 ...
##  $ mpg         : int  22 17 22 20 15 18 26 20 16 19 ...
##  $ rep78       : int  3 3 NA 3 4 3 NA 3 3 3 ...
##  $ headroom    : num  2.5 3 3 4.5 4 4 3 2 3.5 3.5 ...
##  $ trunk       : int  11 11 12 16 20 21 10 16 17 13 ...
##  $ weight      : int  2930 3350 2640 3250 4080 3670 2230 3280 3880 3400 ...
##  $ length      : int  186 173 168 196 222 218 170 200 207 200 ...
##  $ turn        : int  40 40 35 40 43 43 34 42 43 42 ...
##  $ displacement: int  121 258 121 196 350 231 304 196 231 231 ...
##  $ gear_ratio  : num  3.58 2.53 3.08 2.93 2.41 ...
##  $ foreign     : Factor w/ 2 levels "Domestic","Foreign": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "datalabel")= chr "1978 Automobile Data"
##  - attr(*, "time.stamp")= chr "13 Apr 2005 17:45"
##  - attr(*, "formats")= chr  "%-18s" "%8.0gc" "%8.0g" "%8.0g" ...
##  - attr(*, "types")= int  18 252 252 252 254 252 252 252 252 252 ...
##  - attr(*, "val.labels")= chr  "" "" "" "" ...
##  - attr(*, "var.labels")= chr  "Make and Model" "Price" "Mileage (mpg)" "Repair Record 1978" ...
##  - attr(*, "version")= int 8
##  - attr(*, "label.table")=List of 1
##   ..$ origin: Named num  0 1
##   .. ..- attr(*, "names")= chr  "Domestic" "Foreign"

To further examine a particular variable we can ask for specific attributes of a variable by name. R can hold multiple data sets in memory at the same time, so each reference to a variable needs to specify which dataset it belongs to. When running a command that references many columns, repeating the name of the data set can become quite tedious.

However, if you are using RStudio, you can type the first letter(s) of a column name and press the 'tab' key to select the column name from a list.

str(auto$foreign)

##  Factor w/ 2 levels "Domestic","Foreign": 1 1 1 1 1 1 1 1 1 1 ...

class(auto$foreign)

## [1] "factor"

length(auto$foreign)

## [1] 74

An alternative way of avoiding repetition is to use the attach() function. Attach tells R that variables names will refer to the specified dataframe (until onecalls detach()). The attraction of this approach is that it cuts down even more on typing in the name of the data set and makes many commands more similar to how they look in Stata. A drawback is that if one is making changes to the dataset, it is easy to get confused if it is the old or new data set that is attached. Most R afficianadoes will advise against the use of attach, but it is certainly up to you.

Summaries

summary(auto)

##      make               price            mpg           rep78     
##  Length:74          Min.   : 3291   Min.   :12.0   Min.   :1.00  
##  Class :character   1st Qu.: 4220   1st Qu.:18.0   1st Qu.:3.00  
##  Mode  :character   Median : 5006   Median :20.0   Median :3.00  
##                     Mean   : 6165   Mean   :21.3   Mean   :3.41  
##                     3rd Qu.: 6332   3rd Qu.:24.8   3rd Qu.:4.00  
##                     Max.   :15906   Max.   :41.0   Max.   :5.00  
##                                                    NA's   :5     
##     headroom        trunk          weight         length         turn     
##  Min.   :1.50   Min.   : 5.0   Min.   :1760   Min.   :142   Min.   :31.0  
##  1st Qu.:2.50   1st Qu.:10.2   1st Qu.:2250   1st Qu.:170   1st Qu.:36.0  
##  Median :3.00   Median :14.0   Median :3190   Median :192   Median :40.0  
##  Mean   :2.99   Mean   :13.8   Mean   :3019   Mean   :188   Mean   :39.6  
##  3rd Qu.:3.50   3rd Qu.:16.8   3rd Qu.:3600   3rd Qu.:204   3rd Qu.:43.0  
##  Max.   :5.00   Max.   :23.0   Max.   :4840   Max.   :233   Max.   :51.0  
##                                                                           
##   displacement   gear_ratio       foreign  
##  Min.   : 79   Min.   :2.19   Domestic:52  
##  1st Qu.:119   1st Qu.:2.73   Foreign :22  
##  Median :196   Median :2.96                
##  Mean   :197   Mean   :3.02                
##  3rd Qu.:245   3rd Qu.:3.35                
##  Max.   :425   Max.   :3.89                
##

describe(auto)

## auto 
## 
##  12  Variables      74  Observations
## ---------------------------------------------------------------------------
## make 
##       n missing  unique 
##      74       0      74 
## 
## lowest : AMC Concord AMC Pacer   AMC Spirit  Audi 5000   Audi Fox   
## highest: Volvo 260   VW Dasher   VW Diesel   VW Rabbit   VW Scirocco 
## ---------------------------------------------------------------------------
## price 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      74    6165    3780    3913    4220    5006    6332 
##     .90     .95 
##   11081   13157 
## 
## lowest :  3291  3299  3667  3748  3798
## highest: 12990 13466 13594 14500 15906 
## ---------------------------------------------------------------------------
## mpg 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      21    21.3   14.00   14.30   18.00   20.00   24.75 
##     .90     .95 
##   28.70   32.05 
## 
## lowest : 12 14 15 16 17, highest: 30 31 34 35 41 
## ---------------------------------------------------------------------------
## rep78 
##       n missing  unique    Mean 
##      69       5       5   3.406 
## 
##           1  2  3  4  5
## Frequency 2  8 30 18 11
## %         3 12 43 26 16
## ---------------------------------------------------------------------------
## headroom 
##       n missing  unique    Mean 
##      74       0       8   2.993 
## 
##           1.5  2 2.5  3 3.5  4 4.5 5
## Frequency   4 13  14 13  15 10   4 1
## %           5 18  19 18  20 14   5 1
## ---------------------------------------------------------------------------
## trunk 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      18   13.76    7.00    8.00   10.25   14.00   16.75 
##     .90     .95 
##   20.00   20.35 
## 
##           5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23
## Frequency 1 1 3 5 4  5  8  3  4  4  5 12  8  1  6  2  1  1
## %         1 1 4 7 5  7 11  4  5  5  7 16 11  1  8  3  1  1
## ---------------------------------------------------------------------------
## weight 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      64    3019    1895    2026    2250    3190    3600 
##     .90     .95 
##    4051    4186 
## 
## lowest : 1760 1800 1830 1930 1980, highest: 4130 4290 4330 4720 4840 
## ---------------------------------------------------------------------------
## length 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      47   187.9   154.7   158.2   170.0   192.5   203.8 
##     .90     .95 
##   218.0   221.0 
## 
## lowest : 142 147 149 154 155, highest: 220 221 222 230 233 
## ---------------------------------------------------------------------------
## turn 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      18   39.65   33.65   34.00   36.00   40.00   43.00 
##     .90     .95 
##   45.00   46.00 
## 
##           31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 48 51
## Frequency  1  1  2  6  6  9  4  3  1  6  4  7 12  3  3  3  2  1
## %          1  1  3  8  8 12  5  4  1  8  5  9 16  4  4  4  3  1
## ---------------------------------------------------------------------------
## displacement 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      31   197.3   87.95   97.00  119.00  196.00  245.25 
##     .90     .95 
##  340.40  350.00 
## 
## lowest :  79  85  86  89  90, highest: 304 318 350 400 425 
## ---------------------------------------------------------------------------
## gear_ratio 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##      74       0      36   3.015   2.365   2.442   2.730   2.955   3.352 
##     .90     .95 
##   3.714   3.780 
## 
## lowest : 2.19 2.24 2.26 2.28 2.41, highest: 3.73 3.74 3.78 3.81 3.89 
## ---------------------------------------------------------------------------
## foreign 
##       n missing  unique 
##      74       0       2 
## 
## Domestic (52, 70%), Foreign (22, 30%) 
## ---------------------------------------------------------------------------

Examining Values

List the value of the columns “make” and “mpg” for the first ten lines of the data set. Data sets in R are referenced using matrix notation, either using row/column numbers, or using names.

auto[1:10, c("make", "mpg")]

##             make mpg
## 1    AMC Concord  22
## 2      AMC Pacer  17
## 3     AMC Spirit  22
## 4  Buick Century  20
## 5  Buick Electra  15
## 6  Buick LeSabre  18
## 7     Buick Opel  26
## 8    Buick Regal  20
## 9  Buick Riviera  16
## 10 Buick Skylark  19

auto$foreign

##  [1] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
##  [8] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [15] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [22] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [29] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [36] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [43] Domestic Domestic Domestic Domestic Domestic Domestic Domestic
## [50] Domestic Domestic Domestic Foreign  Foreign  Foreign  Foreign 
## [57] Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign 
## [64] Foreign  Foreign  Foreign  Foreign  Foreign  Foreign  Foreign 
## [71] Foreign  Foreign  Foreign  Foreign 
## Levels: Domestic Foreign

as.numeric(auto$foreign)

##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [71] 2 2 2 2

Look more closely at the factor variable 'foreign' by copying it into a new variable

x <- auto$foreign

In the UR pane click on x to examine its attributes more closely.
factor

Missing values

Missing values in R are represented by 'NA' (value is Not Available). Unlike other stat packages, missing values in R are not less than or greater than other values, they simple aren't.
There are lots of useful 'na' functions to deal with them though.
Here we can see which rows have a missing value for repair record (rep78).

auto[is.na(auto$rep78), ]

##             make price mpg rep78 headroom trunk weight length turn
## 3     AMC Spirit  3799  22    NA      3.0    12   2640    168   35
## 7     Buick Opel  4453  26    NA      3.0    10   2230    170   34
## 45 Plym. Sapporo  6486  26    NA      1.5     8   2520    182   38
## 51 Pont. Phoenix  4424  19    NA      3.5    13   3420    203   43
## 64   Peugeot 604 12990  14    NA      3.5    14   3420    192   38
##    displacement gear_ratio  foreign
## 3           121       3.08 Domestic
## 7           304       2.87 Domestic
## 45          119       3.54 Domestic
## 51          231       3.08 Domestic
## 64          163       3.58  Foreign

By tabulating it and specifying useNA = “ifany”, we can see that it contains some missing values, indicated by NA (without useNA, missingvalues are silently omitted).

table(auto$rep78, useNA = "ifany")

## 
##    1    2    3    4    5 <NA> 
##    2    8   30   18   11    5

table(auto$rep78)

## 
##  1  2  3  4  5 
##  2  8 30 18 11

Combining commands

Here, the summary is genereated first for all cars, then for cars where MPG is below mean, and finally for cars where MPG is above mean. The results show that the more fuel-efficient cars cost less on average (they are likely to be smaller).

summary(auto$price)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3290    4220    5010    6170    6330   15900

summary(auto$price[auto$mpg < mean(auto$mpg)])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3290    4730    5720    7090    9250   15900

summary(auto$price[auto$mpg >= mean(auto$mpg)])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3300    3990    4480    4880    5250    9740

The with() command can be used to shorten the typing of the summary command above. The parameters to this function are the dataframe name followed by the R command. Any variables are assumed to be in this dataframe.

with(auto, summary(price[mpg >= mean(mpg)]))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3300    3990    4480    4880    5250    9740

The by() command can be used to execute commands on subsets of data, similarly to the by option in Stata. The last parameter can be the name of any function that can handle subsets of the first parameter as arguments

by(auto[, c("price", "mpg")], auto$foreign, summary)

## auto$foreign: Domestic
##      price            mpg      
##  Min.   : 3291   Min.   :12.0  
##  1st Qu.: 4186   1st Qu.:16.8  
##  Median : 4782   Median :19.0  
##  Mean   : 6072   Mean   :19.8  
##  3rd Qu.: 6200   3rd Qu.:22.0  
##  Max.   :15906   Max.   :34.0  
## -------------------------------------------------------- 
## auto$foreign: Foreign
##      price            mpg      
##  Min.   : 3748   Min.   :14.0  
##  1st Qu.: 4522   1st Qu.:21.0  
##  Median : 5759   Median :24.5  
##  Mean   : 6385   Mean   :24.8  
##  3rd Qu.: 7068   3rd Qu.:27.5  
##  Max.   :12990   Max.   :41.0

Other packages provide additional bells and whistles to standard R commands. For example, the epicalc package has a 'tab1' command that generates a table and histogram in one shot.

library(epicalc)
tab1(auto$rep78)

plot of chunk unnamed-chunk-13

## auto$rep78 : 
##         Frequency   %(NA+)   %(NA-)
## 1               2      2.7      2.9
## 2               8     10.8     11.6
## 3              30     40.5     43.5
## 4              18     24.3     26.1
## 5              11     14.9     15.9
## <NA>            5      6.8      0.0
##   Total        74    100.0    100.0

tableStack(headroom:length, by = foreign, dataFrame = auto)

##                       Domestic       Foreign      Test stat.            
## Total                 52             22                                 
##                                                                         
## Headroom (in.)                                    Ranksum test          
##   median(IQR)         3.5 (2.4,4)    2.5 (2.5,3)                        
##                                                                         
## Trunk space (cu. ft.)                             t-test  (72 df) = 3.27
##   mean(SD)            14.8 (4.3)     11.4 (3.2)                         
##                                                                         
## Weight (lbs.)                                     t-test  (72 df) = 6.25
##   mean(SD)            3317.1 (695.4) 2315.9 (433)                       
##                                                                         
## Length (in.)                                      t-test  (72 df) = 5.89
##   mean(SD)            196.1 (20)     168.5 (13.7)                       
##                                                                         
##                       P value
## Total                        
##                              
## Headroom (in.)        0.011  
##   median(IQR)                
##                              
## Trunk space (cu. ft.) 0.002  
##   mean(SD)                   
##                              
## Weight (lbs.)         < 0.001
##   mean(SD)                   
##                              
## Length (in.)          < 0.001
##   mean(SD)                   
##

detach("package:epicalc")

Hypothesis Testing

Below, we run a t-test to examine if the average gas mileage is higher or lower for foreign cars compared to domestic cars. To run a t test using a dummy variable (rather than comparing two columns), one must use a “formula”, which is a general way to represent variable relationships, and used throughout the statistical procedures in R.

with(auto, t.test(mpg ~ foreign, var.equal = TRUE))

## 
##  Two Sample t-test
## 
## data:  mpg by foreign 
## t = -3.631, df = 72, p-value = 0.0005254
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -7.661 -2.230 
## sample estimates:
## mean in group Domestic  mean in group Foreign 
##                  19.83                  24.77 
##

Correlation

Here we examine the correlation between mpg and weight. The two function calls pass the same data to the cor() function. However, in the first case the arguments are two vectors, while in the second case we pass a single argument of type data.frame. This has an effect on the output.

cor(auto$mpg, auto$weight)

## [1] -0.8072

cor(auto[, c("weight", "mpg", "length", "turn", "displacement")])

##               weight     mpg  length    turn displacement
## weight        1.0000 -0.8072  0.9460  0.8574       0.8949
## mpg          -0.8072  1.0000 -0.7958 -0.7192      -0.7056
## length        0.9460 -0.7958  1.0000  0.8643       0.8351
## turn          0.8574 -0.7192  0.8643  1.0000       0.7768
## displacement  0.8949 -0.7056  0.8351  0.7768       1.0000

We can also use by() to generate separate correlations for foreign and domestic cars, and we conclude the correlation exploration by examining a correlation table of a number of variables

by(auto[, c("weight", "mpg")], auto$foreign, cor)

## auto$foreign: Domestic
##         weight     mpg
## weight  1.0000 -0.8759
## mpg    -0.8759  1.0000
## -------------------------------------------------------- 
## auto$foreign: Foreign
##         weight     mpg
## weight  1.0000 -0.6829
## mpg    -0.6829  1.0000

Graphing Data

A plot of mileage by weight shows the expected, decreasing, relationship.

qplot(mpg, weight, data = auto)

plot of chunk unnamed-chunk-17

We can look at this relationship for foreign vs domestic vehicles by adding 'facets'.

qplot(weight, mpg, data = auto, facets = . ~ foreign)

plot of chunk unnamed-chunk-18

qplot(weight, mpg, data = auto, color = foreign, main = "Total")

plot of chunk unnamed-chunk-18

Model fitting: OLS

We want to model the relationship between MPG and weight, and we will estimate a linear regression model using the lm() command.

Based on the graphs, we judge the relationship nonlinear and will model MPG as a quadratic in weight. Also, based on the graphs, we judge the relationship to be different for domestic and foreign cars. We will include an indicator (dummy) variable for foreign and evaluate afterwards whether this adequately describes the difference. Thus, we will fit the model:

\[ mpg = \beta_0 + \beta_1 weight + \beta_2 weight^2 + \beta_3 foreign + \hat{l} \]

Note that the lm() command returns an object containing the regression results, so we will need to assign the results to a variable and call summary() on them. (If we forget to assign the results to a variable, the default printing method will be used to output the estimated coefficients, but that format is typically not what we want.

mod1 <- lm(mpg ~ weight + I(weight^2) + foreign, auto)
summary(mod1)

## 
## Call:
## lm(formula = mpg ~ weight + I(weight^2) + foreign, data = auto)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.636 -1.899 -0.302  0.810 13.852 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.65e+01   6.20e+00    9.12  1.6e-13 ***
## weight         -1.66e-02   3.97e-03   -4.18  8.4e-05 ***
## I(weight^2)     1.59e-06   6.25e-07    2.55    0.013 *  
## foreignForeign -2.20e+00   1.06e+00   -2.08    0.041 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 3.28 on 70 degrees of freedom
## Multiple R-squared: 0.691,   Adjusted R-squared: 0.678 
## F-statistic: 52.3 on 3 and 70 DF,  p-value: <2e-16 
##

In RStudio, you can examine the mod1 object. It holds lots of information that you can access directly. We can pull the predicted (fitted) values from the lm results in mod1 and assign them to a column in the data set.

auto$mpghat <- predict(mod1)

Now plot


qplot(weight, mpg, data = auto, facets = . ~ foreign, geom = c("point")) + 
    geom_line(data = auto, aes(weight, mpghat, color = "Mpghat")) + scale_colour_manual("Mileage -- Fitted values", 
    breaks = "Mpghat", values = "red") + opts(legend.position = "bottom")

plot of chunk unnamed-chunk-21

Now, we show our results to an engineer. “No,” he says. “It should take twice as much energy to move 2,000 pounds 1 mile compared with moving 1,000 pounds, and therefore twice as much gasoline. Miles per gallon is not quadratic in weight; gallons per mile is a linear in weight.”

We go back to the computer to estimate the “engineerically correct” model. First we do a quick plot of gpm vs weight.

auto$gpm <- 1/auto$mpg

qplot(gpm, weight, data = auto, color = foreign, xlab = {
    "Gallons per mile (gpm)"
}, geom = c("point", "smooth"))

## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-22

Satisfied that the engineer is indeed correct, we rerun the regression.

mod2 <- lm(gpm ~ weight + foreign, data = auto)
summary(mod2)

## 
## Call:
## lm(formula = gpm ~ weight + foreign, data = auto)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.021376 -0.002258  0.000529  0.004324  0.010898 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -7.35e-04   4.02e-03   -0.18   0.8555    
## weight          1.62e-05   1.18e-06   13.74   <2e-16 ***
## foreignForeign  6.22e-03   2.00e-03    3.11   0.0027 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.00632 on 71 degrees of freedom
## Multiple R-squared: 0.762,   Adjusted R-squared: 0.756 
## F-statistic:  114 on 2 and 71 DF,  p-value: <2e-16 
##

Lo and behold, after specifying the model correctly, we find that foreign cars in 1978 were actually less efficient. Foreign cars may have yielded better gas mileage than domestic cars in 1978, but this was only because they were so light. Long live American manufacturing!

Keeping track of your work

Use the Project menu in RStudio to create separate projects for your work. RStudio will save everything in your workspace and command history so you can pick-up exactly where you left off.

Conclusion

useR