knitr::opts_chunk$set(echo = TRUE)
library(car)
## Loading required package: carData
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
there is a good chance this will be a pain to learn now and might take a while to get the hang of, but it’s a useful tool with lots of sweet functions and will make it easier for us to grade homeworks.
first things first, you can type anywhere without #s as long as you’re not in a code chunk! #s change the font to be a header. see what # does vs ##.
similarly, but not as critical for this class, you can bold things or italicize things for emphasis. see [https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf] for more tips on formatting!
create a code chunk with command+option+i on macs and ctrl+alt+i on PCs
# this is a code chunk!
# now, anything you write is treated as code, just as in the console or script editor. so you need #s to comment out text. the text formatting from above does not work in here
# you can type many things in the {r} at the top of the chunk, but it is sometimes helpful to add text (here I added 'intro chunk') so you can refer back to them at the 'chunk' section in the bottom of this rmarkdown.
i like to load my packages in this first set up chunk. packages have to be loaded and files read in before any code that uses them, because chunks run in order from top to bottom when you knit. that means, if I tried to calculate the mean of a variable in the heart dataset in the above intro chunk it WOULD NOT work when I tried to save the file and ‘knit’ it. more on knitting in a minute…
but basically the code might work in R Studio (because a data set might already be loaded in) but just practice good habits and read in data first!
for future assignments when you have a lot of packages to load in, try adding ‘message=F, warning=F’ to the {r setup} chunk so it becomes {r setup, message=F, warning=F} (note the comma after the chunk name). message=F means when you knit, it will not include any of the text you get after loading a package (i.e. Attaching package: ‘psych’ The following object is masked from ‘package:car’: logit). echo=F hides the code itself but still shows its output. include=F still runs the chunk but hides both the code and everything it prints, so you won’t even see what packages you loaded when you knit.
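here is a minimal sketch of how chunk options can be set from inside the setup chunk instead of in the {r} header. it uses knitr::opts_chunk$set(), the same call that is already at the top of this document. the requireNamespace() guard and the suppressPackageStartupMessages() alternative are my additions for illustration, not something the course requires.

```r
# set default options for every chunk that follows.
# guarded so these lines also run in a plain R session without knitr installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = TRUE,      # show the code in the knitted document
    message = FALSE,  # hide messages such as "Attaching package: 'psych'"
    warning = FALSE   # hide warnings
  )
}

# alternative: quiet the startup messages of a single library() call (base R).
# 'stats' is only a stand-in here because it ships with R.
suppressPackageStartupMessages(library(stats))
```

options set this way apply to every later chunk, and you can still override them for one chunk in its own {r} header.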
knitting a document is basically how you are seeing this. i chose to knit my document to html so you can easily click the link, but you can also knit to pdf or word doc. I recommend knitting to pdf because it looks nice, but you need LaTeX installed to knit to pdf, and this can be a huge headache and not totally the point of the class.
so, try to knit to PDF. if you get an error, try to resolve it yourself :-) if you can’t, just use word docs. in order to knit to html and have someone else be able to see it, you need to knit, then hit publish. i use rpubs to publish it.
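if you would rather knit from the console than click the knit button, a sketch like the one below should work. it assumes the rmarkdown package is installed, and "homework1.Rmd" is a made-up file name, so swap in your own. the guard means nothing happens if the package or the file is missing.

```r
# hypothetical file name, replace with the path to your own .Rmd
rmd_file <- "homework1.Rmd"

# only try to knit when rmarkdown is installed and the file actually exists
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  # "word_document" avoids needing LaTeX; "pdf_document" and "html_document" also exist
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```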
question: do the cars in the mtcars dataset have an average efficiency that is different than 17 mpg?
head(mtcars) # look at the data. mtcars comes built in with R (the datasets package), so there is no file to read in
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# the mean mpg in the mtcars dataset:
mean(mtcars$mpg)
## [1] 20.09062
write out your model comparison:
model C: \(mpg_i=17+\epsilon_i\) model A: \(mpg_i=\beta_0+\epsilon_i\)
(you can actually do in-line equations, which is neat, but don’t stress or get carried away with it if it’s annoying)
## writing this in a code chunk to show you how i wrote that inline equation:
## model C: $mpg_i=17+\epsilon_i$
## model A: $mpg_i=\beta_0+\epsilon_i$
run the model comparison. basically how we’re going to do this is create a new variable, each car’s mpg minus 17, and then test whether the mean of that new variable is different from 0.
mtcars$mpg17<- mtcars$mpg-17 ### this is creating a variable that we will test! we are not doing the math here -- will let the computer do that
model.c<- lm(mtcars$mpg17 ~ 0) # this is saying run a linear model (lm) with no intercept and no predictors (the 0). this is the null: it predicts mpg17 = 0, i.e. mpg = 17, for every car
model.a<- lm(mtcars$mpg17 ~ 1) # this is saying run a linear model (lm) that estimates only an intercept (the 1), i.e. the mean of mpg17
anova(model.c, model.a)
## Analysis of Variance Table
##
## Model 1: mtcars$mpg17 ~ 0
## Model 2: mtcars$mpg17 ~ 1
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 32 1431.7
## 2 31 1126.0 1 305.66 8.4149 0.006788 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model.c)
##
## Call:
## lm(formula = mtcars$mpg17 ~ 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -1.575 2.200 5.800 16.900
##
## No Coefficients
##
## Residual standard error: 6.689 on 32 degrees of freedom
summary(model.a)
##
## Call:
## lm(formula = mtcars$mpg17 ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6906 -4.6656 -0.8906 2.7094 13.8094
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.091 1.065 2.901 0.00679 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.027 on 31 degrees of freedom
notice how above in the summary output for model C there are no coefficients (not even an intercept). it is simply the null model, which predicts 17 mpg for every car (that is, 0 on the mpg17 scale).
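a quick way to convince yourself of this is to look inside model C directly. the sketch below refits it on a standalone copy of the variable (the names mpg17 and model_c are mine, chosen so the block runs on its own) and checks that there is nothing estimated and that every prediction is 0 on the mpg17 scale.

```r
# rebuild the centered variable so this block does not depend on earlier chunks
mpg17 <- mtcars$mpg - 17
model_c <- lm(mpg17 ~ 0)   # the null model: no parameters are estimated

length(coef(model_c))      # number of estimated coefficients
all(fitted(model_c) == 0)  # every prediction is 0, i.e. 17 mpg on the original scale

# with nothing estimated, the residuals are just the data themselves
all.equal(unname(resid(model_c)), mpg17)
```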
anyway! lots of ways to look at output here. try to see where we got our numbers from in the power point example.
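if you want to trace the numbers, one option is to rebuild the anova table by hand. this is only a sketch with variable names i made up (sse_c, sse_a, ssr, f_stat, p_val), but the values it prints should line up with the RSS, Sum of Sq, F and Pr(>F) columns above. the last two lines are an extra cross-check that is not part of the original example: a one-sample t-test against 17 asks the same question, and its t statistic squared equals the F.

```r
# mtcars ships with R, so nothing needs to be loaded
mpg <- mtcars$mpg
n <- length(mpg)

sse_c <- sum((mpg - 17)^2)         # error of model C, which predicts 17 for every car
sse_a <- sum((mpg - mean(mpg))^2)  # error of model A, which predicts the sample mean
ssr <- sse_c - sse_a               # reduction in error ("Sum of Sq" in the anova table)

# model A estimates 1 more parameter than model C and has n - 1 residual df
f_stat <- (ssr / 1) / (sse_a / (n - 1))
p_val <- pf(f_stat, df1 = 1, df2 = n - 1, lower.tail = FALSE)

round(c(SSE_C = sse_c, SSE_A = sse_a, SSR = ssr, F = f_stat, p = p_val), 4)

# cross-check: one-sample t-test of mpg against 17, then square the t
t_out <- t.test(mpg, mu = 17)
unname(t_out$statistic)^2
```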