Danny Kaplan
NIMBIOS Computing in the Cloud Workshop
A tutorial is … more interactive and specific than a book or a lecture; a tutorial seeks to teach by example and supply the information to complete a certain task. — Wikipedia
RStudio, mosaic, editor and Rmd.
These apply to many different software packages.
Using mainly mosaic and Rmd
save()ing, etc.
Markdown is a straightforward markup language for producing documents quickly.
The RStudio editor provides easy facilities for working with Markdown and integrating it with R. (But doing this doesn't require RStudio.)
You are generally switching among contexts: teach class A, research project B, revise paper C, teach class D, test out epiphany E
Projects in RStudio are a simple way to keep these contexts distinct and organized, and to be able to switch back and forth. Documents being edited stay open.
Other useful tools for instructors:
It could be you, a historic figure, etc.
Include:
New File/R Markdown
Knit HTML button
One time: Install knitr package
.Rmd is a nice suffix.
knit2html() function to compile to HTML, e.g.
knit2html('GeorgeWashington.Rmd')
Available on CRAN: Install in the usual way, e.g.
install.packages('mosaic')
Example:
mean( height ~ sex, data=Galton )
Created by DTK, Randall Pruim (Calvin College), Nick Horton (Amherst College)
An idea that forms the basis of something
A thought, idea, or opinion formed or a remark made as a result of meditation
These are not a self-consistent and complete set of statements.
Man soll die Dinge so einfach machen wie möglich - aber nicht einfacher.
Everything should be made as simple as possible, but not simpler. — A. Einstein
Galton
mosaic syntax: graphics, basic descriptive stats, modeling.
data=
height in Galton)
height broken down by sex )
mother and father)
mosaic extends some existing functions. Look at mean().
mean( height ~ sex, data=Galton)
F M
64.11 69.23
bwplot( height ~ sex, data=Galton )
xyplot( height ~ father, data=Galton )
lm( height ~ father, data=Galton )
Mean is so widely used, we didn't want to break it. So, mm() for modeling with groupwise means.
mm( height ~ sex, data=Galton )
Groupwise Model Call:
height ~ sex
Coefficients:
F M
64.1 69.2
Try fitted(), resid(), confint(), rsquared()
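For comparison, a minimal sketch of what a groupwise-means model amounts to in base R. It assumes the built-in ToothGrowth data as a stand-in for Galton (so that it runs without mosaic); the names grp_means, fit, and res are illustrative only and are not part of mosaic.

```r
# Stand-in data (assumption): tooth length (len) by supplement type (supp).
data(ToothGrowth)

# Groupwise means, analogous to mean( height ~ sex, data=Galton )
grp_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
print(grp_means)

# The fitted value for each case is just the mean of its group ...
fit <- ave(ToothGrowth$len, ToothGrowth$supp, FUN = mean)

# ... and the residual is what is left over.
res <- ToothGrowth$len - fit

# lm() with a single categorical predictor gives the same fitted values.
mod <- lm(len ~ supp, data = ToothGrowth)
print(all.equal(unname(fitted(mod)), fit))
```

The point of the comparison: a groupwise mean is already a model, so fitted values and residuals make sense for it.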
Many know about functions and their graphs.
mod <- lm( height ~ mother * sex, data=Galton)
f <- makeFun(mod)
f(mother=65,sex='F') - f(mother=64,sex='F')
1
0.3266
plotFun( f(mother=x, sex='F') ~ x, x.lim=c(40,75),
col='red')
plotFun( f(mother=x, sex='M') ~ x, add=TRUE)
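A rough base-R sketch of the same idea: wrap predict() so a fitted model can be called and plotted like an ordinary function. This is not how makeFun() is implemented. The ToothGrowth data and the variable names are assumptions chosen so the example runs without mosaic.

```r
# Stand-in data (assumption): len modeled by dose and supp, built into R.
data(ToothGrowth)

# A model with an interaction, analogous to height ~ mother * sex
mod <- lm(len ~ dose * supp, data = ToothGrowth)

# Wrap predict() so the model behaves like a function of its inputs.
f <- function(dose, supp) {
  newdata <- data.frame(
    dose = dose,
    supp = factor(supp, levels = levels(ToothGrowth$supp))
  )
  unname(predict(mod, newdata = newdata))
}

# Change in the model value for one extra unit of dose in the "OJ" group
print(f(dose = 1.5, supp = "OJ") - f(dose = 0.5, supp = "OJ"))

# Graph the model function for each group
curve(f(x, supp = "OJ"), from = 0.5, to = 2, col = "red",
      xlab = "dose", ylab = "model value of len")
curve(f(x, supp = "VC"), add = TRUE)
```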
At Macalester, we use R in the first calculus course (which is multivariable with a modeling theme).
D( sqrt(x*sin(x^2)) ~ x )
function (x)
0.5 * ((sin(x^2) + x * (cos(x^2) * (2 * x))) * (x * sin(x^2))^-0.5)
antiD( 1/x ~ x )
function (x, C = 0)
1 * log((x)) + C
Also: model fitting (with nonlinear parameters), differential equations, units and dimensions, …
Symbolic for the easy ones. Numeric otherwise.
But both return a function.
f <- antiD( x*sin(x) ~ x)
f
function (x, C = 0)
{
numerical_integration(.newf, .wrt, as.list(match.call())[-1],
formals(), from, ciName = intC, .tol)
}
<environment: 0x1084dda38>
f(3) - f(0)
[1] 3.111
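For readers without mosaic, a sketch of the nearest base-R tools: stats::D() for symbolic derivatives and integrate() for numerical integrals. Unlike D() and antiD() in mosaic, these do not return functions directly, so the wrapper dfun below is an illustrative helper, not part of any package.

```r
# Symbolic derivative of an expression with respect to x
dexpr <- D(quote(sqrt(x * sin(x^2))), "x")
print(dexpr)

# Turn the derivative expression into a function of x (illustrative helper)
dfun <- function(x) eval(dexpr, list(x = x))
print(dfun(1))

# Numerical definite integral of x*sin(x) from 0 to 3
g <- function(x) x * sin(x)
area <- integrate(g, lower = 0, upper = 3)$value
print(area)

# Check against the known antiderivative sin(x) - x*cos(x)
exact <- sin(3) - 3 * cos(3)
print(exact)
```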
If you're teaching data cleaning or wrangling, give students the raw data and have them produce a new set of data.
If you're teaching statistics or the content of the data, give them clean data in an easy-to-use form.
You can always say later, “Here's where that simple dataset came from.”
Data management and cleaning with nhanesOriginal
NHANES cleaned up
Something of a contradiction:
You can't use Word very well for reporting on R.
A Google Doc is much like Word.
mosaic package, fetchGoogle( link )
Example: Bird species names from Macalester's Ordway Conservation Area logs from the 1970s and 80s
Resources/NamesForCleaning.csv.
Google broke my system: Use the old stuff. I'll try to figure something out with RXML or RJSONIO, or RGoogleDocs.
Packages are straightforward to write and install.
install_github('dtkaplan/NIMBIOS')
This package has an example of a collaborative editor:
f <- collaborate(doc="NIMBIOS")
f('edit')
Any time you want to bring this into R:
f('capture')
Least squares fitting:
mod1 <- mm( height ~ sex, data=Galton)
sum( resid(mod1)^2 )
[1] 5640
mod2 <- mm( height ~ sex, data=Galton, fun=median )
sum( resid(mod2)^2 )
[1] 5646
A template for the model:
myf <- makeFun( ifelse(sex=='F', 64, 69)~sex)
myf(sex='M')
[1] 69
Look at the residuals:
resids <- with(Galton, height - myf(sex))
sum( resids^2 )
[1] 5670
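The same comparison can be sketched in base R: among all ways of assigning one number to each group, the group means give the smallest sum of squared residuals. The ToothGrowth data, the helper ss_for(), and the guessed values 20 and 17 are assumptions for illustration.

```r
# Stand-in data (assumption): len by supp, built into R.
data(ToothGrowth)

# Sum of squared residuals when each group is summarized by `center`
ss_for <- function(center) {
  fit <- ave(ToothGrowth$len, ToothGrowth$supp, FUN = center)
  sum((ToothGrowth$len - fit)^2)
}

ss_mean   <- ss_for(mean)
ss_median <- ss_for(median)

# A hand-made "template" model: guess one value per group
guess    <- ifelse(ToothGrowth$supp == "OJ", 20, 17)
ss_guess <- sum((ToothGrowth$len - guess)^2)

print(c(mean = ss_mean, median = ss_median, guess = ss_guess))
```

Students can edit the guessed values and watch the sum of squares; it never drops below the value for the means.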
Are you using paper-and-pencil techniques but having the students save time and avoid error by doing them on the computer, e.g.
The statistical concepts are easier to understand if they are presented in their essence: the 3 Rs of statistical inference.
resample(), shuffle() and rand()
mean( height ~ sex, data=resample(Galton) )
F M
64.10 69.33
lm( height ~ father, data=resample(Galton) )
(Intercept) father
41.507 0.365
mean( shuffle(height) ~ sex, data=Galton )
F M
66.73 66.79
lm( shuffle(height) ~ father, data=Galton )
(Intercept) father
62.03259 0.06829
ANOVA is not essentially about within-groups and between-groups variability.
It's about whether added terms in a model pull their weight:
mod1 <- lm( height ~ sex,
data=Galton )
mod2 <- lm( height ~ sex + family,
data=Galton )
mod2 includes a categorical variable giving the family.
rsquared( mod1 )
rsquared( mod2 )
length(coef(mod2))
[1] 198
rsquared(mod2)
[1] 0.7666
rsquared(lm( height ~ sex +
rand(196), data=Galton ) )
[1] 0.6133
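A sketch of the random-regressor idea in base R: columns of pure noise still raise R², so any added term has to beat that baseline. The helper junk() is a hypothetical stand-in for rand(), and mtcars stands in for Galton.

```r
set.seed(2)  # only so the example is repeatable
data(mtcars)

# R^2 of a fitted linear model
r2 <- function(mod) summary(mod)$r.squared

# k columns of random noise, one row per case (illustrative stand-in for rand(k))
junk <- function(k, n) matrix(rnorm(n * k), nrow = n, ncol = k)

base_mod <- lm(mpg ~ wt, data = mtcars)
junk_mod <- lm(mpg ~ wt + junk(10, nrow(mtcars)), data = mtcars)

# R^2 cannot go down when terms are added, even meaningless ones
print(c(base = r2(base_mod), with_junk = r2(junk_mod)))
```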
To continue with the ANOVA example: generate a large number of randomized models and compare their R² values with the value from mod2.
Loops? Accumulators? Application to vectors, or is it lists?
Just do it!
do(5)*rsquared(lm( height ~ sex + rand(196), data=Galton ) )
result
1 0.6092
2 0.6058
3 0.6242
4 0.6129
5 0.6077
samps <- do(200)*rsquared(lm(height ~ sex + rand(196), data=Galton ))
Use the test statistic directly:
pdata( .73, ~result, data=samps)
[1] 1
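Without mosaic, the repeat-and-collect step can be sketched with replicate(), and the comparison with an empirical proportion. The data (mtcars), the number of noise columns, and the names one_trial and observed are assumptions for illustration.

```r
set.seed(3)  # only so the example is repeatable
data(mtcars)
n <- nrow(mtcars)

# One trial: R^2 from a model with wt plus 5 columns of random noise
one_trial <- function() {
  noise <- matrix(rnorm(n * 5), nrow = n)
  summary(lm(mpg ~ wt + noise, data = mtcars))$r.squared
}

# Repeat the trial many times, analogous to do(200) * ...
samps <- replicate(200, one_trial())

# R^2 from a model that adds 5 real explanatory variables instead
observed <- summary(
  lm(mpg ~ wt + hp + qsec + drat + disp + cyl, data = mtcars)
)$r.squared

# Proportion of random trials at or below the observed value
p_below <- mean(samps <= observed)
print(p_below)
```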
anova( mod1, mod2 )
Analysis of Variance Table
Model 1: height ~ sex
Model 2: height ~ sex + family
Res.Df RSS Df Sum of Sq F Pr(>F)
1 896 5640
2 700 2688 196 2953 3.92 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
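The F statistic in such a table can be reproduced by hand: the drop in residual sum of squares per added coefficient, divided by the residual variance of the larger model. With the numbers in the table, (5640 - 2688)/196 divided by 2688/700 is roughly 3.92. A sketch using mtcars as a stand-in (an assumption; the variable choices are illustrative):

```r
data(mtcars)

mod1 <- lm(mpg ~ wt, data = mtcars)
mod2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

rss1 <- sum(resid(mod1)^2)
rss2 <- sum(resid(mod2)^2)

# Coefficients added by the larger model, and degrees of freedom left over
df_added <- length(coef(mod2)) - length(coef(mod1))
df_resid <- df.residual(mod2)

# Do the added terms pull their weight?
F_hand <- ((rss1 - rss2) / df_added) / (rss2 / df_resid)
p_hand <- pf(F_hand, df_added, df_resid, lower.tail = FALSE)

print(c(F = F_hand, p = p_hand))
print(anova(mod1, mod2))
```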
Example: I often start with a graph-reading exercise. Partly, this is to get students to realize that even if they think they know, they often do not.
Your scaffolds can do this. Make them attractive.
This is where R/Markdown is fantastic.
Code examples:
subset(), transform(), …
Example: Bootstrapping
do() and resample() functions
Example: Hypothesis Testing
We need to document, which implies code. But the code for graphics gets difficult, and it's hard for students to play with the possibilities.
Generates the code needed to recreate the plot. (mBar() to come.)
mScatter( Galton )
mPlot( Galton, default='density')
Get started on these. See what progress you can make and then we'll return to them tomorrow.
lm(), when glm() is important. Example: Cardiac death and smoking in the NHANES-III data. Or: why and when is logistic regression more appropriate than linear modeling for yes/no outcomes?
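One possible starting point for the lm() versus glm() exercise, sketched with the built-in mtcars data rather than NHANES-III (an assumption made so the example is self-contained). The yes/no outcome here is am, coded 0 or 1.

```r
data(mtcars)

# Yes/no outcome: am is 0 (automatic) or 1 (manual)
lin_mod <- lm(am ~ wt, data = mtcars)
log_mod <- glm(am ~ wt, data = mtcars, family = binomial)

# Evaluate both models over a grid of weights
grid  <- data.frame(wt = seq(1, 6, by = 0.5))
p_lin <- predict(lin_mod, newdata = grid)
p_log <- predict(log_mod, newdata = grid, type = "response")

# The linear model's "probabilities" leave the 0 to 1 range;
# the logistic model's values stay inside it.
print(round(cbind(wt = grid$wt, linear = p_lin, logistic = p_log), 3))
print(range(p_lin))
print(range(p_log))
```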