Off Campus Scholarly Assignment After Action Report

Alan T. Arnholt
4/10/15

Thank you, Department of Mathematical Sciences, for supporting my OCSA!

I visited colleagues in Spain in summer 2014
I spent most of my OCSA in my office writing
Skype and version control are great!
- Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
I wrote over 1700 pages of peer-reviewed material

Picture Proof

History

I had planned to spend the summer of 2012 in Spain working with colleagues.
In April 2012, I was diagnosed with an autoimmune disease with an unknown prognosis, so my plans for Spain were shelved.

Time for Catch Up

Once I stopped taking all medication, my energy levels improved
I requested an OCSA to catch up on the work I had planned to start in summer 2012
OCSA granted for Fall 2014!

What I did during my OCSA

I worked on the second edition of Probability and Statistics with R
- The text itself (\( \approx \) 1000 pages; should appear in print sometime this summer)
- Complete solutions manual, available to all adopters (\( \approx \) 700 pages)
- The software package PASWR2 (published)
- Student solutions manual (\( \approx \) 300 pages)
- Web page for the book

Book Cover:

Tentative Cover

How my workflow has changed:

Everything I do uses version control
- I use git for version control
- I use GitHub and Bitbucket for both public and private repositories
- My students are using git with private repositories on GitHub this semester
I am using RMarkdown to write class materials
- My students are using RMarkdown for all of their work
- This presentation was written using RMarkdown
I use knitr for everything

knitr

The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package.

Small Example:

library(PASWR2)   # provides the GRADES data
library(ggplot2)
# Scatterplot of gpa versus sat with a least-squares line
ggplot(data = GRADES, aes(x = sat, y = gpa)) +
  geom_point() + stat_smooth(method = "lm")

[Figure: scatterplot of gpa versus sat (GRADES) with a fitted regression line]

Tweaked Example:

[Figure: tweaked version of the gpa versus sat scatterplot]

Some Python:

x = 'hello, python world!'
print(x)
print(x.split(' '))
fibs = (0, 1, 1, 2, 3) # tuple
print(fibs[3])

hello, python world!
['hello,', 'python', 'world!']
2

Other things:

Run bash scripts
Run perl scripts
OK, just about any scripting language

Select Book Additions:

ggplot2
Coverage Probability
Case and residual resampling
Cross validation

ggplot2 Code:

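library(dplyr)
# Add body mass index (kg/m^2) from weight (kg) and height (cm)
EPI <- EPIDURALF %>% 
  mutate(BMI = kg/(cm/100)^2)
# Store a density plot of BMI for display
g1 <- ggplot(data = EPI, aes(x = BMI)) + 
  geom_density(fill = "pink") + 
  theme_bw() + 
  labs(x = expression(kg/m^2))
MD <- median(EPI$BMI)  # median BMI
SP <- IQR(EPI$BMI)     # interquartile range of BMI
c(MD, SP)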

[1] 30.827622  6.362825

The median BMI is 30.83 \( \text{kg}/\text{m}^2 \) and the BMI interquartile range is 6.36 \( \text{kg}/\text{m}^2 \).
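The BMI transformation and the two summaries are easy to check by hand. Below is a minimal Python sketch; the weights and heights are made up (they are not the EPIDURALF data), and `method="inclusive"` is assumed to mirror R's default (type 7) quantiles, which `IQR()` uses.

```python
from statistics import median, quantiles

def bmi(kg, cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    return kg / (cm / 100) ** 2

def iqr(values):
    """Q3 - Q1; 'inclusive' interpolation should match R's type 7 quantiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical weights (kg) and heights (cm), NOT the EPIDURALF data
kg = [80, 95, 70, 110, 88]
cm = [170, 165, 160, 175, 180]
b = [bmi(w, h) for w, h in zip(kg, cm)]
print(round(median(b), 2), round(iqr(b), 2))
```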

ggplot2 Density Plot:

[Figure: density plot of BMI (g1)]

ggplot2 Density Plot using fill:

ggplot(data = EPI, aes(x = BMI, fill = ease)) +
  geom_density(alpha = 0.5) + 
  theme_bw()

[Figure: overlaid density plots of BMI, filled by ease]

R Code:

levels(EPI$ease) <- c("Easy", "Difficult", "Impossible")
g2 <- ggplot(data = EPI, aes(x = treatment, fill = ease)) + 
  geom_bar(position = "fill", colour = "black") +
  theme_bw() + 
  labs(x = "", y = "Fraction") + 
  guides(fill = guide_legend("Ease of\nPalpating\nPatient")) +
  facet_grid(.~ doctor) + 
  theme(axis.text.x  = element_text(angle = 75, hjust = 1.0))

ggplot2 Segmented Bar Graph:

[Figure: segmented bar graph (g2) of ease fractions by treatment, faceted by doctor]

ggplot2 Boxplot Code:

g3 <- ggplot(data = EPI, aes(x = treatment, y = BMI, fill = treatment)) + 
    geom_boxplot() +
    theme_bw() + 
    facet_grid(ease ~ doctor) + 
    theme(axis.text.x  = element_text(angle = 75, hjust = 1.0)) +
    guides(fill = "none") +   # hide the fill legend (FALSE is deprecated in newer ggplot2)
    labs(x = "")

ggplot2 Boxplot with Facets:

[Figure: boxplots (g3) of BMI by treatment, faceted by ease and doctor]

Confidence Intervals for a Proportion:

Wald Confidence Interval

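\[ \text{CI}_{1-\alpha}(\pi)=\left[p-z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}},\: p+z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} \right] \]

where \( p = x/n \) is the sample proportion of \( x \) successes in \( n \) trials and \( z_{1-\alpha/2} \) is the \( 1-\alpha/2 \) quantile of the standard normal distribution.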
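A minimal Python sketch of the Wald formula, as a quick check; the name `wald_ci` and its `conf` argument are hypothetical, and \( z_{1-\alpha/2} \) comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(x, n, conf=0.95):
    """Wald interval for pi from x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}
    p = x / n                                      # sample proportion
    half = z * sqrt(p * (1 - p) / n)               # half-width
    return p - half, p + half

print(wald_ci(15, 50))
```

Nothing clips the limits to \( [0, 1] \), and when \( x = 0 \) or \( x = n \) the interval collapses to a single point.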

Agresti-Coull

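\[ \text{CI}_{1-\alpha}(\pi)=\left[\tilde{p}-z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}},\: \tilde{p}+z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}} \right] \]

where \( \tilde{n} = n + z^2_{1-\alpha/2} \) and \( \tilde{p} = \left(x + z^2_{1-\alpha/2}/2\right)/\tilde{n} \).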
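The Agresti-Coull interval is the Wald form applied to an adjusted sample size and proportion. A minimal Python sketch, with the usual adjustments spelled out in the comments; `agresti_coull_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull_ci(x, n, conf=0.95):
    """Agresti-Coull interval: the Wald form applied to adjusted counts."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}
    n_tilde = n + z**2                             # adjusted sample size
    p_tilde = (x + z**2 / 2) / n_tilde             # adjusted proportion
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

print(agresti_coull_ci(15, 50))
```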

Confidence Intervals (continued):

Clopper-Pearson/Exact

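\[ \text{CI}_{1-\alpha}(\pi)=\left[\beta_{\alpha/2, x, n - x + 1}, \beta_{1 - \alpha/2, x + 1, n - x} \right] \]

where \( \beta_{q, a, b} \) is the \( q \) quantile of a \( \text{Beta}(a, b) \) distribution; by convention the lower limit is 0 when \( x = 0 \) and the upper limit is 1 when \( x = n \).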
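The Beta quantiles in the exact interval can also be found without a Beta quantile function: the lower limit is the \( \pi \) at which \( P(X \ge x) = \alpha/2 \), and the upper limit is the \( \pi \) at which \( P(X \le x) = \alpha/2 \), with \( X \sim \text{Binomial}(n, \pi) \). These are the same values as the Beta quantiles. The Python sketch below solves both equations by bisection using only the standard library; the helper names are hypothetical, and in R `qbeta()` would be the more direct route.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def bisect_increasing(f, target, iters=100):
    """Return p in [0, 1] with f(p) = target, assuming f increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2

def clopper_pearson_ci(x, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for pi from x successes in n trials."""
    a = 1 - conf
    # Lower limit: P(X >= x) = a/2; this tail probability increases with p.
    if x == 0:
        lower = 0.0
    else:
        lower = bisect_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), a / 2)
    # Upper limit: P(X <= x) = a/2; the CDF decreases with p, so negate it.
    if x == n:
        upper = 1.0
    else:
        upper = bisect_increasing(lambda p: -binom_cdf(x, n, p), -a / 2)
    return lower, upper

print(clopper_pearson_ci(3, 10))
```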

Wilson/Score Confidence Interval

\[ \text{CI}_{1-\alpha}(\pi) = \left[\dfrac{p+\frac{z^2_{1-\alpha/2}}{2n}-z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}+\frac{z^2_{1-\alpha/2}}{4n^2}}}{\left(1+\frac{z^2_{1-\alpha/2}}{n} \right)}\right., \]

\[ \left. \dfrac{p+\frac{z^2_{1-\alpha/2}}{2n}+z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}+\frac{z^2_{1-\alpha/2}}{4n^2}}}{\left(1+\frac{z^2_{1-\alpha/2}}{n} \right)} \right] \]
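The Wilson limits translate term by term into code. A minimal Python sketch, with `wilson_ci` a hypothetical name:

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, conf=0.95):
    """Wilson (score) interval for pi from x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}
    p = x / n
    center = p + z**2 / (2 * n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - half) / denom, (center + half) / denom

print(wilson_ci(15, 50))
```

Dividing through shows that the midpoint of the Wilson interval is \( (x + z^2_{1-\alpha/2}/2)/(n + z^2_{1-\alpha/2}) \), the same center the Agresti-Coull interval uses.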

Coverage Probability:

The coverage probability of a confidence interval procedure for estimating \( \pi \) at a fixed value of \( \pi \) is

\[ C_n(\pi)=\sum_{k=0}^{n}I(k,\pi)\binom{n}{k}\pi^k(1 - \pi)^{n-k} \]

where \( I(k,\pi) \) equals 1 if the interval contains \( \pi \) when \( X = k \) and equals 0 if it does not contain \( \pi \).

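In other words, the coverage probability is the probability, under the \( \text{Binomial}(n, \pi) \) distribution, that the procedure produces an interval containing \( \pi \): each of the \( n + 1 \) possible intervals is weighted by the probability of observing its value of \( k \).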
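Because \( X \) takes only \( n + 1 \) values, \( C_n(\pi) \) can be computed exactly. A minimal Python sketch, shown here for the Wald interval; `coverage` and `wald_ci` are hypothetical names, and any function returning `(lower, upper)` for arguments `(k, n, conf)` can be passed in.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_ci(x, n, conf=0.95):
    """Wald interval p +/- z * sqrt(p(1-p)/n), with p = x/n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def coverage(ci, n, pi, conf=0.95):
    """Exact C_n(pi): sum over k of I(k, pi) * P(X = k), X ~ Binomial(n, pi)."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = ci(k, n, conf)
        if lo <= pi <= hi:  # I(k, pi) = 1: the interval for X = k contains pi
            total += comb(n, k) * pi**k * (1 - pi) ** (n - k)
    return total

# Coverage of the nominal 95% Wald interval with n = 30 at a few values of pi
for pi in (0.01, 0.1, 0.5):
    print(pi, round(coverage(wald_ci, 30, pi), 4))
```

Evaluating `coverage()` on a fine grid of \( \pi \) values gives one coverage curve per interval procedure.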

Coverage Probability for Different CIs:

[Figure: coverage probability versus \( \pi \) for the confidence intervals above]

Resources:

RStudio cheatsheets: available for ggplot2, package development, data wrangling (dplyr, tidyr), RMarkdown, and shiny
knitr documentation: information about using knitr
Your local UseR group (coming soon?)

This Semester:

Private GitHub repositories
- STT 2810
- STT 3851
Project based courses
- STT 2810 will present research (9:30-12:15, April 30, PLEMMONS STUDENT UNION - 137A Calloway Peak)
- STT 3851 will present research (2-3:15, April 30, PLEMMONS STUDENT UNION - 137A Calloway Peak)
CrowdGrader
That's All, Folks!