RMDFILE=test

html :
Rscript -e “require(knitr); require(markdown); knit('$ (RMDFILE).rmd', ' $(RMDFILE).md'); markdownToHTML('$ (RMDFILE).md', ' $(RMDFILE).html', options=c('use_xhml', 'base64_images')); browseURL(paste('file://', file.path(getwd(),'$(RMDFILE).html'), sep=''))”

Title

This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages (click the MD toolbar button for help on Markdown).

When you click the Knit HTML button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

You can also embed plots, for example:

plot(cars)

plot of chunk unnamed-chunk-2

that ends the pre-packaged .rmd help code..now to jeremy anglim's blog
first xihui table

library(xtable)
print(xtable(head(iris)), type = "html")

	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
1	5.10	3.50	1.40	0.20	setosa
2	4.90	3.00	1.40	0.20	setosa
3	4.70	3.20	1.30	0.20	setosa
4	4.60	3.10	1.50	0.20	setosa
5	5.00	3.60	1.40	0.20	setosa
6	5.40	3.90	1.70	0.40	setosa

Exercise 2.11

To quote Problem 2.11 from Gelman et al (2004)

Posterior inference: suppose there is Beta(4,4) prior distribution on the probability $ \theta $ that a coin will yield a 'head' when spun in a specified manner. The coin is independently spun ten times, and 'heads' appear fewer than 3 times. You are not told how many heads were seen, only that the number is less than 3. Calculate your exact posterior density ( up to a proportionality consant) for $ \theta $ and sketch it.

library(xtable)

Explore Beta distribution

First let's examine the specific Beta distribution used in the exercise $ y_i = \alpha + \beta x_i + e_i $.

$ \text{Beta}(4,4) $.

curve(dbeta(x, 4, 4), from = 0, to = 1)

plot of chunk unnamed-chunk-4

The distribution appears unimodal, continuous on the interval $ [0, 1] $ and symmetric.
From the plot, the mode and mean appear to be 0.50. This is consistent with the formula for the mean, $ E[X] = \frac{\alpha}{\alpha + \beta} = \frac{4}{4+4} = 0.5 $, and the formula for the mode, $ \frac{\alpha - 1}{\alpha + \beta -2} = \frac{4 -1}{4 + 4 -2} = \frac{3}{6} = 0.5 $.

Now let's write a function to represent the beta distribution from scratch:

beta_distribution <- function(x, Alpha, Beta) {
    gamma(Alpha + Beta)/(gamma(Alpha) * gamma(Beta)) * x^(Alpha - 1) * (1 - 
        x)^(Beta - 1)
}

x <- seq(0, 1, 0.1)
p_x <- beta_distribution(x, 4, 4)

cbind(x, p_x, dbeta = dbeta(x, 4, 4))

##         x    p_x  dbeta
##  [1,] 0.0 0.0000 0.0000
##  [2,] 0.1 0.1021 0.1021
##  [3,] 0.2 0.5734 0.5734
##  [4,] 0.3 1.2965 1.2965
##  [5,] 0.4 1.9354 1.9354
##  [6,] 0.5 2.1875 2.1875
##  [7,] 0.6 1.9354 1.9354
##  [8,] 0.7 1.2965 1.2965
##  [9,] 0.8 0.5734 0.5734
## [10,] 0.9 0.1021 0.1021
## [11,] 1.0 0.0000 0.0000

We can confirm that it is a probability distribution by showing that it numerically integrates to one.

step_size <- 0.001
x <- seq(0, 1, step_size)
dbeta_x <- dbeta(x, 4, 4)
dbeta_x_step_size <- dbeta_x * step_size
sum(dbeta_x_step_size)

## [1] 1


plot(dbeta_x ~ x, pch = ".")
lines(x, x)
lines(x, cumsum(dbeta_x_step_size))

plot of chunk unnamed-chunk-6

This code shows how with a step_size of .001, the are numerically integrates to 1.
The figure shows the density, cumulative distribution, and x.

Explore Binomial likelihood

The problem states that after ten spins, there have been 0, 1, or 2 heads. We can assume that the observations are independent and identically distributed.

First, let us write an equation to produce the probability mass of a given set of parameters.

probability_binomial <- function(x, n, p) {
    choose(n = n, k = x) * p^x * (1 - p)^(n - x)
}

x <- 0:10
p_x <- probability_binomial(x, 10, 0.5)

Table <- cbind(x, p_x, dbinom = dbinom(x, 10, 0.5))
print(xtable(Table), type = "html", include.rownames = FALSE)

x	p_x	dbinom
0.00	0.00	0.00
1.00	0.01	0.01
2.00	0.04	0.04
3.00	0.12	0.12
4.00	0.21	0.21
5.00	0.25	0.25
6.00	0.21	0.21
7.00	0.12	0.12
8.00	0.04	0.04
9.00	0.01	0.01
10.00	0.00	0.00

plot(x, p_x, ylab = expression(p(theta)))

plot of chunk unnamed-chunk-8

Now let us examine the probability of at least 2 heads from 10 spins for varying values of $ \theta $.

x <- seq(0, 1, 0.01)
p_x <- pbinom(2, 10, x)

plot(x, p_x, type = "l", xlab = expression(theta), ylab = "p")
lines(x, dbinom(0, 10, x), col = "red")
lines(x, dbinom(1, 10, x), col = "blue")
lines(x, dbinom(2, 10, x), col = "purple")
legend(0.6, 0.8, c("p(x<3)", "p(x=0)", "p(x=1)", "p(x=2)"), lty = 1, 
    col = c("black", "red", "blue", "purple"))

plot of chunk likelihood

The probability of getting 0, 1, or 2 heads is the sum of the individual probabilities.
The plot shows the individual probabilities and the overall probabilities across the full range of possible values of $ \theta $.
The mode of $ p(x=k) $ is $ k/n=\theta $. For example, when $ \theta=0 $, $ p(x=0) = 1 $.

Calculate posterior numerically

The following plots the prior, likelihood, and posterior based on a grid of values of $ \theta $.

Theta <- seq(0, 1, 0.001)

Likelihood <- pbinom(2, 10, Theta)
Prior <- dbeta(Theta, 4, 4)
Numeric_Posterior <- Likelihood * Prior/sum(Likelihood * Prior)/0.001

plot(Theta, Numeric_Posterior, type = "l", ylim = c(0, 4), xlab = expression(theta), 
    ylab = "")
lines(Theta, Prior, col = "red")
lines(Theta, Likelihood, col = "blue")
legend(0.5, 3.5, c("Posterior", "Likelihood", "Prior"), lty = 1, 
    col = c("black", "blue", "red"))

abline(v = Theta[which.max(Numeric_Posterior)])

plot of chunk posterior

The vertical line shows the numerically estimated posterior mode (i.e., 0.281). Naturally, the posterior mode (and mean) is between the mode (and mean) of the prior and the likelihood.

Thus, we can write the following:

\[ \begin{align} p(\theta|y \in \{0,1,2\}) & \propto p(y \in \{0,1,2\}|\theta) p(\theta)\\ & \propto \sum_{i=0}^2 \binom{10}{i} \theta^i (1-\theta)^{10-i} \times \theta^{4-1}(1-\theta)^{4-1} \\ & = \sum_{i=0}^2 \binom{10}{i} \theta^{i + 3} (1-\theta)^{13-i} \\ \end{align} \]

The likelihood is the binomial distribution summed over the possible values of $ y $.
The prior is the beta distribution. The $ \frac{\Gamma(4+4)}{\Gamma(4)\Gamma(4)} $ was removed from the beta distribution because it does not vary with $ \theta $.
The equation was simplified by grouping terms.

We can use this equation in R to plot the posterior.

posterior_direct <- function(Theta) {
    sum(sapply(0:2, function(i) choose(10, i) * Theta^(i + 3) * (1 - Theta)^(13 - 
        i)))
}

Theta <- seq(0, 1, 0.001)
Equation_Posterior <- sapply(Theta, function(X) posterior_direct(X))
plot(Theta, Equation_Posterior, type = "l")

plot of chunk unnamed-chunk-9

The following confirms that the numeric and equation estimates of the posterior agree up to a proporitionality constant.

plot(Numeric_Posterior, Equation_Posterior, pch = ".")

plot of chunk unnamed-chunk-10

cor(Numeric_Posterior, Equation_Posterior)

## [1] 1

so ends jeremy anglim's stats homework example…
now we go to mage's blog which includes new features like googlevis chart

My first examples with knitr

Let's include some simple R code:

1 + 2

## [1] 3

That worked.

Let's include a plot:

df <- data.frame(x = 1:10, y = 1:10)
plot(y ~ x, data = df, pch = 19, col = "blue")

plot of chunk unnamed-chunk-12

Nice.

Now let's insert an interactive chart with googleVis:

## load the googleVis package
suppressPackageStartupMessages(library(googleVis))
## create the scatter chart
sc <- gvisScatterChart(data=df,
                        options=list(width=300, height=300, 
                                     legend='none',
                                     hAxis="{title:'x'}",
                                     vAxis="{title:'y'}")

          )

The output of gvisScatterChart is a complete web page, including <html> and <body> tags, but here I only need the chart itself which is stored in sc$html$chart. To include the chart in the output we have to set the result parameter for the code chunk to 'asis', or otherwise the html code generated by googleVis will be displayed as a string.

## {r results='asis'}
print(sc, "chart")  ## same as cat(sc$html$chart)

Hurray, it works! Hover over the dots to get more information about the values. I am sure this could be simplified further in the same way that I can use ggplot2 and lattice plots without the explicit print statement.

Next, let's create a geo chart:

geo <- gvisGeoChart(CityPopularity, locationvar = "City", colorvar = "Popularity", 
    options = list(region = "US", height = 350, displayMode = "markers", colorAxis = "{colors: ['orange','blue']}"))
print(geo, "chart")

Wonderful!

Unfortunately for charts which still rely on Flash, such as gvisMotionChart, gvisAnnotatedTimeLine and gvisGeoMap, the preview browser of RStudio will not display those charts at the moment. You have to open the output html-file in your system browser. But that is easy, just hit the browser button in the preview window, circled in red below.

RStudio preview window

Additionally you may have to add your working directory to the trusted location in the global secturity settings of your Flash Player.

Here is a little motion chart example:

M <- gvisMotionChart(Fruits, "Fruit", "Year", options = list(width = 550, 
    height = 450))
print(M, "chart")

Conclusion

RStudio and knitr might be the way forward to create quick analysis reports.

The markdown language should be sufficient for most tasks to draft a report, and the integration with RStudio makes it a real pleasure to work with knitr.

Unfortunately a spell checker is currently missing for knitr files, but I wouldn't be surprised if this will be added to a future version of RStudio.