Using Sweave in RStudio

Create a New Document

To create a new Sweave document, go to File | New | R Sweave. This provides a default template to get you started. Sweave combines LaTeX and R code, but we haven't put any R code in there yet. Start by adding some text on line 5 just to see what happens when we compile. Enter “Housing Prices”, as we'll soon be adding more information and code. Now save your document inside the Sweave directory as sweaveHousing, and RStudio will append the .Rnw extension.

Side note on the file extension

Remember that a LaTeX document (with the .tex file extension) is a plain document that does not contain any R code. Sweave takes a LaTeX document and lets you “weave” R code into it, hence the name Sweave (the S comes from the S language, which R is based on). For Sweave documents, we use the .Rnw extension.

Compile to PDF

Typically in R, you would need to call several commands to turn a .Rnw file into a PDF, because the file is first converted from Sweave to TeX and then from TeX to PDF: .Rnw | .tex | .pdf. For this, you would use commands like R CMD Sweave <documentName>.Rnw and pdflatex <documentName>. RStudio makes this process easier by letting you click a single button to go from .Rnw to PDF. Since we've got a saved file, sweaveHousing.Rnw, let's compile it to PDF by clicking the Compile PDF button at the top of the editor. You'll notice RStudio running commands in the background, and then it will display your PDF (very basic right now).
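
If you want to see the two steps without the button, they can also be run from the R console. The following is a minimal sketch, not what RStudio runs internally: it writes a tiny stand-in document to a temporary directory (the directory and the document contents are assumptions for the demo), converts it to .tex with Sweave(), and then tries to build the PDF with tools::texi2pdf() if a LaTeX installation is found.

```r
# Minimal sketch: build a PDF from an .Rnw file without the Compile PDF button.
# The temporary directory and the tiny document below are demo assumptions.
work_dir <- file.path(tempdir(), "sweaveDemo")
dir.create(work_dir, showWarnings = FALSE)

writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Housing Prices",
  "<<chunkName>>=",
  "x <- rnorm(5)",
  "x",
  "@",
  "\\end{document}"
), file.path(work_dir, "sweaveHousing.Rnw"))

# Sweave() writes its output into the working directory, so switch to the
# demo directory for the build and restore the old directory afterwards.
old_dir <- setwd(work_dir)

# Step 1: .Rnw -> .tex (the same job as `R CMD Sweave sweaveHousing.Rnw`)
utils::Sweave("sweaveHousing.Rnw")
tex_created <- file.exists("sweaveHousing.tex")

# Step 2: .tex -> .pdf (the same job as `pdflatex sweaveHousing`).
# Skipped when pdflatex is not on the PATH; a failed LaTeX run is reported
# as FALSE instead of stopping the script.
pdf_created <- FALSE
if (nzchar(Sys.which("pdflatex"))) {
  pdf_created <- tryCatch({
    tools::texi2pdf("sweaveHousing.tex")
    file.exists("sweaveHousing.pdf")
  }, error = function(e) FALSE)
}

setwd(old_dir)
```

For your own document you would skip the writeLines() call and run the two steps in the directory that holds sweaveHousing.Rnw.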

Add R commands

To put R commands into a Sweave file, you need to define them inside an “R code chunk” that starts with the tag << >>= and ends with @. For example:

<<chunkName>>=
x <- rnorm(100)
x
@

Enter this into the editor on the line below where you wrote “Housing Prices”. Compile the document again and you'll notice this is displayed as R code and its output. To make inserting code chunks easier, RStudio has the Insert Chunk option in the Chunks menu at the top of the editor. Also notice that the background is a different color for code chunks. Test this out by entering another code chunk, naming it options, and then adding a comma after the chunk name. This will look like the following:

<<options, >>=

@

You'll notice the completion menu pops up to help you with chunk options. Let's use the echo option and set it to FALSE. Enter the following R code and click the compile button again.

<<options, echo=FALSE>>=
x <- rnorm(n=100, mean=100, sd=5)
x
@

This time notice that the output is there, but the original R commands are not shown. This is because the echo option determines whether the R code is displayed. Experiment some more with the eval and echo options. Both are set to TRUE by default (if you don't specify these options, as in our first chunk), but what happens if we set them both to FALSE? What about if you set results=hide as an option, but leave eval=TRUE (again, this is the default, so you don't need to define it)? Does the computation still run? It does! The results are just not shown.
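
One way to convince yourself of this is the small sketch below. The chunk names quietSetup, useIt, and skipped are made up for illustration. The first chunk hides its output, the second uses the value it computed, and the third is neither shown nor run.

```rnw
<<quietSetup, results=hide>>=
x <- rnorm(n=100, mean=100, sd=5)
x
@

<<useIt>>=
mean(x)
@

<<skipped, echo=FALSE, eval=FALSE>>=
x <- "this assignment never happens"
@
```

After compiling, the values of x should be missing from the PDF, while mean(x) still prints a number close to 100, which shows that the hidden chunk was evaluated.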

Add a plot

Let's add a plot. Remember to use the Insert Chunk feature. Name the chunk examplePlot and don't define any chunk options:

<<examplePlot>>=
x <- rnorm(100)
y <- rnorm(100)
plot(x,y)
@

Compile again and see what happens. Why isn't a plot produced? This is just how Sweave works: a chunk only puts its plot into the document when the fig=TRUE option is set. Add fig=TRUE to the chunk and compile again to produce a plot.
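
For reference, the edited chunk would look roughly like this (edit the existing chunk rather than adding a second one with the same name):

```rnw
<<examplePlot, fig=TRUE>>=
x <- rnorm(100)
y <- rnorm(100)
plot(x, y)
@
```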

Load Data

Now let's do some real work and load our data from homePrice.csv:

<<data>>=
homes <- read.csv("data/homePrice.csv")
summary(homes)
@

Why doesn't this work? Did you check your working directory to ensure you're inside the rstudioTraining directory? Even if you are, this still won't work. This is because Sweave runs in a separate process, so your current working directory and your global environment are not recognized. This ensures that your documents will be reproducible when you give them to others to work with. All you need to do is define the full path to the data, such as the following on my system:

<<data>>=
homes <- read.csv("~/Desktop/rstudioTraining/data/homePrice.csv")
summary(homes)
@
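
If the path is still wrong, the error from read.csv() can be hard to spot in the compile output. As a small sketch, a hypothetical helper (read_home_prices is not part of the training files) can expand the path, check that the file exists, and fail with a message that also reports the working directory the chunk actually ran in:

```r
# Hypothetical helper, not part of the tutorial files: expand "~", check that
# the file exists, then read it, so a bad path fails with a readable message.
read_home_prices <- function(path) {
  full_path <- path.expand(path)
  if (!file.exists(full_path)) {
    stop("Cannot find data file: ", full_path,
         " (working directory is ", getwd(), ")")
  }
  read.csv(full_path)
}

# Inside a chunk you would call it with the path from the example above:
# homes <- read_home_prices("~/Desktop/rstudioTraining/data/homePrice.csv")
```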

Add your analysis

Now add some parts of your analysis to produce plots, and break this up into separate chunks. Between chunks you can enter text (using LaTeX) to describe what is going on in your work. At the bottom of this document is the R code from homePrices.R that we used earlier.

Use RStudio's Features

Now that you have a more substantial document (a few pages long with various plots), you can utilize RStudio's various features for Sweave. If you are not able to get this document to work, don't worry: an example is provided in the sweave directory called homePrices.Rnw. You'll notice a few drawbacks of Sweave (that are handled more nicely in knitr), such as only one plot being allowed per chunk. Also, if you don't call print() on a ggplot2 plot, it won't be shown.
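
A chunk that works around both drawbacks might look like the sketch below. The chunk name ratingPlot is made up, and the chunk assumes homes was already loaded in an earlier chunk. It draws a single plot and calls print() on it explicitly.

```rnw
<<ratingPlot, fig=TRUE, echo=FALSE>>=
library(ggplot2)
p <- qplot(rating, price, data = homes)
print(p)
@
```

A second plot would go into its own chunk with its own name and fig=TRUE.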

Chunk Navigation

After compiling your document, you can use the chunk menu at the bottom to navigate between chunks.

Chunks act like the R editor

You'll notice that code completion (using tab) works inside chunks. Also, syntax highlighting shows your R code as if it were in a standard R script. You can also use the run command to test your commands in the console.

Error Navigation

If you haven't already seen it, RStudio also has an interface for handling errors in .Rnw files. Usually R will only provide a log file that you have to dig through to understand the problem.

SyncTeX

You can use Control + Click to navigate between your .Rnw and the resulting PDF document. Test this out even inside R chunks.

Cleaning Up

You'll notice a lot of leftover files inside the directory where your .Rnw file is saved (the sweave directory). These are the intermediary files that are created to produce the PDF. Files such as .aux, .log, etc. can safely be deleted after you are done working with your document. Make sure to always keep the .Rnw file, though, and usually you'll want the resulting .pdf file as well.
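
If you would rather not delete these by hand, a small helper can do it. This is a sketch with a hypothetical function name (clean_sweave_dir). By default it only removes the two extensions named above, and you can pass others through the exts argument once you know which leftovers your setup produces.

```r
# Hypothetical helper: delete intermediary files with the given extensions
# from a directory and return (invisibly) the paths that were removed.
# Files with any other extension, such as .Rnw and .pdf, are left alone.
clean_sweave_dir <- function(dir, exts = c("aux", "log")) {
  pattern <- paste0("\\.(", paste(exts, collapse = "|"), ")$")
  leftovers <- list.files(dir, pattern = pattern, full.names = TRUE)
  removed <- file.remove(leftovers)
  invisible(leftovers[removed])
}

# Example call (replace the path with your own sweave directory):
# clean_sweave_dir("path/to/sweave")
```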

# Home Prices In Mid-West

# load in the data via the Import Dataset feature in the workspace
# Change "rating" to be treated as a factor
homes <- read.csv("~/Desktop/rstudioTraining/data/homePrice.csv")

# view the variable names and get a quick summary of the data
names(homes)
summary(homes)
hist(homes$price)

class(homes$rating)
homes$rating <- as.factor(homes[,3])
class(homes$rating)

# plot price based on the rating
plot(homes$rating, homes$price)

# make a more advanced plot with layers
homes$rating <- as.integer(homes[,3])
class(homes$rating)
ratingMeans <- rep(0,5)
for (i in 1:5) {
  ratingMeans[i] <- mean(subset(homes, rating == i)$price)
}

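# suppress the default axes here ("labels" is not a plot() parameter);
# they are drawn with custom labels by the axis() calls below
plot(homes$rating, homes$price, main = "Breakdown of Price by Rating",
     xlab = "Rating", ylab = "Price", xaxt = "n", yaxt = "n")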
axis(1)
axis(2, at = c(0, 2e5, 4e5, 6e5, 8e5), labels = c("0", "200k", "400k", "600k", "800k"), las=2)

points(1:5, ratingMeans, col="red", pch=19)
text(1:4, ratingMeans[1:4], round(ratingMeans[1:4], -3), pos=4)
text(5, ratingMeans[5], round(ratingMeans[5], -3), pos=2)

# install the ggplot2 package from the packages pane
library(ggplot2)


# plot price based on the rating
qplot(rating, price, color = age, data = homes)


# make a more advanced plot with nice formatting
p <- qplot(rating, price, color = age, data = homes,
           main = "Breakdown of Price by Rating", xlab="Rating")
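# opts() and theme_text() were removed from ggplot2; theme() and
# element_text() are their replacements
p <- p + theme(plot.title = element_text(size = 23))
p <- p + theme(axis.title.x = element_text(size = 16, vjust = -0.05))
p <- p + theme(axis.title.y = element_text(size = 16))
p <- p + theme(legend.title = element_text(size = 14))
# labels need matching breaks; these break positions are an assumption
# taken from the base-graphics axis above
p <- p + scale_y_continuous(name = "Price", breaks = c(2e5, 4e5, 6e5, 8e5),
                            labels = c("200k", "400k", "600k", "800k"))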
p <- p + geom_point()

print(p)

# use the extract function feature to wrap the formatting commands into a function called formatPlot
# put it into a new file called formatPlot.R and source the file
source("~/Desktop/rstudioTraining/R/formatPlot.R")

# use the new function (use tab completion)
p <- qplot(age, price, color = rating, data = homes,
           main = "Breakdown of Prices by Age and rating", xlab="Age") + geom_smooth()
formatPlot(p)

p <- qplot(state, price, color = rating, data = homes,
           main = "Breakdown of Prices by State and Rating", xlab="State")
formatPlot(p)