Reading data into R

## Settings for RMarkdown http://yihui.name/knitr/options#chunk_options
library(knitr)  # opts_chunk is provided by the knitr package
opts_chunk$set(comment = "", warning = FALSE, message = FALSE, tidy = FALSE, 
    echo = TRUE, fig.width = 5, fig.height = 5)
options(width = 116, scipen = 10)

setwd("~/Documents/HSPH/useR.at.HSPH/")

Example files

csv: http://academic.cengage.com/resource_uploads/downloads/0538733497_245020.zip (Rosner zip file)
space-separated: http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
tab-separated: http://academic.cengage.com/resource_uploads/downloads/0495384968_88615.zip (Rosner zip file)
Excel file: http://academic.cengage.com/resource_uploads/downloads/0538733497_245022.zip (Rosner zip file)
SAS native file: http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat
SAS xport file: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
Stata native file: http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
HTML table: http://www.drugs.com/top200_2003.html

Configuration of RStudio

Configure RStudio following instructions in: http://www.slideshare.net/kaz_yos/install-and-configure-r-and-rstudio

Create a dedicated R study group folder, set it as the default working directory in RStudio, and put everything related in it. This will save you confusion if you are not sure what a working directory is.
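
To make the idea of a working directory concrete, here is a minimal sketch. It uses a throwaway folder under tempdir() in place of a real study group folder; the folder name "r-study-group" and the object name study.dir are made up for illustration.

```r
## Show the current working directory
getwd()

## Create a hypothetical study folder (under tempdir() so nothing permanent is written)
study.dir <- file.path(tempdir(), "r-study-group")
dir.create(study.dir, showWarnings = FALSE)

## Make it the working directory; setwd() returns the previous one
old.dir <- setwd(study.dir)

## Files are now written and read relative to the study folder
write.csv(data.frame(x = 1:2), "file.csv", row.names = FALSE)
list.files()
csv.data <- read.csv("file.csv")

## Go back to where we started
setwd(old.dir)
```

Once the working directory is set, a bare file name such as "file.csv" is enough; no full path is needed.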

CSV, space-separated files, and tab-separated files: read.___() functions in the default installation

You can also use RStudio menus: Workspaces - Import Dataset - From Text Files…

## Read CSV file (header assumed), then put that into "csv.data" data object (any name is ok).
csv.data <- read.csv("file.csv")

## This gives you a dialogue to choose a file, then the file is passed to read.csv() function
csv.data <- read.csv(file.choose())

## Space-separated (no header assumed)
ssf.data <- read.table("file.dat") # No header
ssf.data <- read.table("file.dat", header = TRUE) # With header

## tab-separated (header assumed)
tsv.data <- read.delim("file.tsv")
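
If you do not have the example files at hand, the three readers above can be tried on files you create yourself. This is only a sketch: the data frame example.df and its values are made up, and the files are written to temporary locations.

```r
## Small made-up data set (illustrative values only)
example.df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

## Write it out in three text formats
csv.path <- tempfile(fileext = ".csv")
dat.path <- tempfile(fileext = ".dat")
tsv.path <- tempfile(fileext = ".tsv")
write.csv(example.df, csv.path, row.names = FALSE)
write.table(example.df, dat.path, row.names = FALSE, col.names = FALSE)  # no header line
write.table(example.df, tsv.path, sep = "\t", row.names = FALSE, quote = FALSE)

## Read each file back
csv.data <- read.csv(csv.path)      # header assumed
ssf.data <- read.table(dat.path)    # no header: columns are named V1, V2
tsv.data <- read.delim(tsv.path)    # header assumed

## Check what was read
str(csv.data)
names(ssf.data)
```

Running str() on the result is a quick way to confirm that the header and the column types were read as intended.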

Excel file (Mac OS X): gdata package is the simplest. Use read.xls() function.

install.packages("gdata", dependencies = TRUE)      # If you have not installed it before
library(gdata)

excel.data <- read.xls("file.xls")

Excel file (Windows): xlsx package is relatively easy.

install.packages("xlsx", dependencies = TRUE)      # If you have not installed it before
library(xlsx)

## You need to specify the sheetIndex (sheet number)
excel.data <- read.xlsx("file.xlsx", sheetIndex = 1)

SAS files: sas7bdat package's read.sas7bdat() for native files, foreign package's read.xport() for xport files.

## Native files
install.packages("sas7bdat", dependencies = TRUE)      # If you have not installed it before
library(sas7bdat)

sas7bdat.data <- read.sas7bdat("file.sas7bdat")

## xport files
install.packages("foreign", dependencies = TRUE)      # If you have not installed it before
library(foreign)

xport.data <- read.xport("file.xpt")

Stata native files: foreign package's read.dta() function.

install.packages("foreign", dependencies = TRUE)      # If you have not installed it before
library(foreign)                          # If you have not loaded the package in the current session.

dta.data <- read.dta("file.dta")
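
To try read.dta() without downloading anything, one option is a round trip through the foreign package's own write.dta(). This is a sketch under assumptions: example.df and its values are made up, and the file goes to a temporary path.

```r
library(foreign)

## Small made-up data set (illustrative values only)
example.df <- data.frame(id = 1:3, weight = c(60.5, 72.0, 81.3))

## Write a Stata file to a temporary path, then read it back
dta.path <- tempfile(fileext = ".dta")
write.dta(example.df, dta.path)
dta.data <- read.dta(dta.path)

## Check what was read
str(dta.data)
```

Note that write.dta() writes an older Stata file format, so this only demonstrates the reading step; files saved by recent Stata versions may need a different reader.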

Fixed-width text files: read.fwf() in the default installation.

## Single line per observation
fwf.data <- read.fwf("file.txt", widths = c(3, 5, ...))

## Multiple lines per observation
fwf.data <- read.fwf("file.txt", widths = list(c(3, 5, ...), c(5, 7, ...)))
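
As a concrete single-line case, here is a minimal sketch with a made-up file layout: 3 characters for an id, 6 for a name, and 2 for an age. The column names and values are invented for illustration.

```r
## Write a small fixed-width file to a temporary path
fwf.path <- tempfile(fileext = ".txt")
writeLines(c("001Alice 34",
             "002Bob   27"), fwf.path)

## widths gives the number of characters in each field, left to right
fwf.data <- read.fwf(fwf.path,
                     widths = c(3, 6, 2),
                     col.names = c("id", "name", "age"),
                     strip.white = TRUE)  # drop the padding spaces around fields

## Check what was read
str(fwf.data)
```

The widths must add up to the layout of each line; if a column looks shifted or truncated, the widths are the first thing to check.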

HTML tables: XML package's readHTMLTable()

install.packages("XML", dependencies = TRUE)      # If you have not installed it before
library(XML)

html.table.data <- readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1)

For a detailed example of HTML table extraction: http://rpubs.com/kaz_yos/1273
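
Web pages change or disappear, so it can help to practice on a table you control. The sketch below parses a small made-up HTML string instead of a URL; it assumes the XML package is installed, and the table contents and the name html.text are invented for illustration.

```r
library(XML)  # assumes the XML package is installed

## A tiny made-up HTML page containing one table
html.text <- paste0(
  "<html><body><table>",
  "<thead><tr><th>drug</th><th>rank</th></tr></thead>",
  "<tbody><tr><td>DrugA</td><td>1</td></tr>",
  "<tr><td>DrugB</td><td>2</td></tr></tbody>",
  "</table></body></html>"
)

## Parse the string as HTML, then extract the first table
doc <- htmlParse(html.text, asText = TRUE)
html.table.data <- readHTMLTable(doc, which = 1)

## Check what was read; columns typically need converting to numeric afterwards
str(html.table.data)
```

On a real page, the which argument picks the table by its position, so it usually takes some trial and error to find the one you want.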