Moving Around in R

Matthias Bannert (KOF, ETH Zurich)

Excursion: Introduction to RStudio

What we will see:

console and script window: the 4 basic panes
auto-suggest, code highlighting
projects and working directories (use getwd())

Basic Assignment

# assign a value; '=' is possible, but '<-' is R-ish
a <- 1
is.vector(a)

[1] TRUE

# a vector with multiple elements
# watch out: 'c' is a base R function!
b <- c(1,2,3)
# careful: that makes 'c' a bad name for a custom object, so we skip to 'd'
d <- 1:10
d

 [1]  1  2  3  4  5  6  7  8  9 10

Basic R classes

vector
matrix (columns have same length AND same class)
data.frame (columns have the same length)
list (elements can have arbitrary length and classes, lists can also be nested)
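A minimal sketch of the four structures listed above. All object names and values are made up for illustration and are not part of the course material.

```r
# Illustrative objects; names and values are assumptions for this sketch.
vec <- c(1.5, 2, 3.25)                       # vector: all elements share one type
mat <- matrix(1:6, nrow = 2, ncol = 3)       # matrix: one type, two dimensions
dat <- data.frame(id = 1:3,
                  name = c("a", "b", "c"))   # data.frame: columns may differ in type
lst <- list(vec, mat, dat,
            list("nested", TRUE))            # list: anything, including other lists

is.vector(vec)       # TRUE
is.matrix(mat)       # TRUE
is.data.frame(dat)   # TRUE
is.list(lst)         # TRUE
dim(mat)             # 2 3
length(lst)          # 4
```

Note that a data.frame is itself a list of equally long columns, so is.list(dat) is also TRUE.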

Exploring data in R

# what's in my R session?
ls()
# load the built-in swiss dataset and look at its first few rows
data(swiss)
head(swiss) # what might tail() do?
# what type of data am I dealing with?
str(swiss)

Basic Indexing

# get 2nd element of a vector
d[2]

[1] 2

# get 2nd line (row) of a matrix or data.frame
swiss[2,] # guess what second column would be ?

         Fertility Agriculture Examination Education Catholic
Delemont      83.1        45.1           6         9    84.84
         Infant.Mortality
Delemont             22.2

# all but first element: negative indexing
d[-1]

[1]  2  3  4  5  6  7  8  9 10

TASK 1

Assign the 2nd and 3rd column of swiss to an R object called sec_third.
Find out the class of the newly created object.
Concatenate both columns of sec_third into a single R object called sng.

Name Based Indexing

head(swiss[,c("Education","Catholic")])

             Education Catholic
Courtelary          12     9.96
Delemont             9    84.84
Franches-Mnt         5    93.40
Moutier              7    33.77
Neuveville          15     5.16
Porrentruy           7    90.57

# conveniently extract single vectors
swiss$Fertility[1:3]

[1] 80.2 83.1 92.5

Missing Values (NA)

na <- c(1,2,3,4,NA,6)
# sum(na) # NA
sum(na,na.rm=TRUE) # 16; spell out TRUE, since T can be overwritten

[1] 16

# set the first element to NA
is.na(na) <- 1
# replace both NAs (positions 1 and 5) with values
na[is.na(na)] <- c(1,5)
na

[1] 1 2 3 4 5 6
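A short sketch of how missing values can be located and counted before they are handled. The vector x is a made-up example, not course data.

```r
# Hypothetical example vector with one missing value.
x <- c(1, 2, 3, 4, NA, 6)

is.na(x)                 # logical vector, TRUE only at position 5
sum(is.na(x))            # number of missing values: 1
which(is.na(x))          # position of the missing value: 5

mean(x)                  # NA, because one element is missing
mean(x, na.rm = TRUE)    # 3.2, computed from the 5 remaining values

# Comparing with NA yields NA for every element,
# so always test with is.na() rather than '== NA'.
x == NA
```

Most summary functions such as sum(), mean(), min() and max() accept na.rm = TRUE in the same way.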

Data from the Outside

native R binaries: .RData (load())
comma separated values: .csv (separator often: “;”) (read.table(), read.csv(), read.csv2())
third party formats: .xlsx, .xls, .dta (Stata), SPSS, …
databases (RPostgreSQL, ROracle, RMySQL, …)

-> Hint: Most read functions also accept a URL, so data can be read directly from a remote server.
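A minimal sketch of the two common CSV dialects, written to temporary files and read back. The data set toy and the file names are assumptions made for this example. read.csv() expects "," as separator and "." as decimal mark, while read.csv2() expects ";" and ",".

```r
# Small illustrative data set; names and values are made up.
toy <- data.frame(id = 1:3, score = c(1.5, 2.25, 3))

# Temporary files, so nothing in the working directory is touched.
f_comma <- tempfile(fileext = ".csv")
f_semi  <- tempfile(fileext = ".csv")

write.csv(toy,  f_comma, row.names = FALSE)   # separator ",", decimal "."
write.csv2(toy, f_semi,  row.names = FALSE)   # separator ";", decimal ","

from_comma <- read.csv(f_comma)    # matching reader for the "," dialect
from_semi  <- read.csv2(f_semi)    # matching reader for the ";" dialect

all.equal(from_comma, toy)   # TRUE
all.equal(from_semi, toy)    # TRUE

readLines(f_semi)            # look at the raw text of the ";" file
```

Reading a file with the wrong dialect usually does not fail. It tends to produce a single column or numbers read as text, so checking the result with str() is a good habit.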

Data from the Outside

# let's save our work
save(d,swiss,file="session1.RData")
# clear the entire memory
rm(list=ls())
ls()

character(0)

# load 
load("session1.RData")
ls()

[1] "d"     "swiss"

Read CSV format (more live examples)

# find our working directory
getwd()

[1] "/Users/mbannert/Phd/teaching/exploring_stats_with_R/course_slides"

# read narcissistic personality inventory into R
# previously downloaded from http://personality-testing.info/_rawdata/
# the file is comma separated, so read.csv() (sep = ",", dec = ".") fits
npi <- read.csv("../data/data-session-1.csv")
# how would you see what you have got?

Subsetting

# using indexing to subset
children <- npi[npi$age < 18,]
# use subset to subset
children2 <- subset(npi,age < 18)
identical(children,children2)

[1] TRUE
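The two approaches above only agree when the condition contains no missing values. A small sketch with a made-up data.frame (toy is an assumption, not the npi data) shows the difference:

```r
# Hypothetical data with one missing age.
toy <- data.frame(id = 1:4, age = c(15, 30, NA, 17))

by_index  <- toy[toy$age < 18, ]     # NA in the condition yields a row of NAs
by_subset <- subset(toy, age < 18)   # subset() drops rows where the condition is NA

nrow(by_index)                       # 3
nrow(by_subset)                      # 2
identical(by_index, by_subset)       # FALSE

# which() keeps only the positions where the condition is TRUE,
# so indexing with it also drops the NA case.
by_which <- toy[which(toy$age < 18), ]
by_which$id                          # 1 4
```

Checking sum(is.na(npi$age)) before subsetting tells you whether this matters for your data.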

TASK 2

Create a new RStudio project called 'tasks' and create an R Markdown file task-1.Rmd. Use this file to document what you are doing and to present your results.
Name 3 potentially useful packages for psychological measurement that don't ship with a standard R installation.
Create a matrix called m with 4 rows and 3 columns. Create a data.frame d with 3 rows and 4 columns. Find out what the function t() can do for you.
Use the example dataset swiss. a) What's the average education? Explain the measurement. b) Display the last 10 rows of the entire dataset.
What's the minimum value of Agriculture? Which province has it?