Module 1 Lesson 3: Intro to R Functions and Preview of R Data Types

Preview of R Functions

In the following example, we first define a function oddcount() in the text editor. We then call the function on a couple of test cases. The function is supposed to count the number of odd numbers in its argument vector.

# Create the function
oddcount <- function(x) {
    k <- 0  # running count of odd elements
    for (n in x) {
        if (n %% 2 == 1) k <- k + 1
    }
    return(k)
}
#test the function
oddcount(c(1,3,5))

## [1] 3

oddcount(c(1,2,3,7,9))

## [1] 4

We first told R to define a function oddcount() of one argument x.
The left brace marks the start of the function body, and the rightmost brace marks its end.
Note that arguments to R functions are effectively read-only: the function works on a local copy of each argument, and changes to that copy don’t affect the original variable. To change the original variable, reassign the function’s return value to it.
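
The copy behavior can be seen in a minimal sketch. The helper add_one() and the vector v are hypothetical names used only for illustration; they are not part of the lesson.

```r
# Hypothetical helper (not from the lesson): adds 1 to every element of x
add_one <- function(x) {
    x <- x + 1    # changes only the function's local copy of x
    return(x)
}

v <- c(1, 2, 3)
add_one(v)        # returns the modified copy: 2 3 4
v                 # the original is unchanged: 1 2 3

v <- add_one(v)   # reassign the return value to update v
v                 # now 2 3 4
```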

Preview of Some R Data Structures

Basic Data Types

Basic data types are numeric, character and logical.

# Numeric object: How old are you?
my_age <- 18
# Character object: What's your name?
my_name <- "Nicolas"
# Logical object: Are you a data scientist?
# (yes/no) <=> (TRUE/FALSE)
is_datascientist <- TRUE

Note that a character string can be written with either double (") or single (') quotes. Using double quotes makes it easy to include an apostrophe inside the string:

"My Friend's name is Jerome"

## [1] "My Friend's name is Jerome"

It’s possible to use the function class() to see what type a variable is:

class(my_age)

## [1] "numeric"

class(my_name)

## [1] "character"

You can also use the functions is.numeric(), is.character(), is.logical() to check whether a variable is numeric, character or logical, respectively. For instance:

is.numeric(my_age)

## [1] TRUE

is.numeric(my_name)

## [1] FALSE

If you want to change the type of a variable to another one, use the as.* functions, including: as.numeric(), as.character(), as.logical(), etc.

my_age

## [1] 18

# Convert my_age to a character variable
as.character(my_age)

## [1] "18"

Note that converting a character string that does not represent a number (for example, "Nicolas") to numeric will output NA (for not available), along with a warning, because R doesn’t know how to interpret such a string as a number.
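
As a small illustration of conversion to numeric, the sketch below converts one string that looks like a number and one that does not. The variable name bad is an assumption made for this example only.

```r
# A string that looks like a number converts cleanly
as.numeric("18")    # 18

# A string that is not a number becomes NA, and R issues a warning.
# suppressWarnings() only hides the warning message in this sketch.
bad <- suppressWarnings(as.numeric("Nicolas"))
bad                 # NA
is.na(bad)          # TRUE
```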

Vectors

The vector type is the R workhorse.
We have already seen an example of a vector in the previous slides: c(1,3,5), the argument passed to oddcount().
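
A brief sketch of a few common vector operations follows. The name x is only illustrative, and the values are the second oddcount() test case from earlier in the lesson.

```r
x <- c(1, 2, 3, 7, 9)   # c() concatenates values into a vector
length(x)               # number of elements: 5
x[2]                    # second element (indexing starts at 1): 2
x %% 2 == 1             # element-wise test: TRUE FALSE TRUE TRUE TRUE
sum(x %% 2 == 1)        # counts the TRUE values: 4, matching oddcount()
```

Because most R operators work element by element on whole vectors, the last line gives the same count as the explicit loop in oddcount().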

Matrices

A matrix corresponds to the mathematical concept, i.e., a rectangular array.
It is a vector with two attributes added: the number of rows and the number of columns.
Here is example code that creates a matrix:

    m<-rbind(c(1,4),c(2,2))
    m

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    2

    m%*% c(1,1)

##      [,1]
## [1,]    5
## [2,]    4

We used the rbind() function to build a matrix from two vectors, storing the result in m.
We then print the matrix to check that we produced the intended result.
We use the %*% operator to perform matrix multiplication, here multiplying m by the vector (1,1).
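
The idea that a matrix is a vector with row and column attributes can be checked directly. The following is a sketch using the same matrix m; the comparison with rowSums() is added only for illustration.

```r
m <- rbind(c(1, 4), c(2, 2))

dim(m)          # the added attribute: 2 rows, 2 columns
as.vector(m)    # the underlying vector, stored column by column: 1 2 4 2
m[1, 2]         # element in row 1, column 2: 4

# Multiplying by a vector of ones adds up each row,
# so the product agrees with rowSums(m): 5 and 4
m %*% c(1, 1)
rowSums(m)
```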

Lists

An R list is like a container whose contents can be items of diverse data types.
A common usage is to package the return values of elaborate statistical functions. For example, the lm() function performs regression analysis, computing not only the estimated coefficients but also residuals, hypothesis tests, and so on. These are packaged into a list, thus enabling a single return value.
List members are indicated in R with dollar signs. Thus, x$u is the u component in the list x.
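
A minimal sketch of a list with components of different types is shown below. The list x and its components u and v are hypothetical and serve only to illustrate the dollar-sign notation.

```r
# Hypothetical list holding a number and a character string
x <- list(u = 2, v = "abc")

x$u         # the u component: 2
x$v         # the v component: "abc"
x[["u"]]    # double brackets access the same component by name
names(x)    # component names: "u" "v"
```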

Data Frames

A typical data set contains data of diverse types, e.g. numerical and character string.
A data frame is technically a list, with each component being a vector corresponding to a column in our data “matrix.”
The designers of R have set things up so that many matrix operations can also be applied to data frames.
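
The sketch below shows both views of a data frame: as a list of columns and as a matrix-like table. The data frame d and its contents are made up for illustration.

```r
# Hypothetical data frame with one character and one numeric column
d <- data.frame(kids = c("Jack", "Jill"), ages = c(12, 10))

is.list(d)   # TRUE: a data frame is a list of columns
d$ages       # list-style access to a column: 12 10
d[1, 2]      # matrix-style access to row 1, column 2: 12
dim(d)       # rows and columns, as for a matrix: 2 2
```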

Analysis Preview: Regression Analysis

There won’t be much actual programming in this example, but it will illustrate usage of some of the data types from the last section, introduce R’s style of object-oriented programming, and serve as the basis for several of our programming examples in subsequent lessons. The example uses the following exam scores:

Midterm	Finals	Quiz
2.0	3.3	4.0
3.3	2.0	3.7
4.0	4.3	4.0
2.3	0.0	3.3
2.3	1.0	3.3
3.3	3.7	4.0

Midterm<-c(2.0,3.3,4.0,2.3,2.3,3.3)
Finals<-c(3.3,2.0,4.3,0.0,1.0,3.7)
Quiz<-c(4.0,3.7,4.0,3.3,3.3,4.0)
score<-data.frame(Midterm, Finals, Quiz)
score

Now let us try to predict the final exam score from the midterm exam score:

lma<-lm(score$Finals~score$Midterm)

The lm() (linear model) function call instructs R to fit the prediction equation:

$\text{predicted Finals} = \beta_0 + \beta_1 \cdot \text{Midterm}$ using least squares.

The results are returned in the object we’ve named lma of class “lm”. We can see the various components of that object by calling attributes():

attributes(lma)

## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"

For instance, the estimated values of the $\beta_i$ are stored in lma$coefficients. As usual, we can print them by typing the name, and we can save some typing by abbreviating the component name, since $ accepts partial names for list components:

lma$coef

##   (Intercept) score$Midterm 
##     -1.293886      1.282751
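
These coefficients can be cross-checked by hand. For a single predictor, the least-squares slope equals the covariance of the two variables divided by the variance of the predictor, and the intercept follows from the two means. The sketch below assumes the names b0, b1, and fit, which are not part of the lesson.

```r
Midterm <- c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3)
Finals  <- c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7)

# Closed-form least-squares estimates for one predictor
b1 <- cov(Midterm, Finals) / var(Midterm)   # slope
b0 <- mean(Finals) - b1 * mean(Midterm)     # intercept
c(intercept = b0, slope = b1)

# Compare with lm(); unname() drops the coefficient labels
fit <- lm(Finals ~ Midterm)
all.equal(unname(coef(fit)), c(b0, b1))     # TRUE
```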

Since lma$coefficients is a vector, printing it is simple. But consider what happens when we print the object lma itself:

lma

## 
## Call:
## lm(formula = score$Finals ~ score$Midterm)
## 
## Coefficients:
##   (Intercept)  score$Midterm  
##        -1.294          1.283

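Printing lma showed only the call and the coefficients, because print() is a generic function that hands the work to print.lm() for objects of class “lm”. We can get a more detailed printout of the contents of lma by calling summary(), another generic function, which in this case triggers a call to summary.lm() behind the scenes: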

summary(lma)

## 
## Call:
## lm(formula = score$Finals ~ score$Midterm)
## 
## Residuals:
##       1       2       3       4       5       6 
##  2.0284 -0.9392  0.4629 -1.6564 -0.6564  0.7608 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -1.2939     2.5308  -0.511    0.636
## score$Midterm   1.2828     0.8567   1.497    0.209
## 
## Residual standard error: 1.497 on 4 degrees of freedom
## Multiple R-squared:  0.3592, Adjusted R-squared:  0.199 
## F-statistic: 2.242 on 1 and 4 DF,  p-value: 0.2087
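
The way a generic function such as print() or summary() picks a class-specific method can be sketched with a toy class. The class name "grade", the object g, and the method print.grade() are all hypothetical; they only mimic how print.lm() and summary.lm() are selected for objects of class "lm".

```r
# Hypothetical object: a list tagged with the class "grade"
g <- structure(list(student = "A", score = 3.3), class = "grade")

# A method for the generic print(), named <generic>.<class>
print.grade <- function(x, ...) {
    cat("Student", x$student, "scored", x$score, "\n")
    invisible(x)
}

class(g)    # "grade"
print(g)    # print() dispatches to print.grade()
```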

Session Data

As you proceed through an interactive R session, R will record the commands you submit.
As long as you answer yes to the question “Save workspace image?” when you quit the session, R will save all the objects you created in that session and restore them in your next session. You thus do not have to recreate the objects from scratch if you wish to continue earlier work.
The saved workspace file is named .RData, and is located either in the directory from which you invoked this R session (Linux) or in the R installation directory (Windows).
You can also save the image yourself, to whatever file you wish, by calling save.image(). You can restore the workspace from that file later on by calling load().
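
A minimal sketch of saving and restoring a workspace by hand follows. It assumes the code is run at the top level of an R session, since save.image() saves the objects in the global workspace. The file location f and the object my_value are illustrative only.

```r
# Illustrative file location; any writable path would do
f <- tempfile(fileext = ".RData")

my_value <- 42
save.image(file = f)   # write the objects in the global workspace to f

rm(my_value)           # simulate starting over without the object
load(f)                # restore the saved objects
my_value               # 42 again

unlink(f)              # remove the temporary file
```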

Stat 312 Statistical Computing

Roel Ceballos
