In the following example, we first define a function oddcount() in the text editor. We then call the function on a couple of test cases. The function is supposed to count the number of odd numbers in its argument vector.
# create the function
oddcount <- function(x) {
  k <- 0  # running count of odd elements
  for (n in x) {
    if (n %% 2 == 1) k <- k + 1
  }
  return(k)
}
# test the function
oddcount(c(1,3,5))
## [1] 3
## [1] 4
We first told R to define a function oddcount() of one argument x.
The opening brace marks the start of the function body, and the final closing brace marks its end.
Note that arguments to R functions are effectively read-only: the function works on a local copy of the argument, and changes to that copy don’t affect the original variable. Changes to the original variable are therefore typically made by reassigning it to the return value of the function.
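A minimal sketch of this behavior, assuming a hypothetical helper add_one() that is not part of the original example:

```r
# add_one() is a hypothetical helper used only for illustration
add_one <- function(v) {
  v[1] <- v[1] + 1  # modifies the local copy only
  return(v)
}

x <- c(10, 20, 30)
y <- add_one(x)                           # the change is visible only in the return value
unchanged <- identical(x, c(10, 20, 30))  # TRUE: x itself was not modified

x <- add_one(x)                           # reassign the return value to update x
```

After the last line, x holds the modified vector because it was explicitly reassigned, not because the function changed it in place.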
Basic data types are numeric, character and logical.
# Numeric object: How old are you?
my_age <- 18
# Character object: What's your name?
my_name <- "Nicolas"
# Logical object: Are you a data scientist?
# (yes/no) <=> (TRUE/FALSE)
is_datascientist <- TRUE
Note that a character vector can be created using double (") or single (') quotes.
## [1] "My Friend's name is Jerome"
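The code that produced the output above is not shown, so the following is only a sketch of how the two quoting styles behave. The variable names are assumptions chosen for illustration.

```r
# Both quote styles create the same character value
name_double <- "Nicolas"
name_single <- 'Nicolas'
identical(name_double, name_single)

# An apostrophe inside a double-quoted string needs no escaping
friend <- "My Friend's name is Jerome"
# Inside single quotes, the apostrophe must be escaped with a backslash
friend_escaped <- 'My Friend\'s name is Jerome'
print(friend)
```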
It’s possible to use the function class() to see what type a variable is:
## [1] "numeric"
## [1] "character"
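The two results above are what class() reports for a numeric and a character variable. A minimal sketch, assuming the variables defined earlier in this section:

```r
my_age <- 18
my_name <- "Nicolas"
is_datascientist <- TRUE

class(my_age)            # numeric
class(my_name)           # character
class(is_datascientist)  # logical
```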
You can also use the functions is.numeric(), is.character(), is.logical() to check whether a variable is numeric, character or logical, respectively. For instance:
## [1] TRUE
## [1] FALSE
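Calls along the following lines would give a TRUE followed by a FALSE, as shown above. This is a sketch that reuses the earlier variables; the exact calls behind the original output are an assumption.

```r
my_age <- 18
my_name <- "Nicolas"

is.numeric(my_age)     # TRUE: my_age holds a number
is.numeric(my_name)    # FALSE: my_name holds a character string
is.character(my_name)  # TRUE
```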
If you want to change the type of a variable to another one, use the as.* functions, including: as.numeric(), as.character(), as.logical(), etc.
## [1] 18
## [1] "18"
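A sketch of a conversion consistent with the output above, assuming the my_age variable from earlier:

```r
my_age <- 18
my_age                        # the numeric value 18

age_chr <- as.character(my_age)
age_chr                       # the character string "18"

as.numeric(age_chr)           # back to the numeric value 18
```

Note the quotes in the printed character value: they are the quickest visual cue that the type has changed.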
Note that converting a character string that does not represent a number to numeric will output NA (for “not available”), because R doesn’t know how to convert such a character value to a numeric one.
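The following sketch contrasts a string that can be parsed as a number with one that cannot. It reuses the my_name value from earlier; the other variable names are illustrative.

```r
my_name <- "Nicolas"

ok <- as.numeric("18")        # the string represents a number, so this works
bad <- as.numeric(my_name)    # yields NA and emits a coercion warning

is.na(ok)                     # FALSE
is.na(bad)                    # TRUE
```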
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    2
##      [,1]
## [1,]    5
## [2,]    4
An R list is like a container whose contents can be items of diverse data types.
A common usage is to package the return values of elaborate statistical functions. For example, the lm() function performs regression analysis, computing not only the estimated coefficients but also residuals, hypothesis tests, and so on. These are packaged into a list, thus enabling a single return value.
List components are accessed in R with dollar signs. Thus, x$u is the u component of the list x.
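A small sketch of building a list and accessing its components. The list contents here are made up for illustration.

```r
# A hypothetical list holding a number, a string, and a logical vector
x <- list(u = 2, v = "abc", w = c(TRUE, FALSE))

x$u         # the u component, a number
x[["v"]]    # double brackets with a name are an equivalent way to access v
names(x)    # the component names: "u" "v" "w"
```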
There won’t be much actual programming in this example, but it will illustrate the usage of some of the data types from the last section, introduce R’s style of object-oriented programming, and serve as the basis for several of our programming examples in subsequent lessons.
| Midterm | Finals | Quiz |
|---|---|---|
| 2.0 | 3.3 | 4.0 |
| 3.3 | 2.0 | 3.7 |
| 4.0 | 4.3 | 4.0 |
| 2.3 | 0.0 | 3.3 |
| 2.3 | 1.0 | 3.3 |
| 3.3 | 3.7 | 4.0 |
Midterm <- c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3)
Finals <- c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7)
Quiz <- c(4.0, 3.7, 4.0, 3.3, 3.3, 4.0)
score <- data.frame(Midterm, Finals, Quiz)
score
Now let us try to predict the final exam score from the midterm exam score:
lma <- lm(score$Finals ~ score$Midterm)
The lm() (linear model) function call instructs R to fit the prediction equation:
\(\text{predicted Finals} = \beta_0 + \beta_1 \cdot \text{Midterm}\) using least squares.
The results are returned in the object we’ve named lma of class “lm”. We can see the various components of that object by calling attributes():
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"
##
## $class
## [1] "lm"
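The output above can be reproduced with a self-contained sketch along these lines. The model formula is taken from the Call shown in the printed results; everything else reuses the score data from the table.

```r
Midterm <- c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3)
Finals <- c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7)
score <- data.frame(Midterm, Finals)

# Fit Finals as a linear function of Midterm by least squares
lma <- lm(score$Finals ~ score$Midterm)

attributes(lma)  # lists the component names and the class
names(lma)       # just the component names
class(lma)       # "lm"
```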
For instance, the estimated values of the \(\beta_i\) are stored in lma$coefficients. As usual, we can print them by typing the name, and we can save some typing by abbreviating the component name:
##   (Intercept) score$Midterm
##     -1.293886      1.282751
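A sketch of the access patterns described here, refitting the model so the block runs on its own. The abbreviation lma$coef is an assumption about what the original typed; it works because `$` on a list accepts a unique prefix of a component name.

```r
Midterm <- c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3)
Finals <- c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7)
score <- data.frame(Midterm, Finals)
lma <- lm(score$Finals ~ score$Midterm)

lma$coefficients  # full component name
lma$coef          # abbreviated name, resolved by partial matching
coef(lma)         # extractor function returning the same vector
```

Partial matching is convenient at the console, but the full name or coef() is the safer choice in scripts, since an abbreviation can become ambiguous if the object gains components.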
Since lma$coefficients is a vector, printing it is simple. But consider what happens when we print the object lma itself:
##
## Call:
## lm(formula = score$Finals ~ score$Midterm)
##
## Coefficients:
##   (Intercept)  score$Midterm
##        -1.294          1.283
We can get a more detailed printout of the contents of lma by calling summary(), another generic function, which in this case triggers a call to summary.lm() behind the scenes:
##
## Call:
## lm(formula = score$Finals ~ score$Midterm)
##
## Residuals:
##       1       2       3       4       5       6
##  2.0284 -0.9392  0.4629 -1.6564 -0.6564  0.7608
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -1.2939     2.5308  -0.511    0.636
## score$Midterm   1.2828     0.8567   1.497    0.209
##
## Residual standard error: 1.497 on 4 degrees of freedom
## Multiple R-squared: 0.3592, Adjusted R-squared: 0.199
## F-statistic: 2.242 on 1 and 4 DF, p-value: 0.2087
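The following sketch shows the dispatch described above and how to pull individual numbers out of the summary instead of reading them off the printout. The model is refitted so the block runs on its own; the variable names s and method are illustrative.

```r
Midterm <- c(2.0, 3.3, 4.0, 2.3, 2.3, 3.3)
Finals <- c(3.3, 2.0, 4.3, 0.0, 1.0, 3.7)
score <- data.frame(Midterm, Finals)
lma <- lm(score$Finals ~ score$Midterm)

s <- summary(lma)  # dispatches to summary.lm() because class(lma) is "lm"
class(s)           # "summary.lm"

s$r.squared        # multiple R-squared
s$coefficients     # matrix of estimates, standard errors, t values, p-values

# The method that the generic summary() selects for class "lm"
method <- utils::getS3method("summary", "lm")
is.function(method)
```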
As you proceed through an interactive R session, R will record the commands you submit.
And as long as you answer yes to the question “Save workspace image?” put to you when you quit the session, R will save all the objects you created in that session and restore them in your next session. You thus do not have to recreate the objects from scratch if you wish to continue work from before.
The saved workspace file is named .RData, and is located either in the directory from which you invoked this R session (Linux) or in the R installation directory (Windows).
You can also save the image yourself, to whatever file you wish, by calling save.image(). You can restore the workspace from that file later on by calling load().
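A minimal sketch of saving and restoring a workspace by hand, assuming it is run at the top level of a session. The object demo_value and the temporary file path are placeholders chosen for illustration.

```r
demo_value <- 42                         # hypothetical object to preserve

ws_file <- tempfile(fileext = ".RData")  # placeholder path; use any file name you like
save.image(file = ws_file)               # write all workspace objects to the file

rm(demo_value)                           # simulate starting a fresh session
load(ws_file)                            # restore the saved objects
demo_value                               # available again
```

Keep in mind that load() overwrites any existing objects that share a name with objects in the file.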