---
title: "First Steps in `R`"
output: html_notebook
---

# Basic calculations

You can use R for the basic computations you would perform on a calculator:

```{r}
# Addition
2 + 3
# Subtraction
2 - 3
# Division
2/3
# Exponentiation
2^3 
# Square root
sqrt(2)
# Logarithms
log(2)
```

These lines work as expected, so let us now run a similar chunk of code:

```{r}
log(10)

```

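Well, I thought that `log(10)` was equal to one. It turns out that, by default, `log()` computes the _natural_ logarithm, that is, the logarithm to base e (approximately 2.72). If so, `log(2.72)` should be close to one: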


```{r}
log(2.72)
```

To compute the logarithm to base 10, use `log10()`:

```{r}
log10(10)
```
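If you need a logarithm in another base, `log()` also accepts a `base` argument, and base R provides `log2()` as a shortcut for base 2. The function `exp()` is the inverse of the natural logarithm. A quick check:

```{r}
# log() uses base e by default; the base argument changes that
log(100, base = 10)
# shortcut for base 2
log2(8)
# exp(1) is the number e (about 2.718), so log(exp(1)) gives 1
exp(1)
log(exp(1))
```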



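Often you will want to test whether one value is less than, greater than, or equal to another. R's comparison operators are `==` (equal), `!=` (not equal), `<`, `<=`, `>` and `>=`, and each returns `TRUE` or `FALSE`.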




```{r}
3 == 8
3 != 8
3 <= 8
```

```{r}
1==0
```

```{r}
2==5
1==1

```
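The remaining comparison operators work the same way. One caution worth knowing early: decimal numbers are stored in binary with limited precision, so `==` can give surprising answers for them, while `all.equal()` compares with a small numerical tolerance:

```{r}
# the remaining comparison operators
3 > 8
3 >= 3
3 < 8
# 0.1 + 0.2 is not stored as exactly the same number as 0.3
0.1 + 0.2 == 0.3
# all.equal() allows a tiny tolerance; isTRUE() turns its result into TRUE/FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
```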



The _logical operators_ are `&` for logical **AND**, `|` for logical **OR**, and `!` for **NOT**. These are some examples:

```{r}
# Logical Disjunction (or)
FALSE | FALSE
# Logical Conjunction (and)
TRUE & FALSE
# Negation
! FALSE
# Combination of statements
2 < 3 | 1 == 5
```
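To see every case at once, you can apply the operators to whole vectors of `TRUE`/`FALSE` values, which produces a truth table. A sketch with made-up vectors `a` and `b`; the last line also shows that comparisons such as `<` and `==` are evaluated before `|` and `&`, so the parentheses do not change the result of the combined statement above:

```{r}
# every combination of two logical values
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
# the operators work element by element, giving a truth table
data.frame(a, b, a_and_b = a & b, a_or_b = a | b, not_a = !a)
# same result as 2 < 3 | 1 == 5
(2 < 3) | (1 == 5)
```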

# Assigning Values to Variables


In R, you create a variable and assign it a value using `<-`, as follows:

```{r}
foo <- 2 + 2
foo*3
```
```{r}
# Now multiply foo by 5
foo*5
```

```{r}
fii <- 3 + 4
```


To see the variables that are currently defined, use `ls` (as in "list")

```{r}
ls()
```

To delete a variable, use `rm()` (as in "remove")

```{r}
rm(fii)
```


```{r}
ls()
```


Either `<-` or `=` can be used to assign a value to a variable, but I prefer `<-` because it is less likely to be confused with the equality operator `==`.
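A minimal illustration of the three operators side by side; the variable name `width` is made up for this example:

```{r}
# assignment with <-
width <- 10
# assignment with = also works at the top level
width = 12
# comparison with == does not change width; it only returns TRUE or FALSE
width == 12
width
```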

# Vectors

The basic type of object in R is a _vector_, which is an ordered list of values of the same type. You can create a vector using the `c()` function (as in "concatenate").

```{r}
bar <- c(2, 5, 10, 2, 1) 
bar
```

```{r}
baz <- c(2, 2, 3, 3, 3)
baz
```
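Because all elements of a vector must share one type, `c()` silently converts (coerces) mixed inputs to a common type. A quick check with `class()`, using a made-up vector `mixed`:

```{r}
# numbers only: a numeric vector
class(c(2, 5, 10))
# a number, a string and a logical value: everything becomes a string
mixed <- c(1, "a", TRUE)
mixed
class(mixed)
# mixed with numbers, TRUE becomes 1 and FALSE becomes 0
c(TRUE, FALSE, 3)
```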


There are also functions that create vectors with regular patterns, such as repeated elements or evenly spaced sequences.

```{r}
# repeat the value 2 five times
rep(2, 5)
# consecutive numbers
1:5
# sequence from 1 to 10 with a step of 2
seq(1, 10, by = 2)
# sequence from 1 to 20 with a step of 3
seq(1, 20, by = 3)
# repeat the value 1 three times
rep(1, 3)
```
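`rep()` and `seq()` take further arguments for other patterns: `times` and `each` control how `rep()` repeats its input, and `length.out` asks `seq()` for a fixed number of evenly spaced values. For example:

```{r}
# repeat a whole vector three times
rep(c(1, 2), times = 3)
# repeat each element twice
rep(c(1, 2), each = 2)
# five evenly spaced values from 0 to 1
seq(0, 1, length.out = 5)
```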

Many functions and operators, such as `+` and `-`, work on all elements of a vector at once, element by element.

```{r}
# add vectors
bar + baz
# compare vectors
bar == baz
# find length of vector
length(bar)
# find minimum value in vector
min(bar)
# find average value in vector
mean(bar)
```
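Element-wise operations also work when the vectors have different lengths: R _recycles_ the shorter one, and a single number is the simplest case. A sketch with made-up vectors (R gives a warning if the longer length is not a multiple of the shorter one):

```{r}
# a single number is recycled across the whole vector
c(1, 2, 3, 4) * 2
# the shorter vector c(10, 20) is repeated to match length 4
c(1, 2, 3, 4) + c(10, 20)
# other summaries work like min() and mean()
max(c(2, 5, 10, 2, 1))
sum(c(2, 5, 10, 2, 1))
```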

You can access parts of a vector with the index operator `[`. R counts positions from 1, not 0. Recall the value of the vector `bar`:

```{r}
bar
# If you want to get the first element:
bar[1]
```


If you want to get the last element of `bar` without explicitly typing the number of elements of `bar`, make use of the `length` function, which calculates the length of a vector:

```{r}
bar[length(bar)]
```


You can also extract multiple values from a vector. For instance, to get the 2nd through 4th values, use:

```{r}
bar[c(2, 3, 4)]
```
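Besides listing positions with `c()`, you can index with a range, with negative positions, or with a condition. A sketch on a copy of the values of `bar`, stored under the made-up name `v`:

```{r}
v <- c(2, 5, 10, 2, 1)
# 2:4 is shorthand for c(2, 3, 4)
v[2:4]
# a negative index drops that element
v[-1]
# a logical condition keeps the elements where it is TRUE
v[v > 2]
# which() returns the positions where the condition is TRUE
which(v == 2)
```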


Vectors can also hold strings (character values) or logical values:

```{r}
quxx <- c("a", "b", "cde", "fg")
```


```{r}
quxx
```
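The example above only shows strings, so here is a short sketch of a logical vector (the name `flags` is made up). Logical vectors are handy for counting, because arithmetic treats `TRUE` as 1 and `FALSE` as 0:

```{r}
flags <- c(TRUE, FALSE, TRUE, TRUE)
flags
# sum() counts the TRUE values
sum(flags)
# mean() gives the proportion of TRUE values
mean(flags)
# comparing a vector with a number also produces a logical vector
c(2, 5, 10) > 4
```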


# Data Frames

In statistical applications, data is often stored as a data frame, which is like a spreadsheet, with _rows as observations_ and _columns as variables_.

To manually create a data frame, use the `data.frame()` function.

```{r}
data.frame(foo = c(1, 2, 3), 
           bar = c("a", "b", "c"), 
           baz = c(1.5, 2.5, 3)) 

```
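Once a data frame is stored in a variable, base R offers helpers to inspect it. A minimal sketch, assigning the data frame above to the made-up name `mydata`:

```{r}
mydata <- data.frame(foo = c(1, 2, 3),
                     bar = c("a", "b", "c"),
                     baz = c(1.5, 2.5, 3))
# number of rows (observations) and columns (variables)
nrow(mydata)
ncol(mydata)
# $ extracts a single column as a vector
mydata$baz
mean(mydata$baz)
# compact overview of each column's type and first values
str(mydata)
```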

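Most often you will be using data frames loaded from a file, for example the results of a class survey. The function `load()` restores R objects previously saved with `save()` (usually in `.RData` files), while `read.table()` and its variant `read.csv()` read data stored in plain-text files.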
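Since the class survey file is not included here, the sketch below makes up a tiny data frame, writes it to temporary files and reads it back, once with `read.csv()` and once with `save()`/`load()`. The names `survey`, `csv_path` and `rdata_path` are hypothetical; `tempfile()` simply supplies a throwaway file path.

```{r}
# hypothetical survey data, made up for illustration
survey <- data.frame(student = c("Ana", "Ben", "Cai"),
                     smokes  = c("No", "Yes", "No"))
# read.csv(): write the data to a temporary text file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(survey, csv_path, row.names = FALSE)
survey_csv <- read.csv(csv_path)
survey_csv
# load(): save the object to a temporary .RData file, remove it, then restore it
rdata_path <- tempfile(fileext = ".RData")
save(survey, file = rdata_path)
rm(survey)
load(rdata_path)   # recreates the object under its saved name, survey
survey
```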

# How to Make a Random Sample

To select a random sample, use the function `sample()`. The following code selects 5 numbers between 1 and 10 at random, without replacement (no number can be picked twice):

```{r}
sample(1:10, size=5)
```


- The first argument gives the vector of data to select elements from.
- The second argument (`size=`) gives the size of the sample to select.
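Because `sample()` is random, each run gives different numbers. A sketch of two common options, using only base R: `set.seed()` makes the draw reproducible, and `replace = TRUE` samples with replacement, so values may repeat. The seed value 42 is arbitrary.

```{r}
# fix the starting point of the random number generator
set.seed(42)
sample(1:10, size = 5)
# resetting the same seed gives the same 5 numbers again
set.seed(42)
sample(1:10, size = 5)
# with replace = TRUE values can repeat, e.g. ten rolls of a die
sample(1:6, size = 10, replace = TRUE)
```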

Taking a simple random sample from a data frame is only slightly more complicated and takes two steps:

1. Use `sample()` to select a sample of size `n` from a vector of the row numbers of the data frame. 
2. Use the index operator `[` to select those rows from the data frame.


Consider the following example with _fake data_. First, make up a data frame with two columns. (`LETTERS` is a character vector of length 26 with the capital letters "A" to "Z"; `LETTERS` is automatically defined and pre-loaded in `R`.)

```{r}
bar <- data.frame(var1 = LETTERS[1:10], var2 = 1:10)
# Check data frame
bar
```


Suppose you want to select a random sample of size 5. First, define a variable `n` with the size of the sample, i.e. 5

```{r}
n <- 5
```

Now, select a sample of size 5 from the vector of integers 1 to 10 (10 being the number of rows in `bar`). Use the function `nrow()` to find the number of rows in `bar` instead of entering that number manually.

Use `:` to create a vector with all the integers between 1 and the number of rows in `bar`.

```{r}
samplerows <- sample(1:nrow(bar), size=n) 
# print sample rows
samplerows
```



The variable `samplerows` contains the row numbers of `bar` that make up the random sample. Extract those rows from `bar` with the index operator `[`, leaving the column position after the comma empty so that all columns are kept:

```{r}
# extract rows
barsample <- bar[samplerows, ]
# print sample
print(barsample)
```



The code above creates a new _data frame_ called `barsample` with a random sample of rows from `bar`.

In a single line of code: 

```{r}
bar[sample(1:nrow(bar), n), ]
```
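The two steps can also be wrapped in a small reusable function. `sample_rows()` below is a hypothetical helper written for this notebook, not a base R function. It uses `seq_len(nrow(data))` instead of `1:nrow(data)` because `1:0` evaluates to `c(1, 0)` rather than an empty vector, and `drop = FALSE` keeps the result a data frame even when there is only one column:

```{r}
# hypothetical helper: simple random sample of `size` rows from a data frame
sample_rows <- function(data, size) {
  rows <- sample(seq_len(nrow(data)), size = size)
  data[rows, , drop = FALSE]
}
# the same kind of fake data as above, under a new name
letters_df <- data.frame(var1 = LETTERS[1:10], var2 = 1:10)
set.seed(1)
sample_rows(letters_df, 3)
```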

# Using Tables

The `table()` function counts how many times each distinct value occurs, producing a frequency table. Its simplest usage is `table(x)`, where `x` is a _categorical variable_.

For example, a survey asks people whether they smoke. The data are

_Yes, No, No, Yes, Yes_

We can enter these answers into R with the `c()` function and summarize them with `table()` as follows:

```{r}
x <- c("Yes","No","No","Yes","Yes") 
table(x)

```

```{r}
y<-c("Yes","No","Yes","Yes")
y
table(y)
```
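`table()` combines well with other base R functions. A short sketch with made-up data (the smoking answers above, stored as `smokes`, plus a hypothetical variable `sex`): `prop.table()` turns counts into proportions, and passing two variables to `table()` gives a two-way (contingency) table:

```{r}
# hypothetical answers from five people, made up for illustration
smokes <- c("Yes", "No", "No", "Yes", "Yes")
sex    <- c("F", "M", "F", "F", "M")
# proportions instead of counts
prop.table(table(smokes))
# two-way table: rows are values of sex, columns are values of smokes
table(sex, smokes)
```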


# Numeric measures of center and spread

Suppose CEO yearly compensations are sampled and the following values are found (in millions):

12    .4    5     2     50    8     3     1     4     0.25

```{r}
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, 0.25)
# the average
mean(sals) 
# the variance
var(sals)
# the standard deviation
sd(sals)
# the median
median(sals)
# Tukey's five-number summary, useful for boxplots
# five numbers: min, lower hinge, median, upper hinge, max
fivenum(sals)
# summary statistics
summary(sals)

```
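The 1st and 3rd quartiles reported by `summary()` (1.25 and 7.25) differ from the lower and upper hinges of `fivenum()` (1 and 8) because they are computed by different rules. A sketch comparing them with `quantile()`, whose default method is the one `summary()` uses, on a copy of the same data under the made-up name `comp`:

```{r}
# the same compensation data as above (in millions)
comp <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, 0.25)
# quartiles with quantile()'s default method (type = 7)
quantile(comp)
# interquartile range: 3rd quartile minus 1st quartile
IQR(comp)
# Tukey's hinges for comparison
fivenum(comp)
# smallest and largest values
range(comp)
```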


### How about the _mode_? 

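In R we can write our own _functions_. As a first example, the function below computes _the mode_ (the most frequent value) of a vector of observations `x`. Base R has no built-in function for this statistic: the function `mode()` reports how an object is stored (for example `"numeric"`), not its most frequent value.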

```{r}
# Function to find the mode, i.e. most frequent value
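getMode <- function(x) {
  ux <- unique(x)                    # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))   # number of times each distinct value occurs
  ux[which.max(counts)]              # first value with the highest count
}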
```

As an example, we can use the function defined above to find the most frequent value in the vector `baz`:

```{r}
# Most frequent value in baz
baz
getMode(baz)
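# bar was overwritten by a data frame in the random sample section, so recreate the vector
bar <- c(2, 5, 10, 2, 1)
bar
getMode(bar)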
```
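`getMode()` returns only the first of several values tied for the highest count. A hypothetical variant, `getModes()`, returns all of them instead; it reuses the same `unique()`/`match()`/`tabulate()` idea and is a sketch, not a standard R function:

```{r}
# hypothetical variant: every value tied for the highest count
getModes <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}
# 1 and 2 both occur twice
getModes(c(1, 1, 2, 2, 3))
# works for character vectors too
getModes(c("a", "b", "b"))
```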











