What’s Covered

Intro
- Arithmetic
- Variable assignment
- Basic data types
Vectors
- Creating, naming, arithmetic, selection
Matrices
- Creating, naming, arithmetic, selection
Factors
- Creating, levels, arithmetic, ordering
Data frames
- Creating, looking, selection, sorting
Lists
- Creating, naming, selection

Intro to Basics

How it Works

This section has some pretty juicy R charts.
This was hot stuff back in 1996.
- Many of these styles of charts are still useful now for data exploration
- Though I would not use base R graphics to make them
- I’d start with ggplot2 or ggvis
- Still, it’s fun to see these charts. Don’t worry about the code to make them for now.

# The hash sign (#) is used to add comments

# Show some demo graphs generated with R
demo("graphics")

Arithmetic with R

Yeah, it does all the usual stuff.

# An addition
5 + 5

## [1] 10

# A subtraction
5 - 5

## [1] 0

# A multiplication
3 * 5

## [1] 15

# A division
(5 + 5)/2

## [1] 5

# Exponentiation
2^5

## [1] 32

# Modulo
17%%4

## [1] 1
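Alongside the modulo operator, R also has integer division, written %/%. A small sketch (the variable names are illustrative, not from the course) showing how the two fit together:

```r
# Integer division and modulo split a number into quotient and remainder
dividend <- 17
divisor <- 4

quotient <- dividend %/% divisor   # whole number of times 4 fits into 17
remainder <- dividend %% divisor   # what is left over

quotient
remainder

# The two pieces always rebuild the original number
quotient * divisor + remainder == dividend
```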

Variable assignment

# Assign the value 42 to x
x <- 42

# Print out the value of the variable x
x

## [1] 42

Variable assignment 2

# Assign the value 5 to the variable called my_apples
my_apples <- 5
my_apples

## [1] 5

Variable assignment 3

# Assign a value to the variables called my_apples and my_oranges
my_apples <- 5
my_oranges <- 6

# Add these two variables together and print the result
my_fruit <- my_apples + my_oranges
my_fruit

## [1] 11

Apples and oranges

This fails because you can’t add character vectors with +.
RStudio won’t even knit the document; it throws an error in the code.
I set eval=F on this chunk so the document still knits, and pasted the error below.

# Assign a value to the variable called my_apples
my_apples <- 'apples'
my_oranges <- 'oranges'

# Add a character
my_fruit <- my_apples + my_oranges

Error in my_apples + my_oranges : non-numeric argument to binary operator
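If the goal is to join two pieces of text, paste() is the usual tool rather than +. A minimal sketch, with tryCatch() used only to show the error message without stopping the script:

```r
my_apples <- "apples"
my_oranges <- "oranges"

# paste() joins character values; sep controls what goes between them
my_fruit <- paste(my_apples, my_oranges, sep = " and ")
my_fruit

# Capture the error from + instead of letting it halt execution
msg <- tryCatch(
  my_apples + my_oranges,
  error = function(e) conditionMessage(e)
)
msg
```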

Basic data types in R

# What's the answer to the universe
my_numeric <- 42

# The quotation marks indicate that the variable is of type character
my_character <- "forty-two"

my_logical <- FALSE

What’s that data type?

# Declare variables of different types
my_numeric <- 42
class(my_numeric)

## [1] "numeric"

my_character <- "forty-two"
class(my_character)

## [1] "character"

my_logical <- FALSE
class(my_logical)

## [1] "logical"
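class() is not the only way to inspect a value. As a rough sketch: typeof() reports the underlying storage type, and a whole number written with an L suffix is stored as an integer rather than a double.

```r
my_numeric <- 42      # stored as a double by default
my_integer <- 42L     # the L suffix requests an integer

class(my_numeric)     # "numeric"
typeof(my_numeric)    # "double"
class(my_integer)     # "integer"

# is.*() functions test for a type and return TRUE or FALSE
is.numeric(my_integer)
is.character("forty-two")
```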

Vectors

Note: I almost always use data frames.
But I guess it’s still good to know the other data types.
They all have their uses.

Create a vector

Vegas <- as.character("Here we go!")
Vegas

## [1] "Here we go!"

Create a vector (2)

numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE, TRUE)

# Print the vectors
numeric_vector

## [1]  1 10 49

character_vector

## [1] "a" "b" "c"

boolean_vector

## [1]  TRUE FALSE  TRUE
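A vector holds values of a single type. When types are mixed in c(), R silently converts everything to the most flexible type involved, which is worth knowing before it surprises you. A small illustration with made-up values:

```r
# Logical and numeric together: TRUE becomes 1
mixed_num <- c(1, TRUE, 10)
class(mixed_num)

# Add a string and everything becomes character
mixed_chr <- c(1, "a", TRUE)
mixed_chr
class(mixed_chr)
```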

Create a vector (3)

# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
poker_vector

## [1]  140  -50   20 -120  240

# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)
roulette_vector

## [1]  -24  -50  100 -350   10

Naming a vector

## Name the vectors
names(poker_vector) <- c("Mon","Tues","Wed","Thur","Fri")
poker_vector

##  Mon Tues  Wed Thur  Fri 
##  140  -50   20 -120  240

names(roulette_vector) <- c("Mon","Tues","Wed","Thur","Fri")
roulette_vector

##  Mon Tues  Wed Thur  Fri 
##  -24  -50  100 -350   10

Naming a vector (2)

# Create the variable days_vector
days_vector <- c("Mon","Tues","Wed","Thur","Fri")

# Assign the names of the day to the roulette and poker_vectors
names(poker_vector) <- days_vector
poker_vector

##  Mon Tues  Wed Thur  Fri 
##  140  -50   20 -120  240

names(roulette_vector) <- days_vector
roulette_vector

##  Mon Tues  Wed Thur  Fri 
##  -24  -50  100 -350   10

Calculating total winnings

## First, just an example
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)

# Take the sum of A_vector and B_vector
total_vector <- A_vector + B_vector
total_vector

## [1] 5 7 9
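Vector arithmetic is element-wise. When the vectors differ in length, R recycles the shorter one, which the sketch below shows with made-up values:

```r
long_vector <- c(1, 2, 3, 4)
short_vector <- c(10, 20)

# short_vector is reused as 10, 20, 10, 20
recycled_sum <- long_vector + short_vector
recycled_sum

# A single number is recycled across every element
doubled <- long_vector * 2
doubled
```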

Calculating total winnings (2)

# Name poker and roulette
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(roulette_vector) <- days
names(poker_vector) <- days

total_daily <- poker_vector + roulette_vector
total_daily

##    Monday   Tuesday Wednesday  Thursday    Friday 
##       116      -100       120      -470       250

Calculating total winnings (3)

total_poker <- sum(poker_vector)
total_roulette <- sum(roulette_vector)

total_week <- sum(roulette_vector) + sum(poker_vector)
total_week

## [1] -84

Comparing total winnings

# Check if you realized higher total gains in poker than in roulette
answer <- total_poker > total_roulette
answer

## [1] TRUE

Vector selection: the good times

# Define new variable based on a selection
poker_wednesday <- poker_vector["Wednesday"]
poker_wednesday

## Wednesday 
##        20

Vector selection: the good times (2)

# Define new variable based on a selection
poker_midweek <- poker_vector[c("Tuesday","Wednesday","Thursday")]       
poker_midweek

##   Tuesday Wednesday  Thursday 
##       -50        20      -120

Vector selection: the good times (3)

# Define new variable based on a selection
roulette_selection_vector <- roulette_vector[2:5]
roulette_selection_vector

##   Tuesday Wednesday  Thursday    Friday 
##       -50       100      -350        10

Vector selection: the good times (4)

# Average gain over Monday to Wednesday (the first three days, despite the variable name)
average_midweek_gain <- mean(poker_vector[c("Monday","Tuesday","Wednesday")])
average_midweek_gain

## [1] 36.66667

Selection by comparison - Step 1

# What days of the week did you make money on poker
selection_vector <- poker_vector > 0
selection_vector

##    Monday   Tuesday Wednesday  Thursday    Friday 
##      TRUE     FALSE      TRUE     FALSE      TRUE

Selection by comparison - Step 2

# What days of the week did you make money on poker
selection_vector <- poker_vector > 0

# Select from poker_vector these days
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days

##    Monday Wednesday    Friday 
##       140        20       240

Advanced selection

# Show me
roulette_winning_days <- roulette_vector[roulette_vector > 0]
roulette_winning_days

## Wednesday    Friday 
##       100        10
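A few other selection idioms are handy alongside names, positions, and logical vectors. This sketch rebuilds poker_vector so it runs on its own; which(), which.max(), and negative indices are all base R.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# which() turns a logical vector into positions
winning_positions <- which(poker_vector > 0)
winning_positions

# A negative index drops that element instead of selecting it
without_monday <- poker_vector[-1]
without_monday

# which.max() gives the position of the largest value
best_day <- names(poker_vector)[which.max(poker_vector)]
best_day
```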

Matrices

What’s a matrix?

# Construction of a matrix with 3 rows containing the numbers 1 up to 9
matrix(1:9, byrow=T, nrow=3)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
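The byrow argument decides how the values are laid out. By default R fills a matrix column by column; a quick comparison:

```r
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
by_col <- matrix(1:9, nrow = 3)   # byrow defaults to FALSE

# Same position, different value depending on the fill order
by_row[2, 3]
by_col[2, 3]

# dim() returns the number of rows and columns
dim(by_row)

# t() transposes, so the two layouts are transposes of each other
all(t(by_row) == by_col)
```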

Analyzing matrices, you shall

# Box office Star Wars: In Millions!
# The first element: US, the second element: Non-US 
new_hope = c(460.998007, 314.4)
empire_strikes = c(290.475067, 247.900000)
return_jedi = c(309.306177,165.8)

# Add your code below to construct the matrix
star_wars_matrix = matrix(c(new_hope, empire_strikes, return_jedi), byrow=T, nrow=3)
star_wars_matrix

##          [,1]  [,2]
## [1,] 460.9980 314.4
## [2,] 290.4751 247.9
## [3,] 309.3062 165.8

Naming a matrix

# Add your code here such that rows and columns of star_wars_matrix have a
# name!
rownames(star_wars_matrix) = c('A New Hope','The Empire Strikes Back','Return of the Jedi')
colnames(star_wars_matrix) = c('US','non-US')
star_wars_matrix

##                               US non-US
## A New Hope              460.9980  314.4
## The Empire Strikes Back 290.4751  247.9
## Return of the Jedi      309.3062  165.8

Calculating the worldwide box office

# Box office Star Wars: In Millions (!) 
# Construct matrix: 
box_office_all = c(461, 314.4,290.5, 247.9,309.3,165.8)
movie_names = c("A New Hope","The Empire Strikes Back","Return of the Jedi")
col_titles = c("US","non-US")

star_wars_matrix = matrix(box_office_all, nrow=3, byrow=TRUE, dimnames=list(movie_names,col_titles))
star_wars_matrix

##                            US non-US
## A New Hope              461.0  314.4
## The Empire Strikes Back 290.5  247.9
## Return of the Jedi      309.3  165.8

worldwide_vector = rowSums(star_wars_matrix)
worldwide_vector

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                   775.4                   538.4                   475.1

Adding a column for the worldwide box office

# Bind the new variable worldwide_vector as a column to star_wars_matrix
all_wars_matrix = cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix

##                            US non-US worldwide_vector
## A New Hope              461.0  314.4            775.4
## The Empire Strikes Back 290.5  247.9            538.4
## Return of the Jedi      309.3  165.8            475.1

Adding a row

In the course, star_wars_matrix2 was preloaded.
Here I just copy the original definition,
which is enough to show what rbind does.

# Matrix containing first trilogy box office
star_wars_matrix

##                            US non-US
## A New Hope              461.0  314.4
## The Empire Strikes Back 290.5  247.9
## Return of the Jedi      309.3  165.8

# Create a Matrix containing second trilogy box office
box_office_all = c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5)
movie_names = c("The Phantom Menace","Attack of the Clones","Revenge of the Sith")
col_titles = c("US","non-US")

star_wars_matrix2 = matrix(box_office_all, nrow=3, byrow=TRUE, dimnames=list(movie_names,col_titles))
star_wars_matrix2

##                         US non-US
## The Phantom Menace   474.5  552.5
## Attack of the Clones 310.7  338.7
## Revenge of the Sith  380.3  468.5

# Combine both Star Wars trilogies in one matrix
all_wars_matrix = rbind(star_wars_matrix, star_wars_matrix2)
all_wars_matrix

##                            US non-US
## A New Hope              461.0  314.4
## The Empire Strikes Back 290.5  247.9
## Return of the Jedi      309.3  165.8
## The Phantom Menace      474.5  552.5
## Attack of the Clones    310.7  338.7
## Revenge of the Sith     380.3  468.5

The total box office revenue for the entire saga

# Print box office Star Wars: In Millions (!) for 2 trilogies
all_wars_matrix

##                            US non-US
## A New Hope              461.0  314.4
## The Empire Strikes Back 290.5  247.9
## Return of the Jedi      309.3  165.8
## The Phantom Menace      474.5  552.5
## Attack of the Clones    310.7  338.7
## Revenge of the Sith     380.3  468.5

total_revenue_vector = colSums(all_wars_matrix)
total_revenue_vector

##     US non-US 
## 2226.3 2087.8

Selection of matrix elements

mean_non_us_all  =  mean(star_wars_matrix[,2])
mean_non_us_all

## [1] 242.7

mean_non_us_some = mean(star_wars_matrix[1:2,2])
mean_non_us_some

## [1] 281.15

A little arithmetic with matrices

visitors = star_wars_matrix/5
visitors

##                            US non-US
## A New Hope              92.20  62.88
## The Empire Strikes Back 58.10  49.58
## Return of the Jedi      61.86  33.16

A little arithmetic with matrices (2)

# Note: movie_names still holds the second-trilogy titles here, so the row labels
# below do not match star_wars_matrix; the division further down works by position
ticket_prices_matrix = matrix(c(5,5,6,6,7,7), nrow=3, byrow=TRUE, dimnames=list(movie_names,col_titles))
ticket_prices_matrix

##                      US non-US
## The Phantom Menace    5      5
## Attack of the Clones  6      6
## Revenge of the Sith   7      7

visitors = star_wars_matrix/ticket_prices_matrix
visitors

##                               US   non-US
## A New Hope              92.20000 62.88000
## The Empire Strikes Back 48.41667 41.31667
## Return of the Jedi      44.18571 23.68571

average_us_visitor = mean(visitors[,1])
average_us_visitor

## [1] 61.60079

average_non_us_visitor = mean(visitors[,2])
average_non_us_visitor

## [1] 42.62746
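The division above is element-wise, and so are +, -, and *. Matrix multiplication in the linear-algebra sense uses a separate operator, %*%. A small sketch with an arbitrary 2x2 matrix:

```r
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

# Element-wise product: each cell multiplied by the matching cell
elementwise <- m * m
elementwise

# True matrix product: rows of the first times columns of the second
matrix_product <- m %*% m
matrix_product
```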

Factors

What’s a factor and why would you use it?

R uses factors for categorical variables!

What’s a factor and why would you use it? (2)

gender_vector = c("Male","Female","Female","Male","Male")

factor_gender_vector = factor(gender_vector)
factor_gender_vector

## [1] Male   Female Female Male   Male  
## Levels: Female Male

What’s a factor and why would you use it? (3)

animals_vector = c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector = factor(animals_vector)
factor_animals_vector

## [1] Elephant Giraffe  Donkey   Horse   
## Levels: Donkey Elephant Giraffe Horse

temperature_vector = c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector = factor(temperature_vector, order = TRUE, levels = c("Low", 
    "Medium", "High"), labels = c("L","M","H"))
factor_temperature_vector

## [1] H L H L M
## Levels: L < M < H

Factor levels

survey_vector = c("M", "F", "F", "M", "M")
factor_survey_vector = factor(survey_vector)
factor_survey_vector

## [1] M F F M M
## Levels: F M

levels(factor_survey_vector) = c("Female","Male")
factor_survey_vector

## [1] Male   Female Female Male   Male  
## Levels: Female Male
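Renaming levels by position relies on knowing the order R chose, which is alphabetical by default. One way to make the mapping explicit, sketched here, is to pass levels and labels together when creating the factor:

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels lists the raw values, labels gives the name for each in the same order
factor_survey_vector <- factor(
  survey_vector,
  levels = c("F", "M"),
  labels = c("Female", "Male")
)
factor_survey_vector

# table() counts how often each level occurs
table(factor_survey_vector)
```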

Summarizing a factor

# Type your code here for survey_vector
summary(survey_vector)

##    Length     Class      Mode 
##         5 character character

# Type your code here for factor_survey_vector
summary(factor_survey_vector)

## Female   Male 
##      2      3

Battle of the sexes

# Male
factor_survey_vector[1]

## [1] Male
## Levels: Female Male

# Female
factor_survey_vector[2]

## [1] Female
## Levels: Female Male

# Battle of the sexes: Male 'larger' than female?
factor_survey_vector[1] > factor_survey_vector[2]

## [1] NA

Ordered Factors

# Create 'speed_vector'
speed_vector <- c('Fast','Slow','Slow','Fast','Ultra-fast')

factor_speed_vector <- factor(speed_vector, ordered = T, levels = c('Slow','Fast','Ultra-fast'))
factor_speed_vector

## [1] Fast       Slow       Slow       Fast       Ultra-fast
## Levels: Slow < Fast < Ultra-fast

# R prints automagically in the right order
summary(factor_speed_vector)

##       Slow       Fast Ultra-fast 
##          2          2          1

Comparing ordered factors

# Is data analyst 2 faster than data analyst 5?
# Compare the ordered factor, not the raw character vector
compare_them <- factor_speed_vector[2] > factor_speed_vector[5]
compare_them

## [1] FALSE
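Comparisons only respect the level order when they are made on the ordered factor itself. Comparing the underlying character values falls back to alphabetical order, which can give a different answer. A sketch with a hypothetical temperature example:

```r
temperature_vector <- c("High", "Low", "Medium")
factor_temperature_vector <- factor(
  temperature_vector,
  ordered = TRUE,
  levels = c("Low", "Medium", "High")
)

# Ordered factor: High really is greater than Low
factor_says <- factor_temperature_vector[1] > factor_temperature_vector[2]
factor_says

# Plain strings: "High" sorts before "Low" alphabetically
string_says <- temperature_vector[1] > temperature_vector[2]
string_says
```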

Data frames

What’s a data frame?

mtcars  # Built-in R dataset stored in a data frame

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Quick, have a look at your data set

# Have a quick look at your data
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Have a look at the structure

# Investigate the structure of the mtcars dataset to get started!
str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
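str() is one of several quick inspection helpers. A few others from base R, shown on the same built-in dataset:

```r
# Number of rows and columns
dim(mtcars)
nrow(mtcars)
ncol(mtcars)

# Column names
names(mtcars)

# Last three rows instead of the first six
tail(mtcars, 3)

# Per-column summary statistics for a couple of columns
summary(mtcars[, c("mpg", "hp")])
```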

Create a data frame

planets = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", 
    "Neptune")

type = c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
    "Gas giant", "Gas giant", "Gas giant", "Gas giant")

diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)

rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)

rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create the data frame:
planets_df = data.frame(planets, type, diameter, rotation, rings)
planets_df

##   planets               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 8 Neptune          Gas giant    3.883     0.67  TRUE

Create a data frame (2)

# Check the structure of planets_df
str(planets_df)

## 'data.frame':    8 obs. of  5 variables:
##  $ planets : Factor w/ 8 levels "Earth","Jupiter",..: 4 8 1 3 2 6 7 5
##  $ type    : Factor w/ 2 levels "Gas giant","Terrestrial planet": 2 2 2 2 1 1 1 1
##  $ diameter: num  0.382 0.949 1 0.532 11.209 ...
##  $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
##  $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
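The Factor columns in the output above reflect the R version used when these notes were written. Since R 4.0.0, data.frame() keeps strings as character by default. Passing stringsAsFactors explicitly, as in this sketch with a shortened planets vector, gives the same result on any version:

```r
planets <- c("Mercury", "Venus", "Earth")
rings <- c(FALSE, FALSE, FALSE)

# Strings converted to factors (the old default)
df_factors <- data.frame(planets, rings, stringsAsFactors = TRUE)
class(df_factors$planets)

# Strings kept as character (the default since R 4.0.0)
df_strings <- data.frame(planets, rings, stringsAsFactors = FALSE)
class(df_strings$planets)
```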

Selection of data frame elements

# The planets_df data frame from the previous exercise is pre-loaded
closest_planets_df <- planets_df[1:3,]
closest_planets_df

##   planets               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE

furthest_planets_df <- planets_df[6:8,]
furthest_planets_df

##   planets      type diameter rotation rings
## 6  Saturn Gas giant    9.449     0.43  TRUE
## 7  Uranus Gas giant    4.007    -0.72  TRUE
## 8 Neptune Gas giant    3.883     0.67  TRUE

Selection of data frame elements (2)

# The planets_df data frame from the previous exercise is pre-loaded: 
furthest_planets_diameter = planets_df$diameter[3:8]

Only planets with rings

# Create the rings_vector
rings_vector = planets_df$rings

Only planets with rings (2)

# Select the information on planets with rings:
planets_with_rings_df =  planets_df[planets_df$rings,]

Only planets with rings but shorter

# Planets smaller than Earth:
small_planets_df  = subset(planets_df, diameter < 1)
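subset() is a convenience function; the same filter can be written with logical indexing, and subset() can also pick columns through its select argument. A sketch using a cut-down version of the planets data:

```r
planets_df <- data.frame(
  planets = c("Mercury", "Venus", "Earth", "Mars", "Jupiter"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Filter rows with subset()
small_a <- subset(planets_df, diameter < 1)

# The same filter with logical indexing
small_b <- planets_df[planets_df$diameter < 1, ]

# Both approaches keep the same planets
identical(small_a$planets, small_b$planets)

# Filter rows and keep only some columns
small_cols <- subset(planets_df, diameter < 1, select = c(planets, diameter))
small_cols
```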

Sorting

# Just play around with the order function in the console to see how it
# works
a <- c(100,9,101)
order(a)

## [1] 2 1 3

Sorting your data frame

# What is the correct ordering based on the planets_df$diameter variable?
positions =  order(planets_df$diameter, decreasing = T)
positions

## [1] 5 6 7 8 3 2 4 1

# Create new "ordered" data frame:
largest_first_df = planets_df[positions,]
largest_first_df

##   planets               type diameter rotation rings
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 8 Neptune          Gas giant    3.883     0.67  TRUE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
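order() also accepts several keys, using later ones to break ties in earlier ones. A minus sign in front of a numeric key reverses it. A sketch with a cut-down version of the planets data:

```r
planets_df <- data.frame(
  planets = c("Mercury", "Jupiter", "Earth", "Saturn"),
  type = c("Terrestrial", "Gas giant", "Terrestrial", "Gas giant"),
  diameter = c(0.382, 11.209, 1, 9.449)
)

# Sort by type alphabetically, then by diameter from largest to smallest
positions <- order(planets_df$type, -planets_df$diameter)
positions

sorted_df <- planets_df[positions, ]
sorted_df
```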

Lists

Lists, why would you need them?

They are useful sometimes.

Lists, why would you need them? (2)

A list in R is similar to your to-do list at work or school:
- the different items on that list most likely differ in length, characteristic, type of activity that has to be done, …
A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way.
- These objects can be matrices, vectors, data frames, even other lists, etc.
- It is not even required that these objects are related to each other in any way.
You could say that a list is some kind of super data type:
- you can store practically any piece of information in it!

Creating a list

# Vector with numerics from 1 up to 10
my_vector <- 1:10 
my_vector

##  [1]  1  2  3  4  5  6  7  8  9 10

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)
my_matrix

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# First 10 rows of the built-in data frame 'mtcars'
my_df <- mtcars[1:10,]
my_df

##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

# Construct list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)
my_list

## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## [[3]]
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

Creating a named list

# Construct 'my_list' with these different elements:
my_list <- list(vec = my_vector, mat = my_matrix, df = my_df)

# Print 'my_list' to the console
my_list

## $vec
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## $df
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

Creating a named list (2)

# The vectors 'actors' and 'reviews' are pre-loaded in the workspace
actors <- c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd", "Scatman Crothers", "Barry Nelson")

reviews <- data.frame(
  scores = c(4.5, 4.0, 5.0),
  sources = c("IMDb1", "IMDb2", "IMDb3"),
  comments = c("Best Horror Film I Have Ever Seen",
               "A truly brilliant and scary film from Stanley Kubrick",
               "A masterpiece of psychological horror")
)

# Create the list 'shining_list'
shining_list <- list(
    moviename = "The Shining",
    actors = actors,
    reviews = reviews)

shining_list

## $moviename
## [1] "The Shining"
## 
## $actors
## [1] "Jack Nicholson"   "Shelley Duvall"   "Danny Lloyd"      "Scatman Crothers"
## [5] "Barry Nelson"    
## 
## $reviews
##   scores sources                                              comments
## 1    4.5   IMDb1                     Best Horror Film I Have Ever Seen
## 2    4.0   IMDb2 A truly brilliant and scary film from Stanley Kubrick
## 3    5.0   IMDb3                 A masterpiece of psychological horror

Selecting elements from a list

# Define 'last_actor'
last_actor <- shining_list$actors[length(shining_list$actors)]
last_actor

## [1] "Barry Nelson"

# Define 'second_review'
second_review <- shining_list$reviews[2,]
second_review

##   scores sources                                              comments
## 2      4   IMDb2 A truly brilliant and scary film from Stanley Kubrick
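When selecting from a list, single brackets return a smaller list while double brackets (or $) return the element itself. The difference matters as soon as you want to compute with the result. A sketch with a hypothetical two-element list:

```r
movie <- list(moviename = "The Shining", scores = c(4.5, 4.0, 5.0))

# Single brackets: still a list, of length 1
as_list <- movie["scores"]
class(as_list)

# Double brackets or $: the numeric vector stored inside
as_vector <- movie[["scores"]]
class(as_vector)

# Only the extracted vector can be averaged directly
mean(as_vector)
mean(movie$scores)
```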

Adding more movie information to the list

# We forgot something; add the year to shining_list
shining_list_full <- c(shining_list, year = 1980)

# Have a look at shining_list_full
str(shining_list_full)

## List of 4
##  $ moviename: chr "The Shining"
##  $ actors   : chr [1:5] "Jack Nicholson" "Shelley Duvall" "Danny Lloyd" "Scatman Crothers" ...
##  $ reviews  :'data.frame':   3 obs. of  3 variables:
##   ..$ scores  : num [1:3] 4.5 4 5
##   ..$ sources : Factor w/ 3 levels "IMDb1","IMDb2",..: 1 2 3
##   ..$ comments: Factor w/ 3 levels "A masterpiece of psychological horror",..: 3 2 1
##  $ year     : num 1980
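c() is one way to extend a list; assigning to a new name with $ or double brackets is another, and it updates the existing variable rather than creating a second one. A sketch with a shortened, hypothetical version of the list (the director element is added here purely for illustration):

```r
shining_list <- list(
  moviename = "The Shining",
  actors = c("Jack Nicholson", "Shelley Duvall")
)

# Add a new element by assigning to a name that does not exist yet
shining_list$year <- 1980

# The same idea with double brackets, handy when the name is in a variable
new_name <- "director"
shining_list[[new_name]] <- "Stanley Kubrick"

names(shining_list)

# Assigning NULL removes an element again
shining_list$director <- NULL
length(shining_list)
```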

Introduction to R

Amar Kapote

2016-06-28
