Govt2201 Topic 2 Intro to R

<hr />
Michael Bailey

---

## Overview of this session

--
   1. R and RStudio

--
   2. R Markdown

--
   3. Loading data (sneaky tricky)

--
   4. Three data types (boring)

--
   5. Four data structures (exciting!)

--
   6. Describing data

--
   7. Create new data objects

--
   8. Loops

--
   9. _If_ statements

---

## Looking ahead

1. Labs

a. We'll start Lab 1 in class next week. It is due Friday September 19 at 5 pm via Canvas.  You will use R to describe and analyze exit polling data. You'll submit your code (in a .Rmd file) and output (in a .html file). 
    
--

2. Quizzes
   
  a. There will be a practice quiz on Chapters 1 and 2 in class on Thursday September 11.
   
  b. There will be a quiz on Chapter 3 on Tuesday September 16.

---

class: left, top, inverse, hide-logo
background-position: center
background-size: cover

## Goal 1: Get set up in R and RStudio

---

## R is

- Powerful

- Flexible

- Eases workflow

- The lingua franca of data science (with Python)

- Free

---

## Installing R and RStudio

- R installation: https://cran.case.edu/

- RStudio is an *integrated development environment* that simplifies many tasks.

--
  + Useful, but not necessary.

--
  + Installation: https://rstudio.com/products/rstudio/download/

---

### RStudio

---

### Scripts

--
- A script is a text file where we write and run our code.

--
- The basic R script is a `.R` file that has code and comments

--
- We'll also use "RMarkdown" files (`.Rmd`) that combine code, text, and figures into pdf, Word, or html files.

---

#### Scripts (continued)

When we write a line of code in the script window, we can run it in the console by highlighting the text and...

--
- pressing **`command + enter`** (mac)

--
- pressing **`control + enter`** (windows)

--
- clicking **`run`** in the banner at the top of the script quadrant in RStudio

--
- Always use scripts!

--
   + Save the recipe, not the bread!

<br>

--
- Use lots of comments! (use "#" at the start of a line)

--
- What does this line of code do?

``` r
library(car)
```

--
- Aha!

``` r
# Load package for F-tests
library(car)
```

---

### Console

The console is where `R` runs calculations.  We can enter interactive commands here.

---

#### Console (continued)

All commands are processed through the console directly (that is, one can type commands directly into it) or via a **script**.

---

### Directories

--
- Stay on top of where you store data

--
- Path names are fussy: on Windows, you also have to replace every `\` in a path with `/` or with `\\` (double backslash)

--
- Option 1: use *setwd* ("set working directory")

``` r
setwd("C:\\Documents\\MyFiles")
dta <- read.table("data.csv", header = TRUE, sep = ",")  # sep = "," because the file is comma-separated
```

--
- Option 2: use absolute paths

``` r
dta <- read.table("C:\\Documents\\MyFiles\\data.csv", header = TRUE, sep = ",")  # comma-separated file
```

--
- Option 3: use .Rproj to produce relative paths

--
   + In RStudio, File/New Project and create directory where you will store all material related to this project

--
   + Always open this .Rproj first when working on this project.

--
   + Will create paths relative to the directory you create

--
   + Is portable to another machine

--
Also: use the *here()* package/function to manage directories
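
A minimal sketch of building paths portably with base R's `file.path()` instead of hand-typed slashes. The folder and file names (`MyProject`, `scores.csv`) are hypothetical, and the project is created under `tempdir()` only so the example can run on any machine.

```r
# Hypothetical project folder, created in a temporary location for illustration
project_dir <- file.path(tempdir(), "MyProject")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# file.path() joins the pieces with "/", which R accepts on Windows, Mac and Linux
csv_path <- file.path(project_dir, "data", "scores.csv")

# Write a tiny made-up data set so there is something to read back
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), csv_path, row.names = FALSE)

# Read it back using the path we built
scores <- read.csv(csv_path)
scores
```

Inside an actual .Rproj, the same idea applies with a relative path such as `file.path("data", "scores.csv")`.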

---

### Directories - "Assignment"

#### Please do something like this for all classes

- Set up a directory/folder for this class (e.g., PPOL5200)

- Create subdirectories/subfolders (e.g., data, labs, problemSets, slides)

- Create an RProject (e.g., PPOL5200.Rproj) and open it first every time you work on the class

--
- Get used to using relative paths
    + Suppose you are in the labs folder and want to use a file in data;
    + It will look something like `CountryCode = read.csv("../data/CountryCodes.csv")`

- Clean, well-organized folder structures/practices are **very** important when working in teams

---

## "Cheat" sheets

- R [Code summary](https://michaelbailey.georgetown.domains/wp-content/uploads/2020/06/R-commands_Real-Econometrics-Bailey.pdf)

- Stata [Code summary](https://michaelbailey.georgetown.domains/wp-content/uploads/2020/06/Stata-commands_Real-Econometrics-Bailey.pdf)

-  Both from [Real Stats](https://global.oup.com/ushe/product/real-econometrics-9780190857462)

---

## Goal 2: Use R Markdown

---

### Overview of RMarkdown

--
- RMarkdown compiles R code into formats such as pdf, html and Microsoft Word.

--
- Example: Use R and RMarkdown to produce analysis and figures together with a formatted paper.

+ [Covid testing and sampling](https://michaelbailey.georgetown.domains/wp-content/uploads/2020/05/Corona_Paper_May2020_ver2.pdf)

---

### R Markdown Steps

- Create .Rmd file (e.g., File/New File/R Markdown for starting template)

- Compile a new document in the format indicated by "knitting" the Rmd file via the Knit button on the RStudio command bar

- RStudio compiles the .Rmd document in **a new session of R**.

--
      + **This is a major source of headaches for new users.** An R data object you can see in the interactive console may not exist in the Rmd file.

--
      + Implication: **Your Rmd code must be self-contained**: create all objects and do all processing inside Rmd document

- For more details on R Markdown
    - Xie et al RMarkdown book: https://bookdown.org/yihui/rmarkdown/yihui-xie.html
    - http://rmarkdown.rstudio.com
    - https://www.rstudio.com/resources/cheatsheets

---

### Three RMarkdown Elements

1. Frontmatter (referred to as YAML frontmatter)

1. Text

1. Code

---

### Simple Example

<div class="figure" style="text-align: center">
<img src="Figures/RmdExample.png" alt="Basic RMarkdown document" width="80%" />
<p class="caption">Basic RMarkdown document</p>
</div>
---

### RMarkdown Element 1: Frontmatter

#### YAML header

--
- Basic metadata: title, author, date, output

--
- YAML stands for "YAML Ain't Markup Language" (yes, it's a silly name)

--
- Indents and spaces matter here! (**Another common source of headaches**)

#### Setup chunk

--
- Immediately following the YAML frontmatter is a _setup chunk_ of R code that loads libraries and sets settings etc. (More on R code chunks below.)

``` r
library(tidyverse)
library(readxl)
```

---
  
### RMarkdown Element 2: Text

--
- Text is typically organized under headers (one hash for the largest font, two hashes for smaller, down to four hashes)

--
- For bullets, use a single dash. Put two dashes on line above, to reveal the bullets step by step in a presentation.

--
`$Y_{it} = \beta_0 + \beta_1X_{it} + \epsilon_{it}$`

- For math, use LaTeX format inside dollar signs.

``` r
# $Y_{it} = \beta_0 + \beta_1X_{it} + \epsilon_{it}$
```

---

### RMarkdown Element 3: R Code

- R Code in RMarkdown can be in **chunks** or **inline**. For example, see earlier slide called "Simple Example"

##### Chunks

- Chunks are sections of code fenced by three backticks; the opening fence is followed by chunk options in braces, e.g. {r, eval=TRUE} (see options below)

- Code chunk options include

+ `include = FALSE`: R runs the code, but neither the code nor its results appear in the finished file.

+ `echo = FALSE`: hides the code, but not the results, in the finished file.

+ `eval = FALSE`: R does not evaluate the code in the chunk.

--

+ `message = FALSE`: keeps messages generated by the code out of the finished file.

##### Inline

- Inline R code expressions can be used in text sections. They start with backtick r and end with backtick.

---

#### Quick example

First, we show output with echo = FALSE (we do not see R code - just output)

```
##                 Estimate   Std. Error   t value     Pr(>|t|)
## (Intercept) 6.758132e+01 1.3275714260 50.905975 1.979131e-43
## Income      7.433343e-04 0.0002965107  2.506939 1.561728e-02
```

--
Second, we show code and output with echo = TRUE

``` r
# Code chunk with echo = TRUE
ols.1 = lm(LifeExp ~ Income, data = statedata)
summary(ols.1)$coefficients
```

```
##                 Estimate   Std. Error   t value     Pr(>|t|)
## (Intercept) 6.758132e+01 1.3275714260 50.905975 1.979131e-43
## Income      7.433343e-04 0.0002965107  2.506939 1.561728e-02
```

--
Third, we use some code and output in text:  The effect of 1,000 dollars of income on life expectancy is 0.74 years of life expectancy.
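
A sketch of how the inline number above could be computed. It assumes `statedata` is the built-in `state.x77` matrix converted to a data frame with `Life.Exp` renamed to `LifeExp`; the slides do not show how `statedata` was actually created, so treat that step as a guess.

```r
# Assumption: statedata comes from the built-in state.x77 matrix
statedata <- data.frame(state.x77)   # data.frame() turns "Life Exp" into "Life.Exp"
names(statedata)[names(statedata) == "Life.Exp"] <- "LifeExp"

# Same regression as in the chunk above
ols.1 <- lm(LifeExp ~ Income, data = statedata)

# Effect of 1,000 dollars of income, rounded for use in a sentence
effect_per_1000 <- unname(round(1000 * coef(ols.1)["Income"], 2))
effect_per_1000
```

In the text of an Rmd file, placing `effect_per_1000` between backtick r and a closing backtick would print the value in the sentence.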

---

## Goal 3: Load Data into R

---

## Loading data: 5 common examples

--
- [More information on loading data in R](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf)

- Data sets in base R: see list by typing *data()*

``` r
state.x77[1:5,]  # First 5 rows of "US State Facts" data provided inside R
##            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
## California      21198   5114        1.1    71.71   10.3    62.6    20 156361
```

--
- read.csv

``` r
CountryCode = read.csv("Data/Country codes for WVS wave 6.csv")
CountryCode[1:4,]  # First 4 rows of CountryCode data
##   ccode    countryName
## 1     8        Albania
## 2    12        Algeria
## 3    16 American Samoa
## 4    20        Andorra
```

---

#### Loading data (continued): R data formats

--
- Save multiple objects with .RData format (often abbreviated to .Rda)

``` r
object1 = CountryCode[1:10,]
object2 = state.x77[1:5,]
save(object1, object2, file = "Data/data.RData")
rm(list=ls()) # Clear memory
load(file = "Data/data.RData") # Load data (file name must match the saved name exactly)
objects() # Look at objects in memory
## [1] "object1" "object2"
```

--
- Save one object to a file with .rds format

``` r
saveRDS(object1, file = "Data/data.rds")
rm(list=ls()) # Clear memory
my_data <- readRDS(file = "Data/data.rds") # Load object into new object name
objects()
## [1] "my_data"
my_data[1:4,]
##   ccode    countryName
## 1     8        Albania
## 2    12        Algeria
## 3    16 American Samoa
## 4    20        Andorra
```

---
#### Loading data (continued): Using packages ("libraries")

--
- We need *packages* to load certain types of data sets (such as Excel and Stata data files)

--
- Packages are collections of code/data/documentation to do thousands of specialized tasks

--
- The first time you use a package you need to save the files to your computer

``` r
install.packages("readxl")
```

--
- After that you need to "pull it off the shelf" every time you want to use it

``` r
library(readxl)
```
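
One possible pattern, sketched here, installs a package only when it is missing and then loads it. The example uses `stats`, which ships with R, so that nothing is downloaded; swapping in `"readxl"` or another package name is the intended use.

```r
# Package to check; "stats" ships with R, so no download is triggered here
pkg <- "stats"

# requireNamespace() returns FALSE if the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() accept the name stored in a variable
library(pkg, character.only = TRUE)
```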

---

#### Loading data (continued): Using packages ("libraries")

``` r
library("readxl") # Package to read Excel data
## Warning: package 'readxl' was built under R version 4.3.3
Poll.1 <- readxl::read_excel("Data/Battleground-65-Final-Dataset.xlsx", sheet = "Final Dataset")
  # We do not necessarily need to identify package (see readxl:: in above command).  However,
  # (a) it can help future us find what package we need to install and 
  # (b) can avoid problems when multiple packages use same function name
  # Example: "select" is function in multiple packages; if you want standard "tidyverse" 
  #           version (more on this later), you'll need to write dplyr::select(...)

Poll.1[1:2, 1:10] ## First 2 rows and first 10 variables
## # A tibble: 2 × 10
##     INT REGION STATE COUNTY    CD SAMPGEN ACTAGE  DTID  JBID  PBID
##   <dbl>  <dbl> <dbl>  <dbl> <dbl> <chr>    <dbl> <dbl> <dbl> <dbl>
## 1   430      3     1      3     1 F           64     1     4     4
## 2   720      3     1     53     1 F           77     4     2     2
```

``` r
# Read in a different sheet in an Excel file
Poll.2 <- readxl::read_excel("Data/Battleground-65-Final-Dataset.xlsx", sheet = "NotData")
Poll.2[1:2, 1:8]  
## # A tibble: 2 × 8
##      x1 x2       x3    x4    x5    x6    x7    x8
##   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 this      1     1     1     1     1     1
## 2     2 is        2     2     2     2     2     2
```

---

#### Loading data (continued): Stata data

``` r
library(haven)  # Package to read Stata data
dta = haven::read_dta("Data/Ch5_Exercise6_Global_education.dta")

## First 4 rows and first 10 variables
dta[1:4, 1:10]  
## # A tibble: 4 × 10
##   code  name       open  ed60 ypc60 ypcgr testavg proprts edavg region
##   <chr> <chr>     <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl> <chr> 
## 1 ARG   Argentina 0      6.13  7.39 0.996    3.92    6.5   7.22 LATAM 
## 2 AUS   Australia 0.870  9.81 10.6  2.22     5.09    9.32 11.5  COMM  
## 3 AUT   Austria   1      8.28  7.36 2.96     5.09    9.74  9.85 C-EUR 
## 4 BEL   Belgium   1      7.39  7.76 2.84     5.04    9.69  9.11 C-EUR
```

---

## Goal 4: Identify three major data types

1. Numeric

2. Logical

3. String

---
### 1. Numeric

``` r
# Variable named x.numeric contains the numbers 7 and 10
x.numeric <- c(7, 10)  
x.numeric
## [1]  7 10
```

--
### 2. Logical

``` r
# Variable named x.logical is true/false 
# Based on condition in parentheses
x.logical <- (x.numeric > 8)  
x.logical
## [1] FALSE  TRUE

# Notice double equal sign in condition
x.logical2 <- (x.numeric == 7)  
x.logical2
## [1]  TRUE FALSE
```
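
A small sketch of why logical variables are handy: R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, so `sum()` counts the cases that meet a condition and `mean()` gives their share.

```r
x.numeric <- c(7, 10)

# Number of elements greater than 8
sum(x.numeric > 8)
## [1] 1

# Proportion of elements greater than 8
mean(x.numeric > 8)
## [1] 0.5
```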

---

### 3. String

``` r
# Variable named x.string has words
x.string <- c("Hello", "World", "how are you doing?", 7)  
x.string
## [1] "Hello"              "World"              "how are you doing?"
## [4] "7"
```

---

#### Data types (continued)

To find out what type of data a variable holds, use *class()*

``` r
x.numeric
## [1]  7 10
class(x.numeric)
## [1] "numeric"

x.string
## [1] "Hello"              "World"              "how are you doing?"
## [4] "7"
class(x.string)
## [1] "character"
```

---

#### Data types (continued)
#### Other data types

- Factors - nominal variables that take on one of a specified number of values (e.g. "U.S." or "Mexico" or "Canada")

--
- Dates

--
- We'll leave getting comfortable with these data types for future work

---

### Missing data

#### Often a big deal!

``` r
# Create a new variable.  NA is the specific term for missing data in R
x.new = c(7, 10, NA, 3, 1)
x.new
## [1]  7 10 NA  3  1
```

``` r
# What happens here?
x.dot = c(7, 10, ".", 3, 1)
```

``` r
x.dot
## [1] "7"  "10" "."  "3"  "1"
class(x.dot)
## [1] "character"
```
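
One way to repair a vector like `x.dot`, sketched under the assumption that "." was meant to mark a missing value: recode it to `NA`, then convert back to numeric.

```r
x.dot <- c(7, 10, ".", 3, 1)   # the "." forces every element to be a string

x.fixed <- x.dot
x.fixed[x.fixed == "."] <- NA      # recode the placeholder as missing
x.fixed <- as.numeric(x.fixed)     # convert the strings back to numbers

x.fixed
## [1]  7 10 NA  3  1
class(x.fixed)
## [1] "numeric"
mean(x.fixed, na.rm = TRUE)
## [1] 5.25
```

When the "." comes from a file, `read.csv()` has an `na.strings` argument (for example `na.strings = "."`) that does this recoding while loading.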

---

#### Missing data (continued)

--
Check for missing data with the *is.na()* function

``` r
x.new
## [1]  7 10 NA  3  1
```

``` r
Missing.indicator = is.na(x.new) 
```

--
What data type is the variable *Missing.indicator*?

``` r
Missing.indicator
## [1] FALSE FALSE  TRUE FALSE FALSE
class(Missing.indicator)
## [1] "logical"
```

---

## Goal 5: Identify four data structures

1. Vectors

2. Matrices

3. Data frames

4. Lists

---

### 1. Vector: a column of numbers

More technically, a vector is a sequence of data elements of the same type

``` r
# Create a vector called x1
x1 <- c(1, 4, -1, 4, 1, 5)
x1
## [1]  1  4 -1  4  1  5
```

``` r
# x2 is simply 2 times x1 for each element
x2 <- 2 * x1
x2
## [1]  2  8 -2  8  2 10
```

---
#### 1. Vector (continued)

Reference vector elements with single brackets

``` r
x2
## [1]  2  8 -2  8  2 10
# Set the third element of x2 to be missing
x2[3] = NA
x2
## [1]  2  8 NA  8  2 10
```

``` r
x1
## [1]  1  4 -1  4  1  5
# Set all elements of x1 that are < 2 to equal 200
x1[x1 < 2] = 200
x1
## [1] 200   4 200   4 200   5
```

---

#### 1. Vector (continued)

Use logical conditions and brackets to select a subset of the data.

``` r
x.new
## [1]  7 10 NA  3  1

# the is.na() command indicates if an element is NA
is.na(x.new)
## [1] FALSE FALSE  TRUE FALSE FALSE
```

``` r
# Show observations that are not missing
# Equivalent ways to represent the same information
x.new[is.na(x.new) == FALSE]
## [1]  7 10  3  1
x.new[is.na(x.new) == 0]
## [1]  7 10  3  1
x.new[is.na(x.new) != 1]
## [1]  7 10  3  1
```
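
The same selection is often written with the "not" operator `!`, as in this short sketch; `sum()` on the logical vector also gives a quick count of missing values.

```r
x.new <- c(7, 10, NA, 3, 1)

# ! flips TRUE and FALSE, so this keeps the non-missing elements
x.new[!is.na(x.new)]
## [1]  7 10  3  1

# Count how many elements are missing
sum(is.na(x.new))
## [1] 1
```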

---

### 2. Matrix: multiple columns of variables of same type and length

Combine vectors into matrix with *cbind()* ("column bind")

``` r
Matrix.1 = cbind(x1, x2)
Matrix.1
##       x1 x2
## [1,] 200  2
## [2,]   4  8
## [3,] 200 NA
## [4,]   4  8
## [5,] 200  2
## [6,]   5 10
```

--
Check number of rows and columns with *dim()*

``` r
dim(Matrix.1)
## [1] 6 2
```

---
#### 2. Matrix (continued)

``` r
Matrix.1
##       x1 x2
## [1,] 200  2
## [2,]   4  8
## [3,] 200 NA
## [4,]   4  8
## [5,] 200  2
## [6,]   5 10
```


Use brackets to identify subsets of a matrix

``` r
# First column
Matrix.1[, 1]
## [1] 200   4 200   4 200   5
```

``` r
# Third row
Matrix.1[3, ]
##  x1  x2 
## 200  NA
```

``` r
# Rows of Matrix.1 where first 
# column is greater than 10
Matrix.1[Matrix.1[,1] > 10, ]
##       x1 x2
## [1,] 200  2
## [2,] 200 NA
## [3,] 200  2
```

---

### 3. Data frame: multiple variables of same length in matrix form

.pull-left[
- Very important data structure in R!
- Columns can be different data types
- Think of data frames as Excel spreadsheets
- You may see *tibble*s; just a slightly modified version of data frames used in *tidyverse* coding (more on the tidyverse later)

]

``` r
df <- data.frame(
  st = c("UT", "NV", "OR", 
         "TX", "NY", NA), 
  Wages = x1,
  Spend = x2)
df
##     st Wages Spend
## 1   UT   200     2
## 2   NV     4     8
## 3   OR   200    NA
## 4   TX     4     8
## 5   NY   200     2
## 6 <NA>     5    10

# Variable names
names(df)
## [1] "st"    "Wages" "Spend"
```

---

#### Data frames (continued)

Use brackets to identify subsets of a data frame

``` r
# 2nd column
df[, 2]
## [1] 200   4 200   4 200   5
```

``` r
# 4th row
df[4, ]
##   st Wages Spend
## 4 TX     4     8
```

Dollar sign notation

``` r
df$Wages
## [1] 200   4 200   4 200   5
```

``` r
# Rows of df with wages < 10
df[df$Wages < 10, ]
##     st Wages Spend
## 2   NV     4     8
## 4   TX     4     8
## 6 <NA>     5    10
```
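
A sketch of two common extensions of bracket subsetting: picking columns by name, and guarding against `NA` in the condition. The data frame is rebuilt here so the block runs on its own.

```r
df <- data.frame(
  st    = c("UT", "NV", "OR", "TX", "NY", NA),
  Wages = c(200, 4, 200, 4, 200, 5),
  Spend = c(2, 8, NA, 8, 2, 10))

# Rows with wages < 10, keeping only two named columns
low.wage <- df[df$Wages < 10, c("st", "Wages")]

# A condition on a column with NA returns an extra all-NA row...
with.na.row <- df[df$Spend > 5, ]
nrow(with.na.row)
## [1] 4

# ...while which() keeps only the rows where the condition is TRUE
clean.rows <- df[which(df$Spend > 5), ]
nrow(clean.rows)
## [1] 3
```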

---

### 4. List: an object containing other objects

``` r
# Create some objects of different types and lengths
number.vec = c(2, 3, 5) 
string.vec = c("aa", "bb", "cc", "dd", "ee") 
logical.vec = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
```

``` r
# x.list contains the objects nn, ss and ll, plus an unnamed number
x.list = list(nn = number.vec, ss = string.vec, ll = logical.vec, 3)   
x.list
## $nn
## [1] 2 3 5
## 
## $ss
## [1] "aa" "bb" "cc" "dd" "ee"
## 
## $ll
## [1]  TRUE FALSE  TRUE FALSE FALSE
## 
## [[4]]
## [1] 3
```

--
Think of lists as folders

---
#### Lists (continued)

``` r
x.list
## $nn
## [1] 2 3 5
## 
## $ss
## [1] "aa" "bb" "cc" "dd" "ee"
## 
## $ll
## [1]  TRUE FALSE  TRUE FALSE FALSE
## 
## [[4]]
## [1] 3
```

--
- Referencing objects in lists can be a bit tricky

--
  + Rule of thumb: double brackets are for lists

--
  + Double brackets pull out an object from a list -- an object may itself have many elements!

--
  + Double brackets return an element from a list as an object

``` r
x.list[[1]]   # First object in the x.list list
## [1] 2 3 5
x.list[[4]]  # Fourth object in the x.list list
## [1] 3
```
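
A short sketch contrasting single and double brackets on a list: single brackets return a smaller list, while double brackets return the object stored inside. The list is rebuilt here so the block runs on its own.

```r
x.list <- list(nn = c(2, 3, 5),
               ss = c("aa", "bb", "cc", "dd", "ee"),
               ll = c(TRUE, FALSE, TRUE, FALSE, FALSE),
               3)

single <- x.list[1]     # still a list (of length 1)
double <- x.list[[1]]   # the numeric vector itself

class(single)
## [1] "list"
class(double)
## [1] "numeric"

# Functions such as sum() need the object, not the list wrapper
sum(double)
## [1] 10
```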

---

#### Lists (continued)

We can combine single and double brackets

``` r
x.list
## $nn
## [1] 2 3 5
## 
## $ss
## [1] "aa" "bb" "cc" "dd" "ee"
## 
## $ll
## [1]  TRUE FALSE  TRUE FALSE FALSE
## 
## [[4]]
## [1] 3
```

``` r
# What will this return?
x.list[[1]][3]  
```


``` r
x.list[[1]][3]  
## [1] 5
```

``` r
# 1st element of 4th object in x.list
x.list[[4]][1]  
## [1] 3

# 2nd element of "nn" object in x.list
x.list[["nn"]][2]  
## [1] 3

# 3rd element of ss object in x.list
x.list$ss[3]  
## [1] "cc"
```

---

### Data structure

--
.pull-left[
Figure out the data structure of an object:

``` r
str(x1)
##  num [1:6] 200 4 200 4 200 5
str(Matrix.1)
##  num [1:6, 1:2] 200 4 200 4 200 5 2 8 NA 8 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "x1" "x2"
str(df)
## 'data.frame':    6 obs. of  3 variables:
##  $ st   : chr  "UT" "NV" "OR" "TX" ...
##  $ Wages: num  200 4 200 4 200 5
##  $ Spend: num  2 8 NA 8 2 10
str(x.list)
## List of 4
##  $ nn: num [1:3] 2 3 5
##  $ ss: chr [1:5] "aa" "bb" "cc" "dd" ...
##  $ ll: logi [1:5] TRUE FALSE TRUE FALSE FALSE
##  $   : num 3
```
]

Compare this with the data type reported by *class()*

``` r
class(x1)
## [1] "numeric"
```
---

## Goal 6: Describe data

---

#### Descriptive statistics

List the data objects in memory

``` r
objects()
##  [1] "df"                "dta"               "logical.vec"      
##  [4] "Matrix.1"          "Missing.indicator" "my_data"          
##  [7] "number.vec"        "Poll.1"            "Poll.2"           
## [10] "string.vec"        "x.dot"             "x.list"           
## [13] "x.logical"         "x.logical2"        "x.new"            
## [16] "x.numeric"         "x.string"          "x1"               
## [19] "x2"
```

---
#### Descriptive statistics (continued)

Many functions available

``` r
# mean() is basic function -- as are sum(), min() etc
mean(x1)
## [1] 102.1667
```

``` r
# table() provides the frequency distribution of each value 
table(x1)
## x1
##   4   5 200 
##   2   1   3
```

``` r
# summary() provides basic descriptive stats 
summary(x1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    4.25  102.50  102.17  200.00  200.00
```

---

#### Descriptive statistics (continued)

--
We often need to deal with missing data

``` r
x2
## [1]  2  8 NA  8  2 10
mean(x2)
## [1] NA
```

--
For many functions, remove NA observations with ", na.rm=TRUE"

``` r
mean(x2,   na.rm = TRUE)
## [1] 6
median(x1, na.rm = TRUE)
## [1] 102.5
max(x.new, na.rm = TRUE)
## [1] 10
```

---
#### Descriptive statistics (continued)

``` r
# For matrix or data frame - need to specify what mean we're looking for
Matrix.1
##       x1 x2
## [1,] 200  2
## [2,]   4  8
## [3,] 200 NA
## [4,]   4  8
## [5,] 200  2
## [6,]   5 10
```

``` r
# Mean of all data in matrix (seldom useful)
mean(Matrix.1, na.rm = TRUE)
## [1] 58.45455
```

``` r
# Mean of column 1
mean(Matrix.1[,1])
## [1] 102.1667
```

---
#### Descriptive statistics (continued)

--
*apply()* function applies a function to specified dimension of a matrix

``` r
Matrix.1
##       x1 x2
## [1,] 200  2
## [2,]   4  8
## [3,] 200 NA
## [4,]   4  8
## [5,] 200  2
## [6,]   5 10

apply(Matrix.1, 1, mean) # Means of rows
## [1] 101.0   6.0    NA   6.0 101.0   7.5
```

``` r
apply(Matrix.1, 2, mean) # Means of columns
##       x1       x2 
## 102.1667       NA
```

``` r
# Add additional arguments for the function (such as dealing with NA)
apply(Matrix.1, 2, mean, na.rm=TRUE)
##       x1       x2 
## 102.1667   6.0000
```
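
For the specific case of means, base R also offers `colMeans()` and `rowMeans()` as shortcuts. This sketch rebuilds the matrix so it runs on its own.

```r
Matrix.1 <- cbind(x1 = c(200, 4, 200, 4, 200, 5),
                  x2 = c(2, 8, NA, 8, 2, 10))

# Column means, ignoring missing values (same result as the apply() call above)
col.means <- colMeans(Matrix.1, na.rm = TRUE)
col.means

# Row means, ignoring missing values; row 3 uses only its non-missing entry
row.means <- rowMeans(Matrix.1, na.rm = TRUE)
row.means
```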

---

## Goal 7: Create new data objects

---

#### Creating new data objects

Sequences

``` r
# Sequence of numbers 1 thru 4
x.seq = seq(1, 4)
x.seq
## [1] 1 2 3 4
```

``` r
# Sequence from 0 to 100 by 20
x.seq2 = seq(0, 100, by = 20)
x.seq2
## [1]   0  20  40  60  80 100
```

Random numbers

``` r
# 6 draws from a random normal distribution
x.rand = rnorm(6)
x.rand
## [1]  1.0680748 -1.4616957  0.1676797  0.9629749  1.4979996 -0.3886791

# 4 draws from a random uniform distribution
x.rand.unif = runif(4)
x.rand.unif
## [1] 0.2516914 0.1401574 0.5151736 0.4227422
```
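
Random draws change every time the code runs. A sketch of making them reproducible with `set.seed()`; the seed value 2201 is arbitrary.

```r
# Setting the same seed before drawing gives the same "random" numbers
set.seed(2201)
draw1 <- rnorm(3)

set.seed(2201)
draw2 <- rnorm(3)

identical(draw1, draw2)
## [1] TRUE
```

This matters when knitting an Rmd file, since each knit starts a new R session and would otherwise produce different numbers.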

---

#### Creating new data objects (continued)
#### Can put data into existing objects

``` r
# Add a variable called x3 in the df dataframe
df$x3 = seq(-4, 1, by = 1)
```

``` r
# What is the difference?
Wages.SQ         = df$Wages^2
df$Wages.squared = df$Wages^2
```

``` r
Wages.SQ
## [1] 40000    16 40000    16 40000    25
```

``` r
names(df)
## [1] "st"            "Wages"         "Spend"         "x3"           
## [5] "Wages.squared"
objects()[1:15]
##  [1] "df"                "dta"               "logical.vec"      
##  [4] "Matrix.1"          "Missing.indicator" "my_data"          
##  [7] "number.vec"        "Poll.1"            "Poll.2"           
## [10] "string.vec"        "Wages.SQ"          "x.dot"            
## [13] "x.list"            "x.logical"         "x.logical2"
```

---

class: left, top, inverse, hide-logo
background-position: center
background-size: cover

## Goal 8: Work with loops

---

### Loops

Use the **_`for()` function_** to create loops.

```{}
            for( i  in  1:5 )
                 ^        ^
                 |        |___ the values of i that will be evaluated in the loop
                 |
          i is the 'counter' 
```

**_Curly braces `{}`_** enclose the code that runs on each iteration.

```{}
        for( i  in  1:5  ){
          |~~~~~~~~~~~~~~~~|   
          |~~~~~~~~~~~~~~~~|
          |~~~~~~~~~~~~~~~~| code performed for each iteration.
          |~~~~~~~~~~~~~~~~|
        }
```

---

#### Loops (continued)

Example

``` r
        for( ii  in  1:4  ){
          cat("The value of ii is", ii, "\n")
          }
## The value of ii is 1 
## The value of ii is 2 
## The value of ii is 3 
## The value of ii is 4
```

- We can name the looping counter anything we want (_ii_ in this case)

- We use the `cat` function (for _concatenate_) to print cleanly

- We use "\n" to start a new line each time
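
Loops are often used to fill in an object one element at a time. A minimal sketch, with made-up values, that creates an empty vector first and then stores a result on each iteration:

```r
values  <- c(2, 8, 10)
squares <- numeric(length(values))   # empty numeric vector to hold results

# seq_along(values) gives 1, 2, 3: one index per element
for (ii in seq_along(values)) {
  squares[ii] <- values[ii]^2
}

squares
## [1]   4  64 100
```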

---

class: left, top, inverse, hide-logo
background-position: center
background-size: cover

## Goal 9: Understand _if_ statements

---

### If statements

- Structure:  if(condition) { true.expression } else {false.expression}

``` r
# Simple example using nchar function (that counts characters in string object)
if(nchar("Algeria") > 8) cat("Algeria is a long name\n")
if(nchar("Algeria") < 8) cat("Algeria is a short name\n")
## Algeria is a short name
```
--

``` r
# Now in a loop
for(cc in 2:7){
   if(nchar(CountryCode$country[cc]) > 8) {
      cat(CountryCode$country[cc], "is a long name\n")  }
   if(nchar(CountryCode$country[cc]) < 8) {
      cat(CountryCode$country[cc], "is a short name\n") }
}
## Algeria is a short name
## American Samoa is a long name
## Andorra is a short name
## Angola is a short name
## Antigua is a short name
## Azerbaijan is a long name
```
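
The two separate `if` statements above print nothing for a name of exactly 8 characters. A sketch of the full `if ... else if ... else` structure that covers every case; the helper `describe_name()` is hypothetical and written only for this illustration.

```r
# Hypothetical helper: classify a name by its number of characters
describe_name <- function(name) {
  n <- nchar(name)
  if (n > 8) {
    label <- "long"
  } else if (n < 8) {
    label <- "short"
  } else {
    label <- "exactly 8 characters"
  }
  label
}

for (nm in c("Algeria", "Barbados", "Azerbaijan")) {
  cat(nm, "is", describe_name(nm), "\n")
}
## Algeria is short 
## Barbados is exactly 8 characters 
## Azerbaijan is long
```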

---

#### If statements (continued)

-   *ifelse* Structure: ifelse(condition, expression1, expression2)

``` r
# Create a new variable with the length of each country name
# But some rows are not countries (they have negative codes)
CountryCode[CountryCode$V2<0,]
##     V2             country
## 187 -5    Missing; Unknown
## 188 -4 Not asked in survey
## 189 -3      Not applicable
## 190 -2           No answer
## 191 -1          Don't know
```

``` r
# Use ifelse to compute name length only for actual countries (V2 > 0)
CountryCode$nameLength  = ifelse(CountryCode$V2  > 0, 
                                nchar(CountryCode$country),  
                                NA)
CountryCode[c(1:4, dim(CountryCode)[1]),]
##     V2        country nameLength
## 1    8        Albania          7
## 2   12        Algeria          7
## 3   16 American Samoa         14
## 4   20        Andorra          7
## 191 -1     Don't know         NA
```
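
The same pattern can flag names that contain a space. This sketch uses a small made-up data frame (`codes`) with the same column names as above, and `grepl()` to test for a space.

```r
# Made-up rows that mimic the structure of CountryCode
codes <- data.frame(V2      = c(8, 16, -1),
                    country = c("Albania", "American Samoa", "Don't know"))

# 1 if the name contains a space, 0 if not, NA for rows that are not countries
codes$hasSpace <- ifelse(codes$V2 > 0,
                         as.numeric(grepl(" ", codes$country)),
                         NA)
codes$hasSpace
## [1]  0  1 NA
```

Unlike `if()`, which checks a single condition, `ifelse()` evaluates the condition for every row at once.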