Dr Ibrahim Inal
Lecturer in Economics
Prof Anurag Banerjee
Lecturer in Economics
Mr. George Cheatle
PhD in Economics
Data science and computation is a huge field, and it is impossible to master all aspects by taking a module or reading a single book.
You will have
details later…
You will probably be disappointed…
Instead I hope to convince you…
And yes I know it was hoping
The RStudio Help menu contains links to many documents for help with both R (select R Help) and RStudio (see RStudio Docs and RStudio Community Forum).
I particularly like the Cheatsheets, which are compact documents crammed with useful information on how to use various products made by the RStudio group.
RStudio Cheatsheet
Open the cheatsheet for RStudio by selecting the Help menu -> Cheatsheets -> RStudio IDE Cheat Sheet. Note the cheatsheet will usually be downloaded in a web browser as a .pdf.
Install a package with the install.packages() function, which typically downloads the package from CRAN and installs it for use. Installing once is enough, but if you update R or RStudio, you may need to install it again.
Load an installed package with library() or require().
Many packages include vignettes: longer, tutorial-style guides for a package.
Tip
View the "Introduction to dplyr" vignette by issuing the command vignette("dplyr").
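The install-then-load workflow described above can be sketched as follows. This is only a minimal sketch: it uses the built-in stats package as a stand-in so that it runs without a network connection, and the variable name pkg is purely illustrative.

```r
# Name of the package to use. "stats" ships with R, so nothing is downloaded here;
# for a CRAN package such as dplyr you would put its name here instead.
pkg <- "stats"

# Install only if the package is not already available (installing once is enough).
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package for the current session; this is needed in every new session.
library(pkg, character.only = TRUE)

# Confirm that the package is now attached to the search path.
is_attached <- paste0("package:", pkg) %in% search()
```

Checking with requireNamespace() first avoids re-downloading a package that is already installed.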
getconf LONG_BIT command in the terminal
You could also use R and RStudio from
[1] 2
[1] 2
Below is a list of arithmetic operators in R.
| Operator | Description | Example | Result |
|---|---|---|---|
| + | Addition | 2 + 3 | 5 |
| - | Subtraction | 5 - 2 | 3 |
| * | Multiplication | 4 * 5 | 20 |
| / | Division | 10 / 2 | 5 |
| ^ | Exponentiation | 2^3 | 8 |
| %% | Modulus (Remainder) | 10 %% 3 | 1 |
| %/% | Integer Division | 10 %/% 3 | 3 |
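The operators in the table can be tried directly in the console. The short sketch below uses two illustrative variables, a and b (names chosen here, not taken from the slides), and also shows how integer division and the modulus fit together.

```r
a <- 10
b <- 3

sum_ab  <- a + b     # addition: 13
diff_ab <- a - b     # subtraction: 7
prod_ab <- a * b     # multiplication: 30
quot_ab <- a / b     # ordinary division: 3.333...
pow_ab  <- a^b       # exponentiation: 1000
rem_ab  <- a %% b    # remainder after division: 1
int_ab  <- a %/% b   # integer division: 3

# Integer division and modulus together recover the original number.
recovers_a <- (b * int_ab + rem_ab) == a
```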
Caution
\(=\) does the same thing as <-, i.e., it is an assignment operator. \(==\) tests whether two values are equal.
| Operator | Description | Example | Result |
|---|---|---|---|
| & | Logical AND | TRUE & FALSE | FALSE |
| \| | Logical OR | TRUE \| FALSE | TRUE |
| ! | Logical NOT | !TRUE | FALSE |
| == | Equal to | 5 == 5 | TRUE |
| != | Not equal to | 5 != 3 | TRUE |
| > | Greater than | 3 > 2 | TRUE |
| >= | Greater than or equal to | 3 >= 3 | TRUE |
| < | Less than | 4 < 6 | TRUE |
| <= | Less than or equal to | 4 <= 4 | TRUE |
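A short sketch of how assignment, comparison and logical operators combine. The variable names are illustrative only.

```r
x <- 5                         # assignment with <-
y = 5                          # assignment with = (same effect here)

same    <- x == y              # comparison, not assignment: TRUE
both    <- (x > 3) & (y < 3)   # AND: TRUE & FALSE gives FALSE
either  <- (x > 3) | (y < 3)   # OR: TRUE | FALSE gives TRUE
negated <- !same               # NOT: FALSE

# Comparisons are vectorised: each element is tested separately.
at_least_four <- c(1, 4, 7) >= 4   # FALSE TRUE TRUE
```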
Note
To use package-specific functions, you first need to install the package and then load it.
Open up a script file by clicking File>New File>R Script. Write #This is my first script and carry out
rm(). Try getting help about rm().
R works with objects.
Any object in R is of a particular type, is stored in a particular way, and belongs to a particular class. The first two relate to how R handles the object; the last reflects how the object is used. Note that in most texts the distinction between data and object types, storage, and classes is not clear-cut and depends on the context. You can read more at https://stackoverflow.com/questions/6258004/types-and-classes-of-variables.
You can think of class as the structure of the object from a programming perspective, and of type as its structure from R’s perspective.
Caution
This discussion is specific to R and does not carry over directly to other programming languages.
| Class | Description | Example |
|---|---|---|
| character | Character/String data | name <- "John" |
| numeric | Numeric data (real numbers) | price <- 12.34 |
| integer | Integer data (whole numbers) | age <- 30L |
| complex | Complex data (real + imaginary parts) | z <- 3 + 2i |
| logical | Boolean/Logical data (TRUE/FALSE) | is_valid <- TRUE |
Caution
Class is a blueprint for the object.
class() gives the class of a variable.
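As a quick check of the class table, the sketch below calls class() on a few objects. Note that a whole number typed without the L suffix is stored as numeric, not integer. The data frame at the end is an extra illustration, not taken from the table, of how class and type can differ.

```r
name     <- "John"
price    <- 12.34
age      <- 30L        # the L suffix makes this an integer
z        <- 3 + 2i
is_valid <- TRUE

class_name  <- class(name)       # "character"
class_price <- class(price)      # "numeric"
class_age   <- class(age)        # "integer"
class_z     <- class(z)          # "complex"
class_valid <- class(is_valid)   # "logical"

class_plain <- class(30)         # "numeric": no L suffix, so not an integer

# Class and type can differ: a data frame has class "data.frame" but type "list".
df       <- data.frame(a = 1)
df_class <- class(df)
df_type  <- typeof(df)
```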
| Types | Description | Example |
|---|---|---|
| logical | Boolean/Logical data (TRUE/FALSE) | is_valid <- TRUE |
| integer | Integer data (whole numbers) | age <- 30L |
| double | Numeric data (real numbers) | price <- 12.34 |
| complex | Complex data (real + imaginary parts) | z <- 3 + 2i |
| character | Character/String data | name <- "John" |
| raw | Raw bytes | binary_data <- charToRaw("hello") |
| list | List of objects | my_list <- list(1, "hello", TRUE) |
| NULL | Null object (no value) | x <- NULL |
| closure | Function/Closure | add_numbers <- function(a, b) { return(a + b) } |
| special | Special function types (internal functions that do not evaluate their arguments) | special_func <- quote |
| builtin | Basic functions and operators (internal functions that evaluate their arguments) | sum_builtin <- sum |
| environment | Environment (symbol and value bindings) | env <- new.env() |
Caution
We can check the type of a variable with the typeof() function. We can convert a variable to type x using the function as.x(). This process is called “coercion”, e.g., as.numeric("123").
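A minimal sketch of typeof() and explicit coercion with the as.x() family. The variable names are illustrative; suppressWarnings() is used only to keep the failed conversion quiet.

```r
type_double  <- typeof(12.34)   # "double"
type_integer <- typeof(30L)     # "integer"
type_closure <- typeof(mean)    # "closure": an ordinary R function
type_builtin <- typeof(sum)     # "builtin": an internal function

num <- as.numeric("123")        # the string "123" becomes the number 123
chr <- as.character(12.34)      # the number becomes the string "12.34"
int <- as.integer(3.9)          # truncates toward zero: 3

# A string that is not a number cannot be converted and becomes NA (with a warning).
bad <- suppressWarnings(as.numeric("abc"))
```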
In addition to these abstract structures, there are native objects in R. Since R is primarily designed for statistical analysis, its objects are generally data.
| Data Structure | Example |
|---|---|
| vector | x <- c(1, 2, 3, 4) |
| matrix | mat <- matrix(1:6, nrow = 2) |
| array | arr <- array(1:24, dim = c(2, 3, 4)) |
| list | my_list <- list(1, "hello", TRUE) |
| data frame | df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30)) |
| factor | gender <- factor(c("Male", "Female", "Male")) |
| table | tbl <- table(c("A", "A", "B", "C", "C", "C")) |
Vectors can be generated by many functions.
x <- numeric() # initiate an empty numeric vector
y <- c(5, 6, 7) # generate a vector by connecting scalars via c() ("concatenate")
z <- c(5, "test", 7) # does not work as intended
typeof(x)
[1] "double"
[1] "double"
[1] "character"
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
[1] TRUE
[1] "numeric"
[1] "integer"
[1] TRUE TRUE TRUE
[1] TRUE TRUE TRUE
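The commands behind the output above are not shown in the text. The sketch below is therefore an assumption about the kind of checks involved, not a reconstruction. It illustrates why c(5, "test", 7) "does not work as intended": a vector holds a single type, so the numbers are silently converted to character strings.

```r
y <- c(5, 6, 7)
z <- c(5, "test", 7)            # the numbers are coerced to character strings

type_y <- typeof(y)             # "double"
type_z <- typeof(z)             # "character"

y_is_numeric   <- is.numeric(y)     # TRUE
z_is_numeric   <- is.numeric(z)     # FALSE
z_is_character <- is.character(z)   # TRUE

# Converting back recovers the numbers, but "test" becomes NA (with a warning).
z_num <- suppressWarnings(as.numeric(z))

# Mixing logical, integer and double values gives a double vector.
mixed_type <- typeof(c(TRUE, 2L, 3.5))
```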
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
[1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
[1] 10.0 10.5 11.0 11.5 12.0 12.5
seq(from = 10, to = 12.5, length.out = 10) # ... sequence with predefined length (implicit increment)
[1] 10.00000 10.27778 10.55556 10.83333 11.11111 11.38889 11.66667 11.94444
[9] 12.22222 12.50000
[1] 5 5 5
[1] 6 7 6 7 6 7
[1] 6 6 6 7 7 7
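Regular sequences and repeated values like those printed above can be generated with the colon operator, seq() and rep(). The original commands are not shown in the text, so the following is a sketch of standard calls that produce vectors of this kind. The variable names are illustrative.

```r
up      <- -5:5                               # integers from -5 up to 5
down    <- 5:-5                               # the same values in decreasing order
by_half <- seq(from = 10, to = 12.5, by = 0.5)   # sequence with an explicit increment

fives       <- rep(5, times = 3)              # 5 5 5
alternating <- rep(c(6, 7), times = 3)        # whole vector repeated: 6 7 6 7 6 7
grouped     <- rep(c(6, 7), each = 3)         # each element repeated: 6 6 6 7 7 7
```

The times argument repeats the whole vector, whereas each repeats every element before moving on to the next.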
Matrices can be generated directly with the matrix() function or by combining vectors.
x <- matrix(1:6, ncol=2) # construct matrix with 2 columns filled columnwise (default)
matrix(1:6, ncol=2, byrow=TRUE) # ... or row by row
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[1] 2
[1] 3
# constructing matrices from vectors
x <- cbind(1:6, 2:7, 3:8) # bind vectors column-wise
dim(x) # reports dimensions of matrix
[1] 6 3
[1] TRUE
[1] 2 2 3
[1] TRUE
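The different ways of building matrices and arrays can be put side by side. This is a sketch using standard base R functions. The object names are illustrative, and the three-dimensional array is an assumed example rather than the one from the slides.

```r
m_col  <- matrix(1:6, ncol = 2)                 # 3 x 2, filled column by column
m_row  <- matrix(1:6, ncol = 2, byrow = TRUE)   # 3 x 2, filled row by row
m_wide <- matrix(1:6, nrow = 2)                 # 2 x 3

n_rows <- nrow(m_wide)                          # 2
n_cols <- ncol(m_wide)                          # 3

bound_cols <- cbind(1:6, 2:7, 3:8)              # vectors become columns: 6 x 3
bound_rows <- rbind(1:6, 2:7, 3:8)              # vectors become rows: 3 x 6

arr <- array(1:12, dim = c(2, 2, 3))            # a three-dimensional array

cols_is_matrix <- is.matrix(bound_cols)         # TRUE
arr_is_array   <- is.array(arr)                 # TRUE
```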
Most mathematical operations on vectors and matrices can be carried out in R.
[1] -5 -5 -5 -5 -5
[1] 6 14 24 36 50
[1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000
[,1]
[1,] 130
[1] 10 9 8 7 6
[,1] [,2] [,3] [,4] [,5]
[1,] 6 7 8 9 10
[2,] 12 14 16 18 20
[3,] 18 21 24 27 30
[4,] 24 28 32 36 40
[5,] 30 35 40 45 50
[1] 15
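The commands that produced the output above are not shown in the text. Assuming two vectors x <- 1:5 and y <- 6:10, the following sketch yields results of the same kind: elementwise arithmetic, an inner product, a reversed vector, an outer product and a sum.

```r
x <- 1:5
y <- 6:10

difference <- x - y        # elementwise: -5 -5 -5 -5 -5
product    <- x * y        # elementwise: 6 14 24 36 50
ratio      <- x / y        # elementwise division

inner    <- x %*% y        # inner product, returned as a 1 x 1 matrix (130)
reversed <- rev(y)         # 10 9 8 7 6
outer_xy <- x %o% y        # outer product, a 5 x 5 matrix
total    <- sum(x)         # 15
```

Note that * is always elementwise; %*% is the operator for matrix (and inner) products.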
# matrices
x <- matrix(1:9, ncol = 3)
y <- matrix(10:18, ncol = 3)
x * y # elementwise multiplication
[,1] [,2] [,3]
[1,] 10 52 112
[2,] 22 70 136
[3,] 36 90 162
[,1] [,2] [,3]
[1,] 138 174 210
[2,] 171 216 261
[3,] 204 258 312
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1] 45
[1] 1 5 9
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE TRUE FALSE
[1] 12 15 18
[1] 6 15 24
[1] 4 5 6
[1] 2 5 8
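Only the elementwise multiplication command survives in the text above. The remaining output is consistent with the standard base R functions below, so this sketch (an assumption, not the original code) shows calls that give matching results for the same x and y.

```r
x <- matrix(1:9, ncol = 3)
y <- matrix(10:18, ncol = 3)

elementwise <- x * y         # multiplies matching entries
mat_product <- x %*% y       # true matrix multiplication

transposed <- t(x)           # rows and columns swapped
total      <- sum(x)         # 45
diagonal   <- diag(x)        # 1 5 9
below_diag <- lower.tri(x)   # TRUE strictly below the main diagonal

row_sums  <- rowSums(x)      # 12 15 18
col_sums  <- colSums(x)      # 6 15 24
row_means <- rowMeans(x)     # 4 5 6
col_means <- colMeans(x)     # 2 5 8
```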
Data frames are like matrices, but they can contain variables of multiple types.
x <- data.frame(a = 1:3, b = c("rest", "test", "nest"), c = c(TRUE, FALSE, TRUE)) # a data frame with 3 columns named a, b, and c
is.matrix(x)
[1] FALSE
[1] TRUE
[1] 3 3
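To see that a data frame really keeps a separate type per column, the sketch below rebuilds the example under the illustrative name df and inspects it. The column class of b depends on the R version, as noted in the comment.

```r
df <- data.frame(a = 1:3,
                 b = c("rest", "test", "nest"),
                 c = c(TRUE, FALSE, TRUE))

not_a_matrix <- is.matrix(df)       # FALSE
is_df        <- is.data.frame(df)   # TRUE
dims         <- dim(df)             # 3 rows, 3 columns

# Each column has its own class. Column b is "character" in R >= 4.0
# (older versions convert strings to factors by default).
col_classes <- sapply(df, class)

# Columns are accessed with $, and a logical column can be used to filter rows.
a_values  <- df$a
true_rows <- df[df$c, ]
```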
Lists are one of the most flexible ways to handle data.
x <- list(a = 1:3, b = "nest", c = TRUE) # a simple list
x <- list(a = 1:3, b = "nest", c = list(d = "test", e = rep(x = c(TRUE,FALSE), each = 3))) # a more complicated list
is.list(x)
[1] TRUE
[1] 3
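A brief sketch of how the nested list above is organised. length() counts only the top-level elements, and nested elements are reached by chaining $.

```r
x <- list(a = 1:3,
          b = "nest",
          c = list(d = "test", e = rep(x = c(TRUE, FALSE), each = 3)))

top_level   <- length(x)      # 3: a, b and c (the inner list counts once)
top_names   <- names(x)       # "a" "b" "c"
inner_count <- length(x$c)    # 2: d and e

inner_d <- x$c$d              # "test"
inner_e <- x$c$e              # TRUE TRUE TRUE FALSE FALSE FALSE
```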
x+y. Comment on your result
x+y. If you find a way, then implement this and call the new variable zz.uw.z <- c(1, 2, "3")? Why does R return this result?
Indexing allows you to extract information from data structures.
[1] 11
[1] 15 16 17
[1] 15 17
[1] 10 11 12 13 14 16 18 19 20
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
[1] 4
[,1] [,2]
[1,] 7 10
[2,] 8 11
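The indexing commands themselves are missing from the text. Assuming a vector x <- 10:20 and a 3 x 4 matrix filled with 1:12 (which match the printed output), a sketch of the usual indexing forms looks like this.

```r
x <- 10:20

second    <- x[2]            # single element: 11
middle    <- x[6:8]          # a range of positions: 15 16 17
picked    <- x[c(6, 8)]      # selected positions: 15 17
without68 <- x[-c(6, 8)]     # negative indices drop positions 6 and 8

m <- matrix(1:12, nrow = 3)

cell       <- m[1, 2]        # row 1, column 2: 4
block      <- m[1:2, 3:4]    # rows 1-2 of columns 3-4
second_row <- m[2, ]         # an empty index selects everything: 2 5 8 11
```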
registry<-list( students=c("Jack","Jill","George"), attendance=c(0.2,0.6,0.5), marks=c("B", "B", "C"))
registry$students
[1] "Jack" "Jill" "George"
[1] "Jack" "Jill" "George"
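List elements can be reached in several equivalent ways. The sketch below uses the registry list from above. The attendance threshold of 0.5 is an arbitrary value chosen for illustration.

```r
registry <- list(students   = c("Jack", "Jill", "George"),
                 attendance = c(0.2, 0.6, 0.5),
                 marks      = c("B", "B", "C"))

by_dollar <- registry$students        # by name with $
by_name   <- registry[["students"]]   # by name with [[ ]]
by_pos    <- registry[[1]]            # by position with [[ ]]

# Single brackets return a list containing the element, not the element itself.
sub_list <- registry["students"]

# Elements can index each other: students with attendance of at least 0.5.
good_attendance <- registry$students[registry$attendance >= 0.5]
```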
Write a command that produces the names of people who have more than 1 child. (Hint: use indexing on the person vector.)
Write a command that produces the years of education of people with at least 13 years of education. (Hint: use indexing on the yearsEducation vector.)
Write a command that produces the names of people who have more than 1 child and who have at least 13 years of education. (Hint: use the previous two.)
Write a command that says whether or not there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets. (Hint: you could use any(). Also note that ! means "not" in R.)
Write a command that says whether or not every person has more than 13 years of education. (Hint: you could use the all() function.)
Look up the which() function in the help. Write code that retrieves some information using which().
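As a warm-up for these exercises, here is a sketch of any(), all(), which() and ! on a small made-up vector. The scores data and the pass mark of 50 are hypothetical and unrelated to the exercise vectors.

```r
# Hypothetical exam scores, named by student.
scores <- c(Ann = 55, Ben = 72, Cem = 38, Dee = 90)

passed <- scores >= 50                 # logical vector, one entry per student

someone_excellent <- any(scores > 85)  # TRUE: at least one score above 85
everyone_passed   <- all(passed)       # FALSE: one student is below 50

pass_positions <- which(passed)        # positions of the TRUE entries
failed_names   <- names(scores)[!passed]   # ! flips TRUE and FALSE
```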
Within RStudio, click on the menu File -> New File -> R Markdown.... In the pop up window, give the document a ‘Title’ and enter the ‘Author’ information (your name) and select HTML as the default output. Note that HTML files can be opened with any browser.
You can create a project to keep your files organised. A project folder can contain anything (images, data, scripts, etc.) related to your project.
To create a project, open RStudio and select File -> New Project… from the menu.
Alternatively, create a new project by clicking on the ‘Project’ button in the top right of RStudio and selecting ‘New Project…’
In the next window, select New Project
You can name your project and set its working directory (the place where R looks for things such as data and images; try getwd()).
You could also see your projects within RStudio in the Files tab.