A New Dawn in R Programming Language

Terhemen Hulugh

2024-07-28

Introduction

The purpose of the first exercise of the day is to introduce you to R, RStudio and some basic concepts in R.

After completing the exercise you should be able to:

Install R and RStudio

Install R from the R homepage: https://www.r-project.org/

Install RStudio from the RStudio homepage: https://www.rstudio.com/

RStudio is divided into 4 quadrants:

  1. Source/script - for writing and saving your code
  2. Console - where the code is run
  3. Environment - shows the objects currently in your workspace
  4. Files/Plots/Packages/Help - shows the files in the active directory, plots, installed packages, and the documentation of packages and functions

Check the version of R in use

R.version.string
## [1] "R version 4.3.0 (2023-04-21 ucrt)"

To update R, run the code below:
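The update code itself is not reproduced in this handout. One common approach on Windows, sketched here as an assumption rather than as the author's method, uses the installr package and its updateR() function. The guard keeps the code from running outside an interactive Windows session.

```r
# Assumption: Windows with internet access; installr is a CRAN package.
if (interactive() && .Platform$OS.type == "windows") {
  if (!requireNamespace("installr", quietly = TRUE)) {
    install.packages("installr")
  }
  installr::updateR()  # checks for a newer R release and offers to install it
}
```

On macOS or Linux, the usual route is to download the new release from the R homepage or use the system package manager.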

Exercise:

  1. Explain the difference between a package and a function in R (give examples).

  2. Install and load the following packages: tidyverse, dplyr and ggplot2, using the install.packages() and library() functions respectively.

Always put the package name in double quotes (" ") when you pass it to the install.packages() function.
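As a hint for the exercise above, the sketch below uses sd() from the stats package (shipped with R) to show that a function lives inside a package. The helper install_and_load() is a hypothetical name introduced only for illustration; it installs a package when it is missing and then loads it.

```r
# A package is a bundle of functions, data and documentation.
# A function is one callable object inside a package.
stats::sd(c(2, 4, 6))             # call sd() with an explicit package prefix
sd(c(2, 4, 6))                    # same function; stats is attached by default
environmentName(environment(sd))  # name of the package that owns sd()

# Hypothetical helper: install a package only if it is missing, then load it.
install_and_load <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)         # the name must be a quoted string here
  }
  library(pkg, character.only = TRUE)
}

install_and_load("stats")         # already installed, so nothing is downloaded
# install_and_load("ggplot2")     # uncomment to try; needs internet on first use
```

Installing is done once per computer, while library() has to be called in every new R session.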

Exercise Contd:

  1. Check the documentation of the lm() function by typing ?lm in the console.
  2. Check your working directory using the getwd() function.
getwd()
## [1] "C:/Users/TERHEMEN HULUGH/Documents/R Training"

Exercise Contd:

  1. Create a folder on your desktop called R Training and set your working directory to this folder using the setwd() function.

It is recommended that you also organise your work by creating data, script, results and graphics folders. Note: you have to replace the path below with one that matches your setup, e.g. by copying and pasting it from the folder. On Windows, change any backslashes (\) in the copied path to forward slashes (/).

setwd("C:/Users/TERHEMEN HULUGH/Desktop/R Training")
getwd()
## [1] "C:/Users/TERHEMEN HULUGH/Desktop/R Training"
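The recommended folder layout can also be created from R. The sketch below is only an illustration: it builds the folders inside a temporary directory (an assumption, so that it runs on any machine) instead of on the desktop, and the variable names are made up for this example.

```r
# Assumption: a temporary directory stands in for your Desktop path.
project_dir <- file.path(tempdir(), "R Training")
subfolders  <- c("data", "script", "results", "graphics")

# Create the project folder and its sub-folders.
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

old_wd <- setwd(project_dir)  # setwd() returns the previous working directory
list.files()                  # lists the four folders just created
setwd(old_wd)                 # restore the previous working directory
```

file.path() joins folder names with forward slashes, which avoids typing path separators by hand.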

Basic information for using R

x <- c("John", "Peter", "Kingsley", "Andrew") 
x  # autoprint
## [1] "John"     "Peter"    "Kingsley" "Andrew"
print(x) # explicit print
## [1] "John"     "Peter"    "Kingsley" "Andrew"

What is the difference between an Object and a Vector?

An object is anything you create and store in R. A vector is an object whose elements all belong to the same class. The one exception is a list, which is a vector whose elements can belong to different classes.
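A short illustration of this rule, using made-up values: when different classes are combined with c(), R converts them to a single class, whereas list() keeps each element as it is.

```r
mixed_vector <- c(1, "a", TRUE)    # everything is coerced to character
mixed_vector
class(mixed_vector)

mixed_list <- list(1, "a", TRUE)   # each element keeps its own class
sapply(mixed_list, class)
```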

The 5 basic atomic classes of objects in R are:

  1. numeric

  2. integer

  3. character

  4. logical

  5. complex
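The class() function reports which of these classes a value belongs to. The values below are arbitrary examples; note that a plain number is numeric, and the L suffix is needed to get an integer.

```r
class(1.5)      # numeric
class(2L)       # integer (the L suffix asks for an integer)
class("hello")  # character
class(TRUE)     # logical
class(1 + 2i)   # complex
```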

Data types: Matrices

A matrix is a two-dimensional array whose elements all have the same class. By default, the matrix() function fills it column-wise.

x <- matrix(1:10, nrow = 2, ncol = 5)
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
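To fill the matrix row by row instead, matrix() accepts the byrow argument. A small sketch with the same numbers (the name m is just for illustration):

```r
m <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
m          # first row holds 1 to 5, second row holds 6 to 10
dim(m)     # number of rows and columns
```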

Create a matrix using the dim() function

y <- 1:10
dim(y) <- c(2,5)
y
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

Create a matrix using the cbind() and rbind() functions

x <- 1:5
y <- 11:15

cbind(x,y)
##      x  y
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
rbind(x,y)
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y   11   12   13   14   15

Data types: Factor variables

Factor variables represent categorical data and are either unordered or ordered. Modeling functions like lm() and glm() treat them specially.

x <- factor(c("yes", "no", "yes", "yes", "no", "yes"))
x
## [1] yes no  yes yes no  yes
## Levels: no yes
levels(x)
## [1] "no"  "yes"
table(x)
## x
##  no yes 
##   2   4

R normally orders the levels of a factor variable alphabetically, but you can change this to your preference using the levels argument of the factor() function.

x <- factor(c("male", "female", "male", "male", "female", "male"),
            levels = c("male", "female"))
levels(x)
## [1] "male"   "female"
table(x)
## x
##   male female 
##      4      2
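The examples above are unordered factors. A minimal sketch of an ordered factor, with made-up data, shows that the levels then have a rank and can be compared:

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)
sizes
sizes < "high"      # comparisons follow the order of the levels
as.integer(sizes)   # position of each value in the level order
```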

Data types: Missing Values

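Missing values are denoted by NA, or by NaN for undefined mathematical operations, and can be identified using the is.na() and is.nan() functions. A NaN value is also NA, but an NA value is not NaN.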

Data types: Missing Values Contd

x <- c(1,5, NA, 10,3)
is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE
y <- c(1,5, NA, NaN,3)
is.na(y)
## [1] FALSE FALSE  TRUE  TRUE FALSE
is.nan(y)
## [1] FALSE FALSE FALSE  TRUE FALSE
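Once missing values are identified, they usually have to be counted or skipped. The sketch below reuses the first vector from above (under the illustrative name values) and relies on the na.rm argument that many summary functions provide.

```r
values <- c(1, 5, NA, 10, 3)

sum(is.na(values))           # how many values are missing
mean(values)                 # NA, because one value is unknown
mean(values, na.rm = TRUE)   # mean of the four known values
values[!is.na(values)]       # keep only the non-missing values

is.nan(0 / 0)                # an undefined operation produces NaN
```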

Data types: Dataframes

x <- data.frame(foo = 1:4, bar = c(TRUE, FALSE, FALSE, TRUE), dro = c("male", "male", "female", "male"))
x
##   foo   bar    dro
## 1   1  TRUE   male
## 2   2 FALSE   male
## 3   3 FALSE female
## 4   4  TRUE   male
nrow(x)
## [1] 4
ncol(x)
## [1] 3
str(x)
## 'data.frame':    4 obs. of  3 variables:
##  $ foo: int  1 2 3 4
##  $ bar: logi  TRUE FALSE FALSE TRUE
##  $ dro: chr  "male" "male" "female" "male"
summary(x)
##       foo          bar              dro           
##  Min.   :1.00   Mode :logical   Length:4          
##  1st Qu.:1.75   FALSE:2         Class :character  
##  Median :2.50   TRUE :2         Mode  :character  
##  Mean   :2.50                                     
##  3rd Qu.:3.25                                     
##  Max.   :4.00

Data types: names attribute

The names() function gets or sets the names of an object: the elements of a vector or list, or the columns of a data frame.

x<- 1:3
x
## [1] 1 2 3
names(x)
## NULL
names(x) <- c("foo", "boo", "bar")
x
## foo boo bar 
##   1   2   3
names(x)
## [1] "foo" "boo" "bar"
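The same function works on data frames, where the names are the column names. A small sketch with a made-up data frame:

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
names(df)                      # current column names
names(df) <- c("id", "label")  # rename both columns
names(df)
df$label                       # the renamed column is reachable by its new name
```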

Reading Data into R:

There are several principal functions for reading data into R.
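Two base R functions commonly used for this are read.csv() for tabular data and readLines() for plain lines of text. The sketch below is an illustration with made-up data: it reads from a text string instead of a file so that it runs without any file on disk.

```r
# Assumption: a small CSV held in a string stands in for a real file.
csv_text <- "name,age\nJohn,31\nPeter,27"

people <- read.csv(text = csv_text)   # parse the text as a table
str(people)

raw_lines <- readLines(textConnection(csv_text))  # read it line by line
raw_lines
```

With a real file, pass its path as the first argument, for example read.csv("data/people.csv"), where the path is a placeholder for your own file.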

Writing data in R

The following are functions for saving data in R
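Two base R options are write.csv(), which saves a data frame as text, and saveRDS(), which saves any single R object. The sketch below uses made-up data and temporary files (an assumption, so nothing is left behind on your computer), then reads both files back to check the result.

```r
scores <- data.frame(name = c("John", "Peter"), score = c(8.5, 9))

# Save as CSV (readable by other programs such as Excel).
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Save as RDS (keeps the exact R object).
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)

# Read both files back.
from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)
identical(from_rds, scores)
```

Setting row.names = FALSE stops write.csv() from adding an extra column of row numbers to the file.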

Subsetting

There are a number of operators that can be used to extract subsets from an object.

Subsetting using [ ]

x <- c("a", "b", "c", "b", "a", "d")
x[1]
## [1] "a"
x[6]
## [1] "d"
x[1:4]
## [1] "a" "b" "c" "b"
x[x > "a"]
## [1] "b" "c" "b" "d"
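The last line above works because x > "a" produces a logical vector, and [ ] keeps the elements where it is TRUE. The sketch below, using the same letters under the illustrative name letters_vec, spells this out and adds two other common index forms.

```r
letters_vec <- c("a", "b", "c", "b", "a", "d")

keep <- letters_vec > "a"    # logical vector, one value per element
keep
letters_vec[keep]            # same result as letters_vec[letters_vec > "a"]

letters_vec[c(1, 6)]         # pick specific positions
letters_vec[-1]              # a negative index drops that position
which(letters_vec == "b")    # positions where the condition holds
```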

Subsetting a List using [ ], $ and [[ ]]

x <- list(foo = 1:4, bar = 0.6)
x[1]
## $foo
## [1] 1 2 3 4
x$foo
## [1] 1 2 3 4
x[[1]]
## [1] 1 2 3 4
x <- list(foo = 1:4, bar = 0.6)

x$bar
## [1] 0.6
x[["bar"]]
## [1] 0.6
x["bar"]
## $bar
## [1] 0.6
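The outputs above differ in one important way: [ ] always returns a list, while [[ ]] and $ return the element itself. The sketch below checks this with class(), and also shows that [[ ]] accepts a name stored in a variable, which $ does not. The names my_list and element_name are made up for this example.

```r
my_list <- list(foo = 1:4, bar = 0.6)

class(my_list["foo"])      # a list containing the element
class(my_list[["foo"]])    # the element itself

element_name <- "foo"
my_list[[element_name]]    # looks up the element called "foo"
my_list$element_name       # NULL: $ looks for an element literally named "element_name"
```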

Subsetting multiple elements

x <- list(foo = 1:4, bar = 0.6, goo = "hello")
x[c(1,3)]
## $foo
## [1] 1 2 3 4
## 
## $goo
## [1] "hello"
x$foo
## [1] 1 2 3 4
x[[1]]
## [1] 1 2 3 4
x <- list(foo = 1:4, bar = 0.6, goo = "hello")

x$bar
## [1] 0.6
x[["bar"]]
## [1] 0.6
x["bar"]
## $bar
## [1] 0.6

Subsetting Matrices

x <- matrix(1:6, 2,3)
x[1,3]
## [1] 5
x[2,2]
## [1] 4
x[1,3, drop=FALSE]
##      [,1]
## [1,]    5
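Leaving an index empty selects a whole row or column. The sketch below uses the same matrix (under the illustrative name mat) and shows that the result becomes a plain vector unless drop = FALSE is given.

```r
mat <- matrix(1:6, 2, 3)

mat[1, ]                  # first row, returned as a vector
mat[, 2]                  # second column, returned as a vector
mat[1, , drop = FALSE]    # first row, kept as a 1 x 3 matrix
dim(mat[1, , drop = FALSE])
```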

Assignment

  1. Install swirl. Since swirl is an R package, you can install it by entering a single command in the R console: install.packages("swirl")

If you have installed swirl in the past, make sure you have a recent version (2.2.21 or later). You can check your current version by typing this in the console: packageVersion("swirl")

  2. Load swirl. Every time you want to use swirl, you first need to load the package. From the R console: library(swirl)

  3. Install the R Programming course. swirl offers a variety of interactive courses, but for our purposes you want the one called R Programming. Type the following at the R prompt to install this course: install_from_swirl("R Programming")

  4. Start swirl and complete the lessons.

Type the following in the RStudio console to start swirl: swirl()

Then, follow the menus and select the R Programming course when given the option.

For the first part of this course you should complete the following lessons: