Introduction to R and RStudio

R

R is a free, open source software program for statistical computing and analysis.

Things to know about R:

  • Statistical computing environment with its own language.
  • Version 1.0.0 was released in 2000; R is an open source implementation of the S language.
  • Available for Windows, macOS, and Linux.
  • Produces publication-quality graphs.
  • Offers numerous statistical methods (basic and advanced) and algorithms through user-created packages.
  • Has packages for weaving written reports and analysis code into one document (R Markdown and R Notebooks).

RStudio

RStudio is a free, open source IDE (integrated development environment) for R.

Things to know about RStudio:

  • Before installing RStudio, R must first be installed.
  • The interface is structured so that users can view data frames (tables), graphs, R code, and output all in one place at the same time.
  • It allows users to import CSV (.csv), Excel, text (.txt), SAS (.sas7bdat), SPSS (.sav), and Stata (.dta) files into R without having to write the code to do so.

Installation of R and RStudio

  • You can download R onto your PC from this link.
  • You can download both R and RStudio onto your PC from this link.
  • Short Videos for Installing R and RStudio:
    • Click me for a short video on how to download and install R and RStudio for Windows.
    • Click me for a short video on how to download and install R and RStudio for Mac. Or you can watch this video:
    • Click me for a short video on how to download and install R and RStudio for Mac.

Basic Operations in R

1. Arithmetic Operations: +, -, *, /, ^ or **.

# Addition
2 + 5

# Subtraction
3 - 10

# Multiplication
6 * (-3)

# Division
-9/10

# Exponentiation
2^5
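
The two exponentiation spellings listed above, ^ and **, give the same result. The short sketch below checks that, and also shows how parentheses change the order of evaluation. The integer-division (%/%) and remainder (%%) operators are an addition not covered in the list above.

```r
# ^ and ** are interchangeable for exponentiation
2^5
2 ** 5

# Precedence: exponentiation first, then multiplication/division,
# then addition/subtraction
2 + 3 * 4^2      # evaluated as 2 + (3 * (4^2))
(2 + 3) * 4^2    # parentheses force the addition first

# Integer division and remainder (not in the list above)
17 %/% 5
17 %% 5
```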

2. Variables: Assigning values to variables using the <- operator or =.

x <- 5

y <- 3

z <- x + y
z

u <- -2 * z
u

v <- z/u
v
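
The examples above only use <-. As a brief sketch, the snippet below shows that = also assigns at the top level, and that inside a function call = names an argument instead of creating a variable. The variable names a and b are made up for illustration.

```r
# Both forms create a variable in the current environment
a <- 5
b = 3

# The result is the same whichever operator was used
a + b

# Inside a function call, = matches a value to an argument name;
# it does not create a variable called digits
round(3.14159, digits = 2)
```

Many R style guides recommend <- for assignment and reserve = for function arguments, which keeps the two uses visually distinct.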

3. Vectors: Creating vectors using the c() function.

# Numeric vector
vec_numeric <- c(25, 32, 18, 63, 78)
vec_numeric

# Character vector
vec_charactor <- c("apple", "banana", "orange", "mango", "pawpaw")
vec_charactor

# Logical vector
vec_logical <- c(TRUE, FALSE, TRUE)
vec_logical
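
A vector holds elements of a single type. The sketch below, using made-up vectors, shows how class() and length() report the type and size of a vector, and what happens when c() is given mixed types.

```r
# Hypothetical example vectors
nums  <- c(25, 32, 18)
words <- c("apple", "banana")
flags <- c(TRUE, FALSE, TRUE)

# class() reports the type of the elements; length() counts them
class(nums)
class(words)
class(flags)
length(nums)

# Mixing types converts every element to the most general type (here, character)
mixed <- c(1, "a", TRUE)
class(mixed)
mixed
```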

4. Indexing and Slicing a Vector: Accessing elements of a vector using square brackets [].

# Pick the second number
vec_numeric[2]

# Pick the second and the fifth numbers
vec_numeric[c(2,5)]

# Pick all except the second and fifth numbers
vec_numeric[-c(2,5)]

# Pick the third fruit
vec_charactor[3]

# Pick the third and fifth fruits
vec_charactor[c(3,5)]

# Pick all except the third and fifth fruits
vec_charactor[-c(3,5)]
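
Besides positions, square brackets also accept a logical vector. The sketch below reuses the numeric vector from above to pick elements by condition; which() is an extra function not used elsewhere in this section.

```r
vec_numeric <- c(25, 32, 18, 63, 78)

# A comparison on a vector returns one TRUE/FALSE per element
vec_numeric > 30

# A logical vector inside [] keeps only the positions that are TRUE
vec_numeric[vec_numeric > 30]

# which() returns the matching positions instead of the values
which(vec_numeric > 30)
```

The same idea is what makes the condition-based data frame subsetting later in this section work.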

5. Data Frames: Creating and working with data frames, which are like tables.

# Data frame with age, height (ht), weight (wt), and sex
dat <- data.frame(age = c(25, 30, 35, 28, 22, 27, 33, 29, 31, 24),
                  ht = c(175, 160, 180, 165, 170, 168, 175, 162, 178, 160),
                  wt = c(70, 55, 80, 60, 65, 68, 75, 58, 82, 50),
                  sex = c("Male", "Female", "Male", "Female", "Male", "Male", "Female", "Female", "Male", "Female"))
dat
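
As a small sketch of working with a data frame, the snippet below builds a shortened, hypothetical version of the table above, checks its column names and dimensions, and adds a derived column. The source does not state the units of ht, so centimeters are assumed here.

```r
# A shortened, hypothetical version of the data frame above
dat_small <- data.frame(age = c(25, 30, 35),
                        ht  = c(175, 160, 180),
                        wt  = c(70, 55, 80),
                        sex = c("Male", "Female", "Male"))

# Column names and dimensions (rows, columns)
names(dat_small)
dim(dat_small)

# Add a new column computed from an existing one
# (assumes ht is in centimeters and converts it to meters)
dat_small$ht_m <- dat_small$ht / 100
dat_small
```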

6. Accessing Elements of a Data Frame: Accessing elements of a data frame using square brackets [] or the $ operator.

Conditions for subsetting data:

  • Comparison operators: equal to (==), not equal to (!=), greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=).

  • Logical operators: & combines conditions with AND, | combines conditions with OR, and ! negates a condition (NOT).

# Accessing the entire 'age' column
ages <- dat$age
ages

# Accessing the first three elements of the 'height' column
first_three_ht <- dat$ht[1:3]
first_three_ht

# Accessing the value in the second row and third column ('weight')
wt_second_row <- dat[2, "wt"]
wt_second_row

# Accessing the first two columns ('age' and 'height')
first_two_columns <- dat[, 1:2]
first_two_columns

# Accessing the last two columns ('weight' and 'sex')
last_two_columns <- dat[, 3:4]
last_two_columns

# Accessing specific columns by index ('height' and 'sex')
height_gender <- dat[, c(2, 4)]
height_gender

# Subset rows where age is at least 30
subset_data_1 <- dat[dat$age >= 30, ]
subset_data_1

# Subset rows where sex is "Male"
subset_data_2 <- dat[dat$sex == "Male", ]
subset_data_2

# Subset rows where age is under 30 AND sex is "Female"
subset_data_3 <- dat[dat$age < 30 & dat$sex == "Female", ]
subset_data_3

# Subset rows where weight is above 50 AND at most 75
subset_data_4 <- dat[dat$wt > 50 & dat$wt <= 75, ]
subset_data_4

# Subset rows where weight is below 60 OR above 75, keeping only the age, sex, and ht columns
subset_data_5 <- dat[dat$wt < 60 | dat$wt > 75, c("age", "sex", "ht")]
subset_data_5
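
The != and ! operators are listed above but not used in the examples. The sketch below rebuilds the same data frame so that it runs on its own and shows both; the object names not_male_1, not_male_2, and others are made up for illustration.

```r
dat <- data.frame(age = c(25, 30, 35, 28, 22, 27, 33, 29, 31, 24),
                  ht = c(175, 160, 180, 165, 170, 168, 175, 162, 178, 160),
                  wt = c(70, 55, 80, 60, 65, 68, 75, 58, 82, 50),
                  sex = c("Male", "Female", "Male", "Female", "Male",
                          "Male", "Female", "Female", "Male", "Female"))

# != keeps the rows that do not match a value
not_male_1 <- dat[dat$sex != "Male", ]

# ! negates a whole condition, giving the same rows here
not_male_2 <- dat[!(dat$sex == "Male"), ]
identical(not_male_1, not_male_2)

# ! can also negate a compound condition:
# every row that is NOT (under 30 AND female)
others <- dat[!(dat$age < 30 & dat$sex == "Female"), ]
others
```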

Reading Data into R

Reading CSV Files (Local Computers)

  • A basic way to read data from a CSV file is read.table(). Most people prefer read.csv(), which is a wrapper around read.table() with the sep argument preset to a comma (,) and header preset to TRUE.
  • Both read.table() and read.csv() return a data frame.
  • We will import a CSV file from your local computer into R using the test data set. Click me to download the test data set to your local computer.

# Set your working directory to the folder on your local computer that contains the test data set.
# How to set your working directory: On the top RStudio menu, click on “Session”, then “Set Working Directory”, then “Choose Directory”.
setwd("D:/Year_2024/Documents/Fall_2024/MA223")

# Read data into R using the read.csv() function.
test_dat_loc <- read.csv("test.csv", header = TRUE, stringsAsFactors = TRUE)
test_dat_loc
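
To illustrate the relationship between read.csv() and read.table() described above, the sketch below writes a tiny, made-up CSV to a temporary file and reads it back both ways. The file contents and the object names are assumptions for illustration only; they are not the test data set.

```r
# Write a tiny, made-up CSV to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, fruit = c("apple", "banana", "mango")),
          tmp, row.names = FALSE)

# read.csv() presets header = TRUE and sep = ","
from_csv <- read.csv(tmp)

# The same file read with read.table(), setting those arguments by hand
from_table <- read.table(tmp, header = TRUE, sep = ",")

# Compare the two results
all.equal(from_csv, from_table)

# Remove the temporary file
unlink(tmp)
```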

Reading CSV Files (Websites)

url <- "https://raw.githubusercontent.com/sylvadon4/data_sets/main/test.csv"
test_dat_web <- read.csv(url, header = TRUE, stringsAsFactors = TRUE)
test_dat_web

Built-in Data sets in R

R comes with several built-in data sets. To see the list of pre-loaded data, type the function data():

data()

Load the ChickWeight data as follows:

data(ChickWeight)

# To get details about this data set, remove the leading hash (#) and run the code.
# ?ChickWeight
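
Once loaded, a built-in data set behaves like any other data frame. A quick sketch of a first look at ChickWeight:

```r
data(ChickWeight)

# First few rows, column names, and dimensions (rows, columns)
head(ChickWeight)
names(ChickWeight)
dim(ChickWeight)
```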

Data Manipulation

Data Inspection

  • head() for first few rows of a matrix or data frame.
  • tail() for last few rows of a matrix or data frame.
  • dim() for dimension of a matrix or data frame.
  • str() for displaying the structure of an R object.
  • nrow() for number of rows of a matrix or data frame.
  • ncol() for number of columns of a matrix or data frame.
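  • summary() for summary statistics (minimum, quartiles, mean, maximum) of numeric variables.
  • quantile() for quartiles and other sample quantiles.
  • table() for frequency counts of categorical variables.
  • sum(is.na(x)) for counting the number of NAs in an object x, such as an entire data set.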

If you need to change the data type for any column, use the following functions:

  • as.character() converts to a text string.
  • as.numeric() converts to a number.
  • as.factor() converts to a categorical variable.
  • as.integer() converts to an integer.
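
A short sketch of these conversion functions on made-up values follows. The last part shows a common pitfall: calling as.numeric() directly on a factor returns the internal level codes, not the original numbers.

```r
# Basic conversions
as.numeric("3.14")
as.integer("7")
as.character(42)

# A character vector converted to a categorical variable
sizes <- as.factor(c("small", "large", "small"))
levels(sizes)

# Pitfall: as.numeric() on a factor returns the level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)

# Convert to character first to recover the original numbers
as.numeric(as.character(f))
```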

Inspect the Test Dataset

url <- "https://raw.githubusercontent.com/sylvadon4/data_sets/main/test.csv"
test_dat_web <- read.csv(url, header = TRUE, stringsAsFactors = TRUE)
test_dat_web
head(test_dat_web)
tail(test_dat_web)
dim(test_dat_web)
str(test_dat_web)
summary(test_dat_web)
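
The remaining inspection functions from the list (nrow(), ncol(), quantile(), table(), and sum(is.na())) can be sketched on a small, made-up data frame. The data frame scores and its columns are hypothetical and chosen so that the example runs without a network connection.

```r
# Hypothetical data with one missing value
scores <- data.frame(group = c("A", "B", "A", "C", "B", "A"),
                     score = c(10, 20, NA, 40, 50, 60))

# Number of rows and columns
nrow(scores)
ncol(scores)

# Quartiles of a numeric column; na.rm = TRUE drops the missing value first
quantile(scores$score, na.rm = TRUE)

# Frequency counts of a categorical column
table(scores$group)

# Total number of NAs in the data frame, then the count per column
sum(is.na(scores))
colSums(is.na(scores))
```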

Introduction to R Packages

What is an R Package?

An R package is a collection of R functions, data, and compiled code designed to solve specific problems or provide additional functionality to the R programming language. Packages are crucial for extending R’s capabilities beyond its base functionality.

Installing and Loading R Packages

To install an R package, you can use the install.packages() function. Open your R console or script and type:

install.packages("package_name")

Replace package_name with the name of the package you want to install. If you need to install multiple packages, you can pass a vector of package names.

install.packages(c("package1", "package2", "package3"))

Once installed, you need to load the package into your R session using the library() function. This makes the package’s functions and features available for use.

library(package_name)

Example: Install and load the ISLR2 package in R. Click me for the documentation on ISLR2.

# Install once (remove the leading hash to run), then load the package in every new session.
# install.packages("ISLR2")
library(ISLR2)
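
One way to avoid reinstalling a package on every run is to check for it first. The sketch below uses a hypothetical helper, load_or_install(), which is not part of R or of this course's materials. It relies on requireNamespace(), which returns FALSE when a package is not installed.

```r
# Hypothetical helper: install a package only when it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers an installation;
# replace it with "ISLR2" to apply the helper to the example above
load_or_install("stats")
```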

Note: An installed package can be removed using remove.packages("package_name").


  1. Southeast Missouri State University