First Steps with R and RStudio

Matt Steele

Resources

Why Use R and RStudio

  • Open-source

    • Free

    • Platform independent

    • Reproducible

    • Shareable

    • Contains add-on packages

  • Created for statistical computing, data analysis, and graphics

File Types

  • R Script: a plain-text file of code that you can save and run whenever you choose

File > New File > R Script


  • R Markdown or Quarto Document: allows you to combine narrative text, code, and the output of code in a single document.

File > New File > R Markdown
File > New File > Quarto Document

RStudio

RStudio is an Integrated Development Environment (IDE) that allows you to save your code, store your variables and environments, and view outputs.

Source Pane

This pane opens when you create or open a script or markdown file.

  • This area is where you can create code in script or markdown files

Console Pane

This is where you interact with R directly. The results of your commands are displayed in this pane.

  • Useful for testing code and exploring data

Environment Pane

View the functions, objects, and data sets stored in your current session.

  • Your environment can be saved and accessed at any point
  • Save your environment to your working directory

Misc Pane

View files, plots, and packages, and get help.

Set Your Preferences

Tools > Global Options


Some suggested Preferences to set:

  • Code > Editing > Use Native Pipe Operator

  • Code > Editing > Soft wrap source R files

Working Directory

The working directory is the folder R is currently working in. R looks in this folder when you load a file by name, and it saves exported files and objects there by default, so keep your project's files in it.

Session > Set Working Directory > Choose Directory


getwd() # show current directory that you are in


setwd("path/to/your/directory") # sets the working directory
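
As a minimal sketch of how the working directory affects file names, the example below switches to a temporary folder, writes a file by its bare name, and then switches back. The folder from tempdir() and the file name example.csv are stand-ins chosen for illustration, not part of the workshop materials.

```r
# Remember where we started so we can return at the end
old_dir <- getwd()

# tempdir() is a stand-in for your own project folder
project_dir <- tempdir()
setwd(project_dir)

# A bare file name is resolved relative to the working directory
writeLines("a,b", "example.csv")
file.exists("example.csv")           # is the file in the working directory?
list.files(pattern = "example")      # list matching files in the working directory

# Return to the original working directory
setwd(old_dir)
```

In an RStudio Project the working directory is set for you when the project opens, so explicit setwd() calls are often unnecessary.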

Keyboard Shortcuts

Tools > Keyboard Shortcuts Help


Action                PC                  MAC
Run Code              CTRL + ENTER        CMD + RETURN
Assignment Operator   ALT + -             OPTION + -
Pipe Operator         CTRL + SHIFT + M    CMD + SHIFT + M
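
The two operators these shortcuts insert can be sketched as follows, using the built-in mtcars data set. This assumes R 4.1 or later and the native pipe preference, so the pipe shortcut inserts |>. The object names are illustrative.

```r
# Assignment operator (<-): store the mpg column in an object
mpg_values <- mtcars$mpg

# Native pipe (|>): pass the left-hand side as the first argument of mean()
piped_mean <- mpg_values |> mean()

# The same calculation written as an ordinary function call
nested_mean <- mean(mpg_values)

identical(piped_mean, nested_mean)   # both forms give the same result
```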

Commenting

Comments are used to provide context, documentation, and explanations for the code.


mean(mtcars$mpg) # get the mean of the mpg variable

Functions

Functions are commands you give R to perform a task.


sample(1:5000)

mean(c(1, 3, 500))

str(400)

Arguments

The information that you give to a function to tell it what to do.


sample(1:5000, size = 50, replace = TRUE)

sample(1:5000, 50, TRUE)
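
The two calls above pass the same arguments, once by name and once by position. A small sketch to check this: because sample() is random, set.seed() fixes the random number generator before each call so the results can be compared. The seed value 42 and the object names are arbitrary choices.

```r
# Arguments matched by name
set.seed(42)
by_name <- sample(1:5000, size = 50, replace = TRUE)

# Arguments matched by position: x, then size, then replace
set.seed(42)
by_position <- sample(1:5000, 50, TRUE)

# Named arguments can be given in any order
set.seed(42)
reordered <- sample(replace = TRUE, size = 50, x = 1:5000)

identical(by_name, by_position)   # TRUE
identical(by_name, reordered)     # TRUE
```

Naming arguments is usually easier to read than relying on position, especially past the first argument.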

Documentation

Documentation provides the explanations, examples, and guidance you need to learn, understand, and effectively use R functions and packages.
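
A few ways to look up a function from the console are sketched below, using sample() as the example. The interactive() check is only there so the help page is requested when you are working in a live session.

```r
# Open the help page; typing ?sample in the console does the same thing
if (interactive()) {
  print(help("sample"))
}

# Show the argument names and default values without opening the help page
args(sample)

# Just the argument names, as a character vector
sample_args <- names(formals(sample))
sample_args   # "x" "size" "replace" "prob"
```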


Objects

Objects allow you to store and work on data (numbers, words, tables, and more).

Assignment Operator

The assignment operator (<-) allows you to create an object.


Action                PC         MAC
Assignment Operator   ALT + -    OPTION + -


a <- 35
b <- 45

a
a + b

Naming Objects

  • Use descriptive and meaningful names that indicate the purpose of the object.
  • Use lowercase letters.
  • Use underscores to separate words (e.g., my_variable_name).
  • Avoid reserved words and the names of existing functions (e.g., “if,” “else,” “for,” “function”).
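
A short sketch of these naming guidelines in practice. The object names and values are made up for illustration.

```r
# Descriptive, lowercase names with underscores between words
exam_score_math <- 35
exam_score_reading <- 45
exam_score_total <- exam_score_math + exam_score_reading
exam_score_total   # 80

# R is case-sensitive, so a differently capitalized name is a different object
exists("exam_score_total")   # TRUE
exists("Exam_Score_Total")   # FALSE, no object has this exact name
```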

Data Types

  • Double or Numeric: used for numbers, whether whole numbers or numbers with decimal points.
  • Character: used for text, words, and strings of characters. Enclose the text in double ("") or single ('') quotes.
  • Factor: used to represent categorical data with predefined levels.
  • Date: used for handling calendar dates.
  • Logical (Boolean): used for decision-making and represented by the values TRUE or FALSE.
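
To see which type a value has, you can ask R directly with class(). The values below are arbitrary examples, one per data type in the list above.

```r
class(3.14)                           # "numeric"
typeof(3.14)                          # "double", how a numeric value is stored
class("hello")                        # "character"
class(factor(c("low", "high")))       # "factor"
class(as.Date("2024-01-15"))          # "Date"
class(TRUE)                           # "logical"

# Whole numbers are also stored as doubles unless you add the L suffix
class(5)                              # "numeric"
class(5L)                             # "integer"
```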

Packages

R packages are like toolkits or collections of pre-built functions, data sets, and tools that extend the capabilities of the R programming language.


Install

You must install a package before you can load it, but you only need to install it once.

install.packages("tidyverse")


Load

You must load a package in every new session before you can use its functions.

library(tidyverse)
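
Because installation is a one-time step, it can help to check whether a package is already installed before installing it again. The sketch below only reports the status and does not install anything. The object names are illustrative.

```r
pkg <- "tidyverse"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
tidyverse_installed <- requireNamespace(pkg, quietly = TRUE)

if (tidyverse_installed) {
  message(pkg, " is installed; load it with library(", pkg, ")")
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") once")
}
```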

Vectors

Vectors are ordered collections of data items of the same type.


vec_one <- c(1,2,3)
vec_two <- c(4:6)

vec_two
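
A brief sketch of what you can do with the two vectors defined above: arithmetic works element by element, square brackets select items by position, and mixing types forces every item to a single type. The names vec_sum and mixed are illustrative.

```r
vec_one <- c(1, 2, 3)
vec_two <- 4:6

# Arithmetic is applied element by element
vec_sum <- vec_one + vec_two
vec_sum             # 5 7 9

# Select an item by its position (R counts from 1)
vec_one[2]          # 2

# Number of items in the vector
length(vec_two)     # 3

# A vector holds one type, so mixed input is converted to character
mixed <- c(1, "two", 3)
class(mixed)        # "character"
```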

Data Frame

A data frame is a two-dimensional table of variables (columns) and observations (rows). Each variable holds data of a single type, but different variables can hold different data types.


# create vectors
title <- c("Star Wars", "The Empire Strikes Back", "Return of the Jedi")
year <- c(1977, 1980, 1983)
length.min <- c(121, 124, 133)
box.office.mil <- c(787, 534, 572)

# combine these vectors with the data.frame() function
starWars.data <- data.frame(title, year, length.min, box.office.mil)
starWars.data

Subsetting Variables

Subsetting allows you to select and work with specific variables (columns) from a data frame.


starWars.data$year
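
Besides the $ operator, base R offers a few other ways to pull columns out of a data frame. The sketch below rebuilds a smaller version of the Star Wars data frame so it can run on its own. The names years, same_years, and title_year are illustrative.

```r
starWars.data <- data.frame(
  title = c("Star Wars", "The Empire Strikes Back", "Return of the Jedi"),
  year = c(1977, 1980, 1983),
  length.min = c(121, 124, 133)
)

# One column as a vector, using $ or double square brackets
years <- starWars.data$year
same_years <- starWars.data[["year"]]
identical(years, same_years)       # TRUE

# Several columns at once, returned as a smaller data frame
title_year <- starWars.data[, c("title", "year")]
title_year

# A subset column can be passed straight to a function
mean(starWars.data$length.min)     # 126
```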

Export / Save Data

Once you are done entering your data, you can export it to your working directory. The general-purpose function is write.table(), but if you are saving a CSV file, write.csv() is more convenient because its defaults are already set for that format.


write.csv(starWars.data, "starwars.csv")
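
By default write.csv() also writes the row numbers as an extra first column. A sketch of a round trip that leaves them out with row.names = FALSE and reads the file back with base R's read.csv() follows. It writes to a temporary folder rather than your working directory, and the names csv_path and reloaded are illustrative.

```r
starWars.data <- data.frame(
  title = c("Star Wars", "The Empire Strikes Back", "Return of the Jedi"),
  year = c(1977, 1980, 1983)
)

# Build a path inside a temporary folder (a stand-in for your working directory)
csv_path <- file.path(tempdir(), "starwars.csv")

# row.names = FALSE keeps the row numbers out of the file
write.csv(starWars.data, csv_path, row.names = FALSE)

# Read the file back in and check what came through
reloaded <- read.csv(csv_path)
names(reloaded)    # "title" "year"
nrow(reloaded)     # 3
```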

Load Data

Load data from a file in your working directory using the read_csv() function from the readr package, which is loaded as part of tidyverse. Base R also provides the read.csv() function.


fight_songs <- read_csv("fight-songs.csv")

Explore Data


view(fight_songs) # view in a new tab (requires tidyverse; base R uses View())

nrow(fight_songs) # number of rows

ncol(fight_songs) # number of columns

str(fight_songs) # structure of data frame

Descriptive Statistics


summary(fight_songs)

sum(fight_songs$number_fights)

mean(fight_songs$number_fights)

median(fight_songs$number_fights)

sd(fight_songs$number_fights)

min(fight_songs$number_fights)

max(fight_songs$number_fights)
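
If you do not have the fight songs file at hand, the same exploration and summary functions can be tried on mtcars, a data set built into R. This is a sketch on a substitute data set, and the object names are illustrative.

```r
# Size of the data frame
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)

# Summary of a single variable: minimum, quartiles, mean, and maximum
summary(mtcars$mpg)

# Individual statistics for the same variable
mpg_total <- sum(mtcars$mpg)
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)
mpg_range <- c(min(mtcars$mpg), max(mtcars$mpg))

# The mean is the total divided by the number of observations
all.equal(mpg_mean, mpg_total / n_rows)   # TRUE
```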

Conclusion