The RMarkdown file can be downloaded here: http://github.com/kareena-delrosario/regression_rcode

Our introduction to R is divided into three subsections: orienting yourself to R, data manipulation, and data visualization.

Overview of Section 1

1. Different R Platforms
2. R Grammar
3. Installing and Loading Libraries
4. Using Functions
5. Importing Datasets

Level 1: Orienting yourself to R

Different R Platforms: R Script, R Markdown, and Colab

R Script

This is the basic file format in RStudio. Output is printed to the console, and plots appear in the pane in the lower-right corner (by default). Use ‘#’ to comment code.

R Markdown

Integrates R code with narrative text. Output and plots appear directly under the code. Can be “knitted” into an easy-to-read Word, PDF, or HTML document. Code is placed in separate blocks (shortcut: CMD + Option + I), and each block can be set to skip running the code or to hide the code in the final output. Headings written with ‘#’ outside of the code blocks create an outline.

Google Colab

Functions like a Google Doc: multiple users can access a notebook at the same time, and it does not require R to be installed.

Understanding R Grammar

== vs %in% vs =

# these are vectors
# (basically, sequences of values; you always need to wrap them in c() if there is more than 1 value)
x <- c('a', 'b', 'c')
y <- c('c', 'b', 'a')
- What's up with the <-?
If you want to refer to something you created in R, whether that is a dataframe, a vector, or anything else, you have to save it to the environment (the pane to the right) using "<-".
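
A minimal sketch of what saving to the environment means (the object name `scores` is made up for illustration): a value that is only typed is printed and then lost, while a value assigned with `<-` can be reused by name.

```r
# Typing a value without assigning it only prints it; nothing is stored
c(90, 85, 77)

# Assigning with <- stores the vector under the name `scores` (a made-up name)
scores <- c(90, 85, 77)

# Now the object exists in the environment and can be reused by name
exists("scores")
## [1] TRUE
mean(scores)
## [1] 84
```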

==

# logical operator: compares the two vectors element by element (position 1 with position 1, and so on)
x == y
## [1] FALSE  TRUE FALSE

%in%

# value matching: is each value of x found anywhere in y?
x %in% y
## [1] TRUE TRUE TRUE
x %in% letters
## [1] TRUE TRUE TRUE
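
Because x and y hold the same three letters, %in% returns TRUE everywhere. A small sketch with made-up vectors (`a` and `b` are illustrative names, not objects from this lab) shows how the two operators differ when a value is missing:

```r
a <- c("a", "b", "z")
b <- c("b", "a", "c")

# == compares position by position
a == b
## [1] FALSE FALSE FALSE

# %in% asks whether each value of a appears anywhere in b
a %in% b
## [1]  TRUE  TRUE FALSE
```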

=

# assignment: overwrites x with the contents of y (new = old); works like x <- y
x = y

print(x)
## [1] "c" "b" "a"
print(y)
## [1] "c" "b" "a"

() vs []

Parentheses ()

# parentheses enclose the arguments of a function call
result = sum(1, 2, 3)
print(result)  # Outputs: 6
## [1] 6

Square brackets []

# square brackets are used to specify what you want in a vector or dataframe

# for vectors
v = c(10, 20, 30, 40)
print(v[2])  # Accessing the second element, Outputs: 20
## [1] 20
# Working with matrices or dataframes
m = matrix(1:9, nrow=3, ncol=3)
print(m)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

[row, column]

print(m[2, 3])  # Accessing row 2, column 3
## [1] 8
print(m[, 1])   # Accessing all rows in column 1
## [1] 1 2 3
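
Square brackets also accept names and logical conditions, not just positions. The sketch below uses R's built-in iris dataframe; the vector `v2` is a made-up name for illustration.

```r
v2 <- c(10, 20, 30, 40)

# a logical condition inside [] keeps only the elements where it is TRUE
v2[v2 > 20]
## [1] 30 40

# dataframes use the same [row, column] order as matrices
iris[2, "Species"]    # row 2 of the Species column
iris[1:3, c(1, 5)]    # rows 1 to 3, columns 1 and 5
```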

Installing Packages

## Install Packages from CRAN
install.packages("dplyr")


## Install Package from GitHub
# install.packages("devtools")  # run this once if devtools is not installed yet
library(devtools)

#devtools::install_github("DeveloperName/PackageName")
devtools::install_github("RandiLGarcia/dyadr")
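
Packages only need to be installed once, so it can help to check what is already available before installing. A minimal sketch, assuming you keep the package names in a character vector (`wanted`, `is_installed`, and `missing_pkgs` are illustrative names, and "notARealPackage123" is a deliberately fake package used to show a miss):

```r
wanted <- c("stats", "utils", "notARealPackage123")

# requireNamespace() returns TRUE if a package is installed and can be loaded
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)

# keep only the names that are not installed yet
missing_pkgs <- wanted[!is_installed]
print(missing_pkgs)
## [1] "notARealPackage123"

# then install just those (uncomment to run with real package names)
# install.packages(missing_pkgs)
```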

Loading Packages

# Load Packages
library(dplyr)

# One Way to Load Multiple Packages
pkgs <- c("psych",
          "tidyr",
          "tidyverse",
          "dplyr",
          "haven",
          "lm.beta",
          "car",
          "skimr",
          "janitor", 
          "labelled", 
          "expss", 
          "foreign")

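# character.only = TRUE tells library() that each element of pkgs is a package name given as a string
lapply(pkgs, library, character.only = TRUE)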

Checking Packages

search()
##  [1] ".GlobalEnv"        "package:foreign"   "package:expss"    
##  [4] "package:maditr"    "package:labelled"  "package:janitor"  
##  [7] "package:skimr"     "package:car"       "package:carData"  
## [10] "package:lm.beta"   "package:haven"     "package:lubridate"
## [13] "package:forcats"   "package:stringr"   "package:purrr"    
## [16] "package:readr"     "package:tibble"    "package:ggplot2"  
## [19] "package:tidyverse" "package:tidyr"     "package:psych"    
## [22] "package:dplyr"     "package:stats"     "package:graphics" 
## [25] "package:grDevices" "package:utils"     "package:datasets" 
## [28] "package:methods"   "Autoloads"         "package:base"
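
search() returns a character vector, so you can also test for a single package instead of scanning the whole list. A small sketch (base is used here only because it is always attached; the second name is deliberately fake):

```r
# TRUE if the package is currently attached
"package:base" %in% search()
## [1] TRUE

# a package that has not been attached with library() returns FALSE
"package:notARealPackage123" %in% search()
## [1] FALSE
```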

Using Functions

In R, there are often several ways to do the same thing. One example is how you refer to a dataframe and the variables within it: you can either type the dataframe name into each command or use dplyr’s pipe (which one works best may depend on the package).

Here’s an example using R’s built-in dataframe called ‘iris’:

head(iris)

In this exercise, we want to get the average Sepal.Length and Sepal.Width of the setosa species.

Step-by-step approach

df$variable (specify dataframe and variable within the dataframe)
# Call variable 
iris$Species 
##   [1] setosa     setosa     setosa     setosa     setosa     setosa    
##   [7] setosa     setosa     setosa     setosa     setosa     setosa    
##  [13] setosa     setosa     setosa     setosa     setosa     setosa    
##  [19] setosa     setosa     setosa     setosa     setosa     setosa    
##  [25] setosa     setosa     setosa     setosa     setosa     setosa    
##  [31] setosa     setosa     setosa     setosa     setosa     setosa    
##  [37] setosa     setosa     setosa     setosa     setosa     setosa    
##  [43] setosa     setosa     setosa     setosa     setosa     setosa    
##  [49] setosa     setosa     versicolor versicolor versicolor versicolor
##  [55] versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor
##  [67] versicolor versicolor versicolor versicolor versicolor versicolor
##  [73] versicolor versicolor versicolor versicolor versicolor versicolor
##  [79] versicolor versicolor versicolor versicolor versicolor versicolor
##  [85] versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor
##  [97] versicolor versicolor versicolor versicolor virginica  virginica 
## [103] virginica  virginica  virginica  virginica  virginica  virginica 
## [109] virginica  virginica  virginica  virginica  virginica  virginica 
## [115] virginica  virginica  virginica  virginica  virginica  virginica 
## [121] virginica  virginica  virginica  virginica  virginica  virginica 
## [127] virginica  virginica  virginica  virginica  virginica  virginica 
## [133] virginica  virginica  virginica  virginica  virginica  virginica 
## [139] virginica  virginica  virginica  virginica  virginica  virginica 
## [145] virginica  virginica  virginica  virginica  virginica  virginica 
## Levels: setosa versicolor virginica
# Filtering the data for species 'setosa'
filtered_data <- subset(iris, Species == "setosa")

# Selecting specific columns
selected_data <- filtered_data[,c("Sepal.Length", "Sepal.Width", "Species")]

# Grouping the data: aggregate(columns of interest ~ grouping variable, the data, function to apply)
final_result1 <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, selected_data, mean)

print(final_result1)
##   Species Sepal.Length Sepal.Width
## 1  setosa        5.006       3.428

Dplyr approach

%>% (pipe data into functions; shortcut = CMD + Shift + M)
library(dplyr)

final_result2 <- iris %>%
  filter(Species == "setosa") %>% # Filtering the data for species 'setosa'
  select(Sepal.Length, Sepal.Width, Species) %>% # Selecting specific columns
  group_by(Species) %>% # not necessary but guarantees it's kept in output
  dplyr::summarize(average_sepal_length = mean(Sepal.Length),
                   average_sepal_width = mean(Sepal.Width)) 

print(final_result2)
## # A tibble: 1 × 3
##   Species average_sepal_length average_sepal_width
##   <fct>                  <dbl>               <dbl>
## 1 setosa                  5.01                3.43
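
As a third route to the same numbers, square brackets and `$` can be combined in base R. This is only a sketch of one more option (`setosa_rows` and `setosa_means` are made-up names), not a replacement for either approach above:

```r
# rows: where Species is "setosa"; columns: the two sepal measurements
setosa_rows <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]

# colMeans() averages each column
setosa_means <- colMeans(setosa_rows)
print(setosa_means)  # Sepal.Length 5.006, Sepal.Width 3.428
```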

Calling functions

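# ? opens the help page for a function
?mean
# package::function looks up the function from a specific package
?dplyr::mutate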

Mini-challenge!

Let’s revisit () and [] using the vectors we created earlier:

data <- data.frame(x, y)
as_tibble(data)
new_data <- data[] # how would you subset COLUMN 2?
as_tibble(new_data)
new_data2 <- data[] # how would you subset ROW 2? 
as_tibble(new_data2)

Importing Datasets

## CSV
# Saved in the same folder
basic_df <- read.csv("depression_example_data.csv", stringsAsFactors = FALSE) # character strings will not be converted to factors
tibble_df <- read_csv("depression_example_data.csv") # reads as tibble

# Saved in different places
# Option 1 - Set working directory
getwd()
setwd("/Users/kareenadelrosario/Desktop/Local R Code/NewFolder")
read_csv("csvFileName.csv")

# Option 2 - Include file path
read_csv("/Users/kareenadelrosario/Desktop/Local R Code/NewFolder/csvFileName.csv")

# Option 3 - Choose file
read.csv(file.choose(), header = TRUE)

read_sav(file.choose()) # SPSS
read_sas(file.choose()) # SAS

# Option 4 - Use Menu
# File -> Import Dataset
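
File paths are a common source of import errors. A minimal sketch of building and checking a path before reading; it writes a small made-up CSV to a temporary folder so it runs anywhere (`demo_dir`, `demo_path`, `demo_df`, and the file name are illustrative, not files from this lab):

```r
# a temporary folder stands in for your own data folder
demo_dir  <- tempdir()
demo_path <- file.path(demo_dir, "demo_data.csv")  # joins folder and file name

# create a tiny CSV so there is something to import
writeLines(c("id,score", "1,10", "2,12"), demo_path)

# check that the file exists at that path before reading it
file.exists(demo_path)
## [1] TRUE

demo_df <- read.csv(demo_path)
str(demo_df)
```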

Exporting Datasets

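# base r ("pathway" stands in for the file path you want to write to)
write.csv(data, "pathway")

# readr package has more export options and is slightly faster
write_csv(data, "pathway", na = "")

# the haven package allows you to export data as an SPSS or SAS file
write_sav(data, "pathway")
write_sas(data, "pathway") # recent haven versions mark write_sas() as deprecated; write_xpt() is an alternative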
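
A small round-trip sketch for base R export, assuming a made-up dataframe `export_df` and a temporary file in place of a real path. By default write.csv() also writes the row names as an extra first column; row.names = FALSE leaves them out.

```r
export_df <- data.frame(id = 1:3, group = c("a", "b", "a"))

out_path <- tempfile(fileext = ".csv")  # placeholder for your own file path

# row.names = FALSE avoids an extra column of row numbers in the file
write.csv(export_df, out_path, row.names = FALSE)

# read the file back in to confirm what was written
check_df <- read.csv(out_path)
names(check_df)
## [1] "id"    "group"
```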

BONUS: Want to upload your R Markdown file to Colab? Convert it to a Jupyter notebook (.ipynb) file using the code below:

# devtools::install_github("mkearney/rmd2jupyter")
library(rmd2jupyter)
rmd2jupyter("lab1_datamanivis_kdr.Rmd")