The RMarkdown file can be downloaded here: http://github.com/kareena-delrosario/regression_rcode
Our introduction to R is divided into three subsections: orienting yourself to R, data manipulation, and data visualization.
Overview of Section 1
1. Different R Platforms
2. R Grammar
3. Installing and Loading Libraries
4. Using Functions
5. Importing Datasets
R Script: This is the basic format in RStudio. Output goes to the console, and plots appear in the Plots pane on the right-hand side. Lines starting with '#' are comments and are not run.
R Markdown: Integrates R code with narrative text. Output and plots appear directly under the code, and the file can be “knitted” into an easy-to-read Word, PDF, or HTML document. Code goes in separate chunks (shortcut: CMD + Option + I on Mac, Ctrl + Alt + I on Windows); chunk options can skip running the code (eval = FALSE) or hide the code in the final output (echo = FALSE). Outside of code chunks, lines starting with '#' become headings, which build an outline.
A web-based option functions like a Google Doc: multiple users can access the same project at once, and R does not need to be installed on your computer.
# these are vectors
# (an ordered set of values of the same type; use c() to combine more than one value)
x <- c('a', 'b', 'c')
y <- c('c', 'b', 'a')
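- What's up with the <-?
If you want to refer to something you created in R later on, whether that is a data frame, a vector, or anything else, you have to save it to the environment (the Environment pane on the right) by assigning it a name with "<-". (= also works for assignment, but <- is the usual R convention.)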
## [1] TRUE TRUE TRUE
## [1] TRUE TRUE TRUE
## [1] 6
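To make the idea of saving objects with <- more concrete, here is a small sketch with made-up numbers (the names scores, one_score, and total are hypothetical, chosen only for illustration). It shows that c() combines several values, that a single value needs no c(), and that anything assigned with <- can be reused by name.

```r
# c() combines several values into one vector
scores <- c(4, 8, 15)
# A single value needs no c(); it is already a vector of length 1
one_score <- 42
length(one_score)  # [1] 1

# Because `scores` was saved with <-, we can reuse it by name
total <- sum(scores)
print(total)       # [1] 27

# Assigning to the same name again replaces the old value
scores <- c(scores, 16)
length(scores)     # [1] 4
```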
# Load Packages
library(dplyr)
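library() can only load a package that is already installed; installing is done once with install.packages(), while library() is run in every new session. A minimal sketch, assuming you want to install only the packages that are missing (the two package names are just examples taken from the list used in this section, and the CRAN mirror URL is one common choice):

```r
pkgs <- c("dplyr", "haven")

# installed.packages() returns a matrix whose row names are the installed package names
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing; this downloads from CRAN and needs an internet connection
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

After this, library(dplyr) loads the package into the current session as usual.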
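# One Way to Load Multiple Packages
# (tidyverse already loads tidyr, dplyr, readr, and ggplot2, among others)
pkgs <- c("psych",
          "tidyr",
          "tidyverse",
          "dplyr",
          "haven",
          "lm.beta",
          "car",
          "skimr",
          "janitor",
          "labelled",
          "expss",
          "foreign")
# lapply() calls library() on each name; character.only = TRUE tells library()
# that each element is a string holding a package name.
# lapply() also prints whatever library() returns for each package, which is why a
# long list of packages appears below; wrap the call in invisible() to hide it.
lapply(pkgs, library, character.only = TRUE)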
## [1] ".GlobalEnv" "package:foreign" "package:expss"
## [4] "package:maditr" "package:labelled" "package:janitor"
## [7] "package:skimr" "package:car" "package:carData"
## [10] "package:lm.beta" "package:haven" "package:lubridate"
## [13] "package:forcats" "package:stringr" "package:purrr"
## [16] "package:readr" "package:tibble" "package:ggplot2"
## [19] "package:tidyverse" "package:tidyr" "package:psych"
## [22] "package:dplyr" "package:stats" "package:graphics"
## [25] "package:grDevices" "package:utils" "package:datasets"
## [28] "package:methods" "Autoloads" "package:base"
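In R, there are lots of ways to do the same thing. One example is how you refer to a data frame and the variables within it: you can either name the data frame in each command (df$variable), or pass it from one function to the next with the pipe %>% (which comes from the magrittr package and is made available by dplyr). The pipe style works best with functions designed for it, such as those in the tidyverse.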
Here’s an example using R’s built-in data frame called ‘iris’:
In this exercise, we want to get the average Sepal.Length and Sepal.Width of the setosa species.
df$variable (specify the data frame, then the variable within it)
iris$Species
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa setosa setosa setosa setosa
## [19] setosa setosa setosa setosa setosa setosa
## [25] setosa setosa setosa setosa setosa setosa
## [31] setosa setosa setosa setosa setosa setosa
## [37] setosa setosa setosa setosa setosa setosa
## [43] setosa setosa setosa setosa setosa setosa
## [49] setosa setosa versicolor versicolor versicolor versicolor
## [55] versicolor versicolor versicolor versicolor versicolor versicolor
## [61] versicolor versicolor versicolor versicolor versicolor versicolor
## [67] versicolor versicolor versicolor versicolor versicolor versicolor
## [73] versicolor versicolor versicolor versicolor versicolor versicolor
## [79] versicolor versicolor versicolor versicolor versicolor versicolor
## [85] versicolor versicolor versicolor versicolor versicolor versicolor
## [91] versicolor versicolor versicolor versicolor versicolor versicolor
## [97] versicolor versicolor versicolor versicolor virginica virginica
## [103] virginica virginica virginica virginica virginica virginica
## [109] virginica virginica virginica virginica virginica virginica
## [115] virginica virginica virginica virginica virginica virginica
## [121] virginica virginica virginica virginica virginica virginica
## [127] virginica virginica virginica virginica virginica virginica
## [133] virginica virginica virginica virginica virginica virginica
## [139] virginica virginica virginica virginica virginica virginica
## [145] virginica virginica virginica virginica virginica virginica
## Levels: setosa versicolor virginica
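The df$variable style can answer the exercise on its own. As a sketch (the name is_setosa is hypothetical), comparing iris$Species to "setosa" gives a TRUE/FALSE value for every row, and that logical vector can then select the matching values of a single variable before averaging. The results should agree with the aggregate() and dplyr results in this section.

```r
# A logical vector: TRUE where the Species value is "setosa"
is_setosa <- iris$Species == "setosa"
sum(is_setosa)                      # [1] 50 (each TRUE counts as 1)

# Index one variable with the logical vector, then average it
mean(iris$Sepal.Length[is_setosa])  # [1] 5.006
mean(iris$Sepal.Width[is_setosa])   # [1] 3.428
```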
# Filtering the data for species 'setosa'
filtered_data <- subset(iris, Species == "setosa")
# Selecting specific columns
selected_data <- filtered_data[,c("Sepal.Length", "Sepal.Width", "Species")]
# aggregate(formula, data, FUN): apply mean to each variable left of ~, separately for each Species
final_result1 <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, selected_data, mean)
print(final_result1)
## Species Sepal.Length Sepal.Width
## 1 setosa 5.006 3.428
%>% (passes the result on its left into the function on its right; RStudio shortcut: CMD + Shift + M on Mac, Ctrl + Shift + M on Windows)
library(dplyr)
final_result2 <- iris %>%
  filter(Species == "setosa") %>%                 # keep only rows for species 'setosa'
  select(Sepal.Length, Sepal.Width, Species) %>%  # keep only these columns
  group_by(Species) %>%  # not needed for the means (only setosa is left), but keeps Species in the output
  dplyr::summarize(average_sepal_length = mean(Sepal.Length),
                   average_sepal_width = mean(Sepal.Width))
print(final_result2)
## # A tibble: 1 × 3
## Species average_sepal_length average_sepal_width
## <fct> <dbl> <dbl>
## 1 setosa 5.01 3.43
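Since R 4.1.0, base R also has its own pipe, |>, which works without loading any package. A sketch of the same exercise using only base functions (subset() and colMeans(); the name setosa_means is hypothetical) shows that piping is not limited to dplyr:

```r
# Requires R >= 4.1.0 for the native pipe |>; no packages needed
setosa_means <- iris |>
  subset(Species == "setosa", select = c(Sepal.Length, Sepal.Width)) |>
  colMeans()

# Named vector: Sepal.Length 5.006, Sepal.Width 3.428
print(setosa_means)
```

For simple pipelines like this one, %>% and |> are largely interchangeable; %>% needs magrittr (or dplyr) loaded, while |> needs a recent version of R.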
## CSV
# Saved in the same folder
basic_df <- read.csv("depression_example_data.csv", stringsAsFactors = FALSE) # keep text columns as character, not factor (the default since R 4.0.0)
tibble_df <- read_csv("depression_example_data.csv") # readr's read_csv() (loaded with tidyverse) returns a tibble
# Saved in different places
# Option 1 - Set working directory
getwd() # check which folder R is currently reading files from
setwd("/Users/kareenadelrosario/Desktop/Local R Code/NewFolder")
read_csv("csvFileName.csv")
# Option 2 - Include file path
read_csv("/Users/kareenadelrosario/Desktop/Local R Code/NewFolder/csvFileName.csv")
# Option 3 - Choose file
read.csv(file.choose(), header = TRUE)
read_sav(file.choose()) # SPSS (.sav), from the haven package
read_sas(file.choose()) # SAS (.sas7bdat), from the haven package
# Option 4 - Use Menu
# File -> Import Dataset (RStudio menu)
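Option 2 can also be written with file.path(), which joins a folder and a file name with the correct separator so you don't have to type one long string. The sketch below is self-contained: tempdir() stands in for your own folder, and the file name example_data.csv and its three rows are made up purely so there is something to read back in.

```r
# tempdir() stands in for your own folder so this example runs anywhere
data_folder <- tempdir()

# file.path() joins the folder and file name with the correct separator
csv_path <- file.path(data_folder, "example_data.csv")

# Write a small made-up data frame to disk so there is a file to import
write.csv(data.frame(id = 1:3, group = c("a", "b", "a")),
          csv_path, row.names = FALSE)

# Read it back in, keeping text columns as character
example_df <- read.csv(csv_path, stringsAsFactors = FALSE)
str(example_df)
```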