This RPubs document shows the code used in “R programming for ABSOLUTE beginners,” an excellent introduction to R code from Dr. Greg Martin’s R Programming 101 YouTube channel. See the video at: https://youtu.be/FY8BISK5DpM?si=TWpU_N_q6y6UAD--
5 + 6
## [1] 11
a <- 5
b <- 6
a + b
## [1] 11
sum(a,b)
## [1] 11
name <- c("Greg", "Gill")
name
## [1] "Greg" "Gill"
name <- c("Greg", "Paul", "Kim")
age <- c(47,52,34)
gender <- c("M","M","F")
name
## [1] "Greg" "Paul" "Kim"
age
## [1] 47 52 34
gender
## [1] "M" "M" "F"
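The c() function used above combines individual values into a vector, and each vector holds values of a single type. As a small supplementary sketch (the square-bracket indexing and the length() and class() calls are additions for illustration, not steps from the video), here is how you might inspect a vector and pull values out of it:

```r
# Recreate two of the vectors from above so this block runs on its own
name <- c("Greg", "Paul", "Kim")
age <- c(47, 52, 34)

# How many values does the vector hold?
length(name)
## [1] 3

# Pull out a single value by its position (R counts from 1, not 0)
age[2]
## [1] 52

# Pull out several values at once by passing a vector of positions
name[c(1, 3)]
## [1] "Greg" "Kim"

# Check what type of data each vector holds
class(name)
## [1] "character"
class(age)
## [1] "numeric"
```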
I used the head() function to show the data frame in the output here. Martin simply clicks on the data frame in RStudio and shows it to you there.
friends <- data.frame(name, age, gender)
head(friends)
##   name age gender
## 1 Greg  47      M
## 2 Paul  52      M
## 3  Kim  34      F
An aside on a concept Martin doesn’t explain: A data frame is a grid. Horizontal sections of the grid are called “rows,” and vertical sections are called “columns.” Each data point is stored at the intersection of a row and a column. In the friends data frame, for example, Martin’s first name, “Greg,” is stored at the intersection of the first row and the first column.
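To make the grid idea concrete, here is a minimal sketch that asks R for the size of the grid. It rebuilds the friends data frame so it can run on its own; nrow(), ncol(), and dim() are base R functions that aren't part of the video's walkthrough.

```r
# Rebuild the friends data frame so this block is self-contained
friends <- data.frame(
  name = c("Greg", "Paul", "Kim"),
  age = c(47, 52, 34),
  gender = c("M", "M", "F")
)

# Number of rows in the grid
nrow(friends)
## [1] 3

# Number of columns in the grid
ncol(friends)
## [1] 3

# Both at once: rows first, then columns
dim(friends)
## [1] 3 3
```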
Another aside: When you are using a spreadsheet, like Excel, these intersections are called “cells.” Each has a “cell address,” like A1 for the cell at the intersection of Column A and Row 1. Most of what you do in a spreadsheet involves working with data in individual cells. R, by contrast, is designed to work with whole columns at once. It’s a slightly different approach but, as you’ll see, often a much, much faster one.
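Here is a rough illustration of what working with a whole column at once can look like. The calculations are made up for demonstration and don't come from the video; the block rebuilds the friends data frame so it runs by itself.

```r
# Rebuild the friends data frame so this block is self-contained
friends <- data.frame(
  name = c("Greg", "Paul", "Kim"),
  age = c(47, 52, 34),
  gender = c("M", "M", "F")
)

# Add one year to every age in a single step, with no cell-by-cell work
friends$age + 1
## [1] 48 53 35

# Summarize the whole column at once
mean(friends$age)
## [1] 44.33333
max(friends$age)
## [1] 52
```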
One last aside: In data journalism, it’s rare to build a data frame manually like this. Most of the time, you import data that someone else has already assembled.
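As a sketch of what importing might look like, the block below writes a tiny CSV file to a temporary location and then reads it back with base R's read.csv(). The file and the friends_imported name are hypothetical stand-ins; with real data you would skip the writing step and point read.csv() at the file you were given.

```r
# Write a small CSV to a temporary file so this example is self-contained.
# In practice, the file would already exist and come from someone else.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,age,gender", "Greg,47,M", "Paul,52,M", "Kim,34,F"),
  csv_path
)

# read.csv() reads the file and returns a data frame
friends_imported <- read.csv(csv_path)
head(friends_imported)
```

The tidyverse's readr package offers read_csv(), which does a similar job.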
The code below selects various pieces of the data frame using base R. Note: This is the hard way to do things. It comes in handy sometimes, but you’ll see an easier way in a moment.
# Show all rows in the data frame's "name" column:
friends$name
## [1] "Greg" "Paul" "Kim"
# Show all rows and columns in the data frame
friends[ , ]
##   name age gender
## 1 Greg  47      M
## 2 Paul  52      M
## 3  Kim  34      F
# Show all columns in the data frame's first row
friends[1, ]
##   name age gender
## 1 Greg  47      M
# Show the first row of the first column
friends[1,1]
## [1] "Greg"
# Show rows 1 through 3 of the first column (Same result as friends$name)
friends[1:3,1]
## [1] "Greg" "Paul" "Kim"
# Show columns 1 through 3 of the first row (Same result as friends[1, ])
friends[1,1:3]
##   name age gender
## 1 Greg  47      M
# Show all rows in the first two columns for which age is less than 50
friends[friends$age<50,1:2]
##   name age
## 1 Greg  47
## 3  Kim  34
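It may help to see why that last selection works. The sketch below, which rebuilds the friends data frame so it runs on its own, breaks the step apart: the comparison produces one TRUE or FALSE per row, and the square brackets keep only the TRUE rows. Selecting columns by name instead of by position is an optional variation, not something taken from the video.

```r
# Rebuild the friends data frame so this block is self-contained
friends <- data.frame(
  name = c("Greg", "Paul", "Kim"),
  age = c(47, 52, 34),
  gender = c("M", "M", "F")
)

# The comparison produces one TRUE or FALSE per row
friends$age < 50
## [1]  TRUE FALSE  TRUE

# Placing that result before the comma keeps only the TRUE rows
friends[friends$age < 50, ]

# Columns can also be selected by name rather than by position
friends[friends$age < 50, c("name", "age")]
```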
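A much easier way to select pieces of the data frame: Use the tidyverse, a collection of R packages. Its dplyr package supplies the select(), filter(), and arrange() functions used here, and the tidyverse makes many other things in R easier to do, too.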
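# Install the tidyverse if it isn't already installed, then load it
if (!require("tidyverse"))
  install.packages("tidyverse")
library(tidyverse)
# Keep the name and age columns, keep rows where age is under 50, then sort by age
friends %>%
  select(name, age) %>%
  filter(age < 50) %>%
  arrange(age)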
##   name age
## 1  Kim  34
## 2 Greg  47
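For comparison, here is one way the same selection might be written in base R, with no packages loaded. This is only a sketch for contrast, not code from the video, and under_50 is a hypothetical helper name. The block rebuilds the friends data frame so it runs on its own.

```r
# Rebuild the friends data frame so this block is self-contained
friends <- data.frame(
  name = c("Greg", "Paul", "Kim"),
  age = c(47, 52, 34),
  gender = c("M", "M", "F")
)

# Step 1: keep the name and age columns for rows where age is under 50
under_50 <- friends[friends$age < 50, c("name", "age")]

# Step 2: order() returns the row positions that sort age from low to high
under_50[order(under_50$age), ]
##   name age
## 3  Kim  34
## 1 Greg  47
```

Note that base R keeps the original row numbers (3 and 1), while the tidyverse version renumbers the rows from 1.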