Working with data in R
R has different types of data. The smallest unit of data is called a data element, and when data elements are gathered in a fixed order, that chunk of data is called a vector.
Here are some examples.
Data element
alpha <- 1 # this is a data element
alpha # If we call alpha, the computer will display the value assigned to alpha.
## [1] 1
Vector
To create a vector in R, we use c(), with commas (,) as separators. For example:
beta <- c(1, 2) # this is a vector
beta
## [1] 1 2
gamma <- c(alpha, beta) # This is also a vector
gamma
## [1] 1 1 2
Remember vectors are ordered.
delta <- c(beta, alpha) # it is still a vector
delta
## [1] 1 2 1
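Because order matters, gamma and delta hold the same values but are not the same vector. The following is a minimal sketch of that point; it assumes square-bracket indexing (not introduced above) to pick an element by its position.

```r
alpha <- 1
beta <- c(1, 2)
gamma <- c(alpha, beta)  # 1 1 2
delta <- c(beta, alpha)  # 1 2 1

gamma[2]                 # second element of gamma
## [1] 1
delta[2]                 # second element of delta
## [1] 2
identical(gamma, delta)  # same values, different order
## [1] FALSE
length(gamma)            # number of data elements in the vector
## [1] 3
```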
When you call beta in R, it retrieves the data associated with beta. So, how do we assign characters (letters or categorical variables)? R recognizes character values by the quotation marks "" around them. For instance:
delta_2 <- c("beta", "alpha")
delta_2
## [1] "beta" "alpha"
With quotation marks, R recognizes the inputs as new character data, not as the data we assigned previously.
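To make the difference concrete, the sketch below builds one vector from the names without quotes and one with quotes, then asks R what type each is. The names without_quotes and with_quotes are only illustrative.

```r
alpha <- 1
beta <- c(1, 2)

without_quotes <- c(beta, alpha)   # uses the values stored in beta and alpha
with_quotes <- c("beta", "alpha")  # uses the literal text "beta" and "alpha"

without_quotes
## [1] 1 2 1
class(without_quotes)              # numbers
## [1] "numeric"
class(with_quotes)                 # text
## [1] "character"
```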
Data frame
So, what should we do with larger data sets, such as Excel spreadsheets?
Vectors in R can be combined into a table, as long as their lengths are the same.
This table is called a data frame in R, and it can be created using the data.frame() function.
epsilon <- data.frame(gamma, delta) # This is a data frame
# Note that the lengths of both gamma and delta are the same.
epsilon
##   gamma delta
## 1     1     1
## 2     1     2
## 3     2     1
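A few quick checks can confirm what data.frame() built. The sketch below also tries a vector of a different length to show why the lengths need to match; too_short and the error text returned by the handler are made up for this illustration.

```r
gamma <- c(1, 1, 2)
delta <- c(1, 2, 1)
epsilon <- data.frame(gamma, delta)

nrow(epsilon)   # number of rows (the length of each vector)
## [1] 3
ncol(epsilon)   # number of columns (the number of vectors)
## [1] 2
names(epsilon)  # column names are taken from the vector names
## [1] "gamma" "delta"

# Vectors of length 3 and 2 cannot be combined: data.frame() stops with an error.
too_short <- c(1, 2)
result <- tryCatch(
  data.frame(gamma, too_short),
  error = function(e) "error: lengths differ"
)
result
## [1] "error: lengths differ"
```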
Selecting variables (vectors) in a data frame
A data frame has columns, where each column represents one variable (vector).
We use the dollar sign $ to select a specific variable from a data frame.
epsilon$gamma
## [1] 1 1 2
epsilon$delta
## [1] 1 2 1
epsilon$gamma returns exactly the same vector as gamma, which was used to construct epsilon.
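This can be checked directly. The sketch below compares the selected column with the original vector, and also shows double-bracket selection, an alternative to $ that is not covered above.

```r
gamma <- c(1, 1, 2)
delta <- c(1, 2, 1)
epsilon <- data.frame(gamma, delta)

identical(epsilon$gamma, gamma)  # the column equals the original vector
## [1] TRUE
epsilon[["delta"]]               # double brackets select a column by name
## [1] 1 2 1
mean(epsilon$delta)              # a selected column works like any other vector
## [1] 1.333333
```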
Now, how can we generate a larger dataset for analysis?
Data frame example
We can make data by combining multiple data elements into vectors, and then bind those vectors into one data frame.
subject <- c("Joe", "Trump", "Obama", "George") # Assigning multiple data elements as a vector
height <- c(183, 190, 187, 182)
IsTall <- c("short", "tall", "tall", "short")
example_dataframe <- data.frame(subject, height, IsTall) # Combining 3 vectors into one data frame
example_dataframe # Displaying example_dataframe
##   subject height IsTall
## 1     Joe    183  short
## 2   Trump    190   tall
## 3   Obama    187   tall
## 4  George    182  short
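Once the data frame exists, its columns can be summarised and its rows filtered. The following is a rough sketch, not part of the original example: the 185 cm cutoff is an assumed rule that happens to reproduce the IsTall labels, and stringsAsFactors = FALSE is written out so the text columns stay as characters in older R versions too.

```r
subject <- c("Joe", "Trump", "Obama", "George")
height <- c(183, 190, 187, 182)
IsTall <- c("short", "tall", "tall", "short")
# stringsAsFactors = FALSE is the default since R 4.0; stated here for older versions
example_dataframe <- data.frame(subject, height, IsTall, stringsAsFactors = FALSE)

mean(example_dataframe$height)  # average height of the four subjects
## [1] 185.5

# Keep only the rows labelled "tall"; the empty slot after the comma keeps all columns
tall_only <- example_dataframe[example_dataframe$IsTall == "tall", ]
tall_only$subject
## [1] "Trump" "Obama"

# Assumed rule: label heights of 185 or more as "tall"
IsTall_rule <- ifelse(example_dataframe$height >= 185, "tall", "short")
identical(IsTall_rule, example_dataframe$IsTall)
## [1] TRUE
```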
Loading Data
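However, we won’t be typing all the data manually every time it’s needed. To save time, we will load data frames directly from files on our computer. Most commonly these are .csv files, which we read with the read.csv() function. We can also import .xlsx files, but read.csv() cannot read them; a separate package such as readxl is needed for that.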
Here is an example using systolic blood pressure (SBP) data from NHISS.
dataset_sbp <- read.csv(file = "Inha/5_Lectures/2024/Advanced biostatistics/scripts/BTE3207_Advanced_Biostatistics/dataset/sbp_dataset_korea_2013-2014.csv")
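library(magrittr) # provides the %>% pipe (it is also available after loading tidyverse)
dataset_sbp %>%
  reactable::reactable(sortable = TRUE) # show the data as an interactive, sortable table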
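The file path above depends on the folder layout of one particular computer, so the following self-contained sketch may be easier to try first. It writes a small made-up table to a temporary .csv file and reads it back with read.csv(). The toy_data values and column names are hypothetical and are not taken from the SBP dataset.

```r
# Hypothetical stand-in data; the real SBP file is not used here
toy_data <- data.frame(
  id = c(1, 2, 3),
  sbp = c(118, 131, 125)
)

csv_path <- tempfile(fileext = ".csv")            # temporary file location
write.csv(toy_data, csv_path, row.names = FALSE)  # save without row numbers

loaded <- read.csv(file = csv_path)               # load the file back as a data frame
head(loaded)                                      # first rows of the loaded data
str(loaded)                                       # column names and types
nrow(loaded)
## [1] 3
```

After loading a real file, head() and str() are a quick way to confirm that the columns were read as expected before any analysis.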