To create a vector, we concatenate (combine) values using the c() function. First, let’s create some objects (named variables):
x <- 2
z <- 4
We can combine the x and z variables to create a vector named b.
b <- c(x, z)
b
## [1] 2 4
We can add the values 5 and 8 to the end of the b vector (overwriting the original b vector):
b <- c(b, 5, 8)
b
## [1] 2 4 5 8
We can create vectors of consecutive integers using the notation min_num:max_num.
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
Let’s add the numbers 1 through 3 to the beginning of the b vector.
b <- c(1:3, b)
b
## [1] 1 2 3 2 4 5 8
To find out the type of values in a vector, we can use the class() function.
class(b)
## [1] "numeric"
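A vector holds values of a single type, so c() converts mixed inputs to a common type. The short sketch below illustrates this with class(); the names nums, mixed, and flags are hypothetical and not part of the running example.

```r
# Illustrative vectors (hypothetical names, not from the example above)
nums  <- c(2, 4, 5)        # all numeric
mixed <- c(2, "four", 5)   # a number combined with text
flags <- c(TRUE, FALSE)    # logical values

class(nums)   # "numeric"
class(mixed)  # "character": the numbers are converted to text
class(flags)  # "logical"
```

Checking class() after combining values is a quick way to catch an unintended conversion to character.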
To find out the number of elements in a vector, we can use the length() function.
length(b)
## [1] 7
To check if an object is a vector, we can use the is.vector() function.
is.vector(b)
## [1] TRUE
We can view specific elements of our vector by indexing, using [ ]. To view individual elements of a vector, we would use:
b[4] # obtain value in 4th index position
## [1] 2
We can view consecutive elements using the min_num:max_num syntax:
b[2:3] # obtain values in 2nd and 3rd index position
## [1] 2 3
We can view nonconsecutive elements using the c() function:
b[c(1, 3, 5)] # values in the 1st, 3rd and 5th index position
## [1] 1 3 4
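Indexing can also exclude elements. As a small sketch, negative index numbers drop the listed positions, and length() can be used to reach the last element. The vector is recreated here with the same values as b so the block runs on its own.

```r
b <- c(1, 2, 3, 2, 4, 5, 8)  # same values as the b vector above

b[-1]         # all values except the 1st: 2 3 2 4 5 8
b[-c(1, 2)]   # all values except the 1st and 2nd: 3 2 4 5 8
b[length(b)]  # value in the last index position: 8
```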
We can check if values in a vector meet criteria using relational operators.
b > 3
## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
We can use indexing to find out which of the values in a vector meet the specified criteria
b[b > 3]
## [1] 4 5 8
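Two related base R helpers are often useful alongside logical indexing. The sketch below, which recreates b with the same values, uses which() to get the positions of the matching values and sum() to count them; it also combines two conditions with &.

```r
b <- c(1, 2, 3, 2, 4, 5, 8)  # same values as the b vector above

which(b > 3)      # index positions where the condition is TRUE: 5 6 7
sum(b > 3)        # number of values meeting the condition: 3
b[b > 3 & b < 8]  # values greater than 3 and less than 8: 4 5
```

sum() works here because TRUE counts as 1 and FALSE as 0.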
Dataframes are the most common R object that we will work with. Vectors of the same length can be combined into dataframes, which have row and column dimensions. Dataframes are the default object for imported data.
First, let’s create 4 vectors with 5 values each
store_num <- c(3, 14, 21, 32, 54)
store_rev <- c(543, 654, 345, 678, 234)
store_visits <- c(45, 78, 32, 56, 34)
store_manager <- c("Chelsey","Jorge","Henry","Jade","Carlota")
To combine the vectors into a dataframe, we use the data.frame() function. We set stringsAsFactors = FALSE to keep the store_manager column as a character variable instead of a factor, which is a special type of variable used for categorical data. (Since R 4.0.0, FALSE is the default, so the argument matters mainly in older versions of R.)
store_df <- data.frame(store_num, store_rev, store_visits,
                       store_manager,
                       stringsAsFactors = FALSE)
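To see what the stringsAsFactors argument changes, the sketch below builds two one-column dataframes from the same manager names and checks the class of the resulting column. The names as_chr, as_fct, chr_class, and fct_class are hypothetical, and sapply() is used here only to apply class() to each column.

```r
store_manager <- c("Chelsey", "Jorge", "Henry", "Jade", "Carlota")

# Hypothetical dataframes that differ only in stringsAsFactors
as_chr <- data.frame(store_manager, stringsAsFactors = FALSE)
as_fct <- data.frame(store_manager, stringsAsFactors = TRUE)

chr_class <- sapply(as_chr, class)  # class of each column
fct_class <- sapply(as_fct, class)

chr_class  # store_manager is "character"
fct_class  # store_manager is "factor"
```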
For small dataframes, we can view the dataframe as output in the console by running a code line with the name of the dataframe. We can also view the dataframe as a table/spreadsheet using the View() function.
store_df
## store_num store_rev store_visits store_manager
## 1 3 543 45 Chelsey
## 2 14 654 78 Jorge
## 3 21 345 32 Henry
## 4 32 678 56 Jade
## 5 54 234 34 Carlota
For larger dataframes, we will typically inspect the first few and last few observations using the head() and tail() functions (by default, n = 6).
head(x = store_df, n = 2) # output first 2 rows
## store_num store_rev store_visits store_manager
## 1 3 543 45 Chelsey
## 2 14 654 78 Jorge
tail(x = store_df, n = 3) # output last 3 rows
## store_num store_rev store_visits store_manager
## 3 21 345 32 Henry
## 4 32 678 56 Jade
## 5 54 234 34 Carlota
We can view the structure of our dataframe using the str() function.
str(store_df)
## 'data.frame': 5 obs. of 4 variables:
## $ store_num : num 3 14 21 32 54
## $ store_rev : num 543 654 345 678 234
## $ store_visits : num 45 78 32 56 34
## $ store_manager: chr "Chelsey" "Jorge" "Henry" "Jade" ...
We can view the dimensions of our dataframe
nrow(store_df) # number of rows
## [1] 5
ncol(store_df) # number of columns
## [1] 4
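Both dimensions can also be obtained in one call. As a minimal sketch, dim() returns the number of rows followed by the number of columns; the dataframe is rebuilt here from the same values so the block runs on its own, and dims is a hypothetical name.

```r
store_df <- data.frame(
  store_num     = c(3, 14, 21, 32, 54),
  store_rev     = c(543, 654, 345, 678, 234),
  store_visits  = c(45, 78, 32, 56, 34),
  store_manager = c("Chelsey", "Jorge", "Henry", "Jade", "Carlota"),
  stringsAsFactors = FALSE
)

dims <- dim(store_df)  # rows first, then columns
dims                   # 5 4
dims[1]                # number of rows: 5
dims[2]                # number of columns: 4
```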
We can also view the row and column names as vectors
rownames(store_df) # obtain row names
## [1] "1" "2" "3" "4" "5"
colnames(store_df) # obtain column names
## [1] "store_num" "store_rev" "store_visits" "store_manager"
Sometimes, we will need to change the column (or row) names. We can do this by assigning new values to colnames() (or rownames()). Below, we change the name of the first column (store_num) to “Store_Number”.
colnames(store_df)[1] <- "Store_Number"
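Renaming by position depends on the column order staying the same. One alternative, sketched below on a two-column version of the data, is to select the column by its current name using a relational operator.

```r
# Smaller version of the example dataframe, rebuilt so the block runs on its own
store_df <- data.frame(
  store_num = c(3, 14, 21, 32, 54),
  store_rev = c(543, 654, 345, 678, 234)
)

# Rename whichever column is currently called "store_num"
colnames(store_df)[colnames(store_df) == "store_num"] <- "Store_Number"

colnames(store_df)  # "Store_Number" "store_rev"
```

If no column has the name being matched, the assignment leaves the column names unchanged.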
We can view specific elements of our dataframe by indexing, using the syntax df_name[row_dim, col_dim].
To view specific rows, we specify the row_dim by row index number or name (if row names differ from index)
store_df[1, ] # output 1st row
## Store_Number store_rev store_visits store_manager
## 1 3 543 45 Chelsey
store_df[2:3, ] # output 2nd and 3rd rows (consecutive rows)
## Store_Number store_rev store_visits store_manager
## 2 14 654 78 Jorge
## 3 21 345 32 Henry
store_df[c(1, 4), ] # output 1st and 4th rows (nonconsecutive rows)
## Store_Number store_rev store_visits store_manager
## 1 3 543 45 Chelsey
## 4 32 678 56 Jade
To view specific columns, we specify the col_dim by column index number or name.
By Number:
store_df[ ,3] # output 3rd column
## [1] 45 78 32 56 34
store_df[ ,2:4] # output 2nd, 3rd and 4th column (consecutive columns)
## store_rev store_visits store_manager
## 1 543 45 Chelsey
## 2 654 78 Jorge
## 3 345 32 Henry
## 4 678 56 Jade
## 5 234 34 Carlota
store_df[ ,c(2, 4)] # output 2nd and 4th column (nonconsecutive columns)
## store_rev store_manager
## 1 543 Chelsey
## 2 654 Jorge
## 3 345 Henry
## 4 678 Jade
## 5 234 Carlota
By Name:
store_df[ ,"store_rev"] # output store_rev column
## [1] 543 654 345 678 234
store_df[ ,c("Store_Number", "store_manager")] # output Store_Number and store_manager
## Store_Number store_manager
## 1 3 Chelsey
## 2 14 Jorge
## 3 21 Henry
## 4 32 Jade
## 5 54 Carlota
We can also view specific columns by using the dollar sign operator ($) and the syntax dataframe_name$column_name:
store_df$store_visits
## [1] 45 78 32 56 34
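Selecting a single column returns a vector rather than a dataframe, as the outputs above show. The sketch below, on a reduced copy of the data, compares the ways of selecting one column and uses the drop = FALSE argument of [ ] to keep the result as a one-column dataframe. The names visits_vec and visits_df are hypothetical.

```r
# Reduced copy of the example dataframe, rebuilt so the block runs on its own
store_df <- data.frame(
  Store_Number = c(3, 14, 21, 32, 54),
  store_visits = c(45, 78, 32, 56, 34)
)

visits_vec <- store_df[ , "store_visits"]                # numeric vector
visits_df  <- store_df[ , "store_visits", drop = FALSE]  # one-column dataframe

is.vector(visits_vec)     # TRUE
is.data.frame(visits_df)  # TRUE

# $ and [[ ]] return the same vector as indexing by column name
identical(store_df$store_visits, store_df[["store_visits"]])  # TRUE
identical(store_df$store_visits, visits_vec)                  # TRUE
```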
To view values in specific rows and columns, we specify the row_dim and col_dim
store_df[c(1,3), 2] # output values in 1st and 3rd row of column 2
## [1] 543 345
store_df[c(1:2, 5),
         c("store_rev", "store_visits")] # output rows 1, 2 and 5 of store_rev and store_visits
## store_rev store_visits
## 1 543 45
## 2 654 78
## 5 234 34
We can create or view subsets based on rows/columns meeting conditions using relational operators.
store_df[store_df$store_visits < 50, ]
## Store_Number store_rev store_visits store_manager
## 1 3 543 45 Chelsey
## 3 21 345 32 Henry
## 5 54 234 34 Carlota
store_df$store_rev[store_df$store_visits >= 56]
## [1] 654 678
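Conditions can be combined with the logical operators & (and) and | (or). The sketch below rebuilds the numeric columns of the example dataframe and filters rows on two conditions at once; both_true and either_true are hypothetical names.

```r
store_df <- data.frame(
  Store_Number = c(3, 14, 21, 32, 54),
  store_rev    = c(543, 654, 345, 678, 234),
  store_visits = c(45, 78, 32, 56, 34)
)

# Rows where visits are below 50 AND revenue is above 300 (rows 1 and 3)
both_true <- store_df[store_df$store_visits < 50 & store_df$store_rev > 300, ]

# Rows where visits are above 70 OR revenue is below 300 (rows 2 and 5)
either_true <- store_df[store_df$store_visits > 70 | store_df$store_rev < 300, ]

both_true
either_true
```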
We can also use the subset() function to create subsets of data. Its arguments include x, the dataframe to create the subset from; subset, the condition that rows must meet (if applicable); and select, the columns to include (if select is omitted, all columns are included).
storvis32 <- subset(x = store_df,
                    subset = store_visits > 32,
                    select = c(Store_Number, store_rev, store_manager))
storvis32
## Store_Number store_rev store_manager
## 1 3 543 Chelsey
## 2 14 654 Jorge
## 4 32 678 Jade
## 5 54 234 Carlota
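The same subset can be written with the [row_dim, col_dim] indexing shown earlier. As a sketch, the block below rebuilds the dataframe and compares the two approaches with all.equal(); by_subset and by_index are hypothetical names.

```r
store_df <- data.frame(
  Store_Number  = c(3, 14, 21, 32, 54),
  store_rev     = c(543, 654, 345, 678, 234),
  store_visits  = c(45, 78, 32, 56, 34),
  store_manager = c("Chelsey", "Jorge", "Henry", "Jade", "Carlota"),
  stringsAsFactors = FALSE
)

# Using subset(): column names are written without quotes
by_subset <- subset(x = store_df,
                    subset = store_visits > 32,
                    select = c(Store_Number, store_rev, store_manager))

# Using indexing: the dataframe name is repeated and column names are quoted
by_index <- store_df[store_df$store_visits > 32,
                     c("Store_Number", "store_rev", "store_manager")]

all.equal(by_subset, by_index)  # compare the two results
```

subset() is often shorter to read, while indexing makes each step explicit; either can be used for the examples in this section.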