Introduction to R: Vectors & Dataframes

Vectors

Creating Vectors

To create a vector, we concatenate (combine) values using the c() function. First, let’s create some objects (named variables):

x <- 2
z <- 4

We can combine the x and z variables to create a vector named b.

b <- c(x, z)
b

## [1] 2 4

We can add the values 5 and 8 to the end of the b vector (overwriting the original b vector):

b <- c(b, 5, 8)
b

## [1] 2 4 5 8

We can create vectors of consecutive integers using the notation min_num:max_num.

1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

Let’s add the numbers 1 through 3 to the beginning of the b vector.

b <- c(1:3, b)
b

## [1] 1 2 3 2 4 5 8

Describing Vectors

To find out the type of values in a vector, we can use the class() function.

class(b)

## [1] "numeric"
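As a small illustration of why b is reported as "numeric", the sketch below shows how the class of a vector depends on what was combined inside c(). The vector names (ints, mixed_num, mixed_chr) are made up for this example and are not part of the tutorial's data.

```r
# Hypothetical example vectors (not part of the tutorial's data)
ints      <- 1:3           # the colon operator produces integers
mixed_num <- c(1:3, 2.5)   # integers combined with a decimal become numeric
mixed_chr <- c(1, "a")     # numbers combined with text become character

class(ints)       # "integer"
class(mixed_num)  # "numeric"
class(mixed_chr)  # "character"
```

A vector holds one type of value only, so c() converts everything to the most general type present.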

To find out the number of elements in a vector, we can use the length() function.

length(b)

## [1] 7

To check if an object is a vector, we can use the is.vector() function.

is.vector(b)

## [1] TRUE

Indexing & Subsetting Vectors

We can view specific elements of our vector by indexing, using [ ]. To view individual elements of a vector, we would use:

b[4] # obtain value in 4th index position

## [1] 2

We can view consecutive elements using the min_num:max_num syntax.

b[2:3] # obtain values in 2nd and 3rd index position

## [1] 2 3

We can view nonconsecutive elements using the c() function.

b[c(1, 3, 5)] # values in the 1st, 3rd and 5th index position

## [1] 1 3 4

We can check if values in a vector meet criteria using relational operators.

b > 3

## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

We can place such a condition inside [ ] to return only the values in a vector that meet the specified criteria.

b[b > 3]

## [1] 4 5 8
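Conditions can also be combined. The sketch below recreates the b vector so it runs on its own, then uses the & (and) and | (or) operators, plus the base R which() function to get index positions instead of values. The names mid_vals, edge_vals and pos are chosen only for this illustration.

```r
# Recreate the vector from this section so the example runs on its own
b <- c(1, 2, 3, 2, 4, 5, 8)

# & (and): values greater than 2 AND less than 8
mid_vals <- b[b > 2 & b < 8]
mid_vals    # 3 4 5

# | (or): values equal to 1 OR greater than 5
edge_vals <- b[b == 1 | b > 5]
edge_vals   # 1 8

# which() returns the index positions where the condition is TRUE
pos <- which(b > 3)
pos         # 5 6 7
```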

Dataframes

Creating Dataframes

Dataframes are the most common R object that we will work with. Vectors of the same length can be combined into dataframes, which have row and column dimensions. Dataframes are the default object for imported data.

First, let’s create 4 vectors with 5 values each:

store_num <- c(3, 14, 21, 32, 54)
store_rev <- c(543, 654, 345, 678, 234)
store_visits <- c(45, 78, 32, 56, 34)
store_manager <- c("Chelsey","Jorge","Henry","Jade","Carlota")

To combine the vectors into a dataframe, we use the data.frame() function. We set stringsAsFactors = FALSE to keep the store_manager column as a character variable instead of a factor, which is a special type of variable used for categorical data. Since R 4.0.0, FALSE is the default, but setting it explicitly keeps the code behaving the same way on older versions of R.

store_df <- data.frame(store_num,store_rev,store_visits,
                       store_manager,
                       stringsAsFactors = FALSE)
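To see what the stringsAsFactors argument changes, the sketch below builds the same one-column dataframe twice and compares the class of the resulting column. The objects mgr, as_chr and as_fct are hypothetical and separate from store_df.

```r
# Hypothetical two-manager example, separate from store_df
mgr <- c("Chelsey", "Jorge")

as_chr <- data.frame(mgr, stringsAsFactors = FALSE)  # keep text as character
as_fct <- data.frame(mgr, stringsAsFactors = TRUE)   # convert text to factor

class(as_chr$mgr)    # "character"
class(as_fct$mgr)    # "factor"
levels(as_fct$mgr)   # "Chelsey" "Jorge"
```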

For small dataframes, we can view the dataframe as output in the console by running a code line with the name of the dataframe. We can also view the dataframe as a table/spreadsheet using the View() function.

store_df

##   store_num store_rev store_visits store_manager
## 1         3       543           45       Chelsey
## 2        14       654           78         Jorge
## 3        21       345           32         Henry
## 4        32       678           56          Jade
## 5        54       234           34       Carlota

For larger dataframes, we will typically inspect the first few and last few observations using the head() and tail() functions (by default, n = 6).

head(x = store_df, n = 2) # output first 2 rows

##   store_num store_rev store_visits store_manager
## 1         3       543           45       Chelsey
## 2        14       654           78         Jorge

tail(x = store_df, n = 3) # output last 3 rows

##   store_num store_rev store_visits store_manager
## 3        21       345           32         Henry
## 4        32       678           56          Jade
## 5        54       234           34       Carlota
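Because store_df has only 5 rows, the default of n = 6 is easier to see on something longer. The sketch below uses a made-up 10-row dataframe (big_df, containing only an id column) purely for illustration.

```r
# Hypothetical 10-row dataframe to show the default n = 6
big_df <- data.frame(id = 1:10)

first_rows <- head(big_df)   # no n given, so the first 6 rows
last_rows  <- tail(big_df)   # no n given, so the last 6 rows

nrow(first_rows)   # 6
first_rows$id      # 1 2 3 4 5 6
last_rows$id       # 5 6 7 8 9 10
```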

Describing Dataframes

We can view the structure of our dataframe using the str() function.

str(store_df)

## 'data.frame':    5 obs. of  4 variables:
##  $ store_num    : num  3 14 21 32 54
##  $ store_rev    : num  543 654 345 678 234
##  $ store_visits : num  45 78 32 56 34
##  $ store_manager: chr  "Chelsey" "Jorge" "Henry" "Jade" ...

We can view the dimensions of our dataframe using the nrow() and ncol() functions.

nrow(store_df) # number of rows

## [1] 5

ncol(store_df) # number of columns

## [1] 4
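Both dimensions can also be obtained in one call with the base R dim() function, which returns the number of rows followed by the number of columns. The sketch below rebuilds the store data so it runs on its own; the name dims is chosen only for this example.

```r
# Rebuild the store data so the example is self-contained
store_df <- data.frame(
  store_num     = c(3, 14, 21, 32, 54),
  store_rev     = c(543, 654, 345, 678, 234),
  store_visits  = c(45, 78, 32, 56, 34),
  store_manager = c("Chelsey", "Jorge", "Henry", "Jade", "Carlota"),
  stringsAsFactors = FALSE
)

dims <- dim(store_df)   # rows first, then columns
dims      # 5 4
dims[1]   # 5 (same as nrow(store_df))
dims[2]   # 4 (same as ncol(store_df))
```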

We can also view the row and column names as vectors:

rownames(store_df) # obtain row names

## [1] "1" "2" "3" "4" "5"

colnames(store_df) # obtain column names

## [1] "store_num"     "store_rev"     "store_visits"  "store_manager"

Sometimes, we will need to change the column (or row) names. We can do this by assigning new values to the output of colnames() (or rownames()). Below, we index the first element of the column names and change it from store_num to “Store_Number”.

colnames(store_df)[1] <- "Store_Number"
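Renaming by position works, but it silently renames the wrong column if the column order ever changes. One alternative, sketched below on a made-up two-column dataframe (df), is to locate the column by its current name using a relational operator.

```r
# Hypothetical small dataframe, separate from store_df
df <- data.frame(store_num = c(3, 14), store_rev = c(543, 654))

# Find the column by its current name, then assign the new name
colnames(df)[colnames(df) == "store_num"] <- "Store_Number"

colnames(df)   # "Store_Number" "store_rev"
```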

Indexing & Subsetting Dataframes

We can view specific elements of our dataframe by indexing, using the syntax df_name[row_dim, col_dim].

To view specific rows, we specify the row_dim by row index number or by row name (if the row names differ from the index numbers).

store_df[1, ] # output 1st row

##   Store_Number store_rev store_visits store_manager
## 1            3       543           45       Chelsey

store_df[2:3, ] # output 2nd and 3rd rows (consecutive rows)

##   Store_Number store_rev store_visits store_manager
## 2           14       654           78         Jorge
## 3           21       345           32         Henry

store_df[c(1, 4), ] # output 1st and 4th rows (nonconsecutive rows)

##   Store_Number store_rev store_visits store_manager
## 1            3       543           45       Chelsey
## 4           32       678           56          Jade

To view specific columns, we specify the col_dim by column index number or name.

By Number:

store_df[ ,3] # output 3rd column

## [1] 45 78 32 56 34

store_df[ ,2:4] # output 2nd, 3rd and 4th column (consecutive columns)

##   store_rev store_visits store_manager
## 1       543           45       Chelsey
## 2       654           78         Jorge
## 3       345           32         Henry
## 4       678           56          Jade
## 5       234           34       Carlota

store_df[ ,c(2, 4)] # output 2nd and 4th column (nonconsecutive columns)

##   store_rev store_manager
## 1       543       Chelsey
## 2       654         Jorge
## 3       345         Henry
## 4       678          Jade
## 5       234       Carlota

By Name:

store_df[ ,"store_rev"] # output store_rev column

## [1] 543 654 345 678 234

store_df[ ,c("Store_Number", "store_manager")] # output Store_Number and store_manager columns

##   Store_Number store_manager
## 1            3       Chelsey
## 2           14         Jorge
## 3           21         Henry
## 4           32          Jade
## 5           54       Carlota

We can also view specific columns by using the dollar sign operator ($) and the syntax dataframe_name$column_name.

store_df$store_visits

## [1] 45 78 32 56 34
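Selecting a single column with [ , ] or $ returns a plain vector rather than a dataframe, which is why the output above has no row or column labels. If a one-column dataframe is needed instead, the drop = FALSE argument of [ ] keeps the dataframe structure. The sketch below uses a made-up three-row dataframe (df) to show the difference.

```r
# Hypothetical small dataframe, separate from store_df
df <- data.frame(store_rev    = c(543, 654, 345),
                 store_visits = c(45, 78, 32))

as_vector <- df[ , "store_visits"]                # plain numeric vector
as_frame  <- df[ , "store_visits", drop = FALSE]  # one-column dataframe

is.vector(as_vector)                    # TRUE
is.data.frame(as_frame)                 # TRUE
identical(as_vector, df$store_visits)   # TRUE
```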

To view values in specific rows and columns, we specify both the row_dim and the col_dim.

store_df[c(1,3), 2] # output values in 1st and 3rd row of column 2

## [1] 543 345

store_df[c(1:2, 5), 
         c("store_rev", "store_visits")]

##   store_rev store_visits
## 1       543           45
## 2       654           78
## 5       234           34

We can create or view subsets based on rows/columns meeting conditions using relational operators.

store_df[store_df$store_visits < 50, ]

##   Store_Number store_rev store_visits store_manager
## 1            3       543           45       Chelsey
## 3           21       345           32         Henry
## 5           54       234           34       Carlota

store_df$store_rev[store_df$store_visits >= 56]

## [1] 654 678
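Several conditions can be combined in the row_dim with & (and) or | (or). The sketch below rebuilds the store data (with the renamed Store_Number column) so it runs on its own; the object name low_vis_high_rev is chosen only for this example.

```r
# Rebuild the store data so the example is self-contained
store_df <- data.frame(
  Store_Number  = c(3, 14, 21, 32, 54),
  store_rev     = c(543, 654, 345, 678, 234),
  store_visits  = c(45, 78, 32, 56, 34),
  store_manager = c("Chelsey", "Jorge", "Henry", "Jade", "Carlota"),
  stringsAsFactors = FALSE
)

# Rows with fewer than 50 visits AND revenue above 300
low_vis_high_rev <- store_df[store_df$store_visits < 50 &
                               store_df$store_rev > 300, ]

low_vis_high_rev$store_manager   # "Chelsey" "Henry"
```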

We can also use the subset() function to create subsets of data. The function takes three main arguments: x, the dataframe to create the subset from; subset, the condition that rows must meet (if applicable); and select, the columns to include (if the select argument is left empty, all columns are included).

storvis32 <- subset(x = store_df,
                    subset = store_visits > 32,
                    select = c(Store_Number, store_rev, store_manager))
storvis32

##   Store_Number store_rev store_manager
## 1            3       543       Chelsey
## 2           14       654         Jorge
## 4           32       678          Jade
## 5           54       234       Carlota
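For comparison, the same result can be produced with the [row_dim, col_dim] syntax from earlier in this section. The sketch below rebuilds the store data so it runs on its own and checks that the two approaches agree; by_subset and by_index are names made up for this illustration.

```r
# Rebuild the store data so the example is self-contained
store_df <- data.frame(
  Store_Number  = c(3, 14, 21, 32, 54),
  store_rev     = c(543, 654, 345, 678, 234),
  store_visits  = c(45, 78, 32, 56, 34),
  store_manager = c("Chelsey", "Jorge", "Henry", "Jade", "Carlota"),
  stringsAsFactors = FALSE
)

# subset(): column names are written without quotes or the store_df$ prefix
by_subset <- subset(x = store_df,
                    subset = store_visits > 32,
                    select = c(Store_Number, store_rev, store_manager))

# Bracket indexing: condition in row_dim, quoted column names in col_dim
by_index <- store_df[store_df$store_visits > 32,
                     c("Store_Number", "store_rev", "store_manager")]

all.equal(by_subset, by_index)   # TRUE
```

One difference to keep in mind: when the condition evaluates to NA for a row, subset() leaves that row out, whereas bracket indexing returns a row filled with NA values.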