Creating a Data Frame

Step 1: Load packages

Start by installing the required package; in this case, you will want to install tidyverse. If you have already installed and loaded tidyverse in this session, feel free to skip the code chunks in this step.

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Step 2: Create data frame

Sometimes you will need to generate a data frame directly in R. There are a number of ways to do this; one of the most common is to create individual vectors of data and then combine them into a data frame using the data.frame() function.

Here’s how this works. First, create a vector of names:

names <- c("Peter", "Jennifer", "Julie", "Alex")

Then create a vector of ages:

age <- c(15, 19, 21, 25)

With these two vectors, you can create a new data frame called people:

people <- data.frame(names, age)
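As a variation on the step above, the columns can be named explicitly inside data.frame(). This is a minimal sketch that reuses the same two vectors; people_named and the column name "name" are hypothetical, chosen only for illustration.

```r
# Same vectors as in the step above
names <- c("Peter", "Jennifer", "Julie", "Alex")
age <- c(15, 19, 21, 25)

# Each vector supplies one column, so the vectors should be the same length.
# people_named is a hypothetical name used only for this illustration.
people_named <- data.frame(name = names, age = age)

# Confirm the column names and the number of rows
colnames(people_named)
nrow(people_named)
```

Naming the columns in the call keeps the column names independent of the vector names, which can be handy when the vectors have short or temporary names.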

Step 3: Inspect the data frame

Now that you have this data frame, you can use some different functions to inspect it.

One common function you can use to preview the data is head(), which returns the column names and the first several rows of data (six by default). Because people has only four rows, head() returns all of them. You can check out how head() works by running the chunk below:

head(people)

##      names age
## 1    Peter  15
## 2 Jennifer  19
## 3    Julie  21
## 4     Alex  25
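If you want a shorter preview, head() also accepts an n argument that sets how many rows to return. The sketch below rebuilds the people data frame so it runs on its own; people_preview is a hypothetical name used only for illustration.

```r
# Rebuild the example data frame so this chunk is self-contained
people <- data.frame(
  names = c("Peter", "Jennifer", "Julie", "Alex"),
  age = c(15, 19, 21, 25)
)

# Return only the first two rows instead of the default six
people_preview <- head(people, n = 2)
people_preview
```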

In addition to head(), there are a number of other useful functions to summarize or preview your data. For example, the str() and glimpse() functions will both provide summaries of each column in your data arranged horizontally. You can check out these two functions in action by running the code chunks below:

str(people)

## 'data.frame':    4 obs. of  2 variables:
##  $ names: chr  "Peter" "Jennifer" "Julie" "Alex"
##  $ age  : num  15 19 21 25

glimpse(people)

## Rows: 4
## Columns: 2
## $ names <chr> "Peter", "Jennifer", "Julie", "Alex"
## $ age   <dbl> 15, 19, 21, 25

You can also use colnames() to get the column names in your data set as a character vector. Run the code chunk below to check out this function:

colnames(people)

## [1] "names" "age"
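Alongside colnames(), base R has a few functions that report the size of a data frame. The sketch below is one way to check it, assuming the same people data frame as above.

```r
# Rebuild the example data frame so this chunk is self-contained
people <- data.frame(
  names = c("Peter", "Jennifer", "Julie", "Alex"),
  age = c(15, 19, 21, 25)
)

nrow(people)  # number of rows (observations)
ncol(people)  # number of columns (variables)
dim(people)   # rows and columns together, in that order
```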

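Now that you have a data frame, you can work with it using the tools in R and the tidyverse. For example, you could use mutate() from dplyr, one of the core tidyverse packages, to create a new variable that captures each person’s age in twenty years. The code chunk below creates that new variable; because the result is not assigned to a name, it is only printed, and people itself is unchanged: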

mutate(people, age_in_20 = age + 20)

##      names age age_in_20
## 1    Peter  15        35
## 2 Jennifer  19        39
## 3    Julie  21        41
## 4     Alex  25        45
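To keep the new column, assign the result of mutate() to a name. The sketch below assumes dplyr is installed (it is loaded as part of tidyverse); people_future is a hypothetical name used only for illustration.

```r
library(dplyr)  # mutate() comes from dplyr, which library(tidyverse) also loads

# Rebuild the example data frame so this chunk is self-contained
people <- data.frame(
  names = c("Peter", "Jennifer", "Julie", "Alex"),
  age = c(15, 19, 21, 25)
)

# mutate() returns a new data frame; store it under a new name to keep it
people_future <- mutate(people, age_in_20 = age + 20)

colnames(people)         # the original data frame has no age_in_20 column
colnames(people_future)  # the stored result includes age_in_20
```

Assigning back to people instead (people <- mutate(...)) would overwrite the original data frame, so a separate name is the safer choice while exploring.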

Step 4: Try it yourself

To get more familiar with creating and using data frames, use the code chunks below to create your own custom data frame.

First, create a vector of any five different fruits. You can type directly into the code chunk below; just place your cursor in the box and click to type. Once you have input the fruits you want in your data frame, run the code chunk.

fruit <- c("Lemon", "Blueberry", "Grapefruit", "Mango", "Strawberry")

Now, create a new vector with a number representing your personal rank for each fruit. Give a 1 to the fruit you like the most and a 5 to the fruit you like the least. The ranks need to be in the same order as the fruits above, so if your favorite fruit is last in the vector above, the rank 1 needs to be in the last position in the vector below. Once you have entered your rankings, run the code chunk.

rank <- c(4, 2, 5, 3, 1)

Finally, combine the two vectors into a data frame. You can call it fruit_ranks. Edit the code chunk below and run it to create your data frame.

fruit_ranks <- data.frame(fruit, rank)

Running this code chunk creates a data frame named fruit_ranks that contains your fruits and rankings.
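Once the data frame exists, you can inspect it with the same functions used in Step 3. The sketch below uses the example fruits and ranks from above and also sorts the rows by rank with base R's order(); fruit_sorted is a hypothetical name used only for illustration.

```r
# Example vectors from the steps above; substitute your own fruits and ranks
fruit <- c("Lemon", "Blueberry", "Grapefruit", "Mango", "Strawberry")
rank <- c(4, 2, 5, 3, 1)
fruit_ranks <- data.frame(fruit, rank)

# Summarize the columns, as with the people data frame
str(fruit_ranks)

# Reorder the rows from most liked (rank 1) to least liked (rank 5)
fruit_sorted <- fruit_ranks[order(fruit_ranks$rank), ]
fruit_sorted
```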