This activity is focused on creating and using data frames in R. A data
frame is a collection of columns containing data, similar to a
spreadsheet or SQL table. Data frames are one of the basic tools you
will use to work with data in R. And you can create data frames from
different data sources.
There are three common sources for data:
- A package with data that can be accessed by loading that package
- An external file, such as a spreadsheet or CSV, that can be imported into R
- Data that has been generated from scratch using R code
Wherever data comes from, you will almost always want to store it in a data frame object to work with it. Now, you can start creating and exploring data frames with the code chunks in the RMD space. To interact with the code chunk, click the green arrow in the top-right corner of the chunk. The executed code will appear in the RMD space and your console.
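As a rough illustration of the three kinds of sources (package data, an external file, and data generated in code), the sketch below uses only base R. The built-in mtcars dataset stands in for package data, a temporary CSV file written on the spot stands in for an external file, and the names items_df and scores_df and their values are made up for this example; none of them come from the activity itself.

```r
# 1. Data that ships with a package: mtcars comes with the built-in
#    datasets package.
data(mtcars)
cars_df <- mtcars

# 2. Data imported from an external file: write a tiny CSV to a temporary
#    location, then read it back in. The contents are made up for this sketch.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("item,price", "pen,2", "book,10"), csv_path)
items_df <- read.csv(csv_path)

# 3. Data generated from scratch with R code.
scores_df <- data.frame(player = c("A", "B", "C"), score = c(10, 7, 12))

# All three end up as ordinary data frames.
class(cars_df)
class(items_df)
class(scores_df)
```

Whichever route is used, the result can be inspected and modified with the same functions.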
Throughout this activity, you will also have the opportunity to practice writing your own code by making changes to the code chunks yourself. If you encounter an error or get stuck, you can always check the Lesson2_Dataframe_Solutions.rmd file in the Solutions folder under Week 3 for the complete, correct code.
Start by installing the required package; in this case, you will want
to install tidyverse. If you have already installed and loaded tidyverse
in this session, feel free to skip the code chunks in this step.
# install.packages("tidyverse")
Once a package is installed, you can load it by running the library()
function with the package name inside the parentheses:
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Sometimes you will need to generate a data frame directly in R. There
are a number of ways to do this; one of the most common is to create
individual vectors of data and then combine them into a data frame using
the data.frame() function.
Here’s how this works. First, create a vector of names by inserting four names into this code block between the quotation marks and then run it:
names <- c("Gerardo", "Gerardo Jr", "Gerardo Alexandre", "Rosita")
Then create a vector of ages by adding four ages separated by commas to the code chunk below. Make sure you are inputting numeric values for the ages or you might get an error.
age <- c(76, 53, 12, 73)
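To see why the ages need to be numeric, here is a small sketch. The vector age_text is a hypothetical mistake in which the ages were typed inside quotation marks; it is not part of the activity.

```r
age <- c(76, 53, 12, 73)               # numeric values
age_text <- c("76", "53", "12", "73")  # hypothetical mistake: ages typed as text

class(age)
class(age_text)

# Arithmetic works on the numeric vector.
age + 20

# The same arithmetic on the character vector raises an error. tryCatch()
# captures the error message so the rest of the script keeps running.
result <- tryCatch(age_text + 20, error = function(e) conditionMessage(e))
result

# as.numeric() converts text that looks like numbers back into numbers.
as.numeric(age_text) + 20
```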
With these two vectors, you can create a new data frame called people:
people <- data.frame(names, age)
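Two details about data.frame() may be worth seeing in action: each vector becomes one column, and the vectors need compatible lengths. The sketch below is only an illustration; the column name person and the too-short vector short_age are hypothetical and do not appear in the activity.

```r
names <- c("Gerardo", "Gerardo Jr", "Gerardo Alexandre", "Rosita")
age <- c(76, 53, 12, 73)

# Unnamed arguments: the column names are taken from the vector names.
people <- data.frame(names, age)
dim(people)  # number of rows, then number of columns

# Named arguments: choose the column names explicitly.
people_named <- data.frame(person = names, age = age)
colnames(people_named)

# Four names cannot be paired with three ages, so this call fails.
# tryCatch() replaces the error with a short note so the script keeps running.
short_age <- c(76, 53, 12)
outcome <- tryCatch(
  data.frame(names, short_age),
  error = function(e) "error: the vectors have incompatible lengths"
)
outcome
```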
Now that you have this data frame, you can use some different functions to inspect it.
One common function you can use to preview the data is the head()
function, which returns the columns and the first several rows of data.
You can check out how the head() function works by running the chunk
below:
head(people)
## names age
## 1 Gerardo 76
## 2 Gerardo Jr 53
## 3 Gerardo Alexandre 12
## 4 Rosita 73
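Because people has only four rows, head() shows all of them. To see the row limit at work, the sketch below uses the built-in mtcars dataset, which is not part of this activity and is used here only because it has more rows.

```r
# head() returns the first six rows by default.
nrow(head(mtcars))

# The n argument changes how many rows are returned.
head(mtcars, n = 3)

# tail() is the counterpart that returns the last rows.
tail(mtcars, n = 3)
```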
In addition to head(), there are a number of other useful functions to
summarize or preview your data. For example, the str() and glimpse()
functions will both provide summaries of each column in your data
arranged horizontally. You can check out these two functions in action
by running the code chunks below:
str(people)
## 'data.frame': 4 obs. of 2 variables:
## $ names: chr "Gerardo" "Gerardo Jr" "Gerardo Alexandre" "Rosita"
## $ age : num 76 53 12 73
glimpse(people)
## Rows: 4
## Columns: 2
## $ names <chr> "Gerardo", "Gerardo Jr", "Gerardo Alexandre", "Rosita"
## $ age <dbl> 76, 53, 12, 73
You can also use colnames() to get a list of the column names in your
data set. Run the code chunk below to check out this function:
colnames(people)
## [1] "names" "age"
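The value returned by colnames() is a character vector, so it can be stored and reused. As a small sketch, the code below checks whether a given column exists; the variable cols and the column name height are hypothetical and used only for illustration.

```r
people <- data.frame(
  names = c("Gerardo", "Gerardo Jr", "Gerardo Alexandre", "Rosita"),
  age = c(76, 53, 12, 73)
)

cols <- colnames(people)
class(cols)          # a character vector, not a data frame
length(cols)         # one entry per column

"age" %in% cols      # is there a column called age?
"height" %in% cols   # hypothetical column that is not in the data
```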
Now that you have a data frame, you can work with it using all of the
tools in R. For example, you could use mutate() if you wanted to create
a new variable that would capture each person’s age in twenty years. The
code chunk below creates that new variable:
mutate(people, age_in_20 = age + 20)
## names age age_in_20
## 1 Gerardo 76 96
## 2 Gerardo Jr 53 73
## 3 Gerardo Alexandre 12 32
## 4 Rosita 73 93
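One detail that is easy to miss: the mutate() call above prints a new data frame but does not change people unless the result is assigned to a name. The sketch below shows the same behavior with base R's transform(), swapped in here so the example runs without loading any package; the name people_future is hypothetical.

```r
people <- data.frame(
  names = c("Gerardo", "Gerardo Jr", "Gerardo Alexandre", "Rosita"),
  age = c(76, 53, 12, 73)
)

# transform() returns a new data frame with the extra column...
transform(people, age_in_20 = age + 20)

# ...but people itself still has only its original two columns.
colnames(people)

# Assign the result to a name to keep the new column.
people_future <- transform(people, age_in_20 = age + 20)
people_future$age_in_20
```

With tidyverse loaded, the same pattern would be to assign the result of mutate() to a name.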
To get more familiar with creating and using data frames, use the code chunks below to create your own custom data frame.
First, create a vector of any five different fruits. You can type directly into the code chunk below; just place your cursor in the box and click to type. Once you have input the fruits you want in your data frame, run the code chunk.
fruits <- c("Banana", "Manga", "Morango", "Pêra", "Maça")
Now, create a new vector with a number representing your own personal
rank for each fruit. Give a 1 to the fruit you like the most, and a 5 to
the fruit you like the least. Remember, the scores need to be in the
same order as the fruit above. So if your favorite fruit is last in the
list above, the score 1 needs to be in the last position in the list
below. Once you have input your rankings, run the code chunk.
range_fruits <- c(2, 1, 3, 5, 4)
Finally, combine the two vectors into a data frame. You can call it
fruit_ranks. Edit the code chunk below and run it to create your data
frame.
fruit_ranks <- data.frame(fruits, range_fruits)
After you run this code chunk, it will create a data frame with your fruits and rankings.
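Because each rank sits in the same position as its fruit, the finished data frame can be used to look up or sort fruits by rank. The following is a minimal sketch in base R; the English fruit names, the rankings, and the names favorite and sorted_ranks are example values chosen for illustration.

```r
fruits <- c("Banana", "Mango", "Strawberry", "Pear", "Apple")
range_fruits <- c(2, 1, 3, 5, 4)
fruit_ranks <- data.frame(fruits, range_fruits)

# The favorite fruit is the one whose rank is 1.
favorite <- fruit_ranks$fruits[fruit_ranks$range_fruits == 1]
favorite

# order() gives the row positions that sort the ranks from 1 to 5,
# which reorders the whole data frame from most to least liked.
sorted_ranks <- fruit_ranks[order(fruit_ranks$range_fruits), ]
sorted_ranks$fruits
```

If the two vectors were entered in different orders, these lookups would pair the wrong rank with each fruit, which is why the positions need to match.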
In this activity, you learned how to create data frames, view them with
summary functions like head() and glimpse(), and make changes with the
mutate() function. You can continue practicing these skills by modifying
the code chunks in the .rmd file, or use this code as a starting point
in your own project console. As you explore data frames, consider how
they are similar to and different from the tables you have worked with
in other data analysis tools like spreadsheets and SQL. Data frames are
one of the most basic building blocks you will need to work with data in
R, so understanding how to create and work with data frames is an
important first step to analyzing data.
Make sure to mark this activity as complete in Coursera.