First, let’s learn how to bring our data into R. Of course, this assumes we have data! If you don’t have data to use, you can always download free sample data from Kaggle or similar websites. Today, we will use the palmerpenguins dataset.
Next, we will install and load any necessary packages. If you need a refresher on how to do this, please visit the section on Installing and Loading Packages in R.
You’ll see that I always load tidyverse, here, and janitor. These are my favorite packages, and I load them in every script. See more about each of these packages in the Package Highlights section.
Load libraries:
library(tidyverse)
library(here)
library(janitor)
library(readxl)
library(writexl)
library(palmerpenguins)
The penguins dataset is available as soon as we load the palmerpenguins package. However, I want to show you all how to import data! To do that, I’ll first write the data to my local files and then import it back here.
First, let’s take a look at the data. We can call “penguins” to reference the data.
print(penguins)
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_…¹ body_…² sex year
## <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
## 4 Adelie Torgersen NA NA NA NA <NA> 2007
## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
## 7 Adelie Torgersen 38.9 17.8 181 3625 fema… 2007
## 8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
## 9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
## 10 Adelie Torgersen 42 20.2 190 4250 <NA> 2007
## # … with 334 more rows, and abbreviated variable names ¹flipper_length_mm,
## # ²body_mass_g
The printout shows that there are 344 rows and 8 columns. Let’s save the dataset as an object in our environment so we can work with it.
data <- penguins
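If you would rather check the size of the data with code than read it off the printout, a minimal sketch using base R functions looks like this. It assumes only that the palmerpenguins package is installed.

```r
library(palmerpenguins)

# Save the dataset as an object, as in the tutorial
data <- penguins

dim(data)    # number of rows, then number of columns
nrow(data)   # rows only
ncol(data)   # columns only
names(data)  # the column names
```

For the penguins data, dim() should report the same 344 rows and 8 columns shown in the tibble header above.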
Let’s export the data first as a CSV and then as an Excel file so we can see how to export and import data.
First, we write a CSV using the write_csv() function and the here package. Here, I am telling the computer to take the object “data” and then (denoted by the pipe, %>%) write it as a CSV to a specific file path. The here package lets us reference paths relative to the project folder instead of setting the working directory or typing out the full path each time we restart R. This is especially helpful when we share code through GitHub repositories. Please check the here package information in the Package Highlights section (coming soon!).
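Before exporting, it can help to see what here() actually returns and to make sure the target folder exists, since write_csv() writes a file but does not create missing folders for you. This is a small sketch, not part of the original workflow: the object name data_dir is my own, and the folder names are simply the ones used in this tutorial.

```r
library(here)

# Build the folder path, starting from the project root
data_dir <- here("Learning_R",
                 "2_Data_Cleaning",
                 "2A_Importing_Data",
                 "Data")

# Create the folder (and any missing parent folders) if it is not there yet
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Show the full path that here() resolved on this computer
print(data_dir)
```

With the folder in place, the export itself looks like this: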
data %>%
  write_csv(here("Learning_R",
                 "2_Data_Cleaning",
                 "2A_Importing_Data",
                 "Data",
                 "penguins.csv")) # everything up to "Data" is the folder path; "penguins.csv" is the file name I am saving the data under
We can also write it as an Excel file using the write_xlsx() function from the writexl package.
data %>%
  write_xlsx(here("Learning_R",
                  "2_Data_Cleaning",
                  "2A_Importing_Data",
                  "Data",
                  "penguins.xlsx")) # make sure to change the file type in the file name!
We can read the data into R by using the opposite of “write”… “read”!
First, we can bring in the CSV file using read_csv(). We want to save it as an object with a unique name so that it is in our environment.
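One thing worth knowing before importing: a CSV file stores plain text, so some column types (such as factors) do not survive the trip out of R and back in. The sketch below shows this with a temporary file. The tempfile() location and the name check_csv are my own stand-ins so the example runs anywhere; they are not part of the project folder used in this tutorial.

```r
library(readr)
library(palmerpenguins)

# A temporary file stands in for the project path
tmp_csv <- tempfile(fileext = ".csv")

# Write the data out, then read it straight back in
write_csv(penguins, tmp_csv)
check_csv <- read_csv(tmp_csv, show_col_types = FALSE)

# Did we get the same number of rows and columns back?
identical(dim(check_csv), dim(penguins))

# Compare a column type before and after the round trip
class(penguins$species)   # a factor in the package data
class(check_csv$species)  # a character column after read_csv()
```

If you need factors after importing, you can convert those columns back, for example with as.factor(). The import for our project file works the same way: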
# the arrow "<-" assigns this to a new object named "data_csv"
data_csv <- read_csv(here("Learning_R",
                          "2_Data_Cleaning",
                          "2A_Importing_Data",
                          "Data",
                          "penguins.csv"))
We can bring in the Excel file the same way using read_xlsx().
data_xlsx <- read_xlsx(here("Learning_R",
                            "2_Data_Cleaning",
                            "2A_Importing_Data",
                            "Data",
                            "penguins.xlsx"))
If the Excel file you are importing has more than one sheet, you can specify the sheet you want to read in using the argument “sheet”.
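If you are not sure what the sheets in a file are called, the excel_sheets() function from readxl lists them. Here is a minimal sketch: it builds a small two-sheet workbook in a temporary file so it can run on its own. The sheet names and the built-in mtcars and iris datasets are stand-ins I chose for illustration, not part of the penguins workflow.

```r
library(readxl)
library(writexl)

# A temporary file stands in for a real multi-sheet workbook
tmp_xlsx <- tempfile(fileext = ".xlsx")

# A named list of data frames is written as one sheet per list element
write_xlsx(list(first_sheet  = head(mtcars),
                second_sheet = head(iris)),
           tmp_xlsx)

# List the sheet names in the workbook
excel_sheets(tmp_xlsx)

# Read a sheet by name...
second_by_name <- read_xlsx(tmp_xlsx, sheet = "second_sheet")

# ...or by its position in the workbook
second_by_position <- read_xlsx(tmp_xlsx, sheet = 2)
```

Reading by name is usually the safer choice, because it keeps working if someone reorders the sheets. For our penguins file, reading a sheet by name looks like this: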
data_xlsx_sheet <- read_xlsx(here("Learning_R",
                                  "2_Data_Cleaning",
                                  "2A_Importing_Data",
                                  "Data",
                                  "penguins.xlsx"),
                             sheet = "Sheet1")
Thank you and please continue on to the next section to learn more :)
Any questions? Email us at justaladycoder@gmail.com