R - Loading data

Mark Bounthavong

15 September 2025; Updated: 29 September 2025

Introduction

When using R for the first time, I had a hard time figuring out how to load my data for analysis. In my search, I found a couple of ways to load data into the R environment.

For this exercise, I used RStudio, which is an integrated development environment (IDE). In other words, it’s a nice user interface that allows you to use R in a clean and convenient environment.

Motivating example

Let’s download some data for this exercise.

We will use the Printed Circuit Board Processed Image data from the University of California, Irvine (UCI) Machine Learning Repository, which can be downloaded from the repository's website.

The file is named TestPad_PCB_XYRGB_V2.csv and is in the comma-separated values (*.csv) file format.

Download it to a folder you can easily find. For convenience, this exercise uses the Downloads folder. After the download finishes, unzip it in the Downloads folder.

I’ll walk you through two methods to load the data into R.

The first method will set the working directory and load a *.csv file format.

The second method will load a *.xlsx Excel file format.

Note:

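Although I’m using macOS for this exercise, the same steps work on a Windows-based system. The folder paths will look different (for example, R on Windows may not expand ~/Downloads to your Downloads folder), so use the path that RStudio generates for your computer.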

Method 1 - Setting the working directory and loading a *.csv file format

The first method to load data in R involves setting the working directory. The working directory is the folder R looks in by default when you read or save files during the current session. Once it is set, you can refer to files in that folder by their file name alone. (Note: When you start R, you are working in a session.)

Click on Session -> Set Working Directory -> Choose Directory, then navigate to the Downloads folder. Select “Open” to open the Downloads folder. (Note: For this example, the data was saved in the Downloads folder, but you may have saved the data in a different location. Navigate to that location using the Choose Directory feature.)

RStudio will automatically run the code in the Console (a setwd() command) that sets the working directory to the Downloads folder.

You can copy and paste this R code into your script file so that you can run this again without having to click through the tabs to locate and set the working directory.
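If you are ever unsure which folder R is currently using, getwd() will tell you. The minimal sketch below checks the current working directory, switches to a different folder, and then switches back; setwd() invisibly returns the folder you were in before the change. It uses tempdir() (a temporary folder R creates for each session) as a stand-in for the Downloads folder so that it runs on any machine; in this exercise you would pass "~/Downloads" to setwd() instead.

```r
## Check which folder R is currently using
original_wd <- getwd()
print(original_wd)

## setwd() changes the working directory and invisibly returns the previous one.
## tempdir() is only a stand-in here; in the exercise you would use "~/Downloads".
previous_wd <- setwd(tempdir())
print(getwd())

## Switch back to the folder you started in
setwd(previous_wd)
print(getwd())
```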

Figure: Loading data by choosing the working directory

Once the working directory is set, you can load your data using the read.csv() function.

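Pass the file name to read.csv() along with the argument header = TRUE, which indicates that the first row contains the column labels. (header = TRUE is already the default for read.csv(), but writing it out makes your intent clear.)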

## Set the working directory
setwd("~/Downloads")

## We will call this dataframe object `df1`
df1 <- read.csv("TestPad_PCB_XYRGB_V2.csv", header = TRUE)

## After the data has been loaded, you can view the first 6 rows:
head(df1)
##     X Y         R         G         B Grey
## 1 105 0 0.9098039 0.9764706 0.9372549    0
## 2 106 0 0.7921569 0.9019608 0.8431373    0
## 3 107 0 0.6313725 0.7882353 0.6941176    0
## 4 108 0 0.4745098 0.6705882 0.5568627    0
## 5 109 0 0.3411765 0.5843137 0.4392157    0
## 6 110 0 0.2745098 0.5411765 0.3882353    0

The data has been loaded and assigned to a dataframe object called df1. You can use any name you like. I personally like to keep the name of the dataframe short so that it is easier to type.
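To see what the header argument actually changes, here is a small sketch that does not need the downloaded file. It writes a temporary *.csv file with a few made-up rows (illustrative values only, not the real PCB data) and reads it back with header = TRUE and header = FALSE.

```r
## Illustrative values only (not the real PCB data)
toy <- data.frame(X = c(105, 106), Y = c(0, 0), Grey = c(0, 0))

## Write the toy data to a temporary *.csv file; the first row holds the column labels
tmp_csv <- tempfile(fileext = ".csv")
write.csv(toy, tmp_csv, row.names = FALSE)

## header = TRUE: the first row becomes the column names
with_header <- read.csv(tmp_csv, header = TRUE)
names(with_header)
nrow(with_header)

## header = FALSE: R names the columns V1, V2, ... and the label row becomes data
without_header <- read.csv(tmp_csv, header = FALSE)
names(without_header)
nrow(without_header)
```

With header = FALSE, the label row is treated as an extra row of data, which is usually not what you want for files like TestPad_PCB_XYRGB_V2.csv.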

Method 2 - Loading a *.xlsx Excel file format

In the above example, we loaded the TestPad_PCB_XYRGB_V2.csv data into R. Fortunately, this data was in the *.csv format, which allowed us to use the read.csv() function to load the data into the R environment.

What if the data were in an Excel format (e.g., *.xlsx)? How would we load the data?

You can open the data in Excel and then save it as a *.csv file. This is the easiest workaround.

Or, you can use the openxlsx package. You can read about the openxlsx package on its GitHub site.

You will need to install the openxlsx package before you can use it in R.

Once you install the package, you can load it using the library() function.

After the openxlsx package has been loaded, you can load the data into the R environment using the read.xlsx() function. The argument colNames = TRUE tells R that the first row contains the column labels. This is already the default for read.xlsx(), but including it makes your intent explicit.

(Note: The download only contains the *.csv file. For this exercise, I created TestPad_PCB_XYRGB_V2.xlsx myself as an *.xlsx version of the same data.)

## Install the `openxlsx` package (You only need to do this once)
## install.packages("openxlsx")

## Load the `openxlsx` package (You need to do this each time you start `R`)
library("openxlsx")

## Set the working directory
setwd("~/Downloads")

## Load the data as a dataframe `df2`
df2 <- read.xlsx("TestPad_PCB_XYRGB_V2.xlsx", colNames = TRUE)

## After the data has been loaded, you can view the first 6 rows:
head(df2)
##     X Y         R         G         B Grey
## 1 105 0 0.9098039 0.9764706 0.9372549    0
## 2 106 0 0.7921569 0.9019608 0.8431373    0
## 3 107 0 0.6313725 0.7882353 0.6941176    0
## 4 108 0 0.4745098 0.6705882 0.5568627    0
## 5 109 0 0.3411765 0.5843137 0.4392157    0
## 6 110 0 0.2745098 0.5411765 0.3882353    0

The data is now loaded in the R environment as a dataframe object with the name df2. You can now use this dataframe for your analyses.
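Excel workbooks can contain more than one worksheet, and read.xlsx() reads the first one unless you tell it otherwise. The sketch below, built on made-up values rather than the real PCB data, writes a small workbook to a temporary file with openxlsx's write.xlsx(), lists its worksheets with getSheetNames(), and then reads a specific worksheet through the sheet argument. It assumes the openxlsx package is already installed.

```r
## Assumes the openxlsx package is installed
library("openxlsx")

## Illustrative values only (not the real PCB data)
toy <- data.frame(X = c(105, 106), Y = c(0, 0), Grey = c(0, 0))

## Write a small workbook to a temporary file so the sketch runs anywhere
tmp_xlsx <- tempfile(fileext = ".xlsx")
write.xlsx(toy, tmp_xlsx)

## List the worksheets in the workbook
getSheetNames(tmp_xlsx)

## Read the first worksheet by position; a sheet name also works
df_xlsx <- read.xlsx(tmp_xlsx, sheet = 1, colNames = TRUE)
head(df_xlsx)
```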

Final thoughts

I reviewed how to load data saved in the *.csv and *.xlsx file formats after setting the working directory. However, R can load many other data formats, and chances are there is a package that will load yours into R.

If your data are in a format other than *.csv or *.xlsx, don’t worry. Try searching online for an R package that can load that format into R. I’m confident that you will figure things out after going through this exercise.
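Base R already handles a couple of other common formats without any extra package. As a minimal sketch using made-up values, the example below writes a tab-separated text file and an R data file (*.rds) to temporary locations, then loads them with read.delim() and readRDS().

```r
## Illustrative values only (not the real PCB data)
toy <- data.frame(X = c(105, 106), Grey = c(0, 0))

## Tab-separated text: read.delim() works like read.csv() but expects tabs between values
tsv_path <- tempfile(fileext = ".tsv")
write.table(toy, tsv_path, sep = "\t", row.names = FALSE)
df_tsv <- read.delim(tsv_path, header = TRUE)
head(df_tsv)

## R's own binary format: saveRDS() stores a single object and readRDS() restores it
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)
df_rds <- readRDS(rds_path)
head(df_rds)
```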

Acknowledgements

I got the data used in this exercise from the University of California Irvine Machine Learning Repository. There are a ton of data in their repository, and I highly encourage you to explore its contents.

To learn more about the openxlsx package, you can review the source files on its GitHub site.

Disclaimer and Disclosures

This is a work in progress; thus, I may update this in the future.

Any errors or mistakes are my fault, and I’ll try my best to fix them.

Lastly, this is for educational purposes only.