Background
Recently, my colleague contacted me to assist another colleague who was having trouble loading data stored on a Google drive account into R. I have never thought about using Google drive as a place to store data and then load it into the R environment. Normally, I store and load data from GitHub, but there are some limitations, particularly when the dataset is very large. Google drive might be an easy workaround to this limitation, so I decided to figure out how to make this work. (Note: There are ways to store large datasets on GitHub, and I’ll explore these methods in a future tutorial.)
There are a lot of security issues that have to be addressed before loading data from a Google drive account. You should make sure to have two-factor authentication turned on and generate a strong password for your Google account. Once you’ve done these steps, it’s time to begin loading data from Google drive into the R environment.
Note: I created a new Google account for this tutorial given that I may want to use this for future R projects.
Note: I used RStudio version 2022.07.2 and R version 4.2.2
(Innocent and Trusting
) for this tutorial. If you run into
errors, I recommend updating all packages and downloading the newest
version of R and RStudio.
Install and load libraries.
There are several libraries that you will need to install and load:
googledrive
, googlesheets4
, and
tidyverse
. I learned a neat code chunk that will check to
see if you have installed the package, and if not, R will install it for
you. It is a set of if
and !require
statements, which I learned from FelixAnalytix.
I highly recommend you visit their website.
### Install and load R packages
if (!require(googlesheets4)) install.packages("googlesheets4")
if (!require(googledrive)) install.packages("googledrive")
if (!require(tidyverse)) install.packages("tidyverse")
library("googlesheets4")
library("googledrive")
library("tidyverse")
Authenticate Google drive
Once the packages have been installed and the libraries have been loaded, the first thing we’ll do is to authenticate with our Google drive account.
### Authenticate Google drive
googledrive::drive_auth()
You may be asked to cache OAuth access credentials in a folder. The purpose of this prompt is to allow the cache to contain a token that will be used to access the Google drive account. This is necessary to ensure that you are securing the gateway between R and the Google drive account. The token is only valid for the current R session.
Figure 1. OAuth access message.
Select Yes
; this will open the browser from which you
will need to select the Google account.
Figure 2. Select the Google account.
After selecting the Google account, you will be asked to sign into the Tidyverse API packages.
Figure 3. Sign into the Tidyverse API packages.
Click on Continue
.
You’ll be asked to authorize the Tidyverse API package access on the
Google drive account. This includes the ability to make edits to the
file in the Google drive account. Make sure check the box to allow the
Tidyverse API package to
See, edit, create, and delete all of your Google Drive files.
Figure 4. Allow the Tidyverse API package to access the Google drive account.
Click on Continue
.
You should receive a message on your browser that the OAuth is complete.
Figure 5. OAuth complete.
The webpage can be closed at this time.
You may get a warning email on the Google account indicating that the Tidyverse API package has been granted access to the Google drive. This is normal and can be safely deleted if you want.
Figure 6. Warning email.
Now, go into the Google drive account and locate the file where the data is stored. In this example, I will use the Social Deprivation Index (SDI) for United States (US) counties. You can download this data from the Robert Graham Center’s website, which is publicly available.
For the purpose of this exercise, I will use the URL where this data
is stored on my Google drive account. But you will need to make sure
that you are using the correct URL where you will store your data on
your Google drive account. Note: The data that I am using in this
tutorial is a *.CSV
file. I recommend using this file
format to make the data loading process easier.
Figure 7. Copy the URL path where the data is tored.
Once you have the URL copied, you can use the code chuck to load the data into a dataframe.
### Load data from Google drive
df <- read_sheet("https://docs.google.com/spreadsheets/d/1QgqlFnBVngBBpfsgGncAJnC_J0lX-NYGPV1zZ3jOC-4/edit#gid=851423220")
head(df)
## # A tibble: 6 × 18
## COUNTY_FIPS County_population SDI_score PovertyLT100_FPL_score
## <dbl> <dbl> <dbl> <dbl>
## 1 1001 55380 43 69
## 2 1003 212830 16 28
## 3 1005 25361 99 100
## 4 1007 22493 64 86
## 5 1009 57681 43 51
## 6 1011 10248 84 100
## # ℹ 14 more variables: Single_Parent_Fam_score <dbl>,
## # Education_LT12years_score <dbl>, HHNo_Vehicle_score <dbl>,
## # HHRenter_Occupied_score <dbl>, HHCrowding_score <dbl>,
## # Nonemployed_score <dbl>, sdi <dbl>, pct_Poverty_LT100 <dbl>,
## # pct_Single_Parent_Fam <dbl>, pct_Education_LT12years <dbl>,
## # pct_NonEmployed <dbl>, pctHH_No_Vehicle <dbl>, pctHH_Renter_Occupied <dbl>,
## # pctHH_Crowding <dbl>
Once you have the data loaded into the R environment as a dataframe, you can begin to manipulate, wrangle, or analyze the data.
Conclusions
This was a fun exercise to figure out. As I mentioned early on, I have never considered using Google drive to store / share data for R projects. Now that I’ve learned how to make this process work, I may consider using Google drive to store large datasets for future R projects.
Acknowledgements
When researching how to load data from Google drive into the R
environment, I had to go to a couple of websites. I’m not familiar with
Google drive and its nuances, nor am I familiar with the
googledrive
and googlesheets4
packages.
Thankfully, there are some great resources on the internet by some
amazing people.
One of these is FelixAnalytix.
The other is by Appsilon Team. I wasn’t able to figure out who the individual author was, but I wanted to acknowledge that this article was incredibly helpful.
If you run into issues with the OAuth process, you may want to read
the gargle
package vignette to troubleshoot the problem (link).
I also want to acknowledge my colleagues who brought this challenge to my attention. I would never have considered this possibility without their inquiry.
Disclaimer
This is a work in progress and may be updated in future iterations.
Additionally, this is for educational purposes only.