The goal of this exercise is to familiarize you with downloading data from Google Sheets and to briefly review some key R functions and coding concepts.
We’ll do the following things
## Google sheets download package
# comment this out when you are done
# install.packages("googlesheets4")
library(googlesheets4)
# comp bio packages
library(seqinr)
library(rentrez)
library(compbio4all)
library(Biostrings)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 4.1.2
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: XVector
## Loading required package: GenomeInfoDb
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:seqinr':
##
## translate
## The following object is masked from 'package:base':
##
## strsplit
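Rather than commenting the install line in and out by hand, the install can be made conditional. The following is a minimal sketch, not part of the original exercise: `install_if_missing()` is a hypothetical helper name, and it assumes the package comes from CRAN (as googlesheets4 does), so it would not cover Bioconductor packages such as Biostrings.

```r
# Hypothetical helper: install only the packages that are not yet available.
# `installer` defaults to install.packages(), which assumes a CRAN package.
install_if_missing <- function(pkg, installer = utils::install.packages) {
  # requireNamespace() returns FALSE (without an error) if a package is absent
  is_present <- vapply(pkg, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkg[!is_present]
  if (length(missing_pkgs) > 0) {
    installer(missing_pkgs)
  }
  # return the names of the packages that had to be installed
  invisible(missing_pkgs)
}
```

Calling `install_if_missing("googlesheets4")` before `library(googlesheets4)` would then trigger an install only when the package is not already on the machine.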
First, we need a web address (URL) for the spreadsheet with the data.
spreadsheet_sp <- "https://docs.google.com/spreadsheets/d/1spC_ZA3_cVuvU3e_Jfcj2nEIfzp-vaP7SA5f-qwQ1pg/edit?usp=sharing"
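The long string between `/d/` and `/edit` in the URL is the spreadsheet's ID. As a small illustration (an addition, not a step in the exercise), it can be pulled out with base R's `sub()`; the variable name `sheet_id` is made up here.

```r
spreadsheet_sp <- "https://docs.google.com/spreadsheets/d/1spC_ZA3_cVuvU3e_Jfcj2nEIfzp-vaP7SA5f-qwQ1pg/edit?usp=sharing"

# capture everything between "/spreadsheets/d/" and the next "/"
sheet_id <- sub("^.*/spreadsheets/d/([^/]+).*$", "\\1", spreadsheet_sp)
sheet_id
# [1] "1spC_ZA3_cVuvU3e_Jfcj2nEIfzp-vaP7SA5f-qwQ1pg"
```

googlesheets4 accepts the full URL directly, so this step is not needed for the download itself.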
Second, we need to tell the package that we aren’t interested in checking user access credentials / authorization.
# be sure to run this!
googlesheets4::gs4_deauth() # <====== MUST RUN THIS
Third, we download our data. The download sometimes fails with an error like this:
“Error in curl::curl_fetch_memory(url, handle = handle) : Error in the HTTP2 framing layer”
If that happens, just re-run the code.
# I include this again in case you missed it the first time : )
googlesheets4::gs4_deauth()
# download
## NOTE: if you get an error, just run the code again
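refseq_column <- read_sheet(ss = spreadsheet_sp,   # the URL
                            sheet = "selenoprot",  # the name of the worksheet (tab)
                            range = "H1:H364",     # the cells to read: column H, header in row 1
                            col_names = TRUE,      # use the first row as the column name
                            na = "",               # treat empty cells "" as NA
                            trim_ws = TRUE)        # strip leading/trailing whitespace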
## ✓ Reading from "human_gene_table".
## ✓ Range ''selenoprot'!H1:H364'.
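Re-running the download by hand works, but the retry can also be automated. The sketch below is one possible approach rather than anything the exercise requires: `with_retry()` is a hypothetical helper built on base R's `tryCatch()`, and the defaults for `max_tries` and `wait_seconds` are arbitrary choices.

```r
# Hypothetical helper: call f() until it succeeds or max_tries is reached.
with_retry <- function(f, max_tries = 3, wait_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    # on error, return the condition object instead of stopping
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)  # brief pause before trying again
    }
  }
  # every attempt failed: re-signal the last error
  stop(result)
}
```

The download could then be wrapped as `with_retry(function() read_sheet(ss = spreadsheet_sp, range = "selenoprot!H1:H364"))`, assuming `gs4_deauth()` has already been run.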
# read_sheet() returns a data frame (tibble); pull out the RefSeq_prot
# column so we have a plain character vector to work with
protein_refseq <- refseq_column$RefSeq_prot
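The `$` operator above pulls a single column out of a data frame and returns it as an ordinary vector. A small stand-alone illustration, using a made-up three-row data frame (`refseq_demo` and `refseq_vec` are hypothetical names, and a base `data.frame` stands in for the object that `read_sheet()` returns):

```r
# toy data frame with one column, mimicking the shape of the downloaded data
refseq_demo <- data.frame(RefSeq_prot = c("NP_000783.2", "NP_998758.1", NA),
                          stringsAsFactors = FALSE)
is.data.frame(refseq_demo)   # TRUE: a table with one column

# extract the column by name
refseq_vec <- refseq_demo$RefSeq_prot
is.data.frame(refseq_vec)    # FALSE: no longer a table
is.character(refseq_vec)     # TRUE: a plain character vector
```

Working with a vector means elements can be indexed directly with square brackets, e.g. `refseq_vec[1:2]`.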
Here’s a snapshot of the first 10 entries of protein_refseq:
protein_refseq[1:10]
## [1] "NP_000783.2" "NP_998758.1" "NP_001034804.1" "NP_001034805.1"
## [5] "NP_001311245.1" NA NA "NP_054644.1"
## [9] "NP_001353425.1" "NP_000784.3"
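The `NA` entries in the snapshot correspond to empty cells in the spreadsheet (because of `na = ""`). As a hedged sketch of how they could be counted and dropped, the code below re-types the ten values shown above into a stand-in vector; `refseq_snapshot` and `refseq_clean` are names invented for this example, and whether the NAs should actually be removed depends on what the data are used for next.

```r
# the ten values from the snapshot, typed in by hand
refseq_snapshot <- c("NP_000783.2", "NP_998758.1", "NP_001034804.1",
                     "NP_001034805.1", "NP_001311245.1", NA, NA,
                     "NP_054644.1", "NP_001353425.1", "NP_000784.3")

# is.na() gives TRUE/FALSE per element; sum() counts the TRUEs
sum(is.na(refseq_snapshot))
# [1] 2

# keep only the entries that are not NA
refseq_clean <- refseq_snapshot[!is.na(refseq_snapshot)]
length(refseq_clean)
# [1] 8
```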