The Tidyverse provides packages that simplify repeatable data science tasks. The goal is to facilitate the conversation between a human and a computer about data. All Tidyverse packages share a high-level philosophy, a low-level grammar, and common data structures, so learning one package makes it easier to learn the next.
For this vignette we will focus on the googledrive package.
install.packages("tidyverse")
install.packages("googledrive")
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.4     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(googledrive)
library(curl)
#> Using libcurl 7.64.1 with LibreSSL/2.8.3
#>
#> Attaching package: 'curl'
#> The following object is masked from 'package:readr':
#>
#> parse_date
library(RCurl)
#>
#> Attaching package: 'RCurl'
#> The following object is masked from 'package:tidyr':
#>
#>     complete
The first step is to authorize the googledrive package to access your Google Drive.
drive_auth(email = "david.simbandumwe19@gmail.com")
The data comes from the New York Times GitHub repository (CSV of US counties), which describes it as follows: "The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak."
# load data from github
covid_df <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
#> Rows: 1845855 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): county, state, fips
#> dbl (2): cases, deaths
#> date (1): date
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
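The message above suggests declaring the column types explicitly. A minimal sketch of that, assuming the six columns and types reported in the column specification: the three sample rows are copied from the NYT file and supplied as literal text so the example runs without a download, and `sample_csv` and `sample_df` are hypothetical names used only for illustration.

```r
library(readr)

# Three sample rows supplied as literal text via I(), so no download is needed.
sample_csv <- I("date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
2020-01-24,Cook,Illinois,17031,1,0
2020-01-25,Orange,California,06059,1,0
")

# Declare each column type up front instead of letting readr guess.
sample_df <- read_csv(
  sample_csv,
  col_types = cols(
    date   = col_date(format = "%Y-%m-%d"),
    county = col_character(),
    state  = col_character(),
    fips   = col_character(),  # text, so "06059" keeps its leading zero
    cases  = col_double(),
    deaths = col_double()
  )
)
```

The same `col_types` argument could be passed to the `read_csv()` call on the full URL; with an explicit specification readr no longer needs to print the guessed one.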
head(covid_df)
#> # A tibble: 6 × 6
#>   date       county    state      fips  cases deaths
#>   <date>     <chr>     <chr>      <chr> <dbl>  <dbl>
#> 1 2020-01-21 Snohomish Washington 53061     1      0
#> 2 2020-01-22 Snohomish Washington 53061     1      0
#> 3 2020-01-23 Snohomish Washington 53061     1      0
#> 4 2020-01-24 Cook      Illinois   17031     1      0
#> 5 2020-01-24 Snohomish Washington 53061     1      0
#> 6 2020-01-25 Orange    California 06059     1      0
# write data to local file system
write.csv(covid_df,"/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv")
The goal of googledrive is to make Drive access feel similar to Unix file system utilities, so it offers a full set of functions for working with the files on your Google Drive.
Search your Google Drive by file name or type
drive_find(type = "folder")
#> # A dribble: 3 × 3
#>   name               id                                drive_resource
#>   <chr>              <drv_id>                          <list>
#> 1 DATA607 - Project3 1H6Y94MNuqosx-2MnWGT4iFkl4qChq9Qp <named list [34]>
#> 2 DS Survey          1BRXkunxriE1a5XsU4nezIFriNgMMhaX- <named list [34]>
#> 3 DS Jobs In India   1NKBzMoPEzmTaTqf4tu1msW0QwvlsBgI0 <named list [34]>
Create a directory remotely
drive_mkdir(name = "tmp_dir")
#> Created Drive file:
#> • 'tmp_dir' <id: 12jkegN8p2uIEDDdNPwbBKdlrfkShh00I>
#> With MIME type:
#> • 'application/vnd.google-apps.folder'
Upload a local file to your Google Drive
drive_upload("/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv", path="tmp_dir/covid.csv")
#> Local file:
#> • '/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv'
#> Uploaded into Drive file:
#> • 'covid.csv' <id: 1CNdPG-OSaFREOFeFueX7v18CnLThBlxO>
#> With MIME type:
#> • 'text/csv'
View files in a specific folder
drive_ls(path = "tmp_dir")
#> # A dribble: 1 × 3
#>   name      id                                drive_resource
#>   <chr>     <drv_id>                          <list>
#> 1 covid.csv 1CNdPG-OSaFREOFeFueX7v18CnLThBlxO <named list [39]>
Remove a directory
drive_rm("tmp_dir")
#> File deleted:
#> • 'tmp_dir' <id: 12jkegN8p2uIEDDdNPwbBKdlrfkShh00I>
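The steps above can be chained into a single round trip. The following is a minimal sketch, not part of the original workflow: it assumes a session already authorized with `drive_auth()`, writes a small made-up table to a temporary file rather than a hard-coded path, and adds a `drive_download()` step to read the uploaded file back. The helper `drive_round_trip()` and the variable `local_csv` are hypothetical names introduced for illustration.

```r
library(googledrive)

# Write a tiny illustrative table to a temporary file (not the real NYT data).
local_csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(county = c("Snohomish", "Cook"), cases = c(1, 1)),
  local_csv,
  row.names = FALSE
)

# Hypothetical helper: create a folder, upload, list, download, then clean up.
drive_round_trip <- function(local_path, folder_name = "tmp_dir") {
  folder   <- drive_mkdir(name = folder_name)             # remote folder
  uploaded <- drive_upload(local_path, path = folder, name = "covid.csv")
  print(drive_ls(path = folder))                          # list folder contents
  download_path <- tempfile(fileext = ".csv")
  drive_download(uploaded, path = download_path, overwrite = TRUE)
  drive_rm(folder)                                        # remove the folder
  download_path
}

# Only touch Drive when a token is already available in this session.
if (drive_has_token()) {
  copy_path <- drive_round_trip(local_csv)
  print(read.csv(copy_path))
}
```

Passing the dribble returned by `drive_mkdir()` as `path`, instead of the string "tmp_dir", avoids ambiguity if more than one folder with that name exists on the Drive.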