DATA607_w9 - Tidyverse Vignette

David Simbandumwe

2021-10-24

Overview

The Tidyverse provides packages that simplify repeatable data science tasks. The goal is to facilitate the conversation between a human and a computer about data. The Tidyverse packages share the same high-level philosophy, low-level grammar, and data structures, so learning one package makes it easier to learn the next.

This vignette focuses on the googledrive package.

Installation

install.packages("tidyverse")
install.packages("googledrive")


library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.4     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
library(googledrive)
library(curl)
#> Using libcurl 7.64.1 with LibreSSL/2.8.3
#> 
#> Attaching package: 'curl'
#> The following object is masked from 'package:readr':
#> 
#>     parse_date
library(RCurl)
#> 
#> Attaching package: 'RCurl'
#> The following object is masked from 'package:tidyr':
#> 
#>     complete

Getting Started

The first step is to authorize the googledrive package to access your Google Drive.

drive_auth(email = "david.simbandumwe19@gmail.com")
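Hard-coding an email address is fine for a personal script, but a shared vignette may be easier to reuse if the account comes from the environment. The following is a minimal sketch, assuming a hypothetical environment variable named `GDRIVE_EMAIL` (the name is an illustration, not part of googledrive). It only starts the browser flow in an interactive session and then reports which account is active.

```r
library(googledrive)

# Hypothetical environment variable; returns "" when it is not set.
drive_email <- Sys.getenv("GDRIVE_EMAIL")

# Only start the OAuth flow when no token is loaded and a user is present.
if (!drive_has_token() && interactive()) {
  if (nzchar(drive_email)) {
    drive_auth(email = drive_email)
  } else {
    drive_auth()
  }
}

# Confirm which Google account the current token belongs to.
if (drive_has_token()) {
  print(drive_user())
}
```

In a non-interactive run without a token the block does nothing, which avoids an authorization error in scheduled jobs.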

More Resources

Google Drive Docs, R Docs

Data

Source: the New York Times GitHub repository (CSV, US counties). "The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak."

Read Files


# load data from github
covid_df <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
#> Rows: 1845855 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): county, state, fips
#> dbl  (2): cases, deaths
#> date (1): date
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(covid_df)
#> # A tibble: 6 × 6
#>   date       county    state      fips  cases deaths
#>   <date>     <chr>     <chr>      <chr> <dbl>  <dbl>
#> 1 2020-01-21 Snohomish Washington 53061     1      0
#> 2 2020-01-22 Snohomish Washington 53061     1      0
#> 3 2020-01-23 Snohomish Washington 53061     1      0
#> 4 2020-01-24 Cook      Illinois   17031     1      0
#> 5 2020-01-24 Snohomish Washington 53061     1      0
#> 6 2020-01-25 Orange    California 06059     1      0
# write data to local file system
write.csv(covid_df, "/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv", row.names = FALSE)
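The column-specification message printed by `read_csv()` can be avoided by declaring the types up front, and the absolute output path can be replaced with a temporary one. The sketch below is one way to do both, assuming a tiny inline sample (two rows copied from the `head()` output above) in place of the full download; the names `sample_csv`, `nyt_cols`, and `out_path` are illustrative.

```r
library(readr)

# Inline sample with the same columns as the NYT file, so no download is needed.
sample_csv <- I("date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
2020-01-25,Orange,California,06059,1,0
")

# Explicit column types silence the guessing message and keep fips as
# character, which preserves the leading zero in "06059".
nyt_cols <- cols(
  date   = col_date(),
  county = col_character(),
  state  = col_character(),
  fips   = col_character(),
  cases  = col_double(),
  deaths = col_double()
)

sample_df <- read_csv(sample_csv, col_types = nyt_cols)

# Write to a temporary file instead of a hard-coded absolute path.
out_path <- file.path(tempdir(), "covid_sample.csv")
write_csv(sample_df, out_path)

# Read the file back with the same specification to check the round trip.
round_trip <- read_csv(out_path, col_types = nyt_cols)
```

`write_csv()` never writes row names, so the file read back has the same six columns as the original.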

Introduction to googledrive

The goal of googledrive is to make working with Drive feel similar to using Unix file system utilities, so it provides a broad set of functions for finding, creating, uploading, listing, and removing files and folders on your Google Drive.

Working with Directories

Search your Google Drive by name or by type. Here we list every folder.

drive_find(type = "folder")
#> # A dribble: 3 × 3
#>   name               id                                drive_resource   
#>   <chr>              <drv_id>                          <list>           
#> 1 DATA607 - Project3 1H6Y94MNuqosx-2MnWGT4iFkl4qChq9Qp <named list [34]>
#> 2 DS Survey          1BRXkunxriE1a5XsU4nezIFriNgMMhaX- <named list [34]>
#> 3 DS Jobs In India   1NKBzMoPEzmTaTqf4tu1msW0QwvlsBgI0 <named list [34]>
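The search above filters by type only. As a rough sketch of the other filters, the block below combines a name pattern with a type and caps the number of results; the pattern `"^DATA607"` is just an example chosen to match one of the folders listed above. `drive_mime_type()` shows how the short type names map to MIME types and works without a connection.

```r
library(googledrive)

# Short type names are translated to MIME types; this lookup is local.
known_types <- drive_mime_type(c("folder", "csv"))

# The searches need an authorized session, so they are guarded here.
if (drive_has_token()) {
  # Folders whose name matches a regular expression, at most 10 results.
  course_folders <- drive_find(pattern = "^DATA607", type = "folder", n_max = 10)

  # CSV files anywhere on the Drive, at most 10 results.
  csv_files <- drive_find(type = "csv", n_max = 10)
}
```

Setting `n_max` keeps the search quick on a large Drive, since `drive_find()` otherwise pages through every matching file.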

Create a directory remotely

drive_mkdir(name = "tmp_dir")
#> Created Drive file:
#> • 'tmp_dir' <id: 12jkegN8p2uIEDDdNPwbBKdlrfkShh00I>
#> With MIME type:
#> • 'application/vnd.google-apps.folder'

Working with Files

Upload a local file to your Google Drive

drive_upload("/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv", path="tmp_dir/covid.csv")
#> Local file:
#> • '/Users/dsimbandumwe/dev/cuny/data_607/FALL2021TIDYVERSE/output/covid.csv'
#> Uploaded into Drive file:
#> • 'covid.csv' <id: 1CNdPG-OSaFREOFeFueX7v18CnLThBlxO>
#> With MIME type:
#> • 'text/csv'
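Drive allows several files or folders to share one name, so addressing the target by name can become ambiguous. A possible alternative, sketched below, is to keep the dribble returned by `drive_mkdir()` and pass it as `path`, then download the uploaded file by its id. The folder name `tmp_dir_demo` and the small `mtcars` sample file are stand-ins chosen for illustration.

```r
library(googledrive)

# Small local file standing in for covid.csv.
local_path <- file.path(tempdir(), "sample.csv")
write.csv(head(mtcars), local_path, row.names = FALSE)

# The Drive calls need an authorized session, so they are guarded here.
if (drive_has_token()) {
  # Keep the returned dribble; it carries the folder id.
  folder <- drive_mkdir("tmp_dir_demo")

  # Upload into that exact folder, whatever else shares its name.
  uploaded <- drive_upload(local_path, path = folder, name = "sample.csv")

  # Download by id (via the dribble) to a local copy.
  drive_download(uploaded,
                 path = file.path(tempdir(), "sample_copy.csv"),
                 overwrite = TRUE)

  # Remove the demo folder again.
  drive_rm(folder)
}
```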

View files in a specific folder

drive_ls(path = "tmp_dir")
#> # A dribble: 1 × 3
#>   name      id                                drive_resource   
#>   <chr>     <drv_id>                          <list>           
#> 1 covid.csv 1CNdPG-OSaFREOFeFueX7v18CnLThBlxO <named list [39]>

Clean Up

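Remove a directory. Note that drive_rm() deletes permanently rather than moving items to the trash; drive_trash() is the reversible alternative.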

drive_rm("tmp_dir")
#> File deleted:
#> • 'tmp_dir' <id: 12jkegN8p2uIEDDdNPwbBKdlrfkShh00I>
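For a more cautious cleanup, the folder can be moved to the trash first and deleted only once nothing is missed. The following is a minimal sketch of that flow, assuming a throwaway folder named `tmp_dir_demo` created just for the demonstration.

```r
library(googledrive)

# The Drive calls need an authorized session, so they are guarded here.
if (drive_has_token()) {
  folder <- drive_mkdir("tmp_dir_demo")

  # Move to the trash; the folder can still be recovered at this point.
  trashed <- drive_trash(folder)

  # Changed your mind? Restore it from the trash.
  restored <- drive_untrash(trashed)

  # Delete permanently once it is really no longer needed.
  drive_rm(restored)
}
```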