Reading Data

Load Packages

Let's start by loading the packages we need.
Notice that the startup message from the here package shows that I am in my own folder under Joy_worm_images.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.1     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(here)

## here() starts at /Users/grad/Box/Joy_worm_images/Joy

Using the here package

Let's see how the here package works.
We can use the function list.files() to list the files in a given directory.
This tells us what is in your folder.

Notice that I am using the notation here::here(). This is called specifying the namespace, and it lets other people who read your code know which package a function belongs to. In this case we are using the here() function from the here package.

list.files(here::here())

## [1] "20200519_jordan_H01.csv" "20200521_jordan_H02.csv"
## [3] "20200526_jordan_H03.csv" "Joy.Rproj"              
## [5] "Presentations"           "R scripts"
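As a small illustration of the package::function() notation, here is a sketch that uses the stats package, which ships with R. The vector x is made up for the example. The Conflicts message printed when tidyverse loaded shows why the notation is handy: once dplyr is attached, a bare filter() means dplyr::filter(), while stats::filter() still reaches the original.

```r
# Calling a function with an explicit namespace works without library()
x <- c(3, 1, 2)
explicit <- stats::median(x)

# stats is attached by default, so the bare name finds the same function
bare <- median(x)

# Both calls run the same function and give the same answer
identical(explicit, bare)
```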

Cool, so what if we want to read a file in the folder? This is where tidyverse comes in!
Remember, tidyverse is essentially a universe of R packages…
We will use the read_csv() function present in the readr package (housed in tidyverse).

Here’s what that looks like:

readr::read_csv(here::here("20200519_jordan_H01.csv"))

## Warning: Missing column names filled in: 'X1' [1]

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   X1 = col_double(),
##   Label = col_character(),
##   Area = col_double(),
##   Angle = col_double(),
##   Length = col_double()
## )

First 6 rows of dataframe
X1	Label	Area	Angle	Length
1	p01-growth-H01-2X_B01.TIF	81	0.000	80.435
2	p01-growth-H01-2X_B01.TIF	5	0.000	3.875
3	p01-growth-H01-2X_B01.TIF	77	0.000	76.811
4	p01-growth-H01-2X_B01.TIF	6	36.027	5.101
5	p01-growth-H01-2X_B01.TIF	84	0.000	84.011
6	p01-growth-H01-2X_B01.TIF	6	48.991	5.080
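The warning above says that the first column of the file has no header, so readr made up the name X1 for it. Here is a minimal sketch of one way to give that column a clearer name, assuming a tiny made-up file with the same layout. The temporary file and the name row_id are hypothetical. Newer readr releases name the blank column ...1 rather than X1, so the sketch renames by position instead of by name.

```r
# Write a tiny made-up CSV whose first header cell is blank,
# mimicking the layout of the measurement files
tmp <- tempfile(fileext = ".csv")
writeLines(c(",Label,Area",
             "1,img_A.TIF,81",
             "2,img_A.TIF,5"), tmp)

# readr fills in a name for the blank header (X1 or ...1, depending on version)
dat <- readr::read_csv(tmp)

# Rename the first column by position so the code does not depend on that name
dat <- dplyr::rename(dat, row_id = 1)
names(dat)
```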

Using the here package with tidyverse

But what if there is more than one file in your directory? (In your case there is.)
We would not want to write out the file name each time. Instead, let's assign all the files to a variable and then read them.

# I don't want to include the folders "Presentations" and "R scripts" so, unlike above, I am keeping only the first three entries
# I am also adding in "full.names = TRUE" to get the full pathname for each file (ensures we won't have issues later on)
files <- list.files(here::here(), full.names = TRUE)[1:3]
files

## [1] "/Users/grad/Box/Joy_worm_images/Joy/20200519_jordan_H01.csv"
## [2] "/Users/grad/Box/Joy_worm_images/Joy/20200521_jordan_H02.csv"
## [3] "/Users/grad/Box/Joy_worm_images/Joy/20200526_jordan_H03.csv"
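Picking files by position works here, but it depends on how the folder happens to sort. As an alternative sketch, list.files() also takes a pattern argument (a regular expression) that keeps only matching names. The folder and file names below are made up so the example can run anywhere.

```r
# Build a throwaway folder with made-up contents: CSV files plus other items
demo_dir <- tempfile("worm_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("20200519_H01.csv", "20200521_H02.csv",
                                  "notes.txt", "project.Rproj")))

# pattern is a regular expression; "\\.csv$" keeps names ending in ".csv"
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Only the two CSV files remain
basename(csv_files)
```

In your own project the equivalent call would be list.files(here::here(), pattern = "\\.csv$", full.names = TRUE), with no need to count entries.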

Now we are going to use a function in the purrr package (also part of tidyverse) called map_dfr() to go through all the elements in the variable files and read each. Don’t worry too much about the notation used for purrr… if you are curious to know more about purrr feel free to reach out to me.

worms <- purrr::map_dfr(files, ~readr::read_csv(.x))
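For the curious, here is a small sketch of what the ~ and .x notation means. In purrr, a formula starting with ~ is shorthand for a small function, and .x stands for the element currently being processed. The numbers are made up for the example.

```r
# Formula shorthand: .x stands for the current element
doubled_short <- purrr::map_dbl(c(1, 2, 3), ~ .x * 2)

# The same thing written as an ordinary anonymous function
doubled_long <- purrr::map_dbl(c(1, 2, 3), function(x) x * 2)

# Both versions return the same vector
identical(doubled_short, doubled_long)
```

Read the same way, ~readr::read_csv(.x) means "call read_csv() on each file path in turn", and map_dfr() then stacks the results by row into one data frame.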

Great! Now let's check out the dataframe we just loaded in…

First 10 rows of the worms dataframe
X1	Label	Area	Angle	Length
1	p01-growth-H01-2X_B01.TIF	81	0.000	80.435
2	p01-growth-H01-2X_B01.TIF	5	0.000	3.875
3	p01-growth-H01-2X_B01.TIF	77	0.000	76.811
4	p01-growth-H01-2X_B01.TIF	6	36.027	5.101
5	p01-growth-H01-2X_B01.TIF	84	0.000	84.011
6	p01-growth-H01-2X_B01.TIF	6	48.991	5.080
7	p01-growth-H01-2X_B01.TIF	71	0.000	70.411
8	p01-growth-H01-2X_B01.TIF	5	-51.072	4.178
9	p01-growth-H01-2X_B01.TIF	86	0.000	85.893
10	p01-growth-H01-2X_B01.TIF	6	56.310	4.807
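One thing the combined dataframe does not record is which file each row came from. A possible sketch, assuming two tiny made-up files: map_dfr() has an .id argument that adds a column holding the name of each input element. The column name source_file and the demo file names are hypothetical. Recent purrr releases mark map_dfr() as superseded, but it still works.

```r
# Two tiny made-up CSV files in a throwaway folder
demo_dir <- tempfile("worm_demo")
dir.create(demo_dir)
writeLines(c("Label,Area", "a.TIF,81", "a.TIF,5"), file.path(demo_dir, "H01.csv"))
writeLines(c("Label,Area", "b.TIF,79"), file.path(demo_dir, "H02.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name; map_dfr() copies those names into the .id column
named_files <- purrr::set_names(demo_files, basename(demo_files))
demo_worms <- purrr::map_dfr(named_files, ~ readr::read_csv(.x), .id = "source_file")

# One entry per row, giving the file that row was read from
demo_worms$source_file
```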

And there you have it! We have identified files in our directory that we are interested in, and we have used purrr and readr to read in each file.

Give it a try yourself…

We can use the above basics to look at data in your own folder.
Begin by listing files. For example, let me pretend I’m working in Izzy’s folder. Let’s list the files found here:

# you will not write a path here. I must do this because I'm entering a folder that is not my own
# your code should look like: list.files(here::here())
list.files("/Users/grad/Box/Joy_worm_images/Izzy")

##  [1] "20200604_Izzy_H01.csv"                       
##  [2] "20200604_Izzy_H02.csv"                       
##  [3] "20200607_Izzy_H03.csv"                       
##  [4] "20200607_Izzy_H05.csv"                       
##  [5] "20200610_Izzy_H04.csv"                       
##  [6] "20200621_Izzy_H06.csv"                       
##  [7] "20200621_Izzy_H07.csv"                       
##  [8] "20200621_Izzy_H08.csv"                       
##  [9] "20200621_Izzy_H09.csv"                       
## [10] "20200622_Izzy_H10.csv"                       
## [11] "20200622_Izzy_H11.csv"                       
## [12] "20200623_Izzy_H12.csv"                       
## [13] "20200623_Izzy_H13.csv"                       
## [14] "20200629_Izzy_H14.csv"                       
## [15] "20200629_Izzy_H15.csv"                       
## [16] "20200630_Izzy_H16.csv"                       
## [17] "20200630_Izzy_H17.csv"                       
## [18] "20200630_Izzy_H18.csv"                       
## [19] "20200630_Izzy_H19.csv"                       
## [20] "20200630_Izzy_H20.csv"                       
## [21] "20200705_Izzy_H21.csv"                       
## [22] "20200705_Izzy_H22.csv"                       
## [23] "20200705_Izzy_H23.csv"                       
## [24] "20200706_Izzy_H24.csv"                       
## [25] "20200706_Izzy_H25.csv"                       
## [26] "20200706_Izzy_H26.csv"                       
## [27] "20200706_Izzy_H27.csv"                       
## [28] "20200706_Izzy_H28.csv"                       
## [29] "20200706_Izzy_H29.csv"                       
## [30] "20200706_Izzy_H30.csv"                       
## [31] "20200707_Izzy_H31.csv"                       
## [32] "20200707_Izzy_H32.csv"                       
## [33] "20200707_Izzy_H33.csv"                       
## [34] "20200709_Izzy_H34.csv"                       
## [35] "20200709_Izzy_H35.csv"                       
## [36] "20200709_Izzy_H36.csv"                       
## [37] "20200709_Izzy_H37.csv"                       
## [38] "20200709_Izzy_H38.csv"                       
## [39] "20200709_Izzy_H39.csv"                       
## [40] "20200709_Izzy_H40.csv"                       
## [41] "20200713_Izzy_H41.csv"                       
## [42] "20200713_Izzy_H42.csv"                       
## [43] "20200715_Izzy_H43.csv"                       
## [44] "20200715_Izzy_H44.csv"                       
## [45] "20200715_Izzy_H45.csv"                       
## [46] "20200715_Izzy_H46.csv"                       
## [47] "20200715_Izzy_H47.csv"                       
## [48] "20200715_Izzy_H48.csv"                       
## [49] "20200715_Izzy_H49.csv"                       
## [50] "20200715_Izzy_H50.csv"                       
## [51] "20200715_Izzy_H51.csv"                       
## [52] "20200717_Izzy_H52.csv"                       
## [53] "20200718_Izzy_H53.csv"                       
## [54] "20200718_Izzy_H54.csv"                       
## [55] "20200719_Izzy_H55.csv"                       
## [56] "20200720_Izzy_H56.csv"                       
## [57] "20200720_Izzy_H57.csv"                       
## [58] "20200721_Izzy_H58.csv"                       
## [59] "20200721_Izzy_H59.csv"                       
## [60] "20200721_Izzy_H60.csv"                       
## [61] "20200721_Izzy_H61.csv"                       
## [62] "20200722_Izzy_H62.csv"                       
## [63] "20200722_Izzy_H63.csv"                       
## [64] "20200722_Izzy_H64.csv"                       
## [65] "20200722_Izzy_H65.csv"                       
## [66] "20200722_Izzy_H66.csv"                       
## [67] "20200722_Izzy_H67.csv"                       
## [68] "20200723_Izzy_H68.csv"                       
## [69] "20200723_Izzy_H69.csv"                       
## [70] "20200723_Izzy_H70.csv"                       
## [71] "20200723_Izzy_H71.csv"                       
## [72] "20200723_Izzy_H72.csv"                       
## [73] "Data 1 - Big FIve Personality Traits.numbers"
## [74] "Izzy.Rproj"                                  
## [75] "TidyData.R"

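Notice that the last three entries in the listing are NOT .csv files. You do not want to read these, so we need to tell R which files we want to read, similar to what I did before. Below I keep the first 60 entries, which are all .csv files; the listing has 72 .csv files in total, so use [1:72] if you want every one of them.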

list.files("/Users/grad/Box/Joy_worm_images/Izzy")[1:60]

##  [1] "20200604_Izzy_H01.csv" "20200604_Izzy_H02.csv" "20200607_Izzy_H03.csv"
##  [4] "20200607_Izzy_H05.csv" "20200610_Izzy_H04.csv" "20200621_Izzy_H06.csv"
##  [7] "20200621_Izzy_H07.csv" "20200621_Izzy_H08.csv" "20200621_Izzy_H09.csv"
## [10] "20200622_Izzy_H10.csv" "20200622_Izzy_H11.csv" "20200623_Izzy_H12.csv"
## [13] "20200623_Izzy_H13.csv" "20200629_Izzy_H14.csv" "20200629_Izzy_H15.csv"
## [16] "20200630_Izzy_H16.csv" "20200630_Izzy_H17.csv" "20200630_Izzy_H18.csv"
## [19] "20200630_Izzy_H19.csv" "20200630_Izzy_H20.csv" "20200705_Izzy_H21.csv"
## [22] "20200705_Izzy_H22.csv" "20200705_Izzy_H23.csv" "20200706_Izzy_H24.csv"
## [25] "20200706_Izzy_H25.csv" "20200706_Izzy_H26.csv" "20200706_Izzy_H27.csv"
## [28] "20200706_Izzy_H28.csv" "20200706_Izzy_H29.csv" "20200706_Izzy_H30.csv"
## [31] "20200707_Izzy_H31.csv" "20200707_Izzy_H32.csv" "20200707_Izzy_H33.csv"
## [34] "20200709_Izzy_H34.csv" "20200709_Izzy_H35.csv" "20200709_Izzy_H36.csv"
## [37] "20200709_Izzy_H37.csv" "20200709_Izzy_H38.csv" "20200709_Izzy_H39.csv"
## [40] "20200709_Izzy_H40.csv" "20200713_Izzy_H41.csv" "20200713_Izzy_H42.csv"
## [43] "20200715_Izzy_H43.csv" "20200715_Izzy_H44.csv" "20200715_Izzy_H45.csv"
## [46] "20200715_Izzy_H46.csv" "20200715_Izzy_H47.csv" "20200715_Izzy_H48.csv"
## [49] "20200715_Izzy_H49.csv" "20200715_Izzy_H50.csv" "20200715_Izzy_H51.csv"
## [52] "20200717_Izzy_H52.csv" "20200718_Izzy_H53.csv" "20200718_Izzy_H54.csv"
## [55] "20200719_Izzy_H55.csv" "20200720_Izzy_H56.csv" "20200720_Izzy_H57.csv"
## [58] "20200721_Izzy_H58.csv" "20200721_Izzy_H59.csv" "20200721_Izzy_H60.csv"

Awesome, now let's assign these to a variable so we can call them in the next step:

# remember we want to use the full names to avoid downstream problems
files <- list.files("/Users/grad/Box/Joy_worm_images/Izzy", full.names = TRUE)[1:60]
# your code will look like this: files <- list.files(here::here(), full.names = TRUE)[1:__]
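Because selecting by position relies on counting entries correctly, it can help to confirm the selection before reading anything. Here is a minimal sketch using tools::file_ext(), which is part of base R. The paths are made up and stand in for the result of list.files(..., full.names = TRUE).

```r
# Made-up paths standing in for a selection of files
demo_files <- c("/data/20200604_H01.csv", "/data/20200604_H02.csv", "/data/TidyData.R")

# tools::file_ext() returns the extension of each path, without the dot
extensions <- tools::file_ext(demo_files)

# TRUE only when every selected file is a CSV (FALSE here, because of TidyData.R)
all_csv <- all(extensions == "csv")

# Keep only the CSV paths
csv_only <- demo_files[extensions == "csv"]
```

If all_csv comes back FALSE for your own files, adjust the range in the square brackets or filter as in the last line.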

Now let's read in all the files:

worms <- purrr::map_dfr(files, ~readr::read_csv(.x))

Aaand here we are:

First 10 rows of Izzy’s data
X1	Label	Area	Angle	Length
1	p01-growth-H01-2X_F01.TIF	79	0.000	78.675
2	p01-growth-H01-2X_F01.TIF	7	-69.444	5.696
3	p01-growth-H01-2X_F01.TIF	83	0.000	82.100
4	p01-growth-H01-2X_F01.TIF	6	29.982	5.003
5	p01-growth-H01-2X_F01.TIF	65	0.000	64.593
6	p01-growth-H01-2X_F01.TIF	5	0.000	4.868
7	p01-growth-H01-2X_F01.TIF	86	0.000	85.628
8	p01-growth-H01-2X_F01.TIF	4	36.870	2.500
9	p01-growth-H01-2X_F01.TIF	81	0.000	80.141
10	p01-growth-H01-2X_F01.TIF	6	59.349	5.231

Explore your data