Let's start by loading the necessary packages:
Notice that when it loads, the here package reports that I am in my own folder (Joy) under Joy_worm_images.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.1 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(here)
## here() starts at /Users/grad/Box/Joy_worm_images/Joy
Let's see how the here package works.
We can use the function list.files() to list the files in a given directory.
This tells us what is in the folder.
Notice that I am using the notation here::here(). This is called specifying the namespace: it tells other people who read your code which package a function belongs to. In this case we are using the here() function from the here package.
list.files(here::here())
## [1] "20200519_jordan_H01.csv" "20200521_jordan_H02.csv"
## [3] "20200526_jordan_H03.csv" "Joy.Rproj"
## [5] "Presentations" "R scripts"
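To see why spelling out the namespace can matter, here is a minimal sketch (the vector `x` and the data frame `df` are made up for illustration). The tidyverse startup message above warns that dplyr::filter() masks stats::filter(); writing the package name removes any doubt about which one runs.

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; here, a 3-point moving average
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that match a condition
df <- data.frame(n = x)
big <- dplyr::filter(df, n > 3)

moving_avg
big
```

Both functions are called filter(), but they do completely different jobs, so the prefix doubles as documentation for whoever reads the code next.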
Cool, so what if we want to read a file in the folder? This is where tidyverse comes in!
Remember, tidyverse is essentially a universe of R packages…
We will use the read_csv() function present in the readr package (housed in tidyverse).
Here’s what that looks like:
readr::read_csv(here::here("20200519_jordan_H01.csv"))
## Warning: Missing column names filled in: 'X1' [1]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## Label = col_character(),
## Area = col_double(),
## Angle = col_double(),
## Length = col_double()
## )
| X1 | Label | Area | Angle | Length |
|---|---|---|---|---|
| 1 | p01-growth-H01-2X_B01.TIF | 81 | 0.000 | 80.435 |
| 2 | p01-growth-H01-2X_B01.TIF | 5 | 0.000 | 3.875 |
| 3 | p01-growth-H01-2X_B01.TIF | 77 | 0.000 | 76.811 |
| 4 | p01-growth-H01-2X_B01.TIF | 6 | 36.027 | 5.101 |
| 5 | p01-growth-H01-2X_B01.TIF | 84 | 0.000 | 84.011 |
| 6 | p01-growth-H01-2X_B01.TIF | 6 | 48.991 | 5.080 |
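The path handed to read_csv() above was built by here::here(). As a rough sketch of what that call does: with no arguments it returns the project root, and any arguments are appended as path components below that root. The file name "analysis.R" is hypothetical and only used for illustration; the file does not need to exist for the path to be built.

```r
# With no arguments, here::here() returns the project root
root <- here::here()

# Extra arguments become path components below the root.
# "R scripts" is a folder from the listing above; "analysis.R" is a made-up name.
script_path <- here::here("R scripts", "analysis.R")

root
script_path
```

Because the path is always built from the project root, the same code keeps working when the project folder is moved or opened on another computer.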
But what if there's more than one file in your directory (which, in your case, there is)?
We would not want to write out the file name each time. Instead, let's assign all the file paths to a variable and then read them.
# I don't want to include "Joy.Rproj" or the folders "Presentations" and "R scripts", so unlike above I am keeping only the first three entries
# I am also adding "full.names = TRUE" to get the full path for each file (ensures we won't have issues later on)
files <- list.files(here::here(), full.names = TRUE)[1:3]
files
## [1] "/Users/grad/Box/Joy_worm_images/Joy/20200519_jordan_H01.csv"
## [2] "/Users/grad/Box/Joy_worm_images/Joy/20200521_jordan_H02.csv"
## [3] "/Users/grad/Box/Joy_worm_images/Joy/20200526_jordan_H03.csv"
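Selecting by position works here, but it depends on how the folder happens to sort. A possible alternative, shown as a sketch on a throwaway folder, is the `pattern` argument of list.files(), which keeps only the names matching a regular expression. The folder `demo_dir` and the file names in it are invented for this example.

```r
# Build a small throwaway folder that mimics the project layout
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a_H01.csv", "b_H02.csv", "demo.Rproj"))))
dir.create(file.path(demo_dir, "Presentations"), showWarnings = FALSE)

# Keep only the entries whose names end in ".csv"
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

basename(csv_files)
# [1] "a_H01.csv" "b_H02.csv"
```

In your own project the equivalent call would be list.files(here::here(), pattern = "\\.csv$", full.names = TRUE), with no need to count entries.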
Now we are going to use a function in the purrr package (also part of tidyverse) called map_dfr() to go through all the elements in the variable files, read each one, and stack the results by row into a single data frame. Don’t worry too much about the notation used for purrr… if you are curious to know more about purrr feel free to reach out to me.
worms <- purrr::map_dfr(files, ~readr::read_csv(.x))
Great! Now let's check out the data frame we just loaded in…
| X1 | Label | Area | Angle | Length |
|---|---|---|---|---|
| 1 | p01-growth-H01-2X_B01.TIF | 81 | 0.000 | 80.435 |
| 2 | p01-growth-H01-2X_B01.TIF | 5 | 0.000 | 3.875 |
| 3 | p01-growth-H01-2X_B01.TIF | 77 | 0.000 | 76.811 |
| 4 | p01-growth-H01-2X_B01.TIF | 6 | 36.027 | 5.101 |
| 5 | p01-growth-H01-2X_B01.TIF | 84 | 0.000 | 84.011 |
| 6 | p01-growth-H01-2X_B01.TIF | 6 | 48.991 | 5.080 |
| 7 | p01-growth-H01-2X_B01.TIF | 71 | 0.000 | 70.411 |
| 8 | p01-growth-H01-2X_B01.TIF | 5 | -51.072 | 4.178 |
| 9 | p01-growth-H01-2X_B01.TIF | 86 | 0.000 | 85.893 |
| 10 | p01-growth-H01-2X_B01.TIF | 6 | 56.310 | 4.807 |
And there you have it! We have identified files in our directory that we are interested in, and we have used purrr and readr to read in each file.
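If you are curious what map_dfr() is doing, here is a small sketch on two made-up CSV files written to a temporary folder (the folder, file names, and values are all invented for illustration). It reads each file and stacks the results by row. The optional `.id` argument, which is not used in the tutorial code, adds a column recording which file each row came from.

```r
library(purrr)
library(readr)

# Write two tiny CSV files to a throwaway folder
demo_dir <- file.path(tempdir(), "map_dfr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(Label = "plate_A", Area = c(81, 5)),
          file.path(demo_dir, "H01.csv"))
write_csv(data.frame(Label = "plate_B", Area = c(79, 7, 83)),
          file.path(demo_dir, "H02.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name so that .id can record it
named_files <- set_names(demo_files, basename(demo_files))
worms_demo <- suppressMessages(
  map_dfr(named_files, ~ read_csv(.x), .id = "source_file")
)

# The same stacking done by hand: read every file, then bind the rows
by_hand <- suppressMessages(dplyr::bind_rows(lapply(demo_files, read_csv)))

worms_demo
nrow(by_hand)
```

The `~ read_csv(.x)` notation is shorthand for a small function in which `.x` stands for the current element, here one file path.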
We can use the above basics to look at data in your own folder.
Begin by listing files. For example, let me pretend I’m working in Izzy’s folder. Let’s list the files found here:
# you will not write a path here. I must do this because I'm entering a folder that is not my own
# your code should look like: list.files(here::here())
list.files("/Users/grad/Box/Joy_worm_images/Izzy")
## [1] "20200604_Izzy_H01.csv"
## [2] "20200604_Izzy_H02.csv"
## [3] "20200607_Izzy_H03.csv"
## [4] "20200607_Izzy_H05.csv"
## [5] "20200610_Izzy_H04.csv"
## [6] "20200621_Izzy_H06.csv"
## [7] "20200621_Izzy_H07.csv"
## [8] "20200621_Izzy_H08.csv"
## [9] "20200621_Izzy_H09.csv"
## [10] "20200622_Izzy_H10.csv"
## [11] "20200622_Izzy_H11.csv"
## [12] "20200623_Izzy_H12.csv"
## [13] "20200623_Izzy_H13.csv"
## [14] "20200629_Izzy_H14.csv"
## [15] "20200629_Izzy_H15.csv"
## [16] "20200630_Izzy_H16.csv"
## [17] "20200630_Izzy_H17.csv"
## [18] "20200630_Izzy_H18.csv"
## [19] "20200630_Izzy_H19.csv"
## [20] "20200630_Izzy_H20.csv"
## [21] "20200705_Izzy_H21.csv"
## [22] "20200705_Izzy_H22.csv"
## [23] "20200705_Izzy_H23.csv"
## [24] "20200706_Izzy_H24.csv"
## [25] "20200706_Izzy_H25.csv"
## [26] "20200706_Izzy_H26.csv"
## [27] "20200706_Izzy_H27.csv"
## [28] "20200706_Izzy_H28.csv"
## [29] "20200706_Izzy_H29.csv"
## [30] "20200706_Izzy_H30.csv"
## [31] "20200707_Izzy_H31.csv"
## [32] "20200707_Izzy_H32.csv"
## [33] "20200707_Izzy_H33.csv"
## [34] "20200709_Izzy_H34.csv"
## [35] "20200709_Izzy_H35.csv"
## [36] "20200709_Izzy_H36.csv"
## [37] "20200709_Izzy_H37.csv"
## [38] "20200709_Izzy_H38.csv"
## [39] "20200709_Izzy_H39.csv"
## [40] "20200709_Izzy_H40.csv"
## [41] "20200713_Izzy_H41.csv"
## [42] "20200713_Izzy_H42.csv"
## [43] "20200715_Izzy_H43.csv"
## [44] "20200715_Izzy_H44.csv"
## [45] "20200715_Izzy_H45.csv"
## [46] "20200715_Izzy_H46.csv"
## [47] "20200715_Izzy_H47.csv"
## [48] "20200715_Izzy_H48.csv"
## [49] "20200715_Izzy_H49.csv"
## [50] "20200715_Izzy_H50.csv"
## [51] "20200715_Izzy_H51.csv"
## [52] "20200717_Izzy_H52.csv"
## [53] "20200718_Izzy_H53.csv"
## [54] "20200718_Izzy_H54.csv"
## [55] "20200719_Izzy_H55.csv"
## [56] "20200720_Izzy_H56.csv"
## [57] "20200720_Izzy_H57.csv"
## [58] "20200721_Izzy_H58.csv"
## [59] "20200721_Izzy_H59.csv"
## [60] "20200721_Izzy_H60.csv"
## [61] "20200721_Izzy_H61.csv"
## [62] "20200722_Izzy_H62.csv"
## [63] "20200722_Izzy_H63.csv"
## [64] "20200722_Izzy_H64.csv"
## [65] "20200722_Izzy_H65.csv"
## [66] "20200722_Izzy_H66.csv"
## [67] "20200722_Izzy_H67.csv"
## [68] "20200723_Izzy_H68.csv"
## [69] "20200723_Izzy_H69.csv"
## [70] "20200723_Izzy_H70.csv"
## [71] "20200723_Izzy_H71.csv"
## [72] "20200723_Izzy_H72.csv"
## [73] "Data 1 - Big FIve Personality Traits.numbers"
## [74] "Izzy.Rproj"
## [75] "TidyData.R"
Notice that the last three entries are NOT .csv files. You do not want to read these, so we need to tell R which files we want to read. Similar to what I did before, I will select entries by position (here, the first 60):
list.files("/Users/grad/Box/Joy_worm_images/Izzy")[1:60]
## [1] "20200604_Izzy_H01.csv" "20200604_Izzy_H02.csv" "20200607_Izzy_H03.csv"
## [4] "20200607_Izzy_H05.csv" "20200610_Izzy_H04.csv" "20200621_Izzy_H06.csv"
## [7] "20200621_Izzy_H07.csv" "20200621_Izzy_H08.csv" "20200621_Izzy_H09.csv"
## [10] "20200622_Izzy_H10.csv" "20200622_Izzy_H11.csv" "20200623_Izzy_H12.csv"
## [13] "20200623_Izzy_H13.csv" "20200629_Izzy_H14.csv" "20200629_Izzy_H15.csv"
## [16] "20200630_Izzy_H16.csv" "20200630_Izzy_H17.csv" "20200630_Izzy_H18.csv"
## [19] "20200630_Izzy_H19.csv" "20200630_Izzy_H20.csv" "20200705_Izzy_H21.csv"
## [22] "20200705_Izzy_H22.csv" "20200705_Izzy_H23.csv" "20200706_Izzy_H24.csv"
## [25] "20200706_Izzy_H25.csv" "20200706_Izzy_H26.csv" "20200706_Izzy_H27.csv"
## [28] "20200706_Izzy_H28.csv" "20200706_Izzy_H29.csv" "20200706_Izzy_H30.csv"
## [31] "20200707_Izzy_H31.csv" "20200707_Izzy_H32.csv" "20200707_Izzy_H33.csv"
## [34] "20200709_Izzy_H34.csv" "20200709_Izzy_H35.csv" "20200709_Izzy_H36.csv"
## [37] "20200709_Izzy_H37.csv" "20200709_Izzy_H38.csv" "20200709_Izzy_H39.csv"
## [40] "20200709_Izzy_H40.csv" "20200713_Izzy_H41.csv" "20200713_Izzy_H42.csv"
## [43] "20200715_Izzy_H43.csv" "20200715_Izzy_H44.csv" "20200715_Izzy_H45.csv"
## [46] "20200715_Izzy_H46.csv" "20200715_Izzy_H47.csv" "20200715_Izzy_H48.csv"
## [49] "20200715_Izzy_H49.csv" "20200715_Izzy_H50.csv" "20200715_Izzy_H51.csv"
## [52] "20200717_Izzy_H52.csv" "20200718_Izzy_H53.csv" "20200718_Izzy_H54.csv"
## [55] "20200719_Izzy_H55.csv" "20200720_Izzy_H56.csv" "20200720_Izzy_H57.csv"
## [58] "20200721_Izzy_H58.csv" "20200721_Izzy_H59.csv" "20200721_Izzy_H60.csv"
Awesome! Okay, now let's assign these to a variable so we can call them in the next step:
# remember we want to use the full names to avoid downstream problems
files <- list.files("/Users/grad/Box/Joy_worm_images/Izzy", full.names = TRUE)[1:60]
# your code will look like this: files <- list.files(here::here(), full.names = TRUE)[1:__]
Now let's read in all the files:
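Before reading anything, it can help to double-check that the selection contains only .csv files. A minimal sketch of one way to do that uses tools::file_ext(), which returns the extension of each file name. The vector `entries` below just copies a few names from the listing above.

```r
# A few entries copied from the folder listing above
entries <- c(
  "20200723_Izzy_H71.csv",
  "20200723_Izzy_H72.csv",
  "Data 1 - Big FIve Personality Traits.numbers",
  "Izzy.Rproj",
  "TidyData.R"
)

# Keep only the entries whose extension is "csv"
csv_only <- entries[tools::file_ext(entries) == "csv"]

# Stop with an error if anything other than a .csv slipped through
stopifnot(all(tools::file_ext(csv_only) == "csv"))

csv_only
# [1] "20200723_Izzy_H71.csv" "20200723_Izzy_H72.csv"
```

The same stopifnot() line can be run on your own `files` variable right before the read step.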
worms <- purrr::map_dfr(files, ~readr::read_csv(.x))
Aaand here we are:
| X1 | Label | Area | Angle | Length |
|---|---|---|---|---|
| 1 | p01-growth-H01-2X_F01.TIF | 79 | 0.000 | 78.675 |
| 2 | p01-growth-H01-2X_F01.TIF | 7 | -69.444 | 5.696 |
| 3 | p01-growth-H01-2X_F01.TIF | 83 | 0.000 | 82.100 |
| 4 | p01-growth-H01-2X_F01.TIF | 6 | 29.982 | 5.003 |
| 5 | p01-growth-H01-2X_F01.TIF | 65 | 0.000 | 64.593 |
| 6 | p01-growth-H01-2X_F01.TIF | 5 | 0.000 | 4.868 |
| 7 | p01-growth-H01-2X_F01.TIF | 86 | 0.000 | 85.628 |
| 8 | p01-growth-H01-2X_F01.TIF | 4 | 36.870 | 2.500 |
| 9 | p01-growth-H01-2X_F01.TIF | 81 | 0.000 | 80.141 |
| 10 | p01-growth-H01-2X_F01.TIF | 6 | 59.349 | 5.231 |
Try some of these functions:
1. colnames(worms) - gives the names of the columns
2. dim(worms) - outputs the number of rows, then the number of columns
3. summary(worms) - outputs summary statistics for each column
4. str(worms) - gives the structure of the data frame (column types and a preview of the values)
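If you want to see what these functions return before running them on the full data, here is a small sketch on a stand-in data frame. `worms_demo` is a made-up name; its four rows are copied from the first table above, without the X1 column.

```r
# A tiny stand-in for worms, using rows from the first table above
worms_demo <- data.frame(
  Label  = rep("p01-growth-H01-2X_B01.TIF", 4),
  Area   = c(81, 5, 77, 6),
  Angle  = c(0, 0, 0, 36.027),
  Length = c(80.435, 3.875, 76.811, 5.101)
)

colnames(worms_demo)  # "Label" "Area" "Angle" "Length"
dim(worms_demo)       # 4 4 (rows, then columns)
summary(worms_demo)   # min, quartiles, mean, and max for each numeric column
str(worms_demo)       # column types plus the first few values
```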