Suppose you have a bunch of submitted student reports. Since the students work in groups, some files are submitted more than once. You want to discard the duplicates.
First, get the list of files:
reports <- dir("Report file directory", full.names = TRUE)
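If the directory also holds other material (feedback notes, spreadsheets), `dir()` can be narrowed down with a regular expression. The sketch below assumes the reports are PDFs, possibly sitting in per-group subfolders. It builds a throwaway directory under `tempdir()` in place of the real one, so the folder and file names are made up for illustration.

```r
# Assumption: reports are PDF files, possibly in per-group subfolders.
# A throwaway directory stands in for "Report file directory".
report_dir <- file.path(tempdir(), "reports_demo")
dir.create(file.path(report_dir, "group1"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(report_dir, "group1", "report.pdf")))
invisible(file.create(file.path(report_dir, "notes.txt")))

# `pattern` is a regular expression matched against file names only;
# `recursive = TRUE` also descends into subdirectories.
reports <- dir(report_dir, pattern = "\\.pdf$", full.names = TRUE,
               recursive = TRUE, ignore.case = TRUE)
print(basename(reports))
```

Only `report.pdf` is picked up; `notes.txt` is skipped by the pattern.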
Then calculate the MD5 sums. Identical files have identical sums:
library(tools)
md5 <- md5sum(reports)
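To see that the sum depends only on the file contents and not on the file name, here is a small self-contained check on three throwaway text files (the names `alice.txt`, `bob.txt` and `carol.txt` are made up):

```r
library(tools)

# Three throwaway files: two with identical contents, one different.
demo_dir <- file.path(tempdir(), "md5_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("Group A report", file.path(demo_dir, "alice.txt"))
writeLines("Group A report", file.path(demo_dir, "bob.txt"))
writeLines("Group B report", file.path(demo_dir, "carol.txt"))

# md5sum() returns a character vector of hex digests named by file path;
# shorten the names to bare file names for readability.
md5_demo <- md5sum(dir(demo_dir, full.names = TRUE))
names(md5_demo) <- basename(names(md5_demo))
print(md5_demo)

# Same bytes, same sum, regardless of the file name:
md5_demo[["alice.txt"]] == md5_demo[["bob.txt"]]  # TRUE
```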
Put the results into a data frame:
dr <- data.frame(name = names(md5), hash = md5)
Keep one row for each distinct MD5 sum (the first file that has it):
unique_reports <- dr[match(unique(dr$hash), dr$hash), ]
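Base R's `duplicated()` gives the same selection a bit more directly, and `split()` shows which files turned out to be identical. The toy `dr` below is only for illustration: its `hash` column holds placeholder strings, not real MD5 sums.

```r
# Toy stand-in for `dr`; the hashes are placeholders, not real MD5 sums.
dr <- data.frame(
  name = c("alice.pdf", "bob.pdf", "carol.pdf", "dave.pdf"),
  hash = c("aaa", "aaa", "bbb", "aaa"),
  stringsAsFactors = FALSE
)

# Keeps the first row for each distinct hash, like
# dr[match(unique(dr$hash), dr$hash), ]
unique_reports <- dr[!duplicated(dr$hash), ]
print(unique_reports$name)

# Group file names by hash and show only the groups with duplicates.
groups <- split(dr$name, dr$hash)
groups[lengths(groups) > 1]
```

The `split()` view is handy for checking which group members handed in the same file before throwing anything away.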
If you are on Mac OS X, you can open these files directly from R:
system2("open", shQuote(unique_reports$name[1]))
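`open` is macOS-specific. A rough cross-platform variant might look like the sketch below. `open_command()` and `open_file()` are hypothetical helpers, not part of any package; the sketch assumes `xdg-open` is installed on Linux and relies on base R's Windows-only `shell.exec()` on Windows.

```r
# Hypothetical helper: the "open with default application" command per OS.
# Assumption: xdg-open is available on Linux desktops.
open_command <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Darwin = "open",
    Linux  = "xdg-open",
    stop("unsupported platform: ", sysname)
  )
}

# Hypothetical helper: open a file without blocking the R session.
open_file <- function(path) {
  if (.Platform$OS.type == "windows") {
    shell.exec(path)  # base R, Windows only
  } else {
    # shQuote() protects spaces and quotes in file names
    system2(open_command(), shQuote(path), wait = FALSE)
  }
}

# Build the command string without running it, e.g. to inspect the quoting:
paste(open_command("Darwin"), shQuote("group 1/it's a report.pdf"))
```

With these helpers, the call above becomes `open_file(unique_reports$name[1])`.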
Who needs Finder (or any other file browser) when you can do everything from R? :)
Note that I do not advocate such a workflow. But it works for me! :) Sometimes. :)