If you’ve ever uploaded subjects, you know that it can be hard to make sure you’ve got them all uploaded correctly. “Deleting” the set actually just creates a bunch of orphaned images that take up space on the server, and just re-loading the manifest will create duplicates that will be a pain to sort through later.
To fix a partial upload properly, you’ll need to download a copy of the SUBJECTS EXPORT from the data export tab, extract the filenames from the metadata column, and compare them to your original manifest.
library(tidyjson)
library(dplyr)
library(jsonlite)
library(magrittr)
# read in the manifest: the images you intended to upload
manifest <- read.csv("~/Desktop/ZooniverseConsulting/SAS-Experiment/subjects/original/original_manifest.csv", stringsAsFactors = FALSE)
# read in the subject export: the images that actually loaded
subjects <- read.csv("~/Downloads/snapshots-at-sea-subjects.csv", stringsAsFactors = FALSE)
# keep ONLY the subject set you need; its ID is shown next to your subject set's name
subjects %<>% filter(subject_set_id == 12613)
head(subjects)
The metadata column holds JSON, and the key that stores the image name will probably be called “filename” or something similar. Use the “prettify” function from jsonlite to look inside the first record.
subjects$metadata[1] %>% prettify
{
"rowname": "8",
"filename": "20150801-KSpencer-4181.jpg"
}
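So parse the JSON column by grabbing the value of the “filename” key. Oddly, you’ll have one row for every workflow in your project, even if that subject_set isn’t linked to that workflow. So just ignore that and keep unique rows only.
flat <- subjects %>%
  select(subject_id, subject_set_id, metadata) %>%
  as.tbl_json(json.column = "metadata") %>% # parse JSON
  spread_values(filename = jstring("filename")) %>%
  unique()
head(flat)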
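As a minimal sketch of this extraction step using only jsonlite, here is the same idea on a toy data frame. The toy rows, the second filename, and the helper `get_filename` are made up for illustration; only the first metadata record mirrors the example above.

```r
library(jsonlite)

# Toy stand-in for the subject export (hypothetical rows): the same subject
# appears twice, as it would with one row per workflow.
subjects_toy <- data.frame(
  subject_id     = c(101, 101, 102),
  subject_set_id = c(12613, 12613, 12613),
  metadata       = c(
    '{"rowname":"8","filename":"20150801-KSpencer-4181.jpg"}',
    '{"rowname":"8","filename":"20150801-KSpencer-4181.jpg"}',
    '{"rowname":"9","filename":"20150801-KSpencer-4182.jpg"}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one JSON string and return its "filename" value,
# or NA when the key is absent.
get_filename <- function(json) {
  value <- fromJSON(json)[["filename"]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# One row per subject: extract the filename, then drop repeated rows.
flat_toy <- unique(data.frame(
  subject_id     = subjects_toy$subject_id,
  subject_set_id = subjects_toy$subject_set_id,
  filename       = vapply(subjects_toy$metadata, get_filename, character(1),
                          USE.NAMES = FALSE),
  stringsAsFactors = FALSE
))

print(flat_toy)
```

If your metadata key is named something other than “filename”, change the key inside `get_filename` to match what prettify showed you.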
Check to see what from the original manifest is missing from the subject export.
# rows of the manifest whose filename does not appear in the subject export
missing <- anti_join(manifest, flat, by = "filename")
head(missing)
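To see what the comparison does, here is a small sketch on made-up data (all filenames and IDs below are hypothetical). It shows the anti-join that finds images still waiting to be uploaded, plus an optional extra check, not part of the steps above, for filenames that were loaded more than once.

```r
library(dplyr)

# Hypothetical manifest: the images you meant to upload.
manifest_toy <- data.frame(
  filename = c("a.jpg", "b.jpg", "c.jpg"),
  stringsAsFactors = FALSE
)

# Hypothetical flattened export: "b.jpg" never loaded, "c.jpg" loaded twice.
export_toy <- data.frame(
  subject_id = c(1, 2, 3),
  filename   = c("a.jpg", "c.jpg", "c.jpg"),
  stringsAsFactors = FALSE
)

# In the manifest but not in the export: still needs uploading.
missing_toy <- anti_join(manifest_toy, export_toy, by = "filename")

# Filenames attached to more than one subject: likely duplicate uploads.
duplicated_toy <- export_toy %>%
  count(filename) %>%
  filter(n > 1)

print(missing_toy)
print(duplicated_toy)
```

The join matches filenames exactly, so differences in case or stray whitespace between the manifest and the export would show up as “missing” images.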
Write the result out as a new manifest. If you upload via the CLI, you can leave the new manifest in the original image directory and only the missing images will upload. If you need to upload via the project builder GUI, you’ll need to manually move all of the “missing” photos to a new directory and upload from there.
write.csv(missing, file = "~/Desktop/ZooniverseConsulting/SAS-Experiment/subjects/original/missing_original_manifest.csv", row.names = FALSE)
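For the GUI route, gathering the “missing” photos does not have to be done by hand. The sketch below is one possible way to do it in base R, under some loud assumptions: the directory names are hypothetical, the demo runs in a temporary folder with empty placeholder files, and it copies rather than moves so the originals stay where they are.

```r
# Demo folders inside a fresh temporary directory (hypothetical layout;
# point image_dir at your real image folder instead).
base_dir    <- tempfile("upload-demo-")
image_dir   <- file.path(base_dir, "original")
missing_dir <- file.path(base_dir, "missing")
dir.create(image_dir, recursive = TRUE)
dir.create(missing_dir)

# Placeholder files standing in for real images.
file.create(file.path(image_dir, c("a.jpg", "b.jpg")))

# Stand-in for the `missing` data frame: "z.jpg" is listed but not on disk.
missing_toy <- data.frame(filename = c("b.jpg", "z.jpg"),
                          stringsAsFactors = FALSE)

# Build source paths and check which files actually exist.
src   <- file.path(image_dir, missing_toy$filename)
found <- file.exists(src)

# Copy the files that exist into the new directory without overwriting.
copied <- file.copy(src[found], missing_dir, overwrite = FALSE)

# Filenames listed as missing that could not be found on disk.
not_found <- missing_toy$filename[!found]

print(list.files(missing_dir))
print(not_found)
```

Anything reported in `not_found` is worth checking by hand before you upload, since it is listed in the manifest but has no image behind it.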