The objective of this script is to make an exploratory analysis of the photos I took during my holidays in Boston last summer. Something very simple and descriptive.

First of all we must load the packages we are going to use. Besides the “usual suspects” (dplyr, ggplot, tidyr, etc.) I am going to use a very useful package for photographers. Its name is exifR and it is going to help us extract the metadata from our photos.

library(dplyr)
library(ggplot2)
library(data.table)
library(lubridate)
library(tidyr)
library(xlsx)
library(readr)
library(rmarkdown)
library(scales)
library(gridExtra)

We create a list with all the photos and pass it through the exifr function from the exifR package. My camera is a Canon 6D and I took all my photos in RAW format. So, I set the pattern argument to “*.CR2“. I save the output file as a csv:

# list.cr2 <- list.files(pattern = "*.CR2")
# exifbos <- exifr(list.cr2)
# write.csv(exifbos, 'exifbos.csv', row.names = F)

We import the csv file as a data.frame and see how many observations and variables it has.

exifbos <- read_csv("exifbos.csv", col_types = cols(.default = "c"))
dim(exifbos)
## [1] 1369  343

343 variables with information from 1369 photos. I don’t need so much information for this analysis.

If you want to take a look at the information stored in the raw metadata you can visit this web: http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html#CameraInfo6D

I select just 11 variables from the table and I make some transformations to make the analysis possible.

# I have a problem related with the time zone when I try to generate the html with Knit. I solve the problem setting the time zone to "Europe/Madrid"

Sys.setenv(TZ="Europe/Madrid")

# https://stackoverflow.com/questions/47314121/r-error-unknown-timezone-with-as-posixct


exifbos1 <- select(exifbos,
  Model,
  LensModel, 
  ShutterSpeed,
  Aperture, 
  ISO,
  CanonExposureMode,
  FocusMode,
  MeteringMode,
  ContinuousDrive,
  DateTimeOriginal, 
  SourceFile)

# I use the ymd_hms function from the lubridate package to convert the "DateTimeOriginal" variable to Date class. 
# I also change the class of the variables "Lens", "ShutterSpeed", "Aperture" and "ISO" to Factor.

exifbos1$DateTimeOriginal <- ymd_hms(exifbos1$DateTimeOriginal) 
exifbos1$Aperture <- as.factor(exifbos1$Aperture)
exifbos1$ISO <- as.factor(exifbos1$ISO)
exifbos1$CanonExposureMode <- as.factor(exifbos1$CanonExposureMode)
exifbos1$FocusMode <- as.factor(exifbos1$FocusMode)
exifbos1$MeteringMode <- as.factor(exifbos1$MeteringMode)
exifbos1$ContinuousDrive <- as.factor(exifbos1$ContinuousDrive)
exifbos1$Model <- as.factor(exifbos1$Model)
exifbos1$LensModel <- as.factor(exifbos1$LensModel)

We take a look to the data with summary

summary(exifbos1)
##           Model                  LensModel   ShutterSpeed      
##  Canon EOS 6D:1369   EF35mm f/2 IS USM:718   Length:1369       
##                      EF50mm f/1.8 STM :210   Class :character  
##                      EF85mm f/1.8 USM :441   Mode  :character  
##                                                                
##                                                                
##                                                                
##                                                                
##     Aperture        ISO      CanonExposureMode FocusMode MeteringMode
##  4      :464   100    :483   0:   8            0:1299    1: 127      
##  2.8    :321   12800  : 96   2:   1            1:  57    3:1234      
##  5.6    :290   400    : 81   3:1270            3:  13    5:   8      
##  5      :102   6400   : 77   4:  90                                  
##  3.5    : 69   320    : 57                                           
##  2      : 44   1250   : 54                                           
##  (Other): 79   (Other):521                                           
##  ContinuousDrive DateTimeOriginal               SourceFile       
##  1 : 144         Min.   :2017-07-11 20:41:23   Length:1369       
##  10:1225         1st Qu.:2017-07-13 19:26:31   Class :character  
##                  Median :2017-07-17 11:00:53   Mode  :character  
##                  Mean   :2017-07-16 02:35:31                     
##                  3rd Qu.:2017-07-17 17:48:10                     
##                  Max.   :2017-07-19 12:26:07                     
## 

The summary function already offers us a lot of information about the photos, but we have some problems with certain variables. For example, the ‘CanonExposureMode’, ‘FocusMode’, ‘MeteringMode’ and ‘ContinuousDrive’ contain numerical codes which, without a description, are almost useless. Besides this, the format of the variable ‘ShutterSpeed’ doesn’t correspond to the photo standard. Instead of 1/125 or 1/250 it shows the result of the division: ‘0.008’ and ‘0.004’. Not very convenient!

To solve the first of these problems, the “numerical code” problem, we need the equivalences or descriptions of these numbers. We need a dictionary for each of them. That’s easy. We only have to visit this web http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html#CameraInfo6D and look for the description of each of these variables.

# We create a vector with the "CanonExposureMode" codes,
CanonExposureMode <- c(0, 1, 2, 3, 4, 5, 6, 7)

# and another vector with the correspondent descriptions.
CanonExposureMode_des <- c("Easy",
                           "Program AE",
                           "Shutter speed priority AE",
                           "Aperture-priority AE",
                           "Manual",
                           "Depth-of-field AE",
                           "M-Dep",
                           "Bulb")

# We create a data frame with these two vectors
CanonExposureMode_df <- data.frame(CanonExposureMode, CanonExposureMode_des)

# We change the 'CanonExposureMode' variable to factor class, so its class is the same
# as the 'CanonExposureMode' variable from the 'exifbos1' table.
CanonExposureMode_df$CanonExposureMode <- as.factor(CanonExposureMode_df$CanonExposureMode)

# Finally, we make a left join on the 'exifbos1' table and the 'CanonExposureMode_df' data frame
exifbos1 <- exifbos1 %>% left_join(CanonExposureMode_df, by = "CanonExposureMode")
## Warning: Column `CanonExposureMode` joining factors with different levels,
## coercing to character vector
# And we drop the 'CanonExposureMode' variable
exifbos1 <- exifbos1 %>% select(-CanonExposureMode)

# We check the new 'exifbos1' variable 'CanonExposureMode_des'. It looks fine :)
summary(exifbos1$CanonExposureMode_des)
##      Aperture-priority AE                      Bulb 
##                      1270                         0 
##         Depth-of-field AE                      Easy 
##                         0                         8 
##                     M-Dep                    Manual 
##                         0                        90 
##                Program AE Shutter speed priority AE 
##                         0                         1

We apply the same process to the variables ‘FocusMode’, ‘MeteringMode’ and ‘ContinuousDrive’:

# 'FocusMode'

# We create a vector with the 'FocusMode' codes,
FocusMode <- c(0, 1, 2, 3, 4, 5, 6, 16, 256, 512, 519)

# and another vector with the correspondent descriptions.
FocusMode_des <- c("One-shot AF",
                   "AI Servo AF",
                   "AI Focus AF",
                   "Manual Focus (3)",
                   "Single",
                   "Manual Focus (6)",
                   "Continuous",
                   "Pan Focus",
                   "AF + MF",
                   "Movie Snap Focus",
                   "Movie Servo AF")

# We create a data frame with these two vectors
FocusMode_df <- data.frame(FocusMode, FocusMode_des)

# We change the 'FocusMode' variable to factor class, so its class is the same
# as the 'FocusMode' variable from the 'exifbos1' table.
FocusMode_df$FocusMode <- as.factor(FocusMode_df$FocusMode)

# Finally, we make a left join on the 'exifbos1' table and the 'FocusMode_df' data frame
exifbos1 <- exifbos1 %>% left_join(FocusMode_df, by = "FocusMode")
## Warning: Column `FocusMode` joining factors with different levels, coercing
## to character vector
# And we drop the 'FocusMode' variable
exifbos1 <- exifbos1 %>% select(-FocusMode)


# 'MeteringMode'

# We create a vector with the 'MeteringMode' codes,
MeteringMode <- c(0, 1, 2, 3, 4, 5)

# and another vector with the correspondent descriptions.
MeteringMode_des <- c("Default",
                      "Spot",
                      "Average",
                      "Evaluative",
                      "Partial",
                      "Center-weighted average")

# We create a data frame with these two vectors
MeteringMode_df <- data.frame(MeteringMode, MeteringMode_des)

# We change the 'MeteringMode' variable to factor class, so its class is the same
# as the 'MeteringMode' variable from the 'exifbos1' table.
MeteringMode_df$MeteringMode <- as.factor(MeteringMode_df$MeteringMode)

# Finally, we make a left join on the 'exifbos1' table and the 'MeteringMode_df' data frame
exifbos1 <- exifbos1 %>% left_join(MeteringMode_df, by = "MeteringMode")
## Warning: Column `MeteringMode` joining factors with different levels,
## coercing to character vector
# And we drop the 'MeteringMode' variable
exifbos1 <- exifbos1 %>% select(-MeteringMode)


# ContinuousDrive

# We create a vector with the 'ContinuousDrive' codes,
ContinuousDrive <- c(0, 1, 2, 3, 4, 5, 6, 9, 10)

# and another vector with the correspondent descriptions.
ContinuousDrive_des <- c("Single" ,
                         "Continuous" ,
                         "Movie" ,
                         "Continuous, Speed Priority",
                         "Continuous, Low",
                         "Continuous, High" ,
                         "Silent Single" ,
                         "Single, Silent",
                         "Continuous, Silent")

# We create a data frame with these two vectors
ContinuousDrive_df <- data.frame(ContinuousDrive, ContinuousDrive_des)

# We change the 'ContinuousDrive' variable to factor class, so its class is the same
# as the 'ContinuousDrive' variable from the 'exifbos1' table.
ContinuousDrive_df$ContinuousDrive <- as.factor(ContinuousDrive_df$ContinuousDrive)

# Finally, we make a left join on the 'exifbos1' table and the 'ContinuousDrive_df' data frame
exifbos1 <- exifbos1 %>% left_join(ContinuousDrive_df, by = "ContinuousDrive")
## Warning: Column `ContinuousDrive` joining factors with different levels,
## coercing to character vector
# And we drop the 'ContinuousDrive' variable
exifbos1 <- exifbos1 %>% select(-ContinuousDrive)

We make several transformations to the ShutterSpeed variable in order to get the format we need.

# In order to get the denominator of the shutter speeds which are minor than 1 second we divide 1 by the ShutterSpeed value
exifbos1 <- exifbos1 %>% mutate(ShutterSpeed_1 = ifelse(as.numeric(ShutterSpeed) < 1, 1/as.numeric(ShutterSpeed), ">=1"))
## Warning: package 'bindrcpp' was built under R version 3.3.2
# We create a new variable 'ShutterSpeed_2" where we copy the ShutterSpeed values which are >= 1, and paste 
# as characters "1", "/" and the value from ShutterSpeed_1 where 'ShutterSpeed' is < 1.
exifbos1 <- exifbos1 %>% mutate(ShutterSpeed_2 = ifelse(as.numeric(ShutterSpeed) < 1, paste0("1", "/", as.character(round(ShutterSpeed_1))), as.character(ShutterSpeed)))

# Finally we eliminate the columns 'ShutterSpeed', 'ShutterSpeed_1' and replace the column name 'ShutterSpeed_2'
# with 'ShutterSpeed'
exifbos1 <- exifbos1 %>% select(-ShutterSpeed, -ShutterSpeed_1) %>%
                      rename(ShutterSpeed = ShutterSpeed_2)

# We change the class of the variable to 'factor'
exifbos1$ShutterSpeed <- as.factor(exifbos1$ShutterSpeed)
summary(exifbos1)
##           Model                  LensModel      Aperture        ISO     
##  Canon EOS 6D:1369   EF35mm f/2 IS USM:718   4      :464   100    :483  
##                      EF50mm f/1.8 STM :210   2.8    :321   12800  : 96  
##                      EF85mm f/1.8 USM :441   5.6    :290   400    : 81  
##                                              5      :102   6400   : 77  
##                                              3.5    : 69   320    : 57  
##                                              2      : 44   1250   : 54  
##                                              (Other): 79   (Other):521  
##  DateTimeOriginal               SourceFile       
##  Min.   :2017-07-11 20:41:23   Length:1369       
##  1st Qu.:2017-07-13 19:26:31   Class :character  
##  Median :2017-07-17 11:00:53   Mode  :character  
##  Mean   :2017-07-16 02:35:31                     
##  3rd Qu.:2017-07-17 17:48:10                     
##  Max.   :2017-07-19 12:26:07                     
##                                                  
##                CanonExposureMode_des          FocusMode_des 
##  Aperture-priority AE     :1270      One-shot AF     :1299  
##  Manual                   :  90      AI Servo AF     :  57  
##  Easy                     :   8      Manual Focus (3):  13  
##  Shutter speed priority AE:   1      AF + MF         :   0  
##  Bulb                     :   0      AI Focus AF     :   0  
##  Depth-of-field AE        :   0      Continuous      :   0  
##  (Other)                  :   0      (Other)         :   0  
##                 MeteringMode_des                 ContinuousDrive_des
##  Average                :   0    Continuous, Silent        :1225    
##  Center-weighted average:   8    Continuous                : 144    
##  Default                :   0    Continuous, High          :   0    
##  Evaluative             :1234    Continuous, Low           :   0    
##  Partial                :   0    Continuous, Speed Priority:   0    
##  Spot                   : 127    Movie                     :   0    
##                                  (Other)                   :   0    
##   ShutterSpeed
##  1/320  :311  
##  1/250  :257  
##  1/125  :220  
##  1/160  :145  
##  1/500  : 65  
##  1/640  : 59  
##  (Other):312

We generate four graphs with the lens, aperture, shutterspeed and ISO used and we put them on a grid

g1 <- ggplot(data=exifbos1, aes(x = LensModel)) +
  geom_bar()  + 
  geom_text(stat='count', aes(label=..count..), vjust= 2, color = "white") +
  theme_minimal()

g2 <- ggplot(data=exifbos1, aes(x = Aperture)) +
  geom_bar()  + 
  geom_text(stat='count', aes(label=..count..), vjust= -1) +
  theme_minimal()

g3 <- ggplot(data=exifbos1, aes(x = ShutterSpeed)) +
  geom_bar()  + 
  geom_text(stat='count', aes(label=..count..), vjust= -1) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

g4 <- ggplot(data=exifbos1, aes(x = ISO)) +
  geom_bar()  + 
  geom_text(stat='count', aes(label=..count..), vjust= -1) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

grid.arrange(arrangeGrob(g1, g2, g3, g4, ncol = 2, nrow = 2))

We generate another graph where we plot every photo versus the day (x axis) and the time of the photo (y axis). We add another dimension, the lense used, with colors

g5 <- ggplot(exifbos1, aes(x = as.factor(day(DateTimeOriginal)), y = hour(DateTimeOriginal), col = LensModel, alpha = 1)) +
  geom_jitter(width = 0.5, height = 0.5) +
  theme(legend.position ="top") + 
  labs(x = "Days (July - 2018)", y = "Time of the photo") 

g5