Summary

Here’s how it started. I’m looking at buying a new lens for my camera. In addition to being expensive, lenses come in a wide variety of configurations. I wanna make sure I get a lens that fits my needs.

Two of the main considerations when buying a lens are focal length and speed (maximum aperture), and the right choice depends heavily on the type of photography you do. I feel I can use some data to help assess my needs.

Modern cameras embed a lot of data in each shot, including camera make and model, aperture, flash settings, etc. This information is stored as Exchangeable Image File Format (EXIF) attributes.

The goal of this project is to scrape these attributes from my Flickr Photostream using the Flickr API and analyse them.


The Libraries

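library(RCurl)
library(knitr)
library(rmongodb)
library(XML)
library(lattice)
library(ggplot2)
# toJSON() is called further down; rjson is an assumption here,
# the original does not say which JSON package provided it
library(rjson)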

# Flickr API Key --- GET YOUR OWN
api_key <- "GET_YOUR_OWN_KEY_AT_FLICKR_DOT_COM"
baseURL <- paste("https://api.flickr.com/services/rest/?&format=rest&api_key=",api_key,sep="")
# we're using Mongo to store the results
m <- mongo.create()
ns <- "flickr"

Finding my Flickr UserID

All I know is my username. Here, we will use the Flickr API’s findByUsername method to obtain my UserID.

#
# Use findByUsername: https://www.flickr.com/services/api/explore/flickr.people.findByUsername
#

# Flickr username
user_name <- "rmalarc"

# Getting the userID
findByUsername <- paste(baseURL,"&method=flickr.people.findByUsername&username=",user_name,sep="")

findByUsername_data <- xmlRoot(xmlParse(getURL(findByUsername)))
user_id <- xmlSApply(findByUsername_data,xmlAttrs)["id",]

(user_id)
## [1] "10904202@N07"
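
The URLs in this post are glued together with paste(), which works for my username but would break for one containing spaces or other reserved characters. Here's a minimal sketch of a safer way to build the query string. The helper name flickr_url and the sample username are made up for illustration; URLencode() comes with base R (utils).

```r
# Hypothetical helper: build a Flickr REST URL from a method name and named parameters
flickr_url <- function(base, method, ...) {
  params <- c(method = method, ...)
  # percent-encode each value so spaces or '@' survive the query string
  query <- paste0("&", names(params), "=",
                  vapply(params, URLencode, character(1), reserved = TRUE),
                  collapse = "")
  paste0(base, query)
}

# "KEY" and "some user" are placeholders, not real credentials or accounts
url <- flickr_url("https://api.flickr.com/services/rest/?&format=rest&api_key=KEY",
                  "flickr.people.findByUsername", username = "some user")
url
```

The space in the username comes out as %20, so the request stays a valid URL.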

Get All My Public Photos

In this step we will get all my public photos by using the getPublicPhotos method. The list of photos is stored in a data frame.

#
# Use getPublicPhotos: https://www.flickr.com/services/api/flickr.people.getPublicPhotos.html
#
getPublicPhotos <- paste(baseURL
                         ,"&method=flickr.people.getPublicPhotos&per_page=1000&user_id="
                         ,user_id
                         ,sep="")

# fetch the first page once, only to learn how many pages there are
getPublicPhotos_data <- xmlRoot(xmlParse(getURL(getPublicPhotos)))

# parse the total number of pages from the attributes of the <photos> element
total_pages <- as.integer(xmlAttrs(getPublicPhotos_data[["photos"]])["pages"])

photos_for_user<-NULL

# loop through the pages of photos and save the list in a data frame
for(i in seq_len(total_pages)){
  # request page i explicitly, so the loop downloads each page exactly once
  page_url <- paste(getPublicPhotos,"&page=",i,sep="")
  getPublicPhotos_data <- xmlRoot(xmlParse(getURL(page_url)))
  tmp_df<-data.frame(t(data.frame(xmlSApply(getPublicPhotos_data[["photos"]],xmlAttrs))),stringsAsFactors=FALSE)
  tmp_df$page <- i
  photos_for_user<-rbind(photos_for_user,tmp_df)
}

kable(head(photos_for_user))
|        |id          |owner        |secret     |server |farm |title |ispublic |isfriend |isfamily | page|
|:-------|:-----------|:------------|:----------|:------|:----|:-----|:--------|:--------|:--------|----:|
|photo   |17011910768 |10904202@N07 |7ab8396b7d |7627   |8    |      |1        |0        |0        |    1|
|photo.1 |17196870332 |10904202@N07 |259eff77f2 |8793   |9    |      |1        |0        |0        |    1|
|photo.2 |17197945231 |10904202@N07 |e6b2ed22d1 |8817   |9    |      |1        |0        |0        |    1|
|photo.3 |16576089414 |10904202@N07 |e9c6ca917a |5334   |6    |      |1        |0        |0        |    1|
|photo.4 |17196868242 |10904202@N07 |f9ef8f0425 |8731   |9    |      |1        |0        |0        |    1|
|photo.5 |16578337763 |10904202@N07 |d9112cda3f |7690   |8    |      |1        |0        |0        |    1|

Select the Photos to Process

The purpose of this script is to process new photos from my photo stream. Previous data is stored in MongoDB. Here, we will select photos not yet processed.

# get previously processed photos
entry_type <- "photos_exif"
coll<-paste(ns,entry_type,sep=".")

entries_processed <- NULL
cursor<-mongo.find(m
                    ,coll
                    ,query = mongo.bson.empty()
                    ,fields=list(photo_id=1L)
                    )
entries_processed <- mongo.cursor.to.data.frame(cursor,stringsAsFactors=FALSE)
## Warning in mongo.cursor.to.data.frame(cursor, stringsAsFactors = FALSE):
## This fails for most NoSQL data structures. I am working on a new solution
mongo.cursor.destroy(cursor)
## [1] FALSE
# process records not already processed
entries_to_process <- photos_for_user[!(photos_for_user$id %in% entries_processed$photo_id),]


kable(head(entries_to_process))

|id |owner |secret |server |farm |title |ispublic |isfriend |isfamily |page |
|:--|:-----|:------|:------|:----|:-----|:--------|:--------|:--------|:----|
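
The selection step boils down to an anti-join on the photo id. Here's a minimal sketch with made-up ids, where all_photos and processed are hypothetical stand-ins for photos_for_user and entries_processed:

```r
# Toy stand-ins for photos_for_user and entries_processed (made-up ids)
all_photos <- data.frame(id = c("101", "102", "103"), stringsAsFactors = FALSE)
processed  <- data.frame(photo_id = c("102"), stringsAsFactors = FALSE)

# keep only the photos whose id has not been stored yet
to_process <- all_photos[!(all_photos$id %in% processed$photo_id), , drop = FALSE]
to_process$id

# on a first run nothing is stored yet; %in% against NULL is FALSE everywhere,
# so every photo is selected
first_run <- all_photos[!(all_photos$id %in% NULL), , drop = FALSE]
nrow(first_run)
```

When the result has no rows, as in the run shown above, the getExif loop in the next step simply has nothing to do.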


Get Exif Photo Attributes

Now it’s time to get those Exif attributes for my pictures. We will use the getExif method from the Flickr API. Results are stored in MongoDB.

#
# Use getExif: https://www.flickr.com/services/api/flickr.photos.getExif.html
#

photos_exif <-NULL
for(photo_id in entries_to_process$id){
  Sys.sleep(0.5)

  #photo_id<-photos_for_user$id[1]
  getExif <- paste(baseURL,"&method=flickr.photos.getExif&photo_id=",photo_id,sep="")
  getExif_data <- xmlRoot(xmlParse(getURL(getExif)))
  
  # get the exif attribute name 
  tmp_df<-data.frame(xmlSApply(getExif_data[["photo"]],xmlAttrs),stringsAsFactors=FALSE)
  colnames(tmp_df)<-tmp_df["tag",]
  
  # get the exif attribute raw values
  tmp_df_val<-data.frame(t(data.frame(xmlSApply(getExif_data[["photo"]],xmlValue))),stringsAsFactors=FALSE)
  
  # name the column after the attribute name
  colnames(tmp_df_val)<-tmp_df["tag",]

  # add a photo_id column and rowname
  tmp_df_val$photo_id <- photo_id
  row.names(tmp_df_val)<-photo_id
  
  # export to mongo
  mongo.insert(m, coll,mongo.bson.from.JSON(toJSON(tmp_df_val)))
#  photos_exif<-rbind.fill(photos_exif,tmp_df_val)
}
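
A side effect of using xmlValue on each exif node: when a node carries both a raw and a clean child, the two texts get glued together, which is why values such as "165.0 mm165 mm" show up in the stored attributes. Here's a small sketch on a hand-written XML string. The sample is only modelled on the shape of a getExif response and is not actual API output.

```r
library(XML)

# Hand-written sample (illustrative values), kept free of whitespace between nodes
sample_xml <- paste0(
  '<rsp stat="ok"><photo id="1">',
  '<exif tag="Make" label="Make"><raw>NIKON CORPORATION</raw></exif>',
  '<exif tag="FocalLength" label="Focal Length"><raw>165.0 mm</raw><clean>165 mm</clean></exif>',
  '</photo></rsp>')

photo_node <- xmlRoot(xmlParse(sample_xml, asText = TRUE))[["photo"]]

# xmlValue() concatenates the text of all children, so <raw> and <clean> run together
vals <- xmlSApply(photo_node, xmlValue)

# name each value after the node's tag attribute
names(vals) <- unname(xmlSApply(photo_node, function(node) xmlGetAttr(node, "tag")))
vals
```

The clean-up regular expressions later in the post rely on this glued format, so I'm leaving the extraction as it is.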

Load Processed Photo Attributes

In this step, we will load all the EXIF data stored in MongoDB so far.

photos_exif <- NULL
cursor<-mongo.find(m
                    ,coll
                    ,query = mongo.bson.empty()
                    )
photos_exif <- mongo.cursor.to.data.frame(cursor)
## Warning in mongo.cursor.to.data.frame(cursor): This fails for most NoSQL
## data structures. I am working on a new solution
mongo.cursor.destroy(cursor)
## [1] FALSE
# the full table has one column per EXIF tag (several hundred, mostly NA),
# so show only the columns used in the analysis below
kable(head(photos_exif[,c("photo_id","Make","Model","ExposureTime","FNumber","ISO","FocalLength","Lens")]))
|photo_id   |Make              |Model         |ExposureTime             |FNumber  |ISO |FocalLength    |Lens                    |
|:----------|:-----------------|:-------------|:------------------------|:--------|:---|:--------------|:-----------------------|
|5897949466 |TBD               |TBD           |NA                       |NA       |NA  |NA             |NA                      |
|5897149309 |NIKON CORPORATION |NIKON D3100   |1/130.077 sec (1/13)     |5.3f/5.3 |800 |165.0 mm165 mm |55.0-200.0 mm f/4.0-5.6 |
|5902290596 |NIKON CORPORATION |NIKON D3100   |1/25001/2500 sec         |4.2f/4.2 |400 |70.0 mm70 mm   |55.0-200.0 mm f/4.0-5.6 |
|5893337439 |TBD               |TBD           |NA                       |NA       |NA  |NA             |NA                      |
|5901728223 |NIKON CORPORATION |NIKON D3100   |1/12500.001 sec (1/1250) |5.6f/5.6 |400 |200.0 mm200 mm |55.0-200.0 mm f/4.0-5.6 |
|5824423227 |NIKON             |COOLPIX S8100 |1/600.017 sec (1/60)     |5.1f/5.1 |200 |47.2 mm        |NA                      |
#photos_exif_old <-photos_exif
#apply(photos_for_user,1,function(results){
#  mongo.insert(m, coll,mongo.bson.from.JSON(toJSON(results)))})

Cleaning Up the Data

# %in% (unlike ==) treats a missing Model as FALSE, so no all-NA rows slip in
attr<-photos_exif[photos_exif$Model %in% "NIKON D3100"
                  ,c("photo_id"
                    ,"Model"
                    ,"Lens"
                    ,"FocalLength"
                    ,"FNumber"
                    ,"MaxApertureValue"
                    ,"ISO"
                    ,"ExposureTime"
                    ,"GainControl"
                    ,"Flash"
                    ,"LightSource"
                    ,"DateTimeOriginal")]


# convert to character first, so the replacements below also work
# if the columns came back from Mongo as factors
attr$Lens<-as.character(attr$Lens)
attr$ExposureTime<-as.character(attr$ExposureTime)

# normalise the different spellings of the same lens
attr$Lens[grep("18-270mm",attr$Lens)] <- "18-270mm f/3.5-6.3"
attr$Lens[grep("18\\.0-270\\.0",attr$Lens)] <- "18-270mm f/3.5-6.3"
attr$Lens[grep("18\\.0-55\\.0",attr$Lens)] <- "18-55mm f/3.5-5.6"
attr$Lens[grep("55\\.0-200\\.0",attr$Lens)] <- "55-200mm f/4-5.6"


attr$FocalLength_clean <- as.numeric(gsub("^\\d+\\.\\d+ +mm(.+) +mm","\\1",attr$FocalLength,fixed = FALSE ))
## Warning: NAs introduced by coercion
attr$FNumber_clean <- as.numeric(gsub("^.+\\/(.+)","\\1",attr$FNumber,fixed = FALSE ))
attr$ExposureTime_clean <- as.integer(gsub(".*\\/([0-9]+)\\)$","\\1",attr$ExposureTime))
## Warning: NAs introduced by coercion
attr$FocalLength_disc <- cut(attr$FocalLength_clean, breaks = c(0,25, 50, 75, 100,150, 300))
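
To see what the clean-up patterns actually do, here's a small sketch that applies them to a few raw strings copied from the photos_exif output above. The variable names are made up for the illustration. Note that an exposure written without the trailing "(1/N)" part does not match the pattern and ends up as NA, which is where some of the coercion warnings come from.

```r
# Sample raw strings taken from the stored EXIF values
focal    <- c("165.0 mm165 mm", "70.0 mm70 mm", "47.2 mm")
fnumber  <- c("5.3f/5.3", "4.2f/4.2")
exposure <- c("1/130.077 sec (1/13)", "1/25001/2500 sec")

# same patterns as above; suppressWarnings() hides the expected coercion warnings
focal_clean    <- suppressWarnings(as.numeric(gsub("^\\d+\\.\\d+ +mm(.+) +mm", "\\1", focal)))
fnumber_clean  <- as.numeric(gsub("^.+\\/(.+)", "\\1", fnumber))
exposure_clean <- suppressWarnings(as.integer(gsub(".*\\/([0-9]+)\\)$", "\\1", exposure)))

focal_clean     # 165 70 NA  ("47.2 mm" has no duplicated value to strip)
fnumber_clean   # 5.3 4.2
exposure_clean  # 13 NA     (the 1/2500 shot is lost)
```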

Analysing the Data

Lens Utilization

p <- qplot(Lens, data=attr, geom="bar", fill=Lens)
p + labs(title = "Volume of Pictures by Lens Type")

As we can see, I definitely use my prime lens a lot more than the others, even the fancy 18-270mm zoom lens.

Focal Length Utilization

histogram(~FocalLength_clean, data = attr,
          main="Distribution of Focal Length",
          xlab="Focal Length (mm)")

fl_freq<-data.frame(table(attr$FocalLength_disc)/sum(table(attr$FocalLength_disc)))
colnames(fl_freq)<-c("Focal Length","Frequency")
kable(fl_freq)
Focal Length Frequency
(0,25] 0.2017587
(25,50] 0.6365413
(50,75] 0.0576453
(75,100] 0.0205178
(100,150] 0.0195408
(150,300] 0.0639961

As we can see above, about 84% of my pictures are taken at 50mm or less.
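
The cumulative share per bin can be read off with cumsum. Here's a quick sketch using the frequencies copied from the table above (fl_share is just a name I made up for it):

```r
# Frequencies copied from the focal length table
fl_share <- c("(0,25]" = 0.2017587, "(25,50]" = 0.6365413, "(50,75]" = 0.0576453,
              "(75,100]" = 0.0205178, "(100,150]" = 0.0195408, "(150,300]" = 0.0639961)

# running total: share of pictures taken at or below each bin's upper bound
fl_cum <- cumsum(fl_share)
round(fl_cum, 3)
```

The second entry is the share at 50mm or less, and the fourth is the share at 100mm or less.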

histogram(~FocalLength_clean | Lens, data = attr,
          main="Distribution of Focal Length by Lens Type",
          xlab="Focal Length (mm)")

In the above plot, we can see that I rarely use high zoom levels, even with my bigger lenses.

f-Number and Exposure Time Analysis

xyplot(FocalLength_clean~FNumber_clean|Lens, data=attr,
   main = "Focal Length vs f-Number by Lens Type",
   xlab = "f-Number", ylab = "Focal Length (mm)")

histogram(~FNumber_clean | Lens, data = attr,
          main="Distribution of f-Number by Lens Type",
          xlab="f-Number")

xyplot(ExposureTime_clean~FNumber_clean|Lens, data=attr,
   main = "Exposure Time vs f-Number by Lens Type",
   xlab = "f-Number", ylab = "Exposure Time (1/DEN)")

As we can see from the charts above, I tend to shoot my pictures with a low f-number, regardless of the lens and exposure time.


Let’s do some statistical tests.

First one: let’s check whether I hardly ever use zoom settings over 100mm. Here X is the focal length of a shot taken with my 18-270mm lens.

  • H0: I hardly ever use high zoom (>100mm): P(X<=100)>=0.95
  • H1: I regularly use high zoom in my pictures: P(X<=100)<0.95
#Let's calculate the mean and standard deviation for my 18-270mm lens
fl_17_2170 <- attr[attr$Lens=="18-270mm f/3.5-6.3" &!is.na(attr$FocalLength_clean),]



mean_fl <- mean(fl_17_2170$FocalLength_clean) 
sd_fl <- sd(fl_17_2170$FocalLength_clean) 

pnorm(100,mean_fl,sd_fl)
## [1] 0.7400844

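pnorm gives P(X<=100), the share of shots at 100mm or less under a normal approximation. At 0.74 it falls well short of 0.95, so we must reject the null hypothesis and accept that I do consistently use zoom settings greater than 100mm (roughly 26% of the shots with this lens).

Since I do use high zoom settings, what’s the cut-off that covers 95% of my shots?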

qnorm(0.95,mean_fl,sd_fl)
## [1] 173.0482

According to the value above, it makes sense to buy a lens with a zoom up to about 173mm.

Notes on the tests

The above tests assume that zoom values follow a normal distribution, which, looking at the histogram below, does not appear to be the case.

histogram(~FocalLength_clean, data = fl_17_2170,
          main="Distribution of Focal Length for 18-270mm f/3.5-6.3 lens",
          xlab="Focal Length (mm)")
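
Since normality looks doubtful, one alternative is to skip the distributional assumption and use empirical proportions and quantiles. Here's a minimal sketch of that idea. The vector x below is made up purely for illustration and stands in for fl_17_2170$FocalLength_clean, so the numbers it produces say nothing about my actual photos.

```r
# Made-up focal lengths (mm) standing in for fl_17_2170$FocalLength_clean
x <- c(18, 18, 18, 20, 24, 28, 35, 35, 50, 50, 70, 85, 120, 200, 270, 270)

# empirical share of shots at or below 100mm (no distributional assumption)
p_emp <- mean(x <= 100)

# normal-approximation counterpart, as computed earlier with pnorm()
p_norm <- pnorm(100, mean(x), sd(x))

# empirical 95th percentile versus the normal-theory cut-off
q_emp  <- quantile(x, 0.95, names = FALSE)
q_norm <- qnorm(0.95, mean(x), sd(x))

c(p_emp = p_emp, p_norm = p_norm, q_emp = q_emp, q_norm = q_norm)
```

On skewed data like this the two approaches can disagree noticeably, which is a good reason to treat the cut-off computed above with some caution.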


Conclusion

Based on my photographic history, there’s no need to go for a big zoom lens. Despite the test result for the 18-270mm lens, I should be fine if I choose a lens that goes up to 100mm: across all lenses, about 92% of my shots are at 100mm or less. Instead, I should focus on a lens that offers the lowest f-number.