Presented By

Title: Final-Report-Flickr-Data

Team Members: Blesson Thomas, Shekha Saxena, Shivam Namdeo

Introduction

What is Flickr? It is one of the largest photo management and sharing applications in the world. It lets users share their photos on its platform and enables new ways of organizing photos and videos.

Project Proposal

Data Collection

flickr.interestingness.getList
flickr.tags.getListPhoto
flickr.tags.getRelated
flickr.tags.getHotList
flickr.places.placesForTags

Original Plan

Analytics

We used the k-nearest neighbors (KNN) algorithm to predict the photo count.

RMSE - 16772.93, MAE - 6158.55

Loading libraries

library(tm)
library(tidyverse)
library(forecast)
library(wordcloud)
library(syuzhet)
library(lubridate)
library(ggplot2)
library(scales)
library(reshape2)
library(dplyr)
library(tidytext)

AI/ML/NLP Procedure summary

mydata <- read_csv("flickr.csv")
predictData <- read_csv("knn.csv")
KnnModel<-readRDS("Knn.rds")
head(mydata)
## # A tibble: 6 x 3
##   tags        continent     photoCount
##   <chr>       <chr>              <dbl>
## 1 palisades   North America       7138
## 2 palisades   Europe                21
## 3 palisades   Australia              3
## 4 palisades   Asia                   3
## 5 floridakeys North America       9867
## 6 floridakeys Europe                 6
mydata$tags = as.factor(mydata$tags)
mydata$continent = as.factor(mydata$continent)

# partition
set.seed(1)  
train.index <- sample(c(1:dim(mydata)[1]), dim(mydata)[1]*0.80)  
train <- mydata[train.index, ]
valid <- mydata[-train.index, ]
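
The report loads a pre-trained model from Knn.rds, so the training step itself is not shown. As a rough illustration of how KNN regression on two categorical predictors can work, the sketch below one-hot encodes tags and continent and averages the photoCount of the k closest training rows. The toy data, the helper names encode and knn_predict, and k = 2 are all assumptions made for this example and are not the team's actual model.

```r
# Toy data shaped like flickr.csv (values are illustrative, not the real data set)
toy <- data.frame(
  tags       = factor(c("beach", "beach", "beach", "snow", "snow", "snow")),
  continent  = factor(c("Europe", "Asia", "Europe", "Europe", "Asia", "Asia")),
  photoCount = c(100, 20, 120, 60, 10, 14)
)

# One-hot encode the two factors so that distances can be computed
encode <- function(df) model.matrix(~ tags + continent - 1, data = df)

# Hypothetical helper: predict by averaging the k nearest training rows
knn_predict <- function(train_x, train_y, new_x, k = 2) {
  apply(new_x, 1, function(row) {
    # Euclidean distance from the new row to every training row
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    mean(train_y[order(d)[seq_len(k)]])
  })
}

x <- encode(toy)
new_row <- x[1, , drop = FALSE]   # a "beach" photo uploaded in Europe
pred <- knn_predict(x, toy$photoCount, new_row, k = 2)
print(pred)
```

With many distinct tags the one-hot matrix becomes very wide, so a real model would probably need a more compact encoding or a different distance measure.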

accuracy(predictData$Prediction, predictData$Actual)
##                 ME     RMSE      MAE  MPE MAPE
## Test set -1896.413 16772.93 6158.554 -Inf  Inf

Implication - Using this data, we tried to predict the photo count to expect when a new photo with a given tag is uploaded in a given continent. This can help the Flickr team prepare its servers ahead of uploads. We predict this value because trends spread quickly and are widely followed, so the platform needs to be prepared for an upcoming trend. The error is high (RMSE of 16772.93 and MAE of 6158.55), but the model still gives Flickr a rough estimate of the number of photos that might be uploaded. The model could be trained further, and made more accurate, with additional external data.
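
The error measures reported by accuracy() can be reproduced by hand, which also helps explain the Inf and -Inf percentage errors in the output: percentage errors divide by the actual value, so they become infinite when an actual count is zero. The sketch below uses made-up vectors, not the contents of knn.csv.

```r
# Illustrative values only; the real vectors live in knn.csv
actual    <- c(10, 0, 30, 50)
predicted <- c(12, 4, 27, 40)

err  <- actual - predicted
rmse <- sqrt(mean(err^2))               # penalises large misses more heavily
mae  <- mean(abs(err))                  # average size of a miss
mape <- mean(abs(err / actual)) * 100   # division by a zero actual yields Inf

print(c(RMSE = rmse, MAE = mae, MAPE = mape))
```

Because RMSE squares each error before averaging, an RMSE well above the MAE (as in the report) suggests that a few predictions miss by a very large amount.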

Analyzing Market for Flickr

plot1 = mydata %>% group_by(continent) %>% summarise(photoCount = sum(photoCount))

ggplot(plot1, aes(x = continent, y = photoCount, fill=continent)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ggtitle("Continents where Flickr is used most") +
  theme(legend.position="none", axis.text.x = element_text(hjust = 1)) 

Implication - As the graph above shows, there is considerable room for Flickr to grow its market in Africa, South America and Australia. The biggest market for Flickr is Europe.

plot2 = mydata %>% group_by(tags) %>% summarise(photoCount = sum(photoCount)) %>% filter(photoCount > 800000)

ggplot(plot2, aes(x = tags, y = photoCount, fill=tags)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ggtitle("Top tags by photo count worldwide") +
  theme(legend.position="none", axis.text.x = element_text(hjust = 1)) 

Implication - From the graph above we can see that the most popular tag worldwide is “instagramapp”.

Peer comments

Summary

Most peer comments suggested that we perform sentiment analysis on the comments, so below is a sentiment analysis of the comments on the most interesting photos on Flickr.

Sentiment Analysis on Flickr Comments

myData <- read_csv("Comment_flickr.csv")
corpus<-iconv(myData$comments)
corpus<-Corpus(VectorSource(corpus))
inspect(corpus[1:5])
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 5
## 
## [1] Fantastic love this photo !!
## [2] [https://www.flickr.com/photos/112944336@N06] Thanks !
## [3] Fabulous image, fantastic captured, a very successful job
## [4] There is a big difference between a snapshot and a photo.\nMany who posted on Flickr cannot tell the difference but you definitely could. Thank you for sharing this aesthetically pleasing, attention catching, clean photo.\nps: My comment is not individualised but I really choose photos I comment. If you don’t like having my comment, just block me.
## [5] Excellent image!
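# Normalise the text: lower-case, then strip punctuation, digits and stop words
corpus<-tm_map(corpus,content_transformer(tolower))
corpus<-tm_map(corpus,removePunctuation)
corpus<-tm_map(corpus,removeNumbers)
clean_set<-tm_map(corpus,removeWords,stopwords('english'))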
length(clean_set)
## [1] 2200
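# Run after punctuation removal: by then each URL has collapsed into a single
# alphanumeric token starting with "http", which this pattern removes whole
removeURL <- function(x) gsub('http[[:alnum:]]*', '', x)
clean_set <- tm_map(clean_set, content_transformer(removeURL))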

clean_set<-tm_map(clean_set,stripWhitespace)
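
The URL pattern used here only matches letters and digits after "http", so its effect depends on when it runs. The sketch below applies it to the second sample comment, once on the raw text and once after the earlier cleaning steps. The helpers strip_punct and strip_digits are base-R stand-ins written for this example, assumed to behave like the tm transformations rather than taken from the package.

```r
raw <- "[https://www.flickr.com/photos/112944336@N06] Thanks !"

# Hypothetical base-R stand-ins for the tm transformations used in the report
strip_punct  <- function(x) gsub("[[:punct:]]+", "", x)
strip_digits <- function(x) gsub("[[:digit:]]+", "", x)
removeURL    <- function(x) gsub("http[[:alnum:]]*", "", x)

# Applied to the raw text, the pattern only removes the leading "https"
early <- removeURL(raw)

# Applied after lower-casing and stripping punctuation and digits, the URL has
# collapsed into one alphanumeric token, which the pattern removes entirely
late <- trimws(removeURL(strip_digits(strip_punct(tolower(raw)))))

print(early)
print(late)
```

If URL removal ever needs to run before punctuation is stripped, the pattern would have to match the full URL, including characters such as ":" and "/".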

tdm<-TermDocumentMatrix(clean_set)
tdm<-as.matrix(tdm)
tdm[1:10,1:20]
##                Docs
## Terms           1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
##   fantastic     1 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   love          1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   photo         1 0 0 2 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0
##   thanks        0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   captured      0 0 1 0 0 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0
##   fabulous      0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   image         0 0 1 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   job           0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   successful    0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
##   aesthetically 0 0 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
# Plot the words with a frequency of at least 50
w<-rowSums(tdm)
w<-subset(w,w>=50)
barplot(w,las=2,col=rainbow(50))

Implication - From the plot above we can see the most frequently occurring words in the corpus.
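
To make the counting step concrete, the sketch below builds a tiny term-document matrix in base R and applies the same rowSums-and-threshold idea. The three documents are short comments from the sample output, cleaned by hand for this example. The name tdm_toy and the threshold of 2 are assumptions chosen to suit the toy data.

```r
# Three short comments from the sample output, lower-cased and cleaned by hand
docs <- c("fantastic love photo",
          "fabulous image fantastic captured successful job",
          "excellent image")

tokens <- strsplit(docs, " ", fixed = TRUE)
terms  <- sort(unique(unlist(tokens)))

# Rows are terms, columns are documents, cells are occurrence counts
tdm_toy <- vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
)
rownames(tdm_toy) <- terms

# Total frequency of each term across all documents
freq <- rowSums(tdm_toy)
print(freq[freq >= 2])   # keep only terms that reach the threshold
```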

Word Cloud

# Wordcloud

w <- sort(rowSums(tdm), decreasing = TRUE)
set.seed(222)
wordcloud(words = names(w),
          freq = w,
          max.words = 100,
          random.order = FALSE,
          min.freq = 5,
          colors = brewer.pal(8, 'Dark2'),
          scale = c(5, 0.3),
          rot.per = 0.7)

Implication - The word cloud above shows the most frequently occurring words in larger type and closer to the center of the cloud.

Sentiment Analysis - NRC

comments<-iconv(myData$comments)
s<-get_nrc_sentiment(comments)

barplot(colSums(s),
        las = 2,
        col = rainbow(10),
        ylab = 'Count',
        main = 'Sentiment Scores for Flickr Comments')

Implication - From the plot above we can see that most comments express a positive sentiment and very few express disgust.
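
The bar plot shows raw totals; proportions can make the same point easier to compare. The sketch below works on a small made-up data frame with the ten columns that get_nrc_sentiment() returns (eight emotions plus negative and positive). The values and the names s_toy, share_positive and emotion_share are assumptions for illustration only.

```r
# Hypothetical scores shaped like the data frame returned by get_nrc_sentiment():
# one row per comment, one column per NRC category (values are made up)
s_toy <- data.frame(
  anger = c(0, 0, 1), anticipation = c(1, 0, 0), disgust = c(0, 0, 0),
  fear = c(0, 0, 1), joy = c(2, 1, 0), sadness = c(0, 0, 1),
  surprise = c(1, 0, 0), trust = c(1, 1, 0),
  negative = c(0, 0, 2), positive = c(3, 2, 0)
)

# Share of comments with more positive than negative words
net <- s_toy$positive - s_toy$negative
share_positive <- mean(net > 0)

# Emotion totals as a proportion of all emotion hits (polarity columns excluded)
emotions <- colSums(s_toy[, 1:8])
emotion_share <- round(emotions / sum(emotions), 2)

print(share_positive)
print(emotion_share)
```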