Facebook Data Analysis by Abhinav Jain
This report analyzes my Facebook data, which was extracted through the Facebook Graph API and then cleaned and wrangled into datasets using Python and R.
## YEAR Mean_Likes Median_Likes Mean_Comments
## Min. :2011 Min. : 4.00 Min. : 4.000 Min. :0.000
## 1st Qu.:2012 1st Qu.: 7.00 1st Qu.: 4.000 1st Qu.:1.000
## Median :2014 Median :13.00 Median : 8.000 Median :2.000
## Mean :2014 Mean :16.57 Mean : 9.571 Mean :2.429
## 3rd Qu.:2016 3rd Qu.:27.50 3rd Qu.:14.000 3rd Qu.:4.000
## Max. :2017 Max. :30.00 Max. :19.000 Max. :5.000
## Median_Comments Likes Comments Posts
## Min. :0.0000 Min. : 62 Min. : 12.0 Min. : 14.0
## 1st Qu.:0.0000 1st Qu.:1002 1st Qu.:140.0 1st Qu.: 72.5
## Median :0.0000 Median :2856 Median :186.0 Median :125.0
## Mean :0.8571 Mean :2314 Mean :316.6 Mean :151.1
## 3rd Qu.:1.5000 3rd Qu.:3373 3rd Qu.:463.0 3rd Qu.:183.0
## Max. :3.0000 Max. :4534 Max. :812.0 Max. :408.0
## TYPE_VIDEO TYPE_LINK TYPE_PHOTO TYPE_STATUS
## Min. : 0.00 Min. : 0.00 Min. : 5.00 Min. : 9.00
## 1st Qu.: 0.50 1st Qu.:11.00 1st Qu.:33.50 1st Qu.: 11.00
## Median : 1.00 Median :23.00 Median :49.00 Median : 18.00
## Mean : 53.57 Mean :19.14 Mean :42.57 Mean : 35.71
## 3rd Qu.: 26.50 3rd Qu.:27.00 3rd Qu.:56.50 3rd Qu.: 38.50
## Max. :320.00 Max. :35.00 Max. :64.00 Max. :124.00
## 'data.frame': 7 obs. of 12 variables:
## $ YEAR : int 2011 2012 2013 2014 2015 2016 2017
## $ Mean_Likes : int 4 5 13 25 30 30 9
## $ Median_Likes : int 4 4 8 19 19 9 4
## $ Mean_Comments : int 1 1 4 5 4 2 0
## $ Median_Comments: int 0 0 1 3 2 0 0
## $ Likes : int 62 506 2856 3162 1497 4534 3584
## $ Comments : int 12 132 812 661 186 265 148
## $ Posts : int 14 95 217 125 50 149 408
## $ TYPE_VIDEO : int 0 0 1 1 6 47 320
## $ TYPE_LINK : int 0 16 28 23 6 35 26
## $ TYPE_PHOTO : int 5 47 64 55 20 58 49
## $ TYPE_STATUS : int 9 32 124 45 18 9 13
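As a cross-check, the summary statistics can be recomputed from the seven yearly rows. The following is a minimal sketch in Python using only the standard library. The `summarize` helper is hypothetical (it is not part of the original analysis), and it assumes the quartiles were computed with linear interpolation between data points, which is what `method="inclusive"` does in `statistics.quantiles`.

```python
from statistics import mean, median, quantiles

# Yearly values copied from the data frame listing (2011-2017).
POSTS = [14, 95, 217, 125, 50, 149, 408]
TYPE_VIDEO = [0, 0, 1, 1, 6, 47, 320]


def summarize(values):
    """Return min, quartiles, mean and max, similar in spirit to R's summary()."""
    # Assumption: quartiles use linear interpolation over the inclusive range.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "mean": round(mean(values), 2),
        "q3": q3,
        "max": max(values),
    }


posts_summary = summarize(POSTS)
video_summary = summarize(TYPE_VIDEO)
print(posts_summary)
print(video_summary)
```

The large gap between the TYPE_VIDEO mean (53.57) and median (1) comes from the single very large 2017 value.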
Important Insights from the data
The Facebook data yielded many insights. The most important ones are presented below:
Distribution of POSTS across WORDS_LEVEL
POSTS are divided into 4 categories based on the number of words (#WORDS):
- Vlow -> 0 - 25
- Low -> 25 - 75
- Med -> 75 - 100
- High -> 100 - Max
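The binning into WORDS_LEVEL can be sketched as a small lookup. This is an illustrative Python version, not the original code: the function name `words_level` is hypothetical, and because the listed ranges share their endpoints, the sketch assumes each upper bound is inclusive (25 words is Vlow, 26 words is Low).

```python
from bisect import bisect_left

# Upper bounds of the first three levels, taken from the category list.
UPPER_BOUNDS = [25, 75, 100]
LEVELS = ["Vlow", "Low", "Med", "High"]


def words_level(total_words):
    """Map a post's word count to its WORDS_LEVEL label.

    Assumption: each upper bound is inclusive, so 25 -> Vlow and 26 -> Low.
    """
    if total_words < 0:
        raise ValueError("word count cannot be negative")
    return LEVELS[bisect_left(UPPER_BOUNDS, total_words)]


for n in (0, 25, 26, 75, 100, 101):
    print(n, words_level(n))
```

If the original binning treated the lower bound as inclusive instead, swapping `bisect_left` for `bisect_right` would move the boundary values into the next level up.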
##
## Vlow Low Med High
## 757 113 43 162
Most of the Posts (757 of 1,075, about 70%) fall in the Vlow category, followed by the High category (162).

- Every year, Words_Level Vlow has the highest number of posts!
- For 2017, almost all Posts have Vlow Words_Level.
Distribution of status TYPE across YEAR

- The most common TYPE per year: 2012 -> Photo, 2013 -> Status, 2014 -> Photo, 2015 -> Photo, 2016 -> Photo, 2017 -> Video.
- From 2013 onward, posts were shared across all 4 categories, although 2013 and 2014 have just one Video post each.
- From 2011 to 2014, the Photo and Status TYPEs competed closely for the top spot.
Distribution of POSTS across the STATUS_TYPE faceted around YEAR

Importantly, the number of Video posts being shared has grown over the years:
- VIDEO posts first appeared in 2013.
- The count has risen every year since 2015 and is at its highest so far in 2017!
- VIDEO Posts rose from 47 in 2016 to 320 in 2017, an increase of roughly 580%!
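The year-over-year change in video posts can be checked directly from the yearly TYPE_VIDEO counts. A minimal Python sketch follows; the helper name `percent_increase` is an illustrative assumption.

```python
# Yearly TYPE_VIDEO counts copied from the data frame listing.
VIDEO_POSTS = {2013: 1, 2014: 1, 2015: 6, 2016: 47, 2017: 320}


def percent_increase(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


growth_2017 = percent_increase(VIDEO_POSTS[2016], VIDEO_POSTS[2017])
ratio_2017 = VIDEO_POSTS[2017] / VIDEO_POSTS[2016]
print(f"{growth_2017:.0f}% increase, {ratio_2017:.1f}x the 2016 count")
# prints: 581% increase, 6.8x the 2016 count
```

Note the difference between the two figures: the 2017 count is about 6.8 times the 2016 count, which corresponds to an increase of about 581%.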
Distribution of HOUR at which POST is updated

The number of Posts clearly follows a bimodal distribution over the Hour of the day:
- There are posts for every hour of the day.
- The number of Posts rises steeply at hours 16, 17 and 18 (100+ posts at each hour).
- The first peak appears at 5 AM because of a change in time zone (I am now in MST, whereas I was in IST earlier).
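To illustrate how a time-zone change can shift a peak, the sketch below converts one instant between IST and MST. It is only an illustration under assumptions: the source does not state which zone the HOUR values are expressed in, both zones are modeled as fixed UTC offsets (so daylight saving is ignored), and the sample timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets; assumptions for illustration (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
MST = timezone(timedelta(hours=-7), "MST")


def hour_in_zone(moment, zone):
    """Return the local hour of a timezone-aware datetime in the given zone."""
    return moment.astimezone(zone).hour


# A hypothetical post made at 17:30 Indian time.
evening_post = datetime(2015, 6, 1, 17, 30, tzinfo=IST)
print(hour_in_zone(evening_post, IST))  # 17
print(hour_in_zone(evening_post, MST))  # 5
```

Under these assumptions, the same instant falls at hour 17 in IST and hour 5 in MST, so a single posting habit could plausibly show up as two separate peaks.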
Distribution of SENTIMENT with TOTAL_WORDS

- The majority of the POSTS have POSITIVE Sentiment (0 to 1).
- Posts with NEGATIVE Sentiment have Low COMMENTS.
- Nearly all the Posts with High Words_Level are Extremely Positive!

Note: Although Posts with Words_Level High are extremely positive, Positivity declines as TOTAL_WORDS increases.
Bringing together Multiple variables

- The YEARs 2013, 2014 and 2015 were dominated by posts of type STATUS.
- The #Likes for posts with more TOTAL_WORDS has increased.
- The #Posts is very low in 2015.
- Mean LIKES (represented by the black line) have been almost uniform across the months, except in 2015 and 2016!
- There is a steep dip in May 2016, which was when I quit my job to go for further studies.
- The #LIKES in 2017 are low, comparable to the initial years of my Facebook journey!
Other significant trends
The trends below provide some more insights into the dataset and helped in reaching the conclusions and insights above.
Distribution of Posts with YEAR

The #Posts rises gradually at first and then declines in 2014 and 2015. It recovers in 2016, and the rise is steepest in 2017, with 4 months of 2017 still to go!
Distribution of TOTAL_WORDS across YEAR

Except for 2012 and 2017, the Total_Words used in the Posts has been approximately the same across all the years, falling in the Vlow Words_Level (<25 words).
Distribution of the LIKES as per the WORDS_LEVEL

Except for the Vlow category, all the other categories seem to have about the same Median Likes.

- 2012 and 2013 are the only years with a consistent increase in the median LIKES as the number of words increases.
- 2014 is the year that registers the highest Median Likes for the Low Words_Level posts!
Distribution of Posts across WORDS_LEVEL

Trend in status TYPE across YEAR

The steep increase in the number of Videos being shared in year 2017 is clearly visible.
Distribution of LIKES with TOTAL_WORDS across YEAR

As per the Latest Statistics, Posts of type Photo generate MOST Likes followed by Status.
Facet Wrapping the same with the YEAR

- The range of the number of comments is the same for all the years, as is evident from the shared Y-scale.
- 2014 is the only year with a considerable number of comments spread across all Words_Levels!
- The highest value of Median Comments occurs in 2014 for the Low category!
Let’s try to find the reason behind the uniform comment trends in year 2014!

For 2014, Posts of TYPE Status and Photo are dominant across every Words_Level. Since these two TYPEs generate the most Likes and Comments, this explains the Comment trends in 2014.
Distribution of LIKES with HOUR

- Mean LIKES are approximately the same throughout the day.
- The peaks occur at hours 8, 16 and 20!
Ratio of UNIQ_WORDS/TOTAL_WORDS
UNIQ_WORDS -> #Words that remain after removing the common words, also known as stop words.

The plot shows an almost Normal Distribution of the Uniq_Ratio, with the peak occurring at approximately 0.45. This means that the number of UNIQ_WORDS in a typical post was about 45% of its TOTAL_WORDS!
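The ratio itself is simple to compute for a single post. The sketch below is a minimal Python illustration, not the original code: the `uniq_ratio` function, the tiny `STOP_WORDS` set and the sample sentence are all hypothetical, and the actual analysis presumably used a fuller stop-word list that the report does not specify.

```python
import re

# A tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {
    "a", "an", "and", "the", "is", "in", "of", "to",
    "my", "i", "for", "on", "it", "was",
}


def uniq_ratio(text):
    """Return (#words left after removing stop words) / (#total words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero for empty posts
    kept = [w for w in words if w not in STOP_WORDS]
    return len(kept) / len(words)


sample = "The trip to the mountains was the best weekend of my year"
print(round(uniq_ratio(sample), 2))  # 0.42
```

Here 5 of the 12 words survive the stop-word filter, giving a ratio of about 0.42, close to the peak described above.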
