Facebook Data Analysis by Abhinav Jain

This report is based on an analysis of my Facebook data, which was extracted through the Facebook Graph API and then cleaned and wrangled into appropriate datasets using Python and R.

##       YEAR        Mean_Likes     Median_Likes    Mean_Comments  
##  Min.   :2011   Min.   : 4.00   Min.   : 4.000   Min.   :0.000  
##  1st Qu.:2012   1st Qu.: 7.00   1st Qu.: 4.000   1st Qu.:1.000  
##  Median :2014   Median :13.00   Median : 8.000   Median :2.000  
##  Mean   :2014   Mean   :16.57   Mean   : 9.571   Mean   :2.429  
##  3rd Qu.:2016   3rd Qu.:27.50   3rd Qu.:14.000   3rd Qu.:4.000  
##  Max.   :2017   Max.   :30.00   Max.   :19.000   Max.   :5.000  
##  Median_Comments      Likes         Comments         Posts      
##  Min.   :0.0000   Min.   :  62   Min.   : 12.0   Min.   : 14.0  
##  1st Qu.:0.0000   1st Qu.:1002   1st Qu.:140.0   1st Qu.: 72.5  
##  Median :0.0000   Median :2856   Median :186.0   Median :125.0  
##  Mean   :0.8571   Mean   :2314   Mean   :316.6   Mean   :151.1  
##  3rd Qu.:1.5000   3rd Qu.:3373   3rd Qu.:463.0   3rd Qu.:183.0  
##  Max.   :3.0000   Max.   :4534   Max.   :812.0   Max.   :408.0  
##    TYPE_VIDEO       TYPE_LINK       TYPE_PHOTO     TYPE_STATUS    
##  Min.   :  0.00   Min.   : 0.00   Min.   : 5.00   Min.   :  9.00  
##  1st Qu.:  0.50   1st Qu.:11.00   1st Qu.:33.50   1st Qu.: 11.00  
##  Median :  1.00   Median :23.00   Median :49.00   Median : 18.00  
##  Mean   : 53.57   Mean   :19.14   Mean   :42.57   Mean   : 35.71  
##  3rd Qu.: 26.50   3rd Qu.:27.00   3rd Qu.:56.50   3rd Qu.: 38.50  
##  Max.   :320.00   Max.   :35.00   Max.   :64.00   Max.   :124.00
## 'data.frame':    7 obs. of  12 variables:
##  $ YEAR           : int  2011 2012 2013 2014 2015 2016 2017
##  $ Mean_Likes     : int  4 5 13 25 30 30 9
##  $ Median_Likes   : int  4 4 8 19 19 9 4
##  $ Mean_Comments  : int  1 1 4 5 4 2 0
##  $ Median_Comments: int  0 0 1 3 2 0 0
##  $ Likes          : int  62 506 2856 3162 1497 4534 3584
##  $ Comments       : int  12 132 812 661 186 265 148
##  $ Posts          : int  14 95 217 125 50 149 408
##  $ TYPE_VIDEO     : int  0 0 1 1 6 47 320
##  $ TYPE_LINK      : int  0 16 28 23 6 35 26
##  $ TYPE_PHOTO     : int  5 47 64 55 20 58 49
##  $ TYPE_STATUS    : int  9 32 124 45 18 9 13

Important Insights from the data

The Facebook data yielded many insights. The important ones are presented below:

Distribution of LIKES & COMMENTS with YEAR

  • The #Likes far exceed the #Comments in every year.
  • In fact, in 2016 Likes were roughly 17 times Comments (4534 vs. 265 in the yearly table above)!
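
The Likes-to-Comments ratio for each year can be checked directly from the yearly totals. The following is a minimal sketch in Python, with the values copied from the str() output of the yearly table above; the variable names are illustrative and not taken from the original analysis.

```python
# Yearly totals copied from the str() output of the yearly summary table.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
likes = [62, 506, 2856, 3162, 1497, 4534, 3584]
comments = [12, 132, 812, 661, 186, 265, 148]

# Ratio of total Likes to total Comments for each year.
like_comment_ratio = {
    year: total_likes / total_comments
    for year, total_likes, total_comments in zip(years, likes, comments)
}

for year, ratio in like_comment_ratio.items():
    print(f"{year}: {ratio:.1f} likes per comment")
```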
## 'data.frame':    1075 obs. of  18 variables:
##  $ ID         : Factor w/ 1073 levels "1405174859572227_1000278100061907",..: 775 775 572 586 585 583 580 579 578 576 ...
##  $ DAY        : int  31 31 30 29 24 8 25 25 25 19 ...
##  $ MONTH      : int  12 12 12 12 12 12 10 10 10 10 ...
##  $ YEAR       : int  2010 2010 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ DATE       : Factor w/ 577 levels "1/1/13","1/1/15",..: 171 171 169 164 158 179 85 85 85 74 ...
##  $ HOUR       : int  20 20 5 3 9 7 8 8 8 9 ...
##  $ MIN        : int  0 0 29 54 33 59 40 36 7 39 ...
##  $ SEC        : int  0 0 49 29 4 4 52 27 44 43 ...
##  $ TIME       : Factor w/ 1060 levels "00:00:59","00:02:16",..: 925 925 166 88 322 279 305 303 285 324 ...
##  $ TYPE       : Factor w/ 5 levels "link","note",..: 1 1 3 3 4 3 4 3 4 4 ...
##  $ LIKES      : int  0 0 5 1 5 0 0 3 6 3 ...
##  $ COMMENTS   : int  0 0 0 0 1 0 0 0 2 3 ...
##  $ TOTAL_WORDS: int  0 0 6 0 218 0 0 0 46 46 ...
##  $ POS        : num  0 0 0.667 0 0.045 0 0 0 0.355 0.213 ...
##  $ NEG        : num  0 0 0 0 0.138 0 0 0 0.041 0 ...
##  $ NEU        : num  0 0 0.333 0 0.817 0 0 0 0.604 0.787 ...
##  $ COMP       : num  0 0 0.612 0 -0.964 0 0 0 0.96 0.852 ...
##  $ UNIQ_WORDS : int  0 0 0 0 94 0 0 0 22 24 ...
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.085   2.000  70.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    3.00    6.00   15.38   15.00  233.00

Variation in LIKES & COMMENTS with #WORDS

  • As the number of WORDS increases, Likes & Comments decrease.
  • Fewer Comments in 2017.
  • The highest Likes are for Posts from 2016 & 2017.
  • The highest #Comments are for Posts from 2013, 2014 and 2015.

Distribution of POSTS across WORDS_LEVEL

POSTS are divided into 4 categories based on #WORDS:

  • Vlow -> 0 - 25
  • Low -> 25 - 75
  • Med -> 75 - 100
  • High -> 100 - Max
## 
## Vlow  Low  Med High 
##  757  113   43  162

Most of the Posts fall in the Vlow category, followed by the High category.

  • Every year, Words_Level Vlow has the highest number of posts!
  • For 2017, almost all Posts have Vlow Words_Level.
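
The Words_Level binning described above can be sketched as follows. The report does not say which category a boundary value (25, 75 or 100) belongs to, so this sketch assumes it falls into the lower category; the function name and the sample values (the first ten TOTAL_WORDS entries from the str() output) are for illustration only.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of Vlow, Low and Med; anything above 100 is High.
# Assumption: a boundary value (25, 75, 100) belongs to the lower category.
UPPER_EDGES = [25, 75, 100]
LEVELS = ["Vlow", "Low", "Med", "High"]


def words_level(total_words: int) -> str:
    """Map a word count to its Words_Level category."""
    return LEVELS[bisect_left(UPPER_EDGES, total_words)]


# First ten TOTAL_WORDS values shown in the str() output of the post data.
sample_total_words = [0, 0, 6, 0, 218, 0, 0, 0, 46, 46]
level_counts = Counter(words_level(n) for n in sample_total_words)
print(level_counts)
```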

Distribution of status TYPE across YEAR

  • The most frequent TYPE per year: 2012 -> Photo, 2013 -> Status, 2014 -> Photo, 2015 -> Photo, 2016 -> Photo, 2017 -> Video.
  • In 2015, 2016 and 2017, posts were shared across all 4 categories.
  • From 2011 to 2014, the Photo and Status types competed closely.
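
The most frequent TYPE in each year can be read off the per-type counts in the yearly table. The sketch below copies the TYPE_* columns from the str() output; the dictionary layout and names are assumptions made for illustration.

```python
# Per-type post counts copied from the str() output of the yearly table.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
type_counts = {
    "VIDEO": [0, 0, 1, 1, 6, 47, 320],
    "LINK": [0, 16, 28, 23, 6, 35, 26],
    "PHOTO": [5, 47, 64, 55, 20, 58, 49],
    "STATUS": [9, 32, 124, 45, 18, 9, 13],
}

# For each year, pick the TYPE with the largest count.
dominant_type = {}
for i, year in enumerate(years):
    dominant_type[year] = max(type_counts, key=lambda t: type_counts[t][i])

for year, post_type in dominant_type.items():
    print(year, post_type)
```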

Distribution of POSTS across the STATUS_TYPE faceted around YEAR

Importantly, the number of Video posts shared has grown over the years:

  • VIDEO posts first appear in 2013.
  • The count rose every year from 2014, reaching its peak so far in 2017!
  • VIDEO posts rose from 47 in 2016 to 320 in 2017, an increase of about 580%!
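
The year-over-year growth in VIDEO posts can be worked out from the TYPE_VIDEO counts in the yearly table. A minimal sketch, with a helper name chosen for illustration:

```python
# TYPE_VIDEO counts copied from the yearly table (years with at least one video).
video_posts = {2013: 1, 2014: 1, 2015: 6, 2016: 47, 2017: 320}


def percent_increase(old: int, new: int) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


increase_2017 = percent_increase(video_posts[2016], video_posts[2017])
fold_2017 = video_posts[2017] / video_posts[2016]
print(f"2016 -> 2017: +{increase_2017:.0f}% ({fold_2017:.1f}x)")
```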

Distribution of LIKES & COMMENTS with status TYPE

  • The Photo posts generate the MOST Likes.
  • They are followed by Status, then Link and Video.
  • The Status posts receive consistent #Likes across TOTAL_WORDS.

Distribution of HOUR at which POST is updated

The distribution of Posts over the hour of the day is clearly bimodal:

  • There are posts for every hour of the day.
  • The number of Posts rises steeply at hours 16, 17 and 18 (100+ at each hour).
  • The first peak appears at hour 5, which is due to a change of time zone (I am now in MST, compared to IST earlier).
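
To illustrate how a time zone change can move a peak, the sketch below converts a hypothetical evening timestamp in IST to MST. It assumes that all timestamps are rendered in a single zone, which the report does not state, and it uses fixed UTC offsets (IST = UTC+5:30, MST = UTC-7), ignoring daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: IST is UTC+5:30, MST is UTC-7 (daylight saving is ignored).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
MST = timezone(timedelta(hours=-7), "MST")

# Hypothetical post made in the early evening in India.
posted_ist = datetime(2014, 6, 1, 17, 30, tzinfo=IST)

# The same instant expressed in MST.
shown_mst = posted_ist.astimezone(MST)
print(posted_ist.strftime("%H:%M %Z"), "->", shown_mst.strftime("%H:%M %Z"))
```

Under these assumptions an evening post in IST lands in the early morning in MST, so evening activity from the IST period would show up as a separate early-morning peak.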

Distribution of LIKES & COMMENTS with HOUR

  • More #Comments are seen for posts made at hours 16, 17 and 18.
  • These are also the HOURS with the highest #POSTS.
  • HIGH Comments are also seen at HOURS 7-11.

Distribution of SENTIMENT with TOTAL_WORDS

  • The majority of the POSTS have POSITIVE Sentiment (0 to 1).
  • Posts with NEGATIVE Sentiment receive few COMMENTS.
  • Nearly all the Posts with High Words_Level are extremely positive!

Note: Although Posts with Words_Level High are extremely positive, Positivity declines as TOTAL_WORDS increases.
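
A simple way to label sentiment from the COMP column is to look at its sign. This is a sketch rather than the method used in the report: it assumes COMP is a compound score in the range -1 to 1, and it treats exactly 0 as neutral. The sample values are the first ten COMP entries from the str() output, where posts without any text score 0.

```python
from collections import Counter


def sentiment_label(comp: float) -> str:
    """Label a compound score by sign (assumption: 0 counts as neutral)."""
    if comp > 0:
        return "positive"
    if comp < 0:
        return "negative"
    return "neutral"


# First ten COMP values shown in the str() output of the post data.
sample_comp = [0, 0, 0.612, 0, -0.964, 0, 0, 0, 0.96, 0.852]
label_counts = Counter(sentiment_label(c) for c in sample_comp)
print(label_counts)
```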

Bringing together Multiple variables

  • Years 2013, 2014 and 2015 were dominated by posts of type STATUS.
  • The #Likes for posts with more TOTAL_WORDS have increased.
  • The #Posts are very low in 2015.
  • Mean LIKES (represented by the black line) have been almost uniform across the months, except in 2015 and 2016!
  • There is a steep dip in May 2016, which was when I quit my job to pursue further studies.
  • Mean LIKES in 2017 are low, comparable to the initial years of my Facebook journey!

Distribution of LIKES with TOTAL_WORDS across YEAR

As per the Latest Statistics, Posts of type Photo generate MOST Likes followed by Status.

Distribution of COMMENTS across the WORDS_LEVEL

The median Comments are highest for the High Words_Level, which is similar to the distribution of Likes.

Facet Wrapping the same with the YEAR

  • The range of the number of comments is the same for all years, as shown by the shared Y-scale.
  • 2014 is the only year with a considerable number of comments spread across all Words_Level categories!
  • The highest Median Comments value occurs in 2014, for the Low category!

Let’s try to find the reason behind the uniform comment trend in 2014!

For 2014, Posts of type Status and Photo dominate across every Words_Level. Since these two types generate the most Likes and Comments, they explain the comment trend in 2014.

Variation of COMMENTS with status TYPE

  • More Comments are mostly attributed to posts of TYPE STATUS.
  • POSTS of type LINK and VIDEO didn’t get many Comments!

  • Years 2013, 2014 and 2015 were all about STATUS posts, varying in length from very few words to very many!
  • For 2016 and 2017, PHOTOS received the majority of comments!
  • This holds even though Videos significantly outnumbered Photos in 2017 and were comparable in number in 2016!

Distribution of LIKES with HOUR

  • Mean LIKES are approximately the same throughout the day.
  • The peaks occur at hours 8, 16 and 20!

Distribution of COMMENTS with HOUR

  • Similar to LIKES, Mean COMMENTS are approximately the same throughout the day.
  • Peaks occur at hours 9 and 16.

Ratio of UNIQ_WORDS/TOTAL_WORDS

UNIQ_WORDS -> the number of words that remain after removing common words, also known as stop words.

From the plot above, the Uniq_Ratio is almost normally distributed, with the peak occurring at approx. 0.45. This means that the number of UNIQ_WORDS in a post was typically about 45% of its TOTAL_WORDS!
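
The ratio itself can be computed per post as UNIQ_WORDS / TOTAL_WORDS. The sketch below uses the first ten rows from the str() output and skips posts with no words to avoid dividing by zero; how the original analysis handled such posts is not stated, so that choice is an assumption.

```python
# First ten TOTAL_WORDS and UNIQ_WORDS values from the str() output.
total_words = [0, 0, 6, 0, 218, 0, 0, 0, 46, 46]
uniq_words = [0, 0, 0, 0, 94, 0, 0, 0, 22, 24]

# Ratio of unique (non-stop) words to total words, for posts that have text.
# Assumption: posts with TOTAL_WORDS == 0 are left out of the ratio.
uniq_ratio = [
    uniq / total
    for uniq, total in zip(uniq_words, total_words)
    if total > 0
]

print([round(r, 3) for r in uniq_ratio])
```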