Social media is pervasive in modern society. Most of us interact with at least one platform on a daily basis, yet we seldom ask: how do social media platforms make money?
Social media monetizes our attention through advertising. If you have ever noticed ads of any format in your Facebook feed, that is where the cash flows in.
Indeed, this is by far the most common monetization model for tech companies. But the holy grail for businesses like Facebook goes far beyond bombarding users with ads: advertising is done in a "targeted" fashion, in marketing parlance, meaning they can cherry-pick the ads most likely to resonate with you rather than serving ones that are totally irrelevant. They can do so because of the huge volume of data they garner as a social platform.
Data mining is what happens under the hood. By analyzing the huge volume of data that users voluntarily "submit", Facebook is able to understand us: our likes and dislikes, our political views, even how we feel. Facebook then buckets us into "audiences" based on these insights, which can be readily sold to advertisers to inform better marketing decisions.
In this paper, we take advantage of Facebook's data to understand sentiment. In Part I, we focus on analyzing original posts (posts written by the author) by means of sentiment analysis; in Part II, we analyze shared posts (posts shared by the author) to understand which topics she is most interested in.
Due to authorization limitations, only the author's own data can be accessed.
We use the Facebook Graph API to acquire the user's (Rina Lin's) posts and likes data since the account was established (a minimal sketch of the API call follows below);
Data is ingested, parsed, and saved as a data frame using Python;
Data analysis is accomplished in R.
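The actual ingestion script is in Python; purely for illustration, an equivalent call from R might look like the sketch below. It assumes the httr and jsonlite packages, a valid user access token stored in a hypothetical FB_ACCESS_TOKEN environment variable, and Graph API version v2.10 with the message and created_time fields of the posts endpoint.
library(httr)
library(jsonlite)
# Illustrative sketch only; the actual ingestion was scripted in Python.
token <- Sys.getenv("FB_ACCESS_TOKEN")  # hypothetical location of the access token
resp <- GET("https://graph.facebook.com/v2.10/me/posts",
    query = list(fields = "message,created_time", access_token = token))
page <- fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)
head(page$data)  # one page of posts; the response's paging element points to the next page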
# Data manipulation and tidy text mining
library(dplyr)
library(tidyr)
library(tidytext)
# Plotting, word clouds, and network graphs
library(ggplot2)
library(wordcloud)
library(igraph)
library(ggraph)
# Pairwise counts and correlations on tidy data
library(widyr)
# Stop-word list and NRC sentiment lexicon
data(stop_words)
nrc_sentiments <- get_sentiments("nrc")
# Parse Facebook's ISO-8601 timestamps (recorded in GMT) and convert to a local time zone
TimeParser <- function(Time, YourTz) {
    timeDate <- Time %>% as.POSIXct(format = "%Y-%m-%dT%H:%M:%S", tz = "GMT")
    attr(timeDate, "tzone") <- YourTz
    return(timeDate)
}
posts <- read.csv("posts_fb.csv", stringsAsFactors = FALSE)
posts$local_time <- TimeParser(posts$created_time, "America/Chicago")
posts <- posts %>% mutate(Year = format(local_time, "%Y") %>% as.numeric(),
    Month = format(local_time, "%m") %>% as.numeric(),
    Day = format(local_time, "%d") %>% as.numeric(),
    Hour = format(local_time, "%H") %>% as.numeric())
# Keep only posts that carry a text message ("None" marks a missing message in the Python export)
posts.original <- posts %>% filter(message != "None")
posts.raw <- posts.original %>%
    select(Year, Month, Day, message) %>%
    mutate(message = iconv(message, "latin1", "ASCII", sub = "")) %>%  # drop non-ASCII characters
    arrange(Year, Month, Day)
posts.raw$message_id <- as.numeric(rownames(posts.raw))
posts.words <- posts.raw %>% unnest_tokens(word, message)
posts.words %>%
    anti_join(stop_words, by = "word") %>%
    count(word) %>%
    with(wordcloud(word, n, max.words = 100))
We transform the word frequency counts into a word cloud to make them more visually appealing. The most frequently used words are "life", "love", "time", and "hate", followed by "world", "chicago", and "friends". Note also that many country and city names emerge from the text ("chicago", "beijing", "china", "holland", "lowland", "paris", and "evanston").
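For reference, the raw counts behind the cloud can be listed directly; a quick sketch using the same objects (the exact figures depend on the data):
posts.words %>%
    anti_join(stop_words, by = "word") %>%
    count(word, sort = TRUE) %>%
    head(10)  # ten most frequent words after removing stop words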
# Split posts into bigrams so that negations ("not good") can be detected
posts.2gram <- posts.raw %>%
    unnest_tokens(bigram, message, token = "ngrams", n = 2)
negation <- c("not", "don't", "doesn't", "haven't", "hasn't", "wouldn't", "won't",
"can't", "couldn't", "isn't", "aren't", "ain't", "wasn't", "weren't")
# Bigrams whose first word is a negation: the NRC sentiment of the second word
# is likely the opposite of what was actually meant.
negation.misclassified <- posts.2gram %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    # filter(!word1 %in% stop_words$word) %>%
    # filter(!word2 %in% stop_words$word) %>%
    count(message_id, word1, word2, sort = TRUE) %>%
    filter(word1 %in% negation) %>%
    inner_join(nrc_sentiments, by = c(word2 = "word"))
negation.misclassified
## # A tibble: 2 x 5
## message_id word1 word2 n sentiment
## <dbl> <chr> <chr> <int> <chr>
## 1 85 can't resist 1 negative
## 2 135 doesn't facilitate 1 positive
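Each of these two words was tallied under the sentiment of the word on its own ("resist" as negative, "facilitate" as positive), even though the preceding negation flips the intended meaning. The correction below therefore subtracts twice the negated word's count from that sentiment: once to undo the miscount and once to push the score the other way. A toy illustration of the arithmetic, with hypothetical counts:
# Hypothetical message with 3 words tagged "negative", one of which ("resist")
# was preceded by "can't" and so should not count toward negative.
n.x <- 3                  # raw negative count from the NRC join
n.y <- 1                  # negated words that were tagged "negative"
true.n <- n.x - 2 * n.y   # adjusted negative score: 1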
posts.sentiment <- posts.words %>%
    inner_join(nrc_sentiments, by = "word") %>%
    arrange(Year, Month, Day) %>%
    count(Year, Month, Day, message_id, sentiment) %>%
    left_join(negation.misclassified, by = c("message_id", "sentiment")) %>%
    mutate(true.n = ifelse(is.na(n.y), n.x, n.x - 2 * n.y)) %>%
    select(Year, Month, Day, message_id, sentiment, true.n) %>%
    group_by(Year, Month, Day, sentiment) %>%
    summarise(n = sum(true.n)) %>%
    filter(sentiment == "negative" | sentiment == "positive")
posts.sentiments.month <- posts.sentiment %>% group_by(Year, Month, sentiment) %>%
summarise(score = sum(n))
posts.sentiments.month$id <- as.numeric(rownames(posts.sentiments.month))
posts.sentiments.month %>% ggplot(aes(x = id, y = score, fill = sentiment)) +
geom_col() + facet_grid(sentiment ~ .)
The negative score spikes around the 30th month after May 2011, which is November 2013. There are also smaller spikes around the 60th and 90th months. It appears that every 30 months or so I feel low!
Also note that from May 2011 to August 2012 my negative score stays at an overall high level compared to the rest of the period. This is consistent with my personal history, as far as I can accurately remember.
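The calendar month behind any spike can be read off the aggregated table directly, for example (a quick check; the id value is whatever the chart shows):
posts.sentiments.month %>%
    filter(id == 30) %>%   # hypothetical id of the spike of interest
    select(Year, Month, sentiment, score)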
Now let's look at the chart for the positive score. My positive score spikes around the 52nd, 70th, and 81st months.
Overall, my positive score is higher than my negative score, suggesting that the subject of our analysis might be a more optimistic person by nature.
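One quick way to back this up with the data already computed (a sketch; the exact totals depend on the data):
posts.sentiment %>%
    group_by(sentiment) %>%
    summarise(total = sum(n))  # overall positive vs. negative score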