The main goal of this blog post is to find out who the happiest people are.
Are they female or male?
Are they married?
Do they have children?
Where do they come from?
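# Packages used below; wordcount() is assumed to come from the ngram package.
library(tidyverse)
library(tidytext)
library(ngram)
library(wordcloud2)
hm_data <- read_csv("/Users/zhongming/Documents/GitHub/Spring2019-Proj1-silverbulletKID/output/processed_moments.csv")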
## Parsed with column specification:
## cols(
## hmid = col_integer(),
## wid = col_integer(),
## reflection_period = col_character(),
## original_hm = col_character(),
## cleaned_hm = col_character(),
## modified = col_character(),
## num_sentence = col_integer(),
## ground_truth_category = col_character(),
## predicted_category = col_character(),
## id = col_integer(),
## text = col_character()
## )
urlfile<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv'
urlfile2<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/senselabel.csv'
demo_data <- read_csv(urlfile)
## Parsed with column specification:
## cols(
## wid = col_integer(),
## age = col_character(),
## country = col_character(),
## gender = col_character(),
## marital = col_character(),
## parenthood = col_character()
## )
sense_data<-read_csv(urlfile2)
## Parsed with column specification:
## cols(
## hmid = col_integer(),
## tokenOffset = col_integer(),
## word = col_character(),
## lowercaseLemma = col_character(),
## POS = col_character(),
## MWE = col_character(),
## offsetParent = col_integer(),
## supersenseLabel = col_character()
## )
hm_data <- hm_data %>%
inner_join(sense_data, by="hmid")
hm_data<-hm_data %>%
inner_join(demo_data, by = "wid") %>%
select(wid,
predicted_category,
num_sentence,
gender,
marital,
parenthood,
reflection_period,
age,
country,
ground_truth_category,
POS,
supersenseLabel,
text) %>%
mutate(count = sapply(text, wordcount)) %>%
filter(gender %in% c("m", "f")) %>%
filter(marital %in% c("single", "married")) %>%
filter(parenthood %in% c("n", "y")) %>%
filter(reflection_period %in% c("24h", "3m")) %>%
mutate(reflection_period = fct_recode(reflection_period,
months_3 = "3m", hours_24 = "24h"))
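To see what the filtering and recoding step does, here is a minimal sketch on a made-up table. The `toy_demo` data and its values are assumptions for illustration only; they are not rows from HappyDB.

```r
library(dplyr)
library(forcats)

# Hypothetical demographic rows (not real HappyDB data).
toy_demo <- tibble(
  wid = 1:5,
  gender = c("m", "f", "o", "f", "m"),
  reflection_period = c("24h", "3m", "24h", "3m", "24h")
)

toy_clean <- toy_demo %>%
  # Keep only the two gender codes used in the analysis.
  filter(gender %in% c("m", "f")) %>%
  # Rename the factor levels: new_name = "old_name".
  mutate(reflection_period = fct_recode(reflection_period,
                                        months_3 = "3m", hours_24 = "24h"))

print(toy_clean)
print(levels(toy_clean$reflection_period))
```

The row with gender "o" is dropped, and the remaining reflection periods are stored as a factor with readable level names.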
bag_of_words <- hm_data %>%
unnest_tokens(word, text)
library(dplyr)
hm_bigrams <- hm_data %>%
filter(count!=1) %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
bigram_counts <- hm_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
count(word1, word2, sort = TRUE)
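The bigram step is easier to follow on a tiny example. This is only a sketch with two made-up sentences (the `toy` table is an assumption, not part of the data set), but it uses the same `unnest_tokens()`, `separate()`, and `count()` calls as above.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Two hypothetical happy moments.
toy <- tibble(
  id = 1:2,
  text = c("I played video games", "played video games today")
)

# One row per bigram; unnest_tokens() lowercases the text by default.
toy_bigrams <- toy %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Split each bigram into its two words and count the pairs.
toy_counts <- toy_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  count(word1, word2, sort = TRUE)

print(toy_counts)
```

Each four-word sentence yields three bigrams, and pairs shared by both sentences, such as "video games", are counted twice.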
library(tidyr)
library(tidytext)
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = country) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data = sentiment, aes(fct_reorder(index, sentiment, .fun = mean, .desc = TRUE), sentiment)) +
  geom_bar(stat = "identity") +
  xlab("country") +
  theme(text = element_text(size = 5), axis.text.x = element_text(angle = 90))
ASM (American Samoa), HRV (Croatia), NOR (Norway), and TCA (Turks and Caicos Islands) are the four happiest countries by mean AFINN score.
Norway is one of the happiest countries in the world. Forbes ranked Norway as the second happiest country in the world in 2018, which is consistent with the result here.
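The same join-and-average pattern is used for every grouping in this post, so it can be wrapped in a small helper. This is a sketch under assumptions: `mean_sentiment_by()`, `toy_lexicon`, and `toy_tokens` are names and data I made up for illustration, and the lexicon column is called `score` to match the code above (some newer releases of the AFINN data may name it `value` instead, so check your installed version).

```r
library(dplyr)

# Toy stand-in for the AFINN lexicon (hypothetical words and scores).
toy_lexicon <- tibble(
  word  = c("happy", "love", "bad", "great"),
  score = c(3, 3, -3, 3)
)

# Toy stand-in for bag_of_words: one row per token plus a grouping column.
toy_tokens <- tibble(
  country = c("NOR", "NOR", "NOR", "USA", "USA", "USA"),
  word    = c("happy", "the", "great", "love", "bad", "day")
)

# Hypothetical helper: mean lexicon score per level of `group_col`.
mean_sentiment_by <- function(tokens, lexicon, group_col) {
  tokens %>%
    inner_join(lexicon, by = "word") %>%   # words without a score are dropped
    group_by(index = .data[[group_col]]) %>%
    summarise(sentiment = mean(score), .groups = "drop") %>%
    mutate(method = "AFINN")
}

country_sentiment <- mean_sentiment_by(toy_tokens, toy_lexicon, "country")
print(country_sentiment)
```

Note that words missing from the lexicon ("the", "day") do not count toward the mean, so a group's score depends only on its scored words. A country with very few moments can therefore end up at the top of the ranking on the strength of a handful of words.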
I also visualize the sentiment at the country level.
Sentiment across the world
Let’s take a look at the happiest countries.
bag_of_words %>%
  filter(country %in% c("NOR", "HRV", "ASM")) %>%
  count(word, sort = TRUE) %>%
  wordcloud2(size = 0.6, rotateRatio = 0)
People in these countries seem to be fans of cooking, love receiving gifts, feel a strong connection with the natural environment, and love dancing.
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = age) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=7),axis.text.x = element_text(angle = 90))
People are happier in their 30s and 70s.
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = gender) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=5),axis.text.x = element_text(angle = 90))
There is no noticeable difference between genders.
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = parenthood) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")
People without children are happier.
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index =marital) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")
Single people are much happier.
sentiment <-bag_of_words %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index =ground_truth_category) %>%
summarise(sentiment = mean(score)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=10),axis.text.x = element_text(angle = 90))
Leisure time makes people the happiest. But leisure is a very general idea, so we want to dig deeper.
I use topic modeling to look inside the leisure category. I work with bigrams because a bigram carries more meaning than a single word.
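Before running LDA on the real data, here is a rough sketch of what it produces on a toy bigram table. The documents, bigrams, and counts in `toy_counts` are invented for illustration; only the `cast_dtm()`, `LDA()`, and `tidy()` calls mirror the code that follows.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical bigram counts for four documents (not real HappyDB data).
toy_counts <- tibble(
  doc  = c("a", "a", "b", "b", "c", "c", "d", "d"),
  term = c("video games", "play online", "video games", "new level",
           "long walk", "nice park", "long walk", "sunny day"),
  n    = c(3, 1, 2, 1, 3, 1, 2, 1)
)

# Document-term matrix: one row per document, one column per bigram.
toy_dtm <- toy_counts %>% cast_dtm(doc, term, n)

# Fit a two-topic model with a fixed seed for reproducibility.
toy_lda <- LDA(toy_dtm, k = 2, control = list(seed = 1234))

# beta is the per-topic probability of each term.
toy_beta <- tidy(toy_lda, matrix = "beta")

# Within each topic the term probabilities sum to 1.
beta_sums <- toy_beta %>%
  group_by(topic) %>%
  summarise(total = sum(beta))

print(toy_beta)
print(beta_sums)
```

Each topic is a probability distribution over bigrams, so the terms with the highest `beta` in a topic are the ones that characterize it. Which bigrams land in which topic depends on the data and the seed.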
library(topicmodels)
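# Keep the worker id, the bigram, and the count column by name instead of by position.
d <- hm_bigrams %>%
  filter(predicted_category == "leisure") %>%
  select(wid, bigram, count)
d <- d %>% cast_dtm(wid, bigram, count)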
ap_lda <- LDA(d, k = 2, control = list(seed = 1234))
library(tidytext)
ap_topics <- tidy(ap_lda, matrix = "beta")
library(ggplot2)
library(dplyr)
ap_top_terms <- ap_topics %>%
group_by(topic) %>%
top_n(10, beta) %>%
ungroup() %>%
arrange(topic, -beta)
ap_top_terms %>%
mutate(term = reorder(term, beta)) %>%
ggplot(aes(term, beta, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip()
Now we know that the best thing in leisure is video games!
In conclusion, the happiest people may have the following features:
1. Single
2. No children
3. Play video games
4. Live in Norway (or another of the happiest countries)
5. In their 30s or 70s
I personally meet the first three criteria, but I am not so happy. :)
Maybe my next plan is to immigrate to Norway.