Who is the happiest people?

The main goal of this blog is to find out who is the happiest people.

Are they female or male?

Are they married?

Do they have children?

Where are they come from?

hm_data <- read_csv("/Users/zhongming/Documents/GitHub/Spring2019-Proj1-silverbulletKID/output/processed_moments.csv")
## Parsed with column specification:
## cols(
##   hmid = col_integer(),
##   wid = col_integer(),
##   reflection_period = col_character(),
##   original_hm = col_character(),
##   cleaned_hm = col_character(),
##   modified = col_character(),
##   num_sentence = col_integer(),
##   ground_truth_category = col_character(),
##   predicted_category = col_character(),
##   id = col_integer(),
##   text = col_character()
## )
urlfile<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv'

urlfile2<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/senselabel.csv'

demo_data <- read_csv(urlfile)
## Parsed with column specification:
## cols(
##   wid = col_integer(),
##   age = col_character(),
##   country = col_character(),
##   gender = col_character(),
##   marital = col_character(),
##   parenthood = col_character()
## )
sense_data<-read_csv(urlfile2)
## Parsed with column specification:
## cols(
##   hmid = col_integer(),
##   tokenOffset = col_integer(),
##   word = col_character(),
##   lowercaseLemma = col_character(),
##   POS = col_character(),
##   MWE = col_character(),
##   offsetParent = col_integer(),
##   supersenseLabel = col_character()
## )
hm_data <- hm_data %>%
  inner_join(sense_data, by="hmid") 

hm_data<-hm_data %>%
  inner_join(demo_data, by = "wid") %>%
  select(wid,
         predicted_category,
         num_sentence,
         gender, 
         marital, 
         parenthood,
         reflection_period,
         age, 
         country, 
         ground_truth_category,
         POS,
         supersenseLabel,
         text) %>%
  mutate(count = sapply(hm_data$text, wordcount)) %>%
  filter(gender %in% c("m", "f")) %>%
  filter(marital %in% c("single", "married")) %>%
  filter(parenthood %in% c("n", "y")) %>%
  filter(reflection_period %in% c("24h", "3m")) %>%
  mutate(reflection_period = fct_recode(reflection_period, 
                                        months_3 = "3m", hours_24 = "24h"))

bag_of_words <-  hm_data %>%
  unnest_tokens(word, text)

library(dplyr)
hm_bigrams <- hm_data %>%
  filter(count!=1) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

bigram_counts <- hm_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  count(word1, word2, sort = TRUE)

Sentiment Analysis among country level

library(tidyr)
library(tidytext)
sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = country) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(fct_reorder(index,sentiment, .fun = mean,.desc = T),sentiment))+geom_bar(stat = "identity")+xlab("country")+theme(text = element_text(size=5),axis.text.x = element_text(angle = 90))

ASM(American Samoa), HRV(Croatia), NOR(Norway), TCA(Turks and Caicos Islands) are the top 4 happiest countries.

Norway is one of the happiest country in the world. Forbs ranked Norway as the second happiest country in the world in 2018. It confirms the results we have here.

I also visulize the sentiments across country level.

Sentiment across the world

Sentiment across the world

Let’s take a look at the happiest countries.

What do people from the most happiest country do?

bag_of_words %>% filter(country=="NOR"|country=="HRV"|country=="ASM") %>%
count(word, sort = TRUE) %>% wordcloud2(size = 0.6,rotateRatio = 0)

People seems to be fans for cooking and they love reciving gifts;have a great connection with natrual envirnoments;love dancing.

Is there any difference among ages?

sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = age) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=7),axis.text.x = element_text(angle = 90))

Poeple are happier in their 30s and 70s.

Who is happier? Male of Feamle

sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = gender) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=5),axis.text.x = element_text(angle = 90))

There is no explicit difference.

Who is happier? Having children or not?

sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = parenthood) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")

Poeple has no parenthood are happier.

Who is happier? Married or single?

sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index =marital) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")

Single is much happier.

What makes people happy? Which Topics?

sentiment <-bag_of_words %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index =ground_truth_category) %>% 
  summarise(sentiment = mean(score)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
ggplot(data=sentiment,aes(index,sentiment))+geom_bar(stat = "identity")+theme(text = element_text(size=10),axis.text.x = element_text(angle = 90))

Leisure time makes people the most happiest. But leisure time is a very general idea. We want to dig deeper inside.

What makes people happy in leisure time?

I try to use Topic modeling to see the difference. I choose to use bigram because bigram contains more meaning compared to single word.

library(topicmodels)
d=filter(hm_bigrams,predicted_category=="leisure")[c(1,14,13)]
d=d %>%cast_dtm(wid, bigram, count)
ap_lda <- LDA(d, k = 2, control = list(seed = 1234))
library(tidytext)
ap_topics <- tidy(ap_lda, matrix = "beta")
library(ggplot2)
library(dplyr)

ap_top_terms <- ap_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

ap_top_terms %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Now. We know that the best thing in leasuire is video gamse!

Conclusion

In conclusion, those people who are happiest may have the following features:

1.Single

2.No Child

3.Playing Video Games

4.Live in Norway(or other happiest countries)

5.In their 30s or 70s.

I personally fulfill the top 3 requirements. But I am not so happy. :)

Sentiment across the world

Sentiment across the world

Maybe my next plan is to immigrate to Norway.

Sentiment across the world

Sentiment across the world