Happiness is a life-long pursuit. Everyone experiences some short-term “happy moments” in their life and continues to seek happiness. Everyone has different ways of nurturing, feeding, and growing their happiness, but how do you quantify that growth? Is there a general equation or rule for happiness that applies to everyone?
Some people have tried to find out.
Mo Gawdat, chief business officer of Google X and author of “Solve for Happy”, came up with a happiness equation stating that “your happiness is equal to or greater than the difference between the events of your life and your expectations of how life should behave”.
Happiness equation: Happiness ≥ Events − Expectations
Looking purely at the mathematics of this equation, the easiest way to stay happy through life is to keep your expectations low.
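To make the arithmetic concrete, here is a minimal sketch in R. It assumes, purely for illustration, that events and expectations can be scored on the same numeric scale. The function name `happiness_floor` and the scores are hypothetical and do not come from the book.

```r
# Hypothetical helper: the lower bound implied by
# happiness >= events - expectations
happiness_floor <- function(events, expectations) {
  events - expectations
}

# Same event score (assumed 6 on an arbitrary scale), different expectations
high_expectations <- happiness_floor(events = 6, expectations = 8)  # -2
low_expectations  <- happiness_floor(events = 6, expectations = 3)  #  3

c(high = high_expectations, low = low_expectations)
```

With the events held fixed, lowering the expectation score is the only thing that raises the bound, which is the point made above.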
According to Mo, low expectations are all you need to be happy, but that raises a question: how do you determine and quantify expectations? He tries to answer this in “Solve for Happy” in various abstract ways, but the problem seems to call for a more concrete approach, so I looked elsewhere.
I decided to look into a more data-driven attempt at solving for happiness:
HappyDB is a corpus of 100,000 crowd-sourced happy moments. The goal of the project is to gain a deeper understanding of the causes of happiness that can be gathered from text.
It is simply a collection of happy moments described by individuals in the past 24 hours, as seen in an example below:
In this project, I will use the HappyDB corpus dataset to get a more meaningful understanding of the moments that grow happiness.
A logical first step is to find out what people are talking about in their happy moments.
Let’s take a look at the HappyDB moments data!
After cleaning the happy moments data, combining it with demographic data, and filtering out stop words that don’t hold any relevant information, I created a word cloud that displays the 200 most mentioned words across all the happy moments.
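library(dplyr)
library(tidytext)
library(wordcloud)

# Split each cleaned happy moment into one word per row
bag_of_words <- new_data %>%
  unnest_tokens(word, text)
# Count how often each word appears, most frequent first
word_count <- bag_of_words %>%
  count(word, sort = TRUE)
set.seed(1234)  # makes the word placement reproducible
wordcloud(words = word_count$word, freq = word_count$n, min.freq = 1, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))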
FRIEND
DAY
TIME
FAMILY
HOME
These are just some of the most mentioned words in people’s happy moments.
This suggests that, more than anything else, people find happiness in:
- each other
- their time
- their families
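The cleaning and counting steps described earlier can be sketched on a tiny example. This is only an illustration under assumptions: `toy_moments` is made-up data standing in for the cleaned HappyDB moments, and the stop word list is the `stop_words` dataset that ships with tidytext, which may differ from the list actually used for the word cloud.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the cleaned data (one moment per row in `text`)
toy_moments <- tibble(
  text = c("I had dinner with my friend",
           "My friend visited my family at home",
           "I spent the day with my family")
)

top_words <- toy_moments %>%
  unnest_tokens(word, text) %>%           # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drop common stop words such as "my"
  count(word, sort = TRUE)                # most frequent words first

head(top_words, 5)
```

Removing stop words before counting keeps filler words like “my” and “the” from crowding out the content words that the word cloud is meant to surface.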
Based on their crowd-sourced moments, scientists who collaborated on the HappyDB project have separated happiness into potential categories:
Achievement
Affection
Bonding
Enjoying Moments
Exercise
Leisure
Nature
What do we think the most popular category is?
Based on the word cloud and simple intuition, I would guess that affection and bonding would be the most popular categories! Let’s find out!
library(dplyr)
library(ggplot2)

# Count the happy moments in each predicted category
categ_count <- data.frame(count(new_data, predicted_category))
# Bar chart of the counts, with each bar labeled by its count
ggplot(categ_count, aes(x = predicted_category, y = n)) +
  geom_bar(fill = "maroon", stat = "identity", color = "black") +
  geom_text(aes(label = n), vjust = -0.3) +
  xlab("Predicted Categories") +
  ylab("Count") +
  ggtitle("Count of categories for all happy moments")
Just as I predicted, Affection is the most popular category, though Achievement is nearly tied with it. My second guess, Bonding, appears only about a third as often, but it still ranks third.
Even though the word cloud didn’t make it obvious, this chart indicates that people’s happiness comes primarily from both giving (affection) and accomplishing (achievement).
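Comparisons such as “nearly tied” or “about a third as often” are easier to check with shares and ratios than by reading bar heights. The sketch below shows one way to compute them. The labels and counts in `toy` are invented for illustration and are not the HappyDB results.

```r
library(dplyr)

# Hypothetical toy labels; the real ones live in new_data$predicted_category
toy <- tibble(
  predicted_category = c(rep("affection", 6), rep("achievement", 5),
                         rep("bonding", 2), "leisure")
)

categ_share <- toy %>%
  count(predicted_category, sort = TRUE) %>%  # moments per category, largest first
  mutate(share = n / sum(n),                  # fraction of all moments
         ratio_to_top = n / max(n))           # size relative to the top category

categ_share
```

Applied to the real data, `ratio_to_top` would show directly how far each category trails the most common one.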
It would be interesting to see whether that holds true for everyone.
The HappyDB project includes demographic data on people’s age, so let’s find out what makes different age groups happy!
Before we can categorize ages, let’s look at how old everyone who contributed to the HappyDB project is!
# Convert age to numeric; non-numeric entries, if any, become NA
new_data$age <- suppressWarnings(as.numeric(new_data$age))
missing.age <- is.na(new_data$age)
new_data <- new_data[!missing.age, ]  # removes all missing ages
# AGE HISTOGRAM
hist(new_data$age, breaks = 500, xlim = c(0, 100), ylim = c(0, 7000), xlab = "Age", ylab = "Count", main = "How old is everyone?", col = "turquoise")
Hmm... it looks like most people are in their 30s. To check that impression, I calculated the mean age and found it to be 31.8.
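A check like this can be sketched as follows. The values in `raw_age` are made up, and the presence of non-numeric entries in the real age column is an assumption. The median is included alongside the mean because it is less affected by a few extreme ages.

```r
# Hypothetical raw ages; the real column is new_data$age
raw_age <- c("25", "31", "prefer not to say", "40", NA, "28")

# Non-numeric entries become NA (the coercion warning is expected, so silence it)
age <- suppressWarnings(as.numeric(raw_age))

mean_age   <- mean(age, na.rm = TRUE)    # average of the valid ages
median_age <- median(age, na.rm = TRUE)  # middle value of the valid ages

c(mean = mean_age, median = median_age)
```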
The International Standardized Survey Classifications group classifies age groups as:
**15-24: Teens/Mid Twenties**
**25-34: Young Adults**
**35-64: Adults**
**65+: Seniors**
which is how I decided to separate age groups for this data analysis project.
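One way to apply these groups in R is base R’s `cut()`. This is a minimal sketch, not necessarily how the grouping was done for this project. The ages below are invented, and any age under 15 falls outside every interval and becomes `NA`.

```r
# Hypothetical ages; the real values come from new_data$age
age <- c(19, 24, 25, 34, 35, 64, 65, 80)

age_group <- cut(
  age,
  breaks = c(15, 25, 35, 65, Inf),  # lower bound of each group, then infinity
  labels = c("Teens/Mid Twenties", "Young Adults", "Adults", "Seniors"),
  right  = FALSE                    # intervals are [15, 25), [25, 35), ...
)

table(age_group)
```

Setting `right = FALSE` makes each interval include its lower bound, so 25 lands in Young Adults and 65 lands in Seniors, matching the ranges listed above.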
LET’S DO SOME MONEY ANALYSIS