Background

Happy People

Are you happy?

What seems like a simple yes-or-no question is quite tricky for many people to answer. Why does a seemingly easy question with no wrong answer feel so difficult? To understand why, this project takes an in-depth look into the science of happiness and serves as a simple guide to understanding happiness better.

  • All links are SFW and are only meant to improve understanding of hard-to-define subject areas.

What is happiness?

Happiness is like a costume: it takes many forms and, by extension, has many different definitions depending on who you ask.

According to the Oxford dictionary, happiness is “Feeling or showing pleasure or contentment.”

Positive psychology researcher Sonja Lyubomirsky describes happiness as “the experience of joy, contentment, or positive well-being, combined with the sense that one’s life is good, meaningful and worthwhile.”

The list goes on, and what truly defines happiness is often debated. However, when philosophers and psychologists define happiness, the definitions take one of two forms: one refers to a state of mind, and the other is based on well-being.

The state-of-mind definition views happiness through a psychological lens. It examines mental states, in which life satisfaction, pleasure or positive emotions are the answers to what drives happiness.

The well-being definition views happiness through a more philosophical lens. It describes what makes a person happy in terms of benefits, which are seen as actionable characteristics that are good for a person.

Some of the questions that well-being asks:

  • what is good for you?
  • what is better for you?
  • what serves your interests?
  • what is more desirable for you?

What do these philosophers say about happiness?

1) Aristotle

Aristotle is a man of many titles: an early Greek philosopher and polymath, described as a foundational influence on modern science and philosophy. Aristotle believes that the central purpose of life is to achieve happiness, which he describes as “dependent on ourselves.” In his book, Nicomachean Ethics, he asks what the purpose of human existence is. Surveying his surroundings, he sees that everyone is pursuing pleasure, wealth or a good reputation.

He believes that happiness is only found as “an ultimate end,” that an act in pursuit of happiness must be “self-sufficient and final,” and that, in a grander sense, it must be “attainable by man.” Many of these common pursuits cannot be fully attained. Can one truly obtain all the wealth, all the pleasure, or an impeccable reputation? Here he shows his concern with the state-of-mind definition of happiness, since these goods are momentary rather than final.

He defines happiness as:

…the function of man is to live a certain kind of life, and this activity implies a rational principle. The function of a good man is the good and noble performance of these, and if any action is well performed, it is performed in accordance with the appropriate excellence: if this is the case, then happiness turns out to be an activity of the soul in accordance with virtue. (Nicomachean Ethics, 1098a13)

He believes that to achieve happiness, one must have good moral character, and to have good moral character, one has to abide by “complete virtues,” which are good character traits, or behaviors showing high moral standards.

Aristotle believes that short-term pleasures that make someone happy can sometimes cause long-term pain. One example is taking drugs: someone may take drugs to feel happier now, but once the drugs wear off, the person may feel miserable, which starts a cycle of addiction. To feel happy again, one must keep taking drugs to replenish the temporary feeling of happiness.

He also believes that what separates humans from animals is the capacity for rational thought, or reason, and that happiness is the product of one’s reason.

To summarize, Aristotle sets out a few conditions for achieving happiness: happiness is an ultimate end and the purpose of human existence; to achieve happiness, one must exercise virtue throughout life; and happiness depends on one’s reason.

2) Epicurus

Epicurus is a Greek philosopher and the founder of the school of thought known as Epicureanism, which holds that pleasure is the goal of life. Epicurus agrees with Aristotle that the goal of life is to achieve happiness; however, he differs in how this is done.

Hedonism is the pursuit of pleasure, and Epicurus believes that engaging in hedonistic activities, like consuming alcohol, is normal and natural. However, over-indulgence in these activities can decrease the overall amount of pleasure and, by extension, cause pain; for instance, over-indulgence in alcohol can lead to alcoholism, a dependence on alcohol.

To maximize long-term pleasure and tranquility (peace of mind), Epicurus identifies three key obstacles that we must overcome to achieve happiness: the tyranny of desire, the fear of the gods, and the fear of death.

The Tyranny Of Desire

Epicurus believes that pleasure is nothing more than “desire-satisfaction,” while pain is “desire-frustration.” He notes that there are two strategies for achieving pleasure: either fulfill the desire or eliminate it.

He notes that some desires are natural and necessary, like food, shelter and friendship. These are impossible to eliminate, but they are easily satisfied and provide long-term satisfaction.

In contrast, “vain and empty” desires include power, wealth, fame and the accumulation of material possessions, which are difficult to attain and hard to maintain. Like Aristotle, he questions whether these desires can ever be fulfilled: if I have wealth, I can always obtain more wealth, and if I have 10 million Twitter followers, I can still get more followers. Is there a point where you can truly have enough?

With these unattainable desires, achieving something significant, like making a million dollars, brings a positive surge, but afterwards one reverts to a baseline state of happiness. This effect is known as the hedonic treadmill. Modern research supports the hedonic treadmill, which states that people repeatedly return to their baseline level of happiness, regardless of what happens to them.

Hedonic Treadmill

This treadmill prevents people from achieving long-term gains in happiness once they have hit their goals. For example, the lasting effect of making a million dollars is about the same as that of making two million dollars. Epicurus would suggest that the better way is to reduce unattainable desires, or rather: “Do not spoil what you have by desiring what you have not; remember that what you now have was once among the things you only hoped for.”

Epicurus further states that limitless desires contribute very little to tranquility and often inflict more pain than pleasure. The first path to pleasure is to eliminate “vain and empty” desires; a far better long-term solution is to replace the pursuit of limitless desires with natural and necessary ones.

Overcoming the Fear of the Gods

If Epicurus believed in the gods at all, he did not believe that they were interested in, or even concerned with, human affairs.

In Greek culture, the gods were claimed to be perfect, which he believed was inconsistent with their being concerned with the administration of human affairs in an imperfect and frustrating world. He thought of the gods as “psychological projections of peoples’ ideal selves”; simply put, to be a god was to be in control of one’s own happiness.

Epicurus was an early believer in atomism, the belief that the universe (and by extension reality) is composed of small, indivisible and indestructible building blocks known as atoms.

Epicurus claimed that bodies are the result of the random motion and combination of these indivisible atoms. He argued against the infinite divisibility of matter, because otherwise there would be nothing to stop bodies from being infinitely divided. Since we know that bodies exist, there must be a limit to this division, which stands in contrast to the belief that the gods created the universe.

The realization that the gods either do not exist or do not care about human affairs has a strong ethical element for him. For many, religion is a form of consolation from reality. He viewed it as the opposite: he saw religion as a “source of anxiety and fear” that should be extinguished through a better understanding of the natural world.

He believed that if the gods existed, their intentions should be questioned. As a result, he constructed an argument to test whether the gods are who we say they are, known as the problem of evil.

Problem of Evil

In the problem of evil, Epicurus points to the inconsistency between the nature of the gods and the presence of evil: evil causes pain, and pain is something we do not want, so he questions whether the gods are who we say they are.

Overcoming the Fear of Death

Death evokes more intense fear and anxiety than anything else.

Epicurus provides two reasons why we should not fear death: first, death is something we do not experience, and second, what is known as the “symmetry argument.”

He believes that death is not an experience we face as humans: since the mind is annihilated upon death, it is incapable of experiencing pleasure or pain, so death cannot be painful. He states that death does not affect the living, since they have not died, and does not affect the dead, since they no longer exist.

The symmetry argument holds that the state of being dead is one we have, in a sense, been in before: it mirrors the state before we were born. If you have no reason to believe that the time before you were born was painful, then you have no reason to believe that death will be unpleasant.

The philosophy of Epicurus, which prioritizes the attainment of lasting tranquility, cannot be complete until one confronts the king of all fears.

Like all theories, Epicureanism has its critics. One problem hinges on Epicurus’ metaphysics, his account of reality, in which life ends when the mind does. In addition, he offers ways to avoid pain rather than ways to pursue happiness, and suggests that by avoiding pain we can be happy.

The Importance of Happiness

In the views of Aristotle and Epicurus, happiness is the primary purpose of life, and one should try to achieve happiness by following their belief systems. One takeaway that I believe is worth noting is not what they say, precisely, but the way they say it. Both have created well-defined systems: one shows how to achieve happiness, and the other shows which obstacles to overcome to achieve it. While these systems work within their worldviews, they may not match those of others. The lesson, then, is to have a system that defines what makes you happy or unhappy and to understand how to overcome what makes you unhappy.

But does being happy truly matter?

According to Christine Carter at UC Berkeley, it does. She explains that, on average, happy people are more successful than unhappy people at both work and love, and that happy people tend to be healthier and live longer. She also describes people who are very happy, but not the happiest, as having higher incomes, academic achievement, job satisfaction and political participation than the happiest people. She argues that the difference lies in whether happiness shows up as contentment with the status quo or as an inclination to improve things: those inclined to improve things are more motivated toward action and, as a result, have greater success.

Executive Summary

The research questions listed below help to better understand the effects of happiness on people. Many characteristics and variables are associated with what makes a person happy, and happiness is more often a subjective measure than an objective one. The goal of the research questions is to show, through a statistical lens, which characteristics best represent what makes a person happy and the effects of being happy.

Dataset Information

The dataset listed below comes from the General Social Survey (GSS), a survey conducted by NORC at the University of Chicago that gathers data on contemporary American society in an attempt to showcase the current state of the nation. The survey covers wide-ranging topics, including demographic, behavioral and attitudinal questions, as well as additional special-interest questions. For this project, a dataset was created via the GSS Data Explorer to examine what drives happiness and the data that helps explain it.

The GSS is designed as a repeated cross-sectional survey of a nationally representative sample of non-institutionalized adults who speak English or Spanish. The survey is conducted in two samples, A and B, through face-to-face interviews, and sometimes over the phone if the respondent is not available to meet in person. Each person is asked a repeated set of questions on socio-demographic background and on socio-political attitudes and behaviors. Respondents are then asked additional questions, where each respondent has a random 2/3 chance of being asked a given question from a selected pool of questions (ballots A, B or C).

With any dataset, bias is possible and can influence the information being examined. However, due to the rigorous design of the GSS, which is relied upon by industry professionals, scholars and lawmakers, the data is one of the best representations of the nature, feelings and thoughts of American society at the time. Regarding generalizability, the survey is designed to reduce bias as much as possible, so the dataset can be extended to a broader population, e.g. the entire American population.

To learn more about the bias and generalizability of the GSS process, check out their codebook here.

Causal Inference

In addition, the selected GSS data below is observational: it is not part of an experimental design, so it can only be used to determine whether there is an association between variables, not causation. The dataset represents Americans in 2016 and 2018, where respondents were asked a series of questions including measures of their happiness, satisfaction with their financial situation, and satisfaction with social events.

nrow(Gss1)
## [1] 5215

This dataset contains 5215 observations. Although it covers only 2016 and 2018, this is enough to represent the population of America, given an assumed 2% margin of error at a 95% confidence level.
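As a rough check on the 2% figure, the worst-case margin of error for a proportion can be computed from the sample size alone. This is a minimal sketch that assumes simple random sampling; the GSS uses a more complex sampling design, so the true margin is likely somewhat larger. The names n, z and moe are illustrative and not part of the original analysis.

```r
# Worst-case margin of error for a proportion, assuming simple random sampling
n   <- 5215                     # observations in Gss1 (2016 and 2018)
z   <- qnorm(0.975)             # two-sided 95% critical value, about 1.96
moe <- z * sqrt(0.5 * 0.5 / n)  # p = 0.5 maximizes the standard error
round(100 * moe, 2)             # margin of error in percentage points, about 1.36
```

Under this assumption the margin stays below the assumed 2%, which leaves some room for the design effect of the survey.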

My initial hypothesis, before conducting any statistical research, is that happier people are more successful than unhappier people. I measure success through variables including financial situation, education and degree, on the view that successful people achieve more and, as a result, are happier. Another belief is that people who are self-employed are relatively happier than those who work for someone else.

Research Question(s)

  • Research Question 1: Are people with higher incomes happier than those with lower incomes?
    • Sub Research Question: Are more successful people happier than less successful people?

Income inequality is growing at a rapid pace all across the world, and it appears that this trend is not slowing down any time soon. As income is swallowed up by wealthier and more affluent individuals, that should mean that common individuals are becoming less and less happy as a result, right? After all, one component of happiness is well-being (in certain cases), and that includes financial well-being, which refers to the level of security and the freedom of choice to do what one wants.

In contrast, one phrase thrown around in regard to money and happiness is “money cannot buy happiness,” which has different meanings depending on who you ask. In my opinion, it is situational: if you had unlimited wealth and tried to buy something that cannot be purchased with money, like immortality, you would end up unsatisfied and unhappy. I believe this is not the case in other situations involving general well-being, which is understood to be a factor in one’s personal happiness.

According to Epicurus, pursuing unattainable desires is not a good way to live. However, what if you only pursue just enough until you’re relatively satisfied?

This question aims to find the balance between the two points: whether one’s happiness moves up or down as income increases, and additionally whether more successful people are happier than less successful people.

  • Research Question 2: Are people that are self-employed happier than those that work for someone else?

Are you in a rat race? The rat race is a fierce competition that involves the pursuit of a goal or job in an endless and repetitive manner. Those in the rat race feel as if they must continue this pursuit in order to be happy and, as a result, are stuck in a continuous loop of a day-to-day lifestyle. To get out of this cycle, some seek other types of employment, such as self-employment.

Working for oneself can be a major benefit, as described by Betsy Mikel of Inc., as you are able to gain more control of your environment, be more engaged in your work, and have the freedom to pursue your ideals. However, it is not for everyone, as the added burden of having to do everything oneself can be taxing.

This question aims to answer which employment option results in happier individuals, and to find the association, if any, between the two factors.

Packages + Data

library(dplyr)
library(knitr)
library(descr)
library(forcats)
#Datasets
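#stringsAsFactors = TRUE keeps text columns as factors in R >= 4.0, matching the str() output below
Gss0 <- read.csv("Gss0.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) #Full dataset without mods
Gss1 <- read.csv("Gss1.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) #yrs 16-18, happy categorical, income levels changed <- main dataset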
#Dataset Info
names(Gss1)
##  [1] "year"     "happy"    "socrel"   "socommun" "socfrend" "socbar"  
##  [7] "satjob"   "class_"   "rank"     "satfin"   "goodlife" "income16"
## [13] "polviews" "wrkstat"  "wrkslf"   "marital"  "divorce"  "sibs"    
## [19] "childs"   "educ"     "degree"   "sex"      "race"     "region"  
## [25] "satsoc"   "age"      "hrs1"
str(Gss1)
## 'data.frame':    5215 obs. of  27 variables:
##  $ year    : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ happy   : Factor w/ 3 levels "Not too happy",..: 2 2 3 2 3 3 2 3 2 2 ...
##  $ socrel  : Factor w/ 7 levels "Almost daily",..: 1 7 NA 6 NA 7 4 NA 4 NA ...
##  $ socommun: Factor w/ 7 levels "Almost daily",..: 6 6 NA 3 NA 5 6 NA 2 NA ...
##  $ socfrend: Factor w/ 7 levels "Almost daily",..: 7 7 NA 6 NA 3 6 NA 2 NA ...
##  $ socbar  : Factor w/ 7 levels "Almost daily",..: 5 2 NA 3 NA 4 6 NA 2 NA ...
##  $ satjob  : Factor w/ 4 levels "A little dissat",..: 2 4 NA 4 2 4 2 4 2 NA ...
##  $ class_  : Factor w/ 4 levels "Lower class",..: 2 NA 2 2 2 2 2 4 1 2 ...
##  $ rank    : int  1 5 4 3 3 5 5 4 5 4 ...
##  $ satfin  : Factor w/ 3 levels "More or less",..: 3 2 1 3 3 1 2 1 2 1 ...
##  $ goodlife: Factor w/ 6 levels "Agree","Cant choose",..: NA 3 1 NA 4 1 NA 1 NA 5 ...
##  $ income16: Factor w/ 6 levels "$0 to 19999",..: 5 3 4 5 5 4 5 2 4 4 ...
##  $ polviews: Factor w/ 7 levels "Conservative",..: 5 4 1 5 7 7 7 6 NA 1 ...
##  $ wrkstat : Factor w/ 8 levels "Keeping house",..: 7 7 3 8 8 1 7 8 7 3 ...
##  $ wrkslf  : Factor w/ 2 levels "Self-employed",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ marital : Factor w/ 5 levels "Divorced","Married",..: 2 3 2 2 2 2 2 2 2 1 ...
##  $ divorce : Factor w/ 2 levels "No","Yes": 1 NA 1 1 1 1 1 1 2 NA ...
##  $ sibs    : int  2 3 3 3 2 2 2 6 5 1 ...
##  $ childs  : Factor w/ 10 levels "0","1","2","3",..: 4 1 3 5 3 3 3 4 4 5 ...
##  $ educ    : int  16 12 16 12 18 14 14 11 12 14 ...
##  $ degree  : Factor w/ 5 levels "Bachelor","Graduate",..: 1 3 1 3 2 4 3 3 3 4 ...
##  $ sex     : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ race    : Factor w/ 3 levels "Black","Other",..: 3 3 3 3 3 3 3 2 1 3 ...
##  $ region  : Factor w/ 9 levels "E. nor. central",..: 5 5 5 5 5 5 5 3 3 3 ...
##  $ satsoc  : Factor w/ 5 levels "Excellent","Fair",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ age     : Factor w/ 73 levels "18","19","20",..: 30 44 55 26 38 36 33 6 28 54 ...
##  $ hrs1    : Factor w/ 82 levels "1","10","11",..: 46 37 NA 24 45 NA 51 24 73 NA ...

Exploratory Data Analysis

Research Question 1

#Data Manipulation (rename,relevel) for readability (happy)
Gss1 <- Gss1 %>% mutate(tidyhappy = fct_relevel(happy,"Not too happy",
                                                "Pretty happy",
                                                "Very happy"))

Gss1 <- Gss1 %>% mutate(happy = factor(happy, levels = c("Not too happy","Pretty happy","Very happy"),
                         labels = c("Not","Somewhat","Very")))

#Gss1$happy <- factor(Gss1$happy,labels = c("Not","Somewhat","Very"))

summary(Gss1$happy) # GSS happy: Taken all together, how would you say things are these days--would you say that you are very happy, pretty happy, or not too happy?
##      Not Somewhat     Very     NA's 
##      788     2908     1507       12
#Data Manipulation (rename,relevel) for readability (Satfin)
Gss1 <- Gss1 %>% mutate(tidysatfin = fct_relevel(satfin,"Not at all sat",
                                                 "More or less",
                                                 "Satisfied"))

Gss1 <- Gss1 %>% mutate(satfin = factor(satfin, levels = c("Not at all sat","More or less","Satisfied"),
                                        labels = c("Low","Med","High")))

summary(Gss1$satfin) #GSS satfin: We are interested in how people are getting along financially these days. So far as you and your family are concerned, would you say that you are pretty well satisfied with your present financial situation, more or less satisfied, or not satisfied at all?
##  Low  Med High NA's 
## 1339 2302 1553   21
#Data Manipulation (rename) for readability (income16)
Gss1 <- Gss1 %>% mutate(income16 = factor(income16, levels = c("$0 to 19999","$20000 to 39999","$40000 to 59999", "$60000 to 89999", "$90000 or more","Refused"), 
                                          labels = c("<$20K","<$40K","<$60K","<$90K","$90K+",NA)))
summary(Gss1$income16) #GSS income16: In which of these groups did your total family income, from all sources, fall last year -- 2015 -- before taxes, that is.  Total income includes interest or dividends, rent, Social Security, other pensions,  alimony or child support, unemployment compensation, public aid (welfare),  armed forces or veteran's allotment
## <$20K <$40K <$60K <$90K $90K+  <NA>  NA's 
##   894  1007   771   854  1222   242   225

To answer this, we will use income16 and satfin to show the association between how much income the respondent’s family receives and how satisfied they are with their financial situation.

CrossTable(Gss1$satfin,Gss1$income16,prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ============================================================
##                Gss1$income16
## Gss1$satfin    <$20K   <$40K   <$60K   <$90K   $90K+   Total
## ------------------------------------------------------------
## Low             415     339     214     139     116    1223 
##                46.5%   33.7%   27.8%   16.3%    9.5%        
## ------------------------------------------------------------
## Med             342     460     380     452     474    2108 
##                38.3%   45.7%   49.3%   53.0%   38.8%        
## ------------------------------------------------------------
## High            135     207     177     262     632    1413 
##                15.1%   20.6%   23.0%   30.7%   51.7%        
## ------------------------------------------------------------
## Total           892    1006     771     853    1222    4744 
##                18.8%   21.2%   16.3%   18.0%   25.8%        
## ============================================================

The contingency table above shows the relationship between the variables satfin and income16, in which the explanatory variable is income16 and the response variable is satfin.

In this table, respondents who earn less than 20K dollars, which is around the federal poverty line for a 4-person household, have the highest percentage of low financial satisfaction at 46.5%. Within the same column, 15.1% report high financial satisfaction, which shows that income is not the sole factor in financial satisfaction.

49.3% of respondents who make between 40K and 60K dollars, a range around the median income in America, report medium financial satisfaction. This suggests that this income range is neither high enough to be satisfied nor low enough to be dissatisfied.

Additionally, 51.7% of individuals who earn 90K dollars or more report high financial satisfaction, while 9.5% in the same column report low financial satisfaction. This reaffirms that, even at above-average incomes, other factors affect financial satisfaction.

However, this table does show an association between income and financial satisfaction: as income increases, the level of satisfaction increases with it. Because the data is observational, this should not be read as a causal relationship.
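The column percentages quoted above can be reproduced from the counts alone. The sketch below re-enters the counts from the satfin by income16 table by hand (the matrix name satfin_income is illustrative, not part of the original analysis) and uses base R's prop.table instead of CrossTable.

```r
# Counts copied from the satfin x income16 contingency table above
satfin_income <- matrix(
  c(415, 339, 214, 139, 116,
    342, 460, 380, 452, 474,
    135, 207, 177, 262, 632),
  nrow = 3, byrow = TRUE,
  dimnames = list(satfin   = c("Low", "Med", "High"),
                  income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"))
)

# Share of each satisfaction level within every income bracket (each column sums to 100)
col_pct <- round(100 * prop.table(satfin_income, margin = 2), 1)
col_pct
```

Using margin = 2 conditions on the explanatory variable (income), which is what makes the rows comparable across brackets.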

The next part is to determine if those that earn more income are happier.

CrossTable(Gss1$happy,Gss1$income16, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE, format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ===========================================================
##               Gss1$income16
## Gss1$happy    <$20K   <$40K   <$60K   <$90K   $90K+   Total
## -----------------------------------------------------------
## Not            270     201      92      71      74     708 
##               30.3%   20.0%   11.9%    8.3%    6.1%        
## -----------------------------------------------------------
## Somewhat       466     574     477     474     673    2664 
##               52.4%   57.1%   61.9%   55.5%   55.1%        
## -----------------------------------------------------------
## Very           154     231     202     309     474    1370 
##               17.3%   23.0%   26.2%   36.2%   38.8%        
## -----------------------------------------------------------
## Total          890    1006     771     854    1221    4742 
##               18.8%   21.2%   16.3%   18.0%   25.7%        
## ===========================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 372.8225      d.f. = 8      p <2e-16 
## 
##         Minimum expected frequency: 115.1135

Similar to the previous table, there is a pattern that shows an association between income and happiness: as income increases, so does happiness in this sample.

82.7% of those who make less than 20K are either not happy or somewhat happy at the time they took the survey and, inversely, 93.9% of those who make 90K or more are either very or somewhat happy.
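These combined shares come from adding two cells of a column and dividing by the column total. A minimal sketch, with the counts copied by hand from the table above and illustrative variable names:

```r
# Counts copied from the happy x income16 table (lowest and highest brackets)
happy_low  <- c(Not = 270, Somewhat = 466, Very = 154)  # <$20K
happy_high <- c(Not = 74,  Somewhat = 673, Very = 474)  # $90K+

# Share that is "Not" or "Somewhat" happy in the lowest bracket
low_share  <- sum(happy_low[c("Not", "Somewhat")]) / sum(happy_low)
# Share that is "Somewhat" or "Very" happy in the highest bracket
high_share <- sum(happy_high[c("Somewhat", "Very")]) / sum(happy_high)

round(100 * c(low_share, high_share), 1)  # 82.7 and 93.9
```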

This table paints a picture of how income relates to one’s happiness: the more income an individual and their family receive, the happier they report being. When people say “Money cannot buy happiness,” they almost certainly do not mean money in general, as the first table indicates that as one earns a higher income, satisfaction with that income increases. In addition, the second table shows that as income increases, the share of respondents who are happy increases and, conversely, the share of respondents who are not happy decreases.

Inference

How well does this sample represent the population in question? To answer this, we conducted a chi-square test of independence, which allows us to determine whether the sample provides convincing evidence that the relationship also holds in the population.

  • Step 1: Formulate a null hypothesis (H0) and the alternative hypothesis (H1):

The null hypothesis (H0) is that there is no relationship between income and happy. The alternative hypothesis (H1) is that there is a relationship between income and happy.

  • Step 2: Determine the degrees of freedom (df) and set a significance level alpha:

df = (# of rows of the dependent variable - 1) * (# of columns of the independent variable - 1)

The df has already been calculated by the chi-square test: with 3 rows and 5 columns, df = (3 - 1) * (5 - 1) = 8.

To set a significance level alpha, we need to know the confidence level, which we will assume to be 95%. To get alpha, we subtract the confidence level from 1: 1 - .95 = .05.

  • Step 3: Find the critical value (cv) in the chi-square distribution table using df and alpha:

To find this value, we will use this critical value chart. From this chart, since our df is 8 and our alpha is .05, we can use 15.51 as our critical value.
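Instead of reading a printed chart, steps 2 and 3 can be sketched in base R with qchisq, the quantile function of the chi-square distribution. The variable names below are illustrative.

```r
# Dimensions of the happy x income16 table
n_rows <- 3   # happy: Not, Somewhat, Very
n_cols <- 5   # income16: five brackets ("Refused" is excluded)

df    <- (n_rows - 1) * (n_cols - 1)  # degrees of freedom
alpha <- 1 - 0.95                     # significance level for 95% confidence

# Critical value: the point a chi-square variable with df degrees of freedom
# exceeds with probability alpha
cv <- qchisq(1 - alpha, df = df)
round(cv, 2)                          # 15.51
```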

  • Step 4: Calculate chi-square statistic based on contingency table:

This step is done by setting chisq = TRUE in the CrossTable call for our contingency table.

  • Step 5: Compare chi-square statistic with the critical value:

The chi-square statistic is 372.82 and the critical value from step 3 is 15.51.

Since 372.82 > 15.51, we can now make an inference about the population.

Since the chi-square statistic is greater than the critical value, it is highly unlikely (p < 2 * 10^-16) that we would draw such a sample from a population in which there is no relationship. We therefore reject the null hypothesis (income and happy are not related) in favor of the alternative that there is a relationship between income and happy, and conclude that the independent variable (income) and the dependent variable (happy) are related in the population.
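The test can also be rerun from the printed counts without the original CSV files. This sketch re-enters the happy by income16 counts by hand and calls base R's chisq.test rather than CrossTable; the object names happy_income and res are illustrative.

```r
# Counts copied from the happy x income16 contingency table above
happy_income <- matrix(
  c(270, 201,  92,  71,  74,
    466, 574, 477, 474, 673,
    154, 231, 202, 309, 474),
  nrow = 3, byrow = TRUE,
  dimnames = list(happy    = c("Not", "Somewhat", "Very"),
                  income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"))
)

# Pearson chi-square test of independence (no continuity correction beyond 2 x 2)
res <- chisq.test(happy_income)

res$statistic      # chi-square statistic, about 372.82
res$parameter      # degrees of freedom, 8
min(res$expected)  # minimum expected frequency, about 115.11
res$p.value < 0.05 # TRUE: reject H0 at the 5% level
```

Comparing the p-value with alpha and comparing the statistic with the critical value are two views of the same decision, so both lead to rejecting the null hypothesis here.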

It should be noted that happy and income may share a confounding variable, but that is not the focus of this project.

  • Note: the p-value is the probability, assuming the null hypothesis is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed.

  • The critical value used here is only valid for the assumed 95% confidence level; at any other confidence level, using it would be an error.

Happiness & Success

Success in and of itself is difficult to measure. It is often subjective and depends on what one counts as an achievement, which makes it difficult to measure objectively. However, academic advancement is a form of success in which everyone can participate to a certain degree (college is questionable), and college is a means of increasing one’s potential salary upon entering the workforce.

To show a relationship between happiness and academic advancement, we first need to figure out whether there is truly an association between years of education or degree and income.

#variables used in this section
summary(Gss1$educ) # GSS educ: respondent's highest grade of school completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   12.00   14.00   13.73   16.00   20.00      12
#Data Manipulation (rename,relevel) for readability (degree)
summary(Gss1$degree)
##       Bachelor       Graduate    High school Junior college Lt high school 
##           1001            565           2639            412            590 
##           NA's 
##              8
Gss1 <- Gss1 %>% mutate(tidydegree = fct_relevel(degree,"Lt high school",
                                                        "High school",
                                                        "Junior college", 
                                                        "Bachelor", 
                                                        "Graduate"))

Gss1 <- Gss1 %>% mutate(degree = factor(degree, levels = c("Lt high school",'High school',"Junior college", "Bachelor", "Graduate"), 
                                        labels = c("Ls HS","HS","AA","Bach","Grad")))

summary(Gss1$degree) # GSS degree: respondent's highest degree = Less than High School/ High School / Associate Degree/ Bachelors Degree/ Graduate Degree
## Ls HS    HS    AA  Bach  Grad  NA's 
##   590  2639   412  1001   565     8
summary(Gss1$income16)
## <$20K <$40K <$60K <$90K $90K+  <NA>  NA's 
##   894  1007   771   854  1222   242   225
summary(Gss1$happy)
##      Not Somewhat     Very     NA's 
##      788     2908     1507       12
#regress education years (educ) on degree to show their relationship
lm(educ ~ degree, data = Gss1) %>% summary()
## 
## Call:
## lm(formula = educ ~ degree, data = Gss1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0017 -0.6705 -0.3836  0.7274  7.3295 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.00170    0.05910  152.30   <2e-16 ***
## degreeHS     3.66876    0.06536   56.13   <2e-16 ***
## degreeAA     5.64150    0.09208   61.27   <2e-16 ***
## degreeBach   7.38192    0.07447   99.13   <2e-16 ***
## degreeGrad   9.27087    0.08443  109.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.433 on 5198 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.7671, Adjusted R-squared:  0.7669 
## F-statistic:  4279 on 4 and 5198 DF,  p-value: < 2.2e-16

Formula for regression: \(\hat{y} = a + bX\)

As the regression model shows, the explanatory variable (x), degree, and the response variable (y), education years (educ), are related to each other.

The following regression can be read like this:

educ-hat = 9.00 + 3.67 * HS + 5.64 * AA + 7.38 * Bach + 9.27 * Grad

The y-intercept in this model is 9 (yrs), which is the expected number of years of education for someone who did not graduate high school. Each coefficient is the additional expected number of years for one of the other degree types relative to that baseline.
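
To make the reading of the coefficients concrete, the expected years of education for each degree level can be recovered by adding each coefficient to the intercept. This is a minimal sketch with the estimates copied from the lm output above; the variable names are illustrative.

```r
# Estimates copied from the lm(educ ~ degree) output
intercept <- 9.00170
offsets <- c(HS = 3.66876, AA = 5.64150, Bach = 7.38192, Grad = 9.27087)
# Expected years of education for each degree level
expected_years <- c("Ls HS" = intercept, intercept + offsets)
round(expected_years, 2)
```

Under this model, a graduate degree corresponds to roughly 18 expected years of education, about 9 more than the baseline group.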

The regression shows a strong positive relationship between the two variables, with a multiple R-squared of .77, which supports the validity of degree as a measure of how long it takes to achieve each degree type.

The next step is to show the relationship between income and degree, that is, how academic success pertains to income.

#cross-tabulate income16 by degree to show how degree relates to income
CrossTable(Gss1$income16,Gss1$degree, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==============================================================
##                  Gss1$degree
## Gss1$income16    Ls HS      HS      AA    Bach    Grad   Total
## --------------------------------------------------------------
## <$20K             216     530      57      67      22     892 
##                  43.0%   22.1%   14.8%    7.2%    4.1%        
## --------------------------------------------------------------
## <$40K             128     616      86     117      60    1007 
##                  25.5%   25.7%   22.4%   12.6%   11.1%        
## --------------------------------------------------------------
## <$60K              70     448      60     142      50     770 
##                  13.9%   18.7%   15.6%   15.4%    9.2%        
## --------------------------------------------------------------
## <$90K              49     418      91     192     104     854 
##                   9.8%   17.5%   23.7%   20.8%   19.2%        
## --------------------------------------------------------------
## $90K+              39     381      90     407     305    1222 
##                   7.8%   15.9%   23.4%   44.0%   56.4%        
## --------------------------------------------------------------
## Total             502    2393     384     925     541    4745 
##                  10.6%   50.4%    8.1%   19.5%   11.4%        
## ==============================================================

The contingency table above shows the relationship between the variables degree and income, in which the explanatory variable (cause) is degree and the response variable (effect) is income.

43.0% of individuals with less than a high school diploma make less than 20K dollars, and as one achieves a higher degree, the percentage of those who make less than 20K dollars decreases.

Conversely, 7.8% of those with less than a high school diploma make in excess of 90K dollars, and as one achieves a higher degree, the percentage of those who make more than 90K dollars increases.

This table shows a relationship between degree and income: the higher the degree one receives, the more likely one is to have a higher income.
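
The column percentages quoted above can be recomputed from the counts alone. This is a minimal sketch using base R's prop.table, with the counts copied by hand from the table; the object names are illustrative.

```r
# Counts copied from the income16 by degree table
counts <- matrix(
  c(216, 530, 57,  67,  22,
    128, 616, 86, 117,  60,
     70, 448, 60, 142,  50,
     49, 418, 91, 192, 104,
     39, 381, 90, 407, 305),
  nrow = 5, byrow = TRUE,
  dimnames = list(income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"),
                  degree = c("Ls HS", "HS", "AA", "Bach", "Grad"))
)
# Column percentages: share of each income bracket within a degree level
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Reading across the first and last rows of col_pct shows the two trends described: the lowest bracket shrinks and the highest bracket grows as the degree level rises.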

To show a relationship between degree earned and happiness, a cross-tabulation of degree and happy is conducted.

#cross-tabulate happy (y) by degree (x), with a chi-square test
CrossTable(Gss1$happy,Gss1$degree, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE, format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ===========================================================
##               Gss1$degree
## Gss1$happy    Ls HS      HS      AA    Bach    Grad   Total
## -----------------------------------------------------------
## Not            144     437      57     109      39     786 
##               24.5%   16.6%   13.9%   10.9%    6.9%        
## -----------------------------------------------------------
## Somewhat       259    1525     256     540     325    2905 
##               44.1%   57.9%   62.3%   53.9%   57.5%        
## -----------------------------------------------------------
## Very           184     672      98     352     201    1507 
##               31.3%   25.5%   23.8%   35.2%   35.6%        
## -----------------------------------------------------------
## Total          587    2634     411    1001     565    5198 
##               11.3%   50.7%    7.9%   19.3%   10.9%        
## ===========================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 133.5363      d.f. = 8      p <2e-16 
## 
##         Minimum expected frequency: 62.14813

The contingency table above shows the relationship between the variables degree and happy, in which the explanatory variable (cause) is degree and the response variable (effect) is happy.

One trend in this table is that higher degrees have a negative relationship with being not happy: 24.5% of individuals with less than a high school degree are not happy, compared with 6.9% of individuals with a graduate degree.

However, one surprising detail in this table is that 31.3% of individuals with less than a high school degree report being very happy, even though, as previously mentioned, this group has the highest percentage of incomes under 20K dollars. That share is similar to the shares for those with a bachelor's (35.2%) or graduate (35.6%) degree.

Inference

How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether or not the sample provides convincing evidence about the population.

  • Step 1: Formulate a null hypothesis (H0) and the alternative hypothesis (H1):

The null hypothesis (H0) is that there is no relationship between degree and happy. The alternative hypothesis (H1) is that there is a relationship between degree and happy.

  • Step 2: Determine the degrees of freedom (df) and set a significance level alpha:

The df has already been calculated by the chisq test which gives us df = 8.

To set a significance level alpha, we need to know the confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.

  • Step 3: Find the critical value (cv) in the chi-square distribution table using df and alpha:

To find this value, we will use this critical value chart. From this chart, since our df is 8 and our alpha is .05, we can use 15.51 as our critical value.

  • Step 4: Calculate chi-square statistic based on contingency table:

This step is done by using chisq = TRUE in our contingency table.

  • Step 5: Compare chi-square statistic with the critical value:

The chi-square statistic is 133.54 and the critical value from step 3 is 15.51.

Since 133.54 > 15.51, we can now make an inference about the population.

Since the chi-square statistic is greater than the critical value, it is highly unlikely (p < 2 * 10^-16) that such a sample would be drawn from a population in which there is no relationship. We therefore reject the null hypothesis (degree and happy are not related) and accept the alternative: the independent variable (degree) and the dependent variable (happy) are related in the population.

  • This conclusion holds only at the 95% confidence level assumed in this scenario; at any other confidence level, the critical value would differ and using this one would be an error.
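
The five steps above can be sketched in a few lines of base R. This is a minimal sketch that rebuilds the happy by degree table from the printed counts and uses chisq.test in place of the CrossTable call; the object names are illustrative.

```r
# Counts copied from the happy by degree table
counts <- matrix(
  c(144,  437,  57, 109,  39,
    259, 1525, 256, 540, 325,
    184,  672,  98, 352, 201),
  nrow = 3, byrow = TRUE,
  dimnames = list(happy = c("Not", "Somewhat", "Very"),
                  degree = c("Ls HS", "HS", "AA", "Bach", "Grad"))
)
test <- chisq.test(counts)                   # step 4: Pearson's chi-squared test
cv <- qchisq(0.95, df = test$parameter)      # step 3: critical value at alpha = .05
reject_null <- unname(test$statistic) > cv   # step 5: compare statistic with cv
reject_null
```

Working from the counts rather than the raw data keeps the sketch self-contained, at the cost of retyping the table by hand.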

Research Question 2

#variables used in this section
summary(Gss1$happy)
##      Not Somewhat     Very     NA's 
##      788     2908     1507       12
summary(Gss1$wrkslf) #GSS wrkslf: (Are/Were) you self employed or (do/did) you work for someone else? https://gssdataexplorer.norc.org/variables/9/vshow
## Self-employed  Someone else          NA's 
##           528          4501           186
#Data Manipulation (rename, relevel) for readability (satjob)
Gss1 <- Gss1 %>% mutate(tidysatjob = fct_relevel(satjob,
                                                 "Very dissatisfied",
                                                 "A little dissat",    
                                                 "Mod. satisfied",
                                                 "Very satisfied"))

Gss1 <- Gss1 %>% mutate(satjob = factor(satjob, 
                                        levels = c("Very dissatisfied","A little dissat","Mod. satisfied","Very satisfied"),
                                        labels = c("Very Dissat","Litt Dissat","Mod Sat","Very Sat")))
summary(Gss1$satjob)
## Very Dissat Litt Dissat     Mod Sat    Very Sat        NA's 
##         132         341        1301        1744        1697
#Data Manipulation turn categorical variable to numeric for readability (hrs1)
Gss1$hrs1 <- as.numeric(as.character(Gss1$hrs1))
summary(Gss1$hrs1)# GSS hrs1: IF WORKING, FULL OR PART TIME: How many hours did you work last week, at all jobs?
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00   35.00   40.00   41.08   50.00   89.00    2188
#Data Manipulation (rename,relevel) for readability (wrkstat), select only full & part time
levels(Gss1$wrkstat) <- c("Keeping house", "Other",
                          "Not Working", "Other",
                          "Not Working", "Not Working",
                          "Working fulltime","Working parttime",NA)

Gss1$wrkstat <- factor(Gss1$wrkstat, levels = c("Working fulltime","Working parttime"))

summary(Gss1$wrkstat) #Gss Wrkstat: Last week were you working full time, part time, going to school, keeping house, or what?
## Working fulltime Working parttime             NA's 
##             2455              604             2156

In order to understand the relationship between employment type and happiness, it is important to understand some of the benefits of being self-employed. One of these is making your own schedule. To examine this, we will show the relationship between the number of hours worked (hrs1) and the individual's labor force status (wrkstat).

lm(hrs1 ~ wrkstat, data = Gss1) %>% summary()
## 
## Call:
## lm(formula = hrs1 ~ wrkstat, data = Gss1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.248  -5.248  -3.837   4.752  65.163 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              45.2482     0.2368  191.12   <2e-16 ***
## wrkstatWorking parttime -21.4111     0.5367  -39.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.69 on 3025 degrees of freedom
##   (2188 observations deleted due to missingness)
## Multiple R-squared:  0.3447, Adjusted R-squared:  0.3445 
## F-statistic:  1591 on 1 and 3025 DF,  p-value: < 2.2e-16

As the regression model shows, the explanatory variable (x), wrkstat, and the response variable (y), hours worked (hrs1), are related to each other.

The following regression can be read like this:

hrs1-hat = 45.25 - 21.41 * parttime

The y-intercept in this model is 45 (hrs), which is the expected number of hours worked for someone working full time; the expected number is about 21 hrs lower for someone working part time.

The regression has a multiple R-squared of .34, which means there is a moderate relationship between the two variables. The negative coefficient shows that working part time is associated with fewer hours worked.
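
As a worked reading of the fitted equation, the two expected values follow from setting the part-time indicator to 0 or 1. The symbol names below are illustrative; the estimates are copied from the regression output above.

\[\widehat{\text{hrs1}} = 45.2482 - 21.4111 \cdot \text{parttime}\]

\[\text{parttime} = 0: \ \widehat{\text{hrs1}} = 45.2482 \qquad \text{parttime} = 1: \ \widehat{\text{hrs1}} = 45.2482 - 21.4111 = 23.8371\]

So the model expects about 45 hours for a full-time worker and about 24 hours for a part-time worker.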

CrossTable(Gss1$wrkstat,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ========================================================
##                     Gss1$wrkslf
## Gss1$wrkstat        Self-employed   Someone else   Total
## --------------------------------------------------------
## Working fulltime             254           2201    2455 
##                             68.1%          81.9%        
## --------------------------------------------------------
## Working parttime             119            485     604 
##                             31.9%          18.1%        
## --------------------------------------------------------
## Total                        373           2686    3059 
##                             12.2%          87.8%        
## ========================================================

The contingency table above shows the relationship between the variables wrkstat and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is wrkstat.

As the table shows, 31.9% of self-employed individuals work part time, compared with 18.1% of those who work for someone else, and 81.9% of individuals who work for someone else work full time.

The relationship shows that people who are self-employed are more likely to work part time, which suggests that they report fewer hours worked than those who work for someone else.

One open question is how hours worked are defined. Consider the saying “If You Love What You Do, You’ll Never Work A Day In Your Life”: those who work for themselves might not count what they do as “work”, whereas in an office setting it still counts as work time, just as eating lunch is still considered work time at some companies.

CrossTable(Gss1$satjob,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE,format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ===================================================
##                Gss1$wrkslf
## Gss1$satjob    Self-employed   Someone else   Total
## ---------------------------------------------------
## Very Dissat              11            114     125 
##                         2.8%           3.7%        
## ---------------------------------------------------
## Litt Dissat              22            310     332 
##                         5.7%          10.1%        
## ---------------------------------------------------
## Mod Sat                 109           1168    1277 
##                        28.0%          38.2%        
## ---------------------------------------------------
## Very Sat                247           1465    1712 
##                        63.5%          47.9%        
## ---------------------------------------------------
## Total                   389           3057    3446 
##                        11.3%          88.7%        
## ===================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 34.48811      d.f. = 3      p = 1.56e-07 
## 
##         Minimum expected frequency: 14.11056

The contingency table above shows the relationship between the variables satjob and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is satjob.

As the table shows, 63.5% of self-employed individuals are very satisfied with their jobs, whereas 47.9% of those who work for someone else feel the same way. At the other end, the difference is relatively small: 2.8% of the self-employed are very dissatisfied with what they do, compared with 3.7% of those who work for someone else.

Digressing a bit, one possible explanation is that the self-employed do what they want in pursuit of their own objectives, which results in higher satisfaction, whereas those who work for someone else sometimes have to play office politics, which results in a smaller share being very satisfied.

Inference

How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether or not the sample provides convincing evidence about the population.

  • Step 1: Formulate a null hypothesis (H0) and the alternative hypothesis (H1):

The null hypothesis (H0) is that there is no relationship between wrkslf and satjob. The alternative hypothesis (H1) is that there is a relationship between wrkslf and satjob.

  • Step 2: Determine the degrees of freedom (df) and set a significance level alpha:

The df has already been calculated by the chisq test which gives us df = 3.

To set a significance level alpha, we need to know the confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.

  • Step 3: Find the critical value (cv) in the chi-square distribution table using df and alpha:

To find this value, we will use this critical value chart. From this chart, since our df is 3 and our alpha is .05, we can use 7.815 as our critical value.

  • Step 4: Calculate chi-square statistic based on contingency table:

This step is done by using chisq = TRUE in our contingency table.

  • Step 5: Compare chi-square statistic with the critical value:

The chi-square statistic is 34.49 and the critical value from step 3 is 7.82.

Since 34.49 > 7.82, we can now make an inference about the population.

Since the chi-square statistic is greater than the critical value, it is highly unlikely (p = 1.56 * 10^-7) that such a sample would be drawn from a population in which there is no relationship. We therefore reject the null hypothesis (wrkslf and satjob are not related) and accept the alternative: the independent variable (wrkslf) and the dependent variable (satjob) are related in the population.

  • This conclusion holds only at the 95% confidence level assumed in this scenario; at any other confidence level, the critical value would differ and using this one would be an error.
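
The degrees of freedom and the critical value for the satjob by wrkslf table follow from the table's dimensions. This is a minimal sketch with the counts copied from the printed table; the object names are illustrative.

```r
# Counts copied from the satjob by wrkslf table
counts <- matrix(
  c( 11,  114,
     22,  310,
    109, 1168,
    247, 1465),
  nrow = 4, byrow = TRUE,
  dimnames = list(satjob = c("Very Dissat", "Litt Dissat", "Mod Sat", "Very Sat"),
                  wrkslf = c("Self-employed", "Someone else"))
)
# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
cv <- qchisq(0.95, df = df)  # critical value at alpha = .05
c(df = df, cv = round(cv, 3))
```

A table with 4 rows and 2 columns has (4 - 1) * (2 - 1) = 3 degrees of freedom, which matches the d.f. reported by the test.
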
CrossTable(Gss1$happy,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE,format = "SPSS")
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==================================================
##               Gss1$wrkslf
## Gss1$happy    Self-employed   Someone else   Total
## --------------------------------------------------
## Not                     70            674     744 
##                       13.3%          15.0%        
## --------------------------------------------------
## Somewhat               287           2519    2806 
##                       54.6%          56.1%        
## --------------------------------------------------
## Very                   169           1300    1469 
##                       32.1%          28.9%        
## --------------------------------------------------
## Total                  526           4493    5019 
##                       10.5%          89.5%        
## ==================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 2.743327      d.f. = 2      p = 0.254 
## 
##         Minimum expected frequency: 77.9725

The contingency table above shows the relationship between the variables happy and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is happy.

This table does not show much variation: the percentages are similar across the two columns. For example, 13.3% of self-employed individuals are not happy, and 15.0% of individuals who work for someone else feel the same way.

Inference

How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether or not the sample provides convincing evidence about the population.

  • Step 1: Formulate a null hypothesis (H0) and the alternative hypothesis (H1):

The null hypothesis (H0) is that there is no relationship between wrkslf and happy. The alternative hypothesis (H1) is that there is a relationship between wrkslf and happy.

  • Step 2: Determine the degrees of freedom (df) and set a significance level alpha:

The df has already been calculated by the chisq test which gives us df = 2.

To set a significance level alpha, we need to know the confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.

  • Step 3: Find the critical value (cv) in the chi-square distribution table using df and alpha:

To find this value, we will use this critical value chart. From this chart, since our df is 2 and our alpha is .05, we can use 5.991 as our critical value.

  • Step 4: Calculate chi-square statistic based on contingency table:

This step is done by using chisq = TRUE in our contingency table.

  • Step 5: Compare chi-square statistic with the critical value:

The chi-square statistic is 2.74 and the critical value from step 3 is 5.99.

Since 2.74 < 5.99, the sample does not give convincing evidence of a relationship in the population.

Since the chi-square statistic is less than the critical value, it is not unlikely (p = .254) that such a sample would be drawn from a population in which there is no relationship. We therefore cannot reject the null hypothesis (wrkslf and happy are not related), and we cannot conclude that the independent variable (wrkslf) and the dependent variable (happy) are related in the population.

  • This conclusion holds only at the 95% confidence level assumed in this scenario; at any other confidence level, the critical value would differ and using this one would be an error.
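
The same decision can be reached by comparing the p-value with alpha instead of comparing the statistic with the critical value. This is a minimal sketch that rebuilds the happy by wrkslf table from the printed counts and uses base R's chisq.test; the object names are illustrative.

```r
# Counts copied from the happy by wrkslf table
counts <- matrix(
  c( 70,  674,
    287, 2519,
    169, 1300),
  nrow = 3, byrow = TRUE,
  dimnames = list(happy = c("Not", "Somewhat", "Very"),
                  wrkslf = c("Self-employed", "Someone else"))
)
test <- chisq.test(counts)
alpha <- 0.05
# The null hypothesis is rejected only when the p-value falls below alpha
decision <- if (test$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

Both routes are equivalent: a statistic below the critical value always corresponds to a p-value above alpha.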

Conclusion

The analysis of the different types of relationship with happiness shows some interesting tidbits about what affects one's happiness. In this time of the Covid-19 pandemic, it is imperative to find happiness despite what is happening, a task that is difficult even without such an event making it harder.

Is there a relationship between income and happiness?

The statistics show that, for the given dataset, there is a relationship between happiness and income; however, income is not the only factor. As one earns more income, the percentage of those who are not happy decreases and, conversely, the percentage of those who are very happy increases.

Is there a relationship between degree and happiness?

The statistics show that, for the given dataset, there is a relationship between happiness and degree; however, degree is not the only factor. As one achieves higher degree types, the percentage of those who are not happy decreases.

Is there a relationship between wrkslf and happiness?

The statistics show that, for the given dataset, there is no evidence of a relationship between happiness and wrkslf.