What seems like a simple yes-or-no response is quite tricky for many people to give. Why does a seemingly easy question with no wrong answer feel so difficult? To understand why, this project offers an in-depth look into the science of happiness and a simple guide to understanding happiness better.
Happiness is like a costume in the sense that it takes many forms and, by extension, has many different definitions depending on who you ask.
According to the Oxford dictionary, happiness is “Feeling or showing pleasure or contentment.”
Positive psychology researcher Sonja Lyubomirsky describes happiness as “the experience of joy, contentment, or positive well-being, combined with the sense that one’s life is good, meaningful and worthwhile.”
The list goes on, and what truly defines happiness is often riddled with debate. However, when philosophers and psychologists define happiness, the definition takes one of two forms: the first refers to a state of mind, and the second is based on well-being.
The state-of-mind definition views happiness through a psychological lens: it examines mental states and treats life satisfaction, pleasure, or positive emotional states as the answers to what drives happiness.
The well-being definition views happiness through a more philosophical lens: it examines what benefits a person and treats those benefits as actionable characteristics that make a person happy.
Some questions that well-being asks:
Aristotle is a man of many titles: an early Greek philosopher and polymath, described as a foundational influence on modern science and philosophy. Aristotle believes that the central purpose of life is to achieve happiness, which he describes as “dependent on ourselves.” In his book, Nicomachean Ethics, he asks what the purpose of human existence is; surveying the people around him, he sees that everyone is in pursuit of pleasure, wealth, or a good reputation.
He believes that happiness is found only as “an ultimate end,” and that the goal of any pursuit of happiness must be “self-sufficient and final” and, in a grander sense, “attainable by man.” Pleasure, wealth, and reputation cannot be fully attained: can one truly obtain all the wealth, all the pleasure, or an impeccable reputation? In this approach, he shows his concern with the state-of-mind definition of happiness, as these goods are momentary rather than final.
He defines happiness as:
…the function of man is to live a certain kind of life, and this activity implies a rational principle. The function of a good man is the good and noble performance of these, and if any action is well performed, it is performed in accordance with the appropriate excellence: if this is the case, then happiness turns out to be an activity of the soul in accordance with virtue. (Nicomachean Ethics, 1098a13)
He believes that to achieve happiness, one must have good moral character, and to have good moral character, one has to abide by “complete virtues,” which are good character traits or behaviors showing high moral standards.
Aristotle believes that short-term pleasures that make someone happy can sometimes cause long-term pain. One example is taking drugs: if someone takes drugs to feel happier now, they might feel miserable once the drugs wear off, which introduces a cycle of addiction. To feel happy again, one must keep taking drugs to replenish the temporary feeling of happiness.
He also believes that what separates a human from an animal is the capacity for rational thought, or reason, and that happiness is the product of one’s reason.
To summarize, Aristotle sets out a few parameters for achieving happiness: happiness is an ultimate end and the purpose of human existence; to achieve happiness, one must exercise virtue throughout life; and happiness depends on one’s reason.
Epicurus is a Greek philosopher and the founder of the school of thought known as Epicureanism, which holds that pleasure is the goal in life. Epicurus agrees with Aristotle that the goal in life is to achieve happiness; however, he differs in his approach to how this is done.
Hedonism is the pursuit of pleasure, and Epicurus believes that engaging in hedonistic activities, like consuming alcohol, is normal and natural. However, over-indulgence in these activities can decrease the overall amount of pleasure and, by extension, cause pain; for instance, over-indulgence in alcohol can lead to alcoholism, the dependence on alcohol.
To maximize long-term pleasure and tranquility (peace of mind), Epicurus identifies three key obstacles that prevent us from achieving happiness: the tyranny of desire, the fear of the gods, and the fear of death.
The Tyranny Of Desire
Epicurus believes that pleasure is nothing more than “desire-satisfaction,” whereas pain is “desire-frustration.” He notes that there are two strategies for achieving pleasure: either fulfill the desire or eliminate it.
He notes that some desires are natural and necessary, like food, shelter, and friendship. These are impossible to eliminate, but they are easily satisfied and provide long-term satisfaction.
In contrast, “vain and empty” desires include power, wealth, fame, and the accumulation of material possessions, which are difficult to attain and hard to maintain. Similar to Aristotle, he questions the achievability of these desires: if I have wealth, I can always obtain more wealth, and if I have 10 million Twitter followers, I can still get more followers. Is there a point where you can truly have enough?
With these unattainable desires, achieving something significant, like making a million dollars, brings a positive surge, but afterward one reverts to a baseline state of happiness. This pattern is known as the hedonic treadmill. Modern research supports the hedonic treadmill, which states that people repeatedly return to their baseline level of happiness regardless of what happens to them.
This treadmill prevents people from achieving long-term gains in happiness once they have hit their goals. For example, the surge from reaching a million dollars is about as strong as the surge from reaching two million dollars. Epicurus would suggest that the better way is to reduce unattainable desires, or rather: “Do not spoil what you have by desiring what you have not; remember that what you now have was once among the things you only hoped for.”
Epicurus further states that limitless desires contribute very little to tranquility and often inflict more pain than pleasure. The first path to pleasure is therefore to eliminate “vain and empty” desires; replacing the pursuit of limitless desires with natural and necessary ones is a far better long-term solution.
Overcoming the Fear of the Gods
Epicurus, if he believed in the gods at all, did not believe that they were interested in or even concerned with human affairs.
In Greek culture, the gods were claimed to be perfect, which he believed was inconsistent with their supposed concern for administering human affairs in an imperfect and frustrating world. He thought of the gods as “psychological projections of peoples’ ideal selves”; simply put, to be a god was to be in control of one’s own happiness.
Epicurus was an early believer in atomism, the belief that the universe (and by extension reality) is composed of small, indivisible and indescribable building blocks known as atoms.
Epicurus argued against the infinite divisibility of matter: bodies are the result of the random motion and combination of these inseparable atoms, and if matter were infinitely divisible, there would be nothing to stop bodies from being divided forever. Since we know that bodies exist, there must be a limit to this division, which stands in contrast to the belief that the gods created the universe.
The realization that the gods either do not exist or do not care about human affairs has a strong ethical element for him. For many, religion is a form of consolation from reality. He viewed it as the opposite: religion was a “source of anxiety and fear” that should be extinguished through a better understanding of the natural world.
He believes that if the gods existed, their intentions would be open to question, and so he constructed an argument to test whether the gods are who we say they are. That argument is known as the problem of evil.
In the problem of evil, Epicurus points to the inconsistency between the nature of the gods and the presence of evil: evil causes pain, and pain is something we do not want, so he questions whether the gods are who we say they are.
Overcoming the Fear of Death
Death evokes more intense fear and anxiety than anything else.
Epicurus provides two reasons why we should not fear death: the first is that death is something we do not experience, and the second is known as the “symmetry argument.”
He believes that since death is not something we experience, it cannot be painful: the mind is annihilated upon death and is incapable of experiencing pleasure or pain. He states that death is not a part of life, since the living have not died, and that death does not affect the dead, since they are dead.
The symmetry argument holds that the state of death is one we have been in before: it is similar to the state we were in before being born. If you have reason to believe that the time before you were born was not painful, then you have no reason to believe that death will be unpleasant.
The philosophy of Epicurus, which prioritizes the attainment of lasting tranquility, cannot be complete until one confronts the king of all fears.
Like all theories, Epicureanism has its critics. One criticism hinges on Epicurus’s own view of reality, known as metaphysics, in which life ends when the mind does. In addition, he describes ways to avoid pain rather than ways to pursue happiness, and suggests that by avoiding pain we can then be happy.
In the views of Aristotle and Epicurus, happiness is the primary reason for life, and one should live and try to achieve happiness by following their belief systems. One takeaway worth noting is not precisely what they say but the way they say it. Both created well-defined systems: one shows how to achieve happiness, and the other shows which obstacles to overcome to achieve it. While these systems fit their worldviews, they may not fit everyone else’s. The lesson is to have a system of your own that defines what makes you happy or unhappy and helps you understand how to overcome the obstacles.
But does being happy truly matter?
According to Christine Carter at UC Berkeley, it does. She explains that, on average, happy people are more successful than unhappy people at both work and love, and that happy people tend to be healthier and live longer. She also describes how people who are very happy have higher incomes, academic achievement, job satisfaction, and political participation than the happiest people. She argues that the difference lies in contentment with the status quo versus an inclination to improve things: those inclined to improve things are more motivated toward action and, as a result, have higher success.
The research questions listed below help to better understand the effects of happiness on people. Many characteristics and variables are associated with what makes a person happy, and happiness is more often a subjective measure than an objective one. The goal of the research questions is to show, through a statistical lens, which characteristics are most closely associated with being happy and what the effects of being happy are.
Dataset Information
The dataset listed below comes from the General Social Survey (GSS), a survey conducted by NORC at the University of Chicago that gathers data on contemporary American society in an attempt to showcase the current state of the nation. The survey covers wide-ranging topics, including demographic, behavioral, and attitudinal questions as well as additional special-interest questions. For this project, a dataset was created via the GSS Data Explorer to examine what drives happiness and the data that helps explain it.
The GSS is a repeated cross-sectional survey of a nationally representative sample of non-institutionalized adults who speak English or Spanish. The survey is conducted in two samples, A and B, via face-to-face interviews, and sometimes over the phone if the respondent is not available to meet in person. Each person is asked a repeated set of questions that examine socio-demographic background, socio-political attitudes, and behaviors, and is then asked additional questions, where each respondent has a random 2/3 chance of answering a given question from a selected pool of questions (ballots A, B, or C).
With any dataset, bias is possible and can influence the information being examined. However, due to the rigorous design of the GSS, which is relied upon by industry professionals, scholars, and lawmakers, the data provided is one of the best representations of the nature, feelings, and thoughts of American society at the time. Regarding generalizability, the survey is designed to reduce bias as much as possible, so the dataset can be extended to a broader population, e.g. the entire American population.
To understand more about the bias and generalizability of the GSS process, check out their codebook here.
Causal Inference
In addition, the selected GSS data below is observational: it is not part of an experimental design, so it can only be used to determine whether there is an association between variables. The dataset represents Americans in 2016 and 2018, when respondents were asked a series of questions including measures of their happiness, satisfaction with their financial situation, and satisfaction with social events.
nrow(Gss1)
## [1] 5215
This dataset has 5215 observations. Although it covers only 2016 and 2018, that is enough to represent the population of America, given that the survey has an assumed 2% margin of error at a 95% confidence level.
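As a rough check on the margin-of-error figure, the sketch below computes the worst-case margin for a proportion at a 95% confidence level. It assumes simple random sampling, which is a simplification (the GSS uses a more complex design), and the variable names are only for illustration.

```r
# Worst-case margin of error for a proportion at 95% confidence.
# Assumption: simple random sampling; the GSS design is more complex,
# so treat the result as an approximation only.
n <- 5215                        # observations reported by nrow(Gss1)
z <- qnorm(0.975)                # two-sided 95% critical value (about 1.96)
moe <- z * sqrt(0.5 * 0.5 / n)   # p = 0.5 gives the widest interval
round(100 * moe, 2)              # margin of error in percentage points

# Smallest sample size that keeps the margin at or below 2 points
n_needed <- ceiling((z / 0.02)^2 * 0.25)
n_needed
```

Under these assumptions the sample is comfortably larger than what a 2% margin would require.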
My initial hypothesis, before conducting any statistical research, is that happier people are more successful than unhappier people. I measure success through variables including financial situation, education, and degree, on the view that successful people achieve more and as a result are happier. A second hypothesis is that people who are self-employed are relatively happier than those who work for someone else.
Income inequality is growing at a rapid pace all across the world, and it appears that this trend is not slowing down any time soon. As income is swallowed up by wealthier and more affluent individuals, that should mean that common individuals are becoming less and less happy as a result, right? After all, one component of happiness is well-being (in certain cases), and that includes financial well-being, which refers to one’s level of security and the freedom of choice to do what one wants.
In contrast, one phrase thrown around in regard to money and happiness is “money cannot buy happiness,” which has different meanings depending on who you ask. In my opinion it is situational: if you had unlimited wealth and tried to buy something that cannot be purchased with money, like immortality, you would be left unsatisfied and unhappy. I believe this is not the case in other situations involving general well-being, which is understood to be a factor in one’s personal happiness.
According to Epicurus, pursuing unattainable desires is not a good way to live. However, what if you pursue only just enough, until you’re relatively satisfied?
This question aims to find the balance between the two points: whether one’s happiness moves up or down as income increases, and whether more successful people are happier than less successful people.
Are you in a rat race? The rat race is a fierce competition that involves the pursuit of a goal or job in an endless and repetitive manner. Those in the rat race feel that they must continue this pursuit in order to be happy and, as a result, are stuck in a continuous loop of day-to-day routine. To get out of this cycle, some seek other types of employment, such as self-employment.
Working for oneself can be a major benefit, as described by Betsy Mikel of INC: you gain more control of your environment, become more engaged in your work, and have the freedom to pursue your ideals. However, it is not for everyone, as the added burden of having to do everything oneself can be taxing.
This question aims to answer which employment option results in happier individuals and to find the correlation, if any, between the two factors.
library(dplyr)
library(knitr)
library(descr)
library(forcats)
#Datasets
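#stringsAsFactors = TRUE reads text columns as factors (the default became FALSE in R 4.0)
Gss0 <- read.csv("Gss0.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) #Full dataset without mods
Gss1 <- read.csv("Gss1.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE) #yrs 16-18, happy categorical, income levels changed <- main dataset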
#Dataset Info
names(Gss1)
## [1] "year" "happy" "socrel" "socommun" "socfrend" "socbar"
## [7] "satjob" "class_" "rank" "satfin" "goodlife" "income16"
## [13] "polviews" "wrkstat" "wrkslf" "marital" "divorce" "sibs"
## [19] "childs" "educ" "degree" "sex" "race" "region"
## [25] "satsoc" "age" "hrs1"
str(Gss1)
## 'data.frame': 5215 obs. of 27 variables:
## $ year : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
## $ happy : Factor w/ 3 levels "Not too happy",..: 2 2 3 2 3 3 2 3 2 2 ...
## $ socrel : Factor w/ 7 levels "Almost daily",..: 1 7 NA 6 NA 7 4 NA 4 NA ...
## $ socommun: Factor w/ 7 levels "Almost daily",..: 6 6 NA 3 NA 5 6 NA 2 NA ...
## $ socfrend: Factor w/ 7 levels "Almost daily",..: 7 7 NA 6 NA 3 6 NA 2 NA ...
## $ socbar : Factor w/ 7 levels "Almost daily",..: 5 2 NA 3 NA 4 6 NA 2 NA ...
## $ satjob : Factor w/ 4 levels "A little dissat",..: 2 4 NA 4 2 4 2 4 2 NA ...
## $ class_ : Factor w/ 4 levels "Lower class",..: 2 NA 2 2 2 2 2 4 1 2 ...
## $ rank : int 1 5 4 3 3 5 5 4 5 4 ...
## $ satfin : Factor w/ 3 levels "More or less",..: 3 2 1 3 3 1 2 1 2 1 ...
## $ goodlife: Factor w/ 6 levels "Agree","Cant choose",..: NA 3 1 NA 4 1 NA 1 NA 5 ...
## $ income16: Factor w/ 6 levels "$0 to 19999",..: 5 3 4 5 5 4 5 2 4 4 ...
## $ polviews: Factor w/ 7 levels "Conservative",..: 5 4 1 5 7 7 7 6 NA 1 ...
## $ wrkstat : Factor w/ 8 levels "Keeping house",..: 7 7 3 8 8 1 7 8 7 3 ...
## $ wrkslf : Factor w/ 2 levels "Self-employed",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ marital : Factor w/ 5 levels "Divorced","Married",..: 2 3 2 2 2 2 2 2 2 1 ...
## $ divorce : Factor w/ 2 levels "No","Yes": 1 NA 1 1 1 1 1 1 2 NA ...
## $ sibs : int 2 3 3 3 2 2 2 6 5 1 ...
## $ childs : Factor w/ 10 levels "0","1","2","3",..: 4 1 3 5 3 3 3 4 4 5 ...
## $ educ : int 16 12 16 12 18 14 14 11 12 14 ...
## $ degree : Factor w/ 5 levels "Bachelor","Graduate",..: 1 3 1 3 2 4 3 3 3 4 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ race : Factor w/ 3 levels "Black","Other",..: 3 3 3 3 3 3 3 2 1 3 ...
## $ region : Factor w/ 9 levels "E. nor. central",..: 5 5 5 5 5 5 5 3 3 3 ...
## $ satsoc : Factor w/ 5 levels "Excellent","Fair",..: NA NA NA NA NA NA NA NA NA NA ...
## $ age : Factor w/ 73 levels "18","19","20",..: 30 44 55 26 38 36 33 6 28 54 ...
## $ hrs1 : Factor w/ 82 levels "1","10","11",..: 46 37 NA 24 45 NA 51 24 73 NA ...
Research Question 1
#Data Manipulation (rename,relevel) for readability (happy)
Gss1 <- Gss1 %>% mutate(tidyhappy = fct_relevel(happy,"Not too happy",
"Pretty happy",
"Very happy"))
Gss1 <- Gss1 %>% mutate(happy = factor(happy, levels = c("Not too happy","Pretty happy","Very happy"),
labels = c("Not","Somewhat","Very")))
#Gss1$happy <- factor(Gss1$happy,labels = c("Not","Somewhat","Very"))
summary(Gss1$happy) # GSS happy: Taken all together, how would you say things are these days--would you say that you are very happy, pretty happy, or not too happy?
## Not Somewhat Very NA's
## 788 2908 1507 12
#Data Manipulation (rename,relevel) for readability (Satfin)
Gss1 <- Gss1 %>% mutate(tidysatfin = fct_relevel(satfin,"Not at all sat",
"More or less",
"Satisfied"))
Gss1 <- Gss1 %>% mutate(satfin = factor(satfin, levels = c("Not at all sat","More or less","Satisfied"),
labels = c("Low","Med","High")))
summary(Gss1$satfin) #GSS satfin: We are interested in how people are getting along financially these days. So far as you and your family are concerned, would you say that you are pretty well satisfied with your present financial situation, more or less satisfied, or not satisfied at all?
## Low Med High NA's
## 1339 2302 1553 21
#Data Manipulation (rename) for readability (income16)
Gss1 <- Gss1 %>% mutate(income16 = factor(income16, levels = c("$0 to 19999","$20000 to 39999","$40000 to 59999", "$60000 to 89999", "$90000 or more","Refused"),
labels = c("<$20K","<$40K","<$60K","<$90K","$90K+",NA)))
summary(Gss1$income16) #GSS income16: In which of these groups did your total family income, from all sources, fall last year -- 2015 -- before taxes, that is. Total income includes interest or dividends, rent, Social Security, other pensions, alimony or child support, unemployment compensation, public aid (welfare), armed forces or veteran's allotment
## <$20K <$40K <$60K <$90K $90K+ <NA> NA's
## 894 1007 771 854 1222 242 225
To answer this, we will use income16 and satfin to show the association between how much income the respondent’s family receives and how satisfied they are with that income.
CrossTable(Gss1$satfin,Gss1$income16,prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ============================================================
## Gss1$income16
## Gss1$satfin <$20K <$40K <$60K <$90K $90K+ Total
## ------------------------------------------------------------
## Low 415 339 214 139 116 1223
## 46.5% 33.7% 27.8% 16.3% 9.5%
## ------------------------------------------------------------
## Med 342 460 380 452 474 2108
## 38.3% 45.7% 49.3% 53.0% 38.8%
## ------------------------------------------------------------
## High 135 207 177 262 632 1413
## 15.1% 20.6% 23.0% 30.7% 51.7%
## ------------------------------------------------------------
## Total 892 1006 771 853 1222 4744
## 18.8% 21.2% 16.3% 18.0% 25.8%
## ============================================================
The contingency table above shows the relationship between the variables satfin and income16, in which the explanatory variable (cause) is income16 and the response variable (effect) is satfin.
In this table, the group earning less than 20K dollars, which is around the federal poverty line for a 4-person household, has the highest share of low financial satisfaction at 46.5%. Within the same column, 15.1% of those earning less than 20K dollars report high financial satisfaction, which shows that income is not the sole factor in financial satisfaction.
49.3% of respondents who make between 40K and 60K dollars, which is around the median income in America, report medium financial satisfaction. This suggests that this income range is neither enough to be satisfied nor low enough to be dissatisfied.
Additionally, 51.7% of individuals who earn 90K dollars or more claim high financial satisfaction, while 9.5% of individuals in the same column claim not to be financially satisfied. This reaffirms that even at above-average incomes, there are external factors in financial satisfaction.
However, the table does show a positive association between income and financial satisfaction: as income increases, the level of satisfaction increases with it. Because the data is observational, this association does not by itself establish causation.
The next part is to determine whether those who earn more income are happier.
CrossTable(Gss1$happy,Gss1$income16, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE, format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===========================================================
## Gss1$income16
## Gss1$happy <$20K <$40K <$60K <$90K $90K+ Total
## -----------------------------------------------------------
## Not 270 201 92 71 74 708
## 30.3% 20.0% 11.9% 8.3% 6.1%
## -----------------------------------------------------------
## Somewhat 466 574 477 474 673 2664
## 52.4% 57.1% 61.9% 55.5% 55.1%
## -----------------------------------------------------------
## Very 154 231 202 309 474 1370
## 17.3% 23.0% 26.2% 36.2% 38.8%
## -----------------------------------------------------------
## Total 890 1006 771 854 1221 4742
## 18.8% 21.2% 16.3% 18.0% 25.7%
## ===========================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 372.8225 d.f. = 8 p <2e-16
##
## Minimum expected frequency: 115.1135
Similar to the previous table, there is a pattern showing a positive association between income and happiness: as income increases, so does happiness in this sample.
82.7% of those who make less than 20K were either not happy or somewhat happy at the time they took the survey; in contrast, 93.9% of those who make 90K or more were either very or somewhat happy.
This table paints a picture of how income relates to happiness: the more income an individual and their family receive, the happier they report being. When people say “Money cannot buy happiness,” it almost certainly does not refer to money in general, as the first table indicates that as one earns a higher income, satisfaction with that income increases. The second table adds that as income increases, the share of respondents who are very happy increases and, conversely, the share of respondents who are not happy decreases.
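The combined percentages quoted above can be recomputed from the counts in the happy by income16 table. The sketch below re-enters those counts by hand (the matrix and variable names are illustrative, not part of the original analysis) and uses prop.table to get column percentages.

```r
# Counts copied from the happy x income16 contingency table above.
counts <- matrix(
  c(270, 201,  92,  71,  74,
    466, 574, 477, 474, 673,
    154, 231, 202, 309, 474),
  nrow = 3, byrow = TRUE,
  dimnames = list(happy    = c("Not", "Somewhat", "Very"),
                  income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"))
)

# Column percentages: each income group sums to 100.
col_pct <- 100 * prop.table(counts, margin = 2)
round(col_pct, 1)

# Share of the lowest income group that is "Not" or "Somewhat" happy.
low_not_very <- col_pct["Not", "<$20K"] + col_pct["Somewhat", "<$20K"]
# Share of the highest income group that is "Somewhat" or "Very" happy.
high_happy <- col_pct["Somewhat", "$90K+"] + col_pct["Very", "$90K+"]
round(c(low_not_very, high_happy), 1)
```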
Inference
How well does this sample represent the population in question? To find out, we conducted a chi-square test, which allows us to determine whether the sample gives convincing evidence of a relationship in the population.
The null hypothesis (H0) is that there is no relationship between income and happy. The alternative hypothesis (H1) is that there is a relationship between income and happy.
df = (# of rows of the dependent variable - 1) * (# of columns of the independent variable - 1)
The df has already been calculated by the chi-square test, which gives us df = 8.
To set a significance level alpha, we need the confidence level, which we assume to be 95%. The significance level alpha is 1 minus the confidence level: 1 - .95 = .05.
To find the critical value, we use a chi-square critical value chart. Since our df is 8 and our alpha is .05, the chart gives 15.51 as our critical value.
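As an alternative to reading a printed chart, the degrees of freedom and critical value can be obtained directly in base R. This is a small sketch using the table dimensions from this section (3 happiness levels, 5 income groups); the variable names are only illustrative.

```r
# Degrees of freedom for a 3 x 5 contingency table.
n_rows <- 3                      # levels of happy (response)
n_cols <- 5                      # levels of income16 (explanatory)
df <- (n_rows - 1) * (n_cols - 1)

# Critical value of the chi-square distribution at alpha = .05.
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = df)
round(critical_value, 2)
```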
This step is done by using chisq = TRUE in our contingency table.
The chi-square statistic is 372.82 and the critical value from step 3 is 15.51.
Since 372.82 > 15.51, we can now make an inference about the population.
Since the chi-square statistic is greater than the critical value, it is highly unlikely (p < 2 * 10^-16) that such a sample would be drawn from a population in which there is no relationship. We therefore reject the null hypothesis (income and happy are not related) and accept the alternative: the independent variable (income) and the dependent variable (happy) are related in the population.
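For readers without access to the Gss1 file, the test can be reproduced from the printed counts alone. The sketch below re-enters the happy by income16 counts by hand and runs base R's chisq.test on them; the object names are illustrative.

```r
# Counts copied from the happy x income16 contingency table.
counts <- matrix(
  c(270, 201,  92,  71,  74,
    466, 574, 477, 474, 673,
    154, 231, 202, 309, 474),
  nrow = 3, byrow = TRUE,
  dimnames = list(happy    = c("Not", "Somewhat", "Very"),
                  income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"))
)

# Pearson's chi-square test of independence.
test <- chisq.test(counts)
chi_sq  <- unname(test$statistic)   # test statistic
df      <- unname(test$parameter)   # degrees of freedom
min_exp <- min(test$expected)       # smallest expected cell count

# Compare the statistic against the critical value at alpha = .05.
reject_null <- chi_sq > qchisq(0.95, df = df)
c(chi_sq = chi_sq, df = df, min_expected = min_exp)
reject_null
```

The smallest expected count is far above 5, so the usual large-sample condition for the chi-square approximation is met.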
It should be noted that happy and income may have a confounding variable in common, but that is not the focus of this project.
Note: the p-value assumes that the null hypothesis is true; it is the probability that the test statistic would take a value as extreme as or more extreme than the one actually observed.
Note: this is only true if the confidence level is 95%, which is assumed in this scenario; in all other instances, this would be an error.
Happiness & Success
Success in and of itself is difficult to measure. It is often subjective and depends on what one counts as an achievement, which is difficult to measure objectively. However, academic advancement is a form of success in which everyone is able to participate to a certain degree (college is questionable), and college is a means for someone to increase their potential salary once they enter the workforce.
To show a relationship between happiness and academic advancement, we first need to figure out whether there is truly an association between education years or degree and income.
#variables used in this section
summary(Gss1$educ) # GSS educ: respondent's highest grade of school completed
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 12.00 14.00 13.73 16.00 20.00 12
#Data Manipulation (rename,relevel) for readability (degree)
summary(Gss1$degree)
## Bachelor Graduate High school Junior college Lt high school
## 1001 565 2639 412 590
## NA's
## 8
Gss1 <- Gss1 %>% mutate(tidydegree = fct_relevel(degree,"Lt high school",
"High school",
"Junior college",
"Bachelor",
"Graduate"))
Gss1 <- Gss1 %>% mutate(degree = factor(degree, levels = c("Lt high school",'High school',"Junior college", "Bachelor", "Graduate"),
labels = c("Ls HS","HS","AA","Bach","Grad")))
summary(Gss1$degree) # GSS degree: respondent's highest degree = Less than High School/ High School / Associate Degree/ Bachelors Degree/ Graduate Degree
## Ls HS HS AA Bach Grad NA's
## 590 2639 412 1001 565 8
summary(Gss1$income16)
## <$20K <$40K <$60K <$90K $90K+ <NA> NA's
## 894 1007 771 854 1222 242 225
summary(Gss1$happy)
## Not Somewhat Very NA's
## 788 2908 1507 12
#regress education years on degree to show relationship
lm(educ ~ degree, data = Gss1) %>% summary()
##
## Call:
## lm(formula = educ ~ degree, data = Gss1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0017 -0.6705 -0.3836 0.7274 7.3295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.00170 0.05910 152.30 <2e-16 ***
## degreeHS 3.66876 0.06536 56.13 <2e-16 ***
## degreeAA 5.64150 0.09208 61.27 <2e-16 ***
## degreeBach 7.38192 0.07447 99.13 <2e-16 ***
## degreeGrad 9.27087 0.08443 109.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.433 on 5198 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.7671, Adjusted R-squared: 0.7669
## F-statistic: 4279 on 4 and 5198 DF, p-value: < 2.2e-16
Formula for regression : \(\hat{y}= a + bX\)
In this regression model, the explanatory variable (x) is degree and the response variable (y) is years of education (educ); the output shows how the two are related.
The following regression can be read like this:
educ-hat = 9.00 + 3.67 * HS + 5.64 * AA + 7.38 * Bach + 9.27 * Grad
The y-intercept in this model is 9 (yrs), which is the expected number of years of education for someone who did not graduate high school. Each coefficient is the expected number of additional years for one of the other degree types, relative to that baseline.
The regression shows a strong relationship between the two variables, with a multiple R-squared of .77, which supports the validity of degree as a measure of how long it takes to achieve each degree type.
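As a rough illustration of how to read the dummy-coded coefficients, the sketch below adds each coefficient from the output above to the intercept to get the expected years of education per degree type. The vector names (`coefs`, `predicted`) are only illustrative and are not part of the original analysis.

```r
# Coefficients copied from the lm(educ ~ degree) summary above
coefs <- c(Intercept = 9.00170, HS = 3.66876, AA = 5.64150,
           Bach = 7.38192, Grad = 9.27087)

# The baseline group (Ls HS) is the intercept itself;
# every other group is the intercept plus its own coefficient
predicted <- c("Ls HS" = coefs[["Intercept"]],
               coefs[["Intercept"]] + coefs[-1])

round(predicted, 2)
```

Read this way, each coefficient is a difference in expected years from the "Ls HS" baseline, not a slope.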
The next step is to show the relationship between degree and income, to see how academic success pertains to income.
#cross-tabulate income16 by degree to show how degree relates to income
CrossTable(Gss1$income16,Gss1$degree, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================================
## Gss1$degree
## Gss1$income16 Ls HS HS AA Bach Grad Total
## --------------------------------------------------------------
## <$20K 216 530 57 67 22 892
## 43.0% 22.1% 14.8% 7.2% 4.1%
## --------------------------------------------------------------
## <$40K 128 616 86 117 60 1007
## 25.5% 25.7% 22.4% 12.6% 11.1%
## --------------------------------------------------------------
## <$60K 70 448 60 142 50 770
## 13.9% 18.7% 15.6% 15.4% 9.2%
## --------------------------------------------------------------
## <$90K 49 418 91 192 104 854
## 9.8% 17.5% 23.7% 20.8% 19.2%
## --------------------------------------------------------------
## $90K+ 39 381 90 407 305 1222
## 7.8% 15.9% 23.4% 44.0% 56.4%
## --------------------------------------------------------------
## Total 502 2393 384 925 541 4745
## 10.6% 50.4% 8.1% 19.5% 11.4%
## ==============================================================
The contingency table above shows the relationship between the variables degree and income, in which the explanatory variable (cause) is degree and the response variable (effect) is income.
43.0% of individuals who have less than a high school diploma make less than 20K dollars, and as the degree level rises, the percentage of those who make less than 20K dollars decreases.
Conversely, 7.8% of those who have less than a high school diploma make in excess of 90K dollars, and as the degree level rises, the percentage of those who make more than 90K dollars increases.
This table shows that there is a relationship between degree and income: the higher the degree one receives, the more likely it is that one has a higher income.
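The column percentages quoted above can be reproduced from the counts alone. The following is a minimal sketch using base R's `prop.table()` on a matrix typed in from the table; the object names (`counts`, `col_pct`) are assumptions made for illustration.

```r
# Counts copied from the income16-by-degree contingency table above
counts <- matrix(
  c(216, 530, 57,  67,  22,
    128, 616, 86, 117,  60,
     70, 448, 60, 142,  50,
     49, 418, 91, 192, 104,
     39, 381, 90, 407, 305),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    income16 = c("<$20K", "<$40K", "<$60K", "<$90K", "$90K+"),
    degree   = c("Ls HS", "HS", "AA", "Bach", "Grad")
  )
)

# margin = 2 divides each cell by its column total,
# which is what prop.c = T requests in CrossTable()
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Percentages are taken within each degree column because degree is treated as the explanatory variable, so each column compares income shares for one degree group.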
To show a relationship between degree earned and happiness, a cross-tabulation of happy by degree with a chi-square test is conducted.
#cross-tabulate happy (y) by degree (x) with a chi-square test
CrossTable(Gss1$happy,Gss1$degree, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE, format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===========================================================
## Gss1$degree
## Gss1$happy Ls HS HS AA Bach Grad Total
## -----------------------------------------------------------
## Not 144 437 57 109 39 786
## 24.5% 16.6% 13.9% 10.9% 6.9%
## -----------------------------------------------------------
## Somewhat 259 1525 256 540 325 2905
## 44.1% 57.9% 62.3% 53.9% 57.5%
## -----------------------------------------------------------
## Very 184 672 98 352 201 1507
## 31.3% 25.5% 23.8% 35.2% 35.6%
## -----------------------------------------------------------
## Total 587 2634 411 1001 565 5198
## 11.3% 50.7% 7.9% 19.3% 10.9%
## ===========================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 133.5363 d.f. = 8 p <2e-16
##
## Minimum expected frequency: 62.14813
The contingency table above shows the relationship between the variables degree and happy, in which the explanatory variable (cause) is degree and the response variable (effect) is happy.
One trend in this table is that higher degrees are associated with a lower share of people who are not happy: 24.5% of individuals with less than a high school degree are not happy, compared with 6.9% of individuals with a graduate degree.
However, one surprising detail in this table is that 31.3% of individuals with less than a high school degree report being very happy, which is similar to those with a bachelor's (35.2%) or graduate (35.6%) degree, even though, as previously mentioned, those with less than a high school diploma have the highest percentage of incomes under 20K dollars.
Inference
How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether the sample gives convincing evidence of a relationship in the population.
The null hypothesis (H0) is that there is no relationship between degree and happy. The alternative hypothesis (H1) is that there is a relationship between degree and happy.
The df has already been calculated by the chi-square test, which gives us df = 8.
To set a significance level alpha, we need a confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.
To find the critical value, we will use this critical value chart. From this chart, since our df is 8 and our alpha is .05, we can use 15.51 as our critical value.
The chi-square statistic is computed by using chisq = TRUE in our contingency table.
The chi-square statistic is 133.54 and the critical value found above is 15.51.
Since 133.54 > 15.51, we can now make an inference about the population.
Since the chi-square statistic is greater than the critical value, it is highly unlikely (p < 2 * 10^-16) to draw such a sample from a population in which there is no relationship, so we reject the null hypothesis (degree & happy are not related) and accept the alternative that there is a relationship between degree and happy. We conclude that the independent variable (degree) and the dependent variable (happy) are related in the population.
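The inference steps for degree and happy can also be sketched in base R without a printed chart. This is only one way to check the numbers: it retypes the counts from the table into a matrix (the name `happy_by_degree` is hypothetical) and uses `chisq.test()` and `qchisq()` in place of the CrossTable output and the critical value chart.

```r
# Counts copied from the happy-by-degree contingency table above
happy_by_degree <- matrix(
  c(144,  437,  57, 109,  39,
    259, 1525, 256, 540, 325,
    184,  672,  98, 352, 201),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    happy  = c("Not", "Somewhat", "Very"),
    degree = c("Ls HS", "HS", "AA", "Bach", "Grad")
  )
)

alpha <- 0.05                      # 1 - confidence level of 95%
test  <- chisq.test(happy_by_degree)

# Critical value for the test's degrees of freedom at the chosen alpha
critical_value <- qchisq(1 - alpha, df = test$parameter)

test$statistic                     # Pearson chi-square statistic
test$parameter                     # degrees of freedom
critical_value
unname(test$statistic) > critical_value   # TRUE means reject H0
```

Comparing the statistic with the critical value and comparing the p-value with alpha lead to the same decision; the chart is simply a lookup of `qchisq()` values.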
Research Question 2
#variables used in this section
summary(Gss1$happy)
## Not Somewhat Very NA's
## 788 2908 1507 12
summary(Gss1$wrkslf) #GSS wrkslf: (Are/Were) you self employed or (do/did) you work for someone else? https://gssdataexplorer.norc.org/variables/9/vshow
## Self-employed Someone else NA's
## 528 4501 186
#Data Manipulation (rename, relevel) for readability (satjob)
Gss1 <- Gss1 %>% mutate(tidysatjob = fct_relevel(satjob,
"Very dissatisfied",
"A little dissat",
"Mod. satisfied",
"Very satisfied"))
Gss1 <- Gss1 %>% mutate(satjob = factor(satjob,
levels = c("Very dissatisfied","A little dissat","Mod. satisfied", "Very satisfied"),
labels = c("Very Dissat","Litt Dissat","Mod Sat","Very Sat")))
summary(Gss1$satjob)
## Very Dissat Litt Dissat Mod Sat Very Sat NA's
## 132 341 1301 1744 1697
#Data Manipulation turn categorical variable to numeric for readability (hrs1)
Gss1$hrs1 <- as.numeric(as.character(Gss1$hrs1))
summary(Gss1$hrs1)# GSS hrs1: IF WORKING, FULL OR PART TIME: How many hours did you work last week, at all jobs?
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 35.00 40.00 41.08 50.00 89.00 2188
#Data Manipulation (rename,relevel) for readability (wrkstat), select only full & part time
levels(Gss1$wrkstat) <- c("Keeping house", "Other",
"Not Working", "Other",
"Not Working", "Not Working",
"Working fulltime","Working parttime",NA)
Gss1$wrkstat <- factor(Gss1$wrkstat, levels = c("Working fulltime","Working parttime"))
summary(Gss1$wrkstat) #Gss Wrkstat: Last week were you working full time, part time, going to school, keeping house, or what?
## Working fulltime Working parttime NA's
## 2455 604 2156
In order to understand the relationship between employment type and happiness, it is important to understand some of the benefits of being self-employed. One of these is making your own schedule. To examine this, we will show the relationship between the number of hours worked (hrs1) and the individual's labor force status (wrkstat).
lm(hrs1 ~ wrkstat, data = Gss1) %>% summary()
##
## Call:
## lm(formula = hrs1 ~ wrkstat, data = Gss1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.248 -5.248 -3.837 4.752 65.163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.2482 0.2368 191.12 <2e-16 ***
## wrkstatWorking parttime -21.4111 0.5367 -39.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.69 on 3025 degrees of freedom
## (2188 observations deleted due to missingness)
## Multiple R-squared: 0.3447, Adjusted R-squared: 0.3445
## F-statistic: 1591 on 1 and 3025 DF, p-value: < 2.2e-16
In this regression model, the explanatory variable (x) is wrkstat and the response variable (y) is hours worked (hrs1); the output shows how the two are related.
The following regression can be read like this:
hrs1-hat = 45 - 21 * parttime
The y-intercept in this model is 45 (hrs), which is the expected number of hours worked for someone who is working full time; the expected number is 21 hrs lower for someone who is working part time.
The regression shows a moderate relationship between the two variables, with a multiple R-squared of .34. The coefficient is negative, meaning that part-time workers report fewer hours.
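To see why the intercept and coefficient can be read as group averages, here is a small sketch on made-up data. The data frame `toy` and its values are hypothetical and are not taken from the GSS; they only illustrate how `lm()` treats a two-level factor.

```r
# Hypothetical data: three full-time and three part-time workers
toy <- data.frame(
  hrs    = c(40, 45, 50, 20, 25, 30),
  status = factor(rep(c("Working fulltime", "Working parttime"), each = 3))
)

fit <- lm(hrs ~ status, data = toy)

# Mean hours within each group, for comparison with the coefficients
group_means <- tapply(toy$hrs, toy$status, mean)

coef(fit)      # intercept = full-time mean; slope = part-time minus full-time
group_means
```

With a single two-level factor as the predictor, the intercept is the mean of the reference level and the coefficient is the difference between the two group means, which is how the 45 and -21 in the output above are read.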
CrossTable(Gss1$wrkstat,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ========================================================
## Gss1$wrkslf
## Gss1$wrkstat Self-employed Someone else Total
## --------------------------------------------------------
## Working fulltime 254 2201 2455
## 68.1% 81.9%
## --------------------------------------------------------
## Working parttime 119 485 604
## 31.9% 18.1%
## --------------------------------------------------------
## Total 373 2686 3059
## 12.2% 87.8%
## ========================================================
The contingency table above shows the relationship between the variables wrkstat and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is wrkstat.
As the table shows, 31.9% of individuals who are self-employed are working part time, compared with 18.1% of those who work for someone else; 81.9% of individuals who work for someone else are working full time.
The relationship shows that people who are self-employed are more likely to work part time than those who work for someone else, which suggests fewer reported hours.
One thought is how hours worked are defined. Another saying to think about is “If you love what you do, you’ll never work a day in your life”: those who work for themselves might not consider what they do as “work”, whereas in an office setting it is still considered work time. For example, eating lunch in some companies is still considered work time.
CrossTable(Gss1$satjob,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE,format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===================================================
## Gss1$wrkslf
## Gss1$satjob Self-employed Someone else Total
## ---------------------------------------------------
## Very Dissat 11 114 125
## 2.8% 3.7%
## ---------------------------------------------------
## Litt Dissat 22 310 332
## 5.7% 10.1%
## ---------------------------------------------------
## Mod Sat 109 1168 1277
## 28.0% 38.2%
## ---------------------------------------------------
## Very Sat 247 1465 1712
## 63.5% 47.9%
## ---------------------------------------------------
## Total 389 3057 3446
## 11.3% 88.7%
## ===================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 34.48811 d.f. = 3 p = 1.56e-07
##
## Minimum expected frequency: 14.11056
The contingency table above shows the relationship between the variables satjob and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is satjob.
As the table shows, 63.5% of individuals who are self-employed have very high job satisfaction, whereas 47.9% of those who work for someone else feel the same way. At the other end, the difference in percentages is relatively small: 2.8% of those who are self-employed are very dissatisfied with what they do, and 3.7% of those who work for someone else feel the same way.
Digressing a bit, one possible explanation is that those who are self-employed do what they want in pursuit of their objective, which results in higher satisfaction, whereas those who work for someone else sometimes have to play office politics, which results in a smaller share with very high satisfaction.
Inference
How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether the sample gives convincing evidence of a relationship in the population.
The null hypothesis (H0) is that there is no relationship between wrkslf and satjob. The alternative hypothesis (H1) is that there is a relationship between wrkslf and satjob.
The df has already been calculated by the chi-square test, which gives us df = 3.
To set a significance level alpha, we need a confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.
To find the critical value, we will use this critical value chart. From this chart, since our df is 3 and our alpha is .05, we can use 7.815 as our critical value.
The chi-square statistic is computed by using chisq = TRUE in our contingency table.
The chi-square statistic is 34.49 and the critical value found above is 7.82.
Since 34.49 > 7.82, we can now make an inference about the population.
Since the chi-square statistic is greater than the critical value, it is highly unlikely (p = 1.56 * 10^-7) to draw such a sample from a population in which there is no relationship, so we reject the null hypothesis (wrkslf & satjob are not related) and accept the alternative that there is a relationship between wrkslf and satjob. We conclude that the independent variable (wrkslf) and the dependent variable (satjob) are related in the population.
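The degrees of freedom and critical value for the satjob and wrkslf test can be checked by hand. The sketch below assumes the usual rule for a contingency table, df = (rows - 1) * (columns - 1), and uses `qchisq()` in place of the chart; the variable names are illustrative only.

```r
n_rows <- 4   # satjob categories: Very Dissat, Litt Dissat, Mod Sat, Very Sat
n_cols <- 2   # wrkslf categories: Self-employed, Someone else

# Degrees of freedom for a chi-square test of independence
df <- (n_rows - 1) * (n_cols - 1)

alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = df)

chi_sq <- 34.48811            # statistic reported by CrossTable above

df
round(critical_value, 3)
chi_sq > critical_value       # TRUE means reject H0
```

This agrees with the d.f. = 3 line in the CrossTable output for this table.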
CrossTable(Gss1$happy,Gss1$wrkslf, prop.r = F,prop.c = T, prop.t = F, prop.chisq = F, chisq = TRUE,format = "SPSS")
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==================================================
## Gss1$wrkslf
## Gss1$happy Self-employed Someone else Total
## --------------------------------------------------
## Not 70 674 744
## 13.3% 15.0%
## --------------------------------------------------
## Somewhat 287 2519 2806
## 54.6% 56.1%
## --------------------------------------------------
## Very 169 1300 1469
## 32.1% 28.9%
## --------------------------------------------------
## Total 526 4493 5019
## 10.5% 89.5%
## ==================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 2.743327 d.f. = 2 p = 0.254
##
## Minimum expected frequency: 77.9725
The contingency table above shows the relationship between the variables happy and wrkslf, in which the explanatory variable (cause) is wrkslf and the response variable (effect) is happy.
This table does not show much variation: the percentages are similar across the two columns. For example, 13.3% of individuals who are self-employed are not happy, and 15.0% of individuals who work for someone else feel the same way.
Inference
How well does this sample represent the population in question? To answer this, we conducted a chi-square test, which allows us to determine whether the sample gives convincing evidence of a relationship in the population.
The null hypothesis (H0) is that there is no relationship between wrkslf and happy. The alternative hypothesis (H1) is that there is a relationship between wrkslf and happy.
The df has already been calculated by the chi-square test, which gives us df = 2.
To set a significance level alpha, we need a confidence level, which we will assume to be 95%. The significance level alpha is 1 minus the confidence level (1 - .95), or .05.
To find the critical value, we will use this critical value chart. From this chart, since our df is 2 and our alpha is .05, we can use 5.991 as our critical value.
The chi-square statistic is computed by using chisq = TRUE in our contingency table.
The chi-square statistic is 2.74 and the critical value found above is 5.99.
Since 2.74 < 5.99, we can now make an inference about the population.
Since the chi-square statistic is less than the critical value, it is quite plausible (p = .254) to draw such a sample from a population in which there is no relationship, so we cannot reject the null hypothesis (wrkslf & happy are not related). We therefore do not have evidence to conclude that the independent variable (wrkslf) and the dependent variable (happy) are related in the population.
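The same decision for wrkslf and happy can be reached from the p-value instead of the critical value. This is a minimal sketch using `pchisq()` with the statistic and df reported in the output above; the names `p_value` and `reject_null` are assumptions made for illustration.

```r
chi_sq <- 2.743327   # statistic reported by CrossTable for happy by wrkslf
df     <- 2          # (3 happy levels - 1) * (2 wrkslf levels - 1)
alpha  <- 0.05

# Upper-tail probability: chance of a statistic at least this large under H0
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Reject H0 only when the p-value falls below alpha
reject_null <- p_value < alpha

round(p_value, 3)
reject_null
```

Because the p-value is well above .05, the null hypothesis is not rejected; this is a failure to find evidence of a relationship, not proof that none exists.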
The analysis of the different relationships with happiness shows some interesting tidbits on what affects one's happiness. During the Covid-19 pandemic, it is imperative to find happiness despite what is happening, which is a difficult task even without an event like this making it harder.
Is there a relationship between income and happiness?
The statistics show that, for the given dataset that examines happiness and income, there is a relationship between the two variables; however, income is not the only factor. As one earns more income, the percentage of those who are not happy decreases and, conversely, the percentage of those who are very happy increases.
Is there a relationship between degree and happiness?
The statistics show that, for the given dataset that examines happiness and degree, there is a relationship between the two variables; however, degree is not the only factor. As one achieves higher degree types, the percentage of those who are not happy decreases.
Is there a relationship between wrkslf and happiness?
The statistics show that, for the given dataset that examines happiness and wrkslf, it can be said that there is no statistically significant relationship between the two variables.