Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)

Load data

load("gss.Rdata")

Part 1: Data

The General Social Survey (GSS) began in 1972. It collects data about American society in order to explain and research attitudes and behaviors. The target population is US adults aged 18 and over.

Generalization: The data were collected by random sampling, so the results can be generalized to the target population of US adults. Because this is an observational study with no random assignment, the results cannot be used to establish causal relationships.

Part 2: Research question

Using the GSS data, we can examine whether respondents' financial satisfaction changes as the number of children in a family increases from 0. I am curious to see if there are any significant findings in the dataset. Since the study is observational, we cannot establish a causal relationship between these variables.

Part 3: Exploratory data analysis

From the GSS dataset we use three variables for this analysis: childs, satfin, and year. First we filter out missing values (NA) and drop the middle satfin category ("More Or Less"), which leaves two satisfaction groups. Then we summarize and visualize the data.

# Keep complete cases and only the two extreme satisfaction groups
gss_satfin <- gss %>%
  filter(!is.na(childs),
         !is.na(satfin),
         !is.na(year),
         satfin != "More Or Less") %>%
  select(childs, satfin, year)

dim(gss_satfin)
## [1] 29183     3
ggplot(data = gss_satfin, aes(x = year, y = childs)) + geom_smooth(aes(color = satfin, fill = satfin))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Observations: The Satisfied group had a higher mean number of children than the Not At All Satisfied group for about 8 years (1984-1988 and 1992-1996).

The late 80s and early 90s show a convergence of the two means, but overall the Not At All Satisfied group has had a higher mean in many more years.
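The year-by-year comparison of means described above can also be checked numerically rather than read off the smoothed plot. The following is a minimal sketch, not part of the original analysis: `mean_childs_by_year` is a hypothetical helper name, and the small `toy` data frame is made up purely for illustration. It assumes the same column names (`year`, `satfin`, `childs`) as `gss_satfin`.

```r
library(dplyr)

# Hypothetical helper: mean number of children per year and satisfaction group
mean_childs_by_year <- function(df) {
  df %>%
    group_by(year, satfin) %>%
    summarise(mean_childs = mean(childs), n = n(), .groups = "drop")
}

# Toy data, made up for illustration only (these are not GSS values)
toy <- data.frame(
  year   = c(1984, 1984, 1984, 1984, 1985, 1985),
  satfin = c("Satisfied", "Satisfied", "Not At All Sat",
             "Not At All Sat", "Satisfied", "Not At All Sat"),
  childs = c(2, 4, 1, 1, 0, 3)
)

toy_means <- mean_childs_by_year(toy)
toy_means
```

With the real data, `mean_childs_by_year(gss_satfin)` would give one row per year and group, which makes it possible to count the years in which each group has the higher mean.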

Part 4: Inference

Null hypothesis: The number of children and financial satisfaction are independent.

Alternative hypothesis: The number of children and financial satisfaction are dependent.

As we have two satisfaction groups and nine number-of-children groups (0 through 8), the hypothesis test to be performed is the chi-square test of independence.

chisq.test(gss_satfin$childs,gss_satfin$satfin)$expected
##                  gss_satfin$satfin
## gss_satfin$childs Satisfied Not At All Sat
##                 0 4113.9809      3735.0191
##                 1 2380.6474      2161.3526
##                 2 3742.8892      3398.1108
##                 3 2395.8474      2175.1526
##                 4 1276.2828      1158.7172
##                 5  623.2034       565.7966
##                 6  304.5258       276.4742
##                 7  183.4493       166.5507
##                 8  275.1739       249.8261

From the above table, the expected counts are above the minimum required of 5 for each cell.
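To make the expected-count condition concrete, here is a minimal sketch of how expected counts arise under independence: each cell is (row total × column total) / grand total. The 3x2 table `obs` is made up for illustration and is not GSS data; only base R functions are used.

```r
# Toy 3x2 table of observed counts, made up for illustration (not GSS data)
obs <- matrix(c(30, 20,
                25, 25,
                10, 40),
              nrow = 3, byrow = TRUE,
              dimnames = list(childs = c("0", "1", "2"),
                              satfin = c("Satisfied", "Not At All Sat")))

# Expected count per cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Sample-size condition: every expected count should be at least 5
all_cells_ok <- all(expected >= 5)

expected
all_cells_ok
```

The same matrix is what `chisq.test(obs)$expected` returns, so the hand calculation can serve as a sanity check on the built-in result.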

Degrees of freedom: With 9 number-of-children groups and 2 satisfaction groups, the degrees of freedom are (9-1)*(2-1) = 8. All the conditions for the chi-square test of independence are satisfied.
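As an illustration of where the test statistic and degrees of freedom come from, the sketch below computes both by hand. The statistic is the sum over cells of (observed - expected)^2 / expected, and the degrees of freedom are (rows - 1) * (columns - 1). The toy table is made up for illustration, so its numbers differ from the GSS results.

```r
# Toy 3x2 table of observed counts, made up for illustration (not GSS data)
obs <- matrix(c(30, 20,
                25, 25,
                10, 40),
              nrow = 3, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
x_squared <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

x_squared
df
```

For the 9x2 table in this analysis, the same formula gives (9 - 1) * (2 - 1) = 8 degrees of freedom.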

chisq.test(gss_satfin$childs,gss_satfin$satfin)
## 
##  Pearson's Chi-squared test
## 
## data:  gss_satfin$childs and gss_satfin$satfin
## X-squared = 93.573, df = 8, p-value < 2.2e-16

Findings: With a p-value of almost zero, we have strong evidence to reject the null hypothesis. Hence, we have convincing evidence that the number of children in a family and financial satisfaction are dependent in the U.S.

Conclusion: Considering the rejection of the null hypothesis, it would be wise for me to consider the number of children I would like to have when thinking about financial satisfaction over my lifetime. It could be beneficial to add additional factors in follow-on research (e.g., income of respondent, age of respondent, overall life satisfaction of respondent) in order to properly understand the components that could allow one to maximize financial and overall well-being while also maximizing family size.
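The reported p-value can be reproduced from the test statistic and degrees of freedom alone. This is a minimal sketch using the values printed in the test output (X-squared = 93.573, df = 8) and base R's `pchisq`. The threshold 2.2e-16 is the bound that R printed in that output.

```r
# Values copied from the chi-square test output
x_squared <- 93.573
df <- (9 - 1) * (2 - 1)

# Upper-tail probability of a chi-square distribution with df degrees of freedom
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Compare against the bound reported in the output
p_value < 2.2e-16
```

Setting `lower.tail = FALSE` gives the probability of a statistic at least as large as the one observed, which is the p-value for this test.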