library(ggplot2)
library(dplyr)
library(statsr)
load("gss.Rdata")
The General Social Survey (GSS) was started in 1972. It collects data about American society in order to explain and research attitudes and behaviors. The target population is US adults aged 18 and over.
Generalization: Data were collected by random sampling, so the findings can be generalized to the US adult population. Because respondents were not randomly assigned to groups (this is an observational study, not an experiment), we can’t infer causation.
Using the GSS data, we examine whether respondents’ financial satisfaction changes as the number of children in a family increases from 0. I am curious to see whether the dataset shows any significant findings. Since the study is observational, we cannot establish a causal relationship between these variables.
From the GSS dataset we use three variables for this analysis: childs, satfin, and year. First we filter out missing values (NA) and drop the "More Or Less" responses so that two satisfaction groups remain. Then we summarize and visualize.
# Keep complete cases and drop the middle "More Or Less" response
gss %>%
  filter(!is.na(childs) &
         !is.na(satfin) &
         !is.na(year) &
         satfin != "More Or Less") %>%
  select(childs, satfin, year) -> gss_satfin
dim(gss_satfin)
## [1] 29183 3
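To make the filtering step easier to follow, here is a minimal sketch on a small made-up data frame. The `toy` rows are hypothetical and are not GSS records; only the column names and the "More Or Less" label come from the analysis above.

```r
library(dplyr)

# Hypothetical toy data (NOT real GSS records), using the same column names
toy <- data.frame(
  childs = c(0, 2, NA, 3, 1),
  satfin = c("Satisfied", "More Or Less", "Satisfied", NA, "Not At All Sat"),
  year   = c(1972, 1980, 1990, 2000, 2010)
)

# Same logic as the real filter: drop NAs and the "More Or Less" response
toy_clean <- toy %>%
  filter(!is.na(childs), !is.na(satfin), !is.na(year),
         satfin != "More Or Less") %>%
  select(childs, satfin, year)

dim(toy_clean)  # rows kept, columns kept
```

Only the first and last toy rows survive: one row is dropped for each of the three reasons (a "More Or Less" answer, a missing childs value, a missing satfin value).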
ggplot(data=gss_satfin,aes(x=year,y=childs)) + geom_smooth(aes(fill=satfin))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Observations: The Satisfied group had a higher mean number of children than the Not At All Satisfied group for about 8 years (1984-1988 and 1992-1996).
The late 80s and early 90s show a convergence of the two means, but overall, the Not At All Satisfied group had a higher mean in many more years.
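The comparison of group means by year can also be checked numerically rather than read off the smoothed plot. The sketch below computes the mean number of children per year and satisfaction group on made-up data. The `toy` values are hypothetical, and raw yearly means will not match the GAM-smoothed curve exactly.

```r
library(dplyr)

# Hypothetical toy data (NOT real GSS records)
toy <- data.frame(
  year   = c(1984, 1984, 1984, 1984, 1990, 1990, 1990, 1990),
  satfin = rep(c("Satisfied", "Not At All Sat"), times = 4),
  childs = c(3, 2, 3, 2, 1, 2, 1, 4)
)

# Mean number of children for each year and satisfaction group
yearly_means <- toy %>%
  group_by(year, satfin) %>%
  summarise(mean_childs = mean(childs), .groups = "drop")

yearly_means
```

Running the same `group_by()` and `summarise()` on `gss_satfin` would give one row per year and group, which makes it possible to count the years in which each group has the higher mean.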
Null hypothesis: The number of children and financial satisfaction are independent.
Alternative hypothesis: The number of children and financial satisfaction are dependent.
As we have two satisfaction groups and 9 number-of-children groups (0 through 8), the hypothesis test to be performed is the chi-square test of independence.
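The chi-square test of independence works on a two-way table of observed counts, with one row per children group and one column per satisfaction group. A minimal sketch of building such a table with `table()` follows. The `toy` data are hypothetical and use only three children groups to stay short.

```r
# Hypothetical toy data (NOT real GSS records)
toy <- data.frame(
  childs = c(0, 0, 1, 1, 2, 2, 0, 1),
  satfin = c("Satisfied", "Not At All Sat", "Satisfied", "Satisfied",
             "Not At All Sat", "Not At All Sat", "Satisfied", "Not At All Sat")
)

# Cross-tabulate: rows are children groups, columns are satisfaction groups
observed <- table(childs = toy$childs, satfin = toy$satfin)

observed
dim(observed)  # number of row groups, number of column groups
```

For the real data the same call on `gss_satfin` should give a table with 9 rows and 2 columns.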
chisq.test(gss_satfin$childs,gss_satfin$satfin)$expected
##                   gss_satfin$satfin
## gss_satfin$childs Satisfied Not At All Sat
## 0                 4113.9809      3735.0191
## 1                 2380.6474      2161.3526
## 2                 3742.8892      3398.1108
## 3                 2395.8474      2175.1526
## 4                 1276.2828      1158.7172
## 5                  623.2034       565.7966
## 6                  304.5258       276.4742
## 7                  183.4493       166.5507
## 8                  275.1739       249.8261
From the above table, the expected counts are above the minimum required of 5 for each cell.
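Each expected count is the row total times the column total divided by the grand total. The sketch below reproduces the expected counts from the marginal totals. The totals are not printed in the original output; I derived them by summing the rows and columns of the expected-count table, so treat them as reconstructed values.

```r
# Marginal totals reconstructed by summing the printed expected counts
row_totals <- c(7849, 4542, 7141, 4571, 2435, 1189, 581, 350, 525)  # childs 0..8
col_totals <- c(15296, 13887)  # Satisfied, Not At All Sat
n <- sum(row_totals)           # grand total, matches dim(gss_satfin)

# Expected count for each cell: row total * column total / grand total
expected <- outer(row_totals, col_totals) / n
dimnames(expected) <- list(as.character(0:8),
                           c("Satisfied", "Not At All Sat"))

round(expected, 4)
all(expected >= 5)  # expected-count condition for the chi-square test
```

The smallest expected count is in the row for 7 children, and it is still far above 5.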
Degrees of Freedom: The degrees of freedom are (9-1)*(2-1) = 8, from 9 children groups and 2 satisfaction groups. All the conditions to perform the chi-square test of independence are satisfied.
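The degrees-of-freedom arithmetic, and the cutoff the test statistic has to exceed, can be sketched as below. The 5% significance level is my assumption; the analysis does not state one.

```r
n_rows <- 9  # children groups 0 through 8
n_cols <- 2  # Satisfied, Not At All Sat

# Degrees of freedom for a test of independence
df <- (n_rows - 1) * (n_cols - 1)

# Critical value at an ASSUMED 5% significance level
critical_value <- qchisq(0.95, df = df)

df
critical_value
```

Any chi-square statistic above this critical value would lead to rejecting the null hypothesis at the assumed level.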
chisq.test(gss_satfin$childs,gss_satfin$satfin)
##
## Pearson's Chi-squared test
##
## data: gss_satfin$childs and gss_satfin$satfin
## X-squared = 93.573, df = 8, p-value < 2.2e-16
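As a quick cross-check, the p-value can be recomputed from the reported statistic and degrees of freedom using the upper tail of the chi-square distribution. This is only a sketch using the two numbers printed above.

```r
statistic <- 93.573  # X-squared from the test output
df <- 8              # degrees of freedom from the test output

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

p_value < 2.2e-16  # consistent with the printed "p-value < 2.2e-16"
```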
Findings: With a p-value of almost zero (< 2.2e-16), there is strong evidence to reject the null hypothesis. Hence, we have convincing evidence that the number of children in a family and financial satisfaction are dependent in the U.S.
Conclusion: Given that the null hypothesis was rejected, it would be wise for me to take the number of children I would like to have into account when thinking about financial satisfaction over a lifetime. Because the data are observational, this association does not show that family size causes a change in satisfaction. It could be beneficial to add additional factors in follow-on research (e.g., income of respondent, age of respondent, overall life satisfaction of respondent) in order to properly understand the components that could allow one to maximize financial and overall well-being while also maximizing family size.