Introduction:

This research examines the relationship between education degree level and satisfaction with one’s financial situation. It is commonly believed that individuals with higher degrees usually earn higher incomes and are therefore more satisfied with their financial situation. However, there is also a theory related to social status, which claims that individuals with higher degrees tend to lead busier and more expensive lives and therefore feel more stress about their financial situation. Based on data from the GSS, this research tests whether people’s financial satisfaction levels are associated with their education degree. If so, it then examines whether people with higher degrees tend to report greater satisfaction with their financial situation. The results may help readers check their beliefs about the relationship between education and financial satisfaction, and may inform their motivation to work toward a higher degree.

Data:

Data Background

The data are extracted from the General Social Survey (GSS), which has been monitoring societal change and studying the growing complexity of American society since 1972. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.

You can load the data in R or RStudio:

load(url("http://bit.ly/dasi_gss_data"))

or download it from: http://d396qusza40orc.cloudfront.net/statistics/project/gss.Rdata

In this data set:

  • degree - respondent’s highest degree

  • satfin - satisfaction with financial situation

Data Collection

First load the data, and then use RStudio to take the subset of variables used in this research.

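# Set the working directory (a local path; adjust it to your own machine)
setwd("~/Desktop/Data Science/Data Analysis and Inference")
# Load the GSS data frame `gss` from the course URL
load(url('http://d396qusza40orc.cloudfront.net/statistics/project/gss.Rdata'))
# Keep only the case identifier and the two variables of interest
gss_subset <- subset(gss, select = c('caseid', 'degree', 'satfin'))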

Data review

The cases in this data set are the units of observation: individual respondents who answered the survey questionnaire. To answer the research question, I need to determine whether respondents who report a higher education degree also report higher financial satisfaction. The first variable is the highest degree that respondents report having (degree). The second variable is the level of financial satisfaction that respondents report (satfin). Both variables are categorical.

This is an observational study: it draws inferences about the possible effect of a treatment on subjects, where the assignment of subjects to a treated group versus a control group is outside the control of the investigator. I will compare the level of satisfaction between respondents who have a bachelor’s or higher degree and respondents who have a junior college or lower degree.

The population of interest is residents of the United States. The findings from this analysis can be generalized to that population, because the data were collected by random sampling across a comprehensive range of demographics and regions over a very long period of time. However, some potential sources of bias might limit generalizability: 1) respondents might not answer the survey questions truthfully, and 2) missing data might distort the analysis and the inferences drawn from it.

The data cannot be used to establish causal links between the variables of interest, because a third factor might produce the correlation between the two variables. For example, the annual income of an individual or family might lead respondents to have both a higher education degree and higher financial satisfaction.

Exploratory data analysis:

Before exploring the data, it will be helpful to have a glance at the data.

str(gss_subset)
## 'data.frame':    57061 obs. of  3 variables:
##  $ caseid: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ degree: Factor w/ 5 levels "Lt High School",..: 4 1 2 4 2 2 2 4 2 2 ...
##  $ satfin: Factor w/ 3 levels "Satisfied","More Or Less",..: 3 2 1 3 1 2 2 3 2 3 ...
summary(gss_subset)
##      caseid                 degree                 satfin     
##  Min.   :    1   Lt High School:11822   Satisfied     :15344  
##  1st Qu.:14266   High School   :29287   More Or Less  :23176  
##  Median :28531   Junior College: 3070   Not At All Sat:13934  
##  Mean   :28531   Bachelor      : 8002   NA's          : 4607  
##  3rd Qu.:42796   Graduate      : 3870                         
##  Max.   :57061   NA's          : 1010

After a brief look, we can see that there are 57061 observations and 3 variables. Unfortunately, there are missing values: 4607 NAs in satfin and 1010 NAs in degree. This means that the exclusion of NAs has to be handled carefully in the analysis. Here is the table of the variables this research will investigate:

deg_sat<-table(gss_subset$degree,gss_subset$satfin)
deg_sat
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School      3065         4710           3388
##   High School         7162        12068           7670
##   Junior College       669         1332            727
##   Bachelor            2669         3171           1393
##   Graduate            1504         1473            497
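Since table() silently drops rows where either variable is NA, it may help to check how many respondents actually enter the analysis. The following is a minimal sketch: the matrix `counts` is a hypothetical name, rebuilt by hand from the table printed above so that the snippet runs without downloading the data.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

n_total    <- 57061           # number of rows reported by str(gss_subset)
n_complete <- sum(counts)     # respondents with both degree and satfin recorded
n_dropped  <- n_total - n_complete

n_complete
n_dropped
round(n_dropped / n_total, 4) # share of respondents excluded from the table
```

With the full data loaded, the same count should be obtainable with sum(complete.cases(gss_subset[, c("degree", "satfin")])).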

Data Visualization

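# Spine plot of degree by financial satisfaction (base graphics)
plot(gss_subset$degree~gss_subset$satfin)
library(ggplot2)

library(vcd)
## Loading required package: grid
# Mosaic plot shaded by satisfaction level; one fill color per satfin level
mosaic(~degree+satfin,data=gss_subset,highlighting='satfin',highlighting_fill=c('lightblue','grey80','pink'))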

Based on the graphic, individuals who have a graduate degree appear more likely to be satisfied with their financial situation. A closer investigation is needed to obtain comparable results across the different groups in terms of satisfaction. The following part calculates the proportions of the different satisfaction levels for the different groups. Tables are created to show the quantitative relationships, and finally there is a data visualization based on the quantitative results.

round(prop.table(deg_sat),4)
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School    0.0595       0.0915         0.0658
##   High School       0.1391       0.2343         0.1489
##   Junior College    0.0130       0.0259         0.0141
##   Bachelor          0.0518       0.0616         0.0270
##   Graduate          0.0292       0.0286         0.0097
# Of all respondents, 5.95% have a Lt High School degree and are satisfied with their financial situation; 13.91% have a High School degree and are satisfied with their financial situation, etc.
table2<-round(prop.table(deg_sat,1),4)
table2
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School    0.2746       0.4219         0.3035
##   High School       0.2662       0.4486         0.2851
##   Junior College    0.2452       0.4883         0.2665
##   Bachelor          0.3690       0.4384         0.1926
##   Graduate          0.4329       0.4240         0.1431
# This table shows the proportions calculated within each education degree (each row sums to 1).
table3<-round(prop.table(deg_sat,2),4)
table3
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School    0.2034       0.2070         0.2478
##   High School       0.4753       0.5304         0.5609
##   Junior College    0.0444       0.0585         0.0532
##   Bachelor          0.1771       0.1394         0.1019
##   Graduate          0.0998       0.0647         0.0363
# This table shows the proportions calculated within each satisfaction level (each column sums to 1). For example, among people who are satisfied, 47.53% have a High School degree.

Among these three tables, table 2 is the most valuable, since it gives the proportion of each satisfaction level within the different degree groups. For instance, 43.29% of individuals who have a Graduate degree are satisfied with their financial situation, which is much higher than the 24.52% of individuals who have a Junior College degree.
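The Data review section proposes comparing respondents with a bachelor's or higher degree against those with a junior college or lower degree. A minimal sketch of that two-group comparison is given here, assuming the counts printed in the deg_sat table; `counts`, `higher`, `lower` and `two_group` are hypothetical names introduced only for illustration.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

# Row proportions, as in table2: satisfaction shares within each degree.
row_props <- round(prop.table(counts, margin = 1), 4)
row_props["Graduate", "Satisfied"]
row_props["Junior College", "Satisfied"]

# Collapse the five degree levels into the two groups named in the Data review.
higher <- colSums(counts[c("Bachelor", "Graduate"), ])
lower  <- colSums(counts[c("Lt High School", "High School", "Junior College"), ])
two_group <- rbind("Bachelor or higher"      = higher,
                   "Junior College or lower" = lower)

# Satisfaction shares within each of the two groups.
round(prop.table(two_group, margin = 1), 4)
```

Collapsing levels makes the headline comparison easier to read, but it hides differences inside each group, such as the one between Junior College and High School.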

Here are pie charts for all degrees:

par(mfrow=c(1,5))
pie(table2[5,],main='Graduate')
pie(table2[4,],main='Bachelor')
pie(table2[3,],main='Junior College')
pie(table2[2,],main='High School')
pie(table2[1,],main='Lt High School')

The pie charts suggest that there might be a relationship between education degree and financial satisfaction level.
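As a possible alternative to five separate pie charts, the same row proportions can be placed side by side in one stacked bar chart, which may make the groups easier to compare. This is only a sketch using base graphics; `counts` is rebuilt from the printed deg_sat table, and the colors and labels are arbitrary choices rather than part of the original analysis.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

# Satisfaction shares within each degree (each row sums to 1).
row_props <- prop.table(counts, margin = 1)

# barplot() stacks the rows of its input, so transpose to get one bar per degree.
bar_mids <- barplot(
  t(row_props),
  col         = c("lightblue", "grey80", "pink"),
  legend.text = TRUE,
  args.legend = list(x = "topright", bty = "n", cex = 0.8),
  ylab        = "Proportion of respondents",
  main        = "Financial satisfaction by degree",
  las         = 2,
  cex.names   = 0.8
)
```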

Inference:

This research will use the Chi-Squared Test to test the independence of education degree and financial satisfaction level.

State Hypotheses

  • Null Hypothesis - people’s financial satisfaction levels are independent of people’s education degree.

  • Alternative Hypothesis - people’s financial satisfaction levels are associated with people’s education degree.

Check the conditions

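The conditions for inference are checked before conducting a Chi-Squared Test of independence.

  • Independence: yes, respondents were randomly sampled, and each respondent contributes to only one cell of the table.

  • 10% condition: yes, there are 57061 observations, which is less than 10% of the whole population.

  • Sample Size & Skew: yes, every cell has a large count. For example, 3474 multiplied by 0.4329 = 1504 satisfied Graduate respondents, and 2728 multiplied by 0.2452 = 669 satisfied Junior College respondents. The expected counts are also all well above 5 (the smallest is about 724).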

In this research, the Chi-Squared Test was chosen because both variables are categorical and each has more than two levels. Furthermore, this research only tests whether education degree and financial satisfaction level are associated, rather than comparing the satisfaction levels between specific groups. The Chi-Squared Test of independence is appropriate for testing the association between two categorical variables.
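The sample size condition for the Chi-Squared Test is usually stated in terms of expected counts. As a minimal sketch, the expected count for each cell under independence is (row total × column total) / grand total, which can be computed directly; `counts` and `expected` are hypothetical names, and the counts are copied from the printed deg_sat table.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

round(expected["Graduate", "Satisfied"], 4)  # compare with chisq.test()$expected
min(expected)                                # smallest expected cell count
all(expected >= 5)                           # common rule of thumb for the test
```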

chisq.test(deg_sat)
## 
##  Pearson's Chi-squared test
## 
## data:  deg_sat
## X-squared = 944.7575, df = 8, p-value < 2.2e-16
chisq.test((deg_sat))$expected
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School 3266.4423     4932.287      2964.2709
##   High School    7871.2979    11885.561      7143.1415
##   Junior College  798.2491     1205.346       724.4048
##   Bachelor       2116.4720     3195.846      1920.6819
##   Graduate       1016.5386     1534.961       922.5009
deg_sat
##                 
##                  Satisfied More Or Less Not At All Sat
##   Lt High School      3065         4710           3388
##   High School         7162        12068           7670
##   Junior College       669         1332            727
##   Bachelor            2669         3171           1393
##   Graduate            1504         1473            497

The p-value is nearly zero, so the null hypothesis is rejected in favor of the alternative hypothesis that people’s financial satisfaction levels are associated with people’s education degree. For comparison, if the null hypothesis were true, about 1016.5 individuals with a graduate degree would be expected to be satisfied with their financial situation, rather than the 1504 observed.
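To make the reported statistic less of a black box, the value from chisq.test() can be reproduced by hand as the sum of (observed - expected)^2 / expected over all cells, with (rows - 1) × (columns - 1) degrees of freedom. The sketch below assumes the counts printed in the deg_sat table; `counts`, `expected`, `chi_sq` and `p_value` are names introduced only for illustration.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic and its degrees of freedom.
chi_sq <- sum((counts - expected)^2 / expected)
df     <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
df
p_value
```

The per-cell terms (counts - expected)^2 / expected also show which cells contribute most to the statistic.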

Conclusion:

Under the Chi-Squared Test, with such a small p-value (< 2.2e-16), this research can state that there is a relationship between education degree and financial satisfaction level. In addition, there is a large difference in financial satisfaction between people who have a Graduate degree and people who have a Junior College degree (43.29% versus 24.52% satisfied). This difference is even larger than that between people who have a Graduate degree and people who have a High School degree (26.62% satisfied). In other words, Junior College respondents report being satisfied less often than High School respondents, despite holding the higher degree. This might indicate that individuals who have a higher education degree usually lead a busier and more expensive life and then feel more stress about their financial situation. Therefore, a further investigation of financial satisfaction levels between the Junior College and High School groups will be needed in the future.
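One possible starting point for that follow-up is to repeat the test on only the High School and Junior College rows. This is a sketch of the idea rather than part of the original analysis: `counts`, `sub` and `follow_up` are hypothetical names, the counts are copied from the printed deg_sat table, and running several such pairwise tests would call for a multiple-comparison adjustment.

```r
# Counts copied from the printed deg_sat table (rows: degree, columns: satfin).
counts <- matrix(
  c(3065,  4710, 3388,
    7162, 12068, 7670,
     669,  1332,  727,
    2669,  3171, 1393,
    1504,  1473,  497),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate"),
    satfin = c("Satisfied", "More Or Less", "Not At All Sat")
  )
)

# Keep only the two degree groups singled out in the conclusion.
sub <- counts[c("High School", "Junior College"), ]

# Satisfaction shares within each of the two groups.
round(prop.table(sub, margin = 1), 4)

# Chi-squared test of independence on the 2 x 3 sub-table.
follow_up <- chisq.test(sub)
follow_up$statistic
follow_up$parameter  # degrees of freedom: (2 - 1) * (3 - 1)
follow_up$p.value
```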

References

Data Citation: Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1

Persistent URL: http://doi.org/10.3886/ICPSR34802.v1

Here is the codebook for the GSS survey: https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fgss1.html

Appendix

Here are the first 80 rows of the dataset:

head(gss_subset,80)
##    caseid         degree         satfin
## 1       1       Bachelor Not At All Sat
## 2       2 Lt High School   More Or Less
## 3       3    High School      Satisfied
## 4       4       Bachelor Not At All Sat
## 5       5    High School      Satisfied
## 6       6    High School   More Or Less
## 7       7    High School   More Or Less
## 8       8       Bachelor Not At All Sat
## 9       9    High School   More Or Less
## 10     10    High School Not At All Sat
## 11     11    High School Not At All Sat
## 12     12 Lt High School      Satisfied
## 13     13 Lt High School      Satisfied
## 14     14 Lt High School   More Or Less
## 15     15 Lt High School      Satisfied
## 16     16    High School   More Or Less
## 17     17    High School      Satisfied
## 18     18 Lt High School      Satisfied
## 19     19       Bachelor   More Or Less
## 20     20    High School      Satisfied
## 21     21    High School      Satisfied
## 22     22    High School      Satisfied
## 23     23    High School      Satisfied
## 24     24    High School Not At All Sat
## 25     25       Bachelor      Satisfied
## 26     26    High School      Satisfied
## 27     27    High School   More Or Less
## 28     28    High School   More Or Less
## 29     29    High School   More Or Less
## 30     30 Lt High School   More Or Less
## 31     31 Lt High School   More Or Less
## 32     32    High School Not At All Sat
## 33     33       Bachelor      Satisfied
## 34     34 Lt High School      Satisfied
## 35     35    High School      Satisfied
## 36     36    High School   More Or Less
## 37     37    High School      Satisfied
## 38     38 Lt High School      Satisfied
## 39     39 Lt High School      Satisfied
## 40     40    High School   More Or Less
## 41     41 Lt High School      Satisfied
## 42     42    High School Not At All Sat
## 43     43 Lt High School      Satisfied
## 44     44 Lt High School      Satisfied
## 45     45 Lt High School   More Or Less
## 46     46    High School      Satisfied
## 47     47    High School      Satisfied
## 48     48    High School      Satisfied
## 49     49 Lt High School   More Or Less
## 50     50 Lt High School   More Or Less
## 51     51 Lt High School Not At All Sat
## 52     52    High School Not At All Sat
## 53     53 Lt High School   More Or Less
## 54     54    High School   More Or Less
## 55     55 Lt High School      Satisfied
## 56     56    High School   More Or Less
## 57     57    High School Not At All Sat
## 58     58    High School Not At All Sat
## 59     59    High School      Satisfied
## 60     60 Lt High School      Satisfied
## 61     61    High School Not At All Sat
## 62     62 Lt High School   More Or Less
## 63     63    High School   More Or Less
## 64     64 Lt High School   More Or Less
## 65     65    High School   More Or Less
## 66     66 Lt High School      Satisfied
## 67     67    High School Not At All Sat
## 68     68    High School Not At All Sat
## 69     69    High School      Satisfied
## 70     70       Bachelor Not At All Sat
## 71     71           <NA> Not At All Sat
## 72     72    High School   More Or Less
## 73     73 Lt High School   More Or Less
## 74     74 Lt High School      Satisfied
## 75     75    High School Not At All Sat
## 76     76       Bachelor   More Or Less
## 77     77 Lt High School      Satisfied
## 78     78 Lt High School Not At All Sat
## 79     79    High School   More Or Less
## 80     80    High School      Satisfied