Introduction

GSS Data - Race vs Income and Financial Satisfaction Analysis

This project studies the relationship between a respondent's race and two outcomes: family income and satisfaction with their personal financial situation.

The data come from the General Social Survey (GSS), a sociological survey of US residents that collects data on demographic characteristics and behavior. Studying the survey offers insight into American society.

Data Preparation

The General Social Survey (GSS) has provided politicians, policymakers, and scholars with a clear and unbiased perspective on what Americans think and feel about such issues as income, national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.

library(treemap)
library(tidyverse)
library(sqldf)
library(ggplot2)

load(url("http://bit.ly/dasi_gss_data"))
# Keep the three variables of interest and drop rows with a missing value in any of them.
data <- gss %>%
  select(race, coninc, satfin) %>%
  filter(!is.na(race), !is.na(coninc), !is.na(satfin))

race <- sqldf("select race,count(*) as count from data group by race")
satfin1 <- sqldf("select satfin,race,count(*) as count from data group by satfin,race")

dim(gss)
## [1] 57061   114
head(data)
##    race coninc         satfin
## 1 White  25926 Not At All Sat
## 2 White  33333   More Or Less
## 3 White  33333      Satisfied
## 4 White  41667 Not At All Sat
## 5 White  69444      Satisfied
## 6 White  60185   More Or Less
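
The filtering step above can be illustrated on a small scale. The following is a minimal sketch in base R, assuming a hypothetical toy data frame (`toy`, with made-up values) in place of `gss`. It shows that comparing a column against the string "NA" yields `NA` for missing entries rather than `TRUE` or `FALSE`, which is why testing missingness explicitly is the clearer way to drop incomplete rows.

```r
# Hypothetical toy rows standing in for `gss` (values are made up, not survey data).
toy <- data.frame(
  race   = factor(c("White", NA, "Black", "Other", "White")),
  coninc = c(25926, 33333, NA, 41667, 69444),
  satfin = factor(c("Satisfied", "More Or Less", "Satisfied", NA, "Satisfied"))
)

# Comparing with the string "NA" does not detect missing values:
# the result is NA (not FALSE) wherever race is missing.
na_compare <- toy$race != "NA"
na_compare

# Explicit check: keep only rows where all three variables are observed.
complete <- toy[complete.cases(toy[, c("race", "coninc", "satfin")]), ]
nrow(complete)
```

In this toy example only the first and last rows are complete, so two rows remain.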

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Research Question 1: Is race associated with how much money a person makes, and if so, what is the relationship between race and income?
Research Question 2: Is race associated with people's level of personal financial satisfaction, and if so, what is the relationship?

Cases

What are the cases, and how many are there?

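The data set is composed of 57,061 cases (rows) and 114 variables (columns); each row corresponds to one person surveyed. After rows with a missing race, coninc, or satfin value are dropped, 47,292 cases remain for this analysis.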

Data Collection

Describe the method of data collection.

The GSS data was collected by computer-assisted personal interview (CAPI), face-to-face interview and telephone interview of adults (18+) in randomly selected households.

Type of study

What type of study is this (observational/experiment)?

This is an observational study: respondents are surveyed rather than randomly assigned to groups, so it can establish only correlation between the variables examined, not causation.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data were not self-collected. They come from the General Social Survey (GSS) extract available at:
http://bit.ly/dasi_gss_data

Response

What is the response variable, and what type is it (numerical/categorical)?

satfin: Records how satisfied the respondent is with their personal financial situation (categorical)
coninc: Records the respondent's total family income in constant dollars (numerical, continuous)

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

race: Records the race of the respondent (categorical)

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(data)
##     race           coninc                  satfin     
##  White:38791   Min.   :   383   Satisfied     :13660  
##  Black: 6381   1st Qu.: 18241   More Or Less  :20874  
##  Other: 2120   Median : 35471   Not At All Sat:12758  
##                Mean   : 43959                         
##                3rd Qu.: 58849                         
##                Max.   :180386
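
The overall summary above does not break income down by group. As a minimal sketch of the per-group means, SDs, and sample sizes the prompt asks for, the following base R code works on a hypothetical toy data frame (`toy`, with made-up incomes) that only borrows the column names `race` and `coninc` from `data`.

```r
# Hypothetical toy rows (made-up incomes), using the same column names as `data`.
toy <- data.frame(
  race   = c("White", "White", "Black", "Black", "Other", "Other"),
  coninc = c(30000, 50000, 20000, 40000, 25000, 35000)
)

# Mean, standard deviation, and sample size of income within each race group.
by_race <- do.call(rbind, lapply(split(toy$coninc, toy$race), function(x) {
  data.frame(mean = mean(x), sd = sd(x), n = length(x))
}))
by_race
```

Applying the same pattern to `data$coninc` and `data$race` would give one row of statistics per race group.
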
#treemap
treemap(dtf = race,
        index=c("race"),
        vSize="count",
        vColor="count",
        palette="Pastel2",
        type="value",
        border.col=c("grey70", "grey90"),
        fontsize.title = 18,
        algorithm="pivotSize",
        title ="Fig1: Race Distribution",
        title.legend="Count")

Fig1: The distribution of the race variable, which records each respondent's self-declared race. White respondents are by far the largest group (38,791 of 47,292).
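
To put numbers on the concentration shown in the treemap, here is a small sketch that converts the per-race counts printed by `summary(data)` into percentages. The vector name `race_counts` is only an illustrative choice.

```r
# Counts per race as printed by summary(data) in this report.
race_counts <- c(White = 38791, Black = 6381, Other = 2120)

# Total number of respondents after dropping rows with missing values.
total <- sum(race_counts)

# Share of each group, in percent, rounded to one decimal place.
shares <- round(100 * race_counts / total, 1)

total
shares
```

White respondents make up roughly 82% of the filtered sample, Black respondents about 13.5%, and Other about 4.5%.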

#histogram
ggplot(data, aes(x=coninc)) + geom_histogram(binwidth=5000, colour="black") + xlab("Continuous Income") + ggtitle("Fig2: Family Income") + theme(plot.title = element_text(hjust = 0.5))

Fig2: The distribution of family income is right-skewed, with no negative incomes. The number of respondents decreases as income increases, and the mean (43,959) lies above the median (35,471).
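
The right skew can also be checked numerically from the quartiles printed by `summary(data)`. The sketch below uses Bowley's quartile skewness, which is one possible measure chosen here for illustration because it needs only the quartiles; a positive value indicates a longer right tail.

```r
# Values for coninc as printed by summary(data) in this report.
q1         <- 18241
median_inc <- 35471
mean_inc   <- 43959
q3         <- 58849

# A mean above the median is a first hint of right skew.
mean_gap <- mean_inc - median_inc

# Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).
bowley <- (q3 + q1 - 2 * median_inc) / (q3 - q1)

mean_gap
bowley
```

Both quantities are positive here, which is consistent with the shape of the histogram.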

#box plot
ggplot(data, aes(x=race, y=coninc, fill=race)) + geom_boxplot(alpha=0.2,notch=TRUE) + xlab("Race") + ylab("Income") + ggtitle("Fig3: Family Income vs Race") + theme(plot.title = element_text(hjust = 0.5))

Fig3: From the boxplot, the income distributions of the three race groups appear broadly similar.

#density
ggplot(data, aes(coninc, color = race)) + geom_density(alpha = 0.1) + labs(title = "Fig4: Density - Family Income vs Race", x = "Family Income", y = "Density") + theme(plot.title = element_text(hjust = 0.5))

Fig4: As in Fig3, the income distributions of the race groups overlap.

#plotting the data
ggplot(satfin1, aes(race, count, fill = satfin)) + geom_col() + labs(title = "Fig5: Financial Satisfaction Level vs Race", x = "Race", y = "Number of Respondents", fill = "Financial Satisfaction") + theme(plot.title = element_text(hjust = 0.5))

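Fig5: Proportionally, Black and Other respondents appear to be the least satisfied with their financial situation, while White respondents appear to be the most satisfied. The bars show counts, so this comparison rests on the relative size of the segments within each bar.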
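
A proportional comparison like this one is easier to verify with within-group proportions than with raw counts. The following is a minimal sketch using hypothetical counts (`toy`, not the GSS cross-tabulation) laid out like `satfin1`; it divides each race's counts by that race's total.

```r
# Hypothetical counts (NOT the GSS cross-tabulation), laid out like `satfin1`.
toy <- data.frame(
  race   = rep(c("White", "Black"), each = 3),
  satfin = rep(c("Satisfied", "More Or Less", "Not At All Sat"), times = 2),
  count  = c(30, 45, 25, 10, 20, 20)
)

# Cross-tabulate race by satisfaction level.
tab <- xtabs(count ~ race + satfin, data = toy)

# Divide each row by its total so the proportions within every race sum to 1.
props <- prop.table(tab, margin = 1)
round(props, 2)
```

In ggplot2, `geom_col(position = "fill")` draws the same within-group proportions as bars of equal height, which makes groups of very different sizes directly comparable.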