Is there a relationship between one’s political stand (party ID) and family income in constant dollars?
General Social Survey Cumulative File, 1972-2012 Coursera Extract. Modified for Data Analysis and Statistical Inference course (Duke University).
The R dataset can be downloaded from http://bit.ly/dasi_gss_data.
load(url("http://bit.ly/dasi_gss_data"))
Citation for the original data:
Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1 Persistent URL: http://doi.org/10.3886/ICPSR34802.v1
The study spans 40 years, and the collection process was modified nearly every decade (see http://publicdata.norc.org:41000/gss/documents//BOOK/GSS_Codebook_AppendixA.pdf for details).
The data were collected through household interviews in metropolitan and rural areas of the United States. Multiple levels of stratification (region, race, age, income, and sex) were employed to obtain a representative random sample. About 1500-2000 cases were collected each year, with a slight increase in recent years.
The cases are adults residing in the United States, interviewed in their households.
Party ID:
Answer to the question: “Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?”.
Type of variable: categorical, ordinal.
summary(gss$partyid)
## Strong Democrat Not Str Democrat Ind,Near Dem
## 9117 12040 6743
## Independent Ind,Near Rep Not Str Republican
## 8499 4921 9005
## Strong Republican Other Party NA's
## 5548 861 327
str(gss$partyid)
## Factor w/ 8 levels "Strong Democrat",..: 3 2 4 2 1 3 3 3 1 1 ...
Family Income in Constant Dollars:
Inflation-adjusted family income.
Type of variable: numerical, continuous.
summary(gss$coninc)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 383 18440 35600 44500 59540 180400 5829
str(gss$coninc)
## int [1:57061] 25926 33333 33333 41667 69444 60185 50926 18519 3704 25926 ...
The study consists of interviews with a random sample of United States residents about their economic condition, working status, health, beliefs, etc. No treatment was assigned to the respondents, so the study is observational.
The population of interest is all adult US residents. The study employed random sampling, so the results can be generalized to that population.
The study is observational, so we can only establish association but not causal links between the variables of interest.
The dataset, restricted to the partyid and coninc columns and filtered for NA values, has 50393 cases.
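The filtering step described above can be sketched as follows. This is only a minimal illustration on a small made-up data frame: `toy` and its `other` column are hypothetical stand-ins for `gss`, and the same calls would be applied to `gss` on the real data.

```r
# Hypothetical stand-in for gss (values are made up for illustration)
toy <- data.frame(
  partyid = factor(c("Strong Democrat", NA, "Independent",
                     "Strong Republican", "Independent")),
  coninc  = c(25926, 33333, NA, 41667, 69444),
  other   = 1:5
)

# Keep only the two variables of interest
sub <- toy[, c("partyid", "coninc")]

# Drop every row that has an NA in either column
sub <- sub[complete.cases(sub), ]

# Number of cases left after filtering
nrow(sub)
```

Counting rows after `complete.cases` is one way to reproduce the number of cases reported for the filtered dataset.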
partyid:
partyid is a categorical variable. We summarize it with table and plot.
table(gss$partyid)
##
## Strong Democrat Not Str Democrat Ind,Near Dem
## 9117 12040 6743
## Independent Ind,Near Rep Not Str Republican
## 8499 4921 9005
## Strong Republican Other Party
## 5548 861
plot(gss$partyid)
We can see that Not Str Democrat is the most common category (12040 cases), followed by Strong Democrat (9117) and Not Str Republican (9005).
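The ranking of the categories can be checked directly from the counts printed by `table(gss$partyid)`. The sketch below re-enters those counts by hand (the vector `counts` is an illustrative name, not part of the original analysis) and converts them to percentage shares.

```r
# Counts copied from the table(gss$partyid) output above
counts <- c(
  "Strong Democrat"    = 9117,
  "Not Str Democrat"   = 12040,
  "Ind,Near Dem"       = 6743,
  "Independent"        = 8499,
  "Ind,Near Rep"       = 4921,
  "Not Str Republican" = 9005,
  "Strong Republican"  = 5548,
  "Other Party"        = 861
)

# Share of each category among non-missing answers, in percent
shares <- round(100 * counts / sum(counts), 1)

# Categories ordered from most to least frequent
sort(shares, decreasing = TRUE)
```

On the full dataset, `prop.table(table(gss$partyid))` would give the same proportions without retyping the counts.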
Family Income in constant USD:
Family income in constant USD is a continuous numerical variable. We summarize it with summary and a histogram.
summary(gss$coninc)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 383 18440 35600 44500 59540 180400 5829
hist(gss$coninc)
We can see that the distribution is right skewed.
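The summary above already hints at the skew: the mean (44500) is well above the median (35600). A rough numerical check along those lines could look like the sketch below. The helper `mean_above_median` and the vector `toy_inc` are hypothetical names introduced here for illustration, and the comparison is a heuristic, not a formal test of skewness.

```r
# Hypothetical helper: TRUE when the mean exceeds the median,
# which is typical of (but does not prove) a right-skewed distribution
mean_above_median <- function(x) {
  mean(x, na.rm = TRUE) > median(x, na.rm = TRUE)
}

# Made-up incomes with one large value and one missing value
toy_inc <- c(12000, 18000, 25000, 31000, 40000, 150000, NA)

mean_above_median(toy_inc)

# On the real data one would call:
#   mean_above_median(gss$coninc)
# and could inspect the shape on a log scale with:
#   hist(log10(gss$coninc))
```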
Is there a relationship between political stand and family income in constant dollars?
To explore the relationship between a categorical and a numerical variable, we plot them with ggplot.
library(ggplot2)
ggplot(gss, aes(x = partyid, y = coninc)) + geom_bar(stat = "identity")
## Warning: Removed 5829 rows containing missing values (position_stack).
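We don’t see a clear positive or negative trend in family income across the party ID categories. Not Str Democrat and Not Str Republican appear to have the highest values, but because geom_bar(stat="identity") stacks the individual incomes, each bar shows the total income of the group and therefore also reflects how many respondents it contains.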
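One way to compare typical incomes without mixing in group size is to compute a per-group summary such as the median. The sketch below does this on a small made-up data frame (`toy` is a hypothetical stand-in for `gss`). It is an alternative view, not the method used in the analysis above.

```r
# Hypothetical stand-in for gss (values are made up for illustration)
toy <- data.frame(
  partyid = factor(c("Strong Democrat", "Strong Democrat",
                     "Independent", "Independent",
                     "Strong Republican", "Strong Republican")),
  coninc  = c(20000, 40000, 10000, 30000, 50000, NA)
)

# Median income per party ID category, ignoring missing incomes
med <- tapply(toy$coninc, toy$partyid, median, na.rm = TRUE)

# Number of non-missing incomes behind each median
n <- table(toy$partyid[!is.na(toy$coninc)])

med
n

# On the real data, the same comparison could be drawn as boxplots:
#   ggplot(gss, aes(x = partyid, y = coninc)) + geom_boxplot()
```

Reporting the group sizes next to the medians makes it easier to judge how much weight each category's summary can bear.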
head(gss)[,c(27,29)]
## coninc partyid
## 1 25926 Ind,Near Dem
## 2 33333 Not Str Democrat
## 3 33333 Independent
## 4 41667 Not Str Democrat
## 5 69444 Strong Democrat
## 6 60185 Ind,Near Dem