Research Proposal

Research Question

Is there a relationship between a person's political party identification (partyid) and family income in constant dollars (coninc)?

Data - Citation

General Social Survey Cumulative File, 1972-2012 Coursera Extract. Modified for Data Analysis and Statistical Inference course (Duke University).

The R dataset can be downloaded from http://bit.ly/dasi_gss_data.

load(url("http://bit.ly/dasi_gss_data"))

Data Collection

The study spans 40 years, and the collection process was modified nearly every decade.

The data were collected through household interviews in metropolitan and rural areas of the United States. Multiple levels of stratification by region, race, age, income, and sex were employed to obtain a representative random sample. About 1500-2000 cases were collected each year, with a slight increase in recent years.

Data - Cases (observational/experimental units)

Kinds of cases involved in the evaluation

The cases are adults residing in the United States who were interviewed in their households.

Data - Variables

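Types of variables used in the analysis

partyid: political party identification. Type of variable: categorical.

Levels: Strong Democrat, Not Str Democrat, Ind,Near Dem, Independent, Ind,Near Rep, Not Str Republican, Strong Republican, Other Party

coninc: family income in constant dollars. Type of variable: numerical, continuous.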

Data - Type of study

The study consists of interviews with a random sample of United States residents about their economic condition, working status, health, beliefs, etc. No treatment was assigned to the respondents, so the study is observational.

Data - Scope of inference - generalizability

The population of interest consists of all United States residents. The study employed random sampling, so the results can be generalized to the entire population.

Data - Scope of inference - causality

The study is observational, so we can only establish association links and not causal ones between the variables of interest.

Exploratory Data Analysis

After keeping only the partyid and coninc columns and removing rows with NA values, the dataset has 50393 cases.
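The subsetting step described above could look like the following minimal sketch. The helper name `keep_complete` and the toy data frame are illustrative assumptions and are not part of the original analysis; the toy values are made up and are not GSS records.

```r
# Hypothetical helper: keep only the named columns and drop rows with any NA.
keep_complete <- function(df, cols) {
  sub <- df[, cols, drop = FALSE]
  sub[complete.cases(sub), , drop = FALSE]
}

# Toy data (made-up values, not GSS records) to show the behaviour.
toy <- data.frame(
  partyid = factor(c("Independent", NA, "Strong Democrat", "Independent")),
  coninc  = c(25000, 40000, NA, 61000),
  other   = 1:4
)
toy_clean <- keep_complete(toy, c("partyid", "coninc"))
nrow(toy_clean)  # number of rows where both values are present

# With the course data loaded, the same call would be:
# gss_sub <- keep_complete(gss, c("partyid", "coninc"))
# nrow(gss_sub)
```

Note that the summaries in the rest of this section are computed on the full gss object, so their totals differ from the filtered case count.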

partyid:

partyid is a categorical variable. We summarize it with table and plot.

table(gss$partyid)
## 
##    Strong Democrat   Not Str Democrat       Ind,Near Dem 
##               9117              12040               6743 
##        Independent       Ind,Near Rep Not Str Republican 
##               8499               4921               9005 
##  Strong Republican        Other Party 
##               5548                861
plot(gss$partyid)

Figure: bar plot of partyid counts.

We can see that Not Str Democrat has the most instances (12040), followed by Strong Democrat (9117) and Not Str Republican (9005).
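To check which categories are most frequent, the counts can be sorted and turned into percentages. The sketch below re-enters the counts printed by table(gss$partyid) by hand so that it runs without the dataset; the names `party_counts`, `sorted_counts`, and `shares` are illustrative.

```r
# Counts copied from the table(gss$partyid) output above.
party_counts <- c(
  "Strong Democrat"    = 9117,
  "Not Str Democrat"   = 12040,
  "Ind,Near Dem"       = 6743,
  "Independent"        = 8499,
  "Ind,Near Rep"       = 4921,
  "Not Str Republican" = 9005,
  "Strong Republican"  = 5548,
  "Other Party"        = 861
)

# Order categories from most to least frequent.
sorted_counts <- sort(party_counts, decreasing = TRUE)

# Share of each category among respondents with a recorded partyid, in percent.
shares <- round(100 * sorted_counts / sum(sorted_counts), 1)

head(sorted_counts, 3)
shares

# With the data loaded, the equivalent would be:
# sort(table(gss$partyid), decreasing = TRUE)
```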

Family Income in constant USD:

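Family Income in constant USD is a continuous numerical variable. We summarize it with summary and a histogram. The mean (44500) is well above the median (35600), which indicates a right-skewed distribution.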

summary(gss$coninc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     383   18400   35600   44500   59500  180000    5829
hist(gss$coninc)

Figure: histogram of coninc.

Data Set

head(gss[, c("coninc", "partyid")])
##   coninc          partyid
## 1  25926     Ind,Near Dem
## 2  33333 Not Str Democrat
## 3  33333      Independent
## 4  41667 Not Str Democrat
## 5  69444  Strong Democrat
## 6  60185     Ind,Near Dem
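As a first look at the research question, income can be summarized within each party category. The following is a minimal sketch under stated assumptions: the helper `median_by_group` is hypothetical, and it is applied here only to the six rows printed above, which are far too few to support any conclusion.

```r
# Hypothetical helper: median of a numeric column within each level of a
# grouping column, sorted from highest to lowest. Groups with no data are dropped.
median_by_group <- function(df, group_col, value_col) {
  groups <- split(df[[value_col]], df[[group_col]])
  vals <- sapply(groups, median, na.rm = TRUE)
  sort(vals, decreasing = TRUE)
}

# The six rows shown by head() above, re-entered by hand.
first_rows <- data.frame(
  coninc  = c(25926, 33333, 33333, 41667, 69444, 60185),
  partyid = factor(c("Ind,Near Dem", "Not Str Democrat", "Independent",
                     "Not Str Democrat", "Strong Democrat", "Ind,Near Dem"))
)

med <- median_by_group(first_rows, "partyid", "coninc")
med

# With the full data loaded, the same idea would be:
# median_by_group(gss, "partyid", "coninc")
# boxplot(coninc ~ partyid, data = gss)
```

The median is used rather than the mean because the income distribution is skewed to the right.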