Selecting a Data Set

1. Selecting a dataset

I will be using the 2014 General Social Survey (GSS) dataset, which records respondents’ answers to hundreds of survey questions asked in 2014. These questions can be combined to show relationships between variables that were relevant in 2014. For example, comparing answers to questions about abortion with a respondent’s age and gender may help reveal generational and gender gaps in public opinion on abortion.

2. Data I will be using

As mentioned above, I am interested in how a relatively recent sample of the population answered questions about abortion, compared with respondents’ age, marital status, sex, and education level. I will be using the following variables:

AGE       Respondent’s Age
SEX       Respondent’s Sex
MARITAL   Marital Status
EDUC      Highest year of school completed
ABDEFECT  Abortion if there is a strong chance of a serious defect for the baby
ABNOMORE  Abortion if the woman is married but wants no more children
ABHLTH    Abortion if the woman’s health is seriously endangered
ABPOOR    Abortion if the family cannot afford more children
ABRAPE    Abortion if the woman became pregnant as a result of rape
ABSINGLE  Abortion if the woman does not want to marry the father of the child
ABANY     Abortion if the woman wants one for any reason

3. Theories that may connect these variables

We can speculate that female respondents may be more likely than male respondents to support abortion in every scenario. We also speculate that younger respondents may be more open to supporting abortion across the different scenarios. Education level may also be positively related to support for abortion: the more schooling a respondent has completed, the more support we would expect.

4. Creation of dataset

The dataset I have created is named “Abortion”. There are still some warnings I am working through; see below.

library(Zelig)
library(foreign)
library(DescTools)
d <- read.dta("/Users/laurenberkowitz/Downloads/GSS2014.DTA")

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

names(d)

library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:Zelig':
## 
##     combine, summarize
## 
## The following object is masked from 'package:MASS':
## 
##     select
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Abortion <- select(d, age, sex, marital, educ, abdefect, abnomore, abhlth, abpoor, abrape, absingle, abany)
names(Abortion)

##  [1] "age"      "sex"      "marital"  "educ"     "abdefect" "abnomore"
##  [7] "abhlth"   "abpoor"   "abrape"   "absingle" "abany"
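
Before modeling, it may help to check how each selected column is stored, since lm() needs a numeric response. The sketch below is only an illustration: the data frame `toy` is a made-up stand-in for `Abortion` (its values are not GSS responses), and the level labels "yes" and "no" are an assumption about how the abortion items are coded.

```r
# Toy stand-in for the Abortion data frame. These values are invented for
# illustration and are NOT GSS responses.
toy <- data.frame(
  age   = c(23, 35, 47, 58, 61, 72),
  sex   = factor(c("male", "female", "female", "male", "female", "male")),
  educ  = c(12, 16, 14, 12, 18, 10),
  abany = factor(c("yes", "no", "yes", "no", "yes", "no"),
                 levels = c("yes", "no"))  # assumed level labels
)

# How is each column stored? A factor response will not work in lm().
col_classes <- sapply(toy, class)
print(col_classes)

# For a factor, the level order shows which answer is coded 1 and which 2.
print(levels(toy$abany))

# Row proportions: the share of each answer within each sex.
tab <- prop.table(table(toy$sex, toy$abany), margin = 1)
print(round(tab, 2))
```

Running the same three checks on `Abortion` would show which variables are factors and what their levels are called in the actual file.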

5. Regression Analysis

To begin the regression analysis, I will regress respondents’ answers to the general abortion question (abany) on their age and sex, and then add education level.

# Note: this object is not used in the models below, which use the sex column directly
female <- factor(Abortion$sex)

demog1 <- lm(abany ~ age, data=Abortion)

## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored

## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors

demog2 <- lm(abany ~ age + sex, data=Abortion)

## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored

## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors

demog3 <- lm(abany ~ age + sex + educ, data=Abortion)

## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored

## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors

library(stargazer)

## 
## Please cite as: 
## 
##  Hlavac, Marek (2014). stargazer: LaTeX code and ASCII text for well-formatted regression and summary statistics tables.
##  R package version 5.1. http://CRAN.R-project.org/package=stargazer

stargazer(demog1, demog2, demog3, type="text")

## 
## ==========================================
##                   Dependent variable:     
##              -----------------------------
##                          abany            
##                 (1)        (2)      (3)   
## ------------------------------------------
## age            0.002      0.002    0.002  
##                                           
##                                           
## sexfemale                 0.023    0.014  
##                                           
##                                           
## educ                               -0.042 
##                                           
##                                           
## Constant       1.460      1.447    2.029  
##                                           
##                                           
## ------------------------------------------
## Observations   1,646      1,646    1,645  
## ==========================================
## Note:          *p<0.1; **p<0.05; ***p<0.01
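
The warnings above arise because abany is stored as a factor, while lm() expects a numeric response; this is also why the table shows no standard errors. One possible way around this, sketched here under stated assumptions, is to recode the answer to 0/1 and fit a logistic regression with glm(). The data frame `toy` is randomly generated and is not GSS data, the level labels "yes" and "no" are assumed, and `abany_yes` is a hypothetical column name. If the real variable has other levels (for example, "don't know"), those would need to be set to NA first.

```r
set.seed(1)  # make the toy data reproducible
n <- 200

# Randomly generated stand-in for the Abortion data frame (NOT GSS data).
toy <- data.frame(
  age   = sample(18:89, n, replace = TRUE),
  sex   = factor(sample(c("male", "female"), n, replace = TRUE),
                 levels = c("male", "female")),
  educ  = sample(8:20, n, replace = TRUE),
  abany = factor(sample(c("yes", "no"), n, replace = TRUE),
                 levels = c("yes", "no"))  # assumed level labels
)

# Recode the factor to 1 = "yes", 0 = "no"; missing answers stay NA.
toy$abany_yes <- as.integer(toy$abany == "yes")

# Logistic regression: models the log-odds of answering "yes".
logit1 <- glm(abany_yes ~ age + sex + educ, data = toy, family = binomial)

# Coefficient table with estimates, standard errors, z values, and p-values.
coefs <- summary(logit1)$coefficients
print(round(coefs, 3))
```

Because the toy answers are random, the coefficients here carry no meaning; the point is only that a numeric 0/1 response yields standard errors and p-values. Fitting lm() on the same 0/1 column (a linear probability model) would be another option.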

6. Discussion

From the analysis conducted, it appears I have a lot more work to do. The table above reports no standard errors or significance stars for any model, so neither the relationship between age and abany nor that relationship controlling for sex can be called statistically significant. This does not show that the variables are unrelated: abany is stored as a factor, lm() produced warnings about the factor response, and the estimates cannot yet be interpreted. As a next step, I need to see whether sex and age have a significant bearing on the more specific questions about abortion, or whether the answers of those polled in 2014 are simply not well predicted by age or sex. Perhaps there are better predictors.
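
The follow-up described above, checking age and sex against each of the specific abortion questions, could be sketched as a loop over the seven items. This is only an illustration under assumptions: `toy` is randomly generated and is not GSS data, the "yes"/"no" level labels are assumed, and `fit_item` is a hypothetical helper written for this example.

```r
set.seed(2)  # make the toy data reproducible
n <- 200
items <- c("abdefect", "abnomore", "abhlth", "abpoor",
           "abrape", "absingle", "abany")

# Randomly generated stand-in for the Abortion data frame (NOT GSS data).
toy <- data.frame(
  age = sample(18:89, n, replace = TRUE),
  sex = factor(sample(c("male", "female"), n, replace = TRUE),
               levels = c("male", "female"))
)
for (item in items) {
  toy[[item]] <- factor(sample(c("yes", "no"), n, replace = TRUE),
                        levels = c("yes", "no"))  # assumed level labels
}

# Hypothetical helper: recode one item to 0/1 and fit a logistic regression
# on age and sex, returning the estimated coefficients.
fit_item <- function(item, data) {
  data$y <- as.integer(data[[item]] == "yes")
  fit <- glm(y ~ age + sex, data = data, family = binomial)
  coef(fit)
}

# One row per abortion item, one column per coefficient.
results <- t(sapply(items, fit_item, data = toy))
print(round(results, 3))
```

With the real data, comparing the age and sexfemale columns across rows would show whether the relationships differ by scenario.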

Notes

It is important to note that R produced several warnings throughout. Since I chose to use an original dataset, I am still working out the kinks, and I am not yet sure how to correct some of the variable formatting, which may need to change. As I get used to the GSS 2014 dataset, I hope to become more comfortable with it and produce a better R Markdown file.

Homework from 3.2.15