Data Preparation

library(foreign)
library(dplyr)
library(psych)
library(ggplot2)

# Load data.
sspa <- read.dta("G:\\CUNY\\GitHub\\CUNY-DATA606\\Project\\sppa2012_public_stata.dta")

# Select relevant variables.
arts <- sspa %>% 
  select(DadEducation = PEE11A, MomEducation = PEE11B, 
         Concerts = PTC1Q3B, Books = PTC1Q13B) %>% 
  filter(!is.na(DadEducation) & !is.na(MomEducation) & (!is.na(Books) | !is.na(Concerts)))

# Drop unused factor levels.
arts$DadEducation <- as.factor(as.character(arts$DadEducation))
arts$MomEducation <- as.factor(as.character(arts$MomEducation))
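
A side effect of the as.factor(as.character(...)) idiom is worth noting: besides dropping unused levels, it rebuilds the factor with levels sorted alphabetically, which is why the education levels appear in alphabetical order in the tables later on. The sketch below uses a small made-up factor (the level names and their order are hypothetical, not taken from the SPPA file) to contrast it with base R's droplevels(), which removes unused levels but keeps the original level order.

```r
# Hypothetical factor with one unused level ("Refused") and a
# non-alphabetical level order.
f <- factor(c("High school", "Some College"),
            levels = c("Some College", "High school", "Refused"))

# Idiom used above: unused levels are dropped and the remaining
# levels are re-sorted alphabetically.
resorted <- as.factor(as.character(f))
levels(resorted)

# droplevels(): unused levels are dropped, original order is kept.
dropped <- droplevels(f)
levels(dropped)
```

Either approach gives the same group counts; they differ only in the order in which groups are listed.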

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Is parents’ education level predictive of an individual’s participation in the arts?

Cases

What are the cases, and how many are there?

Each case is an individual responding to the survey. There are 35,735 cases in the data set. Of these, 2,397 cases have both parents’ education levels reported and at least one of the two response variables available.
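
The case-selection rule (both parents' education present, and at least one response present) can be illustrated on a toy data frame. This is a minimal sketch in base R; the five rows and the short education labels are invented for illustration and are not SPPA records.

```r
# Toy data: NA marks a missing answer.
toy <- data.frame(
  DadEducation = c("HS", "HS", NA,   "BA", "BA"),
  MomEducation = c("HS", NA,   "BA", "BA", "HS"),
  Books        = c(5,    3,    2,    NA,   NA),
  Concerts     = c(NA,   1,    1,    2,    NA)
)

# Keep rows where both parents' education is known AND
# at least one of the two responses is available.
keep <- !is.na(toy$DadEducation) & !is.na(toy$MomEducation) &
  (!is.na(toy$Books) | !is.na(toy$Concerts))

kept <- toy[keep, ]
nrow(kept)
```

Only rows 1 and 4 survive: rows 2 and 3 are missing one parent's education, and row 5 has neither response.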

Data collection

Describe the method of data collection.

The data was collected by the National Endowment for the Arts (NEA) through the Survey of Public Participation in the Arts (SPPA), which was administered in July 2012 as a supplement to the U.S. Census Bureau’s Current Population Survey. The sample is nationally representative.

Type of study

What type of study is this (observational/experiment)?

This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data was collected by the U.S. Census Bureau on behalf of the NEA and is available online at https://www.arts.gov/publications/additional-materials-related-to-2012-sppa. For this project, the data was downloaded in Stata format from the NEA site.

NEA Office of Research & Analysis, 2012 SPPA Questionnaire.

Triplett, T. (October 2014). 2012 SPPA Public-Use Data File User’s Guide. Statistical Methods Group. Urban Institute.

Response

What is the response variable, and what type is it (numerical/categorical)?

The two response variables selected for this research are the number of classical music performances attended and the number of books read in the last 12 months. Both are numerical.

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The two explanatory variables are the highest degree or level of school completed by the respondent’s father and the highest degree or level of school completed by the respondent’s mother. Both are categorical.
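
Because the education categories have a natural order, it may help to store them as an ordered factor so that tables and plots list groups from lowest to highest attainment. The sketch below uses the six level labels that appear in this data set; the ordering itself is an assumption about attainment, and the three sample values are made up.

```r
# Assumed attainment order, lowest to highest.
education_order <- c(
  "Less than 9th grade",
  "Some high school",
  "High school graduate (or GED)",
  "Some College",
  "College graduate (BA, AB, BS)",
  "Advanced or graduate degree"
)

# Made-up sample values for illustration.
x <- c("Some College", "Less than 9th grade", "Advanced or graduate degree")

# Ordered factor: levels follow education_order, not the alphabet.
edu <- factor(x, levels = education_order, ordered = TRUE)

levels(edu)
edu[1] > edu[2]  # comparisons follow the assumed order
```

The same factor() call could be applied to DadEducation and MomEducation in place of the alphabetical re-leveling, if ordered output is preferred.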

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

describe(arts$Books)
##    vars    n  mean    sd median trimmed  mad min max range skew kurtosis
## X1    1 2358 15.27 22.03      6    9.87 5.93   1 100    99 2.64     6.82
##      se
## X1 0.45
describe(arts$Concerts)
##    vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 438 2.54 2.56      2    1.93 1.48   1  12    11 2.48     6.01 0.12
describeBy(arts$Books, group = arts$DadEducation, mat = TRUE)
##     item                        group1 vars   n     mean       sd median
## X11    1   Advanced or graduate degree    1 224 17.08482 22.18849     10
## X12    2 College graduate (BA, AB, BS)    1 425 15.42353 20.79537     10
## X13    3 High school graduate (or GED)    1 852 15.50117 22.38208      6
## X14    4           Less than 9th grade    1 313 15.08307 23.47037      6
## X15    5                  Some College    1 321 13.70405 20.37269      6
## X16    6              Some high school    1 223 14.75336 23.04944      5
##       trimmed     mad min max range     skew kurtosis        se
## X11 12.188889 10.3782   1 100    99 2.461469 6.141316 1.4825309
## X12 10.551320  8.8956   1 100    99 2.815778 8.139015 1.0087238
## X13 10.058651  5.9304   1 100    99 2.537260 6.257077 0.7667976
## X14  9.171315  5.9304   1 100    99 2.543586 5.903608 1.3266240
## X15  8.704280  5.9304   1 100    99 2.957886 8.959944 1.1370927
## X16  8.815642  4.4478   1 100    99 2.588064 6.222084 1.5435047
describeBy(arts$Concerts, group = arts$DadEducation, mat = TRUE)
##     item                        group1 vars   n     mean       sd median
## X11    1   Advanced or graduate degree    1  72 2.472222 2.239040      2
## X12    2 College graduate (BA, AB, BS)    1 106 2.500000 2.545771      2
## X13    3 High school graduate (or GED)    1 139 2.618705 2.722413      2
## X14    4           Less than 9th grade    1  33 3.242424 2.750344      2
## X15    5                  Some College    1  57 2.333333 2.437309      1
## X16    6              Some high school    1  31 2.096774 2.534493      1
##      trimmed    mad min max range     skew kurtosis        se
## X11 2.017241 1.4826   1  12    11 2.415449 6.752904 0.2638734
## X12 1.906977 1.4826   1  12    11 2.564474 6.548708 0.2472672
## X13 1.955752 1.4826   1  12    11 2.461160 5.490953 0.2309120
## X14 2.703704 1.4826   1  12    11 1.598541 2.017473 0.4787735
## X15 1.787234 0.0000   1  12    11 2.525441 6.545947 0.3228295
## X16 1.440000 0.0000   1  12    11 2.867085 7.605388 0.4552084
ggplot(data = subset(arts, !is.na(Books)), aes(x = Books)) + 
  geom_histogram(bins = 15)

ggplot(data = subset(arts, !is.na(Concerts)), aes(x = Concerts)) + 
  geom_histogram(bins = 15)

table(arts$DadEducation, useNA = "ifany")
## 
##   Advanced or graduate degree College graduate (BA, AB, BS) 
##                           229                           431 
## High school graduate (or GED)           Less than 9th grade 
##                           869                           315 
##                  Some College              Some high school 
##                           323                           230
table(arts$MomEducation, useNA = "ifany")
## 
##   Advanced or graduate degree College graduate (BA, AB, BS) 
##                           141                           413 
## High school graduate (or GED)           Less than 9th grade 
##                          1046                           233 
##                  Some College              Some high school 
##                           345                           219
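
The se column in the describe() output is the standard error of the mean, sd / sqrt(n). As a quick arithmetic check, the sketch below recomputes it from the rounded n and sd values printed above for Books and Concerts; the helper name se_mean is introduced here for illustration only.

```r
# n and sd copied from the describe() output above (rounded values).
books    <- c(n = 2358, sd = 22.03)
concerts <- c(n = 438,  sd = 2.56)

# Standard error of the mean: sd / sqrt(n).
se_mean <- function(x) unname(x["sd"] / sqrt(x["n"]))

round(se_mean(books), 2)     # 0.45
round(se_mean(concerts), 2)  # 0.12
```

Both values agree with the se column reported by describe().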