library(foreign)
library(dplyr)
library(psych)
library(ggplot2)
# Load data.
sspa <- read.dta("G:\\CUNY\\GitHub\\CUNY-DATA606\\Project\\sppa2012_public_stata.dta")
# Select relevant variables.
arts <- sspa %>%
select(DadEducation = PEE11A, MomEducation = PEE11B,
Concerts = PTC1Q3B, Books = PTC1Q13B) %>%
filter(!is.na(DadEducation) & !is.na(MomEducation) & (!is.na(Books) | !is.na(Concerts)))
# Drop unused factor levels.
arts$DadEducation <- as.factor(as.character(arts$DadEducation))
arts$MomEducation <- as.factor(as.character(arts$MomEducation))
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is parents’ education level predictive of an individual’s participation in the arts?
What are the cases, and how many are there?
Each case is an individual who responded to the survey. There are 35,735 cases in the data set. Of these, 2,397 cases report both parents’ education level and have at least one of the response variables available.
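The two counts correspond to the number of rows in the full file and the number of rows that survive the filter in the setup code. Below is a minimal sketch of that filter logic on a small made-up data frame. The `toy` object and its values are hypothetical and are not SPPA data. In the project itself the equivalent calls would presumably be `nrow(sspa)` and `nrow(arts)`.

```r
library(dplyr)

# Hypothetical toy data, NOT SPPA values: five respondents with some gaps.
toy <- data.frame(
  DadEducation = c("Some College", NA, "Less than 9th grade",
                   "Some College", "Some high school"),
  MomEducation = c("Some College", "Some College", NA,
                   "Some high school", "Some College"),
  Books        = c(4, 10, 2, NA, NA),
  Concerts     = c(NA, 1, 3, 2, NA)
)

# Keep rows with both parents' education reported and
# at least one of the two response variables available.
kept <- toy %>%
  filter(!is.na(DadEducation) & !is.na(MomEducation) &
           (!is.na(Books) | !is.na(Concerts)))

n_all  <- nrow(toy)   # all cases in the toy file
n_kept <- nrow(kept)  # cases usable for the analysis
```

A row is dropped if either parent's education is missing or if both response variables are missing, so a respondent with only `Books` or only `Concerts` still counts as a case.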
Describe the method of data collection.
The data were collected by the National Endowment for the Arts (NEA) through the Survey of Public Participation in the Arts (SPPA), administered in July 2012 as a supplement to the U.S. Census Bureau’s Current Population Survey. The sample is nationally representative.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
The data were collected by the U.S. Census Bureau on behalf of the NEA and are available online at https://www.arts.gov/publications/additional-materials-related-to-2012-sppa. For this project, the data were downloaded in Stata format from the NEA site.
NEA Office of Research & Analysis, 2012 SPPA Questionnaire.
Triplett, T. (October 2014). 2012 SPPA Public-Use Data File User’s Guide. Statistical Methods Group. Urban Institute.
What is the response variable, and what type is it (numerical/categorical)?
The two response variables selected for this research are the number of classical music performances attended and the number of books read in the last 12 months. Both are numerical.
What is the explanatory variable, and what type is it (numerical/categorical)?
The two explanatory variables are the highest degree or level of school completed by the respondent’s father and the highest degree or level of school completed by the respondent’s mother. Both are categorical.
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
describe(arts$Books)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2358 15.27 22.03 6 9.87 5.93 1 100 99 2.64 6.82
## se
## X1 0.45
describe(arts$Concerts)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 438 2.54 2.56 2 1.93 1.48 1 12 11 2.48 6.01 0.12
describeBy(arts$Books, group = arts$DadEducation, mat = TRUE)
## item group1 vars n mean sd median
## X11 1 Advanced or graduate degree 1 224 17.08482 22.18849 10
## X12 2 College graduate (BA, AB, BS) 1 425 15.42353 20.79537 10
## X13 3 High school graduate (or GED) 1 852 15.50117 22.38208 6
## X14 4 Less than 9th grade 1 313 15.08307 23.47037 6
## X15 5 Some College 1 321 13.70405 20.37269 6
## X16 6 Some high school 1 223 14.75336 23.04944 5
## trimmed mad min max range skew kurtosis se
## X11 12.188889 10.3782 1 100 99 2.461469 6.141316 1.4825309
## X12 10.551320 8.8956 1 100 99 2.815778 8.139015 1.0087238
## X13 10.058651 5.9304 1 100 99 2.537260 6.257077 0.7667976
## X14 9.171315 5.9304 1 100 99 2.543586 5.903608 1.3266240
## X15 8.704280 5.9304 1 100 99 2.957886 8.959944 1.1370927
## X16 8.815642 4.4478 1 100 99 2.588064 6.222084 1.5435047
describeBy(arts$Concerts, group = arts$DadEducation, mat = TRUE)
## item group1 vars n mean sd median
## X11 1 Advanced or graduate degree 1 72 2.472222 2.239040 2
## X12 2 College graduate (BA, AB, BS) 1 106 2.500000 2.545771 2
## X13 3 High school graduate (or GED) 1 139 2.618705 2.722413 2
## X14 4 Less than 9th grade 1 33 3.242424 2.750344 2
## X15 5 Some College 1 57 2.333333 2.437309 1
## X16 6 Some high school 1 31 2.096774 2.534493 1
## trimmed mad min max range skew kurtosis se
## X11 2.017241 1.4826 1 12 11 2.415449 6.752904 0.2638734
## X12 1.906977 1.4826 1 12 11 2.564474 6.548708 0.2472672
## X13 1.955752 1.4826 1 12 11 2.461160 5.490953 0.2309120
## X14 2.703704 1.4826 1 12 11 1.598541 2.017473 0.4787735
## X15 1.787234 0.0000 1 12 11 2.525441 6.545947 0.3228295
## X16 1.440000 0.0000 1 12 11 2.867085 7.605388 0.4552084
ggplot(data = subset(arts, !is.na(Books)), aes(x = Books)) +
geom_histogram(bins = 15)
ggplot(data = subset(arts, !is.na(Concerts)), aes(x = Concerts)) +
geom_histogram(bins = 15)
table(arts$DadEducation, useNA = "ifany")
##
## Advanced or graduate degree College graduate (BA, AB, BS)
## 229 431
## High school graduate (or GED) Less than 9th grade
## 869 315
## Some College Some high school
## 323 230
table(arts$MomEducation, useNA = "ifany")
##
## Advanced or graduate degree College graduate (BA, AB, BS)
## 141 413
## High school graduate (or GED) Less than 9th grade
## 1046 233
## Some College Some high school
## 345 219
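The grouped summaries above break the response variables down by father's education only. A comparable table of group sizes, means, and SDs by mother's education could be produced along the following lines. This is a sketch that uses dplyr instead of the `describeBy` call shown earlier. The helper name `summarise_by_group` and the toy data are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: n, mean, and SD of a response variable
# within each level of a grouping column, ignoring missing responses.
summarise_by_group <- function(data, group_col, value_col) {
  data %>%
    filter(!is.na(.data[[value_col]])) %>%
    group_by(group = .data[[group_col]]) %>%
    summarise(n    = n(),
              mean = mean(.data[[value_col]]),
              sd   = sd(.data[[value_col]]),
              .groups = "drop")
}

# Hypothetical toy data, NOT SPPA values.
toy <- data.frame(
  MomEducation = c("Some College", "Some College",
                   "Some high school", "Some high school", "Some high school"),
  Books        = c(2, 4, 6, NA, 10)
)

by_mom <- summarise_by_group(toy, "MomEducation", "Books")
```

With the project data, the analogous calls would be `summarise_by_group(arts, "MomEducation", "Books")` and `summarise_by_group(arts, "MomEducation", "Concerts")`, assuming `arts` is built as in the setup code.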