```r
# load data
schsb <- read.csv(file = 'https://raw.githubusercontent.com/pkofy/DATA606/main/Data%20Project/F_SCH_SB_2020_latest.csv')
```
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Are pension plans with higher liabilities more likely to be better funded? How does that vary by pension plan type: Single Employer, Multiemployer, or Multiple Employer?
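One way to frame this question is as a regression of funding on liability, with a plan-type interaction so that the liability slope can differ by type. The sketch below is one possible approach, not the author's stated method. It runs on hypothetical toy data: `toy`, `liability`, `plan_type`, and `funding` are illustrative names, not Schedule SB columns. The log10 transform of liability is also an assumption.

```r
# Hypothetical toy data standing in for the Schedule SB fields; these are NOT real filings.
set.seed(606)
toy <- data.frame(
  liability = round(10^runif(60, min = 5, max = 9)),   # funding target in dollars
  plan_type = factor(rep(c("Single", "Multi", "Multiple"), each = 20),
                     levels = c("Single", "Multi", "Multiple"))
)
# Made-up funding ratios (1 = 100% funded) with random noise
toy$funding <- 0.8 + 0.05 * log10(toy$liability) + rnorm(60, sd = 0.1)

# log10() is an assumption: it spreads out liabilities spanning several orders of magnitude.
# The interaction term lets the liability slope differ by plan type.
fit <- lm(funding ~ log10(liability) * plan_type, data = toy)

# Slope for Single plans (the reference level), then the slope adjustments for the other types
coef(fit)[c("log10(liability)",
            "log10(liability):plan_typeMulti",
            "log10(liability):plan_typeMultiple")]
```

A positive liability slope would suggest that larger plans tend to be better funded. The interaction coefficients show whether that relationship is stronger or weaker for the two non-single plan types.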
What are the cases, and how many are there?
Each case is a Schedule SB filed with the Department of Labor as part of a 2020 Form 5500, reporting a defined benefit pension plan’s funding status. There are 39,524 cases.
Describe the method of data collection.
The data is compiled by the Employee Benefits Security Administration to comply with the Freedom of Information Act. The data is not processed; it consists of the raw data fields from the electronic submissions.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
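The data is published on the Department of Labor’s (DOL) website, and the copy used here is loaded from the GitHub URL in the code at the top of this section. I selected the Schedule SB data from the 2020 form year filings because not all 2021 filings have been submitted yet.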
What is the response variable? Is it quantitative or qualitative?
The response variable is quantitative: the plan’s funding percentage, stored in the column SB_ADJ_FNDNG_TGT_PRCNT. For example, “101.49” means the plan is 101.49% funded, i.e., slightly over 100% funded.
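The conversion code later in this section divides this column by 100, so “101.49” becomes a ratio of about 1.0149. The minimal sketch below shows that conversion on a few literal strings, plus a simple at-or-above-100% flag. `to_funding_ratio` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: convert raw SB_ADJ_FNDNG_TGT_PRCNT text to a ratio (1 = 100% funded),
# using the same division by 100 as the conversion code in this section
to_funding_ratio <- function(pct) {
  as.numeric(pct) / 100
}

ratios <- to_funding_ratio(c("101.49", "85", "100"))
ratios        # about 1.0149, 0.85, 1
ratios >= 1   # TRUE FALSE TRUE: plans at or above 100% funded
```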
You should have two independent variables, one quantitative and one qualitative.
The quantitative independent variable is the size, or liability, of the plan, stored in the column SB_TOT_FNDNG_TGT_AMT. For example, “1011315” means the plan has a liability of $1,011,315.
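The raw amount is a plain number of dollars. The short sketch below formats the example value the way it is written above and shows it on a log10 scale. The log scale is an assumption that may help later, since the histogram code in this section cuts the axis at $10,000,000. `dollars` is a hypothetical helper.

```r
# Hypothetical helper: format a dollar amount with thousands separators
dollars <- function(x) {
  paste0("$", format(x, big.mark = ",", scientific = FALSE, trim = TRUE))
}

liability <- as.numeric("1011315")   # raw value from SB_TOT_FNDNG_TGT_AMT
dollars(liability)                   # "$1,011,315"

# On a log10 scale, each unit is a tenfold increase in liability
log10(liability)                     # just over 6, i.e. about $1 million
```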
The qualitative independent variable is the plan type, stored in the column SB_PLAN_TYPE_CODE. For example, “1” means the plan is a Single Employer plan; the other values, 2 and 3, represent Multiemployer and Multiple Employer plans, respectively. Multiemployer plans cover two or more similar employers sharing the same benefit plan, so that, for example, a trucker working for several trucking companies can have one benefit plan instead of small benefits in many plans. A multiple employer plan lets two or more unrelated employers share the administrative cost of running a pension plan.
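Because SB_PLAN_TYPE_CODE is a numeric code, it helps to turn it into a labeled factor before plotting or modeling. The sketch below does this with base R's `factor()`. The code-to-label mapping follows the author's description in the paragraph above, and `plan_type_label` is a hypothetical helper.

```r
# Hypothetical helper: map plan type codes 1, 2, 3 to the labels described above
plan_type_label <- function(code) {
  factor(code, levels = c(1, 2, 3),
         labels = c("Single Employer", "Multiemployer", "Multiple Employer"))
}

types <- plan_type_label(c(1, 3, 2, 1))
types          # Single Employer, Multiple Employer, Multiemployer, Single Employer
table(types)   # counts per labeled plan type
```

Using `levels = c(1, 2, 3)` keeps all three types in the output of `table()` even when a subset of the data is missing one of them.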
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g., scatter plots, boxplots). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
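```r
library(ggplot2)

# Convert the funding percentage to numeric and rescale it to a ratio (1 = 100% funded)
schsb$SB_ADJ_FNDNG_TGT_PRCNT <- as.numeric(schsb$SB_ADJ_FNDNG_TGT_PRCNT) / 100

# Histogram of funding ratios between 0 and 3 (values outside that range are dropped)
ggplot(schsb, aes(x = SB_ADJ_FNDNG_TGT_PRCNT)) +
  geom_histogram(bins = 50) +
  xlim(0, 3) +
  labs(x = "Funding Percentage (1 = 100%)")

# Convert the total funding target (the plan's liability) to numeric dollars
schsb$SB_TOT_FNDNG_TGT_AMT <- as.numeric(schsb$SB_TOT_FNDNG_TGT_AMT)

# Histogram of liabilities up to $10,000,000 (larger plans are dropped from the plot)
ggplot(schsb, aes(x = SB_TOT_FNDNG_TGT_AMT)) +
  geom_histogram(bins = 50) +
  xlim(0, 1e7) +
  labs(x = "Pension Plan Liability in Dollars")
```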
```r
table(schsb$SB_PLAN_TYPE_CODE)
##
##     1     2     3
## 39378    58    88
```
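The prompt asks for summary statistics for each variable, and the chunks above show only histograms and a count table. A minimal base-R sketch of per-variable and per-plan-type summaries follows. It uses a few hypothetical toy rows with the same column names as `schsb` after the conversions above, not real filings. Medians are chosen on the assumption that both numeric variables are skewed.

```r
# Hypothetical toy rows shaped like schsb after the numeric conversions; NOT real data
toy <- data.frame(
  SB_PLAN_TYPE_CODE      = c(1, 1, 1, 2, 2, 3, 3),
  SB_ADJ_FNDNG_TGT_PRCNT = c(1.0149, 0.92, 1.20, 0.85, NA, 1.05, 0.98),
  SB_TOT_FNDNG_TGT_AMT   = c(1011315, 250000, 4.2e6, 9.8e7, 3.1e7, 5.5e6, 7.0e5)
)

# Min, quartiles, median, mean, max, and NA count for each numeric variable
summary(toy[, c("SB_ADJ_FNDNG_TGT_PRCNT", "SB_TOT_FNDNG_TGT_AMT")])

# Median funding ratio and liability within each plan type.
# The formula interface drops any row with a missing value before aggregating.
by_type <- aggregate(
  cbind(SB_ADJ_FNDNG_TGT_PRCNT, SB_TOT_FNDNG_TGT_AMT) ~ SB_PLAN_TYPE_CODE,
  data = toy, FUN = median
)
by_type
```

On the real data, the same `aggregate()` call with `data = schsb` would give one row per plan type. Note that rows with a missing funding percentage also drop out of the liability median.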