Instructions and Information

Quiz

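Imagine you have a dataset in which each observation is a school. The dataset has the following variables: api00 (the school's API score), enroll (the number of students enrolled), meals (students who get free meals), and full (teachers with full credentials).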

We¹ ran a regression that included these four variables and we got the results below.

## Call:
## lm(formula = api00 ~ enroll + meals + full)
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 801.82983   26.42660  30.342  < 2e-16 ***
## enroll       -0.05146    0.01384  -3.719 0.000229 ***
## meals        -3.65973    0.10880 -33.639  < 2e-16 ***
## full          1.08109    0.23945   4.515 8.37e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.8308

Question 1: What are the independent variables in the regression above? (give a list of variables)

Answer 1: enroll, meals, full

Question 2: What are the dependent variables in the regression above? (give a list of variables)

Answer 2: api00

Question 3: What is the equation predicted by the regression output above? (write out the equation)

Answer 3: \(api00=-0.05(enroll)-3.66(meals)+1.08(full)+801.83\)

Question 4: What does the regression model predict about a school with 100 students, 0 students who get free meals, and 0 teachers with full credentials? Show all of your work. (show your calculation and answer)

Answer 4: \(api00(100,0,0)=-0.05(100)-3.66(0)+1.08(0)+801.83 = 796.83 \approx 797\) (about 797 API points; the unrounded coefficients from the table give 796.68)
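The arithmetic in Answer 4 can be checked with a short Python sketch. The coefficients are copied from the regression table; the helper `predict_api00` and the dictionary names are hypothetical and exist only for this illustration.

```python
# Coefficients copied from the regression table above.
COEFS = {
    "(Intercept)": 801.82983,
    "enroll": -0.05146,
    "meals": -3.65973,
    "full": 1.08109,
}

# The same coefficients rounded to two decimals, as in the written equation.
ROUNDED = {name: round(value, 2) for name, value in COEFS.items()}


def predict_api00(enroll, meals, full, coefs=COEFS):
    """Return the fitted api00 value for one school (hypothetical helper)."""
    return (
        coefs["(Intercept)"]
        + coefs["enroll"] * enroll
        + coefs["meals"] * meals
        + coefs["full"] * full
    )


# School from Question 4: enroll = 100, meals = 0, full = 0.
pred_full = predict_api00(100, 0, 0)
pred_rounded = predict_api00(100, 0, 0, ROUNDED)

print(f"unrounded coefficients: {pred_full:.2f}")
print(f"rounded coefficients:   {pred_rounded:.2f}")
```

Rounding the coefficients before plugging in the values shifts the prediction by about 0.15 points, which is why the two results differ slightly.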

Question 5: What does the number -0.05146 from the table (the coefficient of enroll) mean? (answer in a single full sentence)

Answer 5: For every additional student enrolled in the school, the school’s API is predicted to be lower by 0.05146 points, holding meals and full constant.²
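To make the meaning of a slope concrete, the sketch below compares the predictions for two schools that differ by exactly one enrolled student. The values chosen for enroll, meals, and full are made up for illustration, and `predict_api00` is a hypothetical helper, not part of the original analysis.

```python
import math

# Coefficients copied from the regression table.
INTERCEPT = 801.82983
B_ENROLL = -0.05146
B_MEALS = -3.65973
B_FULL = 1.08109


def predict_api00(enroll, meals, full):
    """Fitted api00 from the estimated equation (hypothetical helper)."""
    return INTERCEPT + B_ENROLL * enroll + B_MEALS * meals + B_FULL * full


# Two made-up schools: identical meals and full, one extra student.
base = predict_api00(enroll=500, meals=40, full=90)
one_more = predict_api00(enroll=501, meals=40, full=90)

change = one_more - base
print(f"change in predicted api00: {change:.5f}")

# The change equals the enroll coefficient, up to floating-point error.
assert math.isclose(change, B_ENROLL, abs_tol=1e-9)
```

Whatever values are held fixed for meals and full, the difference between the two predictions is the enroll coefficient.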

Question 6: Which variables in the model are statistically significant at the p < 0.05 level? (give a list of variables)

Answer 6: All three independent variables (enroll, meals, and full); each has a p-value well below 0.05.
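The same check can be written out as a comparison of each p-value in the `Pr(>|t|)` column against the 0.05 threshold. In this sketch the table entry `< 2e-16` is stored as its upper bound `2e-16`, which is an assumption made so the value can be compared numerically.

```python
ALPHA = 0.05

# p-values copied from the Pr(>|t|) column of the regression table.
# "< 2e-16" is represented by its upper bound, 2e-16.
P_VALUES = {
    "enroll": 0.000229,
    "meals": 2e-16,
    "full": 8.37e-06,
}

# Keep the variables whose p-value falls below the threshold.
significant = [name for name, p in P_VALUES.items() if p < ALPHA]
print(significant)
```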

Question 7: What does it mean for a variable in a regression model to be statistically significant? (answer in a single full sentence)

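Answer 7: It means that, if there were truly no relationship between the independent variable and the dependent variable in the full population from which the sample (the data you have) was drawn, a slope (also known as a coefficient) at least as far from zero as the one we estimated would be expected in fewer than 5% of samples, so we treat the relationship as unlikely to be due to chance alone.³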
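The numbers behind a significance test can be traced in the table: each t value is the estimate divided by its standard error, and an approximate 95% confidence interval is the estimate plus or minus 1.96 standard errors. The 1.96 multiplier is a large-sample normal approximation and an assumption here, because the output shown does not report the degrees of freedom.

```python
# (estimate, standard error) pairs copied from the regression table.
TABLE = {
    "enroll": (-0.05146, 0.01384),
    "meals": (-3.65973, 0.10880),
    "full": (1.08109, 0.23945),
}

Z_95 = 1.96  # assumed normal-approximation multiplier for a 95% interval

results = {}
for name, (estimate, std_error) in TABLE.items():
    t_value = estimate / std_error
    lower = estimate - Z_95 * std_error
    upper = estimate + Z_95 * std_error
    # An interval that excludes zero corresponds to p < 0.05 (approximately).
    excludes_zero = lower > 0 or upper < 0
    results[name] = (t_value, lower, upper, excludes_zero)
    print(
        f"{name:7s} t = {t_value:8.3f}  "
        f"95% CI = ({lower:.4f}, {upper:.4f})  excludes 0: {excludes_zero}"
    )
```

The computed t values match the `t value` column up to the rounding of the printed estimates, and all three intervals exclude zero.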

Question 8: What do we know about the relationship between the actual values of the dependent variable in the dataset and the predicted/fitted values of the dependent variable in the regression model? (answer in 1–3 sentences)

Answer 8: We look at the \(R^2\) statistic for this, which is 0.83 in this regression. \(R^2\) is a measure of the goodness-of-fit of our regression model: the independent variables in this model account for about 83% of the variation in the dependent variable. \(R^2\) is also related to the correlation between the actual and predicted values of the dependent variable in this regression: \(R = \sqrt{R^2} = \sqrt{0.83} \approx 0.91\). A correlation this high is rare in social science data analysis.
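The link between \(R^2\) and the correlation of actual and fitted values can be illustrated with a small pure-Python sketch. The toy data below are made up and have nothing to do with the school dataset; the example fits a one-predictor least-squares line with an intercept, which is a simplification of the three-predictor model in the quiz.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up toy data, not taken from the school dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.1, 5.2, 5.8, 8.4, 8.9]

# Least-squares fit of y on x with an intercept.
mx, my = mean(x), mean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
intercept = my - slope * mx
fitted = [intercept + slope * xi for xi in x]

# R^2 from sums of squares, and the correlation of actual vs. fitted values.
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
r_actual_fitted = correlation(y, fitted)

print(f"toy R^2 = {r_squared:.4f}")
print(f"toy corr(actual, fitted)^2 = {r_actual_fitted ** 2:.4f}")

# The quiz regression reports R^2 = 0.8308.
quiz_r = math.sqrt(0.8308)
print(f"quiz R = {quiz_r:.2f}")
```

The two toy quantities agree, and applying the same square root to the reported 0.8308 gives the 0.91 used in the answer.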


  1. We didn’t actually do it. It came from this source: Introduction to Regression in R (Part 1, Simple and Multiple Regression). IDRE Statistical Consulting Group.

  2. This is the slope of the relationship between api00 and enroll.

  3. The process of using the results of your statistical test or model—in this case a linear regression—to learn something about a broader population is called inference. We use standard errors, confidence intervals, and p-values to do inferential statistics.