Imagine you have a dataset in which each observation is a school. The dataset has the following variables:
- api00 – academic performance index (API) of the school in 2000
- enroll – number of students at the school
- meals – percentage of students at the school who get free meals
- full – percentage of teachers at the school with full teaching credentials

We want to figure out the relationship between the outcome, academic performance index, and the following three measures: number of students at the school, percentage of students at the school who get free meals, and percentage of teachers at the school with full teaching credentials.
We[1] ran a regression that included these four variables and got the results below.
Question 1: What are the independent variables in the regression above? (give a list of variables)
Answer 1: enroll, meals, full
Question 2: What are the dependent variables in the regression above? (give a list of variables)
Answer 2: api00
Question 3: What is the equation predicted by the regression output above? (write out the equation)
Answer 3: \(api00=-0.05(enroll)-3.66(meals)+1.08(full)+801.83\)
Question 4: What does the regression model predict about a school with 100 students, 0 students who get free meals, and 0 teachers with full credentials? Show all of your work. (show your calculation and answer)
Answer 4: \(api00(100,0,0)=-0.05(100)-3.66(0)+1.08(0)+801.83 = 796.83 \approx 797\) (about 797 API points)
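The same calculation can be checked in a few lines of Python. This is a minimal sketch that hard-codes the coefficients from the equation in Answer 3, except that it uses the unrounded enroll coefficient (-0.05146) quoted in Question 5; the function name `predict_api00` is hypothetical and not part of the original analysis.

```python
# Coefficients of the fitted model (rounded as in Answer 3, except enroll,
# which uses the value -0.05146 quoted in Question 5).
INTERCEPT = 801.83
B_ENROLL = -0.05146
B_MEALS = -3.66
B_FULL = 1.08


def predict_api00(enroll: float, meals: float, full: float) -> float:
    """Return the predicted API for a school (hypothetical helper).

    enroll: number of students
    meals:  percentage of students receiving free meals
    full:   percentage of teachers with full credentials
    """
    return INTERCEPT + B_ENROLL * enroll + B_MEALS * meals + B_FULL * full


prediction = predict_api00(enroll=100, meals=0, full=0)
print(round(prediction, 2))  # 796.68
print(round(prediction))     # 797
```

With the unrounded enroll coefficient the prediction is 796.68 rather than 796.83, but both round to 797 API points.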
Question 5: What does the number -0.05146 from the table (the coefficient of enroll) mean? (answer in a single full sentence)
Answer 5: For every additional student enrolled in the school, the school’s API is predicted to be 0.05146 points lower, controlling for all other independent variables.[2]
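One way to see what "controlling for all other independent variables" means is to compare two predictions that differ only in enroll. The sketch below reuses the coefficients from Answer 3 (with the unrounded enroll coefficient); the helper name and the example school's values are hypothetical, chosen only for illustration.

```python
INTERCEPT = 801.83
B_ENROLL = -0.05146
B_MEALS = -3.66
B_FULL = 1.08


def predict_api00(enroll, meals, full):
    # Linear prediction equation from the regression output.
    return INTERCEPT + B_ENROLL * enroll + B_MEALS * meals + B_FULL * full


# Hypothetical school: 500 students, 40% free meals, 85% fully credentialed teachers.
base = predict_api00(enroll=500, meals=40, full=85)
# Same school with one more student; meals and full are held fixed.
one_more = predict_api00(enroll=501, meals=40, full=85)

difference = one_more - base
print(round(difference, 5))  # -0.05146

# Because the model is linear, 100 extra students shift the prediction
# by 100 times the coefficient.
print(round(predict_api00(600, 40, 85) - base, 3))  # -5.146
```

The difference does not depend on the values chosen for meals and full, which is exactly what holding them constant means in a linear model.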
Question 6: Which variables in the model are statistically significant at the p < 0.05 level? (give a list of variables)
Answer 6: All three independent variables: enroll, meals, and full
Question 7: What does it mean for a variable in a regression model to be statistically significant? (answer in a single full sentence)
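Answer 7: It means that if there were truly no relationship (a slope, also known as coefficient, of zero) between the independent variable and the dependent variable in the full population from which the sample (the data you have) was drawn, there would be less than a 5% chance of getting an estimated slope at least as far from zero as the one we got, so we reject the idea that the relationship in the population is zero.[3]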
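The logic of the p < 0.05 rule can be sketched with a coefficient's test statistic (the estimate divided by its standard error). The standard errors below are hypothetical values, not taken from the regression table, and the sketch uses the standard normal distribution as a large-sample approximation to the t distribution that regression software actually uses.

```python
from statistics import NormalDist


def two_sided_p_value(estimate: float, std_error: float) -> float:
    """Approximate two-sided p-value for H0: true coefficient = 0.

    Uses the standard normal distribution as a large-sample stand-in
    for the t distribution.
    """
    z = estimate / std_error
    # Probability of a statistic at least this far from zero, in either
    # direction, if the true coefficient were zero.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical inputs: a slope of -3.66 with an assumed standard error of 0.12.
print(two_sided_p_value(-3.66, 0.12) < 0.05)  # True

# Hypothetical contrast: an estimate only one standard error from zero.
print(round(two_sided_p_value(0.10, 0.10), 3))  # 0.317
```

An estimate many standard errors away from zero gives a tiny p-value; one close to zero does not clear the 0.05 threshold.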
Question 8: What do we know about the relationship between api00 and meals in only our sample?
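Answer 8: On average in our sample, each one-unit (one-percentage-point) increase in meals is associated with an api00 that is about 3.7 points lower (the coefficient is -3.66), controlling for enroll and full.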
Question 9: What do we know about the relationship between api00 and meals in the population from which our sample was drawn? (Be sure to include some sort of range of possibilities in this answer)
Answer 9: In the population, we are 95% confident that each one-unit increase in meals is associated with a decrease in api00 of between 3.4 and 3.9 points (that is, the 95% confidence interval for the slope runs from about -3.9 to -3.4).
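A 95% confidence interval for a coefficient is roughly the estimate plus or minus 1.96 standard errors. The sketch below shows that arithmetic for the meals coefficient; the standard error of 0.125 is a hypothetical value picked so the result lands near the interval quoted above, and the normal critical value stands in for the exact t value that regression software would use.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, std_error: float, level: float = 0.95):
    """Normal-approximation confidence interval for a regression coefficient."""
    # Critical value: about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * std_error
    return estimate - margin, estimate + margin


# Coefficient of meals from the model; the standard error 0.125 is hypothetical.
low, high = confidence_interval(-3.66, 0.125)
print(round(low, 1), round(high, 1))  # -3.9 -3.4
```

A larger standard error (a noisier estimate) would widen the interval around the same point estimate.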
Question 10: What do we know about the relationship between the actual values of the dependent variable in the dataset and the predicted/fitted values of the dependent variable in the regression model? (answer in 1–3 sentences)
Answer 10: We look at the \(R^2\) statistic for this, which is 0.83 in this regression. \(R^2\) is a measure of the goodness-of-fit of our regression model: it means that the independent variables in this model explain 83% of the variation in the dependent variable. \(R^2\) is also related to the correlation \(R\) between the actual and predicted values of the dependent variable in this regression: \(R = \sqrt{R^2} = \sqrt{0.83} \approx 0.91\). An \(R^2\) this high is rare in social science data analysis.
[1] We didn’t actually run it ourselves. It came from this source: Introduction to Regression in R (Part1, Simple and Multiple Regression). IDRE Statistical Consulting Group.
[2] This is the slope of the relationship between api00 and enroll.
[3] The process of using the results of your statistical test or model (in this case a linear regression) to learn something about a broader population is called inference. We use standard errors, confidence intervals, and p-values to do inferential statistics.