Data Steps

  • Drop all rural census tracts.
  • Create a variable that measures the growth of median home value from 2000 to 2010.
  • Omit cases that have a median home value less than $1,000 in 2000.
  • Omit cases with growth rates above 200%.
  • Print summary statistics about median home values in 2000 and 2010.
  • Visualize the distribution of changes across all urban tracts between 2000 and 2010 (these are replications of steps in the tutorial as well).
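
The data steps above could look roughly like the following base-R sketch. The column names urban, mhv.00 and mhv.10 are assumptions (the tutorial's actual names may differ). "Omit" is read here as setting the offending values to NA rather than dropping whole rows, so a tract can still contribute its dollar change even when its growth rate is excluded; that reading is also an assumption.

```r
# Sketch of the data steps. Assumes a data frame with hypothetical columns:
#   urban  - "urban" or "rural"
#   mhv.00 - median home value in 2000 (dollars)
#   mhv.10 - median home value in 2010 (dollars)
prepare_mhv <- function(d) {
  # Drop all rural census tracts (which() also drops rows where urban is NA)
  d <- d[which(d$urban == "urban"), ]

  # Treat 2000 values below $1,000 as missing so no growth rate is computed
  d$mhv.00[which(d$mhv.00 < 1000)] <- NA

  # Dollar change and percentage growth from 2000 to 2010
  d$mhv.change <- d$mhv.10 - d$mhv.00
  d$mhv.growth <- 100 * d$mhv.change / d$mhv.00

  # Treat growth rates above 200% as missing
  d$mhv.growth[which(d$mhv.growth > 200)] <- NA

  d
}

# Toy data, invented for illustration only
toy <- data.frame(
  urban  = c("urban", "urban", "rural", "urban", "urban"),
  mhv.00 = c(100000, 500, 80000, 200000, 50000),
  mhv.10 = c(150000, 90000, 90000, 180000, 200000)
)
clean <- prepare_mhv(toy)

# Summary statistics for the 2000 and 2010 values
summary(clean[, c("mhv.00", "mhv.10")])
```

For the visualization step, hist(clean$mhv.growth) would draw the distribution of growth across the remaining urban tracts.
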

| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| MedianHomeValue2000 | 1,280 | 105,661 | 154,884 | 187,119 | 224,337 | 1,288,551 |
| MedianHomeValue2010 | 9,999 | 123,200 | 193,200 | 246,570 | 312,000 | 1,000,001 |
| MHV.Change.00.to.10 | -1,228,651 | 7,187 | 36,268 | 60,047 | 94,881 | 1,000,001 |
| MHV.Growth.00.to.12 | -97 | 6 | 25 | 34 | 50 | 17,494 |

Part 01 - Select Three IVs

  1. Percentage of college-educated individuals in 2000 (p.col).

Hypothesis: Higher levels of college-educated individuals in 2000 will predict a larger increase in home value between 2000 and 2010.

  2. Percentage of vacant housing units in 2000 (p.vacant).

Hypothesis: Higher levels of vacant housing units in 2000 will predict a larger increase in home value between 2000 and 2010.

  3. Percentage of the population employed in professional fields in 2000 (p.prof).

Hypothesis: Higher levels of employment in professional fields in 2000 will predict a larger increase in home value between 2000 and 2010.

Part 02 - Variable Skew

I have decided to apply a log10 transformation to each variable. This choice makes the variables easier to interpret and substantially reduces the skew in their distributions. Although some skew remains, the transformed variables are considerably less skewed than the original distributions.
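
A minimal sketch of the transformation and a skewness check follows. The exact transform is not shown in this document; the descriptives for the logged variables have a minimum of 0 while the raw percentages include 0 (where log10 is undefined), which suggests an offset such as log10(p + 1). That offset, the helper names log_pct and skewness, and the simulated data are assumptions.

```r
# Moment-based sample skewness (no extra packages needed):
# the mean cubed deviation divided by the cubed standard deviation.
skewness <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^(3 / 2)
}

# Assumed form of the transformation: log10(p + 1). The +1 keeps tracts
# with 0% finite (log10(0) is -Inf) and maps 0% to 0 and 99% to 2.
log_pct <- function(p) log10(p + 1)

# Simulated right-skewed percentages, for illustration only
set.seed(1)
p.vacant <- pmin(rexp(1000, rate = 1 / 7), 100)
c(raw = skewness(p.vacant), logged = skewness(log_pct(p.vacant)))
```
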

| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| p.col | 0.00 | 11.88 | 21.11 | 25.67 | 35.89 | 100.00 |
| p.vacant | 0.00 | 3.10 | 5.11 | 7.17 | 8.53 | 100.00 |
| p.prof | 0.00 | 22.69 | 31.35 | 33.55 | 42.87 | 100.00 |

Part 03 - Multicollinearity

Test for multicollinearity:

Dependent variable: mhv.growth

| | (1) | (2) | (3) |
|---|---|---|---|
| Log College Graduates | 2.67*** (0.46) | | 2.34** (1.15) |
| Log Vacant Housing | | | 3.67*** (0.55) |
| Log Professional Occupations | | 4.27*** (0.74) | 2.45 (1.84) |
| Constant | 25.92*** (0.63) | 23.04*** (1.11) | 19.72*** (1.65) |
| Observations | 58,822 | 58,797 | 58,788 |
| Adjusted R² | 0.001 | 0.001 | 0.001 |
| Residual Std. Error | 35.17 (df = 58820) | 35.16 (df = 58795) | 35.15 (df = 58784) |

Note: *p<0.1; **p<0.05; ***p<0.01

There is a strong positive correlation between the percentage of college graduates (log_col) and the percentage of professional workers (log_prof) with a correlation of 0.91. This high correlation suggests that multicollinearity may be a concern when including both variables in a regression model.
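
The correlation can be translated into a variance inflation factor (VIF), VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The sketch below computes it in base R; the helper name vif_manual and the simulated predictors are assumptions, and common rules of thumb flag values above roughly 5 or 10.

```r
# Variance inflation factor for each predictor: 1 / (1 - R^2_j), where R^2_j
# comes from regressing predictor j on all the other predictors.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    f <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

# With only two predictors, R^2_j is the squared correlation, so r = 0.91 gives
1 / (1 - 0.91^2)   # about 5.82

# Simulated predictors with a correlation near 0.91 (illustration only)
set.seed(42)
n <- 500
log_col    <- rnorm(n)
log_prof   <- 0.91 * log_col + sqrt(1 - 0.91^2) * rnorm(n)
log_vacant <- rnorm(n)
X <- data.frame(log_col, log_prof, log_vacant)
round(cor(X), 2)
round(vif_manual(X), 2)
```
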

In Model 3, the coefficients of both log_col and log_prof are smaller than in Models 1 and 2. This is consistent with the two highly correlated variables competing to explain the same variation in growth. Of even greater concern, log_prof is no longer statistically significant in Model 3.

The standard errors of the coefficients in Model 3 are larger than in Model 1 and Model 2, indicating increased uncertainty in the estimates when both variables are included.

The adjusted R-squared does not increase when both variables are included in Model 3 (it stays at 0.001 in all three models), suggesting that adding the second, correlated variable does not meaningfully improve the model.

The residual standard error remains almost unchanged across the models.

Given these results, multicollinearity is most likely an issue in this model. It can be addressed by keeping only the variable that best captures the underlying construct or by combining both variables into an index.
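
One way to build such an index is to standardize each variable and average the z-scores, as in the sketch below. This is one common construction, not the only one; the helper make_index, the column name edu_prof_index and the simulated data are hypothetical.

```r
# Hypothetical composite of two correlated predictors: standardize each
# (mean 0, SD 1) and average the two z-scores for every tract.
make_index <- function(a, b) {
  z <- cbind(as.numeric(scale(a)), as.numeric(scale(b)))
  rowMeans(z)   # NA if either input is NA for that tract
}

# Illustration with simulated data; edu_prof_index is a hypothetical name
set.seed(42)
sim <- data.frame(log_col = rnorm(200))
sim$log_prof   <- 0.91 * sim$log_col + sqrt(1 - 0.91^2) * rnorm(200)
sim$log_vacant <- rnorm(200)
sim$mhv.growth <- 20 + 2 * sim$log_col + 4 * sim$log_vacant + rnorm(200, sd = 5)

sim$edu_prof_index <- make_index(sim$log_col, sim$log_prof)
summary(lm(mhv.growth ~ edu_prof_index + log_vacant, data = sim))
```

The index then enters the regression as a single term, so the two correlated variables no longer compete for the same variance.
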

Part 04 - Is the Relationship Linear?

Do you think the relationship between X and Y is a linear relationship, or do you have evidence that the slope changes depending upon the level of X?

I don’t see evidence of either a strong linear or a strong non-linear relationship between these variables, although there does appear to be a weak relationship between vacancy and MHV.Growth.
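
Whether the slope changes with the level of X can also be checked formally by comparing a straight-line fit with one that adds a quadratic term, as sketched below. The quadratic test is one choice among several (a loess curve via scatter.smooth(x, y) is a visual alternative); the helper name and the simulated data are assumptions.

```r
# Compare a straight line with a fit whose slope can change with X
# (a quadratic term). A small p-value for the added term suggests the
# slope depends on the level of X.
check_linearity <- function(y, x) {
  linear    <- lm(y ~ x)
  quadratic <- lm(y ~ x + I(x^2))
  anova(linear, quadratic)
}

# Simulated data for illustration: one linear, one curved relationship
set.seed(7)
x <- runif(300, 0, 2)
y_linear <- 3 * x + rnorm(300)
y_curved <- 3 * x^2 + rnorm(300)
check_linearity(y_linear, x)
check_linearity(y_curved, x)
```
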

Part 04 - Descriptives

| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| reg.data.mhv.00 | 1,280 | 105,661 | 154,884 | 187,119 | 224,337 | 1,288,551 |
| reg.data.mhv.10 | 9,999 | 123,200 | 193,200 | 246,570 | 312,000 | 1,000,001 |
| reg.data.mhv.change | -1,228,651 | 7,187 | 36,268 | 60,047 | 94,881 | 1,000,001 |
| reg.data.mhv.growth | -97 | 6 | 25 | 29 | 49 | 200 |
| reg.data.log_col | 0 | 1 | 1 | 1 | 2 | 2 |
| reg.data.log_vacant | 0 | 1 | 1 | 1 | 1 | 2 |
| reg.data.log_prof | 0 | 1 | 2 | 1 | 2 | 2 |

What’s the typical change in home value between 2000 and 2010?

$36,267.80 (the median change)

What’s the largest change in home value between 2000 and 2010?

$1,000,001

What’s the relationship between the change in home value 2000-2010 and the growth in home value 2000-2010?

Using the reg.data dataset, the correlation coefficient is 0.78, indicating a strong positive relationship between the change in home value and the growth in home value between 2000 and 2010. Using the full d dataset, which still includes growth rates above 200%, the correlation is 0.54, a moderate positive relationship. I went with reg.data because it made sense to use the dataset that omits these outliers.

Although the two variables are fairly strongly correlated, they do not measure the same thing. The change in home value is the dollar difference between home values in 2000 and 2010, while growth expresses that difference as a percentage of the 2000 value. Because growth depends on the starting value, the two are related but not identical measures.
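
A small numeric illustration of the difference follows, along with a hypothetical helper for the quantities discussed in this part. The median serves as the "typical" change because the change distribution is right-skewed (its mean of 60,047 sits well above its median of 36,268). The tract values are invented and describe_change is not from the original analysis.

```r
# Same dollar change, different starting values: growth divides the change
# by the 2000 value, so equal dollar changes can mean very different growth.
mhv.00 <- c(100000, 500000)
mhv.10 <- c(150000, 550000)
mhv.change <- mhv.10 - mhv.00              # 50,000 in both tracts
mhv.growth <- 100 * mhv.change / mhv.00    # 50% versus 10%

# Hypothetical helper: typical (median) change, largest change, and the
# change-growth correlation, ignoring missing values.
describe_change <- function(change, growth) {
  c(typical = median(change, na.rm = TRUE),
    largest = max(change, na.rm = TRUE),
    r       = cor(change, growth, use = "complete.obs"))
}
```

Applied to the lab data, describe_change(reg.data$mhv.change, reg.data$mhv.growth) would return the three quantities in one call.
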

Part 05 - Models

(Despite the likely multicollinearity, I’m keeping log_prof in the model for explanatory purposes.)

Regression Results

| | Model 1: Change in Median Home Value (mhv.change) | Model 2: Median Home Value Growth (mhv.growth) |
|---|---|---|
| Log College Graduates | 51,011.43*** (2,740.31) | 2.34** (1.15) |
| Log Vacant Housing | -19,708.01*** (1,361.62) | 3.67*** (0.55) |
| Log Professional Occupations | 4,565.85 (4,327.64) | 2.45 (1.84) |
| Constant | 1,554.83 (3,951.38) | 19.72*** (1.65) |
| Observations | 59,500 | 58,788 |
| R² | 0.05 | 0.001 |
| Adjusted R² | 0.05 | 0.001 |
| Residual Std. Error | 88,432.31 (df = 59496) | 35.15 (df = 58784) |
| F Statistic | 958.51*** (df = 3; 59496) | 26.79*** (df = 3; 58784) |

Note: *p<0.1; **p<0.05; ***p<0.01
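
The two models can be fit as in the sketch below. As a design choice, it reads each estimate by its coefficient name from coef(summary()), which keeps every label attached to its own row instead of assigning labels by position. The simulated stand-in for reg.data is an assumption and does not reproduce the census results.

```r
# Fit both models and pull coefficients by name. Taking row names from
# coef(summary()) keeps each label attached to its own estimate, instead of
# supplying labels by position.
fit_models <- function(reg.data) {
  list(
    change = lm(mhv.change ~ log_col + log_vacant + log_prof, data = reg.data),
    growth = lm(mhv.growth ~ log_col + log_vacant + log_prof, data = reg.data)
  )
}

# Simulated stand-in for reg.data (illustration only; not the census data)
set.seed(3)
n <- 200
sim <- data.frame(log_col = runif(n, 0, 2), log_vacant = runif(n, 0, 2),
                  log_prof = runif(n, 0, 2))
sim$mhv.change <- 50000 * sim$log_col - 20000 * sim$log_vacant + rnorm(n, sd = 10000)
sim$mhv.growth <- 20 + 2 * sim$log_col + 4 * sim$log_vacant + rnorm(n, sd = 5)

models <- fit_models(sim)
lapply(models, function(m) round(coef(summary(m))[, c("Estimate", "Std. Error")], 2))
```
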

Did any of the variables predict changes to home value in a meaningful way (relationship is statistically significant)?

Yes. Two of the variables, the share of college graduates (log_col) and the share of vacant housing (log_vacant), were statistically significant predictors of both the change in home value and home value growth.

Which variable had the largest impact?

In the first model (change in median home value), the share of college graduates (log_col) had the largest impact. In the second model (median home value growth), the share of vacant housing (log_vacant) had the largest impact.

Did the results match your predictions?

The shares of college graduates and vacant housing were significantly associated with both the change in home values and home value growth, while the share of professional occupations was not.

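So, two of my three predictions were supported, although the vacancy prediction held only for growth: in Model 1, more vacant housing was associated with a smaller dollar increase.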

In a short paragraph explain your findings to a general audience.

This analysis explored how three neighborhood characteristics (the percentage of college graduates, the percentage of vacant housing, and the percentage of workers in professional occupations) relate to the change in median home value and the growth in median home value between 2000 and 2010. The percentages of college graduates and of vacant housing were both significantly associated with changes in home values and with home value growth.

In the first analysis (Model 1), areas with more college graduates experienced a greater increase in median home value. In contrast, areas with more vacant housing saw a smaller increase or even a decrease in median home value. In the second analysis (Model 2), home value growth was also significantly associated with the shares of college graduates and vacant housing. In this model, however, vacant housing had the larger effect, and more vacancy was associated with faster growth.

The share of workers in professional occupations did not have a significant effect on either the change in home values or home value growth.

Part 06 - Effect Sizes

I calculate the effect sizes from the full models, model1 <- lm(mhv.change ~ log_col + log_vacant + log_prof, data = reg.data) and model2 <- lm(mhv.growth ~ log_col + log_vacant + log_prof, data = reg.data), in order to interpret the relative importance of each variable in predicting the outcome.

Effect Sizes

| Model | Log College Graduates | Log Vacant Housing | Log Professional Occupations |
|---|---|---|---|
| Change in Median Home Value | 23315.80752 | -7215.335742 | 1222.3555446 |
| Median Home Value Growth | 1.06811 | 1.343253 | 0.6547517 |

In Model 1, college graduates have the largest effect size, indicating the largest impact on the outcome. In Model 2, vacant housing has the largest effect size, and professional occupations have the smallest.
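
The document does not state how the effect sizes were computed. One definition consistent with the reported values is coefficient × standard deviation of the predictor, that is, the expected change in the outcome for a one-SD increase in that predictor. Dividing each effect size by its regression coefficient gives roughly the same ratio in both models (about 0.46 for college graduates, 0.37 for vacant housing and 0.27 for professional occupations), which is what multiplying by a fixed predictor SD would produce. The sketch below assumes that definition; the helper name and simulated data are hypothetical.

```r
# Effect size as coefficient times the predictor's standard deviation: the
# expected change in the outcome for a one-SD increase in that predictor.
# This definition is an assumption; SDs are taken on the estimation sample.
effect_sizes <- function(model) {
  b <- coef(model)[-1]                                 # drop the intercept
  X <- model.frame(model)[, names(b), drop = FALSE]    # rows used in the fit
  b * vapply(X, sd, numeric(1))
}

# Illustration with simulated data
set.seed(11)
sim <- data.frame(log_col = runif(100, 0, 2), log_prof = runif(100, 0, 2))
sim$mhv.growth <- 20 + 2 * sim$log_col + 1 * sim$log_prof + rnorm(100)
round(effect_sizes(lm(mhv.growth ~ log_col + log_prof, data = sim)), 2)
```

Because every predictor is rescaled to a one-SD change, the effect sizes within a model can be compared directly, which is how the ranking above is read.
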