Analytical Problems

(1) Suppose that you run a regression of \[Y_i\] on \[X_i\] with 102 observations and obtain an estimate for the slope (i.e., \[\hat\beta_2\]). Your estimate for the standard error of \[\hat\beta_2\] is 1. You are considering two different hypothesis tests.

The first test is a one-sided test:

\[ H_0: \beta_2 = 0 , \quad H_a: \beta_2 > 0 , \quad \alpha = 0.05\]

The second test is a two-sided test:

\[ H_0: \beta_2 = 0 , \quad H_a: \beta_2 \neq 0 , \quad \alpha = 0.05\]

Let's start off by stating the t-statistic equation I will be using for solving the following hypothesis tests, along with the non-rejection region: \[ \lvert T_{stat}\rvert= \lvert{\frac{\hat\beta_2 - \beta_2}{SE(\hat\beta_2)}}\rvert \] \[ -T_{crit} \le \frac{\hat\beta_2 - \beta_2}{SE(\hat\beta_2)} \le T_{crit} \] Under \[H_0: \beta_2 = 0\] and with \[SE(\hat\beta_2) = 1\] the t-statistic is simply \[\hat\beta_2\] With 102 observations and two estimated coefficients, the degrees of freedom are \[102 - 2 = 100\]

  1. What values of \[\hat\beta_2\] would lead you to reject the null hypothesis in the one-sided test? If the t-statistic is greater than the one-sided critical value \[t_{0.05,100} \approx 1.66\] (about 1.645 under the normal approximation), then we reject \[H_0\] Because the SE is 1, the t-statistic equals \[\hat\beta_2\] so we reject whenever \[\hat\beta_2 > 1.66\]

  2. What values of \[\hat\beta_2\] would lead you to reject the null hypothesis in the two-sided test? Here we compare the absolute value of the t-statistic to the two-sided critical value \[t_{0.025,100} \approx 1.98\] (about 1.96 under the normal approximation). Again because the SE is 1, we reject \[H_0\] whenever \[\lvert\hat\beta_2\rvert > 1.98\] that is, when \[\hat\beta_2 > 1.98\] or \[\hat\beta_2 < -1.98\]

  3. What values of \[\hat\beta_2\] would lead you to reject the null hypothesis in the one-sided test, but not the two-sided test? Values between the two critical values: if \[1.66 < \hat\beta_2 \le 1.98\] then we reject \[H_0\] in the one-sided test but fail to reject (not "accept") \[H_0\] in the two-sided test.

  4. What values of βˆ2 would lead you to reject the null hypothesis in the two-sided test, but not the one-sided test?

Values in the far left tail: if \[\hat\beta_2 < -1.98\] then \[\lvert\hat\beta_2\rvert > 1.98\] and the two-sided test rejects \[H_0\] but the one-sided test does not, because it only rejects for large positive values of \[\hat\beta_2\] Again, this is all based on \[\alpha = 0.05\]
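As a quick sanity check, the critical values used above can be computed in R (a minimal sketch; the normal-approximation values 1.645 and 1.96 are shown for comparison):

```r
# One- and two-sided 5% critical values with df = 102 - 2 = 100
qt(0.95, df = 100)     # one-sided:  about 1.660
qt(0.975, df = 100)    # two-sided:  about 1.984
qnorm(c(0.95, 0.975))  # normal approximation: 1.645, 1.960
```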

(2) Suppose that you are studying the effect of police officers on crime rates in several large American cities. When you estimate the model:

\[ Crime_i = \beta_0 + \beta_1 Police_i + u_i\] you obtain a positive slope estimate.

  1. What does your slope estimate imply about how the number of police officers affects crime?

The positive slope estimate would suggest that crime increases by \[\beta_1\] for each police officer added to the force. However, taken causally this doesn't make too much sense, and hence why we need an omitted variable for part 2.

  2. Provide an example of an omitted variable that could explain why the slope estimate is positive. There are many omitted variables that could explain the positive slope; a natural one is city population (or, similarly, the underlying level of crime): larger cities tend to have both more crime and larger police forces, so the omitted variable is positively correlated with both \[Police_i\] and \[Crime_i\]

Hence I would assume these types of omitted variables would lead to a positive slope for \[\beta_1\] as the simulation below illustrates.
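A minimal simulation of this omitted-variable story (all variable names and coefficients here are illustrative assumptions, not part of the assignment):

```r
set.seed(42)
n <- 1000
population <- runif(n, 1, 10)            # the omitted variable
police     <- 2 * population + rnorm(n)  # bigger cities hire more police
crime      <- 5 * population + rnorm(n)  # bigger cities also have more crime
# Police has no causal effect on crime here, yet the estimated slope is positive:
coef(lm(crime ~ police))["police"]
```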

(3) Suppose that your friend wrote a computer program that runs both simple and multiple linear regressions using OLS. Your friend asks you to test the software. After importing a dataset with 2000 observations and three cryptically named variables (Y, X1, and X2), you run two regressions. The first is a regression of Y on X1, which gives you an intercept estimate of 10.4, a slope estimate of -3.8, and R2 = 0.215. The second is a regression of Y on X1 and X2, which gives you an intercept estimate of 9.3, an X1 slope estimate of -2.9, an X2 slope estimate of -0.07, and R2 = 0.198. You tell your friend that they must have made a mistake somewhere in their code. Why?

For this problem, we compare the \[R^2\] of the second regression (Y on X1 and X2) to that of the first regression (Y on X1), and we can see that it is smaller: 0.198 versus 0.215. That is mathematically impossible under OLS: adding a regressor can never decrease \[R^2\] because the multiple regression could always set the coefficient on X2 to zero and reproduce the simple regression's fit exactly, so at worst \[R^2\] stays the same. Thus the programmer must have made some mistake in their code to get a smaller \[R^2\] value.
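A quick simulated check of this property (a sketch with made-up data; any dataset would do):

```r
set.seed(1)
Y  <- rnorm(2000)
X1 <- rnorm(2000)
X2 <- rnorm(2000)
summary(lm(Y ~ X1))$r.squared       # simple regression R-squared
summary(lm(Y ~ X1 + X2))$r.squared  # never smaller than the line above
```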

(4) A useful application of multiple regression analysis is Hedonic modeling. Hedonic models seek to explain the price of a good—such as a house—in terms of its attributes (e.g., number of bedrooms, square footage, or distance from the nearest toxic waste dump). Consider the following Hedonic model of home sale prices:

\[Price_i = \beta_0 + \beta_1 (\text{square footage})_i + \beta_2 Bathrooms_i + \beta_3 Bedrooms_i + u_i\] Using data from 37 home sales, you estimate the model and obtain: \[ \hat\beta_0 = 90000, \hat\beta_1 = 1100, \hat\beta_2 = 16000, \hat\beta_3 = 35000, SE(\hat\beta_1) = 650 \]

  1. Interpret each coefficient.

Starting with the coefficient \[ \hat\beta_0\] we can see that the y-intercept is 90,000: the predicted price of a home with zero square footage, bathrooms, and bedrooms. The way that I like to interpret this in my head is as the cost of the land.

The next coefficient, \[\hat\beta_1\] is 1,100. Thus the price of the home increases by $1,100 for a single square foot added, holding bathrooms and bedrooms constant (a marginal effect).

The next coefficient, \[\hat\beta_2\] is 16,000. Thus the price of the home increases by $16,000 for a single bathroom added, holding square footage and bedrooms constant.

The next coefficient, \[\hat\beta_3\] is 35,000. Thus the price of the home increases by $35,000 for a single bedroom added, holding square footage and bathrooms constant.

  2. What is the model’s forecasted sale price for a 2500-square-foot home with 3 bedrooms and 2.5 bathrooms?

Price = 90,000 + 1100(2500) + 16,000(2.5) + 35,000(3) = 90,000 + 2,750,000 + 40,000 + 105,000

Price = $2,985,000
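The same arithmetic as a one-line R check (a sketch; the vector name is my own):

```r
b <- c(90000, 1100, 16000, 35000)  # intercept, sqft, bathrooms, bedrooms
sum(b * c(1, 2500, 2.5, 3))        # forecast: 2,985,000
```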

  3. In a remodeling frenzy, a homeowner adds an additional bedroom and an additional bathroom by splitting up existing rooms. What is the forecasted change in the price of her home?

Since square footage is unchanged, the forecasted change is simply \[\hat\beta_2 + \hat\beta_3\] Recomputing the full forecast confirms this:

Price = 90,000 + 1100(2500) + 16,000(3.5) + 35,000(4)

Price = $3,036,000

The price of the home goes up by 16,000 + 35,000 = $51,000.

  4. A homeowner adds a 450-square-foot bedroom and a 75-square-foot bathroom by extending the footprint of his home into an area that used to be a driveway. What is the forecasted change in the price of his home?

Price = 90,000 + 1100(2500 + 525) + 16,000(3.5) + 35,000(4) = 90,000 + 1100(3025) + 16,000(3.5) + 35,000(4)

Price = $3,613,500

The home gains 450 + 75 = 525 square feet plus one bedroom and one bathroom, so relative to the $2,985,000 forecast from part 2 the price of the home goes up by 1100(525) + 16,000 + 35,000 = $628,500.
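Both remodel scenarios reduce to sums of the estimated coefficients, which R can confirm directly (a sketch using the estimates above):

```r
16000 + 35000                       # bedroom + bathroom, no new floor area: 51,000
1100 * (450 + 75) + 16000 + 35000   # 525 new sq ft plus both rooms: 628,500
```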

  5. Conduct two-sided tests of the hypothesis that square footage has no effect on sale price at the 10, 5, and 1 percent levels.

Under the null hypothesis \[\beta_1 = 0\] the t-statistic is \[ \lvert T_{stat}\rvert= \lvert{\frac{\hat\beta_1 }{SE(\hat\beta_1)}}\rvert \] Then we substitute in the values of the coefficient and standard error.

\[ \lvert T_{stat}\rvert= \lvert{\frac{1100}{650}}\rvert \] Thus we get

\[ \lvert T_{stat}\rvert \approx 1.692 \] And we find the t critical values using the degrees of freedom and significance levels provided. The model estimates four coefficients, so the degrees of freedom are \[37 - 4 = 33\] I turned to Appendix A in our book to find the t table.

These were the critical values I obtained from the book:

For the 10 percent level we get a critical value of \[ t_{0.05,33} \approx 1.692 \] For the 5 percent level we get a critical value of \[ t_{0.025,33} \approx 2.035 \]

For the 1 percent level we get a critical value of \[ t_{0.005,33} \approx 2.733 \]

\[\lvert T_{stat}\rvert \approx 1.692 \approx t_{0.05,33}\] The t-statistic sits essentially on the 10 percent critical value (the two-sided p-value is almost exactly 0.10), so square footage is marginally significant: we (just barely) reject the null at the 10 percent significance level.

\[\lvert T_{stat}\rvert \approx 1.692 < t_{0.025,33} \approx 2.035\] Thus here at the 5 percent level of significance we fail to reject the null.

\[\lvert T_{stat}\rvert \approx 1.692 < t_{0.005,33} \approx 2.733\] Thus here at the 1 percent level of significance we fail to reject the null.
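The critical values and p-value can be verified in R (a sketch; df = 33 as above):

```r
tstat <- 1100 / 650                 # about 1.692
qt(c(0.95, 0.975, 0.995), df = 33)  # 10%, 5%, 1% two-sided critical values
2 * pt(-abs(tstat), df = 33)        # two-sided p-value, almost exactly 0.10
```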

  6. Construct a 95 percent confidence interval for \[\beta_1\]

\[ CI^{\hat\beta_1}_{0.95} = [\hat\beta_1 - t_{0.025,33} \cdot SE(\hat\beta_1),\ \hat\beta_1 + t_{0.025,33} \cdot SE(\hat\beta_1)] \] With 33 degrees of freedom the critical value is \[t_{0.025,33} \approx 2.035\] (slightly larger than the normal value of 1.96). Then we substitute in our coefficient value as well as the standard error.

\[ CI^{\hat\beta_1}_{0.95} = [(1100 - 2.035 \cdot 650),\ (1100 + 2.035 \cdot 650)] \] Finally we get the confidence interval:

\[ CI^{\hat\beta_1}_{0.95} \approx [-223,\ 2423] \]

The interval contains zero, consistent with our failure to reject the null at the 5 percent level.
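The same interval in R (a sketch; small differences come from rounding the critical value):

```r
1100 + c(-1, 1) * qt(0.975, df = 33) * 650  # about [-222.4, 2422.4]
```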

Computational Problem

(1) The model we wish to estimate is given by:

\[reglbs_i = \beta_1 + \beta_2 regprc_i + u_i \]
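A sketch of the R code that could produce this fit and the table below (the data frame name apple and the wooldridge package are assumptions based on the variable names):

```r
library(wooldridge)  # assumed source of the apple data (reglbs, regprc, ...)
library(stargazer)

m1 <- lm(reglbs ~ regprc, data = apple)
stargazer(m1, type = "text")
```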

We then use Stargazer to summarize the results of the first regression:

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               reglbs           
## -----------------------------------------------
## regprc                        -0.688           
##                               (0.463)          
##                                                
## Constant                     1.890***          
##                               (0.424)          
##                                                
## -----------------------------------------------
## Observations                    660            
## R2                             0.003           
## Adjusted R2                    0.002           
## Residual Std. Error      2.907 (df = 658)      
## F Statistic             2.205 (df = 1; 658)    
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The sign of \[\beta_2\] the coefficient on \[regprc_i\] is negative. This makes sense considering what we know about the relationship between price and quantity: the demand curve in economics is downward sloping, hence the negative relationship between quantity demanded and the price of the good.

However, the \[ R^2\] value (0.003) is very small, which is concerning: the price of regular apples explains almost none of the variation in quantity purchased, and the slope estimate is not statistically significant at conventional levels.

(2) The model we wish to estimate is given by:

\[ecolbs_i = \beta_1 + \beta_2 ecoprc_i + u_i \] We then use Stargazer to summarize the results of the second regression:

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               ecolbs           
## -----------------------------------------------
## ecoprc                       -0.845**          
##                               (0.331)          
##                                                
## Constant                     2.388***          
##                               (0.372)          
##                                                
## -----------------------------------------------
## Observations                    660            
## R2                             0.010           
## Adjusted R2                    0.008           
## Residual Std. Error      2.515 (df = 658)      
## F Statistic            6.501** (df = 1; 658)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

We can see that the relationship is negative, just like in the first regression. It looks like the quantity demanded of “eco” apples goes down by 0.845 lbs for a one-unit increase in the price of “eco” apples, and this estimate is statistically significant at the 5 percent level.

(3) The model we wish to estimate is given by:

\[reglbs_i = \beta_0 + \beta_1 regprc_i+ \beta_2ecoprc_i + u_i \]

We then use Stargazer to summarize the results of the third regression:

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               reglbs           
## -----------------------------------------------
## regprc                        -1.569*          
##                               (0.832)          
##                                                
## ecoprc                         0.877           
##                               (0.688)          
##                                                
## Constant                     1.719***          
##                               (0.445)          
##                                                
## -----------------------------------------------
## Observations                    660            
## R2                             0.006           
## Adjusted R2                    0.003           
## Residual Std. Error      2.906 (df = 657)      
## F Statistic             1.916 (df = 2; 657)    
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

When looking at the coefficient on regprc we can notice that it has more than doubled in magnitude since the first regression. In the third regression the coefficient is -1.569; in the first regression the coefficient was -0.688. We can also notice the fact that the coefficient on ecoprc went from negative (-0.845) in the second regression to positive (0.877, though statistically insignificant) in the third regression. Hence this would suggest that regprc and ecoprc are correlated.

The \[R^2\] value (0.006) is almost zero, and the adjusted \[R^2\] (0.003) is smaller still; both are among the smallest across all the regressions. A direct check of the suspected correlation is sketched below.
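The claimed correlation between the two prices can be checked in one line (assuming the apple data frame from earlier):

```r
cor(apple$regprc, apple$ecoprc)  # a positive value would explain the sign flips
```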

(4) The model we wish to estimate is given by:

\[ecolbs_i = \beta_0 + \beta_1 ecoprc_i + \beta_2regprc_i + u_i\]

We then use Stargazer to summarize the results of the fourth regression:

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               ecolbs           
## -----------------------------------------------
## regprc                       3.029***          
##                               (0.711)          
##                                                
## ecoprc                       -2.926***         
##                               (0.588)          
##                                                
## Constant                     1.965***          
##                               (0.380)          
##                                                
## -----------------------------------------------
## Observations                    660            
## R2                             0.036           
## Adjusted R2                    0.033           
## Residual Std. Error      2.483 (df = 657)      
## F Statistic           12.414*** (df = 2; 657)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The \[R^2\] value of this fourth regression is 0.036. Even though this is still a small number, it is the largest \[R^2\] we obtain out of all the regressions.

(5)

This is a Stargazer table displaying the coefficients and \[R^2\] values from all four regressions side by side (a sketch of the call that could produce it appears just below):
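Assuming the fits m1 through m4 (m1 as sketched earlier; m2 through m4 analogous), the combined table comes from a single stargazer call:

```r
m2 <- lm(ecolbs ~ ecoprc, data = apple)
m3 <- lm(reglbs ~ regprc + ecoprc, data = apple)
m4 <- lm(ecolbs ~ regprc + ecoprc, data = apple)
stargazer(m1, m2, m3, m4, type = "text")
```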

## 
## =========================================================================================================
##                                                      Dependent variable:                                 
##                     -------------------------------------------------------------------------------------
##                           reglbs               ecolbs               reglbs                ecolbs         
##                             (1)                  (2)                  (3)                   (4)          
## ---------------------------------------------------------------------------------------------------------
## regprc                    -0.688                                    -1.569*              3.029***        
##                           (0.463)                                   (0.832)               (0.711)        
##                                                                                                          
## ecoprc                                        -0.845**               0.877               -2.926***       
##                                                (0.331)              (0.688)               (0.588)        
##                                                                                                          
## Constant                 1.890***             2.388***             1.719***              1.965***        
##                           (0.424)              (0.372)              (0.445)               (0.380)        
##                                                                                                          
## ---------------------------------------------------------------------------------------------------------
## Observations                660                  660                  660                   660          
## R2                         0.003                0.010                0.006                 0.036         
## Adjusted R2                0.002                0.008                0.003                 0.033         
## Residual Std. Error  2.907 (df = 658)     2.515 (df = 658)     2.906 (df = 657)      2.483 (df = 657)    
## F Statistic         2.205 (df = 1; 658) 6.501** (df = 1; 658) 1.916 (df = 2; 657) 12.414*** (df = 2; 657)
## =========================================================================================================
## Note:                                                                         *p<0.1; **p<0.05; ***p<0.01

The two regressions that have the highest \[R^2\] values are regressions 2 and 4. This makes sense, as the “eco” apples are the more high-end apples, and it makes sense that their own price carries a negative coefficient. Notice that by adding regprc in regression 4, the coefficient on ecoprc went from -0.845 to -2.926. This would only happen if there is a correlation between regprc and ecoprc.

(6) We want to construct a 99 percent confidence interval for the ecoprc coefficient from the fourth regression.

## # A tibble: 3 x 5
##   term        estimate std.error statistic     p.value
##   <chr>          <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)     1.97     0.380      5.17 0.000000310
## 2 regprc          3.03     0.711      4.26 0.0000233  
## 3 ecoprc         -2.93     0.588     -4.98 0.000000823

Thus we have \[-2.93 \pm t_{0.005,657} \times 0.588\] with \[t_{0.005,657} \approx 2.584\] which gives \[[-4.449,\ -1.411]\]

Thus \[[-4.449, -1.411]\] is the 99 percent confidence interval for the ecoprc coefficient. Even at this level the interval excludes zero, so the coefficient is statistically significant at the 1 percent level.
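The interval can be reproduced in R either from the tidy output above or with confint (assuming the m4 fit from the table sketch):

```r
confint(m4, "ecoprc", level = 0.99)             # exact, from the fit object
-2.93 + c(-1, 1) * qt(0.995, df = 657) * 0.588  # by hand: about [-4.449, -1.411]
```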