library( dplyr )
library( pander )
library( stargazer )

Part I:

Questions

Q1.

Read the study below, and then use the dataset called “IncomeHappiness.csv” to estimate the following model:

##      income           happiness     
##  Min.   :    38.9   Min.   : 21.34  
##  1st Qu.: 39970.8   1st Qu.: 62.83  
##  Median : 79994.1   Median : 78.36  
##  Mean   : 89662.9   Mean   : 72.72  
##  3rd Qu.:138908.2   3rd Qu.: 85.57  
##  Max.   :199952.2   Max.   :102.19
## [1] "income"    "happiness"
## [1] "x" "y"
##        x                  y                w         
##  Min.   : 0.00389   Min.   : 21.34   Min.   :  0.00  
##  1st Qu.: 3.99708   1st Qu.: 62.83   1st Qu.: 15.98  
##  Median : 7.99941   Median : 78.36   Median : 63.99  
##  Mean   : 8.96628   Mean   : 72.72   Mean   :113.78  
##  3rd Qu.:13.89082   3rd Qu.: 85.57   3rd Qu.:192.96  
##  Max.   :19.99522   Max.   :102.19   Max.   :399.81

\(Happiness = b_0+b_1 Income+ b_2 (Income)^2+e\)

You will need to create a new variable x-squared. Report your results in a regression table.
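One way to build the squared term and fit both specifications is sketched below. The data file is not reproduced here, so the sketch simulates a stand-in data frame: the seed, the sample size, and the "true" coefficients (35, 7.4, -0.25) are assumptions chosen only for illustration. The names x, y, and w mirror the renamed columns in the summary above, with x assumed to be income in units of $10,000.

```r
# Hypothetical stand-in for IncomeHappiness.csv (simulated, NOT the real data)
set.seed(123)
income    <- runif(500, min = 0, max = 200000)
happiness <- 35 + 7.4 * (income / 10000) - 0.25 * (income / 10000)^2 +
  rnorm(500, sd = 6)

# Rescale income to $10k units and rename, as in the summary output above
dat <- data.frame(x = income / 10000, y = happiness)

# Create the squared term as its own column
dat$w <- dat$x^2

# Linear and quadratic specifications
m1 <- lm(y ~ x, data = dat)
m2 <- lm(y ~ x + w, data = dat)

coef(m2)
```

With the real file, the simulated block would be replaced by something like `read.csv("IncomeHappiness.csv")`, and a call such as `stargazer(m1, m2, type = "text")` should print both models in one regression table.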

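=========================================
                  Dependent variable:
                  -----------------------
                             y
-----------------------------------------
x                        2.437***
                         (0.037)
Constant                50.871***
                         (0.390)
-----------------------------------------
Observations              2,000
Adjusted R2               0.690
=========================================
Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01

=========================================
                  Dependent variable:
                  -----------------------
                             y
-----------------------------------------
x                        7.361***
                         (0.089)
w                       -0.252***
                         (0.004)
Constant                35.348***
                         (0.361)
-----------------------------------------
Observations              2,000
Adjusted R2               0.883
=========================================
Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01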

## Call:
## lm(formula = y ~ x + w, data = dat)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -19.1420  -3.9703  -0.0493   3.9720  20.4357
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.348269   0.361399   97.81   <2e-16 ***
## x            7.361023   0.088702   82.99   <2e-16 ***
## w           -0.251607   0.004385  -57.38   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 5.806 on 1997 degrees of freedom
## Multiple R-squared:  0.8829, Adjusted R-squared:  0.8828
## F-statistic:  7529 on 2 and 1997 DF,  p-value: < 2.2e-16

Q3.

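How much happiness do you gain making an extra $10k when your initial income is $75k? –About 3.34 points. With x measured in units of $10k, the predicted change from x = 7.5 to x = 8.5 is b1(8.5 - 7.5) + b2(8.5² - 7.5²) = 7.361 - 0.2516(16) ≈ 3.34.

## [1] 3.34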

Q4.

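How much happiness do you gain making an extra $10k when your initial income is $100k? –About 2.08 points. With x measured in units of $10k, the predicted change from x = 10 to x = 11 is b1(11 - 10) + b2(11² - 10²) = 7.361 - 0.2516(21) ≈ 2.08.

## [1] 2.08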
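The gain from an extra $10k can be checked directly from the fitted quadratic. The sketch below assumes x is income in units of $10,000 (the summary shows x ranging from about 0 to 20 while income ranges from about 0 to 200,000) and copies the coefficients from the printed model summary. The helper name `gain` is hypothetical.

```r
# Coefficients copied from the printed summary of lm(y ~ x + w)
b1 <- 7.361023    # slope on x (income in $10k units)
b2 <- -0.251607   # slope on w = x^2

# Predicted change in happiness when x rises from x0 to x0 + step.
# The intercept cancels when the two predictions are differenced.
gain <- function(x0, step = 1) {
  b1 * step + b2 * ((x0 + step)^2 - x0^2)
}

round(gain(7.5), 2)   # initial income $75k
round(gain(10), 2)    # initial income $100k
```

Because the coefficient on the squared term is negative, the gain shrinks as the starting income rises, which is the diminishing-returns pattern the quadratic model is meant to capture.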

Part II: Outliers

For this part of the final assignment you will be using a dataset that examines compensation of nonprofit executive directors from the years 2012-2013. The data is extracted from the IRS E-Filer database available on AWS.

##     FILEREIN          FILERNAME1          NTMAJ12              NPAGE       
##  Min.   : 10024645   Length:65144       Length:65144       Min.   : -1.00  
##  1st Qu.:232997254   Class :character   Class :character   1st Qu.: 15.00  
##  Median :391318616   Mode  :character   Mode  :character   Median : 26.00  
##  Mean   :436908169                                         Mean   : 28.98  
##  3rd Qu.:593725701                                         3rd Qu.: 39.00  
##  Max.   :943151580                                         Max.   :110.00  
##      TAXYR         STATE              RULEDATE         REVENUE         
##  Min.   :2012   Length:65144       Min.   :190401   Min.   :6.000e+00  
##  1st Qu.:2012   Class :character   1st Qu.:197408   1st Qu.:4.986e+05  
##  Median :2012   Mode  :character   Median :198711   Median :1.437e+06  
##  Mean   :2012                      Mean   :198458   Mean   :1.278e+07  
##  3rd Qu.:2013                      3rd Qu.:199905   3rd Qu.:5.612e+06  
##  Max.   :2013                      Max.   :201404   Max.   :5.840e+09  
##      ASSETS             PERSONNM           TITLETXT             AVGHRS      
##  Min.   :-6.296e+06   Length:65144       Length:65144       Min.   :  1.15  
##  1st Qu.: 3.851e+05   Class :character   Class :character   1st Qu.: 40.00  
##  Median : 1.556e+06   Mode  :character   Mode  :character   Median : 40.00  
##  Mean   : 2.433e+07                                         Mean   : 39.33  
##  3rd Qu.: 7.029e+06                                         3rd Qu.: 40.00  
##  Max.   : 7.276e+10                                         Max.   :168.00  
##      SALARY            GENDER          PROPORTION_FEMALE    M2012CEO     
##  Min.   :       2   Length:65144       Min.   :0.0000    Min.   :0.0000  
##  1st Qu.:   54000   Class :character   1st Qu.:0.0041    1st Qu.:0.0000  
##  Median :   83690   Mode  :character   Median :0.4366    Median :1.0000  
##  Mean   :  117191                      Mean   :0.4969    Mean   :0.5019  
##  3rd Qu.:  133547                      3rd Qu.:0.9972    3rd Qu.:1.0000  
##  Max.   :13573496                      Max.   :1.0000    Max.   :1.0000  
##      TREAT              POST       
##  Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000  
##  Mean   :0.01822   Mean   :0.4988  
##  3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.0000

Codebook:

Q1. What is the likely impact of the outlier “A” on the regression line?

  • Will it make the slope larger or smaller? –larger
  • Would it contribute to a Type I or Type II error? –Type I

Q2. What is the likely impact of the outlier “B” on the regression line?

  • Will it make the slope larger or smaller? –smaller
  • Would it contribute to a Type I or Type II error? –Type II

Note: on the graph I first saw, point B was located just to the right of the mean of X, under the regression line. When I knit the file, point B was instead located directly above the mean of X. If that were the case, the slope would not change at all, but the intercept would shift up.
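The effect of an outlier's position can be illustrated with a small simulation. The actual locations of points A and B are not reproduced here, so both outliers below are hypothetical, as are the seed and the simulated data. The first sits far to the right of the data and well above the line. The second sits exactly at the mean of x, the case described in the note.

```r
# Simulated baseline data (hypothetical, for illustration only)
set.seed(42)
x <- 1:21
y <- 10 + 0.5 * x + rnorm(21, sd = 1)
base <- lm(y ~ x)

# Hypothetical outlier far from mean(x) and well above the line
fit_far <- lm(y ~ x, data = data.frame(x = c(x, 40), y = c(y, 60)))

# Hypothetical outlier exactly at mean(x), well above the line
fit_mid <- lm(y ~ x, data = data.frame(x = c(x, mean(x)), y = c(y, 60)))

# Compare intercepts and slopes across the three fits
rbind(base = coef(base), far = coef(fit_far), mid = coef(fit_mid))
```

A point at the mean of x has no leverage on the slope, so only the intercept moves. The far-right point pulls the line toward itself and steepens it.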

Q3. The average logged revenue of a nonprofit in this data is 14.44879. What does that translate to in normal dollars? Use the exp() function.

## [1] 1883778

Q4. What would be the typical salary for a director of a nonprofit of this size?

\(log(Salary) = 6.367 + 0.343 \cdot log(Revenue)\)

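=========================================
                  Dependent variable:
                  -----------------------
                        log(SALARY)
-----------------------------------------
log(REVENUE)             0.354***
                         (0.009)
Constant                 6.193***
                         (0.128)
-----------------------------------------
Observations              2,000
Adjusted R2               0.445
=========================================
Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01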

## Call:
## lm(formula = log(SALARY) ~ log(REVENUE), data = d2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.5050 -0.2418  0.0664  0.3376  2.2819
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.192936   0.128105   48.34   <2e-16 ***
## log(REVENUE) 0.353668   0.008825   40.08   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.6652 on 1998 degrees of freedom
## Multiple R-squared:  0.4456, Adjusted R-squared:  0.4454
## F-statistic:  1606 on 1 and 1998 DF,  p-value: < 2.2e-16

## [1] 82696.7
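The two back-transformations in Q3 and Q4 can be reproduced as follows. The sketch uses the mean logged revenue quoted in Q3 and the coefficients from the equation printed under Q4 (6.367 and 0.343), not the slightly different estimates in the regression table. The variable names are hypothetical.

```r
# Values copied from the text of Q3 and Q4
mean_log_revenue <- 14.44879
b0 <- 6.367   # intercept from the equation under Q4
b1 <- 0.343   # slope on log(Revenue) from the same equation

# Q3: convert logged revenue back to dollars
revenue_dollars <- exp(mean_log_revenue)

# Q4: predict log(Salary) at that revenue, then exponentiate
log_salary     <- b0 + b1 * mean_log_revenue
salary_dollars <- exp(log_salary)

c(revenue = revenue_dollars, salary = salary_dollars)
```

The prediction is made on the log scale first, and only the final value is exponentiated to return to dollars.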

Q5. Interpret the coefficient b1 (slope for log of revenue) in the model above (i.e. “a one-unit change in X corresponds with a b1-unit change in Y”, but adjusted for the log-log context). See the hand-out for guidance.

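–In a log-log model the slope is an elasticity: a 1% increase in revenue is associated with approximately a b1% increase in salary. Using the equation above, a 1% rise in revenue corresponds to a salary increase of about 0.34% (about 0.35% using the estimate in the regression table).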
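The elasticity reading of a log-log slope can be verified numerically. The sketch below takes b1 = 0.343 from the equation under Q4 and computes the exact percentage change in predicted salary for a given percentage change in revenue. The function name `pct_change_salary` is hypothetical.

```r
# Slope from the log-log equation under Q4
b1 <- 0.343

# Exact percent change in predicted salary for a given percent change in
# revenue: log(S) = b0 + b1 * log(R) implies S_new / S_old = (R_new / R_old)^b1
pct_change_salary <- function(pct_change_revenue) {
  ((1 + pct_change_revenue / 100)^b1 - 1) * 100
}

pct_change_salary(1)    # close to b1 itself, i.e. roughly 0.34 percent
pct_change_salary(10)   # somewhat less than 10 * b1 for larger changes
```

For small changes the exact figure is nearly identical to b1 percent, which is why the slope is read as "percent change in Y per one percent change in X". For larger changes the approximation drifts, so the exact formula is safer.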




Submission Instructions

After you have completed your lab, submit it via Canvas. Log in to the ASU portal at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your RMD and your HTML files to the appropriate lab submission link, or use the link from the Lab-02 tab on the Schedule page.

Remember to name your files according to the convention: Lab-##-LastName.xxx