library( stargazer )
URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/IncomeHappiness.csv"
dat <- read.csv( URL )

Read the study below, and then use the dataset called “IncomeHappiness.csv” to estimate the following model:
## income happiness
## Min. : 38.9 Min. : 21.34
## 1st Qu.: 39970.8 1st Qu.: 62.83
## Median : 79994.1 Median : 78.36
## Mean : 89662.9 Mean : 72.72
## 3rd Qu.:138908.2 3rd Qu.: 85.57
## Max. :199952.2 Max. :102.19
## [1] "income" "happiness"
## [1] "x" "y"
## x y w
## Min. : 0.00389 Min. : 21.34 Min. : 0.00
## 1st Qu.: 3.99708 1st Qu.: 62.83 1st Qu.: 15.98
## Median : 7.99941 Median : 78.36 Median : 63.99
## Mean : 8.96628 Mean : 72.72 Mean :113.78
## 3rd Qu.:13.89082 3rd Qu.: 85.57 3rd Qu.:192.96
## Max. :19.99522 Max. :102.19 Max. :399.81
\(Happiness = b_0+b_1 Income+ b_2 (Income)^2+e\)
You will need to create a new variable x-squared. Report your results in a regression table.
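The code that renamed and rescaled the variables is not shown above. The summaries suggest that `x` is income divided by 10,000 (so `x` is measured in tens of thousands of dollars) and that `w` is `x` squared. A minimal sketch of those steps, run on a small hypothetical data frame (`toy`) rather than the real file:

```r
# Hypothetical three-row stand-in for IncomeHappiness.csv
toy <- data.frame( income = c( 15000, 75000, 150000 ),
                   happiness = c( 60, 80, 85 ) )

# Rename to the short names used in the models
names( toy ) <- c( "x", "y" )

# Rescale income to tens of thousands of dollars (assumed from the summary above)
toy$x <- toy$x / 10000

# Create the squared term for the quadratic model
toy$w <- toy$x^2

summary( toy )
```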
m <- lm( y ~ x, data=dat )
stargazer( m, type="html",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )

| | Dependent variable: |
|:---|:---:|
| | y |
| x | 2.437*** |
| | (0.037) |
| Constant | 50.871*** |
| | (0.390) |
| Observations | 2,000 |
| Adjusted R² | 0.690 |
| Standard errors in parentheses | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
m1 <- lm( y ~ x + w, data=dat )
stargazer( m1, type="html",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )

| | Dependent variable: |
|:---|:---:|
| | y |
| x | 7.361*** |
| | (0.089) |
| w | -0.252*** |
| | (0.004) |
| Constant | 35.348*** |
| | (0.361) |
| Observations | 2,000 |
| Adjusted R² | 0.883 |
| Standard errors in parentheses | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
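Instead of storing `w` as a separate column, R's formula syntax can build the squared term inline with `I()`, which fits the same model. A sketch on simulated data; the seed, sample size, and "true" coefficients are arbitrary assumptions chosen to resemble the estimates above:

```r
set.seed( 523 )   # arbitrary seed for reproducible toy data
toy <- data.frame( x = runif( 200, 0, 20 ) )
toy$y <- 35 + 7.4 * toy$x - 0.25 * toy$x^2 + rnorm( 200, sd = 6 )
toy$w <- toy$x^2

# Squared term stored as its own variable
fit.w <- lm( y ~ x + w, data = toy )

# Squared term built inside the formula; I() keeps ^ from being read as formula syntax
fit.I <- lm( y ~ x + I( x^2 ), data = toy )

cbind( coef( fit.w ), coef( fit.I ) )
```

Both columns of coefficients match; the `I()` version saves a step and keeps the squared term tied to `x` when predicting.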
# fitted values from the quadratic model over the observed range of x
# (x is income in tens of thousands of dollars, so 20 = $200k)
x.grid <- seq( 0, 20, by=0.1 )
y_hat <- predict( m1, data.frame( x=x.grid, w=x.grid^2 ) )
plot( dat$x, dat$y,
      xlab="Income", ylab="Happiness Scale",
      main="Does Money Make You Happy?",
      pch=19, col="darkorange", bty="n",
      xaxt="n" )
axis( side=1, at=c(0,5,10,15,20), labels=c("$0","$50k","$100k","$150k","$200k") )
lines( x.grid, y_hat, col="steelblue", lwd=3 )

summary( m1 )
## Call:
## lm(formula = y ~ x + w, data = dat)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -19.1420  -3.9703  -0.0493   3.9720  20.4357
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.348269   0.361399   97.81   <2e-16 ***
## x            7.361023   0.088702   82.99   <2e-16 ***
## w           -0.251607   0.004385  -57.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.806 on 1997 degrees of freedom
## Multiple R-squared:  0.8829, Adjusted R-squared:  0.8828
## F-statistic:  7529 on 2 and 1997 DF,  p-value: < 2.2e-16
How much happiness do you gain making an extra $10k when your initial income is $15k? –About 6.35 points on the happiness scale.

# model coefficients from m1 (x is income in tens of thousands of dollars)
b0 <- unname( coef( m1 )["(Intercept)"] )
b1 <- unname( coef( m1 )["x"] )
b2 <- unname( coef( m1 )["w"] )
x <- 1.5   # $15k on the rescaled x
happy.15k <- b0 + b1*x + b2*x*x
x <- 2.5   # $25k on the rescaled x
happy.25k <- b0 + b1*x + b2*x*x
happy.25k - happy.15k   # gain from a $10k raise at a $15k starting salary

How much happiness do you gain making an extra $10k when your initial income is $75k? –About 3.34 points.

How much happiness do you gain making an extra $10k when your initial income is $100k? –About 2.08 points.
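Because the model is quadratic, the gain from one more unit of `x` depends on the starting point: happy(x + 1) − happy(x) = b1 + b2(2x + 1). A small sketch that applies this to the coefficients printed by `summary( m1 )` above, assuming `x` is in tens of thousands of dollars so that a $10k raise is one unit; `happy()` and `gain.10k()` are illustrative helpers, not part of the assignment:

```r
# Coefficients copied from summary( m1 ) above
b0 <- 35.348269
b1 <- 7.361023
b2 <- -0.251607

# Predicted happiness on the rescaled income scale
happy <- function( x ) b0 + b1 * x + b2 * x^2

# Gain from a one-unit ($10k) raise starting at x
gain.10k <- function( x ) happy( x + 1 ) - happy( x )

# Starting incomes of $15k, $75k, and $100k
gain.10k( c( 1.5, 7.5, 10 ) )
```

Each gain equals b1 + b2(2x + 1), so the payoff from a $10k raise falls by 2·|b2|, about 0.5 happiness points, for every additional $10k of starting income.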
For this part of the final assignment you will be using a dataset that examines compensation of nonprofit executive directors from the years 2012-2013. The data is extracted from the IRS E-Filer database available on AWS.
URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat1 <- readRDS(gzcon(url( URL )))
summary( dat1 )
## FILEREIN FILERNAME1 NTMAJ12 NPAGE
## Min. : 10024645 Length:65144 Length:65144 Min. : -1.00
## 1st Qu.:232997254 Class :character Class :character 1st Qu.: 15.00
## Median :391318616 Mode :character Mode :character Median : 26.00
## Mean :436908169 Mean : 28.98
## 3rd Qu.:593725701 3rd Qu.: 39.00
## Max. :943151580 Max. :110.00
## TAXYR STATE RULEDATE REVENUE
## Min. :2012 Length:65144 Min. :190401 Min. :6.000e+00
## 1st Qu.:2012 Class :character 1st Qu.:197408 1st Qu.:4.986e+05
## Median :2012 Mode :character Median :198711 Median :1.437e+06
## Mean :2012 Mean :198458 Mean :1.278e+07
## 3rd Qu.:2013 3rd Qu.:199905 3rd Qu.:5.612e+06
## Max. :2013 Max. :201404 Max. :5.840e+09
## ASSETS PERSONNM TITLETXT AVGHRS
## Min. :-6.296e+06 Length:65144 Length:65144 Min. : 1.15
## 1st Qu.: 3.851e+05 Class :character Class :character 1st Qu.: 40.00
## Median : 1.556e+06 Mode :character Mode :character Median : 40.00
## Mean : 2.433e+07 Mean : 39.33
## 3rd Qu.: 7.029e+06 3rd Qu.: 40.00
## Max. : 7.276e+10 Max. :168.00
## SALARY GENDER PROPORTION_FEMALE M2012CEO
## Min. : 2 Length:65144 Min. :0.0000 Min. :0.0000
## 1st Qu.: 54000 Class :character 1st Qu.:0.0041 1st Qu.:0.0000
## Median : 83690 Mode :character Median :0.4366 Median :1.0000
## Mean : 117191 Mean :0.4969 Mean :0.5019
## 3rd Qu.: 133547 3rd Qu.:0.9972 3rd Qu.:1.0000
## Max. :13573496 Max. :1.0000 Max. :1.0000
## TREAT POST
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000
## Mean :0.01822 Mean :0.4988
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000
plot( log(d2$REVENUE), log(d2$SALARY), bty="n", pch=19, col="darkorange",
xlab="Nonprofit Revenue (logged)", ylab="Executive Director Salary (logged)",
xlim=c(5,25), ylim=c(5,16))
abline( h=seq( 1, 20, 0.5 ), col=gray(0.5,0.2), lwd=1 )
abline( v=seq( 1, 25, 0.5 ), col=gray(0.5,0.2), lwd=1 )
abline( lm( log(d2$SALARY) ~ log(d2$REVENUE) ), col=gray(0.5,0.5), lwd=3 )

Codebook:
plot( log(d2$REVENUE), log(d2$SALARY), bty="n", pch=19, col=gray(0.5,0.2), cex=1.2,
xlab="Nonprofit Revenue (logged)", ylab="Executive Director Salary (logged)",
xlim=c(5,25), ylim=c(5,16))
abline( lm( log(d2$SALARY) ~ log(d2$REVENUE) ), col="darkorange", lwd=3 )
points( mean(log(d2$REVENUE)), mean(log(d2$SALARY)), pch=19, col="darkorange", cex=2 )
points( log(d2$REVENUE[c(1446,1681)]), log(d2$SALARY[c(1446,1681)]),
cex=3, col="steelblue", lwd=2 )
points( log(d2$REVENUE[c(1446,1681)]), log(d2$SALARY[c(1446,1681)]),
cex=1.5, col="steelblue", pch=19 )
text( log(d2$REVENUE[c(1446,1681)]), log(d2$SALARY[c(1446,1681)]), c("A","B"),
pos=4, offset=1.2, col="steelblue", cex=2 )

Note: in the graph I saw, point B was located just to the right of the mean, below the regression line. When I knit the file, point B appeared directly above the mean. If that were the case, the slope would not change at all, but the intercept would shift up.
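The claim in the note can be checked on simulated data: a point whose x equals the mean of x has a zero deviation from that mean, so it leaves the OLS slope unchanged and only moves the intercept, by (its y minus mean(y)) divided by the new sample size. The data below are simulated and only loosely mimic the logged revenue and salary scales:

```r
set.seed( 523 )   # arbitrary seed; toy data only
x <- rnorm( 50, mean = 14, sd = 2 )
y <- 6 + 0.35 * x + rnorm( 50, sd = 0.6 )
fit.without <- lm( y ~ x )

# Add one point at exactly mean(x), 3 units above mean(y)
x2 <- c( x, mean( x ) )
y2 <- c( y, mean( y ) + 3 )
fit.with <- lm( y2 ~ x2 )

# Same slope; the intercept rises by 3 / 51
rbind( without = coef( fit.without ), with = coef( fit.with ) )
```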
exp() function.
## [1] 1883778
\(\log(Salary) = 6.367 + 0.343 \cdot \log(Revenue)\)
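Because the outcome is log(Salary), a predicted salary in dollars comes from exponentiating the fitted value: Salary = exp(6.367 + 0.343 · log(Revenue)), which is the same as e^6.367 · Revenue^0.343. A sketch using the coefficients from the equation above and a hypothetical revenue of $1 million:

```r
b0 <- 6.367      # intercept from the equation above
b1 <- 0.343      # slope on log(Revenue) from the equation above
revenue <- 1e6   # hypothetical revenue of $1,000,000

# Fitted log salary, then back to dollars with exp()
log.salary <- b0 + b1 * log( revenue )
salary <- exp( log.salary )
salary
```

Note that `log()` in R is the natural log, so `exp()` is the matching inverse.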
m <- lm( log(SALARY) ~ log(REVENUE), data=d2 )
stargazer( m, type="html",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )

| | Dependent variable: |
|:---|:---:|
| | log(SALARY) |
| log(REVENUE) | 0.354*** |
| | (0.009) |
| Constant | 6.193*** |
| | (0.128) |
| Observations | 2,000 |
| Adjusted R² | 0.445 |
| Standard errors in parentheses | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
summary( m )
## Call:
## lm(formula = log(SALARY) ~ log(REVENUE), data = d2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.5050 -0.2418  0.0664  0.3376  2.2819
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.192936   0.128105   48.34   <2e-16 ***
## log(REVENUE) 0.353668   0.008825   40.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6652 on 1998 degrees of freedom
## Multiple R-squared:  0.4456, Adjusted R-squared:  0.4454
## F-statistic:  1606 on 1 and 1998 DF,  p-value: < 2.2e-16
## [1] 82696.7
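–If revenue goes up by 1 percent, salary is expected to rise by about b1 percent (about 0.35% with b1 = 0.354 from the model above). More generally, an x percent increase in revenue is associated with roughly a b1·x percent increase in salary. Because both variables are logged, b1 is an elasticity, not a dollar amount.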
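The b1·x rule is an approximation that works best for small changes. A sketch comparing it with the exact change implied by the log-log model, 100·((1 + p/100)^b1 − 1) for a p percent revenue increase, using the slope estimate 0.354 from the regression table above (the list of percent changes is arbitrary):

```r
b1 <- 0.354                    # slope on log(REVENUE) from the regression table above
pct <- c( 1, 10, 50, 100 )     # percent increases in revenue

approx.pct <- b1 * pct                            # rule of thumb: b1 * p
exact.pct  <- 100 * ( ( 1 + pct / 100 )^b1 - 1 )  # exact implication of the model

data.frame( revenue.increase = pct,
            approx.salary.increase = approx.pct,
            exact.salary.increase = round( exact.pct, 2 ) )
```

For a 1% change the two agree closely; for a doubling of revenue the approximation overstates the salary gain (35.4% versus roughly 28%).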
After you have completed your lab, submit it via Canvas. Log in to the ASU portal at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your RMD and HTML files to the appropriate lab submission link. Alternatively, use the link from the Lab-02 tab on the Schedule page.
Remember to name your files according to the convention: Lab-##-LastName.xxx