## Counties with zero murders in 1996: 1051
## Counties with at least one execution in 1996: 31
## Warning in max(countymurders$countymurders$execs): no non-missing arguments to
## max; returning -Inf
## Largest number of executions in 1996: -Inf
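The `-Inf` above comes from the doubled name in `countymurders$countymurders$execs`: the data frame has no column called `countymurders`, so the expression evaluates to `NULL`, and `max()` of an empty input returns `-Inf` with the warning shown. A minimal sketch of the intended counts follows, using a small made-up data frame (`toy`, with hypothetical values) in place of the real data:

```r
# Toy stand-in for the countymurders data (all values are made up)
toy <- data.frame(
  year    = c(1995, 1996, 1996, 1996),
  murders = c(2, 0, 5, 0),
  execs   = c(0, 0, 1, 0)
)

# toy$toy is NULL (no such column), so toy$toy$execs is NULL as well
is.null(toy$toy$execs)

# Restrict to 1996 first, then summarise the columns directly
toy96 <- subset(toy, year == 1996)
zero_murders  <- sum(toy96$murders == 0)          # counties with no murders
with_execs    <- sum(toy96$execs >= 1)            # counties with at least one execution
largest_execs <- max(toy96$execs, na.rm = TRUE)   # largest number of executions
```

With the real data, the same three summaries would be applied to `subset(countymurders, year == 1996)`.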
##
## Call:
## lm(formula = murders ~ execs, data = subset(countymurders, year ==
## 1996))
##
## Residuals:
## Min 1Q Median 3Q Max
## -149.12 -5.46 -4.46 -2.46 1338.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4572 0.8348 6.537 7.79e-11 ***
## execs 58.5555 5.8333 10.038 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.89 on 2195 degrees of freedom
## Multiple R-squared: 0.04389, Adjusted R-squared: 0.04346
## F-statistic: 100.8 on 1 and 2195 DF, p-value: < 2.2e-16
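## The slope coefficient (β1) is the change in the predicted number of murders for one additional execution.
## A negative β1 would suggest a deterrent effect of capital punishment. The estimate here is 58.56,
## positive and statistically significant, so this regression shows no evidence of a deterrent effect.
## Smallest number of murders predicted: 5.457241 (the fitted value at execs = 0)
## Residual for a county with zero executions and zero murders: -5.457241 (actual 0 minus predicted 5.457241)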
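As a quick check of the last two numbers, the fitted value and residual can be recomputed from the reported coefficients. This is a sketch using the rounded estimates from the summary above; a residual is the actual value minus the fitted value.

```r
b0 <- 5.4572    # intercept from the summary output
b1 <- 58.5555   # slope on execs

fitted_zero <- b0 + b1 * 0      # prediction for a county with no executions
resid_zero  <- 0 - fitted_zero  # actual murders (0) minus the prediction
```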
## A simple regression analysis may suffer from omitted variable bias and endogeneity issues.
## Factors other than executions could influence the murder rate, leading to biased estimates.
## Additionally, the decision to implement capital punishment may be influenced by the crime rate,
## creating endogeneity problems and making causal inference challenging.
## In the given model, it does not make sense to hold sleep, work, and leisure fixed while changing study.
## The reason is that the sum of hours in all four activities must be 168 for each student.
## Changing the hours spent on studying would inherently change the hours available for other activities.
## This model violates Assumption MLR.3 (no perfect collinearity): study + sleep + work + leisure = 168 for every student,
## so each explanatory variable is an exact linear function of the other three and OLS cannot estimate all four slopes.
## To satisfy Assumption MLR.3, drop one of the four variables. For example, use the hours spent on three activities
## as independent variables; the fourth is then determined by the constraint (168 - study - work - leisure).
## The coefficient on study is then the effect of one more hour of study with work and leisure held fixed,
## which means one hour less of the omitted activity.
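The collinearity problem described above can be seen in a small simulation. This is a sketch on made-up data (the variable values and the `gpa` equation are hypothetical): when all four activities are included, `lm()` cannot estimate one of the slopes and reports it as `NA`.

```r
set.seed(1)
n <- 50
study   <- runif(n, 10, 40)
work    <- runif(n, 0, 30)
leisure <- runif(n, 40, 70)
sleep   <- 168 - study - work - leisure            # the constraint fixes the fourth activity
gpa     <- 2 + 0.02 * study + rnorm(n, sd = 0.3)   # made-up outcome

# All four activities: perfectly collinear with the intercept
full <- lm(gpa ~ study + sleep + work + leisure)
coef(full)      # one slope is NA

# Dropping one activity removes the exact linear relationship
reduced <- lm(gpa ~ study + work + leisure)
coef(reduced)   # all coefficients estimated
```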
## If x1 is highly correlated with x2 and x3, and x2 and x3 have large partial effects on y,
## you would expect the simple-regression estimate (β1 tilde) and the multiple-regression estimate (β1 hat) to be very different.
## Omitting x2 and x3 leaves x1 to pick up their effects, so the simple regression has substantial omitted variable bias.
## If x1 is almost uncorrelated with x2 and x3 but x2 and x3 are highly correlated,
## β1 tilde and β1 hat tend to be similar. What matters is the correlation between x1 and the omitted variables;
## the correlation between x2 and x3 inflates the standard errors of their own coefficients but barely affects the estimate of β1.
## If x1 is highly correlated with x2 and x3, and x2 and x3 have small partial effects on y,
## you would expect se(β1 tilde) to be smaller. Adding x2 and x3 creates multicollinearity with x1, which inflates se(β1 hat),
## while the error variance falls only slightly because x2 and x3 explain little of y.
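The first case can be illustrated with simulated data. The sketch below (all numbers made up) uses the in-sample identity that the simple-regression slope equals the multiple-regression slope plus each omitted variable's coefficient times the slope from regressing that variable on x1.

```r
set.seed(123)
n  <- 200
x2 <- rnorm(n)
x3 <- rnorm(n)
x1 <- 0.8 * x2 + 0.8 * x3 + rnorm(n, sd = 0.5)    # x1 strongly related to x2 and x3
y  <- 1 + 0.5 * x1 + 2 * x2 + 2 * x3 + rnorm(n)   # large partial effects of x2 and x3

b_tilde <- coef(lm(y ~ x1))["x1"]       # simple regression slope
b_hat   <- coef(lm(y ~ x1 + x2 + x3))   # multiple regression coefficients
d2 <- coef(lm(x2 ~ x1))["x1"]           # slope of x2 on x1
d3 <- coef(lm(x3 ~ x1))["x1"]           # slope of x3 on x1

# Simple slope implied by the multiple regression and the auxiliary slopes
implied <- b_hat["x1"] + b_hat["x2"] * d2 + b_hat["x3"] * d3
c(simple = unname(b_tilde), multiple = unname(b_hat["x1"]), implied = unname(implied))
```

The `simple` and `implied` entries agree, and both sit far from `multiple`, which is the "very different" outcome in the first case.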
## Average prpblck: NA
## Standard deviation of prpblck: NA
## Average income: NA
## Standard deviation of income: NA
## Units of measurement: prpblck is the proportion of the zip code's population that is black (a fraction between 0 and 1, not a percentage),
## and income is the median family income in the zip code, in dollars.
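The `NA` results above are what `mean()` and `sd()` return when a column contains missing values, and the regressions on these data report observations deleted due to missingness. A small sketch with a made-up vector (`x` is hypothetical, not the `discrim` data) shows the `na.rm = TRUE` argument that skips the missing entries:

```r
x <- c(0.05, 0.20, NA, 0.10)   # made-up proportions with one missing value

mean(x)                        # NA, because of the missing entry
avg  <- mean(x, na.rm = TRUE)  # average of the three observed values
sdev <- sd(x, na.rm = TRUE)    # standard deviation of the observed values
```

Applied to the real data, the same argument would be passed when summarising `discrim$prpblck` and `discrim$income`.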
##
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29401 -0.05242 0.00333 0.04231 0.44322
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.563e-01 1.899e-02 50.354 < 2e-16 ***
## prpblck 1.150e-01 2.600e-02 4.423 1.26e-05 ***
## income 1.603e-06 3.618e-07 4.430 1.22e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08611 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06422, Adjusted R-squared: 0.05952
## F-statistic: 13.66 on 2 and 398 DF, p-value: 1.835e-06
## The coefficient on prpblck is the estimated change in psoda, in dollars, for a one-unit change in prpblck, holding income fixed.
## Because prpblck is a proportion, a one-unit change means going from 0 to 1 (no black residents to all black residents).
## A more realistic increase of 0.10 (ten percentage points) raises the predicted price of soda by 0.115 * 0.10 = 0.0115 dollars,
## or about 1.2 cents. The effect is statistically significant (t = 4.42) but economically modest.
##
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30884 -0.05963 0.01135 0.03206 0.44840
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03740 0.00519 199.87 < 2e-16 ***
## prpblck 0.06493 0.02396 2.71 0.00702 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0881 on 399 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.01808, Adjusted R-squared: 0.01561
## F-statistic: 7.345 on 1 and 399 DF, p-value: 0.007015
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income), data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33563 -0.04695 0.00658 0.04334 0.35413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.79377 0.17943 -4.424 1.25e-05 ***
## prpblck 0.12158 0.02575 4.722 3.24e-06 ***
## log(income) 0.07651 0.01660 4.610 5.43e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0821 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06809, Adjusted R-squared: 0.06341
## F-statistic: 14.54 on 2 and 398 DF, p-value: 8.039e-07
## Estimated percentage change in psoda for a 0.20 (20 percentage point) increase in prpblck: 2.46141
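The figure above can be reproduced from the rounded coefficient on prpblck in the log(psoda) model. The sketch compares the usual log approximation with the exact percentage change; the small gap from 2.46141 in the last digits comes from using the rounded coefficient.

```r
b_prpblck <- 0.12158   # coefficient on prpblck in the log(psoda) model
delta     <- 0.20      # increase in prpblck (20 percentage points)

approx_pct <- 100 * b_prpblck * delta              # log approximation
exact_pct  <- 100 * (exp(b_prpblck * delta) - 1)   # exact percentage change
```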
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32218 -0.04648 0.00651 0.04272 0.35622
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.46333 0.29371 -4.982 9.4e-07 ***
## prpblck 0.07281 0.03068 2.373 0.0181 *
## log(income) 0.13696 0.02676 5.119 4.8e-07 ***
## prppov 0.38036 0.13279 2.864 0.0044 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08137 on 397 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.08696, Adjusted R-squared: 0.08006
## F-statistic: 12.6 on 3 and 397 DF, p-value: 6.917e-08
## Correlation between lincome and prppov: NA
## Evaluating the statement 'Because lincome and prppov are so highly correlated, they have no business being in the same regression.':
## The strong negative correlation between lincome and prppov means the two variables are collinear, which inflates their standard errors.
## It does not bias the estimates, and here both coefficients remain statistically significant
## (p = 4.8e-07 for log(income) and p = 0.0044 for prppov), so each is estimated precisely enough to be useful.
## Both belong in the model: they capture different aspects of a zip code's economic conditions,
## and leaving one out risks omitted variable bias in the coefficient on prpblck,
## which falls from 0.122 to 0.073 once prppov is added.
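The `NA` reported for the correlation has the same cause as the `NA` summary statistics: `cor()` returns `NA` by default when either variable has missing values. A sketch with made-up vectors (these are hypothetical values, not the `discrim` data) shows the `use = "complete.obs"` option, which keeps only the pairs where both values are observed:

```r
lincome <- c(10.5, 10.9, NA, 11.2, 10.7)   # made-up log incomes
prppov  <- c(0.20, 0.08, 0.15, NA, 0.12)   # made-up poverty proportions

cor(lincome, prppov)                              # NA under the default settings
r <- cor(lincome, prppov, use = "complete.obs")   # uses the three complete pairs
```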
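## i) Estimated percentage point change in rdintens for a 10% increase in sales: 0.0321
##    (level-log model: (0.321 / 100) * 10, where 0.321 is the coefficient on log(sales))
## ii) Two-sided p-value for the test on log(sales) coefficient: 0.1480413
## (At 5% level): Fail to reject H0
## (At 10% level): Fail to reject H0
## Against the one-sided alternative that the coefficient is positive, the p-value is 0.1480413 / 2 = 0.074:
## reject H0 at the 10% level but not at the 5% level.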
## iii) Coefficient on profmarg: 0.5
## iv) p-value for the test on profmarg coefficient: 0.2860082
## (At 5% level): Fail to reject H0
## (At 10% level): Fail to reject H0
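The level-log interpretation in part (i) and the one-sided version of the test in part (ii) can be sketched as follows. The coefficient 0.321 on log(sales) is an assumption here: it is implied by the figure reported in part (i) rather than shown in a regression table in this output.

```r
b_lsales  <- 0.321   # assumed coefficient on log(sales)
pct_sales <- 10      # percent increase in sales

# In a level-log model, a 1% change in x moves y by about coefficient / 100
change_pp <- (b_lsales / 100) * pct_sales   # change in rdintens, in percentage points

p_two_sided <- 0.1480413         # p-value reported in part (ii)
p_one_sided <- p_two_sided / 2   # valid because the estimate has the sign of the alternative
```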
## Number of single-person households: 2017
##
## Call:
## lm(formula = nettfa ~ inc + age, data = single_person_households)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.95 -14.16 -3.42 6.03 1113.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -43.03981 4.08039 -10.548 <2e-16 ***
## inc 0.79932 0.05973 13.382 <2e-16 ***
## age 0.84266 0.09202 9.158 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.68 on 2014 degrees of freedom
## Multiple R-squared: 0.1193, Adjusted R-squared: 0.1185
## F-statistic: 136.5 on 2 and 2014 DF, p-value: < 2.2e-16
## Interpretation of slope coefficients (nettfa and inc are both measured in thousands of dollars):
## B1 (inc): holding age fixed, another $1,000 of annual income raises predicted nettfa by about $799.
## B2 (age): holding income fixed, another year of age raises predicted nettfa by about $843.
## Neither result is surprising: net financial wealth should rise with both income and age.
## The intercept (B0) represents the predicted net financial wealth (nettfa) when both inc and age are zero.
## It has no meaningful interpretation here, because no one in the sample is anywhere near age zero.
## t statistic for H0: B₂ = 1 against H₁: B₂ < 1: (0.84266 - 1) / 0.09202 = -1.71; one-sided p-value: about 0.044
## At the 1% significance level, we would reject H0 if the p-value is less than 0.01.
## We fail to reject H0 at the 1% level (H0 would be rejected at the 5% level).
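The test of H0: B₂ = 1 uses the distance of the estimate from 1, not from 0, so it cannot be read off the t value printed in the summary. A sketch of the calculation from the reported estimate, standard error, and residual degrees of freedom:

```r
b_age  <- 0.84266   # coefficient on age
se_age <- 0.09202   # its standard error
df     <- 2014      # residual degrees of freedom

t_stat      <- (b_age - 1) / se_age   # distance of the estimate from 1, in standard errors
p_one_sided <- pt(t_stat, df = df)    # lower-tail probability for H1: B2 < 1
```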
##
## Call:
## lm(formula = nettfa ~ inc, data = single_person_households)
##
## Residuals:
## Min 1Q Median 3Q Max
## -185.12 -12.85 -4.85 1.78 1112.66
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.5709 2.0607 -5.13 3.18e-07 ***
## inc 0.8207 0.0609 13.48 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 45.59 on 2015 degrees of freedom
## Multiple R-squared: 0.08267, Adjusted R-squared: 0.08222
## F-statistic: 181.6 on 1 and 2015 DF, p-value: < 2.2e-16
## Comparison of the estimated coefficient on inc:
## The simple regression gives 0.8207, close to the 0.7993 from the multiple regression in part (ii).
## The two differ only because age is omitted: the simple-regression slope equals 0.7993 + 0.8427 * d,
## where d is the slope from regressing age on inc. The small gap implies d = (0.8207 - 0.7993) / 0.8427, about 0.025,
## so age and inc are only weakly related in this sample and omitting age barely changes the income effect.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## (i). Probability that 'score' exceeds 100 using the normal distribution: 0.02044288
##
## Shapiro-Wilk normality test
##
## data: data
## W = 0.96973, p-value = 2.454e-12
par(mfrow = c(2, 2)) # Set up a 2x2 grid for Q-Q plots
##
## Summary Statistics - Level-Level Model:
##
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = wage_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6068 -1.7747 -0.6279 1.1969 14.6536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.87273 0.72896 -3.941 9.22e-05 ***
## educ 0.59897 0.05128 11.679 < 2e-16 ***
## exper 0.02234 0.01206 1.853 0.0645 .
## tenure 0.16927 0.02164 7.820 2.93e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared: 0.3064, Adjusted R-squared: 0.3024
## F-statistic: 76.87 on 3 and 522 DF, p-value: < 2.2e-16
##
## Summary Statistics - Log-Level Model:
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure, data = wage_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05802 -0.29645 -0.03265 0.28788 1.42809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.284360 0.104190 2.729 0.00656 **
## educ 0.092029 0.007330 12.555 < 2e-16 ***
## exper 0.004121 0.001723 2.391 0.01714 *
## tenure 0.022067 0.003094 7.133 3.29e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4409 on 522 degrees of freedom
## Multiple R-squared: 0.316, Adjusted R-squared: 0.3121
## F-statistic: 80.39 on 3 and 522 DF, p-value: < 2.2e-16
rdintens = 2.613 + 0.00030⋅sales − 0.0000000070⋅sales^2, with standard errors (0.429), (0.00014), and (0.0000000037); n = 32, R^2 = 0.1484.
To find the point at which the marginal effect of sales on rdintens becomes negative, take the derivative of rdintens with respect to sales and find where it equals zero. The marginal effect is 0.00030 − 2⋅0.0000000070⋅sales, which is zero at sales = 0.00030 / (2⋅0.0000000070) ≈ 21,428.57. Sales are measured in millions of dollars, so this is about $21.4 billion.
## The marginal effect of sales on rdintens becomes negative when sales is greater than 21428.57
The quadratic term has t = -1.864 and a two-sided p-value of 0.0725 in the regression output below, so it is significant at the 10% level but not at the 5% level.
## The decision to keep the quadratic term depends on the significance of the coefficient and the context of the analysis.
## With only 32 observations, significance at the 10% level is reasonable grounds for keeping it.
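Let salesbil = sales/1000, so sales = 1000⋅salesbil and sales^2 = 1,000,000⋅salesbil^2. Substituting, the equation becomes: rdintens = 2.613 + 0.30⋅salesbil − 0.0070⋅salesbil^2
The standard errors rescale by the same factors: 0.14 on salesbil and 0.0037 on salesbil^2. The intercept, its standard error (0.429), and R^2 are unchanged.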
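The turning point and the change of units can be checked with a few lines of arithmetic. This sketch uses the rounded coefficients from the equation above; the unrounded estimates in the regression output give a slightly different turning point.

```r
b1 <- 0.00030          # coefficient on sales (sales in millions)
b2 <- -0.0000000070    # coefficient on sales^2

# Marginal effect b1 + 2 * b2 * sales is zero at the turning point
turning_point <- -b1 / (2 * b2)

# Rescaling: sales = 1000 * salesbil, so the slopes scale by 1000 and 1000^2
b1_bil <- b1 * 1000
b2_bil <- b2 * 1000^2
turning_point_bil <- -b1_bil / (2 * b2_bil)   # same point, in billions
```

The turning point is the same in both versions once the units are converted, since `turning_point_bil * 1000` equals `turning_point`.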
##
## Call:
## lm(formula = rdintens ~ salesbil + I(salesbil^2), data = rdchem)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1418 -1.3630 -0.2257 1.0688 5.5808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.612512 0.429442 6.084 1.27e-06 ***
## salesbil 0.300571 0.139295 2.158 0.0394 *
## I(salesbil^2) -0.006946 0.003726 -1.864 0.0725 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.788 on 29 degrees of freedom
## Multiple R-squared: 0.1484, Adjusted R-squared: 0.08969
## F-statistic: 2.527 on 2 and 29 DF, p-value: 0.09733
The two equations are the same model in different units, so the fit is identical: the adjusted R-squared values and the t statistics match. The version in salesbil is preferable for reporting because its coefficients (0.30 and −0.0069) are easier to read than 0.00030 and −0.0000000069.
## Adjusted R-squared - Model 1: 0.08969224
## Adjusted R-squared - Model 2: 0.08969224
## Preference: Both models have the same adjusted R-squared.
##
## Coefficients - Model 1:
## (Intercept) sales I(sales^2)
## 2.612512e+00 3.005713e-04 -6.945939e-09
##
## Coefficients - Model 2:
## (Intercept) salesbil I(salesbil^2)
## 2.612512085 0.300571301 -0.006945939
## The first equation (without read4) is more relevant because it provides a direct estimate of the effect of per-student spending on math test performance.
## In the first equation, the coefficient on lexppp (9.01) is the estimated change in math4 for a one-unit increase in log expenditures per student.
## For a 10% increase in spending, multiply the coefficient by 0.1: math4 rises by about 0.90 percentage points.
Math and reading scores are both outcomes of the same school inputs, so read4 does not belong on the right-hand side. Adding it changes what the other coefficients measure: the coefficient on lexppp falls from 9.01 to 1.93 and becomes insignificant, while the coefficient on lmedinc moves from -0.75 to -10.78.
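The 10% calculation for lexppp can be checked against the exact change in the log. The sketch below uses the coefficient from the equation without read4; multiplying by 0.1 is the usual approximation, and `log(1.10)` is the exact change in log spending for a 10% increase.

```r
b_lexppp <- 9.00648   # coefficient on lexppp, equation without read4

approx_change <- b_lexppp * 0.10        # rule of thumb: (coefficient / 100) * 10
exact_change  <- b_lexppp * log(1.10)   # exact change in log(expenditure) for +10%
```

The two answers are close (about 0.90 versus 0.86 points), so the rule of thumb is adequate for a change of this size.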
## Comparison of Equations:
## 1. Equation without read4:
##
## Call:
## lm(formula = math4 ~ lexppp + free + lmedinc + pctsgle, data = meapsingle)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.259 -7.422 1.615 7.274 49.524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.48949 59.23781 0.413 0.6797
## lexppp 9.00648 4.03530 2.232 0.0266 *
## free -0.42164 0.07064 -5.969 9.27e-09 ***
## lmedinc -0.75221 5.35816 -0.140 0.8885
## pctsgle -0.27444 0.16086 -1.706 0.0894 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.59 on 224 degrees of freedom
## Multiple R-squared: 0.4716, Adjusted R-squared: 0.4622
## F-statistic: 49.98 on 4 and 224 DF, p-value: < 2.2e-16
##
## 2. Equation with read4:
##
## Call:
## lm(formula = math4 ~ lexppp + free + lmedinc + pctsgle + read4,
## data = meapsingle)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.5690 -4.6729 -0.0349 4.3644 24.8425
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 149.37870 41.70293 3.582 0.000419 ***
## lexppp 1.93215 2.82480 0.684 0.494688
## free -0.06004 0.05399 -1.112 0.267297
## lmedinc -10.77595 3.75746 -2.868 0.004529 **
## pctsgle -0.39663 0.11143 -3.559 0.000454 ***
## read4 0.66656 0.04249 15.687 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.012 on 223 degrees of freedom
## Multiple R-squared: 0.7488, Adjusted R-squared: 0.7432
## F-statistic: 132.9 on 5 and 223 DF, p-value: < 2.2e-16
##
## Explanation for (iii):
## The adjusted R-squared measures the share of the variance in the dependent variable explained by the regressors, with a penalty for each added variable.
## Here the equation with read4 has the larger adjusted R-squared (0.7432 versus 0.4622), yet the equation without read4 is the better one for this question.
## read4 is itself an outcome that spending affects, so holding it fixed removes most of the spending effect the analysis is trying to measure.
## The goal is to estimate the effect of spending on math4, not to maximize fit, so the equation with the smaller adjusted R-squared is preferred.
# Assuming you have already loaded the "WAGE2" dataset
library(wooldridge)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
str(wage2)
## 'data.frame': 935 obs. of 17 variables:
## $ wage : int 769 808 825 650 562 1400 600 1081 1154 1000 ...
## $ hours : int 40 50 40 40 40 40 40 40 45 40 ...
## $ IQ : int 93 119 108 96 74 116 91 114 111 95 ...
## $ KWW : int 35 41 46 32 27 43 24 50 37 44 ...
## $ educ : int 12 18 14 12 11 16 10 18 15 12 ...
## $ exper : int 11 11 11 13 14 14 13 8 13 16 ...
## $ tenure : int 2 16 9 7 5 2 0 14 1 16 ...
## $ age : int 31 37 33 32 34 35 30 38 36 36 ...
## $ married: int 1 1 1 1 1 1 0 1 1 1 ...
## $ black : int 0 0 0 0 0 1 0 0 0 0 ...
## $ south : int 0 0 0 0 0 0 0 0 0 0 ...
## $ urban : int 1 1 1 1 1 1 1 1 0 1 ...
## $ sibs : int 1 1 1 4 10 1 1 2 2 1 ...
## $ brthord: int 2 NA 2 3 6 2 2 3 3 1 ...
## $ meduc : int 8 14 14 12 6 8 8 8 14 12 ...
## $ feduc : int 8 14 14 12 11 NA 8 NA 5 11 ...
## $ lwage : num 6.65 6.69 6.72 6.48 6.33 ...
## - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
## Return to another year of education (holding exper fixed) is B1 + B3*exper; at exper = 1: 0.04725277
##
## Call:
## lm(formula = log(wage) ~ educ + exper + educ * exper, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.88558 -0.24553 0.03558 0.26171 1.28836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.949455 0.240826 24.704 <2e-16 ***
## educ 0.044050 0.017391 2.533 0.0115 *
## exper -0.021496 0.019978 -1.076 0.2822
## educ:exper 0.003203 0.001529 2.095 0.0365 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3923 on 931 degrees of freedom
## Multiple R-squared: 0.1349, Adjusted R-squared: 0.1321
## F-statistic: 48.41 on 3 and 931 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(wage) ~ educ + exper + educ * exper, data = data_c6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.88558 -0.24553 0.03558 0.26171 1.28836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.949455 0.240826 24.704 <2e-16 ***
## educ 0.044050 0.017391 2.533 0.0115 *
## exper -0.021496 0.019978 -1.076 0.2822
## educ:exper 0.003203 0.001529 2.095 0.0365 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3923 on 931 degrees of freedom
## Multiple R-squared: 0.1349, Adjusted R-squared: 0.1321
## F-statistic: 48.41 on 3 and 931 DF, p-value: < 2.2e-16
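## Theta (return to education at exper = 10, B1 + 10*B3): 0.07608
## 95% Confidence Interval: needs se(Theta), which is the standard error on educ after regressing log(wage) on educ, exper, and educ*(exper - 10)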
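A confidence interval for a combination such as B1 + 10*B3 is easiest to obtain by rewriting the model so that the combination appears as a single coefficient. The sketch below uses simulated data (the variables and coefficients are made up, not the `wage2` sample) to show that replacing educ*exper with educ*(exper - 10) makes the coefficient on educ equal to B1 + 10*B3, with its standard error and interval reported directly.

```r
set.seed(42)
n     <- 500
educ  <- sample(9:18, n, replace = TRUE)
exper <- sample(1:25, n, replace = TRUE)
lwage <- 5.9 + 0.04 * educ - 0.02 * exper + 0.003 * educ * exper + rnorm(n, sd = 0.4)

# Original parameterisation: theta has to be assembled from two coefficients
orig <- lm(lwage ~ educ + exper + educ:exper)
theta_by_hand <- unname(coef(orig)["educ"] + 10 * coef(orig)["educ:exper"])

# Reparameterised model: the coefficient on educ is theta itself
repar     <- lm(lwage ~ educ + exper + I(educ * (exper - 10)))
theta_hat <- unname(coef(repar)["educ"])
theta_ci  <- confint(repar, "educ", level = 0.95)   # 95% interval for theta
```

The two models have identical fit; only the meaning of the educ coefficient changes.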
## Youngest age: 25
## Number of people at the youngest age: 211
##
## Call:
## lm(formula = nettfa ~ inc + age + agesq, data = k401ksubs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -504.93 -18.61 -3.08 9.96 1464.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.680388 10.080986 0.464 0.642
## inc 0.978252 0.025489 38.379 < 2e-16 ***
## age -2.231489 0.489712 -4.557 5.26e-06 ***
## agesq 0.037722 0.005621 6.710 2.05e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.18 on 9271 degrees of freedom
## Multiple R-squared: 0.1731, Adjusted R-squared: 0.1728
## F-statistic: 646.8 on 3 and 9271 DF, p-value: < 2.2e-16
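nettfa = -1.204 + 0.825*inc - 1.322*age + 0.0255*age^2 (single-person households). Here b1 = 0.825, and with b2 < 0 and b3 > 0 the relationship between nettfa and age is U-shaped, with its minimum at age = 1.322 / (2*0.0255) ≈ 25.9, right at the bottom of the age range in the sample.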
## [1] 25
##
## Call:
## lm(formula = nettfa ~ inc + age + I(age^2), data = data_c6_12)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.36 -13.58 -2.97 5.67 1116.45
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.204212 15.280667 -0.079 0.93719
## inc 0.824816 0.060298 13.679 < 2e-16 ***
## age -1.321815 0.767496 -1.722 0.08518 .
## I(age^2) 0.025562 0.008999 2.841 0.00455 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.6 on 2013 degrees of freedom
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.1216
## F-statistic: 93.99 on 3 and 2013 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = nettfa ~ inc + age + I(age^2 - age_50), data = data_c6_12)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.36 -13.58 -2.97 5.67 1116.45
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.204212 15.280667 -0.079 0.93719
## inc 0.824816 0.060298 13.679 < 2e-16 ***
## age -0.043695 0.325270 -0.134 0.89315
## I(age^2 - age_50) 0.025562 0.008999 2.841 0.00455 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.6 on 2013 degrees of freedom
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.1216
## F-statistic: 93.99 on 3 and 2013 DF, p-value: < 2.2e-16
The output below shows that R-squared does not drop when age is taken out of the model and (age − 25)^2 is used on its own: it stays at 0.1229. With one fewer regressor, the adjusted R-squared rises slightly, from 0.1216 to 0.122.
##
## Call:
## lm(formula = nettfa ~ inc + I(age_25^2), data = data_c6_12)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.37 -13.61 -3.01 5.63 1116.34
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.488105 2.177584 -8.490 <2e-16 ***
## inc 0.823571 0.059567 13.826 <2e-16 ***
## I(age_25^2) 0.024403 0.002541 9.605 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.59 on 2014 degrees of freedom
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
## F-statistic: 141 on 2 and 2014 DF, p-value: < 2.2e-16
Setting inc = 30: nettfa = -18.488105 + 0.823571*30 + 0.024403*(age − 25)^2, which simplifies to
nettfa = 6.219 + 0.024403*(age − 25)^2
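The simplified equation can be evaluated at a few ages to see the shape of the relationship. This sketch uses the coefficients reported above; `nettfa_hat` is a helper name introduced here for illustration.

```r
b0    <- -18.488105   # intercept
b_inc <- 0.823571     # coefficient on inc
b_age <- 0.024403     # coefficient on (age - 25)^2

# Predicted nettfa at a given age, with income fixed at 30 by default
nettfa_hat <- function(age, inc = 30) {
  b0 + b_inc * inc + b_age * (age - 25)^2
}

at_25 <- nettfa_hat(25)                 # the constant term of the simplified equation
ages  <- nettfa_hat(c(25, 35, 45, 55))  # predictions rise with distance from age 25
```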
##
## Call:
## lm(formula = nettfa ~ inc + I(inc^2) + I(age_25^2), data = data_c6_12)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.46 -13.66 -3.00 5.76 1116.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.930e+01 3.688e+00 -5.234 1.83e-07 ***
## inc 8.722e-01 1.877e-01 4.648 3.57e-06 ***
## I(inc^2) -5.405e-04 1.978e-03 -0.273 0.785
## I(age_25^2) 2.440e-02 2.541e-03 9.603 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.6 on 2013 degrees of freedom
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.1216
## F-statistic: 94.01 on 3 and 2013 DF, p-value: < 2.2e-16