Chapter2 C9 (i) From the data in COUNTYMURDERS for 1996, there were 1,051 counties with zero murders and 31 counties with at least one execution. The largest number of executions was 3.

##    arrests countyid   density  popul perc1019 perc2029 percblack percmale
## 1        8     1001  67.21535  40061 15.89077 13.17491 20.975510 48.70073
## 2        6     1003  77.05643 123023 13.93886 11.63929 13.496660 48.83233
## 3        1     1005  29.91548  26475 15.06327 13.69972 46.190750 49.15203
## 4        0     1009  67.20457  43392 14.17542 12.99318  1.415007 48.97446
## 5        1     1011  17.89899  11188 14.98927 14.13121 72.756520 49.91956
## 6        2     1013  27.71148  21530 15.68509 11.25871 41.384110 46.81839
## 7       20     1015 186.53970 113511 14.71135 14.28936 19.096830 47.99447
## 8        4     1017  61.51258  36748 14.65386 13.13813 37.253730 47.31142
## 9        2     1019  38.27024  21170 14.13321 12.13037  7.042985 49.22060
## 10       0     1021  50.89291  35323 14.80339 12.64332 11.921410 48.60006
##    rpcincmaint rpcpersinc rpcunemins year murders  murdrate arrestrate
## 1      192.038  11852.760     26.796 1996       7 1.7473350  1.9969550
## 2      139.084  13583.020     28.710 1996       6 0.4877137  0.4877137
## 3      405.768  10760.510     63.162 1996       1 0.3777148  0.3777148
## 4      184.382  11094.820     21.692 1996       2 0.4609145  0.0000000
## 5      485.518   8349.506     63.162 1996       0 0.0000000  0.8938148
## 6      357.918   9947.058     54.868 1996       2 0.9289364  0.9289364
## 7      248.820  11536.320     35.090 1996      14 1.2333610  1.7619440
## 8      243.078  10899.590     41.470 1996       3 0.8163710  1.0884950
## 9      200.970   9806.698     26.796 1996       0 0.0000000  0.9447331
## 10     231.594  10819.840     40.194 1996       0 0.0000000  0.0000000
##    statefips countyfips execs    lpopul execrate
## 1          1          1     0 10.598160        0
## 2          1          3     0 11.720130        0
## 3          1          5     0 10.183960        0
## 4          1          9     0 10.678030        0
## 5          1         11     0  9.322598        0
## 6          1         13     0  9.977202        0
## 7          1         15     0 11.639660        0
## 8          1         17     0 10.511840        0
## 9          1         19     0  9.960340        0
## 10         1         21     0 10.472290        0
## Counties with zero murders in 1996: 1051
## Counties with at least one execution in 1996: 31
## Largest number of executions in 1996: 3
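A minimal dplyr sketch of how these counts could be produced (the data frame name `countymurders` is an assumption; the columns `year`, `murders`, and `execs` appear in the data excerpt above):

```r
library(dplyr)

# Restrict to 1996 and tabulate the quantities reported above
cm96 <- filter(countymurders, year == 1996)
sum(cm96$murders == 0)  # counties with zero murders
sum(cm96$execs >= 1)    # counties with at least one execution
max(cm96$execs)         # largest number of executions
```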
(ii) Estimate the equation murders = β0 + β1execs + u by OLS and report the results in the usual way, including sample size and R-squared.
## 
## Call:
## lm(formula = murders ~ execs, data = subset(countymurders1996))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -149.12   -5.46   -4.46   -2.46 1338.99 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.4572     0.8348   6.537 7.79e-11 ***
## execs        58.5555     5.8333  10.038  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.89 on 2195 degrees of freedom
## Multiple R-squared:  0.04389,    Adjusted R-squared:  0.04346 
## F-statistic: 100.8 on 1 and 2195 DF,  p-value: < 2.2e-16
(iii) Interpret the slope coefficient reported in part (ii). Does the estimated equation suggest a deterrent effect of capital punishment?
## The slope coefficient, β̂1 = 58.56, says that one additional execution is associated with roughly 58.6 more murders.
## Because the estimate is positive rather than negative, the equation suggests no deterrent effect; more likely,
## counties with many murders are also the counties that carry out executions.
(iv) What is the smallest number of murders that can be predicted by the equation? What is the residual for a county with zero executions and zero murders?
## Since execs >= 0 and the slope is positive, the smallest prediction occurs at execs = 0: the intercept, about 5.457241 murders.
## Residual for a county with zero executions and zero murders: actual - fitted = 0 - 5.457241 = -5.457241.
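A short sketch of these computations, reusing `cm96` from the sketch in part (i):

```r
# The smallest prediction occurs at execs = 0, i.e. at the intercept
fit <- lm(murders ~ execs, data = cm96)
min_pred <- coef(fit)[1]  # ~5.457 murders
resid0 <- 0 - min_pred    # residual = actual - fitted = ~ -5.457
```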
(v) Explain why a simple regression analysis is not well suited for determining whether capital punishment has a deterrent effect on murders.
## A simple regression analysis may suffer from omitted variable bias and endogeneity issues.
## Factors other than executions could influence the murder rate, leading to biased estimates.
## Additionally, the decision to implement capital punishment may be influenced by the crime rate,
## creating endogeneity problems and making causal inference challenging.

Chapter3 5 (i) In the model GPA = β0 + β1study + β2sleep + β3work + β4leisure + u, does it make sense to hold sleep, work, and leisure fixed while changing study?

##    age soph junior senior senior5 male campus business engineer colGPA hsGPA
## 1   21    0      0      1       0    0      0        1        0    3.0   3.0
## 2   21    0      0      1       0    0      0        1        0    3.4   3.2
## 3   20    0      1      0       0    0      0        1        0    3.0   3.6
## 4   19    1      0      0       0    1      1        1        0    3.5   3.5
## 5   20    0      1      0       0    0      0        1        0    3.6   3.9
## 6   20    0      0      1       0    1      1        1        0    3.0   3.4
## 7   22    0      0      0       1    0      0        1        0    2.7   3.5
## 8   22    0      0      0       1    0      0        0        0    2.7   3.0
## 9   22    0      0      0       1    0      0        0        0    2.7   3.0
## 10  19    1      0      0       0    0      0        1        0    3.8   4.0
##    ACT job19 job20 drive bike walk voluntr PC greek car siblings bgfriend clubs
## 1   21     0     1     1    0    0       0  0     0   1        1        0     0
## 2   24     0     1     1    0    0       0  0     0   1        0        1     1
## 3   26     1     0     0    0    1       0  0     0   1        1        0     1
## 4   27     1     0     0    0    1       0  0     0   0        1        0     0
## 5   28     0     1     0    1    0       0  0     0   1        1        1     0
## 6   25     0     0     0    0    1       0  0     0   1        1        0     0
## 7   25     0     0     0    1    0       0  0     1   1        1        0     1
## 8   22     1     0     1    0    0       0  1     0   0        1        1     0
## 9   21     1     0     1    0    0       0  0     0   1        1        1     1
## 10  27     1     0     0    0    1       0  1     0   0        1        0     1
##    skipped alcohol gradMI fathcoll mothcoll
## 1      2.0    1.00      1        0        0
## 2      0.0    1.00      1        1        1
## 3      0.0    1.00      1        1        1
## 4      0.0    0.00      0        0        0
## 5      0.0    1.50      1        1        0
## 6      0.0    0.00      0        1        0
## 7      0.0    2.00      1        0        1
## 8      3.0    3.00      1        1        1
## 9      2.0    2.50      1        1        1
## 10     0.5    0.75      1        0        1
## In the given model, it does not make sense to hold sleep, work, and leisure fixed while changing study.
## The reason is that the sum of hours in all four activities must be 168 for each student.
## Changing the hours spent on studying would inherently change the hours available for other activities.
(ii) Explain why this model violates Assumption MLR.3.
## Assumption MLR.3 rules out perfect collinearity among the regressors. Here study + sleep + work + leisure = 168
## for every student, so each regressor is an exact linear function of the other three
## (e.g., leisure = 168 - study - sleep - work). This exact linear relationship violates MLR.3, and the OLS
## estimates cannot even be computed.
(iii) How could you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption MLR.3?
## Drop one of the four activities, say leisure: GPA = β0 + β1study + β2sleep + β3work + u.
## With one activity omitted there is no perfect collinearity, so MLR.3 holds, and each coefficient has a clear
## meaning: β1 is the effect of one more hour of study with sleep and work held fixed, which, given the 168-hour
## weekly constraint, means an hour shifted out of leisure and into study.
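A sketch of the reformulated regression; the time-use variables `study`, `sleep`, `work` and the data frame `gpa_data` are hypothetical (the GPA data excerpted above does not contain them):

```r
# Omit leisure so the remaining regressors are not perfectly collinear;
# each coefficient measures the effect of moving an hour out of leisure
reform_mod <- lm(colGPA ~ study + sleep + work, data = gpa_data)
summary(reform_mod)
```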

Chapter3 10. In the model y = β0 + β1x1 + β2x2 + β3x3 + u, let β̃1 denote the OLS estimate from the simple regression of y on x1, and β̂1 the estimate from the multiple regression of y on x1, x2, and x3.

##    inlf hours kidslt6 kidsge6 age educ   wage repwage hushrs husage huseduc
## 1     1  1610       1       0  32   12 3.3540    2.65   2708     34      12
## 2     1  1656       0       2  30   12 1.3889    2.65   2310     30       9
## 3     1  1980       1       3  35   12 4.5455    4.04   3072     40      12
## 4     1   456       0       3  34   12 1.0965    3.25   1920     53      10
## 5     1  1568       1       2  31   14 4.5918    3.60   2000     32      12
## 6     1  2032       0       0  54   12 4.7421    4.70   1040     57      11
## 7     1  1440       0       2  37   16 8.3333    5.95   2670     37      12
## 8     1  1020       0       0  54   12 7.8431    9.98   4120     53       8
## 9     1  1458       0       2  48   12 2.1262    0.00   1995     52       4
## 10    1  1600       0       2  39   12 4.6875    4.15   2100     43      12
##    huswage faminc    mtr motheduc fatheduc unem city exper  nwifeinc      lwage
## 1   4.0288  16310 0.7215       12        7  5.0    0    14 10.910060 1.21015370
## 2   8.4416  21800 0.6615        7        7 11.0    1     5 19.499981 0.32851210
## 3   3.5807  21040 0.6915       12        7  5.0    0    15 12.039910 1.51413774
## 4   3.5417   7300 0.7815        7        7  5.0    0     6  6.799996 0.09212332
## 5  10.0000  27300 0.6215       12       14  9.5    1     7 20.100058 1.52427220
## 6   6.7106  19495 0.6915       14        7  7.5    1    33  9.859054 1.55648005
## 7   3.4277  21152 0.6915       14        7  5.0    0    11  9.152048 2.12025952
## 8   2.5485  18900 0.6915        3        3  5.0    0    35 10.900038 2.05963421
## 9   4.2206  20405 0.7515        7        7  3.0    0    24 17.305000 0.75433636
## 10  5.7143  20425 0.6915        7        7  5.0    0    21 12.925000 1.54489934
##    expersq
## 1      196
## 2       25
## 3      225
## 4       36
## 5       49
## 6     1089
## 7      121
## 8     1225
## 9      576
## 10     441
(i) If x1 is highly correlated with x2 and x3 in the sample, and x2 and x3 have large partial effects on y, would you expect β̃1 and β̂1 to be similar or very different? Explain.
## Very different, most likely. The simple regression estimate β̃1 picks up the effects of the omitted variables:
## because x1 is highly correlated with x2 and x3, and their partial effects on y are large, the omitted variable
## bias in β̃1 is large, pushing it far from β̂1.
(ii) If x1 is almost uncorrelated with x2 and x3, but x2 and x3 are highly correlated, will β̃1 and β̂1 tend to be similar or very different? Explain.
## Similar. Omitted variable bias requires correlation between x1 and the omitted regressors; with x1 almost
## uncorrelated with x2 and x3, dropping them barely changes the estimate. The high correlation between x2 and x3
## affects the precision of β̂2 and β̂3, not the size of β̂1.
(iii) If x1 is highly correlated with x2 and x3, and x2 and x3 have small partial effects on y, would you expect se(β̃1) or se(β̂1) to be smaller? Explain.
## se(β̃1) would be smaller. In the multiple regression, the high correlation of x1 with x2 and x3 inflates
## se(β̂1) through multicollinearity, while including x2 and x3 buys little reduction in the error variance
## because their partial effects are small.
(iv) If x1 is almost uncorrelated with x2 and x3, x2 and x3 have large partial effects on y, and x2 and x3 are highly correlated, would you expect se(β̃1) or se(β̂1) to be smaller? Explain.
## se(β̂1) would be smaller. Including x2 and x3 greatly reduces the error variance (their partial effects are
## large), while their near-zero correlation with x1 means almost no multicollinearity penalty for estimating β1.
## The high correlation between x2 and x3 degrades the precision of β̂2 and β̂3, but not of β̂1.
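All four answers follow from the variance formula for an OLS slope estimator, reproduced here for reference (SST1 is the total sample variation in x1, and R1² comes from regressing x1 on x2 and x3):

$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\mathrm{SST}_1\,(1 - R_1^2)}$$

High correlation between x1 and the other regressors raises R1² and inflates the variance, while adding regressors with large partial effects shrinks the error variance σ².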

Chapter3 C8 (i) Find the average values of prpblck and income in the sample, along with their standard deviations.

## Average prpblck: NA
## Standard deviation of prpblck: NA
## Average income: NA
## Standard deviation of income: NA
## Units of measurement: prpblck is the proportion of the zip code's population that is black (a fraction
## between 0 and 1, not a percentage), and income is median family income in the zip code, in dollars.
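The NA values arise because prpblck and income have missing observations in DISCRIM; a minimal fix is to drop them when computing the statistics (assuming the data frame is named `discrim`, as in the lm() calls below):

```r
# Summary statistics, excluding missing values
mean(discrim$prpblck, na.rm = TRUE)
sd(discrim$prpblck, na.rm = TRUE)
mean(discrim$income, na.rm = TRUE)
sd(discrim$income, na.rm = TRUE)
```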

(ii) Estimate the model psoda = β0 + β1prpblck + β2income + u by OLS and interpret the coefficient on prpblck. Is it economically large?

## 
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29401 -0.05242  0.00333  0.04231  0.44322 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.563e-01  1.899e-02  50.354  < 2e-16 ***
## prpblck     1.150e-01  2.600e-02   4.423 1.26e-05 ***
## income      1.603e-06  3.618e-07   4.430 1.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08611 on 398 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.06422,    Adjusted R-squared:  0.05952 
## F-statistic: 13.66 on 2 and 398 DF,  p-value: 1.835e-06
## The coefficient on prpblck, 0.115, is the estimated change in psoda (in dollars) for a one-unit change in
## prpblck, i.e., moving from an all-white to an all-black zip code, holding income fixed. More realistically,
## a 0.10 increase in the black population share is associated with a price about $0.0115 (1.15 cents) higher,
## roughly 1% of a typical soda price of about a dollar: statistically significant, but economically modest.
(iii) Compare the estimate from part (ii) with the simple regression estimate from the regression of psoda on prpblck.
## 
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30884 -0.05963  0.01135  0.03206  0.44840 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03740    0.00519  199.87  < 2e-16 ***
## prpblck      0.06493    0.02396    2.71  0.00702 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0881 on 399 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.01808,    Adjusted R-squared:  0.01561 
## F-statistic: 7.345 on 1 and 399 DF,  p-value: 0.007015
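## The simple regression estimate, 0.065, is roughly half the multiple regression estimate of 0.115.
## Controlling for income matters: the implied relationship between income and prpblck is negative, and income
## has a positive effect on psoda, so omitting income biases the simple regression coefficient toward zero.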
(iv) Estimate the model log(psoda) = β0 + β1prpblck + β2log(income) + u.
## 
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income), data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33563 -0.04695  0.00658  0.04334  0.35413 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.79377    0.17943  -4.424 1.25e-05 ***
## prpblck      0.12158    0.02575   4.722 3.24e-06 ***
## log(income)  0.07651    0.01660   4.610 5.43e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0821 on 398 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.06809,    Adjusted R-squared:  0.06341 
## F-statistic: 14.54 on 2 and 398 DF,  p-value: 8.039e-07
## Estimated percentage change in psoda when prpblck increases by 0.20 (i.e., 20 percentage points): 100 × 0.12158 × 0.20 ≈ 2.43%.
(v) Add the variable prppov to the regression in part (iv).
## 
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32218 -0.04648  0.00651  0.04272  0.35622 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.46333    0.29371  -4.982  9.4e-07 ***
## prpblck      0.07281    0.03068   2.373   0.0181 *  
## log(income)  0.13696    0.02676   5.119  4.8e-07 ***
## prppov       0.38036    0.13279   2.864   0.0044 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08137 on 397 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.08696,    Adjusted R-squared:  0.08006 
## F-statistic:  12.6 on 3 and 397 DF,  p-value: 6.917e-08
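## Adding prppov cuts the coefficient on prpblck from about 0.122 to 0.073, though it remains statistically
## significant at the 5% level (p = 0.018).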
(vi) Find the correlation between log(income) and prppov.
## Correlation between lincome and prppov: NA
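The NA again comes from missing values; restricting to complete cases gives the correlation (same `discrim` assumption as above):

```r
# Correlation on complete observations only
cor(log(discrim$income), discrim$prppov, use = "complete.obs")
```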
(vii) Evaluate the following statement: "Because log(income) and prppov are so highly correlated, they have no business being in the same regression."
## The high negative correlation between lincome and prppov suggests multicollinearity between these two variables.
## Multicollinearity can lead to unstable coefficient estimates, making it challenging to interpret the individual effects
## of the variables. However, the decision to include or exclude variables should be based on the specific research question,
## theoretical considerations, and the goals of the analysis.
## In some cases, including both variables in the regression model might still be justified if they capture different aspects
## of the relationship with the dependent variable and contribute to a more comprehensive understanding of the phenomenon under study.

Chapter4 3 (i) Estimated percentage point change in Rdintens for a 10% increase in sales

## i) The coefficient on log(sales) in the problem's estimated equation is 0.321, so a 10% increase in sales is
##    associated with a change in rdintens of about 0.321 × (10/100) ≈ 0.032 percentage points: a modest effect.
(ii) Test for the log(sales) coefficient
## ii) Two-sided p-value for the test on the log(sales) coefficient: 0.1480413
##    The relevant alternative is one-sided (rdintens increases with sales), so the p-value is about 0.074:
##    (At 5% level): Fail to reject H0
##    (At 10% level): Reject H0
(iii) Interpretation of the coefficient on profmarg
## iii) Coefficient on profmarg: 0.050. A one percentage point increase in the profit margin raises predicted
##      R&D intensity by only 0.05 percentage points, an economically small effect.
(iv) Test for the profmarg coefficient
## iv) p-value for the test on profmarg coefficient: 0.2860082
##    (At 5% level): Fail to reject H0
##    (At 10% level): Fail to reject H0
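Assuming the coefficient estimates and standard errors reported in the problem (0.321 with se 0.216 for log(sales); 0.050 with se 0.046 for profmarg; n = 32, so df = 29), the p-values above can be reproduced directly:

```r
# Two-sided p-values from the implied t statistics (df = n - k - 1 = 29)
2 * pt(-abs(0.321 / 0.216), df = 29)  # ~0.148 for log(sales)
2 * pt(-abs(0.050 / 0.046), df = 29)  # ~0.286 for profmarg
```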

Chapter4 C8 (i) How many single-person households are there in the data set?

## Number of single-person households: 2017
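A sketch of how this count could be obtained (the data frame name `k401ksubs` is an assumption; in the 401KSUBS data, single-person households are those with fsize == 1):

```r
# Keep single-person households and count them
single_person_households <- subset(k401ksubs, fsize == 1)
nrow(single_person_households)  # 2017
```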
(ii) Use OLS to estimate the model nettfa = β0 + β1inc + β2age + u, and interpret the slope coefficients. Are there any surprises?
## 
## Call:
## lm(formula = nettfa ~ inc + age, data = single_person_households)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -179.95  -14.16   -3.42    6.03 1113.94 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -43.03981    4.08039 -10.548   <2e-16 ***
## inc           0.79932    0.05973  13.382   <2e-16 ***
## age           0.84266    0.09202   9.158   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.68 on 2014 degrees of freedom
## Multiple R-squared:  0.1193, Adjusted R-squared:  0.1185 
## F-statistic: 136.5 on 2 and 2014 DF,  p-value: < 2.2e-16
## Interpretation of slope coefficients (in this data set, nettfa and inc are both measured in $1,000s):
## β1 (inc): holding age fixed, an additional $1,000 of annual family income raises predicted net financial wealth by about $799.
## β2 (age): holding income fixed, one more year of age raises predicted net financial wealth by about $843.
## Neither sign is surprising; if anything, the age effect may seem small for wealth that should accumulate over a lifetime.
(iii) Does the intercept from the regression in part (ii) have an interesting meaning? Explain.
## The intercept, -43.04, is the predicted nettfa (in $1,000s) for a person with zero income and age zero.
## It has no interesting meaning: no one in the sample has values anywhere near inc = 0 and age = 0.
(iv) Find the p-value for the test H0: β2 = 1 against H1: β2 < 1. Do you reject H0 at the 1% significance level?
## t = (0.84266 - 1)/0.09202 ≈ -1.71, so the one-sided p-value is about 0.044.
## Since 0.044 > 0.01, we fail to reject H0 at the 1% significance level: the evidence that β2 < 1 is
## suggestive but not strong enough at the 1% level.
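The computation behind this p-value, using the coefficient and standard error from the table above:

```r
# One-sided t test of H0: beta_age = 1 against H1: beta_age < 1
t_stat <- (0.84266 - 1) / 0.09202  # ~ -1.71
pt(t_stat, df = 2014)              # one-sided p ~ 0.044
```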
(v) If you do a simple regression of nettfa on inc, is the estimated coefficient on inc much different from the estimate in part (ii)? Why or why not?
## 
## Call:
## lm(formula = nettfa ~ inc, data = single_person_households)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -185.12  -12.85   -4.85    1.78 1112.66 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.5709     2.0607   -5.13 3.18e-07 ***
## inc           0.8207     0.0609   13.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45.59 on 2015 degrees of freedom
## Multiple R-squared:  0.08267,    Adjusted R-squared:  0.08222 
## F-statistic: 181.6 on 1 and 2015 DF,  p-value: < 2.2e-16
## Comparison of the estimated coefficient on inc:
## The simple regression estimate, 0.821, is very close to the multiple regression estimate of 0.799.
## By the omitted variable bias formula, the difference equals the age coefficient times the slope from a
## regression of age on inc; since that difference is only 0.022 while the age coefficient is 0.843, age and inc
## must be nearly uncorrelated in this sample, so omitting age barely moves the inc coefficient.

Chapter5 5


(i) Probability that “score” exceeds 100 using the normal distribution
## i. Probability that 'score' exceeds 100 using the normal distribution: 0.02044288
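A sketch of the calculation (assuming the course scores are in a numeric vector `score`):

```r
# P(score > 100) if score were normal with the sample mean and sd
1 - pnorm(100, mean = mean(score), sd = sd(score))
```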

## 
##  Shapiro-Wilk normality test
## 
## data:  data
## W = 0.96973, p-value = 2.454e-12
(ii) Explanation of the left tail
## (ii) The normal distribution cannot fit well in the tails because scores are bounded between 0 and 100 while
## the normal distribution has unbounded tails and assigns positive probability to impossible values (the 0.020
## probability of exceeding 100 computed above illustrates this). The Shapiro-Wilk test above also strongly rejects normality.

Chapter5 C1 (i) Estimate the equation wage = β0 + β1educ + β2exper + β3tenure + u. Save the residuals and plot a histogram.

## 
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = wage_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6068 -1.7747 -0.6279  1.1969 14.6536 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.87273    0.72896  -3.941 9.22e-05 ***
## educ         0.59897    0.05128  11.679  < 2e-16 ***
## exper        0.02234    0.01206   1.853   0.0645 .  
## tenure       0.16927    0.02164   7.820 2.93e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared:  0.3064, Adjusted R-squared:  0.3024 
## F-statistic: 76.87 on 3 and 522 DF,  p-value: < 2.2e-16
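The histogram itself is not reproduced in this output; a minimal sketch of part (i)'s residual histogram (assuming the WAGE1 data are loaded as `wage_data`, matching the call above):

```r
# Fit the level model, save the residuals, and plot their histogram
level_mod <- lm(wage ~ educ + exper + tenure, data = wage_data)
hist(resid(level_mod), breaks = 30, main = "Residuals: level model")
```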
(ii) Repeat part (i), but with log(wage) as the dependent variable.

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure, data = wage_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05802 -0.29645 -0.03265  0.28788  1.42809 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.284360   0.104190   2.729  0.00656 ** 
## educ        0.092029   0.007330  12.555  < 2e-16 ***
## exper       0.004121   0.001723   2.391  0.01714 *  
## tenure      0.022067   0.003094   7.133 3.29e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4409 on 522 degrees of freedom
## Multiple R-squared:  0.316,  Adjusted R-squared:  0.3121 
## F-statistic: 80.39 on 3 and 522 DF,  p-value: < 2.2e-16
(iii) Q-Q plots for normality assessment: is Assumption MLR.6 closer to being satisfied for the level-level or the log-level model?

## Comparing the residual distributions from parts (i) and (ii): the level-model residuals are strongly
## right-skewed (median -0.63, maximum 14.65 against a minimum of -7.61), while the log-model residuals are
## roughly symmetric (1Q -0.296, 3Q 0.288). The normality assumption (MLR.6) is much closer to being satisfied
## for the log-level model, which is the usual argument for using log(wage) as the dependent variable.
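A sketch of the Q-Q plots, reusing `level_mod` from the sketch in part (i):

```r
# Compare residual normality across the two models
log_mod <- lm(log(wage) ~ educ + exper + tenure, data = wage_data)
par(mfrow = c(1, 2))
qqnorm(resid(level_mod), main = "Level model"); qqline(resid(level_mod))
qqnorm(resid(log_mod), main = "Log model"); qqline(resid(log_mod))
```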

Chapter6 3 (i) At what point does the marginal effect of sales on rdintens become negative?

## The marginal effect of sales on rdintens is β1 + 2β2·sales. Using Model 1's coefficients (part (iv) below),
## it becomes negative when sales exceeds 3.0057e-04 / (2 × 6.9459e-09) ≈ 21,637, i.e., sales of about
## $21.6 billion (sales is measured in millions of dollars).
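The turning point follows directly from Model 1's coefficients, reported in part (iv) below:

```r
# Turning point of the quadratic: sales* = -b1 / (2 * b2)
b1 <- 3.005713e-04
b2 <- -6.945939e-09
-b1 / (2 * b2)  # ~21,637 (sales in millions of dollars, about $21.6 billion)
```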
(ii) Would you keep the quadratic term in the model? Explain.
## The quadratic term has t = -1.86 (p ≈ 0.073): significant at the 10% level but not at the 5% level, so the
## evidence for curvature is modest. Keeping it is a judgment call; if few firms have sales beyond the turning
## point of roughly $21.6 billion, a model without the quadratic describes most of the sample nearly as well.
(iii) Define salesbil as sales measured in billions of dollars and re-estimate the quadratic model.
## 
## Call:
## lm(formula = rdintens ~ salesbil + I(salesbil^2), data = rdchem)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1418 -1.3630 -0.2257  1.0688  5.5808 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.612512   0.429442   6.084 1.27e-06 ***
## salesbil       0.300571   0.139295   2.158   0.0394 *  
## I(salesbil^2) -0.006946   0.003726  -1.864   0.0725 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.788 on 29 degrees of freedom
## Multiple R-squared:  0.1484, Adjusted R-squared:  0.08969 
## F-statistic: 2.527 on 2 and 29 DF,  p-value: 0.09733
(iv) For the purpose of reporting the results, which equation do you prefer?
## Adjusted R-squared - Model 1: 0.08969224
## Adjusted R-squared - Model 2: 0.08969224
## The adjusted R-squareds are identical because salesbil = sales/1,000 is a pure rescaling: the linear
## coefficient is multiplied by 10^3 and the quadratic by 10^6, while the fit, t statistics, and R-squared are
## unchanged. Model 2 is preferred for reporting, not because of fit, but because its coefficients
## (0.3006 and -0.0069) are far easier to read than Model 1's scientific notation.
## 
## Coefficients - Model 1:
##   (Intercept)         sales    I(sales^2) 
##  2.612512e+00  3.005713e-04 -6.945939e-09
## 
## Coefficients - Model 2:
##   (Intercept)      salesbil I(salesbil^2) 
##   2.612512085   0.300571301  -0.006945939