Stat 115s (Introduction to Econometrics)

Lesson 3.1- Qualitative Explanatory Variables in Regression Analysis

Author

Norberto E. Milla, Jr.

Published

April 19, 2023

1 Introduction

  • Two kinds of variables: quantitative vs. qualitative

  • So far we only used quantitative information in our regression models, e.g., wages, experience, house prices, number of rooms, GPA, attendance rate, etc.

  • In practice we would like to include qualitative variables in the regression.

  • For example: gender, ethnicity, religion of an individual, region or location of an individual or city, type of industry of a firm (manufacturing, retail, finance,…) etc.

  • Such categorical variables can be represented by binary or dummy variables.

  • In most cases qualitative factors come in the form of binary information: female/male, domestic/foreign, north/south, manufacturing/nonmanufacturing, countries with or without capital punishment laws, etc.

  • Dummy variables: also called binary (0/1) variables.

  • Any kind of categorical information can easily be represented by dummy variables.

  • It does not matter which category is assigned the value 0 or 1. But we need to know the assignment to interpret the results.

  • For example:

    • gender dummy in the wage equation: female=1, male=0.

    • Marital status: married=1, single=0

    • Location of the country: northern hemisphere=1, southern hemisphere=0

2 Single dummy independent variable

  • How do we include binary information in a regression model?

  • Let one of the x variables be a dummy variable: wage = \beta_0 + \delta_0 female + \beta_1 educ + u

  • For female workers, female = 1; for male workers, female = 0.

  • Interpretation of \delta_0: the difference in hourly wage between females and males, given the same amount of education (and the same error term u).

  • Is there discrimination against women in the labor market?

    • If \delta_0 < 0, then, given the same level of education, female workers earn less than male workers on average.

    • This can easily be tested using a t-statistic.

  • Conditional expectation of wage for women:

E(wage|female = 1, educ) = \beta_0 + \delta_0 +\beta_1 educ

  • For men:

E(wage|female = 0, educ) = \beta_0 +\beta_1 educ

  • Taking the difference:

\begin{align} E(wage|female = 1, educ) - E(wage|female = 0, educ) &= [\beta_0 + \delta_0 +\beta_1 educ] - [\beta_0 +\beta_1 educ] \notag \\ &= \delta_0 \notag \end{align}

  • In a graph of wage against educ, \beta_0 is the intercept for male workers (where female = 0).

  • The intercept for female workers is \beta_0 + \delta_0 (where female = 1).

  • A single dummy variable can differentiate between two categories. We do not need to include a separate dummy variable for males.

  • In general: the number of dummy variables = the number of categories minus 1

  • In the wage equation we have just two groups. Using two dummy variables would introduce perfect collinearity because female + male = 1.

  • This is called the dummy variable trap.

  • The group for which the dummy equals 0 is called the base group or benchmark group.

  • This is the group against which comparisons are made. In the formulation above the base group is male workers.

  • The coefficient on female (\delta_0) gives the difference in intercepts between females and males.

  • An alternative is to write the model without the intercept term and including dummy variables for each group:

wage = \gamma_0female + \delta_0male + \beta_1educ + u

  • There is no dummy variable trap here because there is no intercept (see the sketch at the end of this section).

  • Notice that the coefficients on the dummies give us the intercepts for each group.

  • We do not prefer this specification because it is not clear how to calculate R^2; it may even be negative.

  • Also, testing for a difference in intercepts is more difficult.
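
To illustrate this alternative, no-intercept specification, here is a minimal sketch. It assumes that the wage1 data used in the regressions below is the wage1 dataset from the wooldridge package; reg_alt and the male variable are names created here.

Code
# No-intercept specification: one dummy per group, no common intercept.
# The "- 1" drops the intercept, so the coefficients on male and female
# are the group-specific intercepts.
library(wooldridge)              # provides the wage1 data
wage1$male <- 1 - wage1$female   # construct the male dummy
reg_alt <- lm(wage ~ male + female + educ - 1, data = wage1)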

3 Creating dummy variables in R

There are many ways to create dummy variables in R. Most basically, we can use the factor() function, which internally generates dummy variables. The only limitation is that the reference group is chosen automatically: it is the level that comes first in the alphabetical ordering of the categorical variable's levels.

Another package that is very useful for creating dummy variables is fastDummies.

Code
datadummy <- read.csv("datadummy.csv")
head(datadummy)
  X.1 X      rank discipline yrs.since.phd yrs.service  sex salary
1   1 1      Prof          B            19          18 Male 139750
2   2 2      Prof          B            20          16 Male 173200
3   3 3  AsstProf          B             4           3 Male  79750
4   4 4      Prof          B            45          39 Male 115000
5   5 5      Prof          B            40          41 Male 141500
6   6 6 AssocProf          B             6           6 Male  97000

To generate dummy variables for sex, for example, we use the following code chunk.

Code
datadummy <- dummy_cols(datadummy,select_columns="sex")
head(datadummy)
  X.1 X      rank discipline yrs.since.phd yrs.service  sex salary sex_Female
1   1 1      Prof          B            19          18 Male 139750          0
2   2 2      Prof          B            20          16 Male 173200          0
3   3 3  AsstProf          B             4           3 Male  79750          0
4   4 4      Prof          B            45          39 Male 115000          0
5   5 5      Prof          B            40          41 Male 141500          0
6   6 6 AssocProf          B             6           6 Male  97000          0
  sex_Male
1        1
2        1
3        1
4        1
5        1
6        1

Notice that two dummy variables are actually generated, one for males and another for females. This is not a problem, since you can just use one of the two as a regressor in a regression analysis.
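
Alternatively, we can skip explicit dummy columns and let lm() expand a factor internally, as described earlier. A minimal sketch (reg_factor is a name chosen here); relevel() overrides the default alphabetical choice of the reference level:

Code
# factor(sex) makes R create the dummy internally; relevel() sets the
# base group to "Male" instead of the alphabetical default "Female"
reg_factor <- lm(salary ~ relevel(factor(sex), ref = "Male"),
                 data = datadummy)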

Let us try another example. This time let us create dummy variables for the categorical variable rank.

Code
datadummy <- dummy_cols(datadummy,select_columns="rank")
head(datadummy)
  X.1 X      rank discipline yrs.since.phd yrs.service  sex salary sex_Female
1   1 1      Prof          B            19          18 Male 139750          0
2   2 2      Prof          B            20          16 Male 173200          0
3   3 3  AsstProf          B             4           3 Male  79750          0
4   4 4      Prof          B            45          39 Male 115000          0
5   5 5      Prof          B            40          41 Male 141500          0
6   6 6 AssocProf          B             6           6 Male  97000          0
  sex_Male rank_AssocProf rank_AsstProf rank_Prof
1        1              0             0         1
2        1              0             0         1
3        1              0             1         0
4        1              0             0         1
5        1              0             0         1
6        1              1             0         0

Using the dummy_cols() function offers some flexibility in the choice of the base or reference group: whichever category's dummy variable is left out of the regression automatically becomes the base group.

For example, including only the dummy variables for AssocProf and AsstProf in a regression means that Prof is the base category.
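
A minimal sketch using the dummy columns created above (reg_rank is a name chosen here):

Code
# rank_Prof is left out, so both coefficients measure average salary
# differences relative to the Prof base group
reg_rank <- lm(salary ~ rank_AssocProf + rank_AsstProf, data = datadummy)
summary(reg_rank)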

4 Adding quantitative variables

  • Adding quantitative variables does not change the interpretation of dummy variables. Consider the following model with male workers as the base group: wage = \beta_0 + \delta_0 female + \beta_1 educ + \beta_2exper + \beta_3 tenure + u

  • \delta_0: Intercept difference between female and male workers at the same level of education, experience and tenure.

  • Testing for discrimination: H_0: \delta_0 = 0 vs. H_1: \delta_0 < 0

    • If we reject H_0 in favor of the alternative, there is evidence of discrimination against women in the labor market.

    • This can easily be tested using a t-statistic.

Code
reg1 <- lm(wage ~ female + educ + exper + tenure,
           data = wage1) 
summary(reg1)

Call:
lm(formula = wage ~ female + educ + exper + tenure, data = wage1)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.7675 -1.8080 -0.4229  1.0467 14.0075 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.56794    0.72455  -2.164   0.0309 *  
female      -1.81085    0.26483  -6.838 2.26e-11 ***
educ         0.57150    0.04934  11.584  < 2e-16 ***
exper        0.02540    0.01157   2.195   0.0286 *  
tenure       0.14101    0.02116   6.663 6.83e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.958 on 521 degrees of freedom
Multiple R-squared:  0.3635,    Adjusted R-squared:  0.3587 
F-statistic:  74.4 on 4 and 521 DF,  p-value: < 2.2e-16
  • On average, women earn $1.81 less than men, ceteris paribus. More specifically, if we take a woman and a man with the same levels of education, experience and tenure, the woman earns, on average, $1.81 less per hour than the man.

  • \hat{\beta}_0 = -1.57: this is the intercept for male workers. It is not meaningful, as there is no one in the sample with zero values of education, experience, and tenure.

How do we interpret coefficients of dummy variables if the response is in logarithmic form?

Code
reg2 <- lm(log(wage) ~ female + educ + exper + expersq + tenure + tenursq,
   data = wage1)
summary(reg2)

Call:
lm(formula = log(wage) ~ female + educ + exper + expersq + tenure + 
    tenursq, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.83160 -0.25658 -0.02126  0.25500  1.13370 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.4166910  0.0989279   4.212 2.98e-05 ***
female      -0.2965110  0.0358054  -8.281 1.04e-15 ***
educ         0.0801966  0.0067573  11.868  < 2e-16 ***
exper        0.0294324  0.0049752   5.916 6.00e-09 ***
expersq     -0.0005827  0.0001073  -5.431 8.65e-08 ***
tenure       0.0317139  0.0068452   4.633 4.56e-06 ***
tenursq     -0.0005852  0.0002347  -2.493    0.013 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3998 on 519 degrees of freedom
Multiple R-squared:  0.4408,    Adjusted R-squared:  0.4343 
F-statistic: 68.18 on 6 and 519 DF,  p-value: < 2.2e-16
  • Interpretation of the coefficient on female: women earn about 100\times 0.297 = 29.7\% less than men for the same levels of education, experience, and tenure

  • A more accurate approximation for the proportionate difference in wages between men and women holding all other factors fixed is given by

\frac{\widehat{wage}_F - \widehat{wage}_M}{\widehat{wage}_M} = exp(-0.297) - 1 \approx -0.257

  • Women earn approximately 25.7% less than comparable men.
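
This exact adjustment is easy to compute from the fitted model; a quick sketch using reg2 from above:

Code
# Exact proportionate gender wage gap implied by the log-wage model
exp(coef(reg2)["female"]) - 1   # approximately -0.257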

Suppose that we exclude all quantitative variables from the model.

Code
reg3 <- lm(wage ~ female,  data = wage1)
summary(reg3)

Call:
lm(formula = wage ~ female, data = wage1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5995 -1.8495 -0.9877  1.4260 17.8805 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.0995     0.2100  33.806  < 2e-16 ***
female       -2.5118     0.3034  -8.279 1.04e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.476 on 524 degrees of freedom
Multiple R-squared:  0.1157,    Adjusted R-squared:  0.114 
F-statistic: 68.54 on 1 and 524 DF,  p-value: 1.042e-15
  • The intercept is simply the average wage for men in the sample ($7.1): 7.1 - 2.51(0)

  • Coefficient estimate on female: the difference in the average wage between women and men ($2.51): \underbrace{[7.1 - 2.51(1)]}_{female} - \underbrace{[7.1 - 2.51(0)]}_{male}

  • The average wage for women in the sample is: 7.1 - 2.51 = $4.59

  • If we calculate the sample averages for each group, we will get the same results (see the sketch at the end of this section). Notice that we did not control for any explanatory variables in this case.

  • The model above can be used to compute the simple comparison-of-means test between the two groups (in our example between two genders)

  • This is just a simple t-test on the dummy variable: t = -2.51/0.303 \approx -8.28, which is highly significant. Thus, the evidence suggests that the means across the groups are not the same.

  • The comparison-of-means t-test is valid under the assumption of homoscedasticity. If the variances are different across groups, then we should use an appropriate correction.

  • We again note that the model that includes additional factors (education, experience, tenure, etc.) is more appropriate for estimating the ceteris paribus gender wage gap.
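
As noted above, the group means reproduce the dummy-only regression; a quick sketch using base R's aggregate():

Code
# Mean wage by gender: the male mean equals the intercept (about 7.10),
# and the female mean equals 7.10 - 2.51 = 4.59
aggregate(wage ~ female, data = wage1, FUN = mean)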

5 More than one dummy variable

Let us define two dummy variables: female = 1 if the worker is female; married = 1 if the worker is married.

Code
reg4 <- lm(wage ~ female + married,  data = wage1)
summary(reg4)

Call:
lm(formula = wage ~ female + married, data = wage1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2899 -2.0941 -0.9102  1.1186 17.7440 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.1804     0.2963  20.856  < 2e-16 ***
female       -2.2944     0.3026  -7.582 1.56e-13 ***
married       1.3395     0.3097   4.325 1.83e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.419 on 523 degrees of freedom
Multiple R-squared:  0.1462,    Adjusted R-squared:  0.1429 
F-statistic: 44.78 on 2 and 523 DF,  p-value: < 2.2e-16
  • The coefficient on female is the intercept difference between female and male workers, holding marital status fixed.

  • Similarly, the coefficient on married is the intercept difference between married and single workers (regardless of gender).

  • Note that in this equation the marriage differential is assumed to be the same across genders.

  • We can add an interaction variable female \times married

Code
reg5 <- lm(wage ~ female + married + female*married,  data = wage1)
summary(reg5)

Call:
lm(formula = wage ~ female + married + female * married, data = wage1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7530 -1.7327 -0.9973  1.2566 17.0184 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      5.1680     0.3614  14.299  < 2e-16 ***
female          -0.5564     0.4736  -1.175    0.241    
married          2.8150     0.4363   6.451 2.53e-10 ***
female:married  -2.8607     0.6076  -4.708 3.20e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.352 on 522 degrees of freedom
Multiple R-squared:  0.181, Adjusted R-squared:  0.1763 
F-statistic: 38.45 on 3 and 522 DF,  p-value: < 2.2e-16
  • According to the results above, married men earn $2.82 more than single men on average: \underbrace{[5.17 - 0.56(0) + 2.82(1) - 2.86(0)(1)]}_{\text{married men}} - \underbrace{[5.17 - 0.56(0) + 2.82(0) - 2.86(0)(0)]}_{\text{single men}} = 2.82

  • On the other hand, married women earn $0.60 less than single men, on average: \underbrace{[5.17 - 0.56(1) + 2.82(1) - 2.86(1)(1)]}_{\text{married women}} - \underbrace{[5.17 - 0.56(0) + 2.82(0) - 2.86(0)(0)]}_{\text{single men}} = -0.60
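
The four group means implied by reg5 can be recovered directly with predict(); a quick sketch (cells is a name chosen here):

Code
# Predicted average wage for each gender-by-marital-status cell
cells <- expand.grid(female = 0:1, married = 0:1)
cells$predicted_wage <- predict(reg5, newdata = cells)
cells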

We can add relevant quantitative variables (education, experience, tenure, etc.) together with two or more dummy variables; here we can invoke the ceteris paribus notion.

Code
reg6 <- lm(log(wage) ~ female + married + educ + exper + expersq +
             tenure + tenursq, data = wage1)
summary(reg6)

Call:
lm(formula = log(wage) ~ female + married + educ + exper + expersq + 
    tenure + tenursq, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.81906 -0.24904 -0.02119  0.24525  1.12752 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.4177837  0.0988662   4.226 2.81e-05 ***
female      -0.2901837  0.0361121  -8.036 6.33e-15 ***
married      0.0529219  0.0407561   1.299   0.1947    
educ         0.0791547  0.0068003  11.640  < 2e-16 ***
exper        0.0269535  0.0053258   5.061 5.80e-07 ***
expersq     -0.0005399  0.0001122  -4.814 1.95e-06 ***
tenure       0.0312962  0.0068482   4.570 6.10e-06 ***
tenursq     -0.0005744  0.0002347  -2.448   0.0147 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3995 on 518 degrees of freedom
Multiple R-squared:  0.4426,    Adjusted R-squared:  0.4351 
F-statistic: 58.76 on 7 and 518 DF,  p-value: < 2.2e-16
  • After controlling for the other factors, is there still a difference in average wages between single male workers and married male workers?

  • What about between married and single workers?

  • Again, in the model above we assumed that the marriage premium is the same for men and women. To relax this, we can just add an interaction dummy (just as we did in the previous model).

Code
reg7 <- lm(log(wage) ~ female + married + female*married + educ + exper +
             expersq + tenure + tenursq, data = wage1)
summary(reg7)

Call:
lm(formula = log(wage) ~ female + married + female * married + 
    educ + exper + expersq + tenure + tenursq, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.89697 -0.24060 -0.02689  0.23144  1.09197 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.3213781  0.1000090   3.213 0.001393 ** 
female         -0.1103502  0.0557421  -1.980 0.048272 *  
married         0.2126757  0.0553572   3.842 0.000137 ***
educ            0.0789103  0.0066945  11.787  < 2e-16 ***
exper           0.0268006  0.0052428   5.112 4.50e-07 ***
expersq        -0.0005352  0.0001104  -4.847 1.66e-06 ***
tenure          0.0290875  0.0067620   4.302 2.03e-05 ***
tenursq        -0.0005331  0.0002312  -2.306 0.021531 *  
female:married -0.3005931  0.0717669  -4.188 3.30e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3933 on 517 degrees of freedom
Multiple R-squared:  0.4609,    Adjusted R-squared:  0.4525 
F-statistic: 55.25 on 8 and 517 DF,  p-value: < 2.2e-16
  • Married men are estimated to earn about 21.3% more than single men (the proportionate difference relative to the base group, single men), holding all other factors fixed.
    • \underbrace{[0.321 - 0.110(0) + 0.213(1) - 0.301(0)(1)]}_{\text{married men}} - \underbrace{[0.321 - 0.110(0) + 0.213(0) - 0.301(0)(0)]}_{\text{single men}}
  • What is the wage differential between married women and single men?

Alternatively, we can create a set of dummy variables corresponding to the 2 \times 2 classification based on the female and married dummies as follows:

\begin{align} marrmale &= married \times (1-female) \notag \\ marrfem &= married \times female \notag \\ singfem &= (1-married) \times female \notag \\ singmale &= (1-married) \times (1-female) \notag \end{align}

Code
library(dplyr)   # provides %>% and mutate()
wage1a <- wage1 %>% 
  mutate(marrmale = married*(1-female),
         marrfem = married*female,
         singmale = (1-married)*(1-female),
         singfem = (1-married)*female) 
  • We need to choose one of these as the base group, so that we include only 4 - 1 = 3 dummies in the model.
Code
reg8 <- lm(log(wage) ~ marrmale + marrfem + singfem + educ + exper + expersq + tenure + tenursq,  data = wage1a)
summary(reg8)

Call:
lm(formula = log(wage) ~ marrmale + marrfem + singfem + educ + 
    exper + expersq + tenure + tenursq, data = wage1a)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.89697 -0.24060 -0.02689  0.23144  1.09197 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.3213781  0.1000090   3.213 0.001393 ** 
marrmale     0.2126757  0.0553572   3.842 0.000137 ***
marrfem     -0.1982676  0.0578355  -3.428 0.000656 ***
singfem     -0.1103502  0.0557421  -1.980 0.048272 *  
educ         0.0789103  0.0066945  11.787  < 2e-16 ***
exper        0.0268006  0.0052428   5.112 4.50e-07 ***
expersq     -0.0005352  0.0001104  -4.847 1.66e-06 ***
tenure       0.0290875  0.0067620   4.302 2.03e-05 ***
tenursq     -0.0005331  0.0002312  -2.306 0.021531 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3933 on 517 degrees of freedom
Multiple R-squared:  0.4609,    Adjusted R-squared:  0.4525 
F-statistic: 55.25 on 8 and 517 DF,  p-value: < 2.2e-16
  • Coefficient on marrmale (0.213): married men are estimated to earn about 21% more than single men (the proportionate difference relative to the base group, single men), holding all other factors fixed.

  • A married woman is estimated to earn about 19.8% less than a single man with the same levels of the other variables.
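
Note that reg7 and reg8 are the same model in two different parameterizations: the coefficient on marrfem equals the sum of the female, married, and female:married coefficients from reg7. A quick check:

Code
# Reproduces the marrfem coefficient in reg8: about -0.1983
b7 <- coef(reg7)
unname(b7["female"] + b7["married"] + b7["female:married"])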

6 Allowing for different slopes

So far we have assumed that the slope coefficients on the quantitative variables are constant while the intercepts differ across groups. In some cases we want to allow for different slopes as well as different intercepts.

For example, suppose that we want to test whether the return to education is the same for men and women. To estimate different slopes it suffices to include an interaction term involving female and educ: female \times educ.

Consider the following equation:

log(wage) = (\beta_0 + \delta_0\,female) + (\beta_1 + \delta_1\, female)\times educ + u

  • Plugging in female = 0 we see that \beta_0 is the intercept for male workers

  • \beta_1 is the slope on education for males

  • Plugging in female = 1, \delta_0 is the difference between intercepts for female and male workers. Thus, the intercept term for females is \beta_0 + \delta_0

  • Slope on education for female: \beta_1 + \delta_1

  • \delta_1 measures the difference in the return to education between women and men.

  • If \delta_1 > 0, then we can say that the return to education for women is larger than the return to education for men.

Code
knitr::include_graphics("pic17.png")

  • Graph (a): the intercept for women is below that for men, and the slope of the line is smaller for women than for men.
    • This means that women earn less than men at all levels of education and the gap increases as educ gets larger.
  • Graph (b): the intercept for women is below that for men, but the slope on education is larger for women.
    • This means that women earn less than men at low levels of education, but the gap narrows as education increases.
    • At some point, a woman earns more than a man given the same level of education.

The above model can be reformulated as follows:

log(wage) = \beta_0 + \delta_0\,female + \beta_1\,educ + \delta_1\,(female \times educ) + u

  • Can test H_0: \delta_1 = 0 [The return to education is the same for men and women.] versus H_1: \delta_1 \neq 0.

  • Can also test simultaneously H_0: \delta_0 = 0, \delta_1 = 0 using an F test to see if “average wages are the same for men and women who have the same levels of education.”

Code
reg9 <- lm(log(wage) ~ female + educ + female*educ + exper + expersq + tenure + tenursq,
           data=wage1)
summary(reg9)

Call:
lm(formula = log(wage) ~ female + educ + female * educ + exper + 
    expersq + tenure + tenursq, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.83265 -0.25261 -0.02374  0.25396  1.13584 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.3888060  0.1186871   3.276  0.00112 ** 
female      -0.2267886  0.1675394  -1.354  0.17644    
educ         0.0823692  0.0084699   9.725  < 2e-16 ***
exper        0.0293366  0.0049842   5.886 7.11e-09 ***
expersq     -0.0005804  0.0001075  -5.398 1.03e-07 ***
tenure       0.0318967  0.0068640   4.647 4.28e-06 ***
tenursq     -0.0005900  0.0002352  -2.509  0.01242 *  
female:educ -0.0055645  0.0130618  -0.426  0.67028    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4001 on 518 degrees of freedom
Multiple R-squared:  0.441, Adjusted R-squared:  0.4334 
F-statistic: 58.37 on 7 and 518 DF,  p-value: < 2.2e-16
  • Estimated return to education for men is 8.2%.

  • For women, the return to education is 0.082 - 0.0056 = 0.0764, or about 7.6%.

    • The difference is neither economically large nor statistically significant (t = -0.426).
  • Coefficient on female measures the wage difference between men and women when educ = 0.

    • Note that there is no one with 0 years of education in the sample.

    • Also, due to the high collinearity between female and female \times educ, its standard error is large and its t-ratio is small.

  • Rather than omitting female, we can estimate a meaningful coefficient for it by redefining the interaction term.

  • Instead of interacting female with educ, we interact it with the deviation of education from its sample mean. The average education level in the sample is 12.5 years.

    • Our new interaction term is: female \times (educ - 12.5).
  • In this regression, the coefficient on female will measure the average wage difference between women and men at the mean education level, educ = 12.5.

Code
wage1 <- wage1 %>% 
  mutate(educdev=educ-12.5)

reg10 <- lm(log(wage) ~ female + educdev + female*educdev + exper + expersq + tenure + tenursq,
           data=wage1)
summary(reg10)

Call:
lm(formula = log(wage) ~ female + educdev + female * educdev + 
    exper + expersq + tenure + tenursq, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.83265 -0.25261 -0.02374  0.25396  1.13584 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     1.4184208  0.0440456  32.203  < 2e-16 ***
female         -0.2963450  0.0358358  -8.270 1.14e-15 ***
educdev         0.0823692  0.0084699   9.725  < 2e-16 ***
exper           0.0293366  0.0049842   5.886 7.11e-09 ***
expersq        -0.0005804  0.0001075  -5.398 1.03e-07 ***
tenure          0.0318967  0.0068640   4.647 4.28e-06 ***
tenursq        -0.0005900  0.0002352  -2.509   0.0124 *  
female:educdev -0.0055645  0.0130618  -0.426   0.6703    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4001 on 518 degrees of freedom
Multiple R-squared:  0.441, Adjusted R-squared:  0.4334 
F-statistic: 58.37 on 7 and 518 DF,  p-value: < 2.2e-16
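
As intended, the coefficient on female (-0.296) now directly measures the gender wage gap at the mean education level. Because the interaction term is tiny, it is essentially the same as the estimate from reg2; a quick sketch comparing the two:

Code
# Centering educ leaves the estimated gender gap nearly unchanged
c(reg2 = unname(coef(reg2)["female"]),
  reg10 = unname(coef(reg10)["female"]))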