Exploring the data

library(Zelig)
## Loading required package: survival
data(infert)

str(infert)
## 'data.frame':    248 obs. of  8 variables:
##  $ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ age           : num  26 42 39 34 35 36 23 32 21 28 ...
##  $ parity        : num  6 1 6 4 3 4 1 2 1 2 ...
##  $ induced       : num  1 1 2 2 1 2 0 0 0 0 ...
##  $ case          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ spontaneous   : num  2 0 0 0 1 1 0 0 1 0 ...
##  $ stratum       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pooled.stratum: num  3 1 4 2 32 36 6 22 5 19 ...

What influences infertility

library(ggplot2)
ggplot(infert, aes(x = parity)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The chart above shows how many times the women in this data set have been pregnant.
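
Because parity is a whole-number count, the `stat_bin()` message can be addressed by giving the histogram one bin per value. The sketch below is one way to do that; `binwidth = 1` is an assumption chosen for this variable, and the `parity_counts` table is added only to show the same counts as numbers.

```r
library(ggplot2)
data(infert)

# Count how many women fall at each parity value
parity_counts <- table(infert$parity)
print(parity_counts)

# One bar per parity value instead of the default 30 bins
p <- ggplot(infert, aes(x = parity)) +
  geom_histogram(binwidth = 1)
print(p)
```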

The relationship between spontaneous abortion and parity

m1 <- lm(spontaneous ~ parity, data = infert)
summary(m1)
## 
## Call:
## lm(formula = spontaneous ~ parity, data = infert)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2911 -0.5597 -0.3768  0.6232  1.4403 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.19396    0.08640   2.245   0.0257 *  
## parity       0.18285    0.03545   5.158 5.15e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6973 on 246 degrees of freedom
## Multiple R-squared:  0.09758,    Adjusted R-squared:  0.09392 
## F-statistic:  26.6 on 1 and 246 DF,  p-value: 5.145e-07

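On average, each additional unit of parity is associated with about 0.18 more spontaneous abortions, and the relationship is statistically significant (p < 0.001). The intercept, 0.19, is the predicted number of spontaneous abortions at a parity of zero. Parity alone explains only about 10% of the variation in spontaneous abortions (R-squared = 0.098).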
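
One way to check the reading of the slope is to compare predictions for two women whose parity differs by one. This is a minimal sketch; `new_women` and `slope_from_pred` are names made up for the illustration.

```r
data(infert)
m1 <- lm(spontaneous ~ parity, data = infert)

# Two hypothetical women who differ by one unit of parity
new_women <- data.frame(parity = c(1, 2))
pred <- predict(m1, newdata = new_women)

# The gap between the two predictions is the parity coefficient
slope_from_pred <- unname(pred[2] - pred[1])
print(slope_from_pred)

# 95% confidence intervals for the intercept and slope
print(confint(m1))
```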

The relationship between spontaneous abortion and education

library(ggplot2)
ggplot(infert, aes(x = education)) + geom_bar()

m2 <- lm(spontaneous ~ education, data = infert)
summary(m2)
## 
## Call:
## lm(formula = spontaneous ~ education, data = infert)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6293 -0.5417 -0.5417  0.4583  1.5833 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        0.4167     0.2117   1.968   0.0502 .
## education6-11yrs   0.1250     0.2220   0.563   0.5740  
## education12+ yrs   0.2126     0.2224   0.956   0.3399  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7334 on 245 degrees of freedom
## Multiple R-squared:  0.005852,   Adjusted R-squared:  -0.002263 
## F-statistic: 0.7211 on 2 and 245 DF,  p-value: 0.4872

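Women with 6 to 11 years of education have, on average, 0.13 more spontaneous abortions than women with 0 to 5 years of education (the reference group), and women with 12 or more years have 0.21 more. Neither difference is statistically significant (p = 0.57 and p = 0.34), and education alone explains less than 1% of the variation in spontaneous abortions.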
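
With a single categorical predictor, the intercept is the mean of the reference group and each education coefficient is a difference from that mean. The sketch below checks this against the raw group means; `group_means` and `from_model` are illustrative names.

```r
data(infert)
m2 <- lm(spontaneous ~ education, data = infert)

# The first factor level is the reference group absorbed into the intercept
print(levels(infert$education)[1])

# Mean number of spontaneous abortions in each education group
group_means <- tapply(infert$spontaneous, infert$education, mean)

# Rebuild the same means from the model coefficients
b <- unname(coef(m2))
from_model <- c(b[1], b[1] + b[2], b[1] + b[3])

print(rbind(group_means = as.numeric(group_means), from_model = from_model))
```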

A slightly more complicated model

m3 <- lm(spontaneous ~ education + parity, data = infert)
summary(m3)
## 
## Call:
## lm(formula = spontaneous ~ education + parity, data = infert)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1316 -0.5182 -0.2832  0.5732  1.6421 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.5818     0.2546  -2.285 0.023155 *  
## education6-11yrs   0.6301     0.2223   2.835 0.004968 ** 
## education12+ yrs   0.7737     0.2260   3.423 0.000727 ***
## parity             0.2349     0.0379   6.199  2.4e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.683 on 244 degrees of freedom
## Multiple R-squared:  0.1411, Adjusted R-squared:  0.1306 
## F-statistic: 13.36 on 3 and 244 DF,  p-value: 4.181e-08

The intercept, -0.58, is the predicted number of spontaneous abortions for a woman with 0 to 5 years of education and a parity of zero; it is not an effect of education. Controlling for parity, women with 6 to 11 years of education have, on average, 0.63 more spontaneous abortions than women with 0 to 5 years of education, and women with 12 or more years have 0.77 more. Controlling for education, each additional unit of parity is associated with 0.23 more spontaneous abortions. All three coefficients are statistically significant.
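
"Controlling for parity" can be made concrete by predicting for women who share the same parity and differ only in education. This is a sketch under an assumed parity of 2, picked only for illustration; `grid` is a hypothetical name.

```r
data(infert)
m3 <- lm(spontaneous ~ education + parity, data = infert)

# One hypothetical woman per education level, all with the same parity
edu_levels <- levels(infert$education)
grid <- data.frame(
  education = factor(edu_levels, levels = edu_levels),
  parity = 2
)
grid$predicted <- predict(m3, newdata = grid)
print(grid)

# Differences from the reference group match the education coefficients
print(grid$predicted[-1] - grid$predicted[1])
```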

Interaction between parity and education

m4 <- lm(spontaneous ~ education*parity, data = infert)
summary(m4)
## 
## Call:
## lm(formula = spontaneous ~ education * parity, data = infert)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1319 -0.5106 -0.1999  0.5651  1.5970 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)              0.18408    0.45149   0.408   0.6838  
## education6-11yrs        -0.29483    0.47172  -0.625   0.5326  
## education12+ yrs         0.02538    0.46840   0.054   0.9568  
## parity                   0.05473    0.09572   0.572   0.5680  
## education6-11yrs:parity  0.25595    0.11192   2.287   0.0231 *
## education12+ yrs:parity  0.17075    0.11181   1.527   0.1280  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6785 on 242 degrees of freedom
## Multiple R-squared:  0.1595, Adjusted R-squared:  0.1421 
## F-statistic: 9.182 on 5 and 242 DF,  p-value: 5.138e-08

The only statistically significant coefficient is the interaction between 6 to 11 years of education and parity: its estimate is 0.26 and its p value is 0.02. A coefficient is conventionally treated as statistically significant when its p value is less than or equal to 0.05, so the p values are what to check, not the t values alone. The interaction means that the slope on parity is 0.26 higher for women with 6 to 11 years of education than for women with 0 to 5 years of education, whose slope is 0.05.
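
In an interaction model, the parity slope for each education group is the `parity` coefficient plus that group's interaction term. From the output above, that gives roughly 0.05 for 0-5yrs, 0.31 for 6-11yrs, and 0.23 for 12+ yrs. The sketch below computes these slopes; the `slopes` vector is an illustrative name.

```r
data(infert)
m4 <- lm(spontaneous ~ education * parity, data = infert)
b <- coef(m4)

# Parity slope within each education group
slopes <- c(
  "0-5yrs"  = unname(b["parity"]),
  "6-11yrs" = unname(b["parity"] + b["education6-11yrs:parity"]),
  "12+ yrs" = unname(b["parity"] + b["education12+ yrs:parity"])
)
print(round(slopes, 2))

# The same slope comes out of a regression fit to one group alone
m_mid <- lm(spontaneous ~ parity,
            data = subset(infert, education == "6-11yrs"))
print(coef(m_mid)["parity"])
```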

Tables

library(texreg)
## Version:  1.36.23
## Date:     2017-03-03
## Author:   Philip Leifeld (University of Glasgow)
## 
## Please cite the JSS article in your publications -- see citation("texreg").
screenreg(list(m1, m2, m3, m4))
## 
## ==================================================================
##                          Model 1     Model 2  Model 3     Model 4 
## ------------------------------------------------------------------
## (Intercept)                0.19 *      0.42    -0.58 *      0.18  
##                           (0.09)      (0.21)   (0.25)      (0.45) 
## parity                     0.18 ***             0.23 ***    0.05  
##                           (0.04)               (0.04)      (0.10) 
## education6-11yrs                       0.13     0.63 **    -0.29  
##                                       (0.22)   (0.22)      (0.47) 
## education12+ yrs                       0.21     0.77 ***    0.03  
##                                       (0.22)   (0.23)      (0.47) 
## education6-11yrs:parity                                     0.26 *
##                                                            (0.11) 
## education12+ yrs:parity                                     0.17  
##                                                            (0.11) 
## ------------------------------------------------------------------
## R^2                        0.10        0.01     0.14        0.16  
## Adj. R^2                   0.09       -0.00     0.13        0.14  
## Num. obs.                248         248      248         248     
## RMSE                       0.70        0.73     0.68        0.68  
## ==================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
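
The table shows R^2 rising from 0.14 in Model 3 to 0.16 in Model 4. One way to ask whether the two interaction terms improve the fit as a set is a nested F test with `anova()`. This is a sketch of an additional check, not part of the analysis above.

```r
data(infert)
m3 <- lm(spontaneous ~ education + parity, data = infert)
m4 <- lm(spontaneous ~ education * parity, data = infert)

# F test of the two interaction terms: m3 is nested inside m4
comparison <- anova(m3, m4)
print(comparison)
```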