Option 1 - Binary regression (binomial GLM, probit link)

Dependent: sex, male/female (dummy variable SxDum)

Predictors: latitude (continuous) + age + age:latitude + date

Sub-option 1.1: raw date included in the model (JULIAN DAY):

## glm(formula = SxDum ~ NewLat + Age + Age:NewLat + Jul, family = binomial(link = "probit"), 
##     data = ACROLA_comb)
##                Estimate Std. Error    z value   Pr(>|z|)
## (Intercept) -1.90858637 2.63566039 -0.7241397 0.46897996
## NewLat      -0.05067702 0.04125648 -1.2283410 0.21931900
## AgeI        -0.92134141 2.16038742 -0.4264705 0.66976507
## Jul          0.01607116 0.00702242  2.2885498 0.02210552
## NewLat:AgeI  0.02074509 0.04837955  0.4287987 0.66806971
## Analysis of Deviance Table
## 
## Model: binomial, link: probit
## 
## Response: SxDum
## 
## Terms added sequentially (first to last)
## 
## 
##            Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
## NULL                         465     584.06           
## NewLat      1   5.7345       464     578.33  0.01663 *
## Age         1   1.0150       463     577.31  0.31372  
## Jul         1   5.1987       462     572.11  0.02260 *
## NewLat:Age  1   0.1851       461     571.93  0.66707  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
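
The fit above can be reproduced along the following lines. This is a minimal sketch on simulated data: the data frame `sim`, its values, and the age labels "A" and "I" are assumptions made for illustration. Only the column names (SxDum, NewLat, Age, Jul) and the model formula are taken from the call shown above.

```r
# Simulated stand-in for ACROLA_comb (hypothetical values, for illustration only)
set.seed(1)
n <- 466
sim <- data.frame(
  NewLat = runif(n, 38, 54),                                # latitude, continuous
  Age    = factor(sample(c("A", "I"), n, replace = TRUE)),  # two age classes (labels assumed)
  Jul    = round(runif(n, 200, 260))                        # Julian day
)
# Generate a binary sex dummy from an assumed probit relationship
sim$SxDum <- rbinom(n, 1, pnorm(-2.5 - 0.04 * sim$NewLat + 0.016 * sim$Jul))

# Binomial GLM with a probit link, same formula as in the output above
fit_full <- glm(SxDum ~ NewLat + Age + Age:NewLat + Jul,
                family = binomial(link = "probit"), data = sim)

coef_table <- summary(fit_full)$coefficients   # Wald z test for each coefficient
dev_table  <- anova(fit_full, test = "Chisq")  # sequential deviance tests (terms added first to last)
print(coef_table)
print(dev_table)
```

The two tables answer different questions: the coefficient table tests each term given all the others, while the deviance table tests each term given only the terms listed before it. This is why NewLat can be significant in one and not in the other.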

Assessing the model fit:

##           index.orig training    test optimism index.corrected    n
## Dxy           0.1917   0.2108  0.1693   0.0415          0.1502 1000
## R2            0.0360   0.0468  0.0282   0.0186          0.0174 1000
## Intercept     0.0000   0.0000 -0.1325   0.1325         -0.1325 1000
## Slope         1.0000   1.0000  0.8196   0.1804          0.8196 1000
## Emax          0.0000   0.0000  0.0682   0.0682          0.0682 1000
## D             0.0239   0.0319  0.0182   0.0137          0.0102 1000
## U            -0.0043  -0.0043  0.0012  -0.0055          0.0012 1000
## Q             0.0282   0.0362  0.0170   0.0192          0.0090 1000
## B             0.2118   0.2098  0.2141  -0.0044          0.2162 1000
## g             0.3989   0.4451  0.3460   0.0991          0.2998 1000
## gp            0.0853   0.0938  0.0741   0.0196          0.0657 1000

## 
## n=466   Mean absolute error=0.012   Mean squared error=0.00019
## 0.9 Quantile of absolute error=0.019
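
The columns of the validation table follow the optimism bootstrap: optimism is training minus test, and index.corrected is index.orig minus optimism (for Dxy above, 0.2108 - 0.1693 = 0.0415 and 0.1917 - 0.0415 = 0.1502). The sketch below re-implements that logic in base R for Dxy only. It uses simulated data with hypothetical values and is not the routine that produced the table above.

```r
# Somers' Dxy for a binary outcome: 2 * (AUC - 0.5), with AUC from the rank-sum formula
dxy <- function(pred, y) {
  r  <- rank(pred)                       # mid-ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (auc - 0.5)
}

# Simulated data (hypothetical values; column names borrowed from the model above)
set.seed(42)
n <- 466
sim <- data.frame(NewLat = runif(n, 38, 54), Jul = round(runif(n, 200, 260)))
sim$SxDum <- rbinom(n, 1, pnorm(-2.5 - 0.04 * sim$NewLat + 0.016 * sim$Jul))

form <- SxDum ~ NewLat + Jul
fit  <- glm(form, family = binomial(link = "probit"), data = sim)
index_orig <- dxy(predict(fit), sim$SxDum)     # apparent performance on the original data

B <- 200                                       # bootstrap repetitions, kept small for speed
training <- test <- numeric(B)
for (b in seq_len(B)) {
  boot  <- sim[sample(n, replace = TRUE), ]
  fit_b <- glm(form, family = binomial(link = "probit"), data = boot)
  training[b] <- dxy(predict(fit_b), boot$SxDum)                # refit scored on its own resample
  test[b]     <- dxy(predict(fit_b, newdata = sim), sim$SxDum)  # refit scored on the original data
}
optimism        <- mean(training) - mean(test)
index_corrected <- index_orig - optimism
print(round(c(index.orig = index_orig, training = mean(training), test = mean(test),
              optimism = optimism, index.corrected = index_corrected), 4))
```

A large optimism relative to index.orig indicates that the apparent fit is partly an artefact of fitting and evaluating on the same data.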

The least significant terms, Age and Age:NewLat (P > 0.6), are excluded from the model:

## glm(formula = SxDum ~ NewLat + Jul, family = binomial(link = "probit"), 
##     data = ACROLA_comb)
##                Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept) -2.54825944 1.999494107 -1.274452 0.20250324
## NewLat      -0.03597969 0.022183095 -1.621942 0.10481580
## Jul          0.01604409 0.006432541  2.494207 0.01262389
## Analysis of Deviance Table
## 
## Model: binomial, link: probit
## 
## Response: SxDum
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
## NULL                     465     584.06           
## NewLat  1   5.7345       464     578.33  0.01663 *
## Jul     1   6.2132       463     572.11  0.01268 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
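
Dropping Age and Age:NewLat together can also be checked with a likelihood-ratio test on the residual deviances printed above (571.93 on 461 df for the full model, 572.11 on 463 df for the reduced one). The following is a small worked check using only those printed numbers; it is not part of the original analysis.

```r
# Residual deviances and df copied from the two deviance tables above
dev_full    <- 571.93; df_full    <- 461   # SxDum ~ NewLat + Age + Age:NewLat + Jul
dev_reduced <- 572.11; df_reduced <- 463   # SxDum ~ NewLat + Jul

lr_stat <- dev_reduced - dev_full          # change in deviance from dropping the two terms
lr_df   <- df_reduced - df_full            # number of parameters dropped
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```

The deviance changes by 0.18 on 2 df, giving a p-value of about 0.91, which is consistent with excluding both terms. With the fitted model objects at hand, the same comparison could be obtained from `anova(reduced, full, test = "Chisq")`, where `reduced` and `full` are placeholder names for the two fits.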

Assessing the model fit:

##           index.orig training    test optimism index.corrected    n
## Dxy           0.1942   0.2038  0.1857   0.0182          0.1760 1000
## R2            0.0355   0.0415  0.0324   0.0092          0.0263 1000
## Intercept     0.0000   0.0000 -0.0161   0.0161         -0.0161 1000
## Slope         1.0000   1.0000  0.9807   0.0193          0.9807 1000
## Emax          0.0000   0.0000  0.0071   0.0071          0.0071 1000
## D             0.0235   0.0281  0.0213   0.0068          0.0167 1000
## U            -0.0043  -0.0043  0.0002  -0.0045          0.0002 1000
## Q             0.0278   0.0324  0.0211   0.0113          0.0165 1000
## B             0.2119   0.2108  0.2133  -0.0025          0.2144 1000
## g             0.3947   0.4154  0.3752   0.0402          0.3546 1000
## gp            0.0848   0.0886  0.0807   0.0079          0.0769 1000

## 
## n=466   Mean absolute error=0.005   Mean squared error=6e-05
## 0.9 Quantile of absolute error=0.013
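
Because the link is probit, the fitted probability that SxDum = 1 is the standard normal CDF of the linear predictor. The sketch below uses the coefficients printed for the reduced model SxDum ~ NewLat + Jul. The latitude and Julian-day values plugged in are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the reduced model output above
b <- c(intercept = -2.54825944, NewLat = -0.03597969, Jul = 0.01604409)

# Probit inverse link: P(SxDum = 1) = pnorm(linear predictor)
p_sxdum <- function(lat, jul) {
  unname(pnorm(b["intercept"] + b["NewLat"] * lat + b["Jul"] * jul))
}

# Hypothetical inputs: one latitude, two dates 20 days apart
p_early <- p_sxdum(lat = 45, jul = 220)
p_late  <- p_sxdum(lat = 45, jul = 240)
print(c(early = p_early, late = p_late))
```

The positive Jul coefficient means that, at a fixed latitude, the probability of SxDum = 1 increases with date. Which sex is coded as 1 is not shown in the output.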

Sub-option 1.2: latitude-adjusted date included in the model (DAY OF MIGRATION):

## glm(formula = SxDum ~ NewLat + Age + Age:NewLat + Dmigr, family = binomial(link = "probit"), 
##     data = ACROLA_comb)
##                  Estimate  Std. Error     z value  Pr(>|z|)
## (Intercept) -1.776157e+01 88.31797478 -0.20110930 0.8406131
## NewLat       4.135861e-01  2.09674843  0.19725118 0.8436310
## AgeI         2.904688e+01 91.15290801  0.31866106 0.7499835
## Dmigr       -3.988923e-04  0.01551108 -0.02571661 0.9794834
## NewLat:AgeI -6.870071e-01  2.16530153 -0.31728010 0.7510311
## Analysis of Deviance Table
## 
## Model: binomial, link: probit
## 
## Response: SxDum
## 
## Terms added sequentially (first to last)
## 
## 
##            Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL                         103     138.59         
## NewLat      1  0.87327       102     137.71   0.3501
## Age         1  0.21721       101     137.50   0.6412
## Dmigr       1  0.00415       100     137.49   0.9487
## NewLat:Age  1  0.10101        99     137.39   0.7506

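Assessing the model fit (the n = 466 and index.orig values below match sub-option 1.1, whereas the deviance table above has 103 null residual df, so this validation may refer to a different data subset than the model above):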

##           index.orig training    test optimism index.corrected    n
## Dxy           0.1917   0.2135  0.1709   0.0426          0.1490 1000
## R2            0.0360   0.0478  0.0286   0.0192          0.0168 1000
## Intercept     0.0000   0.0000 -0.1330   0.1330         -0.1330 1000
## Slope         1.0000   1.0000  0.8190   0.1810          0.8190 1000
## Emax          0.0000   0.0000  0.0685   0.0685          0.0685 1000
## D             0.0239   0.0327  0.0185   0.0142          0.0097 1000
## U            -0.0043  -0.0043  0.0015  -0.0058          0.0015 1000
## Q             0.0282   0.0370  0.0171   0.0200          0.0082 1000
## B             0.2118   0.2095  0.2141  -0.0046          0.2164 1000
## g             0.3989   0.4502  0.3487   0.1014          0.2975 1000
## gp            0.0853   0.0948  0.0747   0.0200          0.0653 1000

## 
## n=466   Mean absolute error=0.013   Mean squared error=2e-04
## 0.9 Quantile of absolute error=0.019

The non-significant terms Age and Age:NewLat (P > 0.6) are excluded from the model:

## glm(formula = SxDum ~ NewLat + Dmigr, family = binomial(link = "probit"), 
##     data = ACROLA_comb)
##                 Estimate  Std. Error    z value  Pr(>|z|)
## (Intercept)  9.932094214 18.06000134  0.5499498 0.5823538
## NewLat      -0.243902229  0.42876487 -0.5688484 0.5694590
## Dmigr        0.002214197  0.01358626  0.1629733 0.8705394
## Analysis of Deviance Table
## 
## Model: binomial, link: probit
## 
## Response: SxDum
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL                     103     138.59         
## NewLat  1  0.87327       102     137.71   0.3501
## Dmigr   1  0.02606       101     137.69   0.8718

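Assessing the model fit (again n = 466 and index.orig match the reduced model of sub-option 1.1, whereas the deviance table above has 103 null residual df):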

##           index.orig training    test optimism index.corrected    n
## Dxy           0.1942   0.2037  0.1858   0.0179          0.1762 1000
## R2            0.0355   0.0415  0.0324   0.0090          0.0264 1000
## Intercept     0.0000   0.0000  0.0109  -0.0109          0.0109 1000
## Slope         1.0000   1.0000  1.0033  -0.0033          1.0033 1000
## Emax          0.0000   0.0000  0.0029   0.0029          0.0029 1000
## D             0.0235   0.0280  0.0213   0.0067          0.0168 1000
## U            -0.0043  -0.0043 -0.0001  -0.0042         -0.0001 1000
## Q             0.0278   0.0323  0.0214   0.0109          0.0169 1000
## B             0.2119   0.2102  0.2132  -0.0031          0.2150 1000
## g             0.3947   0.4158  0.3755   0.0403          0.3544 1000
## gp            0.0848   0.0885  0.0808   0.0077          0.0770 1000

## 
## n=466   Mean absolute error=0.005   Mean squared error=5e-05
## 0.9 Quantile of absolute error=0.011

Option 2 - General linear model (lm)

Dependent: male/total for the site (proportion)

Predictors: latitude (continuous) + age + age:latitude

Testing the model assumptions:

## 
## Call:
## lm(formula = Mprop ~ NewLat + Age + NewLat:Age, data = ACROLA_prop_total)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17250 -0.06567 -0.02505  0.04498  0.28136 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.175533   0.602139  -0.292    0.774
## NewLat       0.019656   0.013237   1.485    0.154
## AgeI        -0.121797   0.790632  -0.154    0.879
## NewLat:AgeI  0.001468   0.017311   0.085    0.933
## 
## Residual standard error: 0.1138 on 19 degrees of freedom
## Multiple R-squared:  0.2611, Adjusted R-squared:  0.1445 
## F-statistic: 2.238 on 3 and 19 DF,  p-value: 0.1168

Deleting the outlier [both NewLat (AI) records associated with 19, i.e. rows 10 and 11 of ACROLA_prop_total] and re-testing the model fit:

## 
## Call:
## lm(formula = Mprop ~ NewLat + Age + NewLat:Age, data = ACROLA_prop_total[-c(10, 
##     11), ])
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156678 -0.053237 -0.009386  0.044282  0.164771 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.190791   0.412044  -0.463    0.649  
## NewLat       0.019303   0.009058   2.131    0.048 *
## AgeI        -0.162599   0.541339  -0.300    0.768  
## NewLat:AgeI  0.002634   0.011849   0.222    0.827  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07788 on 17 degrees of freedom
## Multiple R-squared:  0.4462, Adjusted R-squared:  0.3485 
## F-statistic: 4.566 on 3 and 17 DF,  p-value: 0.01602
## Analysis of Variance Table
## 
## Response: Mprop
##            Df   Sum Sq  Mean Sq F value   Pr(>F)   
## NewLat      1 0.073543 0.073543 12.1261 0.002852 **
## Age         1 0.009239 0.009239  1.5234 0.233902   
## NewLat:Age  1 0.000300 0.000300  0.0494 0.826751   
## Residuals  17 0.103102 0.006065                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
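
The outlier screening and refit can be sketched as follows. The data are simulated and the age labels are assumed; only the column names and the formula come from the call above. Cook's distance is used here as one common way to flag influential rows, since the source does not state which diagnostic was actually used.

```r
# Simulated stand-in for ACROLA_prop_total (hypothetical values, for illustration only)
set.seed(7)
sim <- data.frame(
  NewLat = rep(seq(40, 51), each = 2),             # 12 locations
  Age    = factor(rep(c("A", "I"), times = 12))    # two age-class records per location
)
sim$Mprop <- 0.02 * sim$NewLat - 0.25 + rnorm(nrow(sim), sd = 0.05)

# Plant two aberrant records at one location (rows 9 and 10 of the simulated data)
drop_rows <- c(9, 10)
sim$Mprop[drop_rows] <- sim$Mprop[drop_rows] + 0.3

fit_all <- lm(Mprop ~ NewLat + Age + NewLat:Age, data = sim)

# Flag influential rows with Cook's distance (4/n is a common rule of thumb)
cd      <- cooks.distance(fit_all)
flagged <- which(cd > 4 / nrow(sim))
print(flagged)

# Refit without the rows judged to be outliers (the source dropped rows 10 and 11 of its own data)
fit_trim <- lm(Mprop ~ NewLat + Age + NewLat:Age, data = sim[-drop_rows, ])
print(summary(fit_trim)$coefficients)
print(anova(fit_trim))
```

Removing both records of a location keeps the design balanced across age classes, at the cost of two residual degrees of freedom.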

The non-significant terms Age and Age:NewLat (P > 0.2) are excluded from the model (same data set as for the model with all predictors included, i.e. double representation of each location!):

## [1] "Testing the model assumptions"

## 
## Call:
## lm(formula = Mprop ~ NewLat, data = ACROLA_prop_total[-c(10, 
##     11), ])
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.168737 -0.050373  0.007571  0.042149  0.150044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.258864   0.263363  -0.983  0.33800   
## NewLat       0.020271   0.005755   3.522  0.00228 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.077 on 19 degrees of freedom
## Multiple R-squared:  0.395,  Adjusted R-squared:  0.3632 
## F-statistic: 12.41 on 1 and 19 DF,  p-value: 0.002278

The non-significant terms Age and Age:NewLat (P > 0.2) are excluded from the model (reformatted data set, i.e. single representation of each location!):

## 
## Call:
## lm(formula = Mprop ~ NewLat, data = ACROLA_Mprop_total)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.12318 -0.05643 -0.01042  0.02887  0.23383 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.059200   0.444054  -0.133    0.896
## NewLat       0.016311   0.009669   1.687    0.120
## 
## Residual standard error: 0.09864 on 11 degrees of freedom
## Multiple R-squared:  0.2055, Adjusted R-squared:  0.1333 
## F-statistic: 2.846 on 1 and 11 DF,  p-value: 0.1197
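
Going from two records per location (one per age class) to a single record requires pooling the counts before recomputing the proportion. The sketch below shows one way to do this under assumed column names: `males` and `total` are hypothetical count columns with made-up values, since the source only shows the derived Mprop and does not show how ACROLA_Mprop_total was built.

```r
# Hypothetical per-age-class counts for three locations (values made up)
by_age <- data.frame(
  NewLat = c(42, 42, 47, 47, 52, 52),
  Age    = c("A", "I", "A", "I", "A", "I"),
  males  = c(6, 10, 9, 15, 14, 20),
  total  = c(12, 18, 15, 24, 20, 28)
)

# Pool counts across age classes, then recompute the male proportion per location
by_site <- aggregate(cbind(males, total) ~ NewLat, data = by_age, FUN = sum)
by_site$Mprop <- by_site$males / by_site$total
print(by_site)

# One row per location, so no location enters the fit twice
fit_site <- lm(Mprop ~ NewLat, data = by_site)
print(coef(fit_site))
```

Pooling counts first weights each age class by its sample size. Averaging the two per-age proportions instead would weight them equally, which gives a different Mprop whenever the class sizes differ.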

Deleting the outliers [NewLat: 12 and 22] and re-testing the model fit:

## 
## Call:
## lm(formula = Mprop ~ NewLat, data = ACROLA_Mprop_total[-c(6, 
##     11), ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08223 -0.02779 -0.01716  0.03837  0.08054 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 0.011741   0.244527   0.048   0.9628  
## NewLat      0.014051   0.005341   2.631   0.0273 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05326 on 9 degrees of freedom
## Multiple R-squared:  0.4347, Adjusted R-squared:  0.3719 
## F-statistic: 6.921 on 1 and 9 DF,  p-value: 0.02733