Polynomial Regression

1 Biodiversity 2020 Indicators from England’s wildlife and ecosystem services

##    Year Hymenoptera     Moths
## 1  1970          NA 100.00000
## 2  1971          NA  97.40308
## 3  1972          NA  93.46015
## 4  1973          NA  90.20746
## 5  1974          NA  88.21933
## 6  1975          NA  89.98069
## 7  1976          NA  91.79424
## 8  1977          NA  87.66070
## 9  1978          NA  86.17904
## 10 1979          NA  85.04052
## 11 1980   100.00000  83.04061
## 12 1981    96.06412  81.25337
## 13 1982    94.50884  83.94502
## 14 1983    97.25261  85.60653
## 15 1984    99.61304  85.25475
## 16 1985   100.68708  81.65969
## 17 1986    98.32698  79.08266
## 18 1987    96.20649  77.54394
## 19 1988    94.51686  75.99746
## 20 1989    93.38365  76.72467
## 21 1990    95.23808  77.15220
## 22 1991    95.65064  75.93391
## 23 1992    98.79097  74.25514
## 24 1993   100.70310  71.61099
## 25 1994   101.82695  70.87709
## 26 1995   102.83995  73.25683
## 27 1996   105.26216  72.55038
## 28 1997   104.48161  72.08166
## 29 1998    98.12524  66.48195
## 30 1999    95.70980  64.81656
## 31 2000    95.08747  63.32317
## 32 2001    94.44489  64.34150
## 33 2002    90.79075  63.31262
## 34 2003    88.91001  64.03736
## 35 2004    87.97619  62.93375
##  [ reached 'max' / getOption("max.print") -- omitted 12 rows ]

1.1 Run a linear regression: Hymenoptera ~ Year

## 
## Call:
## lm(formula = Hymenoptera ~ Year, data = mydataindex)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.113 -3.839 -1.201  3.006 11.060 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1152.05540  145.93978   7.894 2.79e-09 ***
## Year          -0.52999    0.07304  -7.256 1.79e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.744 on 35 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.6007, Adjusted R-squared:  0.5893 
## F-statistic: 52.65 on 1 and 35 DF,  p-value: 1.791e-08
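A fit of this kind might be set up as in the minimal sketch below. It assumes only the 35 rows printed in the listing are available (the 12 omitted rows are not shown), so its estimates will differ from the summary above. The data frame name `mydataindex` is taken from the Call line. `lm()` drops rows with missing values by default, which is what the "observations deleted due to missingness" note reports.

```r
# Only the 35 rows printed in the listing are used here (1970-2004);
# the 12 omitted rows are not available, so the estimates will not
# match the summary shown in the report.
hymenoptera <- c(
  rep(NA, 10),                                          # 1970-1979: no index
  100.00000, 96.06412, 94.50884, 97.25261, 99.61304,    # 1980-1984
  100.68708, 98.32698, 96.20649, 94.51686, 93.38365,    # 1985-1989
  95.23808, 95.65064, 98.79097, 100.70310, 101.82695,   # 1990-1994
  102.83995, 105.26216, 104.48161, 98.12524, 95.70980,  # 1995-1999
  95.08747, 94.44489, 90.79075, 88.91001, 87.97619      # 2000-2004
)
mydataindex <- data.frame(Year = 1970:2004, Hymenoptera = hymenoptera)

# lm() uses na.action = na.omit by default, so the ten NA rows are dropped.
fit_lin <- lm(Hymenoptera ~ Year, data = mydataindex)

summary(fit_lin)
nobs(fit_lin)               # rows actually used in the fit
length(na.action(fit_lin))  # rows dropped because Hymenoptera is NA
```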

1.2 Run a polynomial regression: Hymenoptera ~ Year + I(Year^2)

## 
## Call:
## lm(formula = Hymenoptera ~ Year + I(Year^2), data = mydataindex)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6122 -2.5352 -0.1891  1.7453  7.4466 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.300e+05  2.135e+04  -6.088 6.62e-07 ***
## Year         1.307e+02  2.137e+01   6.117 6.07e-07 ***
## I(Year^2)   -3.285e-02  5.349e-03  -6.142 5.64e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.314 on 34 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.8107, Adjusted R-squared:  0.7996 
## F-statistic: 72.81 on 2 and 34 DF,  p-value: 5.143e-13
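The `I(Year^2)` term in the Call line matters: inside an R formula, `^` is a formula operator, not arithmetic. The sketch below illustrates the difference on the 25 non-missing values printed in the listing (1980-2004). This subset is an assumption of the sketch, so the numbers will not match the 37-year summary above.

```r
# Illustrative data: the 25 non-missing Hymenoptera values printed in the
# listing (1980-2004). The report's fit used 37 years.
d <- data.frame(
  Year = 1980:2004,
  Hymenoptera = c(
    100.00000, 96.06412, 94.50884, 97.25261, 99.61304,
    100.68708, 98.32698, 96.20649, 94.51686, 93.38365,
    95.23808, 95.65064, 98.79097, 100.70310, 101.82695,
    102.83995, 105.26216, 104.48161, 98.12524, 95.70980,
    95.08747, 94.44489, 90.79075, 88.91001, 87.97619
  )
)

# I() protects Year^2 so that ^ means arithmetic squaring.
fit_quad <- lm(Hymenoptera ~ Year + I(Year^2), data = d)

# Without I(), Year^2 is expanded by the formula parser and reduces to Year,
# so this silently fits a straight line.
fit_wrong <- lm(Hymenoptera ~ Year + Year^2, data = d)

# poly(..., raw = TRUE) builds the same Year and Year^2 columns.
fit_poly <- lm(Hymenoptera ~ poly(Year, 2, raw = TRUE), data = d)

length(coef(fit_quad))   # 3: intercept, Year, Year^2
length(coef(fit_wrong))  # 2: intercept, Year
all.equal(unname(fitted(fit_quad)), unname(fitted(fit_poly)))
```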

1.4 Conclusion

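Neither the linear regression nor the second-order polynomial regression models the relationship between Hymenoptera and Year well. The quadratic term improves the fit (adjusted R-squared rises from 0.5893 to 0.7996, and the residual standard error falls from 4.744 to 3.314), but about a fifth of the variance remains unexplained and residuals still reach 7.4 index points.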
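One way to probe a conclusion like this is to compare the two nested models directly and to check whether their residuals still carry year-to-year structure. The sketch below is not part of the original analysis. It uses only the 25 non-missing values printed in the listing, and the helper `lag1()` is a hypothetical name introduced here.

```r
# Illustrative data: the 25 non-missing Hymenoptera values printed in the
# listing (1980-2004), not the full 37-year series used in the report.
d <- data.frame(
  Year = 1980:2004,
  Hymenoptera = c(
    100.00000, 96.06412, 94.50884, 97.25261, 99.61304,
    100.68708, 98.32698, 96.20649, 94.51686, 93.38365,
    95.23808, 95.65064, 98.79097, 100.70310, 101.82695,
    102.83995, 105.26216, 104.48161, 98.12524, 95.70980,
    95.08747, 94.44489, 90.79075, 88.91001, 87.97619
  )
)

fit_lin  <- lm(Hymenoptera ~ Year, data = d)
fit_quad <- lm(Hymenoptera ~ Year + I(Year^2), data = d)

# F-test for nested models: does the squared term reduce the residual
# sum of squares by more than chance would suggest?
anova(fit_lin, fit_quad)

# Lag-1 autocorrelation of the residuals (hypothetical helper). Values far
# from 0 suggest the model leaves systematic structure unexplained.
lag1 <- function(fit) acf(resid(fit), lag.max = 1, plot = FALSE)$acf[2]
c(linear = lag1(fit_lin), quadratic = lag1(fit_quad))
```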

2 Sanitation and Drinking Water Conditions

  • Dataset downloaded from Environment for Development (http://geodata.grid.unep.ch/)
  • Variable definitions (share of the total population, 114 countries, 1990-2012):
    • Sanitation: mean improved sanitation conditions, in % of the total population
    • Water: mean improved drinking water conditions, in % of the total population
  • Data from United Nations Environment

##    Year Sanitation Water
## 1  1990      44.81 73.06
## 2  1991      46.22 74.49
## 3  1992      47.39 75.32
## 4  1993      48.42 76.12
## 5  1994      50.24 78.16
## 6  1995      51.13 78.89
## 7  1996      52.65 79.62
## 8  1997      53.58 80.39
## 9  1998      54.47 81.10
## 10 1999      55.59 81.81
## 11 2000      56.38 82.52
## 12 2001      57.25 83.23
## 13 2002      58.18 83.82
## 14 2003      59.06 84.52
## 15 2004      59.91 85.19
## 16 2005      60.80 85.86
## 17 2006      61.60 86.51
## 18 2007      62.45 87.14
## 19 2008      62.79 87.43
## 20 2009      62.65 87.62
## 21 2010      63.43 88.22
## 22 2011      63.81 88.62
## 23 2012      64.15 88.96

2.1 Run a linear regression: Sanitation ~ Year

## 
## Call:
## lm(formula = Sanitation ~ Year, data = mydatasant)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0363 -0.9198  0.7135  0.8483  0.9817 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.726e+03  6.554e+01  -26.33   <2e-16 ***
## Year         8.906e-01  3.275e-02   27.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.042 on 21 degrees of freedom
## Multiple R-squared:  0.9724, Adjusted R-squared:  0.9711 
## F-statistic: 739.4 on 1 and 21 DF,  p-value: < 2.2e-16
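Because the full table is printed above, this fit can be reproduced directly. The sketch below copies the 23 rows into a data frame named `mydatasant` (the name in the Call line) and refits the model. Listing the residuals in year order is an addition made here: a long run of same-sign residuals is a common hint of curvature.

```r
# Values copied from the table printed at the start of section 2.
mydatasant <- data.frame(
  Year = 1990:2012,
  Sanitation = c(44.81, 46.22, 47.39, 48.42, 50.24, 51.13, 52.65, 53.58,
                 54.47, 55.59, 56.38, 57.25, 58.18, 59.06, 59.91, 60.80,
                 61.60, 62.45, 62.79, 62.65, 63.43, 63.81, 64.15),
  Water = c(73.06, 74.49, 75.32, 76.12, 78.16, 78.89, 79.62, 80.39,
            81.10, 81.81, 82.52, 83.23, 83.82, 84.52, 85.19, 85.86,
            86.51, 87.14, 87.43, 87.62, 88.22, 88.62, 88.96)
)

fit_san <- lm(Sanitation ~ Year, data = mydatasant)

coef(fit_san)[["Year"]]      # average change in percentage points per year
summary(fit_san)$r.squared   # share of variance explained by the line

# Residuals in year order: look for runs of same-sign values.
round(resid(fit_san), 2)
```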

2.2 Run a polynomial regression: Sanitation ~ Year + I(Year^2)

## 
## Call:
## lm(formula = Sanitation ~ Year + I(Year^2), data = mydatasant)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3688 -0.1215 -0.0580  0.1398  0.5185 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.010e+05  4.850e+03  -20.82 5.01e-15 ***
## Year         1.001e+02  4.847e+00   20.65 5.87e-15 ***
## I(Year^2)   -2.479e-02  1.211e-03  -20.46 6.97e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2279 on 20 degrees of freedom
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9986 
## F-statistic:  7934 on 2 and 20 DF,  p-value: < 2.2e-16
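The very large intercept and Year coefficient above are a side effect of using raw calendar years. As a sketch, the same parabola can be refitted with Year measured from 2001, the midpoint of the series. That centring choice and the names `t`, `fit_ctr` and `peak_year` are assumptions of this example. The centred coefficients also make it easy to locate where the fitted curve turns, which on these data appears to fall around 2019, beyond the last observation, so extrapolation should be treated with caution.

```r
# Values copied from the table printed at the start of section 2.
mydatasant <- data.frame(
  Year = 1990:2012,
  Sanitation = c(44.81, 46.22, 47.39, 48.42, 50.24, 51.13, 52.65, 53.58,
                 54.47, 55.59, 56.38, 57.25, 58.18, 59.06, 59.91, 60.80,
                 61.60, 62.45, 62.79, 62.65, 63.43, 63.81, 64.15)
)

# Raw-year fit, as in the report.
fit_raw <- lm(Sanitation ~ Year + I(Year^2), data = mydatasant)

# Same parabola with Year measured from 2001 (a choice made for this sketch).
mydatasant$t <- mydatasant$Year - 2001
fit_ctr <- lm(Sanitation ~ t + I(t^2), data = mydatasant)

# Intercept = fitted value in 2001; t = slope in 2001; I(t^2) = curvature.
coef(fit_ctr)

# Both parameterisations describe the same fitted curve.
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_ctr)))

# Year at which the fitted parabola peaks: t* = -b1 / (2 * b2).
b <- coef(fit_ctr)
peak_year <- 2001 - b[["t"]] / (2 * b[["I(t^2)"]])
peak_year
```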

2.4 Run a linear regression: Water ~ Year

## 
## Call:
## lm(formula = Water ~ Year, data = mydatasant)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7268 -0.7306  0.4900  0.5877  0.6822 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.329e+03  5.077e+01  -26.18   <2e-16 ***
## Year         7.055e-01  2.537e-02   27.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8071 on 21 degrees of freedom
## Multiple R-squared:  0.9736, Adjusted R-squared:  0.9723 
## F-statistic: 773.4 on 1 and 21 DF,  p-value: < 2.2e-16

2.5 Run a polynomial regression: Water ~ Year + I(Year^2)

## 
## Call:
## lm(formula = Water ~ Year + I(Year^2), data = mydatasant)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40834 -0.16353  0.01231  0.08678  0.64476 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.643e+04  5.248e+03  -14.56 4.13e-12 ***
## Year         7.577e+01  5.245e+00   14.45 4.80e-12 ***
## I(Year^2)   -1.876e-02  1.311e-03  -14.31 5.70e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2467 on 20 degrees of freedom
## Multiple R-squared:  0.9976, Adjusted R-squared:  0.9974 
## F-statistic:  4243 on 2 and 20 DF,  p-value: < 2.2e-16
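A practical caution when reading this table: the three coefficients are large and nearly cancel, so plugging the printed four-digit values into the formula by hand can give a misleading prediction. The sketch below, using the Water column copied from the table in section 2, compares `predict()` with a hand calculation for the year 2000. The names `pred_2000` and `hand_2000` are introduced here for illustration.

```r
# Values copied from the table printed at the start of section 2.
mydatasant <- data.frame(
  Year = 1990:2012,
  Water = c(73.06, 74.49, 75.32, 76.12, 78.16, 78.89, 79.62, 80.39,
            81.10, 81.81, 82.52, 83.23, 83.82, 84.52, 85.19, 85.86,
            86.51, 87.14, 87.43, 87.62, 88.22, 88.62, 88.96)
)

fit_water <- lm(Water ~ Year + I(Year^2), data = mydatasant)

# predict() uses the full-precision coefficients stored in the model.
pred_2000 <- unname(predict(fit_water, newdata = data.frame(Year = 2000)))
pred_2000   # close to the observed 82.52

# Hand calculation with the coefficients as printed (4 significant digits):
# the rounding error is magnified by the near-cancellation of large terms.
hand_2000 <- -7.643e4 + 7.577e1 * 2000 - 1.876e-2 * 2000^2
hand_2000   # about 70, far from the observed value
```

If coefficients need to be quoted for hand calculation, refitting with a centred Year variable, or printing more digits, avoids most of this loss of accuracy.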

2.7 Conclusion

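The second-order polynomial regression models both relationships better than the linear regression. For Sanitation against Year, adjusted R-squared rises from 0.9711 to 0.9986 and the residual standard error falls from 1.042 to 0.2279; for Water against Year, adjusted R-squared rises from 0.9723 to 0.9974 and the residual standard error falls from 0.8071 to 0.2467.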
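The comparison behind this conclusion can be collected in one small table. The sketch below refits both models for each response on the data printed in section 2 and adds an F-test for the squared term. The helper `compare()` and the result name `res` are hypothetical names introduced for this example.

```r
# Values copied from the table printed at the start of section 2.
mydatasant <- data.frame(
  Year = 1990:2012,
  Sanitation = c(44.81, 46.22, 47.39, 48.42, 50.24, 51.13, 52.65, 53.58,
                 54.47, 55.59, 56.38, 57.25, 58.18, 59.06, 59.91, 60.80,
                 61.60, 62.45, 62.79, 62.65, 63.43, 63.81, 64.15),
  Water = c(73.06, 74.49, 75.32, 76.12, 78.16, 78.89, 79.62, 80.39,
            81.10, 81.81, 82.52, 83.23, 83.82, 84.52, 85.19, 85.86,
            86.51, 87.14, 87.43, 87.62, 88.22, 88.62, 88.96)
)

# Fit the linear and quadratic models for one response and summarise them.
compare <- function(response) {
  lin  <- lm(as.formula(paste(response, "~ Year")), data = mydatasant)
  quad <- lm(as.formula(paste(response, "~ Year + I(Year^2)")),
             data = mydatasant)
  c(adj_r2_linear = summary(lin)$adj.r.squared,
    adj_r2_quad   = summary(quad)$adj.r.squared,
    sigma_linear  = sigma(lin),    # residual standard error
    sigma_quad    = sigma(quad),
    p_quad_term   = anova(lin, quad)[2, "Pr(>F)"])  # F-test for Year^2
}

res <- sapply(c("Sanitation", "Water"), compare)
round(res, 4)
```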

DK WC

2020-01-22