Introduction

Using open demographic data downloaded from the Province of New Brunswick (PNB) website (https://www2.gnb.ca/content/dam/gnb/Departments/fin/pdf/esi/demographic-demographique/ComponentsofGrowth-ComposantesDeLaCroissance.xlsx), this exercise aims to determine which factor has the biggest influence on population growth in New Brunswick: Natural Balance (births minus deaths), Interprovincial Migration, or Immigration.

Before the 48-year source dataset was saved as a CSV file for analysis in R, it was cleaned up in Microsoft Excel: all multi-row column headings were collapsed to a single row, the headings were kept in English only and some were shortened, and the last row of the dataset, which held only the starting population for the period 2019-20, was deleted. No data key for the variables was included with the document or found anywhere on the PNB website.
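
A minimal sketch of how the cleaned CSV might be loaded into R. The source does not say what the file was called, so the file name below is hypothetical, and `load_nb_pop` is an illustrative helper rather than part of the original analysis.

```r
# Illustrative helper (not from the original analysis): read the cleaned CSV
# and confirm that the single-row English headings survived the export.
load_nb_pop <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # "Period" is the first column heading reported by names(NB_pop).
  stopifnot("Period" %in% names(df))
  df
}

# Hypothetical file name; replace it with wherever the CSV was saved.
csv_path <- "NB_components_of_growth.csv"
if (file.exists(csv_path)) {
  NB_pop <- load_nb_pop(csv_path)
  str(NB_pop)
}
```

Keeping `stringsAsFactors = FALSE` leaves Period as a character column, which matches the class shown by `summary(NB_pop)`.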

Data Frame in R: NB_pop

names(NB_pop)
##  [1] "Period"                  "Population_begin_period"
##  [3] "Births"                  "Deaths"                 
##  [5] "Interprov_migration_In"  "Interprov_migration_Out"
##  [7] "Interprov_migration_Net" "Immigrants"             
##  [9] "Emigrants"               "Net_NPR"                
## [11] "Residual_deviation"      "Total_growth"

The column headings/variables in the dataset are mostly self-explanatory: Period (July 1 to June 30), Population_begin_period, Births, Deaths, Interprov_migration_In, Interprov_migration_Out, Interprov_migration_Net, Immigrants, Emigrants.

However, a few variable names required further online research on population change measures. Net_NPR is the net change in the number of Non-Permanent Residents (Permanent Residents are included in the population). Residual_deviation is “obtained by distributing the error of closure linearly throughout the intercensal period. The error of closure is defined as the difference between the postcensal population estimates on Census Day and the population enumerated in that census adjusted for census net undercoverage and incompletely enumerated indian reserves” (from Statistics Canada https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000801 footnote 15).

Total Growth (per period) is the sum of: Births minus Deaths, Interprov_migration_Net, Immigrants minus Emigrants, Net_NPR and Residual_deviation.
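
The accounting identity above can be checked row by row. The sketch below uses two toy rows with invented numbers, built so that the balance holds exactly. `balance_gap` and its `residual_sign` argument are illustrative names, not part of the original analysis.

```r
# Toy rows with invented numbers (not the PNB data).
demo <- data.frame(
  Births = c(7000, 6800),
  Deaths = c(6500, 6900),
  Interprov_migration_Net = c(-900, 300),
  Immigrants = c(2000, 4000),
  Emigrants = c(500, 600),
  Net_NPR = c(100, 900),
  Residual_deviation = c(-50, 0)
)
demo$Total_growth <- with(demo, (Births - Deaths) + Interprov_migration_Net +
  (Immigrants - Emigrants) + Net_NPR + Residual_deviation)

# Reported growth minus the sum of the components.
# residual_sign = 1 adds Residual_deviation; residual_sign = -1 subtracts it.
balance_gap <- function(df, residual_sign = 1) {
  implied <- with(df, (Births - Deaths) + Interprov_migration_Net +
    (Immigrants - Emigrants) + Net_NPR + residual_sign * Residual_deviation)
  df$Total_growth - implied
}

balance_gap(demo)  # zero in every row when the identity holds
```

On the real data, `balance_gap(NB_pop)` would show whether the identity holds as stated. The sign of the residual is worth confirming: the rounded means printed by `summary(NB_pop)` sum to about 3985 when Residual_deviation is added and to about 2798 when it is subtracted, against a reported mean Total_growth of 2799.1.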

Two variables need to be added to the dataset: Natural_Balance (births minus deaths) and Net_Immigration (immigrants minus emigrants). This idea is adapted from a Brandon University document (https://www.brandonu.ca/rdi/files/2014/09/Components-of-Population-Change1.pdf).

Initial Analysis of the Data

We can see in the above plot that the population of New Brunswick has decreased several times on an annual basis since 1997.

The Variables

In the dataset, four variables are considered independent variables – Net Interprovincial Migration, Natural Balance, Net Immigration and Net Non-Permanent Residents. The dependent variable is the Total Annual Population Change. One variable, Residual Deviation, is not being included in the analysis as it is more or less a “fudge factor” used to make the population accounting “balance”.

We note in the above plot that in most years more New Brunswickers move out of the province than other Canadians move in.

We note in the above plot that immigration to New Brunswick jumped dramatically starting in 2006-07.

We note above that the Natural Balance in New Brunswick went into the negative during the period 2014-15. In other words, since then more people are dying each year in New Brunswick than being born.

The number of Non-Permanent Residents took a huge jump in the last year of the dataset, which might be attributed to the arrival of Syrian refugees in New Brunswick.

Statistical Analysis

summary(NB_pop)
##     Period          Population_begin_period     Births          Deaths    
##  Length:48          Min.   :642471          Min.   : 6550   Min.   :5000  
##  Class :character   1st Qu.:712996          1st Qu.: 7121   1st Qu.:5286  
##  Mode  :character   Median :747467          Median : 8534   Median :5886  
##                     Mean   :731113          Mean   : 8797   Mean   :5907  
##                     3rd Qu.:750687          3rd Qu.:10406   3rd Qu.:6316  
##                     Max.   :770921          Max.   :12047   Max.   :7822  
##  Interprov_migration_In Interprov_migration_Out Interprov_migration_Net
##  Min.   : 8517          Min.   : 9702           Min.   :-4989.0        
##  1st Qu.:10704          1st Qu.:11662           1st Qu.:-1907.5        
##  Median :11674          Median :12632           Median : -875.5        
##  Mean   :12844          Mean   :13508           Mean   : -663.9        
##  3rd Qu.:13856          3rd Qu.:15079           3rd Qu.:  222.8        
##  Max.   :24072          Max.   :19806           Max.   : 6037.0        
##    Immigrants       Emigrants        Net_NPR       Residual_deviation
##  Min.   : 558.0   Min.   :183.0   Min.   :-249.0   Min.   :-1222.0   
##  1st Qu.: 686.2   1st Qu.:313.2   1st Qu.:  -6.5   1st Qu.:    0.0   
##  Median : 878.5   Median :478.0   Median : 100.0   Median :  891.5   
##  Mean   :1429.6   Mean   :473.2   Mean   : 209.4   Mean   :  593.4   
##  3rd Qu.:1942.5   3rd Qu.:588.5   3rd Qu.: 362.0   3rd Qu.: 1145.0   
##  Max.   :5076.0   Max.   :830.0   Max.   :1752.0   Max.   : 1821.0   
##   Total_growth    
##  Min.   :-2436.0  
##  1st Qu.:  205.8  
##  Median : 2625.5  
##  Mean   : 2799.1  
##  3rd Qu.: 4576.0  
##  Max.   :12486.0

Additional Variables:

Net_Immigration = Immigrants - Emigrants
Natural_Balance = Births - Deaths
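
A minimal sketch of how these two columns could be added in R. The toy data frame holds invented numbers. In the analysis itself the same two assignments would be applied to NB_pop.

```r
# Toy rows with invented numbers (not the PNB data).
demo <- data.frame(
  Births = c(7000, 6800),
  Deaths = c(6500, 6900),
  Immigrants = c(2000, 4000),
  Emigrants = c(500, 600)
)

# Derived variables, named as in the regression calls.
demo$Net_Immigration <- demo$Immigrants - demo$Emigrants
demo$Natural_Balance <- demo$Births - demo$Deaths
```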

Linear Regression including all four independent variables over the full period:

summary(lm.pop)
## 
## Call:
## lm(formula = Total_growth ~ Natural_Balance + Net_Immigration + 
##     Interprov_migration_Net + Net_NPR, data = NB_pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -705.65 -413.61 -159.15   83.75 1573.02 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -374.93624  259.42935  -1.445   0.1556    
## Natural_Balance            0.83366    0.05239  15.912  < 2e-16 ***
## Net_Immigration            1.27104    0.11136  11.414 1.34e-14 ***
## Interprov_migration_Net    0.87846    0.04948  17.752  < 2e-16 ***
## Net_NPR                    0.62960    0.35040   1.797   0.0794 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 620.1 on 43 degrees of freedom
## Multiple R-squared:  0.9674, Adjusted R-squared:  0.9644 
## F-statistic: 319.1 on 4 and 43 DF,  p-value: < 2.2e-16
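
The model formula in the Call above can be reproduced as follows. The sketch fits the same formula to synthetic data (random numbers, not the PNB figures) in which growth is the plain sum of the four components plus noise. `lm.demo` is an illustrative name.

```r
set.seed(42)
n <- 48

# Synthetic components; the means and spreads are arbitrary assumptions.
demo <- data.frame(
  Natural_Balance = rnorm(n, mean = 2000, sd = 1500),
  Net_Immigration = rnorm(n, mean = 900, sd = 800),
  Interprov_migration_Net = rnorm(n, mean = -600, sd = 2000),
  Net_NPR = rnorm(n, mean = 200, sd = 400)
)

# Growth is the sum of the components plus noise that stands in for
# whatever is left out of the model (here, an omitted residual term).
demo$Total_growth <- with(demo, Natural_Balance + Net_Immigration +
  Interprov_migration_Net + Net_NPR) + rnorm(n, sd = 100)

# Same formula as in the Call shown in the summary output.
lm.demo <- lm(Total_growth ~ Natural_Balance + Net_Immigration +
  Interprov_migration_Net + Net_NPR, data = demo)
summary(lm.demo)
```

Because total growth is an accounting sum of its components, slopes close to 1 are what one would expect from such a model. Departures from 1 in the real fit plausibly reflect the excluded Residual_deviation term.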

Linear Regression for Natural Balance over the full period:

summary(lm.pop1)
## 
## Call:
## lm(formula = Total_growth ~ Natural_Balance, data = NB_pop)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -4556  -1733  -1065   1540   6818 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     577.9260   611.8060   0.945     0.35    
## Natural_Balance   0.7684     0.1622   4.737 2.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2723 on 46 degrees of freedom
## Multiple R-squared:  0.3279, Adjusted R-squared:  0.3132 
## F-statistic: 22.44 on 1 and 46 DF,  p-value: 2.114e-05

Linear Regression for Net Immigration over the full period:

summary(lm.pop2)
## 
## Call:
## lm(formula = Total_growth ~ Net_Immigration, data = NB_pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4941.2 -2221.7  -754.4  1408.5  9195.1 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2097.0088   603.3565   3.476  0.00112 **
## Net_Immigration    0.7341     0.4045   1.815  0.07605 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3208 on 46 degrees of freedom
## Multiple R-squared:  0.06683,    Adjusted R-squared:  0.04654 
## F-statistic: 3.294 on 1 and 46 DF,  p-value: 0.07605

Linear Regression for Interprovincial Migration over the full period:

summary(lm.pop3)
## 
## Call:
## lm(formula = Total_growth ~ Interprov_migration_Net, data = NB_pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2694.1 -1422.7  -138.5  1408.7  3133.2 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             3676.0565   253.3528   14.51  < 2e-16 ***
## Interprov_migration_Net    1.3209     0.1138   11.61 2.89e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1675 on 46 degrees of freedom
## Multiple R-squared:  0.7455, Adjusted R-squared:  0.7399 
## F-statistic: 134.7 on 1 and 46 DF,  p-value: 2.889e-15

Linear Regression for Net Non-Permanent Residents over the full period:

summary(lm.pop4)
## 
## Call:
## lm(formula = Total_growth ~ Net_NPR, data = NB_pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5232.0 -2591.5  -176.8  1757.1  9694.5 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.792e+03  5.558e+02   5.023 8.13e-06 ***
## Net_NPR     3.353e-02  1.344e+00   0.025     0.98    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3321 on 46 degrees of freedom
## Multiple R-squared:  1.353e-05,  Adjusted R-squared:  -0.02173 
## F-statistic: 0.0006225 on 1 and 46 DF,  p-value: 0.9802
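
To compare the four single-variable fits side by side, the slope statistics and R-squared can be collected into one table. This is a sketch on synthetic data (random numbers, not the PNB figures). `fit_stats` and its column names are illustrative.

```r
set.seed(1)
n <- 30

# Synthetic components; the means and spreads are arbitrary assumptions.
demo <- data.frame(
  Natural_Balance = rnorm(n, mean = 2000, sd = 1500),
  Net_Immigration = rnorm(n, mean = 900, sd = 800),
  Interprov_migration_Net = rnorm(n, mean = -600, sd = 2000),
  Net_NPR = rnorm(n, mean = 200, sd = 400)
)
predictors <- names(demo)
demo$Total_growth <- rowSums(demo[predictors]) + rnorm(n, sd = 300)

# One simple regression per predictor; keep the slope row and R-squared.
fit_stats <- do.call(rbind, lapply(predictors, function(v) {
  s <- summary(lm(reformulate(v, response = "Total_growth"), data = demo))
  data.frame(
    predictor = v,
    slope = s$coefficients[v, "Estimate"],
    slope_se = s$coefficients[v, "Std. Error"],
    t_value = s$coefficients[v, "t value"],
    p_value = s$coefficients[v, "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}))

# Sort so that the predictor explaining the most variance comes first.
fit_stats[order(-fit_stats$r_squared), ]
```

The table reports the slope row rather than the intercept row, since the slope statistics and R-squared are the quantities that describe how strongly a predictor tracks total growth.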

Limit to most recent 10 years of data

NB_pop10 <- NB_pop[39:48,]
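
A quick check that rows 39 to 48 are the last ten periods. The sketch assumes the 48 rows run from 1971-72 to 2018-19 in chronological order; that start label is inferred from the 48-year span and the 2018-19 end point, not stated in the source. The toy data frame carries only a Period column and a row counter.

```r
# Toy frame: 48 period labels, assumed to run from 1971-72 to 2018-19.
demo <- data.frame(
  Period = sprintf("%d-%02d", 1971:2018, (1972:2019) %% 100L),
  row_id = seq_len(48),
  stringsAsFactors = FALSE
)

# Same row selection as in the analysis, plus an equivalent that does not
# hard-code the row numbers.
demo10 <- demo[39:48, ]
demo10_tail <- tail(demo, 10)

range(demo10$Period)
```

`tail()` keeps working if more periods are appended to the dataset later, whereas `39:48` would need to be updated by hand.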

Additional Variables:

Net_Immigration = Immigrants - Emigrants
Natural_Balance = Births - Deaths

Linear Regression including all four independent variables over the last 10 years:

summary(lm.pop10)
## 
## Call:
## lm(formula = Total_growth ~ Natural_Balance + Net_Immigration + 
##     Interprov_migration_Net + Net_NPR, data = NB_pop10)
## 
## Residuals:
##       1       2       3       4       5       6       7       8       9      10 
## -172.04  -21.39  402.62 -114.56 -175.00   70.53  -20.51  103.85  -24.02  -49.47 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -1.110e+03  5.208e+02  -2.130 0.086376 .  
## Natural_Balance          1.326e+00  1.914e-01   6.929 0.000961 ***
## Net_Immigration          1.533e+00  2.326e-01   6.589 0.001209 ** 
## Interprov_migration_Net  5.892e-01  5.466e-02  10.778 0.000119 ***
## Net_NPR                  5.762e-01  3.283e-01   1.755 0.139581    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 225.9 on 5 degrees of freedom
## Multiple R-squared:  0.9929, Adjusted R-squared:  0.9872 
## F-statistic: 174.8 on 4 and 5 DF,  p-value: 1.48e-05

Linear Regression for Natural Balance over the last 10 years:

summary(lm.pop10_1)
## 
## Call:
## lm(formula = Total_growth ~ Natural_Balance, data = NB_pop10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3405.7 -1329.1   597.2  1365.7  2044.3 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2627.8105   609.6689   4.310  0.00258 **
## Natural_Balance   -0.9816     0.7489  -1.311  0.22631   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1923 on 8 degrees of freedom
## Multiple R-squared:  0.1768, Adjusted R-squared:  0.0739 
## F-statistic: 1.718 on 1 and 8 DF,  p-value: 0.2263

Linear Regression for Net Immigration over the last 10 years:

summary(lm.pop10_2)
## 
## Call:
## lm(formula = Total_growth ~ Net_Immigration, data = NB_pop10)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2231.29  -913.78    78.73   869.10  1854.99 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     -1048.9848  1156.5059  -0.907  0.39089   
## Net_Immigration     1.3603     0.3919   3.471  0.00843 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1339 on 8 degrees of freedom
## Multiple R-squared:  0.601,  Adjusted R-squared:  0.5511 
## F-statistic: 12.05 on 1 and 8 DF,  p-value: 0.008426

Linear Regression for Interprovincial Migration over the last 10 years:

summary(lm.pop10_3)
## 
## Call:
## lm(formula = Total_growth ~ Interprov_migration_Net, data = NB_pop10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1229.4  -874.7  -180.0   595.5  1875.4 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             3740.1800   439.9795   8.501 2.81e-05 ***
## Interprov_migration_Net    0.9952     0.2299   4.328  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1159 on 8 degrees of freedom
## Multiple R-squared:  0.7007, Adjusted R-squared:  0.6633 
## F-statistic: 18.73 on 1 and 8 DF,  p-value: 0.002518

Linear Regression for Net Non-Permanent Residents over the last 10 years:

summary(lm.pop10_4)
## 
## Call:
## lm(formula = Total_growth ~ Net_NPR, data = NB_pop10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1695.5  -615.8   241.6   423.3  1806.4 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1140.3653   465.4316   2.450  0.03993 * 
## Net_NPR        3.0750     0.6357   4.838  0.00129 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1070 on 8 degrees of freedom
## Multiple R-squared:  0.7452, Adjusted R-squared:  0.7134 
## F-statistic:  23.4 on 1 and 8 DF,  p-value: 0.001292

Conclusion

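Over the period 1971-2019, Interprovincial Migration appears to be the most influential factor in the yearly change in population in New Brunswick. Of the four single-variable regressions, the one on Interprov_migration_Net has the highest Multiple R-squared, 0.7455; the lowest residual standard error, 1675; the largest F-statistic, 134.7; the largest slope t-value, 11.61; and the smallest p-value, 2.89e-15. Its intercept, 3676.0565 (standard error 253.3528), is the predicted growth when net interprovincial migration is zero and is not itself a measure of influence.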

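If we limit the period examined to the most recent ten (10) years of data, 2009-10 to 2018-19, Interprovincial Migration still appears to be the most influential of the three factors in question. Among the single-variable regressions on Natural_Balance, Net_Immigration and Interprov_migration_Net, the one on Interprov_migration_Net has the highest Multiple R-squared, 0.7007; the lowest residual standard error, 1159; the largest F-statistic, 18.73; the largest slope t-value, 4.328; and the smallest p-value, 0.002518. Its F-statistic is only the second largest overall because Net_NPR, which is not one of the three factors, fits slightly better over this window (Multiple R-squared 0.7452, F-statistic 23.4, p-value 0.001292). In the regression on all four variables, Interprov_migration_Net also has the largest t-value, 10.778.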