Final Part 1: Programming Project (Take Home)

US Population Growth (1790-1850)

Introduction

This project fits an exponential growth model to U.S. population data from 1790 to 1850 using non-linear regression (with brute force). The fit yields estimates of the model's two parameters, the initial population and the growth rate, which are then used to judge how closely the model tracks the observed data.

Looking at our x & y Data Sets

Our class was provided with a data set giving the U.S. population (in millions) for each decade from 1790 through 1850. I set 1790 as year 0 so that the exponent stays small and \(pop_{ini}\) can be read directly as the population in 1790.

# Years (1790 to 1850)
year <- c(0, 10, 20, 30, 40, 50, 60)

# Population in millions
pop <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)
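
To illustrate why re-centering the years helps, the sketch below rebuilds the year vector from calendar years and compares the size of the exponential term with and without the shift. The name calendar_year is introduced here only for illustration, and the value of k is the growth rate taken from the model summary later in this report.

```r
# Calendar years of the seven data points (hypothetical helper vector)
calendar_year <- seq(1790, 1850, by = 10)

# Shift so that 1790 becomes year 0
year <- calendar_year - 1790

# Growth rate copied from the model summary in this report
k <- 0.0293421

# Growth factor over the 60 shifted years: a modest number
exp(k * max(year))

# The same term with raw calendar years: larger than 10^23, which would
# force pop_ini to be extremely small and make the fit badly scaled
exp(k * max(calendar_year))
```

With the shift, the exponential term stays within one order of magnitude across the whole data set, so both parameters remain on comparable, readable scales.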

Plotting Our Data

The population data is plotted to observe trends in growth over the given time period.

# Plot the data
plot(year, pop, xlab = "Year (1790-1850)", ylab = "Population (millions)", 
     main = "US Population Growth from 1790 to 1850", pch = 16)
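
A quick way to check whether an exponential trend is plausible is to plot the logarithm of the population against the year, since exponential growth appears as a straight line on that scale. The sketch below is not part of the original analysis. It uses a log-linear fit with lm, and the names log_fit, pop_ini0 and k0 are introduced here for illustration. The resulting values could serve as rough starting points for the non-linear fit.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# On a log scale, pop = pop_ini * exp(k * year) becomes
# log(pop) = log(pop_ini) + k * year, a straight line
plot(year, log(pop), xlab = "Years since 1790",
     ylab = "log(Population in millions)", pch = 16)

# Ordinary linear regression on the log scale
log_fit <- lm(log(pop) ~ year)
abline(log_fit, col = "darkgreen")

# Back-transform the intercept; the slope is already the growth rate
pop_ini0 <- unname(exp(coef(log_fit)[1]))
k0       <- unname(coef(log_fit)[2])
c(pop_ini0 = pop_ini0, k0 = k0)
```

The log-linear estimates will differ slightly from the non-linear ones because the two fits minimize errors on different scales.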

Non-Linear Regression: Exponential Growth Model

We use non-linear regression (with brute force) to fit an exponential growth model to the data. Here, brute force means evaluating the model at many candidate parameter values and keeping the combination that gives the smallest residual sum of squares, rather than relying only on an iterative optimizer that starts from a single guess. The model is defined as:

\[ pop = pop_{ini} \cdot e^{k \cdot year} \]

where:
- \(pop_{ini}\) is the initial population size in 1790 (year 0).
- \(k\) is the growth rate.
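
The brute-force idea can be sketched in base R as a plain grid search over the two parameters. This is a minimal illustration of the concept, not a description of what nls2 does internally. The search ranges, step sizes, and the names rss, grid and best are assumptions chosen for this example.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Residual sum of squares for one candidate (pop_ini, k) pair
rss <- function(pop_ini, k) sum((pop - pop_ini * exp(k * year))^2)

# Candidate values (assumed ranges and step sizes, not from the report)
grid <- expand.grid(pop_ini = seq(3, 5, by = 0.01),
                    k       = seq(0.02, 0.04, by = 0.0001))

# Evaluate every combination and keep the one with the smallest RSS
grid$rss <- mapply(rss, grid$pop_ini, grid$k)
best <- grid[which.min(grid$rss), ]
best
```

The best grid point should land close to the estimates reported in the model summary. How close depends on the step sizes, which is why a grid search is often followed by an iterative refinement.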

library(nls2)

## Loading required package: proto

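# Fit the exponential growth model pop = pop_ini * exp(k * year).
# The starting values below already equal the final estimates,
# which is why the summary reports convergence after one iteration.
exp_model <- nls2(pop ~ pop_ini * exp(k * year), 
                  start = list(pop_ini = 3.9744064, k = 0.0293421))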

# Model summary
summary(exp_model)

## 
## Formula: pop ~ pop_ini * exp(k * year)
## 
## Parameters:
##          Estimate Std. Error t value Pr(>|t|)    
## pop_ini 3.9744064  0.0407277   97.58 2.14e-09 ***
## k       0.0293421  0.0002023  145.02 2.96e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09817 on 5 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 7.207e-08
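
As a cross-check on the summary, the residual standard error can be recomputed by hand from the reported estimates. The sketch below hard-codes the two estimates from the output above instead of using the fitted object, so it runs without nls2. The names fitted_pop, res and rse are introduced here for illustration.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Estimates copied from the model summary
pop_ini <- 3.9744064
k       <- 0.0293421

# Fitted values and residuals (observed minus fitted), in millions
fitted_pop <- pop_ini * exp(k * year)
res <- pop - fitted_pop
round(res, 3)

# Residual standard error: sqrt(RSS / degrees of freedom),
# with 7 observations minus 2 parameters = 5 degrees of freedom
rse <- sqrt(sum(res^2) / (length(pop) - 2))
rse
```

The result should agree with the residual standard error in the summary (about 0.098 million). The largest residual in absolute value belongs to year 50 (1840), where the model overshoots the observed population.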

Predicting Population Values

Using the fitted model, I predicted the population for each year in the data and drew the exponential growth curve over the original data points.

# Predict population values
pred_pop <- predict(exp_model, newdata = data.frame(year = year))

# Plot the fitted exponential curve
plot(year, pop, xlab = "Year (1790-1850)", ylab = "Population (millions)", 
     main = "US Population Growth from 1790 to 1850", pch = 16)
lines(year, pred_pop, col = "darkgreen", lwd = 2)

# Annotate the plot
text(50, 10, paste("pop_ini =", round(coef(exp_model)["pop_ini"], 3)), col = "darkred")

text(50, 8, paste("k =", round(coef(exp_model)["k"], 3)), col = "darkred")
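
Because the curve above is drawn through only seven predicted points, it is made of straight segments. A smoother curve can be sketched by evaluating the model on a finer grid of years. The example below hard-codes the estimates from the model summary so that it runs on its own. The names year_fine and pred_fine are introduced here for illustration.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Estimates copied from the model summary
pop_ini <- 3.9744064
k       <- 0.0293421

# One prediction per year instead of one per decade
year_fine <- seq(0, 60, by = 1)
pred_fine <- pop_ini * exp(k * year_fine)

plot(year, pop, xlab = "Year (1790-1850)", ylab = "Population (millions)",
     main = "US Population Growth from 1790 to 1850", pch = 16)
lines(year_fine, pred_fine, col = "darkgreen", lwd = 2)
```

With the fitted object available, the same values could be obtained from predict(exp_model, newdata = data.frame(year = year_fine)).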

Model Parameters

The model estimates the following parameters:
- \(pop_{ini}\): The estimated initial population size in 1790.
- \(k\): The growth rate, which defines the rate of exponential increase.

# Display estimated parameters
params <- coef(exp_model)
params

##    pop_ini          k 
## 3.97440643 0.02934212

These parameters are annotated on the plot, as shown in the figure above.
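
To make the growth rate easier to interpret, \(k\) can be converted into an annual percentage increase and a doubling time. The sketch below follows directly from the model \(pop = pop_{ini} \cdot e^{k \cdot year}\) and uses the estimate of \(k\) printed above. The names annual_pct and doubling_time are introduced here for illustration.

```r
# Growth rate copied from the estimated parameters
k <- 0.02934212

# Percentage increase over one year: exp(k) - 1
annual_pct <- (exp(k) - 1) * 100
annual_pct

# Doubling time: solve exp(k * t) = 2 for t
doubling_time <- log(2) / k
doubling_time
```

Under this model the population grows by roughly 3% per year and doubles about every 23.6 years.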

Conclusion

The curve generated by the exponential model fits the data well: the residual standard error is about 0.098 million on 5 degrees of freedom, and both parameters are estimated with small standard errors. The \(pop_{ini}\) estimate of about 3.97 million provides a baseline population size for 1790, close to the observed 3.929 million, while the \(k\) estimate of about 0.0293 per year represents the exponential growth rate. Using nls2 was effective because it fits the non-linear model directly and supports searching over candidate parameter values. Over 1790 to 1850 the exponential model represents U.S. population growth well, which shows how non-linear regression can be useful in studying demographics. Future research could look at longer periods or other factors affecting growth.