Introduction:

The purpose of this project was to apply what I have learned about R, using Posit Cloud, to analyze a provided dataset describing the population growth of the United States from 1790 to 1850. Two vectors were provided. The first, "year", contains 7 values corresponding to the census years 1790, 1800, 1810, 1820, 1830, 1840, and 1850. These are written as 0, 10, 20, 30, 40, 50, and 60, where 0 represents 1790, 10 represents 1800, and so on. This offset is needed because the actual calendar years would make the exponential term exp(k * year) enormous (on the order of 10^23 for a growth rate near 0.03 per year). The initial-population parameter would then have to be correspondingly tiny, which makes the fit numerically unstable. The second vector, "pop", contains the population in millions. Each population value corresponds to the year at the same position. For example, the first value, 3.929 million, corresponds to year 0, which is 1790. Using these two vectors, I created a scatter plot with appropriate labels.
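To show why the offset matters, the short sketch below converts calendar years to years since 1790. It then compares the size of the exponential term with and without the offset. The growth rate of 0.03 per year is an assumed round value close to the fitted k reported in the Results, and `calendar_year`, `year_offset`, and `k_guess` are illustrative names, not variables from the original analysis.

```r
# Calendar years of the census data
calendar_year <- seq(1790, 1850, by = 10)

# Shift so that 1790 becomes year 0 (same values as the `year` vector)
year_offset <- calendar_year - 1790
print(year_offset)

# Assumed growth rate, roughly the fitted k
k_guess <- 0.03

# Exponential term with raw calendar years vs. with the offset
exp(k_guess * calendar_year)  # on the order of 1e23 to 1e24
exp(k_guess * year_offset)    # between 1 and about 6
```

With raw calendar years, pop_ini would have to be on the order of 1e-23 to bring the 1790 prediction back to about 4 million. That badly scales the least-squares problem.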

# 1. Provided Data: US population growth (in millions) from 1790 to 1850
year <- c(0, 10, 20, 30, 40, 50, 60)  # 1790 corresponds to year 0
pop <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)  # Population in millions

# 2. Plot data with labels
plot(year, pop, 
     main = "US Population Growth (1790 to 1850)", 
     xlab = "Years since 1790", 
     ylab = "Population (millions)", 
     pch = 20, col = "purple")

Methods:

After the plot was created, a growth curve was fitted to the data with the nls2() function from the nls2 package, which performs nonlinear least-squares fitting. nls2 extends base R's nls() function to allow more flexible fitting of nonlinear models, particularly when good initial parameter guesses are uncertain. In addition to a single starting point, it can accept a grid or set of candidate starting values and search over them. Here, nls2 was used to fit an exponential growth model to the U.S. population data from 1790 to 1850 and to produce a summary of the fit. The initial population (pop_ini) and the growth rate (k) were both estimated by this fit.
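The idea behind searching over candidate starting values can be sketched in base R. The sketch evaluates the residual sum of squares over a grid of (pop_ini, k) pairs, takes the best pair, and refines it with nls(). This illustrates the technique and is not the code nls2 itself runs. nls2 exposes similar behavior through its `algorithm` argument (for example "brute-force" with a data frame of candidate starts). The grid ranges below are assumptions chosen for illustration, and `grid`, `rss`, `start_best`, and `fit_grid` are hypothetical names.

```r
# Provided data (years since 1790; population in millions)
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Candidate starting values; the ranges are illustrative assumptions
grid <- expand.grid(pop_ini = seq(1, 10, length.out = 10),
                    k       = seq(0.001, 0.1, length.out = 25))

# Residual sum of squares of the exponential model at each grid point
rss <- mapply(function(p0, r) sum((pop - p0 * exp(r * year))^2),
              grid$pop_ini, grid$k)

# The best grid point becomes the starting value for nls()
best <- which.min(rss)
start_best <- list(pop_ini = grid$pop_ini[best], k = grid$k[best])

# Refine with ordinary nonlinear least squares (stats::nls)
fit_grid <- nls(pop ~ pop_ini * exp(k * year), start = start_best)
coef(fit_grid)
```

The coarse grid only has to land near the minimum. The Gauss-Newton refinement in nls() then does the precise fitting.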

The population growth was modeled with the exponential function:

pop = pop_ini * exp(k * year)

pop = population (in millions) in a given year
pop_ini = initial population in millions (parameter to be estimated)
k = growth rate per year (parameter to be estimated)
year = time in years since 1790 (1790 is year 0)
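Taking the natural logarithm of both sides turns the model into a straight line, log(pop) = log(pop_ini) + k * year. A linear regression on log(pop) therefore gives reasonable starting values for the nonlinear fit. This log-linear shortcut is an assumption of this sketch, a common alternative to guessing starting values by hand, and not a step from the original analysis. Its estimates are not identical to the nonlinear least-squares estimates, because it minimizes squared errors on the log scale. `log_fit`, `start_vals`, and `fit_log_start` are hypothetical names.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Linear fit on the log scale: log(pop) = log(pop_ini) + k * year
log_fit <- lm(log(pop) ~ year)

# Back-transform the intercept to get a starting value for pop_ini
start_vals <- list(pop_ini = exp(coef(log_fit)[[1]]),
                   k       = coef(log_fit)[[2]])
start_vals

# These values can be passed directly as `start` to nls() or nls2()
fit_log_start <- nls(pop ~ pop_ini * exp(k * year), start = start_vals)
coef(fit_log_start)
```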

Overall, nls2 provides a more flexible approach to fitting nonlinear models when initial guesses are uncertain, because it can search a range of parameter values before refining the fit. In this analysis a single starting point (pop_ini = 3.9, k = 0.05) was supplied. With one starting point, nls2 performs a standard nonlinear least-squares fit equivalent to nls(). The fit converged in 5 iterations.

# 1. Provided Data: US population growth (in millions) from 1790 to 1850
year <- c(0, 10, 20, 30, 40, 50, 60)  # 1790 corresponds to year 0
pop <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)  # Population in millions

# 2. Re-plot the data so the fitted curve can be added to the same figure
plot(year, pop, 
     main = "US Population Growth (1790 to 1850)", 
     xlab = "Years since 1790", 
     ylab = "Population (millions)", 
     pch = 20, col = "purple")

# Load the nls2 library
library(nls2)

## Loading required package: proto

# 3. Fit exponential model using nls2 (nonlinear least squares)
# Exponential growth model: pop = pop_ini * exp(k * year)
fit <- nls2(pop ~ pop_ini * exp(k * year), 
            start = list(pop_ini = 3.9, k = 0.05))

# Extract fitted coefficients
coeffs <- coef(fit)
pop_ini <- coeffs["pop_ini"]
k <- coeffs["k"]

# Display model summary
summary(fit)

## 
## Formula: pop ~ pop_ini * exp(k * year)
## 
## Parameters:
##          Estimate Std. Error t value Pr(>|t|)    
## pop_ini 3.9744064  0.0407277   97.58 2.14e-09 ***
## k       0.0293421  0.0002023  145.02 2.96e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09817 on 5 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 5.082e-08

# 4. Plot fitted curve with the data
curve(pop_ini * exp(k * x), from = 0, to = 60, add = TRUE, col = "black", lwd = 2)

# Add legend
legend("topleft", legend = c("Observed Data", "Exponential Fit"), 
       col = c("purple", "black"), pch = c(20, NA), lwd = c(NA, 2))

Results:

After the fitted curve was plotted with the data, a legend was added to distinguish the observed data points from the fitted line. Running nls2 and extracting the initial population (pop_ini) and growth rate (k) gave:

pop_ini = 3.9744064 (in millions)
k = 0.0293421 (per year)

The value of pop_ini is the model's estimate of the population at year 0, that is, in 1790. The recorded 1790 population in the dataset is 3.929 million, so the model's estimate of 3.974 million differs by about 0.045 million (roughly 1.2%). That gap is comparable to the estimate's standard error of 0.041 million. This small difference suggests that the exponential model approximates the initial population well. The remaining deviation reflects the fact that a two-parameter curve cannot pass exactly through all seven data points.
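One way to judge whether the 1790 gap matters is to compare it with the uncertainty reported by summary(fit). The sketch below builds approximate 95% Wald intervals (estimate ± t quantile × standard error, with 5 residual degrees of freedom) from the printed estimates. These intervals rely on the usual linear approximation for nonlinear least squares and are an added illustration, not part of the original analysis. `est`, `se`, `t_crit`, and `ci` are hypothetical names.

```r
# Estimates and standard errors copied from the summary(fit) output
est <- c(pop_ini = 3.9744064, k = 0.0293421)
se  <- c(pop_ini = 0.0407277, k = 0.0002023)

# t quantile for a two-sided 95% interval with 5 residual degrees of freedom
t_crit <- qt(0.975, df = 5)

# Approximate (Wald) 95% confidence intervals
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci

# Is the recorded 1790 population (3.929 million) inside the pop_ini interval?
ci["pop_ini", "lower"] <= 3.929 && 3.929 <= ci["pop_ini", "upper"]
```

With these numbers, the interval for pop_ini runs from roughly 3.87 to 4.08 million, so the recorded 3.929 million lies inside it.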

The fitted growth rate was k = 0.0293421 per year. Because the model uses exp(k * year), this is a continuous growth rate of about 2.93% per year, which corresponds to an increase of roughly 2.98% from one year to the next. A single constant rate fits the data well, which indicates that the population grew at a roughly constant relative rate during this period. Over the 60 years from 1790 to 1850, this rate multiplies the population by about exp(0.0293421 × 60) ≈ 5.8, from about 4.0 to about 23.1 million.
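The sketch below derives three quantities from the fitted k alone. The first is the equivalent year-over-year percentage increase, exp(k) - 1. The second is the doubling time, log(2) / k. The third is the growth factor over the 60-year span. It uses only the value of k reported above, and the variable names are hypothetical.

```r
k <- 0.0293421  # fitted growth rate per year, from summary(fit)

# Year-over-year increase implied by a continuous rate k
annual_increase <- exp(k) - 1   # about 0.0298, i.e. roughly 2.98% per year

# Time for the population to double under constant exponential growth
doubling_time <- log(2) / k     # about 23.6 years

# Growth factor predicted over the full 60-year span (1790 to 1850)
factor_60 <- exp(k * 60)        # about 5.8

round(c(annual_increase = annual_increase,
        doubling_time = doubling_time,
        factor_60 = factor_60), 4)
```

For comparison, the recorded data grow from 3.929 to 23.192 million, a factor of about 5.9.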

Using the parameters estimated by the nls2 fit, the exponential growth model can be written as:

pop = 3.9744064 * exp(0.0293421 * year)
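The fitted equation can be wrapped in a small R function that takes a calendar year and applies the 1790 offset internally. The function name `pop_model` is hypothetical, and its default coefficients are copied from the summary output. In the analysis itself, `predict(fit, newdata = data.frame(year = ...))` would return the same predictions directly from the fitted object.

```r
# Hypothetical helper: population (millions) predicted by the fitted model
pop_model <- function(calendar_year,
                      pop_ini = 3.9744064,  # from summary(fit)
                      k = 0.0293421) {      # from summary(fit)
  year <- calendar_year - 1790              # convert to years since 1790
  pop_ini * exp(k * year)
}

pop_model(c(1790, 1820, 1850))
```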

The fitted curve on the plot captures the increasing trend of the population data well. The exponential model assumes that the population grows at a constant relative (per-capita) rate. This assumption works well for the dataset provided: the population grew by roughly 33% to 36% in each decade, and the residual standard error of the fit is only 0.098 million on 5 degrees of freedom.
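The claim that the curve follows the data can be checked numerically by comparing observed and predicted values at each census year. The sketch below uses the coefficients from the summary output. With the fitted object available, `residuals(fit)` would return the same observed-minus-fitted differences. `pred` and `res` are hypothetical names.

```r
year <- c(0, 10, 20, 30, 40, 50, 60)
pop  <- c(3.929, 5.308, 7.240, 9.638, 12.866, 17.069, 23.192)

# Predictions from the fitted exponential model (coefficients from summary(fit))
pred <- 3.9744064 * exp(0.0293421 * year)

# Residuals: observed minus predicted, in millions
res <- pop - pred
round(data.frame(calendar_year = 1790 + year, observed = pop,
                 predicted = pred, residual = res), 3)

# Residual standard error with n - 2 = 5 degrees of freedom
sqrt(sum(res^2) / (length(pop) - 2))
```

The recomputed residual standard error matches the 0.09817 reported by summary(fit). The largest residual, about 0.17 million, occurs for 1840.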

Overall, the analysis of this dataset was fairly straightforward thanks to my familiarity with R functions from the previous projects completed throughout the semester in CHEM 243.

Final Programming Project (CHEM 243)

Shakhriyor Djuraev

2024-12-14
