Estimate Variances of Model Parameters Using Perturbed SSE Curve Fitting (PSCF) Method
Client: Prof. Robert Hayes (Department of Nuclear Engineering, NCSU)
Team: Chien-Lan Hsueh (Department of Statistics, NCSU)
Introduction
Prof. Hayes has developed a new algorithm to estimate the variance of a model parameter based on perturbations around the best-fitted value. The algorithm is claimed to give a good assessment of the standard error of the model parameter, and results obtained with it have been published in several research articles.
The goal of this work is to reproduce the algorithm and compare it with other common statistical methods, including nonlinear regression models, bootstrap confidence intervals, and the delta method.
Background
Prof. Hayes is an active faculty member in the Department of Nuclear Engineering at NCSU. His research on retrospective dosimetry (O’Mara and Hayes 2018) involves developing new methods to measure radiation dosages and analyze them to infer the actual radiation exposures. Although the traditional technique, a direct radiation measurement on the subjects, can provide an accurate depth-profile measurement, it is costly and time consuming. Using the new method he developed (Hayes and O’Mara 2019), he can use forensic luminescence data, which is highly correlated with the actual radiation dosage, together with the industry-standard Monte Carlo N-Particle (MCNP) code (Brown et al. 2004) to fit the dose deposition profile and estimate the physical model parameter. Furthermore, with the obtained best-fitted model parameter as a reference, a series of “perturbed” model fits is then used in a normal (Gaussian) curve fitting to estimate the variance of the parameter. A detailed description of this algorithm can be found in Dr. O’Mara’s Ph.D. dissertation (O’Mara 2020).
Perturbed SSE Curve Fitting (PSCF) Method
For convenience, we call this algorithm the Perturbed SSE Curve Fitting (PSCF) method in this work.
The parameter is estimated by minimizing the sum of squared errors (SSE) between the model prediction and the measurement data.
As the parameter deviates from the best-fitted value, the SSE, expressed as a function of the parameter, traces out a convex, roughly “hyperbolic” curve.
After this curve is “flipped” upside down, a Gaussian curve is fitted to it, and the spread of the fitted Gaussian is taken as the estimated variance of the model parameter.
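To make the idea concrete, here is a minimal sketch of the PSCF idea in R. It is illustrative only: the toy data, grid, and flipping constant are our own choices, and the full implementation used in this report is developed in the PSCF section below.

# toy data: y = exp(-lambda*x) with a noisy lambda (illustrative only)
set.seed(1)
x <- seq(0.1, 2, length.out = 50)
y <- exp(-rnorm(50, mean = 3, sd = 1) * x)

# SSE as a function of a candidate parameter value
sse <- function(l) sum((y - exp(-l * x))^2)

# step 1: best fit by minimizing SSE
lambda_hat <- optimize(sse, interval = c(0, 10))$minimum

# step 2: evaluate SSE on a grid around the best fit ("perturbed" fits)
grid <- seq(lambda_hat - 1, lambda_hat + 1, length.out = 101)
sse_grid <- sapply(grid, sse)

# step 3: flip the SSE curve and fit a scaled Gaussian to it;
# the fitted sd is taken as the spread of the parameter
flipped <- max(sse_grid) - sse_grid
obj <- function(p) sum((flipped - p[1] * dnorm(grid, p[2], abs(p[3])))^2)
fit <- optim(c(max(flipped), lambda_hat, 0.5), obj)
abs(fit$par[3])   # estimated spread (sd) of lambda under the PSCF idea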
Research Questions
The goal of this work is to answer the following questions:
1. How well does the algorithm determine the variances of physical model parameters?
2. Is there solid statistical ground to support the validity of this method?
3. If yes in (2), can it be improved and further generalized to other physical models?
4. If no in (2), how well does it perform as an approximation?
The main challenge of this project is to verify this algorithm and use statistical theory to explain when it works. If we can find the connection, then we can put this novel method on solid statistical ground with high confidence in its reliability. Brake and West, former statistics graduate students at NCSU, have made some attempts (Brake and West 2021). Unfortunately, that work is not quite complete and many unanswered questions remain.
Methods and Data
Methods
In addition to the proposed Perturbed SSE Curve Fitting (PSCF) method, we use the following statistical methods to estimate the variance of the unknown parameter of interest:
Non-linear regression model
Bootstrap confidence intervals
Parametric bootstrap
Non-parametric bootstrap
Delta method normality
A description of each method, including its analysis techniques and procedures as well as the reason we choose it for this study, is given in the corresponding section below. At the end of this report, we compare the obtained results side by side to judge whether these methods are sufficient and powerful enough to estimate a model parameter compared with the PSCF method.
Data
In this study, we will look into a physical process. A physical model is used to generate simulated measurement data with a predetermined randomness in the model parameter. The detailed procedure is described in the next section.
Study Case: Attenuation Decay
In this work, we will use a simulated data set (details given later) to study a physical model attenuation decay.
Physical Model
Physical Model: Radiation strength measured at distance x
y = e^{-\lambda x} \tag{1}
Response variable: y= radiation strength
Explanatory variable (numeric): x= distance
Parameter of interest: \lambda = attenuation coefficient.
Estimator: \hat{\lambda} = -\frac{1}{x}\ln y
Goal: Estimate the parameter \lambda and its standard error
We treat the parameter \lambda as a random variable that follows a normal distribution. By the central limit theorem (CLT), the sample mean of a random sample of size n is also (approximately) normally distributed:

\bar{\lambda} \sim N\!\left(\mu_\lambda,\ \sigma_\lambda^2/n\right)
To make the analysis reproducible, we set the seed of the random generator:
# seed for random generator
seed <- 2023
# sample size
n <- 100
# model function and its inverse
fmod <- function(x, lambda){ exp(-lambda*x) }
fmod_inv <- function(y, x){ -log(y)/x }
# parameter - truth
lambda_mean <- 3
lambda_sd <- 1
We also assume \lambda_i \sim N(3, 1). With a sample size of 100, the standard error of the sample mean is 1/\sqrt{100} = 0.1.
Next, we generate simulated measurement data. For each measurement, y represents the recorded radiation strength measured at distance x.
# make this analysis reproducible
set.seed(seed)
# parameter - random samples
lambdas <- rnorm(n, lambda_mean, lambda_sd)
# histogram of lambda random samples
hist(lambdas, breaks = 10, freq = F, main = "", xlab = "")
curve(dnorm(x, lambda_mean, lambda_sd), add = T)
title(main = "Histogram of lambda (n = " %&% n %&% ")", xlab = "lambda")
The data set contains 100 observations of the radiation strength and measurement distance.
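The chunk that assembles the data frame df is not shown in this extract. Below is a minimal sketch consistent with how df is used later; the distribution of the measurement distances x is a hypothetical choice, and the tidyverse is assumed to be loaded for the session.

# a minimal sketch (assumed; the original chunk is not shown here):
# one (x, y) measurement per sampled lambda
df <- tibble(
  lambda = lambdas,            # sampled (true) parameter values
  x = runif(n, 0.1, 2),        # measurement distances (assumed distribution)
  y = fmod(x, lambda)          # measured radiation strength
)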
# examine data set
df %>% select(-lambda)
# scatter plot between measured strength y and distance x
plot(df$x, df$y,
     xlab = "x (measurement distance)",
     ylab = "y (measured radiation strength)")
The decay trend described in the physical model (the decay equation) and the randomness due to the uncertainty in the model parameter are clearly visible.
# Save session variables and function definitions
# summary table of truth (for later comparison)
tbl_truth <- tibble(
  method = "Truth (All sample sizes = " %&% n %&% ")",
  lambda = lambda_mean,
  std.error = lambda_sd/sqrt(n),
  lower_0.95 = lambda_mean + qt(0.05/2, n-1)*std.error,
  upper_0.95 = lambda_mean + qt(1-0.05/2, n-1)*std.error,
  width = upper_0.95 - lower_0.95)
# save session image
save.image(here("data", "sim_decay.RData"))
Best Fit Model using Nonlinear Regression Model
To estimate the model parameter, the best-fitted value is determined by minimizing the residual sum of squares (sum of squared errors, SSE). The least squares estimate of the parameter is therefore

\hat{\theta} = \underset{\theta}{\arg\min}\ SSE(\theta) = \underset{\theta}{\arg\min} \sum_{i=1}^{n} \left(y_i - f(x_i, \theta)\right)^2

where f: \mathbb{R}\times\mathbb{R} \to \mathbb{R} is the (nonlinear) model function with independent variables x_i, i=1,\ldots,n, and unknown parameter \theta. We assume the random errors in the measurement data y_i are independent and identically distributed (i.i.d.), so their variance can be estimated by

\hat{\sigma}^2 = \frac{SSE(\hat{\theta})}{n-p}

where p is the number of model parameters (here p=1).
There are many possible ways to perform the least squares estimation and obtain the parameter estimate numerically. In Prof. Hayes’s research, an industry-standard method, Monte Carlo N-Particle (MCNP), is used.
For this work, we use nonlinear regression instead to achieve the same minimization of the SSE. There are two major reasons for choosing this method. First, a nonlinear regression model does not rely on strong assumptions such as normality and linearity. Second, with a sufficiently large sample size, we can use the nonlinear function f and the asymptotic normality of the least squares estimate (Seber and Wild 1989) to estimate the parameter and its standard error:

\hat{\theta} \overset{\bullet}{\sim} N\!\left(\theta,\ \sigma^2\left(F^\top F\right)^{-1}\right)

where F is the Jacobian matrix of the model function f with respect to \theta, evaluated at the observations.
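The chunk that performs this fit is not shown in this extract. Below is a minimal sketch that produces output like the summary shown next and assembles the bestfit list referenced later (bestfit$lambda, bestfit$se, bestfit$sd); the exact contents of bestfit, in particular the scaling used for sd, are assumptions.

# a minimal sketch (assumed): fit the nonlinear regression model with nls()
mod_nls <- nls(y ~ fmod(x, lambda), data = df, start = c(lambda = lambda_mean))
summary(mod_nls)
bestfit <- list(
  lambda = coef(mod_nls)[["lambda"]],                      # least squares estimate
  se     = summary(mod_nls)$coefficients[1, 2],            # standard error of the estimate
  sd     = summary(mod_nls)$coefficients[1, 2] * sqrt(n)   # implied sd of lambda (assumed scaling)
)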
Nonlinear regression model
model: y ~ fmod(x, lambda)
data: df
lambda
2.925
residual sum-of-squares: 0.349
Number of iterations to convergence: 2
Achieved convergence tolerance: 7.308e-06
# save as a comparison table
tbl_nlr <- tibble(
  method = "Nonlinear Regression",
  lambda = bestfit$lambda,
  std.error = bestfit$se,
  lower_0.95 = lambda_mean + qt(0.05/2, n-1)*std.error,
  upper_0.95 = lambda_mean + qt(1-0.05/2, n-1)*std.error,
  width = upper_0.95 - lower_0.95)
tbl_nlr
Note that the residual sum-of-squares is 0.3490421.
Perturbed SSE Curve Fitting (PSCF) Method
This proposed method fits a Gaussian curve to the sum of squared errors (SSE) viewed as a function of the model parameter \lambda. Since the estimate is obtained by minimizing the SSE, the SSE curve is convex with its minimum at \hat{\lambda}. To investigate this visually, we first define functions to compute the SSE as a function of the model parameter.
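The definition of the SSE() helper is not included in this extract; a minimal sketch consistent with how it is called later, SSE(lambda, df, fmod), might be:

# a minimal sketch (assumed): sum of squared errors between the measured y and
# the model prediction, evaluated for each candidate value of lambda
SSE <- function(lambdas, df, fmod){
  sapply(lambdas, function(l) sum((df$y - fmod(df$x, l))^2))
}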
As seen above, the choice of h can be arbitrary. In the following attempts, we will use these three values to flip the curve and perform curve fitting as proposed.
# define plotting parameters
from_to <- lambda_mean + c(-3, 3)*lambda_sd
# plot SSE vs. lambda
curve(SSE(x, df, fmod), from = from_to[1], to = from_to[2],
      xlab = bquote(lambda), ylab = bquote(SSE(lambda)))
This means we need to decide:
How to flip the SSE curve is an arbitrary choice; what value of h should be used?
What range of \lambda should be included in the fitting?
The flipped SSE curve is skewed rather than bell shaped; is a Gaussian curve a good candidate for the fitting?
Curve Flipping and Range Selection
We first address the first two questions by parameterizing the objective function for SSE(\lambda) minimization as h-SSE(\lambda) so that we can fit a (scaled) Gaussian curve:

\widehat{SSE}(\lambda) = h - a\,\frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\lambda-\mu)^2}{2\sigma^2}\right)
By doing so, we introduce two parameters: h controls how the curve is flipped and a controls the scaling. Note, however, that the width of the fitted Gaussian curve is then determined jointly by the scaling factor a and the standard deviation \sigma. We therefore need to examine how the fitting behaves and decide on a fitting strategy.
In Prof. Hayes’ algorithm, if the initial fitting range is not specified, the reflection points calculated by solving h-SSE(\lambda)=0 will be used.
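The helper get_range() used inside fit_optim() below is not shown in this extract. A minimal sketch, assuming it solves h-SSE(\lambda)=0 numerically on both sides of the best fit and reports the result in units of the best-fit standard deviation:

# a minimal sketch (assumed): find the "reflection points" where h - SSE(lambda) = 0
# around the best fit and express them in units of the best-fit standard deviation
get_range <- function(h, df, fmod, bestfit){
  f  <- function(l) abs(h) - SSE(l, df, fmod)
  lo <- uniroot(f, lower = bestfit$lambda - 50*bestfit$sd, upper = bestfit$lambda)$root
  hi <- uniroot(f, lower = bestfit$lambda, upper = bestfit$lambda + 50*bestfit$sd)$root
  list(lb = (bestfit$lambda - lo)/bestfit$sd,   # lower bound (sd units below the best fit)
       range_std = (hi - lo)/bestfit$sd)        # total width (sd units)
}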
Next, we define a helper function to compute \widehat{SSE}(\lambda) based on a given fitting function fn(). In the case of Gaussian fitting, the R base function dnorm() is used.
# define a function to compute SSE fitted by the specified curve
SSE_hat <- function(lambdas, h, a, fn, ...){
  sse_fit <- abs(h) - abs(a)*fn(lambdas, ...)
  return(sse_fit)
}
Finally, we define a few more helper functions to carry out the curve fitting by optimizing a specified objective function to minimize the residual sum of squares:
fit_optim(): performs the optimization that minimizes the objective function using a Gaussian curve.
fit_summarize(): summarizes the results.
fit_plot(): plots \widehat{SSE}(\lambda) and SSE(\lambda) together to visually verify the goodness of the fit (optimization).
# minimize sse(lambda) and plot sse for comparison using Gaussian fit
## pars = {h, a, mean, sd, range_lb, range, n}
fit_optim <- function(pars, pars_control, df, fmod, bestfit){
  update_range <- function(pars){
    # add range_lb and range if either is missing in pars
    # use "reflection points" as endpoints of fitting range
    if(is.na(pars$range_lb) | is.na(pars$range)){
      range_by_h <- get_range(pars$h, df, fmod, bestfit)
      if(is.na(pars$range_lb)) pars$range_lb <- range_by_h$lb
      if(is.na(pars$range)) pars$range <- range_by_h$range_std
    }
    pars
  }
  # update parameters
  update_pars <- function(b, pars){
    pars_new <- c(b, pars[setdiff(names(pars), names(b))])[names(pars)]
    pars_new <- update_range(pars_new)
    return(pars_new)
  }
  # compute sse_data and sse_fit for residuals
  sse <- function(pars2){
    # determine lower and upper bound of lambda
    lb <- bestfit$lambda - abs(pars2$range_lb)*bestfit$sd
    ub <- lb + abs(pars2$range)*bestfit$sd
    # set up grid for lambda
    lambda <- seq(lb, ub, length.out = pars2$n)
    # compute sse_data and sse_fit
    sse_data <- SSE(lambda, df, fmod)
    sse_fit <- SSE_hat(lambda, pars2$h, pars2$a, dnorm, pars2$mean, pars2$sd)
    # pack into a data frame and return
    return(tibble(lambda = lambda, sse_data = sse_data, sse_fit = sse_fit))
  }
  # objective function: residual sum of squares
  fn_obj <- function(b){
    # update parameters
    pars2 <- update_pars(b, pars)
    # compute residuals
    df2 <- sse(pars2)
    residuals <- df2$sse_fit - df2$sse_data
    # return residual sum of squares
    return(sum(residuals^2))
  }
  pars <- update_range(pars)
  # optimize with minimization of residual sum of squares
  fit <- optim(par = pars[pars_control], fn = fn_obj)
  # retrieve parameters
  pars2 <- update_pars(fit$par, pars) %>% lapply(abs)
  # compute sse_data and sse_fit
  df2 <- sse(pars2)
  rss <- sum((df2$sse_fit - df2$sse_data)^2)
  width <- max(df2$lambda) - min(df2$lambda)
  pars2 <- c(rss = rss, width = width, pars2) %>% lapply(signif, digits = 4)
  # alert if minimized values don't match
  if(fit$value != rss){
    print(glue("Not matched! {fit$value} vs. {rss}"))
  }
  return(list(fit = fit, pars = pars2, df_sse = df2))
}

# summarize fit results
fit_summarize <- function(fits, title = "", plot = F, print = F){
  fit <- fits$fit
  pars <- fits$pars
  df_sse <- fits$df_sse
  tbl_par <- as_tibble(pars) %>%
    mutate(title = title) %>%
    relocate(title, rss, width, h, a, mean, sd)
  if(print) print(tbl_par)
  # plot for comparison
  p <- fit_plot(df_sse, pars, title)
  if(plot) print(p)
  # return summary table
  tbl <- tibble(
    method = title,
    lambda = pars$mean,
    std.error = pars$sd/sqrt(nrow(df)),
    lower_0.95 = lambda_mean + qnorm(0.05/2)*std.error,
    upper_0.95 = lambda_mean + qnorm(1-0.05/2)*std.error,
    width = upper_0.95 - lower_0.95)
  return(list(pars = tbl_par, tbl = tbl, p = p))
}

# plot for fit comparison
fit_plot <- function(df_sse, pars, title){
  # fewer digits for easier reading
  pars <- lapply(pars, round, digits = 3)
  # plot comparison
  df_sse %>%
    pivot_longer(cols = starts_with("sse"), names_to = "Type") %>%
    ggplot(aes(lambda, value, col = Type)) +
    geom_line() +
    labs(x = bquote(lambda),
         y = bquote(SSE(lambda)),
         title = glue("{title}\n",
                      "rss = {pars$rss}, width = {pars$width}"),
         subtitle = glue("(mean, sd) = ({pars$mean}, {pars$sd})\n",
                         "(h, a) = ({pars$h}, {pars$a})"))
}
In fit_optim(), we include the following fitting parameters and their initial values in a list pars:
h and a: control how the SSE curve is flipped and its scaling width. Both initial values are set to 1.
mean, sd: control the location and shape of the Gaussian curve. The mean and standard deviation obtained from the best fit (nonlinear regression) are used as initial values.
range_lb: controls the lower bound of the fitting range for \lambda (unit: sd). The initial lower bound is set to 2 standard deviations below the initial mean.
range: controls the width of the fitting range, i.e., the number of standard deviations included in the fitting (unit: sd). The initial range includes at least 4 standard deviations.
n: controls the number of grid points in the fitting range of \lambda.
If either range_lb or range is not given, we use the “reflection points”, defined by solving h-SSE(\lambda)=0, as the missing endpoint(s) of the fitting range.
A parameter control vector pars_control specifies which parameters are varied to minimize the objective function. For example, c(1, 2, 3, 4) means only h, a, mean, and sd (the 1st, 2nd, 3rd, and 4th parameters listed above) are varied to optimize the fit.
Fitting Strategy I: Fit Without Specifying Initial Fitting Range
In our first attempt, we fit SSE(\lambda) without explicitly specifying an initial fitting range. The “reflection points” are used to bound the fitting range based on the value of h:
Control (case 1): fit all parameters
Group 1 (case 2-3): fixed h or a - how to flip the SSE curve
Group 2 (case 4-5): fixed mean or sd - shape of the Gaussian curve
Group 3 (case 6-8): fixed range_lb and/or range - fitting range
# initial parameters and control which to fit
pars <- list(h = 1, a = 1, mean = bestfit$lambda, sd = bestfit$sd,
             range_lb = NA, range = NA, n = 100)
# optimization fit: fit all parameters
opt1 <- c(1, 2, 3, 4, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "fit all")
# optimization fit: fit all parameters except h
opt2 <- c(2, 3, 4, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except h")
# optimization fit: fit all parameters except a
opt3 <- c(1, 3, 4, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except a")
# optimization fit: fit all parameters except mean
opt4 <- c(1, 2, 4, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except mean")
# optimization fit: fit all parameters except sd
opt5 <- c(1, 2, 3, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except sd")
# optimization fit: fit all parameters except range_lb
opt6 <- c(1, 2, 3, 4, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range_lb")
# optimization fit: fit all parameters except range
opt7 <- c(1, 2, 3, 4, 5) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range")
# optimization fit: fit all parameters with fixed interval
opt8 <- c(1, 2, 3, 4) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "fixed range")
# compare visually
ggarrange(
  opt1$p, opt2$p, opt3$p, opt4$p, opt5$p, opt6$p, opt7$p, opt8$p,
  ncol = 4, common.legend = TRUE, legend = "bottom")
# summarize in a table
bind_rows(
  opt1$pars, opt2$pars, opt3$pars, opt4$pars,
  opt5$pars, opt6$pars, opt7$pars, opt8$pars) %>%
  rowid_to_column()
Not surprisingly, case 1 has the smallest fitting error (rss) because all the parameters are free to change. But this is not a good fit, because it only fits one side of the curve in order to attain the smallest rss.
Group 1 (case 2-3): fixed h or a - how to flip the SSE curve
Similar to what happened previously, if we fit everything except h or a, the minimization of rss leads to fitting only one side of the curve.
Group 2 (case 4-5): fixed mean or sd - shape of the Gaussian curve
By fixing either mean or sd, we fix the location or the shape of the Gaussian curve. Fixing mean leads to a very narrow fitting range (plot 4). Fixing sd gives an off-center fitting range (plot 5).
Group 3 (case 6-8): fixed range_lb and/or range - fitting range
The lower bound given by the reflection points is very far from the mean. Therefore, in cases 6 and 8, where the lower bound of the fitting range (range_lb) is fixed, the fitted curve uses a large fitting range and has a large rss. On the other hand, if we only fix the width of the fitting range (range), we get a fairly good fit (plot 7), although it is off-center, similar to case 5.
Based on the results above, to obtain a good fit we need to force the fitting range to be centered and to cover a wide enough range. This can be done by fixing the range_lb and range parameters. If we let them vary freely, the fitting results are either off-center or based on a very narrow fitting range.
Fitting Strategy II: Fit With Specifying Initial Fitting Range
Based on what we learned in the previous section, we need to control the fitting range. In this study, we control it through the initial values of the range (the location range_lb and the width range):
Group 1 (cases 1-4): the initial fitting range is set to 1.5 standard deviations of the best fit
Group 2 (cases 5-8): the initial fitting range is set to 3 standard deviations of the best fit
In both groups, the fitting range is initially centered. In the first three cases of each group (cases 1-3 in group 1 and cases 5-7 in group 2), we hold one parameter fixed at a time. In the last case of each group (cases 4 and 8), both fitting-range parameters are held fixed at their initial values.
# initial parameters and control which to fit
pars <- list(h = 1, a = 1, mean = bestfit$lambda, sd = bestfit$sd,
             range_lb = -0.75, range = 1.5, n = 100)
# optimization fit: fit all parameters except sd
opt1 <- c(1, 2, 3, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except sd")
# optimization fit: fit all parameters except range_lb
opt2 <- c(1, 2, 3, 4, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range_lb")
# optimization fit: fit all parameters except range
opt3 <- c(1, 2, 3, 4, 5) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range")
# optimization fit: fit all parameters with fixed interval
opt4 <- c(1, 2, 3, 4) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "fixed range")
# initial parameters and control which to fit
pars <- list(h = 1, a = 1, mean = bestfit$lambda, sd = bestfit$sd,
             range_lb = -1, range = 3, n = 100)
# optimization fit: fit all parameters except sd
opt5 <- c(1, 2, 3, 5, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except sd")
# optimization fit: fit all parameters except range_lb
opt6 <- c(1, 2, 3, 4, 6) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range_lb")
# optimization fit: fit all parameters except range
opt7 <- c(1, 2, 3, 4, 5) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "except range")
# optimization fit: fit all parameters with fixed interval
opt8 <- c(1, 2, 3, 4) %>%
  fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
  fit_summarize(title = "fixed range")
# compare visually
ggarrange(
  opt1$p, opt2$p, opt3$p, opt4$p, opt5$p, opt6$p, opt7$p, opt8$p,
  ncol = 4, common.legend = TRUE, legend = "bottom")
# summarize in a table
bind_rows(
  opt1$pars, opt2$pars, opt3$pars, opt4$pars,
  opt5$pars, opt6$pars, opt7$pars, opt8$pars) %>%
  rowid_to_column()
It is not surprising that the smallest rss is obtained when more fitting parameters are free to change. The trade-off, again as seen in the previous section, is that the fitting range becomes very narrow and may be off-center. The obtained standard deviations (sd) also span a very wide range. Based on the comparison plots, to ensure a reasonable, centered range we need to keep the two parameters range_lb and range fixed. This means we cannot let the algorithm determine the fitting range.
# try centered cases with various fitting ranges
ranges_lst <- c(1, 1.5, 2, 2.5, 3, 3.5)
plot_lst <- list()
for(i in seq_along(ranges_lst)){
  range_i <- ranges_lst[[i]]
  # set up parameters
  pars <- list(h = 1, a = 1, mean = bestfit$lambda, sd = bestfit$sd,
               range_lb = -(range_i/2), range = range_i, n = 100)
  # optimization fit: fit all parameters with fixed interval
  opt <- c(1, 2, 3, 4) %>%
    fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
    fit_summarize(title = "fixed range")
  plot_lst <- append(plot_lst, list(opt$p))
}
# plot comparison
ggarrange(
  plot_lst[[1]], plot_lst[[2]], plot_lst[[3]],
  plot_lst[[4]], plot_lst[[5]], plot_lst[[6]],
  ncol = 3, nrow = 2, common.legend = TRUE, legend = "bottom")
Fitting Strategy III: Grid Search
If we need to specify the fitting range, the remaining question is how the fitted standard deviation changes with the specified fitting range. To understand this, a grid search of range_lb and range has been performed:
# a helper function to do the fitting used in a grid search setup
search_optim <- function(pars, range_lb.init, range.init){
  # initial parameters and control which to fit
  pars$range_lb = range_lb.init
  pars$range = range.init
  # optimization fit: fit h, a, mean, sd with a fixed interval
  opt <- c(1, 2, 3, 4) %>%
    fit_optim(pars, pars_control = ., df, fmod, bestfit) %>%
    fit_summarize(title = glue("init. (h, a, lb, range)= ",
                               "({pars$h}, {pars$a}, {pars$range_lb}, {pars$range})"))
}
# initial parameters and control which to fit
pars <- list(h = 1, a = 1, mean = bestfit$lambda, sd = bestfit$sd,
             range_lb = NA, range = NA, n = 100)
# grid search
df_pscf <- expand_grid(range_lb.init = seq.int(-20, -5, 1)/10,
                       range.init = seq.int(10, 40, 2)/10) %>%
  filter(range.init >= 2*abs(range_lb.init)) %>%
  rowwise() %>%
  mutate(opt = list(search_optim(pars, range_lb.init, range.init)))
# summary table
p4_summary <- df_pscf %>%
  hoist(opt, pars = 1, tbl = 2, plot = 3) %>%
  unnest(pars)
p4_summary %>% select(1:2, rss:sd)
df_grid <- p4_summary %>%
  select(range_lb.init, range.init) %>%
  arrange(range.init) %>%
  group_by(range.init) %>%
  mutate(xmin = range_lb.init,
         xmax = -range_lb.init,
         count = row_number(),
         y = 4*range.init + (count-1)*0.1,
         text = "width = " %&% range.init %&% "x")
df_grid %>%
  ggplot(aes(xmin = xmin, xmax = xmin + range.init, y = y)) +
  geom_errorbarh(aes(col = factor(count))) +
  geom_label(aes(2.5, y = y, label = text),
             filter(df_grid, count == 1), hjust = 0) +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  geom_vline(xintercept = 0) +
  annotate("text", x = 0.05, y = 18, label = "best-fitted lambda", hjust = 0) +
  labs(x = "Fitting Range (Relative to Best-Fitted Lambda)", y = "")
The summary table is large and hard to read. It is easier to plot the fitting error as a function of the fitting range, with range_lb (its location) on the horizontal axis and range (its width) shown in different colors:
# plot the overall comparison
p4_summary %>%
  ggplot(aes(abs(range_lb.init), rss, col = fct_rev(factor(range.init)))) +
  geom_line() +
  geom_point() +
  labs(x = "Fitting Lower Bound [# of sd]",
       y = "SSE(lambda)",
       color = "Fitting Range [# of sd]")
As seen in the plot, minimizing the fitting error leads the algorithm to use a very narrow fitting range. When we force it to use a larger fitting range, the fitting error becomes bigger. This clearly explains why the algorithm cannot give us a stable fitted standard deviation sd: there is no global minimum.
# focus on the centered cases
p4_centered <- p4_summary %>%
  filter(range.init == 2*abs(range_lb.init))
p4_centered %>% select(1:2, title, rss:sd)
# summary plot for the centered cases
p4_centered %>%
  ggplot(aes(range.init, sd)) +
  geom_line() +
  geom_point() +
  labs(x = "Fitting Range [unit: # of bestfit standard deviations]",
       y = "Fitted Standard Deviation",
       title = "Fitted sd from the centered cases")
Depending on the initial values of the fitting range (location range_lb and width range), the algorithm can return widely varying estimated (fitted) standard deviations. Even for the cases with forced centered ranges, the obtained fitting results are not stable.
Bootstrapping CI
Bootstrapping is a powerful method for assessing the variability of an estimate by random sampling with replacement: we resample the sample, with the same sample size, with replacement. The central idea is that the bootstrap statistics vary in a similar fashion to the sample statistic of interest. The bootstrap has two major uses (Wood 2009): 1. approximating the shape of a sampling distribution (to form a CI), and 2. estimating the spread of an estimator.
The assumption of bootstrapping is that variation between each resample (bootstrap sample) gives a good idea of sampling error when sampling a real population. Although the resample is not the real population, it is reasonably similar to sampling from the population under this assumption and thus provides a good estimate or test measure of interest. This is the main reason we choose this method in this study.
In general, the workflow of bootstrap confidence intervals is:
resample B sets of bootstrap samples (same sample size n, with replacement)
this can be done from the observations (non-parametric) or from a known distribution (parametric)
calculate B (plug-in) bootstrap estimates of the sample statistic
assume these bootstrap statistics vary in a similar fashion to the sample statistic
use them to approximate the shape of the sampling distribution (to form a CI)
or to obtain the spread of the estimator, \hat{\text{SE}}(\hat{\lambda})
For each bootstrap sample, we can obtain the parameter estimate using parametric method or non-parametric method.
Parametric Bootstrap CI
Based on the physical model, we can estimate the parameter \hat{\lambda} = -\frac{1}{x}\ln y (Equation 1) using each bootstrap sample and calculate its average (Equation 2):
Resample the data with the same sample size n
For each resampled observation, compute the parameter estimate from (x, y)
Compute the average estimate
Repeat N times
Non-Parametric Bootstrap CI
For the non-parametric approach, we use the bootstrap sample to refit a nonlinear regression model and obtain the NLR estimate for the parameter.
Resample the data with the same sample size n
For each resampled data set, fit a nonlinear regression model
Obtain the NLR estimate
Repeat N times
Bootstrap CI Results
We then use the distribution of the bootstrap statistics obtained above to approximate the shape of the sampling distribution and form the CI. Furthermore, for computational efficiency, we combine these two approaches into one loop:
# number of bootstrapping samples
N <- 0.1*1000
# bootstrap for N times
boot <- 1:N %>%
  sapply(\(x){
    # resample: index of rows
    idx <- sample(1:n, n, replace = T)
    # resample data
    df_boot <- df[idx, ] %>%
      # bootstrap A: calculate the lambda for each resampled observation
      mutate(estimate = fmod_inv(y, x))
    # bootstrap B: fit nlr model for each resampled data set
    mod_boot <- nls(y ~ fmod(x, lambda), df_boot, start = c(lambda = lambda_mean))
    # return the estimates
    c(A = mean(df_boot$estimate), B = tidy(mod_boot)$estimate)
  }) %>%
  # transpose and convert into a data frame
  t() %>%
  as_tibble()
# histogram and CI
# save and restore option
op <- par(pty = "m", mfrow = c(2, 1), mar = c(4.2, 4.2, 1, 1))
hist(boot$A, main = "Parametric Bootstrap", xlab = "lambda")
abline(v = quantile(boot$A, probs = c(0.05/2, 1-0.05/2)))
hist(boot$B, main = "Non-parametric Bootstrap", xlab = "lambda")
abline(v = quantile(boot$B, probs = c(0.05/2, 1-0.05/2)))
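The chunk that assembles the bootstrap summary tables (tbl_bootA and tbl_bootB, referenced in the final comparison) is not shown in this extract. A minimal sketch using percentile confidence limits, in the same format as the other comparison tables; the helper name make_boot_tbl is hypothetical:

# a minimal sketch (assumed): summarize the two bootstrap distributions with
# percentile 95% limits, matching the layout of the other comparison tables
make_boot_tbl <- function(est, method){
  ci <- quantile(est, probs = c(0.05/2, 1 - 0.05/2))
  tibble(method = method,
         lambda = mean(est),
         std.error = sd(est),
         lower_0.95 = ci[[1]],
         upper_0.95 = ci[[2]],
         width = upper_0.95 - lower_0.95)
}
tbl_bootA <- make_boot_tbl(boot$A, "Parametric Bootstrap")
tbl_bootB <- make_boot_tbl(boot$B, "Non-parametric Bootstrap")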
Delta Method Normality
The delta method is a theorem that can be used to derive the distribution of a function of an asymptotically normal variable. Combining two powerful statistical ideas, large-sample normality and the delta method, it is very useful for approximating the probability distribution of a function of an asymptotically normal estimator from knowledge of the limiting variance of that estimator (Kelley 1928). The main appeal of this method is that we do not need to know much about the expected value and variance of the function g, thanks to its asymptotic normality.
Univariate Delta Method
If Y \overset{\bullet}{\sim} N(\mu, \sigma^2) and there exists a transformation function g with g'(\mu)\ne 0, then a Taylor series expansion (Taylor approximation) around \mu is used to derive the variation around that point (Doob 1935):

g(Y) \approx g(\mu) + g'(\mu)(Y-\mu) \quad\Longrightarrow\quad g(Y) \overset{\bullet}{\sim} N\!\left(g(\mu),\ g'(\mu)^2\sigma^2\right)
Now consider an i.i.d. sample Y_i and regard g(\bar{Y}) as a function of a sequence of random variables such that Y_n \overset{\bullet}{\sim} N(\mu, \sigma^2/n). For a function g and value \theta_0 where g'(\theta_0) exists and is not 0, we approximate the distribution of g(Y_n) using the delta method:

g(Y_n) \overset{\bullet}{\sim} N\!\left(g(\theta_0),\ g'(\theta_0)^2\,\frac{\sigma^2}{n}\right)
With the estimator of our model parameter \hat{\lambda} (Equation 1), we have g(Y) = -\frac{1}{x}\ln Y and g'(Y) = -\frac{1}{xY}. Therefore, we can approximate the distribution of the model parameter estimator by:

\hat{\lambda} = g(Y) \overset{\bullet}{\sim} N\!\left(-\frac{1}{x}\ln\mu_Y,\ \frac{\sigma_Y^2}{x^2\mu_Y^2}\right)
Note that we use t-distribution to compute the 100(1-\alpha)\% CI.
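The chunk that builds the delta-method summary table tbl_delta1 (referenced in the final comparison) is not shown in this extract. One hedged sketch, assuming the per-observation estimates \hat{\lambda}_i = -\ln(y_i)/x_i are treated as approximately normal, as motivated above, and summarized with a t-based interval:

# a minimal sketch (assumed): per-observation estimates of lambda via the
# transformation g(y) = -log(y)/x, summarized with a t-based 95% interval in
# the same format as the other comparison tables
est_i <- fmod_inv(df$y, df$x)
tbl_delta1 <- tibble(
  method = "Delta Method Normality",
  lambda = mean(est_i),
  std.error = sd(est_i)/sqrt(n),
  lower_0.95 = lambda + qt(0.05/2, n-1)*std.error,
  upper_0.95 = lambda + qt(1-0.05/2, n-1)*std.error,
  width = upper_0.95 - lower_0.95)
tbl_delta1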
Conclusion
In this study, we introduce the algorithm developed by Prof. Hayes to estimate the standard deviation of a model parameter by fitting a Gaussian curve. After implementing the algorithm, we perform an extensive study on a simulated data set to test how well the algorithm estimates the parameter’s standard deviation. We have shown that the obtained fitting results are not stable and depend strongly on the initial values of the fitting parameters. Among them, the parameters associated with the fitting range (the location range_lb and the width range) have the biggest impact on the fitting results.
Even if we force the fitting range to be centered around the best-fitted value, the range of the fitted standard deviation sd is still very wide. This is because the objective function minimized in the algorithm does not have a global minimum.
While we cannot get a reliable estimate of the parameter’s standard deviation using the proposed algorithm, we can still take advantage of statistical methods such as nonlinear regression, bootstrap confidence intervals, and the delta method. These methods are well developed and have been used in many applications. In this study, we have shown that they are capable of producing good estimates of the model parameter.
# comparison - using statistical methods
rbind(tbl_nlr, tbl_bootA, tbl_bootB, tbl_delta1, tbl_truth)
References
Brake, A., and West, R. (2021), “Retrospective dosimetry for nuclear nonproliferation and emergency response,” North Carolina State University.
Doob, J. L. (1935), “The Limiting Distributions of Certain Statistics,” The Annals of Mathematical Statistics, 6, 160–169. https://doi.org/10.1214/aoms/1177732594.
Hayes, R. B., and O’Mara, R. P. (2019), “Retrospective dosimetry at the natural background level with commercial surface mount resistors,” Radiation Measurements, 121. https://doi.org/10.1016/j.radmeas.2018.12.007.
Kelley, T. L. (1928), Crossroads in the mind of man; a study of differentiable mental abilities, Stanford Univ. Press. https://doi.org/10.1037/11206-000.
O’Mara, R. P. (2020), Retrospective dosimetry for nuclear nonproliferation and emergency response, North Carolina State University.