Question 2 (Bin shift correction)

We start by creating the data points at each bin centre:

require(tidyverse)

dat <- read.csv("phydat.csv")

data.points.at.bin.centres <- dat[,3]

bin.centres <- (dat[,1]+dat[,2])/2

dat2 <- rbind(bin.centres,data.points.at.bin.centres)

dat2 <- as.tibble(dat2)

names(dat2) <- paste0("Bin # ",seq(1:12))
rownames(dat2) <- c("x","y")

require(knitr)

#kable(dat2)
Data Values at Bin Centres

Data Values at Bin Centres

x. <- as.numeric(dat2[1,])
y. <- as.numeric(dat2[2,])

We estimate the true pdf, by optimizing an exponential function with the given x and y values (bin centres and values at bin centres)

f.x <- function(params,x=x.,true=y.){
  A <- params[1]
  b <- params[2]
  
  ses <- 0
  
  for(i in seq(1:length(x))){
    y <- A*exp(-b*x[i])
    es <- (true[i]-y)^2
    ses <- ses+es
  }
  
  mes <- ses/length(x)
  rms <- sqrt(mes)
  return(rms)
  
}

a <- optim(fn=f.x,par = c(1,0.1),control = c("maxit"=10000000))

true.A <- a$par[1]
true.b <- a$par[2]

plot(x=x.,y=y.,type="p",pch=19,col="red",cex=0.8)
lines(x=x.,y=true.A*exp(-true.b*x.))

We use the following formula to shift the data points in x to better approximate the true pdf:

Bin Shift Correction

Bin Shift Correction

After calculating the bin-shift correction for each original bin centre, we plot it as follows:

Note that the dashed vertical lines are an indication of how much the x-value shifted in each bin (in the direction red -> blue)

delta <- dat[,2]-dat[,1]
  new.x <- c()
  for(i in seq(1:length(x.))){
    
    shif.x <- x.[i]-((true.b*delta[i]^2)/24)
    new.x <- c(new.x,shif.x)
  }


plot(x=x.,y=y.,type="p",pch=19,col="black",cex=1,main="Original values at bin centres (black dots)\n vs. Bin-shift corrected values (white dots)",xlab="PT (GeV)",ylab="dN/dPT (GeV^-1)")
lines(x=x.,y=true.A*exp(-true.b*x.))
points(x=new.x,y=y.,type="p",pch=21,col="black",bg="white",cex=1)
abline(v=x.,col="red",lty="dotted")
abline(v=new.x,col="blue",lty="dotted")

Next, we inspect the error squared values comparing the original values at bin centres with true pdf vs. the error squared vlaues, when having those values at bin-shift corrected x-values

ii <- (true.A*exp(-true.b*x.)-x.)^2
jj <- (true.A*exp(-true.b*x.)-new.x)^2

plot(x=x.,y=ii,type="l",lwd=1,col="red",main="Sum of error squared for original x values (RED),\nvs. bin-shift corrected x-values (BLUE)",ylab = "Error squared",xlab="PT (GeV)")
lines(x=x.,y=jj,lwd=1,col="blue")

It is clear to see from the above plot that the bin-shift corrected x-values have a consistently lower error-squared value in all bins, we illustrate this further by showing the following table:

tableu <- as.data.frame(cbind(x.,new.x,(new.x-x.),ii,jj,(jj-ii)))

names(tableu) <- c("Original x","Corrected x","Magnitude of correction","Original error squared","New error squared","Reduction in error")

#kable(tableu)

save.image("Q2.RData")
Analysis of value of bin shift correction, as assessed by the reduction in error squared values

Analysis of value of bin shift correction, as assessed by the reduction in error squared values

Conclusion:

In every bin, shifting the x values to the left, according to the formula used, results in a reduction in error when estimating the true pdf