Objective of tutorial: Illustrate the basic functionality of dftools by reproducing the HI mass function in Fig. 7 of Westmeier et al. 2017 (https://arxiv.org/pdf/1709.00780.pdf).
Load the dftools library
library(dftools)
Download the HI-mass data of Westmeier et al. 2017 (https://arxiv.org/pdf/1709.00780.pdf):
data = read.table('http://quantumholism.com/dftools/westmeier2017.txt',header=TRUE)
There are 31 galaxies in this sample, hence the dataframe data has 31 rows. This data can be recast into the log-masses x, normally used by dftools. We assume the mass uncertainties to be normal in x and determine their amplitude x.err using linear error propagation. We also define the vector of effective volumes veff.values.
x = log10(data$MHI)
x.err = data$errMHI/data$MHI/log(10)
veff.values = data$Vmax
Now fit these data:
survey = dffit(x,veff.values,x.err)
Let’s plot the fit:
mfplot(survey,xlim=c(2e6,5e10),ylim=c(1e-3,1))
and show the fitted parameters:
dfwrite(survey)
## dN/(dVdx) = log(10)*10^p[1]*mu^(p[3]+1)*exp(-mu), where mu=10^(x-p[2])
## p[1] = -1.315 (+-0.276)
## p[2] = 9.541 (+-0.308)
## p[3] = -1.103 (+-0.147)
The dashed line in the bottom panel shows the effective volume as a function of mass, recovered from the 31 values of veff.values. By default an effective volume of 0 for masses smaller than the smallest observed mass, and identical to the maximum volume for masses larger than the largest observed mass. If a better model is available from survey-specific considerations, then this information can be exploited to improve the fit. In this example, we like to replace the assumtion of \(Veff=0\) for \(x<xmin\), by \(Veff=max(0,(x - 6.53)*75)\). To this end, it suffices to define a list composed of the vector veff.values and a new function veff.function, specifying the effective volume outside the observerd mass range:
veff.function = function(x) {
veff.max = max(veff.values)
return(pmax(0, pmin(veff.max, (x - 6.53) * 75)))
}
selection = list(veff.values,veff.function)
Now fit again:
survey = dffit(x,selection,x.err)
and plot the best-fitting solution
dfwrite(survey)
## dN/(dVdx) = log(10)*10^p[1]*mu^(p[3]+1)*exp(-mu), where mu=10^(x-p[2])
## p[1] = -1.308 (+-0.272)
## p[2] = 9.535 (+-0.305)
## p[3] = -1.097 (+-0.146)
As can be seen the parameters have change very slightly due to the modified effecive volume at the lowest masses. The printed parameters have symmetric Gaussian uncertainties, determined in the Lapace approximation (i.e. by inverting the Hessian matrix of the modified likelihood function). To allow for non-Gaussian parameter posteriors, we now refit the data while bootstrapping it 10^3 times:
survey = dffit(x,selection,x.err,n.bootstrap = 1e3)
Finally, let’s produce the plot with 68% and 95% confidence regions around the best fit. Also change fit color to red, change data color to black, remove posterior data, remove effective volume line, and adjust binning of input data. Then, add HIPASS and ALFALFA lines.
mfplot(survey,xlim=c(2e6,5e10),ylim=c(1e-3,1),uncertainty.type=3,
col.fit='red',col.data.input='black',show.posterior.data=FALSE,
col.veff=NULL,nbins=6,bin.xmin=6.5,bin.xmax=9.5)
x = c(survey$grid$x)
lines(10^x,dfmodel(x,c(log10(6.0e-3),9.80,-1.37)),lty=2,lwd=1.5,col='blue') # HIPASS HIMF
lines(10^x,dfmodel(x,c(log10(4.8e-3),9.96,-1.33)),lty=4,lwd=1.5,col='#00bb00') # ALFALFA HIMF
and write the best-fitting parameters:
dfwrite(survey)
## dN/(dVdx) = log(10)*10^p[1]*mu^(p[3]+1)*exp(-mu), where mu=10^(x-p[2])
## p[1] = -1.308 (+0.264 -0.265)
## p[2] = 9.535 (+0.145 -0.187)
## p[3] = -1.097 (+0.179 -0.133)
Note that there are marginal differences to the uncertainty ranges quoted in the publication, due to a difference in the bootstrapping technique used in the publication (parametric bootstrapping) and the current version of dftools (non-parametric bootstrapping).