FoF Statistics

Opening the FITS file.

First, I copied the file of interest to my laptop. The file size is less than 1 MB. Using the (modified) FITSio package, it is easy to read the file.

filename <- "DES0241-3457_r2674p01-y3a1-main-mof-001-nbrsfofs.fits"
fitsfile <- file(description = filename, open = "rb")
readFITSheader(fitsfile)           # Read and dispose of first HDU

## [1] "SIMPLE  =                    T / file does conform to FITS standard             "
## [2] "BITPIX  =                   16 / number of bits per data pixel                  "
## [3] "NAXIS   =                    0 / number of data axes                            "
## [4] "EXTEND  =                    T / FITS dataset may contain extensions            "
## [5] "COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy"
## [6] "COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H "

header <- readFITSheader(fitsfile) # The second HDU contains the table
fitslist <- readFITSbintable(fitsfile, header)
close(fitsfile)
rm(fitsfile)
str(fitslist)

## List of 8
##  $ col     :List of 2
##   ..$ :integer64 [1:61304] 0 1 2 74 3 4 5 6 ... 
##   ..$ :integer64 [1:61304] 1 2 3 4 5 6 7 8 ... 
##  $ hdr     : chr [1:30] "XTENSION" "BINTABLE" "BITPIX" "8" ...
##  $ colNames: chr [1:2] "fofid" "number"
##  $ colUnits: chr [1:2] "" ""
##  $ TNULLn  : int [1:2] NA NA
##  $ TSCALn  : num [1:2] 1 1
##  $ TZEROn  : num [1:2] 0 0
##  $ TDISPn  : chr [1:2] "" ""

Now we can make the data.frame we want:

d <- as.data.frame(fitslist$col)
names(d) <- fitslist$colNames
str(d)

## 'data.frame':    61304 obs. of  2 variables:
##  $ fofid :integer64 0 1 2 74 3 4 5 6 ... 
##  $ number:integer64 1 2 3 4 5 6 7 8 ...

summary(d)

	fofid	number
	Min. : 0	Min. : 1
	1st Qu.: 7632	1st Qu.:15327
	Median :13876	Median :30652
	Mean :12462	Mean :30652
	3rd Qu.:18541	3rd Qu.:45978
	Max. :18718	Max. :61304

The summary shows that the integer64 columns are never carrying large values, and can safely be turned into integer (32-bit) columns, which are handled better by R.

d <- mutate(d, fofid=as.integer(fofid), number=as.integer(number))

Each row corresponds to an object; the column number carries the identifier for the object, and the column fofid carries the number of the cluster in which that object has been clustered. We have 61304 objects in this file, and they have been grouped into 18719 clusters.

Analysis of clusters

clusters = d %>% group_by(fofid) %>% summarize(nobj=n())
head(clusters)

fofid	nobj
0	1
1	13
2	1
3	3
4	1
5	1

The distribution of nobj is extremely heavy-tailed. A logarithmic box-and-whisker plot shows this well.

bwplot(~nobj, clusters, 
       xlab="Number of objects per cluster",
       scales=list(x=list(log=10, equispaced=FALSE)))

A summary of nobj is interesting:

summary(clusters$nobj)

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
1	1	1	3.274961	2	16064

Finally, here are all the clusters with nobj > 100:

clusters %>% 
  filter(nobj>100) %>%
  xyplot(nobj~fofid, .,
         xlab = "cluster ID",
         ylab = "Number of objects per cluster",
         scales=list(y=list(log=10, equispaced=FALSE)))

Frequency of different cluster sizes

freq <- clusters %>% group_by(nobj) %>% summarize(n = n())
freq

nobj	n
1	12189
2	2911
3	1300
4	662
5	407
6	242
7	182
8	151
9	121
10	62
11	69
12	54
13	31
14	40
15	35
16	26
17	17
18	21
19	10
20	18
21	11
22	9
23	12
24	10
25	7
26	8
27	9
28	7
29	6
30	7
31	4
32	3
33	2
34	1
35	7
36	4
37	2
38	3
39	1
40	4
41	3
42	2
45	2
46	1
47	2
48	2
49	3
50	1
51	1
53	1
54	3
55	1
56	2
58	1
61	2
62	2
64	1
65	1
68	2
69	1
73	1
84	2
88	2
96	1
97	1
98	1
102	1
108	1
132	1
142	1
182	1
217	1
227	1
249	1
387	1
429	1
544	1
16064	1

A log-log plot shows how close this is to a power law distribution:

logscale <- list(log=10,equispaced=FALSE)
loglog <- list(x = logscale, y = logscale)
xyplot(n~nobj, freq, scales=loglog, grid=TRUE)

FoF Statistics

Marc Paterno

10/13/2017

Opening the FITS file.

Analysis of clusters

Frequency of different cluster sizes