Opening the FITS file.

First, I copied the file of interest to my laptop. The file size is less than 1 MB. Using the (modified) FITSio package, it is easy to read the file.

filename <- "DES0241-3457_r2674p01-y3a1-main-mof-001-nbrsfofs.fits"
fitsfile <- file(description = filename, open = "rb")
readFITSheader(fitsfile)           # Read and dispose of first HDU
## [1] "SIMPLE  =                    T / file does conform to FITS standard             "
## [2] "BITPIX  =                   16 / number of bits per data pixel                  "
## [3] "NAXIS   =                    0 / number of data axes                            "
## [4] "EXTEND  =                    T / FITS dataset may contain extensions            "
## [5] "COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy"
## [6] "COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H "
header <- readFITSheader(fitsfile) # The second HDU contains the table
fitslist <- readFITSbintable(fitsfile, header)
close(fitsfile)
rm(fitsfile)
str(fitslist)
## List of 8
##  $ col     :List of 2
##   ..$ :integer64 [1:61304] 0 1 2 74 3 4 5 6 ... 
##   ..$ :integer64 [1:61304] 1 2 3 4 5 6 7 8 ... 
##  $ hdr     : chr [1:30] "XTENSION" "BINTABLE" "BITPIX" "8" ...
##  $ colNames: chr [1:2] "fofid" "number"
##  $ colUnits: chr [1:2] "" ""
##  $ TNULLn  : int [1:2] NA NA
##  $ TSCALn  : num [1:2] 1 1
##  $ TZEROn  : num [1:2] 0 0
##  $ TDISPn  : chr [1:2] "" ""

Now we can make the data.frame we want:

d <- as.data.frame(fitslist$col)
names(d) <- fitslist$colNames
str(d)
## 'data.frame':    61304 obs. of  2 variables:
##  $ fofid :integer64 0 1 2 74 3 4 5 6 ... 
##  $ number:integer64 1 2 3 4 5 6 7 8 ...
summary(d)
fofid number
Min. : 0 Min. : 1
1st Qu.: 7632 1st Qu.:15327
Median :13876 Median :30652
Mean :12462 Mean :30652
3rd Qu.:18541 3rd Qu.:45978
Max. :18718 Max. :61304

The summary shows that the integer64 columns are never carrying large values, and can safely be turned into integer (32-bit) columns, which are handled better by R.

d <- mutate(d, fofid=as.integer(fofid), number=as.integer(number))

Each row corresponds to an object; the column number carries the identifier for the object, and the column fofid carries the number of the cluster in which that object has been clustered. We have 61304 objects in this file, and they have been grouped into 18719 clusters.

Analysis of clusters

clusters = d %>% group_by(fofid) %>% summarize(nobj=n())
head(clusters)
fofid nobj
0 1
1 13
2 1
3 3
4 1
5 1

The distribution of nobj is extremely heavy-tailed. A logarithmic box-and-whisker plot shows this well.

bwplot(~nobj, clusters, 
       xlab="Number of objects per cluster",
       scales=list(x=list(log=10, equispaced=FALSE)))

A summary of nobj is interesting:

summary(clusters$nobj)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 1 1 3.274961 2 16064

Finally, here are all the clusters with nobj > 100:

clusters %>% 
  filter(nobj>100) %>%
  xyplot(nobj~fofid, .,
         xlab = "cluster ID",
         ylab = "Number of objects per cluster",
         scales=list(y=list(log=10, equispaced=FALSE)))

Frequency of different cluster sizes

freq <- clusters %>% group_by(nobj) %>% summarize(n = n())
freq
nobj n
1 12189
2 2911
3 1300
4 662
5 407
6 242
7 182
8 151
9 121
10 62
11 69
12 54
13 31
14 40
15 35
16 26
17 17
18 21
19 10
20 18
21 11
22 9
23 12
24 10
25 7
26 8
27 9
28 7
29 6
30 7
31 4
32 3
33 2
34 1
35 7
36 4
37 2
38 3
39 1
40 4
41 3
42 2
45 2
46 1
47 2
48 2
49 3
50 1
51 1
53 1
54 3
55 1
56 2
58 1
61 2
62 2
64 1
65 1
68 2
69 1
73 1
84 2
88 2
96 1
97 1
98 1
102 1
108 1
132 1
142 1
182 1
217 1
227 1
249 1
387 1
429 1
544 1
16064 1

A log-log plot shows how close this is to a power law distribution:

logscale <- list(log=10,equispaced=FALSE)
loglog <- list(x = logscale, y = logscale)
xyplot(n~nobj, freq, scales=loglog, grid=TRUE)