First, I copied the file of interest to my laptop. The file size is less than 1 MB. Using the (modified) FITSio package, it is easy to read the file.
filename <- "DES0241-3457_r2674p01-y3a1-main-mof-001-nbrsfofs.fits"
fitsfile <- file(description = filename, open = "rb")
readFITSheader(fitsfile) # Read and dispose of first HDU
## [1] "SIMPLE = T / file does conform to FITS standard "
## [2] "BITPIX = 16 / number of bits per data pixel "
## [3] "NAXIS = 0 / number of data axes "
## [4] "EXTEND = T / FITS dataset may contain extensions "
## [5] "COMMENT FITS (Flexible Image Transport System) format is defined in 'Astronomy"
## [6] "COMMENT and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H "
header <- readFITSheader(fitsfile) # The second HDU contains the table
fitslist <- readFITSbintable(fitsfile, header)
close(fitsfile)
rm(fitsfile)
str(fitslist)
## List of 8
## $ col :List of 2
## ..$ :integer64 [1:61304] 0 1 2 74 3 4 5 6 ...
## ..$ :integer64 [1:61304] 1 2 3 4 5 6 7 8 ...
## $ hdr : chr [1:30] "XTENSION" "BINTABLE" "BITPIX" "8" ...
## $ colNames: chr [1:2] "fofid" "number"
## $ colUnits: chr [1:2] "" ""
## $ TNULLn : int [1:2] NA NA
## $ TSCALn : num [1:2] 1 1
## $ TZEROn : num [1:2] 0 0
## $ TDISPn : chr [1:2] "" ""
Now we can make the data.frame we want:
d <- as.data.frame(fitslist$col)
names(d) <- fitslist$colNames
str(d)
## 'data.frame': 61304 obs. of 2 variables:
## $ fofid :integer64 0 1 2 74 3 4 5 6 ...
## $ number:integer64 1 2 3 4 5 6 7 8 ...
summary(d)
| fofid | number | |
|---|---|---|
| Min. : 0 | Min. : 1 | |
| 1st Qu.: 7632 | 1st Qu.:15327 | |
| Median :13876 | Median :30652 | |
| Mean :12462 | Mean :30652 | |
| 3rd Qu.:18541 | 3rd Qu.:45978 | |
| Max. :18718 | Max. :61304 |
The summary shows that the integer64 columns are never carrying large values, and can safely be turned into integer (32-bit) columns, which are handled better by R.
d <- mutate(d, fofid=as.integer(fofid), number=as.integer(number))
Each row corresponds to an object; the column number carries the identifier for the object, and the column fofid carries the number of the cluster in which that object has been clustered. We have 61304 objects in this file, and they have been grouped into 18719 clusters.
clusters = d %>% group_by(fofid) %>% summarize(nobj=n())
head(clusters)
| fofid | nobj |
|---|---|
| 0 | 1 |
| 1 | 13 |
| 2 | 1 |
| 3 | 3 |
| 4 | 1 |
| 5 | 1 |
The distribution of nobj is extremely heavy-tailed. A logarithmic box-and-whisker plot shows this well.
bwplot(~nobj, clusters,
xlab="Number of objects per cluster",
scales=list(x=list(log=10, equispaced=FALSE)))
A summary of nobj is interesting:
summary(clusters$nobj)
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 3.274961 | 2 | 16064 |
Finally, here are all the clusters with nobj > 100:
clusters %>%
filter(nobj>100) %>%
xyplot(nobj~fofid, .,
xlab = "cluster ID",
ylab = "Number of objects per cluster",
scales=list(y=list(log=10, equispaced=FALSE)))
freq <- clusters %>% group_by(nobj) %>% summarize(n = n())
freq
| nobj | n |
|---|---|
| 1 | 12189 |
| 2 | 2911 |
| 3 | 1300 |
| 4 | 662 |
| 5 | 407 |
| 6 | 242 |
| 7 | 182 |
| 8 | 151 |
| 9 | 121 |
| 10 | 62 |
| 11 | 69 |
| 12 | 54 |
| 13 | 31 |
| 14 | 40 |
| 15 | 35 |
| 16 | 26 |
| 17 | 17 |
| 18 | 21 |
| 19 | 10 |
| 20 | 18 |
| 21 | 11 |
| 22 | 9 |
| 23 | 12 |
| 24 | 10 |
| 25 | 7 |
| 26 | 8 |
| 27 | 9 |
| 28 | 7 |
| 29 | 6 |
| 30 | 7 |
| 31 | 4 |
| 32 | 3 |
| 33 | 2 |
| 34 | 1 |
| 35 | 7 |
| 36 | 4 |
| 37 | 2 |
| 38 | 3 |
| 39 | 1 |
| 40 | 4 |
| 41 | 3 |
| 42 | 2 |
| 45 | 2 |
| 46 | 1 |
| 47 | 2 |
| 48 | 2 |
| 49 | 3 |
| 50 | 1 |
| 51 | 1 |
| 53 | 1 |
| 54 | 3 |
| 55 | 1 |
| 56 | 2 |
| 58 | 1 |
| 61 | 2 |
| 62 | 2 |
| 64 | 1 |
| 65 | 1 |
| 68 | 2 |
| 69 | 1 |
| 73 | 1 |
| 84 | 2 |
| 88 | 2 |
| 96 | 1 |
| 97 | 1 |
| 98 | 1 |
| 102 | 1 |
| 108 | 1 |
| 132 | 1 |
| 142 | 1 |
| 182 | 1 |
| 217 | 1 |
| 227 | 1 |
| 249 | 1 |
| 387 | 1 |
| 429 | 1 |
| 544 | 1 |
| 16064 | 1 |
A log-log plot shows how close this is to a power law distribution:
logscale <- list(log=10,equispaced=FALSE)
loglog <- list(x = logscale, y = logscale)
xyplot(n~nobj, freq, scales=loglog, grid=TRUE)