Introduction

High-quality astronomical images delivered by modern ground-based and space observatories demand adequate, reliable software for their analysis and for the accurate extraction of sources, filaments, and other structures, which contain massive amounts of detailed information about the complex physical processes in space. Multi-wavelength observations with angular resolutions that vary strongly across wavebands require extraction tools that preserve and use the invaluable high-resolution information. Complex fluctuating backgrounds and filamentary structures appear differently on various scales, calling for multi-scale approaches for complete and reliable detection of sources and filaments. The availability of uncalibrated extraction tools of widely varying quality highlights the need to use standard benchmarks for choosing the most reliable and accurate method for astrophysical research.

These pages present getsf, a new method for extracting sources and filaments in astronomical images using separation of their structural components, designed to handle multi-wavelength sets of images and very complex filamentary backgrounds. The method spatially decomposes the original images and separates the structural components of sources and filaments from each other and from their backgrounds, flattening the resulting images. It spatially decomposes the flattened components, combines them over wavelengths, and detects the positions of sources and skeletons of filaments. Finally, getsf measures the detected sources and filaments and creates the output catalogs and images. This universal and fully automated method has a single user-definable free parameter, which minimizes the dependence of its results on the human factor.

These pages describe a realistic multi-wavelength set of benchmark images for testing and comparing source and filament extraction methods to enable the selection of the most reliable and accurate method. The images include a complex fluctuating background cloud, a long dense filament, and many starless and protostellar cores. The benchmark enables conclusive comparisons according to the extraction completeness, reliability, and goodness, as well as the detection and measurement accuracies and the overall quality. Multi-wavelength source and filament extractions with getsf using three variants of the new benchmark with increasing complexity levels show that the new method is superior to older methods.

An improved algorithm for the derivation of high-resolution surface density images from multi-wavelength far-infrared and submillimeter continuum imaging is shown to produce surface densities as sharp as the shortest-wavelength image of sufficient quality, with the angular resolution reaching 5.6" when using the Herschel 70 μm image. If the 70 μm image is noisy, contaminated by PAH emission, or unusable for other reasons, the highest resolution is defined by the 100 or 160 μm images (6.8-11.3"). Such high resolution is useful for detailed studies of the enormous structural diversity in space and hence for a deeper understanding of the physical processes within the sub-structured filaments and their relation to the formation of stars.

Ophiuchus L1688 Star Formation

The following images illustrate the performance of getsf on complex real-life images. The method was applied to the Spitzer λ = 8 μm image of the star-forming region L 1688 in Ophiuchus, which has a 6" resolution. The image displays a quite complex intensity distribution, with a background varying by more than two orders of magnitude and sources situated in both faint and bright background areas. The 8 μm image was downloaded from the Spitzer Heritage Archive (c2d Legacy Program, PI N. Evans II).

Observed image at 8 microns

Component of sources

Background of sources

Component of filaments

Background of filaments

Extracted sources

Cygnus X Star Formation

Illustration of the new algorithm for the derivation of high-resolution surface densities from multi-wavelength observations. The density was derived from 5 images (at 70–500 μm) obtained with the Herschel Space Observatory (HOBYS key project, PIs F. Motte, A. Zavagno, S. Bontemps).

High-res surface density


Extraction Method

The method is universal: it works for any images with non-zero background or noise fluctuations, including images obtained by ALMA and other interferometers. It can also be applied to position-velocity cubes, if they are split into separate images along the velocity axis.
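
As a minimal sketch of the splitting step mentioned above, the Python snippet below cuts a cube held in a NumPy array into separate two-dimensional images, one per velocity channel. It assumes that the velocity axis is the first array axis (which is how a FITS cube with the spectral axis as its third axis usually appears in NumPy). The function name split_cube is hypothetical and is not part of getsf, whose own splitcube utility operates on FITS files.

```python
import numpy as np

def split_cube(cube, velocity_axis=0):
    """Return a list of 2D images, one per channel along the velocity axis."""
    cube = np.asarray(cube)
    if cube.ndim != 3:
        raise ValueError("expected a 3D cube")
    # Move the velocity axis to the front, then take one 2D plane per channel.
    planes = np.moveaxis(cube, velocity_axis, 0)
    return [planes[k].copy() for k in range(planes.shape[0])]

# Synthetic cube for illustration: 4 velocity channels of 5 x 6 pixels.
cube = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)
images = split_cube(cube)
print(len(images), images[0].shape)
```

Each element of the returned list could then be written to its own image file and processed as an ordinary single image.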

Main processing steps of getsf:

  1. Preparation of a complete set of images on the same grid of pixels and optional derivation of high-resolution surface densities and dust temperatures.
  2. Separation of the structural components of sources and filaments from each other and from their backgrounds using spatially decomposed images.
  3. Flattening of the residual noise and background fluctuations in the images of the separated components of sources and filaments.
  4. Combination of the flattened components of sources and filaments over several (or all) wavebands in their spatially decomposed images.
  5. Detection of sources (positions) and filaments (skeletons) in the combined images of the components in their spatially decomposed images.
  6. Measurements of the properties of detected sources and filaments, creation of the multi-wavelength output catalogs and images.
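
Several of the steps above operate on spatially decomposed images. The Python sketch below shows one generic way to decompose an image into single-scale components, by taking differences between successively smoothed copies of the image. It is an illustration under stated assumptions, not the getsf implementation: the Gaussian smoothing kernel, the periodic (FFT) boundary handling, the list of scales, and the function names gaussian_smooth and decompose are all choices made here for the example.

```python
import numpy as np

def gaussian_smooth(image, fwhm):
    """Convolve an image with a Gaussian of the given FWHM (pixels) via FFT.

    The FFT implies periodic image edges (an assumption of this sketch).
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ky = np.fft.fftfreq(image.shape[0])[:, None]
    kx = np.fft.rfftfreq(image.shape[1])[None, :]
    # Fourier transform of a unit-area Gaussian kernel.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (kx ** 2 + ky ** 2))
    return np.fft.irfft2(np.fft.rfft2(image) * transfer, s=image.shape)

def decompose(image, scales):
    """Split an image into single-scale components plus a smooth remainder."""
    image = np.asarray(image, dtype=float)
    components = []
    previous = image
    for fwhm in scales:  # scales must increase
        smoothed = gaussian_smooth(image, fwhm)
        # Structures between the previous scale and the current one.
        components.append(previous - smoothed)
        previous = smoothed
    components.append(previous)  # largest-scale remainder
    return components

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
components = decompose(image, scales=[2.0, 4.0, 8.0, 16.0])
# The components add up to the original image by construction.
print(np.allclose(sum(components), image))
```

Because the differences telescope, the sum of all components reproduces the input image, so no information is lost by working on the decomposed images.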

GETSF Flowchart

The getsf method has just a single, user-definable free parameter: the maximum size of the structures of interest to extract. Internal parameters of getsf have been carefully calibrated and verified in numerous tests using large numbers of diverse images (both simulated and real-life observed ones) to make sure that getsf works in all cases. This approach rests on the belief that high-quality extraction methods for scientific applications must not depend on the human factor. It is the responsibility of the creator of a numerical method to make it as general as possible and to minimize the number of free parameters as much as possible. An internal multi-dimensional parameter space of any complex numerical tool must never be delegated to the end user to explore, if the aim is to obtain consistent and reliable scientific results.

Preparation of images

High-res surface density

Separation of components

Spatial decomposition 4–1400"

Component of sources

Background of sources

Component of filaments

Background of filaments


Numerical Code

The method has been developed as a Bash script, getsf, that executes a suite of FORTRAN utilities doing most of the numerical computations. The code can be installed on Linux or macOS systems with ifort or gfortran. To read and write FITS images, getsf uses the cfitsio library (Pence 1999); for resampling and reprojecting images, it uses swarp (Bertin et al. 2002); for convolving images, it uses the fast Fourier transform routine rlft3 (Press et al. 1992); for determining the source coordinates alpha and delta, it uses xy2sky from wcstools (Mink 2002); for a colored screen output, it uses the utility highlight (by André Simon).

The getsf code is extensively tested, automated, powerful, and flexible. Its latest version can be freely downloaded below and it is also available from the author upon request. When downloading getsf for the first time, users are requested to register.

  • GETSF 201124
  • User Guide 

GETSF Utilities and Scripts

The following list of the getsf utilities and scripts explains their purpose and functions. They are quite useful for command-line image manipulations, even if there is no need for a source or filament extraction. Their usage information is displayed when a utility is run without any parameters; they are sorted in decreasing order of their general usefulness outside getsf.

  • modfits: modify an image or its header in various ways: math transformations; profiling an image along a line; image segmentation; filament skeletonization; removal of connected clusters of pixels; addition or removal of border areas; correction of saturated or bad pixel areas; conversion of intensity units; changes of the header keywords; etc.
  • operate: operate on two input images: addition, subtraction, multiplication, division; relative differencing; minimization or maximization; extension or expansion of masks; copying of an image header; computation of surface densities, temperatures, or intensities; etc.
  • imgstat: display and/or save image statistical quantities; produce mode-, mean-, or median-filtered images; compute images of standard deviations, skewness, kurtosis; etc.
  • fftconv: fast Fourier transform or convolve an image with a few predefined kernels or an external kernel image
  • fitfluxes: fit spectral shapes of source fluxes or image pixel intensities to derive masses or surface densities
  • convolve: convolve an image to a desired lower resolution
  • resample: resample and re-project an image with optional rotation
  • highres: derive high-resolution densities and temperatures
  • prepobs: convert all observed images to the same grid of pixels
  • installg: install getsf on a computer system (macOS, Linux)
  • iospeed: test I/O speed of a hard drive for a specific image
  • readhead: display an image header or save selected keywords
  • cleanbg: interpolate background under footprints of sources
  • ellipses: overlay an image with ellipses of extracted sources
  • sfinder: detect sources in combined clean single-scale images
  • smeasure: measure and catalog properties of detected sources
  • fmeasure: measure and catalog properties of detected filaments
  • finalcat: produce the final multi-wavelength catalogs of sources
  • expanda: expand masked areas of an image to its edges
  • extractx: extract all FITS extensions into separate images
  • splitcube: split a FITS data cube into separate images
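
To illustrate what convolving an image to a lower resolution involves, the sketch below computes the width of the smoothing kernel needed to degrade one resolution to another. It assumes Gaussian beams, for which the widths add in quadrature; this is a general property of Gaussians and not a description of how the convolve utility is implemented. The function name kernel_fwhm is hypothetical.

```python
import math

def kernel_fwhm(native_fwhm, target_fwhm):
    """FWHM of the Gaussian kernel that degrades native_fwhm to target_fwhm.

    Assumes Gaussian beams: target^2 = native^2 + kernel^2.
    """
    if target_fwhm < native_fwhm:
        raise ValueError("target resolution must not be sharper than the native one")
    return math.sqrt(target_fwhm ** 2 - native_fwhm ** 2)

# Example with two resolutions in arcsec (8.4" degraded to 13.5").
print(round(kernel_fwhm(8.4, 13.5), 2))
```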


Benchmarks

The multitude of source extraction methods that have been applied in different astrophysical studies over the last decade indicates an unsatisfactory situation in this critically important area between the observations and their analysis and interpretation. Sometimes quite outdated extraction tools are used in this era of space observatories, even though they were developed for completely different, lower-quality observations. Although source and filament extraction methods are usually tested before publication, the arbitrary and completely independent tests with different components and complexity levels cannot be compared with each other and rarely resemble the reality observed with modern space telescopes. No systematic studies have been published that would compare different methods on the same set of realistic images to guide researchers in their selection of the most appropriate tool for their studies. A detailed comparison of 9 extraction methods that was referred to by Men'shchikov et al. (2012) remains unpublished. Source and filament extraction methods are critically important tools that must be calibrated and validated using the same set of benchmark images with fully known properties of all components before their astrophysical applications. Without a careful selection of the best method using standard benchmarks, independent studies of an observed region with uncalibrated extraction methods of different (unknown) quality are likely to give incompatible results.

The new multi-wavelength benchmark contains simulated Herschel images of a dense background cloud with strong nonuniform fluctuations, a wide dense filament with a power-law intensity profile, and hundreds of radiative transfer models of starless and protostellar cores with wide ranges of sizes, masses, and profiles. The simulated benchmark, in which the parameters of all components are precisely known, allows quantitative analyses of extraction results and conclusive comparisons of different methods in terms of their extraction completeness, reliability, and goodness, along with the detection and measurement accuracies. The benchmark can also serve as a standard benchmark problem for other source and filament extraction methods (whose qualities differ quite widely), allowing researchers to perform their own tests of available methods and choose the most reliable and accurate one for their studies. Therefore, the benchmark images, together with the truth catalogs and the extraction results discussed in this paper, are made available to anyone who wants to make their own comparisons and draw their own conclusions; future methods can also be calibrated using the benchmark and compared with the quality of the older methods.
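
Because the truth catalogs are available, an extraction catalog can be scored by cross-matching it with the known source positions. The sketch below is a simple illustration under stated assumptions: it uses a greedy positional match within a fixed radius and two common textbook quantities, completeness (fraction of true sources recovered) and purity (fraction of extracted sources that match a true one). The benchmark's own definitions of reliability and goodness may differ and are not reproduced here; the function names, the match radius, and the coordinates are hypothetical.

```python
import numpy as np

def count_matches(truth_xy, found_xy, radius):
    """Greedy one-to-one positional match; returns the number of matches."""
    truth_xy = np.asarray(truth_xy, dtype=float).reshape(-1, 2)
    found_xy = np.asarray(found_xy, dtype=float).reshape(-1, 2)
    used = np.zeros(len(truth_xy), dtype=bool)
    matched = 0
    for x, y in found_xy:
        dist = np.hypot(truth_xy[:, 0] - x, truth_xy[:, 1] - y)
        dist[used] = np.inf  # each true source can be matched only once
        if len(dist) and dist.min() <= radius:
            used[dist.argmin()] = True
            matched += 1
    return matched

def completeness_and_purity(truth_xy, found_xy, radius):
    """Return (matched / number of true sources, matched / number extracted)."""
    matched = count_matches(truth_xy, found_xy, radius)
    completeness = matched / len(truth_xy) if len(truth_xy) else 0.0
    purity = matched / len(found_xy) if len(found_xy) else 0.0
    return completeness, purity

# Made-up positions: 4 true sources, 3 extracted (2 real, 1 spurious).
truth = [(10, 10), (20, 20), (30, 30), (40, 40)]
found = [(10.5, 9.8), (29.6, 30.3), (70, 70)]
completeness, purity = completeness_and_purity(truth, found, radius=1.0)
print(completeness, purity)
```

In this made-up example, two of the four true sources are recovered and one of the three extracted sources is spurious.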

Benchmark A

Benchmark A was created by Men'shchikov et al. (2012) to develop the getsources extraction method and was used in Men'shchikov (2017) to illustrate the getimages method for background subtraction and image flattening. This old benchmark contains three simulated components (background, 459 sources, and instrumental noise) and is referred to as A3. Although the background in A3 is not as complex and realistic as in Benchmark B, its sources are allowed to overlap. This makes it suitable for testing how well a source extraction method is able to handle blended sources. Its simpler variant is A2 (no background).

The compressed archives below contain the simulated images at 75, 110, 170, 250, 350, and 500 μm (resolutions of 5, 7, 11, 17, 24, and 35"), the surface density images (resolution of 11"), and the truth catalogs with parameters of all model sources.

Background

Sources

Background+Sources

Benchmark B

The new Benchmark B, used to develop getsf, features a dense and highly variable background cloud, a dense spiral filament with its markedly anisotropic properties, 919 sources with a wide range of fluxes, sizes, and intensity profiles, and uniform noise. In contrast to Benchmark A, the sources do not overlap. Benchmark B with 4 components (background, filament, sources, and noise) is referred to as B4. Simpler variants include B3 (no filament) and B2 (no filament, no background).
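
The relation between the variants can be written down compactly. The sketch below assumes that the benchmark images are simple sums of their component images, which is an assumption made here for illustration; the arrays are random placeholders rather than the actual benchmark data, and the names VARIANTS and build_variant are hypothetical.

```python
import numpy as np

# Components included in each variant, as listed in the text above.
VARIANTS = {
    "B2": ("sources", "noise"),
    "B3": ("background", "sources", "noise"),
    "B4": ("background", "filament", "sources", "noise"),
}

def build_variant(components, name):
    """Sum the component images that belong to the named variant."""
    return sum(components[key] for key in VARIANTS[name])

# Placeholder component images (random numbers, not benchmark data).
rng = np.random.default_rng(1)
components = {key: rng.random((8, 8))
              for key in ("background", "filament", "sources", "noise")}

b3 = build_variant(components, "B3")
b4 = build_variant(components, "B4")
# Under the additivity assumption, B4 and B3 differ only by the filament.
print(np.allclose(b4 - b3, components["filament"]))
```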

The simplest Benchmark B2 should not present any problems to those extraction methods that do not set any upper limit on the sizes of extractable sources and do not assume Gaussian shapes of their intensity profiles. Only the faintest sources are expected to be lost within the noise fluctuations in B2, and measurements of sources are expected to be accurate, although for the faintest sources the inaccuracies are expected to increase somewhat because of the noise fluctuations.

The more complex Benchmark B3 could present serious problems to those extraction methods that are not explicitly designed to handle complex backgrounds. The background fluctuations are strongly nonuniform and they progressively increase in the denser background area, resembling the observed clouds. In comparison with B2, this means that more of the sources are expected to remain undetected in B3 and possibly more spurious sources are expected to be cataloged.

The most complex Benchmark B4 with the dense filamentary background better resembles the extreme complexity of the interstellar clouds revealed by observations, further complicating the source extraction problem. It can be expected that a larger number of sources would vanish in the background in B4 and that measurements of the sources would become less accurate than in the simpler B3 and B2.

The compressed archives below contain the simulated images at 70, 100, 160, 250, 350, and 500 μm (resolutions of 8.4, 9.4, 13.5, 18.2, 24.9, and 36.3"), the surface density images (resolution of 13.5"), and the truth catalogs with parameters of all model sources.

Background

Filament

Background+Filament

Sources

Background+Filament+Sources


Support

The author supports getsf users and helps them get started, with the usual response times ranging from minutes to hours. Please note that the code may be updated in the future and that only the most recent version of getsf is recommended. Use of outdated versions of the code is the responsibility of the user. If apparently abnormal results are found and the user needs assistance in finding the reasons, the author needs a clear description of the problem with all relevant details and log files of the runs. It may also be necessary to re-run the problematic extraction with the latest version.
