False alarm probability of coincidences

This code implements a general formula for the false alarm probability used to estimate the significance of coincidences. The code is available on GitHub. Run git clone https://github.com/mbejger/polgraw-allsky.git to get the repository.

Prerequisites

The code is written in standard C. The only dependency is the GNU Scientific Library (GSL), which is used to manipulate the Fisher matrix (to calculate its eigenvectors and eigenvalues), to evaluate the $\Gamma$ function, and to handle the combinations.

Theoretical description

This description is a shortened version of the Appendix of "Implementation of an F-statistic all-sky search for continuous gravitational waves in Virgo VSR1 data".

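For a given frequency band we analyze $L$ non-overlapping time segments: the search in the $l$th segment produces $N_l$ candidates. The size of the parameter space for each time segment is the same and it can be divided into the number $N_{\rm cell}$ of independent cells. The code tests the null hypothesis that the coincidences among candidates from $L$ segments are accidental. The probability for a candidate event to fall into any given coincidence cell is equal to $1/N_{\rm cell}$. The probability $\epsilon_l$ that a given coincidence cell is populated with one or more candidate events is given by

$$\epsilon_l = 1 - \left(1 - \frac{1}{N_{\rm cell}}\right)^{N_l}.$$

For two or more candidates within a given cell we choose the one with the highest signal-to-noise ratio. The probability $p_F(N_{\rm cell})$ that any given coincidence cell out of the total of $N_{\rm cell}$ cells contains candidate events from $C_{\max}$ or more distinct data segments is given by a generalized binomial distribution:

$$p_F(N_{\rm cell}) = \sum_{n=C_{\max}}^{L} \frac{1}{n!(L-n)!} \sum_{\sigma\in\Pi(L)} \epsilon_{\sigma(1)} \ldots \epsilon_{\sigma(n)} \left(1-\epsilon_{\sigma(n+1)}\right) \ldots \left(1-\epsilon_{\sigma(L)}\right),$$

where $\sum_{\sigma\in\Pi(L)}$ is the sum over all permutations of $L$ data sequences. Finally the probability $P_F$ that there is $C_{\max}$ or more coincidences in one or more of the $N_{\rm cell}$ cells is

$$P_F = 1 - \left(1 - p_F(N_{\rm cell})\right)^{N_{\rm cell}}.$$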

In order to find coincidences the entire cell coincidence grid is shifted by half a cell width in all possible $2^4 = 16$ combinations of the four parameter-space dimensions of $(f, \dot{f}, \delta, \alpha)$, and coincidences are searched in all the 16 coincidence grids. The single-grid formula above does not account for cases when candidate events are located on opposite sides of cell borders, edges, and corners. Searching the shifted grids leads to a higher number of accidental coincidences, and consequently the single-grid formula underestimates the false alarm probability.

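In the four-dimensional parameter space of $(f, \dot{f}, \delta, \alpha)$ the formula for the probability $P_F^{\rm shifts}$ that there are $C_{\max}$ or more independent coincidences in one or more of the cells in all 16 grid shifts is

$$P_F^{\rm shifts} = 1 - \left[1 - \left(2^4 p_F(N_{\rm cell}) - \left(\binom{4}{1} p_F(2 N_{\rm cell}) + \binom{4}{2} p_F(2^2 N_{\rm cell}) + \binom{4}{3} p_F(2^3 N_{\rm cell}) + \binom{4}{4} p_F(2^4 N_{\rm cell})\right)\right)\right]^{N_{\rm cell}}.$$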

By choosing a certain false alarm probability $P_F$, we can calculate the threshold number $C_{\max}$ of coincidences. If we obtain more than $C_{\max}$ coincidences, the null hypothesis that coincidences are accidental is rejected at the significance level of $P_F$.

Compilation

Run make fap; the resulting binary is called fap (modify the Makefile to fit your system).

Full list of switches

To obtain the full list of options, type

% ./fap --help 
Switch       Description
-band        Band number
-cellsize    Cell size (default value: 4)
-data        Coincidence summary file
-grid        Grid matrix directory (default value: .)
-dt          Data sampling time dt (default value: 2)
-threshold   FAP threshold (default value: 0.1)
-nod         Number of days
-vetofrac    Vetoed fraction of the band (default value: 0)

Also:

--help This help

Example

Using the software injection added to 2-day Gaussian noise data segments (see the minimal example of the pipeline):

% ./fap -nod 2 -band 1234 -data <(sort -gk5 -gk10 summary | tail -1) -grid ../../testdata/2d_0.25/004 -vetofrac 0.0 -cellsize 4 -threshold 1.0 

or, with the auxiliary fap.sh script,

% band=1234; bash fap.sh <(sort -gk5 -gk10 summary | tail -1) <(echo $band 0.0) ../../testdata/2d_0.25/004 

Here the number of days in the time segment (nod) is 2, the vetoed fraction of the band (vetofrac) is 0 (no lines, Gaussian data), and the cell size scaling factor (cellsize) is 4. The directory containing the grid matrix file grid.bin of the reference frame (in this case frame 004) is given by the grid switch. The input data is the last line of the sorted summary file, which selects the shift giving the best coincidence with the highest signal-to-noise ratio:

1234_2 1111 308.859375     8     5  9.95663703e-01 -1.10830358e-09 -1.12585347e-01 1.97463002e+00 1.246469e+01 5 2040 1987 1 2483 2419 4 2384 2193 3 2247 2137 8 2408 2363 2 2249 2172 6 2305 2220 7 2226 2191 6 2 8 3 5

(see the coincidences section for details).

Output

The output of

% ./fap -nod 2 -band 1234 -data <(sort -gk5 -gk10 summary | tail -1) -grid ../../testdata/2d_0.25/004 -vetofrac 0.0 -cellsize 4 -threshold 1.0 

is

Number of days in time segments: 2
Input data: /dev/fd/63
Grid matrix data directory: ../../testdata/2d_0.25/004
Band number: 1234 (veto fraction: 0.000000)
The reference frequency fpo: 308.859375
The data sampling time dt: 2.000000
FAP threshold: 1.000000
Cell size: 4
1234 3.088594e+02 3.091094e+02 7.665713e-08 5 17682 9.956637e-01 -1.108304e-09 -1.125853e-01 1.974630e+00 1.246469e+01 2

The last line is printed to stderr if the probability PFshifts is lower than the threshold. Its columns have the following meaning:

#band f_min        f_max        PFshifts     noc Nkall f            s             d             a            snr          hemisphere
1234  3.088594e+02 3.091094e+02 7.665713e-08 5   17682 9.956637e-01 -1.108304e-09 -1.125853e-01 1.974630e+00 1.246469e+01 2
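For post-processing, a result line of this form can be read back with a few lines of C. The sketch below is illustrative only: the fap_result struct and the parse_fap_line function are hypothetical names, the field names follow the column header above, and the eleventh value is taken to be the signal-to-noise ratio because it matches the tenth column of the summary line; that interpretation is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* One result line of fap; field names follow the column header. */
typedef struct {
    int    band;          /* band number */
    double f_min, f_max;  /* frequency range of the band */
    double pfshifts;      /* false alarm probability over all grid shifts */
    int    noc;           /* number of coincidences */
    long   nkall;         /* Nkall column */
    double f, s, d, a;    /* parameters of the coincidence */
    double snr;           /* assumed meaning of the 11th column */
    int    hemisphere;
} fap_result;

/* Parse one whitespace-separated result line.
   Returns 1 if all 12 fields were read, 0 otherwise. */
int parse_fap_line(const char *line, fap_result *r)
{
    assert(line != NULL && r != NULL);
    int n = sscanf(line, "%d %lf %lf %lf %d %ld %lf %lf %lf %lf %lf %d",
                   &r->band, &r->f_min, &r->f_max, &r->pfshifts, &r->noc,
                   &r->nkall, &r->f, &r->s, &r->d, &r->a, &r->snr,
                   &r->hemisphere);
    return n == 12;
}
```

A header line beginning with # is rejected, because the first field does not parse as an integer, so the function can be applied to every line of a captured stderr log.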