The Annals of Applied Statistics 2014, Vol. 8, No. 3, 1690–1712 DOI: 10.1214/14-AOAS750 © Institute of Mathematical Statistics, 2014

AUTOMATIC ESTIMATION OF FLUX DISTRIBUTIONS OF ASTROPHYSICAL SOURCE POPULATIONS

BY RAYMOND K. W. WONG, PAUL BAINES, ALEXANDER AUE,1 THOMAS C. M. LEE2 AND VINAY L. KASHYAP3

University of California, Davis and Harvard–Smithsonian Center for Astrophysics

In astrophysics a common goal is to infer the flux distribution of populations of scientifically interesting objects such as pulsars or supernovae. In practice, inference for the flux distribution is often conducted using the cumulative distribution of the number of sources detected at a given sensitivity. The resulting "log(N > S)–log(S)" relationship can be used to compare and evaluate theoretical models for source populations and their evolution. Under restrictive assumptions the relationship should be linear. In practice, however, when simple theoretical models fail, it is common for astrophysicists to use prespecified piecewise linear models. This paper proposes a methodology for estimating both the number and locations of "breakpoints" in astrophysical source populations that extends beyond existing work in this field. An important component of the proposed methodology is a new interwoven EM algorithm that computes parameter estimates. It is shown that in simple settings such estimates are asymptotically consistent despite the complex nature of the parameter space. Through simulation studies it is demonstrated that the proposed methodology is capable of accurately detecting structural breaks in a variety of parameter configurations. This paper concludes with an application of our methodology to the Chandra Deep Field North (CDFN) data set.

1. Introduction. The relationship between the number of sources and the threshold at which they can be detected is an important tool in astrophysics for describing and investigating the properties of various types of source populations. Known as the log N–log S relationship, the idea is to use the number of sources N(>S) that can be detected at a given sensitivity level S, on the log–log scale, to describe the distribution of source fluxes. In simple settings and under restrictive assumptions a linear relationship between the log-flux and the log-survival function can be derived from first principles. Traditionally, astrophysicists have therefore examined this relationship by characterizing the slope of the log of the empirical survival function as a function of the log-flux of the sources.
Received May 2013; revised April 2014.
1 Supported in part by NSF Grants 1209226 and 1305858.
2 Supported in part by NSF Grants 1007520, 1209226 and 1209232.
3 Supported in part by NASA Contract NAS8-03060 to the Chandra X-Ray Center.

Key words and phrases. Broken power law, CDFN X-ray survey, interwoven EM algorithm, likelihood computations, log N–log S, Pareto distribution.





One of the first examples of the log N–log S relationship being derived from first principles is in Scheuer (1957). It is shown that if radio stars are uniformly distributed in space, then the number with intensity exceeding a threshold S is given by N(>S) \propto S^{-3/2}. Importantly, the relationship holds irrespective of several factors such as luminosity dispersion and the reception pattern of the detector. The derived relationships therefore allow researchers to test for departures from specific theories. For example, Hewish (1961) uses the derived relationship to infer a nonuniform distribution of sources for a particular population. Other examples of log N–log S analyses include Guetta, Granot and Begelman (2005), who use the relationship for Gamma Ray Bursts (GRBs) to constrain the structure of GRB jets. By comparing the log N–log S relationship for observed data to the predicted log N–log S relationship under different physical models for GRB jets, the authors are able to uncover limitations in the physical models. The log N–log S curves have also been used to constrain cosmological parameters using cluster number counts in different passbands; see, for example, Mathiesen and Evrard (1998) and Kitayama, Sasaki and Suto (1998). Other applications of log N–log S modeling include the study of active galactic nuclei (AGNs). For example, Mateos et al. (2008) use the log N–log S relationship over different X-ray bands to constrain the population characteristics of hard X-ray sources. Under independent sampling, the linear log N–log S relationship corresponds to a Pareto distribution for the source fluxes, known to astrophysicists as a power-law model. Despite the unrealistic assumptions in the derivation, the linear log N–log S relationship does have strong empirical support in a variety of contexts, for example, Kenter and Murray (2003). In addition to its simplicity, the power-law model also retains a high degree of interpretability, with the power-law exponent often of direct scientific interest. As a result of this simplicity and interpretability, the power-law model forms the basis of most log N–log S analyses despite its many practical limitations in the ability to fit more complex data sets. To address the limitations of this simple model, astrophysicists have also experimented with a variety of broken power-law models. This is particularly important for larger populations or populations of sources spread over a wide energy range. Mateos et al. (2008) illustrate this by using both a two- and three-piece broken power-law model to capture the structure of the log N–log S distribution across a wide range of energies. The basic idea of broken power-law models is to relax the assumption that the log survival function is a linear function of the log flux, and to instead assume a piecewise linear function. This adds additional challenges in estimating the location of the breakpoint and quantifying the need for the breakpoint model above the simpler single power-law model. While recognizing the need to have more flexible models for log N–log S analyses, most of the work in this area does not provide a coherent means of selecting the location and number of breakpoints. Similarly to the single power-law model, the broken power-law model can be derived from first principles as a mixture of truncated and untruncated Pareto distributions.




The direct physical plausibility of the model is not as complete as for the single power-law model, but the model parameters, in particular the slopes of the log N–log S relationship, can be used to draw conclusions about competing theories. The broken power law provides a useful approximation that can be used to model mixtures of populations of sources, as well as more general piecewise-linear populations. Indeed, the broken power law has empirical support in a variety of contexts both in astrophysics [Kouzu et al. (2013), Mateos et al. (2008)] and outside [Segura, Lazzati and Sankarasubramanian (2013)]. There are many alternative generalizations of the single power law in addition to the broken power law considered in this paper. For example, Ryde (1999) considers a smoothly broken power-law model that avoids the nondifferentiability introduced by the strict broken power-law model. Other alternatives include mixtures of log-normal distributions and power laws with modified tail behavior. In addition to parametric methods, the flux distribution can also be modeled nonparametrically. For the types of applications we are considering here, the main goal is parameter estimation and model selection to distinguish between single and broken power-law models. The scientific interpretability of a nonparametric model for the log(N > S)–log(S) relationship is more complicated than that of the parametric alternative, and such approaches have gained less traction in the astrophysics community in the context of log N–log S analyses. Therefore, while a more flexible nonparametric fit is perhaps statistically preferable, it is not as amenable to downstream science as in other contexts where the goal is prediction rather than estimation. Among all generalizations, the strict broken power law remains the most popular alternative. This popularity is a result of the interpretability of the model and the ease of translation from statistical results to scientific conclusions. Despite the popularity of the broken power-law model in the log N–log S literature, there is currently no widely applicable and statistically rigorous framework for fitting broken power-law models of the log N–log S relationship to astrophysical source populations. In this paper we provide an automatic method for jointly inferring the number and location of breakpoints and the parameters of interest for the log N–log S problem. Our method allows astrophysicists to reliably infer both the number and the location of breakpoints in the log N–log S relationship in a statistically rigorous manner for the first time. This simultaneous fitting introduces new computational challenges, so our method utilizes a new extension of the EM algorithm, known as the interwoven EM algorithm (IEM) [Baines (2010), Baines, Meng and Xie (2014)]. The IEM algorithm provides efficient and stable estimation of the model parameters across a wide range of parameter settings for a fixed number of breakpoints. To determine the number of breakpoints, we then use an additional model selection procedure that employs the power posterior technique of Friel and Pettitt (2008) to accurately compute the log-likelihood of the candidate models. The remainder of the paper is organized as follows. In Section 2 we introduce the necessary background and statistical formulation of the log N–log S model.




Section 3 provides details of our estimation procedure for a fixed number of breakpoints, with Section 4 outlining our model selection procedure to determine the number of breakpoints required. The performance of our method in terms of both parameter estimation and identification of the number of breakpoints is detailed in Section 5. An application to data from the Chandra Deep Field North X-ray survey is provided in Section 6. Large-sample theory is developed in Section 7 and concluding remarks are offered in Section 8. Last, technical details are given in an online supplement [Wong et al. (2014)].

2. Background and problem specification. Let S = (S_1, \dots, S_n)^T denote a vector of the fluxes (in units of erg s^{-1} cm^{-2}) of each of a population of n astrophysical sources. For example, we may be interested in the flux distribution of a selection of n X-ray pulsars located in a specified region of sky at a specified distance. The basic building block of our method is the power-law model:

(1)    N(>S) = \sum_{i=1}^{n} I\{S_i > S\} \approx \alpha S^{-\beta}, \qquad S > \tau.

This specifies that the unnormalized survival function N(>S) is approximately a power of the flux S. The power-law exponent, \beta, is the parameter of primary interest and provides domain-specific knowledge about the source populations. The lower threshold \tau can either be fixed according to the desired sensitivity level or estimated from the data. Equivalently, taking the logarithm of both sides, (1) assumes a linear relationship between log(N(>S)) and log(S):

(2)    \log N(>S) \approx \log(\alpha) - \beta \log(S), \qquad S > \tau.

In a statistical context, the theoretical power-law assumption corresponds to assuming that the source fluxes follow a Pareto distribution:

    S_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\beta, \tau), \qquad i = 1, \dots, n.
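As a quick illustration of this correspondence, the following R sketch (ours, not the paper's released code; all parameter values hypothetical) draws fluxes from a Pareto distribution by inverse-CDF sampling and checks that the empirical log N–log S slope is close to -\beta:

```r
# Minimal sketch: simulate Pareto(beta, tau) fluxes and recover the
# log N - log S slope of equation (2). Values below are hypothetical.
set.seed(1)
n    <- 500
beta <- 1.2       # power-law exponent
tau  <- 5e-17     # lower flux threshold (erg / s / cm^2)
S    <- tau * runif(n)^(-1 / beta)   # inverse-CDF draw: P(S > s) = (tau/s)^beta

S_sorted <- sort(S)
N_ge     <- n:1                      # empirical survival counts N(>= S_sorted)
fit <- lm(log10(N_ge) ~ log10(S_sorted))
coef(fit)[2]                         # approximately -beta
```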

In practice, the linear log N–log S, or Pareto, assumption is not sufficient to describe the log N–log S relationship for many real data sets. There are several ways to generalize (1), the most popular among astrophysicists being the broken power-law model as illustrated in Jordán et al. (2004) and Cappelluti et al. (2007). The starting point of the broken power law is to replace (1) with a monotonically decreasing piecewise linear approximation. In the case of a two-piece model we assume

(3)    \log N(>S) = \begin{cases} \log(\alpha_1) - \beta_1 \log(S), & \tau_1 \le S < \tau_2, \\ \log(\alpha_2) - \beta_2 \log(S), & S \ge \tau_2, \end{cases}

where \beta_1 and \beta_2 are parameters of interest. Note that, as a result of the continuity and normalization constraints on \alpha_1, \alpha_2, \beta_1, \beta_2, \tau_1 and \tau_2, there are a total of 4 free parameters in this expanded two-piece model.




FIG. 1. Example simulations of flux distributions under the single power-law model (left) and the broken power-law model (right). In practice, the fluxes are not directly observed and must be inferred from count data as described in Section 2. For the broken power-law example, the vertical blue line corresponds to the location of the breakpoint.

Applications of the broken power-law model in the astrophysics community typically use either fixed numbers and locations of the breakpoint(s) or selection via ad hoc procedures [Trudolyubov et al. (2002)]. The contribution of this paper is the proposal of an automatic procedure for selecting the number and estimating the locations of the breakpoints jointly with the parameters of interest. Figure 1 depicts the log N–log S relationship for flux distributions simulated under a single power-law (left) and broken power-law model (right). As may be expected, even under a theoretically linear relationship, the empirical log N–log S curve regularly exhibits nonlinear features in the log N–log S space. Depending on the difference in the power-law slopes, the breakpoint may be clearly visible or indistinguishable by eye. In either case, it should be noted that much larger variations in the log N–log S relationship are to be expected in the lower right part of the curves as a result of the log–log scaling. As will be seen in Section 2.1, the task of estimating the parameters controlling the flux distribution and/or detecting a breakpoint is additionally challenging because the fluxes depicted in Figure 1 are not directly observed.

2.1. Hierarchical modeling of the log N–log S relationship. We now describe the connection between the broken power-law model introduced in (3) and the observed data. In practice, the flux of each source, S_i, is not observed directly. Instead, we observe a Poisson-distributed photon count whose intensity is a known function of the parameter S_i. Let Y_1, Y_2, \dots, Y_n denote the source counts.




We then assume the following hierarchical model. For i = 1, \dots, n,

(4)    Y_i \mid S_1, \dots, S_n \overset{\text{indep.}}{\sim} \text{Poisson}(A_i S_i + b_i), \qquad S_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}_B(\beta, \tau),

where the A_i's and b_i's are known constants (see below), \beta = (\beta_1, \dots, \beta_B) > 0, \tau = (\tau_1, \dots, \tau_B) such that \tau_B > \cdots > \tau_1 > 0, and \text{Pareto}_B(\beta, \tau) represents a B-piece Pareto distribution with survival function

    S_B(x) = \begin{cases} 1, & x < \tau_1, \\ (\tau_1/x)^{\beta_1}, & \tau_1 \le x < \tau_2, \\ (\tau_1/\tau_2)^{\beta_1} (\tau_2/x)^{\beta_2}, & \tau_2 \le x < \tau_3, \\ \quad \vdots & \quad \vdots \\ \bigl[\prod_{j=1}^{B-1} (\tau_j/\tau_{j+1})^{\beta_j}\bigr] (\tau_B/x)^{\beta_B}, & x \ge \tau_B, \end{cases}

and thus its distribution function F_B(\cdot) = 1 - S_B(\cdot). Note that the B-piece Pareto distribution corresponds to the broken power law. The probability density f_B can be easily found by differentiation. When B = 1, the B-piece Pareto distribution reduces to a Pareto distribution with probability density function

    f_1(x; \beta, \tau) = \begin{cases} \beta \tau^{\beta} / x^{\beta + 1}, & x \ge \tau, \\ 0, & x < \tau. \end{cases}
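For concreteness, here is a small R sketch (ours, with the hypothetical parameter values of setting (2) in Section 5) of the B-piece survival function S_B(x) defined above:

```r
# Sketch of the B-piece Pareto survival function S_B(x); beta and tau are
# length-B vectors with tau strictly increasing.
pareto_b_surv <- function(x, beta, tau) {
  B <- length(beta)
  sapply(x, function(xx) {
    if (xx < tau[1]) return(1)
    j <- findInterval(xx, tau)      # segment j with tau_j <= xx < tau_{j+1}
    head_prod <- if (j > 1) prod((tau[1:(j - 1)] / tau[2:j])^beta[1:(j - 1)]) else 1
    head_prod * (tau[j] / xx)^beta[j]
  })
}

beta <- c(0.5, 3)
tau  <- c(1e-17, 5e-17)
pareto_b_surv(c(2e-17, 1e-16), beta, tau)
```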

In the above, the A_i's, sometimes known as effective areas, represent sensitivities of the detector, while the b_i's represent background intensities. With the above model the goal is then to estimate B and, at the same time, \beta and \tau. At first sight, this seems to be a straightforward statistical problem: for a fixed B, maximum likelihood estimation can be used to estimate \beta and \tau, while the issue of choosing B can be viewed as a model selection problem and, thus, traditional ideas such as AIC and BIC can be used. However, as will be seen below, practical implementation of these ideas poses serious computational challenges that cannot be easily solved.

3. Maximum likelihood estimation when B is known. In this section we provide details of how to obtain maximum likelihood estimates of \beta and \tau for a fixed number of breakpoints B in the log N–log S model. Defining \beta_0 = 0, \tau_0 = 1 and \tau_{B+1} = \infty, the likelihood is

    L(\beta, \tau; Y_1, \dots, Y_n) = \prod_{i=1}^{n} \int_{\tau_1}^{\infty} e^{-(A_i s + b_i)} \frac{(A_i s + b_i)^{Y_i}}{Y_i!} f_B(s; \beta, \tau)\, ds.
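Each factor of this likelihood is an integral with no closed form. As a purely illustrative sketch (with made-up, unit-rescaled numbers rather than the paper's data), one factor can be evaluated by quadrature for B = 1:

```r
# One likelihood factor for B = 1, computed by numerical quadrature.
# Fluxes are rescaled to order-one units here purely to keep integrate()
# numerically comfortable; all values are hypothetical.
lik_one <- function(y, A, b, beta, tau) {
  integrand <- function(s) dpois(y, A * s + b) * beta * tau^beta / s^(beta + 1)
  integrate(integrand, lower = tau, upper = Inf)$value
}
lik_one(y = 150, A = 100, b = 10, beta = 1, tau = 1)
```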




Note that the likelihood involves some numerically unstable integrals that do not have a closed-form solution and, hence, a direct maximization is extremely difficult. To further appreciate this difficulty, consider the case when there is no background contamination (b_i = 0), for which the above likelihood degenerates to

    \prod_{i=1}^{n} \sum_{j=1}^{B} \Bigl[\prod_{l=1}^{j-1} \Bigl(\frac{\tau_l}{\tau_{l+1}}\Bigr)^{\beta_l}\Bigr] \frac{\beta_j (A_i \tau_j)^{\beta_j}}{Y_i!} \bigl\{\Gamma(Y_i - \beta_j, A_i \tau_j) - \Gamma(Y_i - \beta_j, A_i \tau_{j+1})\bigr\}.

Here, \Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\, dt is the (upper) incomplete gamma function, which is numerically unstable, particularly when the first argument is large. Together with the inner summation in the above expression, these issues make a direct maximization of the (log-)likelihood difficult even when there is no background contamination. To address these issues, we propose an EM algorithm [Dempster, Laird and Rubin (1977)] to find the maximum likelihood estimators of \beta and \tau for the general case of b_i \ge 0.
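In R, for instance, the upper incomplete gamma function can be assembled from pgamma() and gamma() for a > 0, and the instability mentioned above is easy to reproduce (a sketch, not part of the authors' implementation):

```r
# Upper incomplete gamma Gamma(a, x) for a > 0 via base R.
inc_gamma <- function(a, x) gamma(a) * pgamma(x, shape = a, lower.tail = FALSE)

inc_gamma(5.5, 2)   # well behaved for small a
gamma(200)          # Inf: the overflow that destabilizes direct evaluation
lgamma(200)         # the log-scale route stays finite
```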

3.1. EM with a sufficient augmentation scheme. The EM algorithm [Dempster, Laird and Rubin (1977)] has long been popular for its monotone convergence and resulting stability, and is therefore well suited to our context. As always, the EM algorithm must be formulated in terms of "missing data" or auxiliary variables that must be integrated out to obtain the observed data log-likelihood. For the current problem, since we are interested only in inference for \beta and \tau, marginalizing over the uncertainty in the individual fluxes, it is natural to treat S = (S_1, \dots, S_n)^T as the missing data. Since S is a sufficient statistic for \theta = (\beta, \tau)^T, we call this the sufficient augmentation (SA) scheme in the terminology of Yu and Meng (2011). Let Y = (Y_1, \dots, Y_n)^T. The complete data log-likelihood of (Y, S) is

    \log p(Y, S; \beta, \tau) = \sum_{i=1}^{n} \log g(Y_i; A_i S_i + b_i) + \sum_{i=1}^{n} \log f_B(S_i; \beta, \tau),

where g(x; \lambda) is the probability mass function of a Poisson distribution with mean \lambda. In the E-step of the algorithm we compute the conditional expectation

(5)    Q(\theta \mid \theta^{(k)}) = E[\log p(Y, S; \theta) \mid Y; \theta^{(k)}] = \sum_{i=1}^{n} E[\log g(Y_i; A_i S_i + b_i) \mid Y_i; \theta^{(k)}] + \sum_{i=1}^{n} E[\log f_B(S_i; \theta) \mid Y_i; \theta^{(k)}],

where \theta^{(k)} denotes the estimate of \theta at the kth iteration. The M-step of the algorithm must then maximize Q(\theta \mid \theta^{(k)}) with respect to \theta. Since the first term of (5) does not depend on \theta, it can be ignored in our maximization.




Algorithm 1 SAEM: EM with the sufficient augmentation scheme
(1) Choose a starting value \theta^{(0)} and set k = 0.
(2) Generate S^{(1)}, \dots, S^{(N_sim)} from p(S \mid Y; \theta^{(k)}) using the following Metropolis–Hastings algorithm. For each simulation of S, we sample the elements of S one at a time. Suppose S = (S_1, \dots, S_n) is the current draw. Denote S' = (S_1, \dots, S_{j-1}, S'_j, S_{j+1}, \dots, S_n), where S'_j is drawn from \text{Pareto}_B(\beta^{(k)}, \tau^{(k)}). We accept this S' as the new value with probability a_j(S, S'); otherwise, we retain S. The acceptance probability is given by

    a_j(S, S') = \min\Bigl\{1, \frac{g(Y_j; A_j S'_j + b_j)}{g(Y_j; A_j S_j + b_j)}\Bigr\}.

(3) Find the maximizer \tilde{\theta} of the Monte Carlo estimate of Q(\theta \mid \theta^{(k)}). This is equivalent to computing

    \tilde{\theta} = \arg\max_{\theta} \frac{1}{N_{sim} - N_{burn}} \sum_{s=N_{burn}+1}^{N_{sim}} \sum_{i=1}^{n} \log f_B(S_i^{(s)}; \theta),

where N_burn is the number of burn-in draws. As discussed above, \tilde{\theta} can be obtained by the following steps: (a) set \tilde{\tau}_1 = \min\{S_i^{(s)} : i = 1, \dots, n, s = N_{burn}+1, \dots, N_{sim}\}; (b) obtain \tilde{\tau}_2, \dots, \tilde{\tau}_B as the maximizer of \sum_{j=1}^{B} m_j(\tau) \log \hat{\beta}_j(\tau), where \tau = (\tilde{\tau}_1, \tau_2, \dots, \tau_B), using the Nelder–Mead algorithm; and (c) set \tilde{\beta}_j = \hat{\beta}_j(\tilde{\tau}) using (6), for j = 1, \dots, B.
(4) Set \theta^{(k+1)} = \tilde{\theta}.
(5) Repeat steps (2) to (4) until convergence.
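To make step (2) concrete, a single Metropolis–Hastings sweep might look like the following R sketch for B = 1, where the Pareto proposal is easy to draw (a hypothetical helper, not the authors' code):

```r
# One MH sweep over the fluxes, per step (2) of Algorithm 1 (B = 1 case).
mh_sweep <- function(S, Y, A, b, beta, tau) {
  for (j in seq_along(S)) {
    S_prop  <- tau * runif(1)^(-1 / beta)   # proposal from the Pareto prior
    log_acc <- dpois(Y[j], A[j] * S_prop + b[j], log = TRUE) -
               dpois(Y[j], A[j] * S[j]  + b[j], log = TRUE)
    if (log(runif(1)) < log_acc) S[j] <- S_prop   # accept w.p. a_j(S, S')
  }
  S
}
```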

For the second term, as it does not admit a closed-form expression, a Monte Carlo method is used to approximate it. The basic idea is to estimate it by the mean of a suitable Monte Carlo sample of the S_i's, as described in Algorithm 1. Without the first term in (5), the maximization of Q(\theta \mid \theta^{(k)}) is equivalent to finding the MLE of \theta = (\beta, \tau)^T from an i.i.d. sample X = (X_1, \dots, X_m) from the \text{Pareto}_B(\beta, \tau) distribution. The log-likelihood of X is

    l(\theta; X) = \sum_{j=1}^{B} \beta_j (n_j \log \tau_j - n_{j+1} \log \tau_{j+1}) + \sum_{j=1}^{B} m_j \log \beta_j - \sum_{j=1}^{B} \beta_j \sum_{i \in A_j} \log X_i - \sum_{i=1}^{m} \log X_i,




where n_j = \text{card}\{i : X_i \ge \tau_j\}, n_{B+1} = 0, m_j = n_j - n_{j+1}, \tau_{B+1} = \infty, n_{B+1} \log \tau_{B+1} is defined to be 0, and A_j = \{i : \tau_j \le X_i < \tau_{j+1}\}. Note that the n_j's and m_j's are functions of \tau. For any fixed \tau, straightforward algebra shows that l(\theta; X) is maximized when \beta_j is set to

(6)    \hat{\beta}_j(\tau) = m_j(\tau) \Bigl\{\sum_{i \in A_j} \log X_i + n_{j+1}(\tau) \log \tau_{j+1} - n_j(\tau) \log \tau_j\Bigr\}^{-1},

j = 1, \dots, B. By substituting the above expression, l(\theta; X) becomes

(7)    l(\tau; X) = -m - \sum_{i=1}^{m} \log X_i + \sum_{j=1}^{B} m_j(\tau) \log \hat{\beta}_j(\tau).

Therefore, to obtain the MLE for \theta = (\beta, \tau)^T from X, one can first maximize l(\tau; X) in (7) with respect to \tau, and then plug the corresponding maximizer \hat{\tau} (i.e., the MLE of \tau) into (6) to obtain the MLE for \beta. The MLE of \tau_1 is \hat{\tau}_1 = \min(X_1, \dots, X_m), while unfortunately the MLEs for \tau_2, \dots, \tau_B do not admit closed-form expressions. Further, (7) is not a continuous function in \tau and, therefore, traditional optimization methods that require function derivatives (e.g., Newton-like methods) cannot be applied here. We have experimented with various optimization algorithms and found that the Nelder–Mead algorithm works well for this problem.
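The following R sketch (again ours, with hypothetical inputs) implements (6) and the profiled log-likelihood (7); the outer maximization over the remaining breakpoints would then be handed to a derivative-free optimizer such as optim(..., method = "Nelder-Mead"):

```r
# Profile log-likelihood (7): given breakpoints tau (tau[1] fixed at min(X)),
# compute the closed-form beta-hats of (6) and plug them back in.
profile_loglik <- function(tau, X) {
  B      <- length(tau)
  seg    <- findInterval(X, c(tau, Inf))        # segment membership A_j
  m_j    <- tabulate(seg, nbins = B)            # m_j = #A_j
  n_j    <- rev(cumsum(rev(m_j)))               # n_j = #{X_i >= tau_j}
  n_next <- c(n_j[-1], 0)                       # n_{j+1}, with n_{B+1} = 0
  sum_lx <- vapply(seq_len(B), function(j) sum(log(X[seg == j])), numeric(1))
  log_tn <- c(log(tau[-1]), 0)                  # n_{B+1} log tau_{B+1} := 0
  beta_hat <- m_j / (sum_lx + n_next * log_tn - n_j * log(tau))
  # (in practice, empty segments with m_j = 0 must be penalized or excluded)
  -length(X) - sum(log(X)) + sum(m_j * log(beta_hat))
}
# e.g. optim(tau2_init, function(t) -profile_loglik(c(min(X), sort(t)), X),
#            method = "Nelder-Mead")
```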

The major steps of the EM algorithm in the SA scheme (SAEM) for finding the MLEs of \theta are given in Algorithm 1. In practice, the SAEM algorithm often converges very slowly. Section 3.4 below provides some illustrative numerical examples.

3.2. EM with an ancillary augmentation scheme (AAEM). Given the slow convergence of the SAEM algorithm, we seek faster alternatives. This subsection proposes an alternative EM algorithm that is based on an ancillary augmentation (AA) scheme, called the AAEM algorithm. For a discussion of augmentation schemes and their use in EM, see Baines, Meng and Xie (2014). The basis of our AAEM is to re-express our model using auxiliary variables U_i = F_B(S_i; \theta):

    Y_i \mid U_1, \dots, U_n \overset{\text{indep.}}{\sim} \text{Poisson}\bigl(A_i F_B^{-1}(U_i; \theta) + b_i\bigr) \quad \text{and} \quad U_i \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, 1),

for i = 1, \dots, n. Here U = (U_1, \dots, U_n) is treated as the missing data and preserves the observed data log-likelihood. In the E-step we then calculate the conditional expectation

(8)    Q(\theta \mid \theta^{(k)}) = \sum_{i=1}^{n} E\bigl[\log g\bigl(Y_i; A_i F_B^{-1}(U_i; \theta) + b_i\bigr) \mid Y_i; \theta^{(k)}\bigr].
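The AA scheme requires the quantile function F_B^{-1}. A small R sketch (ours, obtained by inverting the piecewise survival function segment by segment, with hypothetical parameter values) is:

```r
# Quantile function F_B^{-1}(u; theta) of the B-piece Pareto distribution.
pareto_b_quantile <- function(u, beta, tau) {
  B <- length(beta)
  surv_knots <- cumprod(c(1, (tau[-B] / tau[-1])^beta[-B]))  # S_B at tau_1..tau_B
  sapply(u, function(uu) {
    s <- 1 - uu                                 # target survival level
    j <- max(which(surv_knots >= s))            # segment containing the quantile
    tau[j] * (surv_knots[j] / s)^(1 / beta[j])
  })
}

beta <- c(0.5, 3); tau <- c(1e-17, 5e-17)
pareto_b_quantile(runif(3), beta, tau)   # doubles as a Pareto_B sampler
```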




Algorithm 2 AAEM: EM with the ancillary augmentation scheme
(1) Choose a starting value \theta^{(0)} and set k = 0.
(2) Generate U^{(1)}, \dots, U^{(N_sim)} from p(U \mid Y; \theta^{(k)}) using the Metropolis–Hastings algorithm. For each simulation of U, we sample the elements of U one at a time. Let U = (U_1, \dots, U_n) be the previous draw. We denote U' = (U_1, \dots, U_{j-1}, U'_j, U_{j+1}, \dots, U_n), where U'_j is drawn from Uniform(0, 1). We accept this U' as the new value with probability b_j(U, U'); otherwise, we retain U. The acceptance probability is given by

    b_j(U, U') = \min\Bigl\{1, \frac{g\bigl(Y_j; A_j F_B^{-1}(U'_j; \theta^{(k)}) + b_j\bigr)}{g\bigl(Y_j; A_j F_B^{-1}(U_j; \theta^{(k)}) + b_j\bigr)}\Bigr\}.

(3) Find the maximizer \tilde{\theta} of the following Monte Carlo estimate of Q(\theta \mid \theta^{(k)}):

    \frac{1}{N_{sim} - N_{burn}} \sum_{s=N_{burn}+1}^{N_{sim}} \sum_{i=1}^{n} \log g\bigl(Y_i; A_i F_B^{-1}(U_i^{(s)}; \theta) + b_i\bigr).

The maximization can be done, for example, with the Nelder–Mead algorithm.
(4) Set \theta^{(k+1)} = \tilde{\theta}.
(5) Repeat steps (2) to (4) until convergence.

This conditional expectation can be approximated and maximized in a similar manner as for Q(\theta \mid \theta^{(k)}) in the SAEM algorithm. The resulting AAEM algorithm is summarized in Algorithm 2. Section 3.4 provides some empirical comparisons between the AAEM and SAEM algorithms. As may be expected, there are some situations where the AAEM algorithm converges faster, while there are other situations where the SAEM algorithm converges faster.

3.3. Interwoven EM (IEM). In practice, choosing the more efficient of the SAEM and AAEM algorithms requires knowledge of the unknown parameter values and the theoretical convergence rates, neither of which is available in most contexts. Therefore, it would instead be desirable if one could combine the "best parts" of SAEM and AAEM rather than select one of them. One simple way to combine the two algorithms is to use the so-called alternating EM (AEM) algorithm. The AEM algorithm proceeds by using SAEM for the first iteration, then uses AAEM for the second iteration, followed by SAEM for the third, and so on. While this procedure tends to "average" the performance of the two algorithms, a more sophisticated way to combine them is to use the interwoven EM (IEM) algorithm of Baines, Meng and Xie (2014). Theoretical and empirical results show that IEM typically achieves sizeable performance gains over the component EM algorithms. The key to the boosted performance of IEM is that it utilizes the joint structure of the two augmentation schemes through a special "IE-step." In contrast, AEM simply performs sequential updates using each augmentation scheme, making no use of this joint information. The theory of the IEM algorithm in Baines, Meng and Xie (2014) shows that the rate of convergence of IEM depends on the "correlation" between the two component augmentation schemes. Since the SA and AA schemes typically have low correlation, here we interweave these two schemes to produce an IEM algorithm for estimating the parameters of flux distributions. The IEM algorithm for our log N–log S model is given in Algorithm 3.




Algorithm 3 IEM: interwoven EM
(1) Choose a starting value \theta^{(0)} and set k = 0.
(2) Execute steps (2) and (3) of the SAEM algorithm. Set \theta^{(k+0.5)} = \tilde{\theta}.
(3) Execute step (3) of the AAEM algorithm, with U^{(l)} generated as U_j^{(l)} = F_B(S_j^{(l)}; \theta^{(k+0.5)}), for j = 1, \dots, n and l = N_{burn}+1, \dots, N_{sim}. Set \theta^{(k+1)} = \tilde{\theta}.
(4) If convergence is achieved or k attains N_limit, then declare \theta^{(k+1)} to be the MLE; otherwise, set k = k + 1 and return to step (2).

The algorithm requires very minimal computation in addition to the component SAEM and AAEM algorithms, and so is comparable in real-time per-iteration speed. Last, we note that there is some freedom in how to combine the IEM algorithm with Monte Carlo methods. Specifically, there are variations in how one may choose to implement step (3). One may want to sample U again instead of reusing the samples from step (2). In both cases, one obtains a sample from U \mid Y, \theta^{(k+0.5)} and achieves the goal. From our practical experience, we found that there is very little difference between the performances of these two approaches. Thus, we choose the one which is least computationally expensive.
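For B = 1 the interweaving map in step (3) is available in closed form, as the following toy R lines illustrate (stand-in draws, not output of the actual sampler):

```r
# Step (3) interweaving for B = 1: push SAEM flux draws through the current
# CDF, U = F_1(S; beta, tau) = 1 - (tau / S)^beta.
fb_cdf_1 <- function(s, beta, tau) 1 - (tau / s)^beta

S_draws <- 5e-17 * matrix(runif(300), 100, 3)^(-1 / 1.2)  # stand-in Pareto draws
U_draws <- fb_cdf_1(S_draws, beta = 1.2, tau = 5e-17)
range(U_draws)    # inside (0, 1), as required of the AA missing data
```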




3.4. An empirical comparison among different EM algorithms. In this subsection we empirically compare the convergence speeds of SAEM, AAEM, AEM and IEM by applying them to two simulated data sets. These two data sets were simulated from a model with B = 1 and no background contamination counts. This model is somewhat simple, but the advantage is that the likelihood function simplifies considerably, and the corresponding maximum likelihood estimates can be reliably obtained with non-EM methods. With these maximum likelihood estimates, the maximized log-likelihood value can be calculated and used for baseline comparisons.

FIG. 2. Plots of negative log-likelihood values for different EM algorithms. In each plot the horizontal dashed line indicates the negative log-likelihood evaluated at the maximum likelihood estimates. (a) Simulated data set 1; (b) simulated data set 2.

In Figure 2(a), for the first simulated data set, we plot the negative log-likelihood values of the SAEM, AAEM, AEM and IEM estimates evaluated at different iterations. One can see the slow convergence speeds of SAEM and AAEM, with SAEM being the slower. Also, both AEM and IEM converged relatively fast, with IEM being the faster. When compared to AEM, IEM utilizes the relationship between SAEM and AAEM at each step, which leads to the superiority of IEM. As noted earlier, the convergence rate of IEM is heavily influenced by the "correlation" between the two data augmentation schemes being interwoven, that is, the SA and AA schemes for this example. For the log N–log S model the correlation between these augmentation schemes is hard to estimate exactly, but it appears empirically that the SA and AA schemes have a reasonably high correlation, thus preventing IEM from outperforming AEM by a larger amount. This is likely due to \tau, which controls the boundary of the parameter space and heavily impacts the rate of convergence. However, among the candidate algorithms IEM yields the best convergence properties. We repeat the same plot in Figure 2(b) for the second simulated data set. This time the relative speeds of SAEM and AAEM switched, that is, SAEM converged faster. This illustrates that neither SAEM nor AAEM is uniformly superior to the other across all data sets. The relative rates of convergence of AEM and IEM remain the same for these two data sets and across other simulated data sets (not shown). Overall, from these two plots one can see that the IEM algorithm is the most efficient and robust. Also, when compared to AEM, it is computationally faster owing to the skipping of an extra sampling step. Similar performance was observed across a wide range of simulation settings. Therefore, we recommend using the IEM algorithm to compute the maximum likelihood estimates when B is known.

4. Automated choice of B. This section addresses the important problem of selecting the number of "pieces," B, in the broken-Pareto model. Since this problem can be seen as a model selection problem, one can adopt well-studied methods such as AIC and BIC to solve it. To proceed, we first note that when B \ge 1, the number of free parameters in the model is 2B.




With AIC, the best B is chosen as

    \hat{B}_{AIC} = \arg\min_B \text{AIC}(B) = \arg\min_B \bigl\{-2 \log L(\hat{\beta}, \hat{\tau}; Y_1, \dots, Y_n) + 4B\bigr\},

while for BIC, B is chosen as

    \hat{B}_{BIC} = \arg\min_B \text{BIC}(B) = \arg\min_B \bigl\{-2 \log L(\hat{\beta}, \hat{\tau}; Y_1, \dots, Y_n) + 2B \log n\bigr\}.
Despite the straightforward definitions, in practice the numerical instability of the likelihood function makes computation of AIC(B) and BIC(B) very challenging. To address this problem, we adopt the so-called power posterior method proposed by Friel and Pettitt (2008) to approximate the log-likelihood directly. In our context, the power posterior is defined as

    p_t(S \mid Y; \theta) \propto p(Y \mid S)^t p(S; \theta) \qquad \text{for } 0 \le t \le 1.

In addition, define

    z(Y \mid t) = \int_{\mathbb{R}^n} p(Y \mid s)^t p(s; \theta)\, ds

and, for simplicity, write the likelihood as p(Y) = L(\beta, \tau; Y_1, \dots, Y_n). The following equality is crucial to this method:

    \log p(Y) = \log \frac{z(Y \mid t = 1)}{z(Y \mid t = 0)} = \int_0^1 E\bigl[\log\{p(Y \mid S)\} \mid Y; \theta, t\bigr]\, dt,

where the last expectation (inside the integral) is taken with respect to the power posterior p_t(S \mid Y; \theta). The idea is as follows. First, for any given t, Monte Carlo methods can be applied to sample from the power posterior and approximate the expectation. Once a sufficient number of these expectations (corresponding to different values of t) have been calculated, numerical methods can be used to approximate the integral, which equals the log-likelihood. Since this method approximates the log-likelihood directly (i.e., without computation of the likelihood itself), it is numerically quite stable. The detailed algorithm is presented as Algorithm 4. This algorithm provides a reliable method for approximating the log-likelihood for a given value of \theta. A natural question to ask is then: can we not simply obtain the MLE of \theta by directly maximizing this log-likelihood approximation via, say, Newton's method? The answer, in principle, is yes, but the IEM algorithm is still preferred, mainly because the estimates from IEM are generally more stable and reliable. Moreover, the power posterior approximation to the log-likelihood is computationally intensive if one wants to obtain an accurate estimate. For these reasons, we only use the power posterior approximation to estimate the log-likelihood evaluated at the MLE obtained by the IEM algorithm.




Algorithm 4 Power posterior method for log-likelihood calculation
(1) Choose a starting value S^{(0)} and set k = 0.
(2) Set t = (k / N_grid)^c, where c controls the density of the grid values of t. It is typically set to 3 or 5 [see Friel and Pettitt (2008)].
(3) Generate S^{(1)}, \dots, S^{(N_sim)} from p_t(S \mid Y; \theta) using the Metropolis–Hastings algorithm described in step (2) of the SAEM algorithm. Note that the acceptance probability becomes

    a_j(S, S') = \min\Bigl\{1, \Bigl[\frac{g(Y_j; A_j S'_j + b_j)}{g(Y_j; A_j S_j + b_j)}\Bigr]^t\Bigr\}.

(4) Estimate E[\log\{p(Y \mid S)\} \mid Y; \theta, t] with

    \hat{l}_t = \frac{1}{N_{sim} - N_{burn}} \sum_{s=N_{burn}+1}^{N_{sim}} \log p\bigl(Y \mid S^{(s)}; \theta\bigr).

(5) If k < N_grid, set k = k + 1, S^{(0)} = \sum_{s=N_{burn}+1}^{N_{sim}} S^{(s)} / (N_{sim} - N_{burn}), and go to step (2). Otherwise, go to the next step.
(6) Given the \hat{l}_t's, the log-likelihood \log\{p(Y)\} can be approximated via any reliable numerical integration method.
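Step (6) can be as simple as a trapezoidal rule over the t-grid; a toy R sketch with placeholder \hat{l}_t values (illustrative only, not real output) is:

```r
# Trapezoidal approximation of log p(Y) = integral of lhat_t over t in [0, 1].
c_pow  <- 3
N_grid <- 20
t_grid <- ((0:N_grid) / N_grid)^c_pow
lhat_t <- -100 - 0.5 * (1 - t_grid)^2    # placeholder estimates for illustration
log_pY <- sum(diff(t_grid) * (head(lhat_t, -1) + tail(lhat_t, -1)) / 2)
log_pY
```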

5. Simulation experiments. Numerical experiments were conducted to evaluate the practical performance of the proposed methodology. Four experimental settings were considered:

(1) B = 1, \tau = 5 \times 10^{-17}, \beta = 1 and n = 100;
(2) B = 2, \tau = (1 \times 10^{-17}, 5 \times 10^{-17})^T, \beta = (0.5, 3)^T and n = 200;
(3) B = 2, \tau = (1 \times 10^{-17}, 5 \times 10^{-17})^T, \beta = (0.5, 1.5)^T and n = 200;
(4) B = 3, \tau = (1 \times 10^{-17}, 8 \times 10^{-17}, 1.8 \times 10^{-16})^T, \beta = (0.3, 1, 3)^T and n = 500.

The parameter values of these settings were chosen to mimic the typical behavior of the real data. The effective areas and the expected background counts are set to A_i = 10^{19} and b_i = 10, respectively, for all i. Two hundred data sets were generated for each experimental setting (a sketch of the data-generating step for setting (1) is given below). For each generated data set, both AIC and BIC were applied to choose the value of B, and model parameters were estimated by the IEM algorithm. The selected values of B are summarized in Table 1. One can see that BIC works substantially better than AIC for selecting B, and while BIC occasionally overestimates B, there is a clear tendency for AIC to consistently overestimate B.
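For reference, data from setting (1) can be generated along these lines (our sketch of model (4), not the authors' code):

```r
# One simulated data set under setting (1): Pareto fluxes, Poisson counts.
set.seed(42)
n <- 100; beta <- 1; tau <- 5e-17
A <- rep(1e19, n); b <- rep(10, n)
S <- tau * runif(n)^(-1 / beta)        # fluxes from Pareto(beta, tau)
Y <- rpois(n, A * S + b)               # observed counts, model (4)
summary(Y)
```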



TABLE 1
The number of pieces B-hat selected by AIC and BIC

Experimental setting   Method   B-hat = 1   B-hat = 2   B-hat = 3   B-hat = 4
1                      AIC          94          53          35          18
1                      BIC         164          33           3           0
2                      AIC           0         135          45          20
2                      BIC           0         198           2           0
3                      AIC           0         110          71          19
3                      BIC           0         177          23           0
4                      AIC           0           0         138          62
4                      BIC           0           0         194           6

Other crucial factors that determine the ability of our method to detect structural breaks in the population distribution include: (i) the sample size, (ii) the separation between breakpoints, and (iii) the magnitude of the difference between the power-law slopes on adjacent segments. The impact of the third factor can be seen by comparing simulation results from settings (2) and (3), where the misclassification rate is seen to increase as the slopes become closer. From additional simulations, our experience suggests that in typical settings a sample size of 200 or more is needed to reliably detect a single breakpoint, with double this required to detect two breakpoints. In simulations, true breakpoints can be detected for smaller sample sizes, but at a lower rate that is more dependent on the noise properties of the specific simulation. In addition to selecting the number of breakpoints, we also conducted a simulation to assess the quality of parameter estimation when using the IEM algorithm. For each experimental setting, we calculated the squared error (\hat{\beta}_1 - \beta_1)^2 of \hat{\beta}_1 for all those data sets where B was correctly selected. We then computed the average of all these squared errors, denoted m.s.e.(\hat{\beta}_1), and calculated the relative mean squared error m.s.e.(\hat{\beta}_1)/\beta_1^2. Relative mean squared errors for the other estimates in \hat{\beta} and \hat{\tau} were obtained in a similar manner. These relative mean squared errors are given in Table 2. We note that all of these are of the order of 10^{-2} or 10^{-1}.


TABLE 2
The relative mean squared errors of \hat{\beta} and \hat{\tau}, conditional on selection of the correct B. All entries are multiplied by 10^2

Setting   Method   beta1   beta2   beta3    tau1    tau2    tau3
1         AIC       5.14      –       –     11.1      –       –
1         BIC       4.91      –       –     10.6      –       –
2         AIC       3.33    2.55      –      9.81    11.3     –
2         BIC       3.52    2.60      –      9.17    10.8     –
3         AIC       3.52   14.2       –     12.0     13.2     –
3         BIC       3.57   12.9       –     11.1     13.5     –
4         AIC       2.71    3.26    5.04     7.08     9.91   12.3
4         BIC       2.72    3.94    4.97     7.16     9.74   11.9

6. Application: Chandra Deep Field North X-ray data. We now apply our method to data from the Chandra Deep Field North (CDFN) X-ray survey. Our data set comprises a total of 225 sources with an off-axis angle of 8 arcmins or less and counts ranging from 5 to 8655. The full CDFN data set comprises multiple observations at many different aimpoints; however, we here consider only a subset where the aimpoints are close to each other, to avoid complications such as variations in detection probability due to changes in the point spread function (PSF) shape.

The decision to include only aimpoints close to each other was taken primarily to avoid the issue of "incompleteness" and essentially amounts to taking a higher signal-to-noise subset of the full data set. Incompleteness occurs when sources are not observed, typically as a result of being too faint to be detected under the specific detector configuration used. Since this missingness is a function of the quantity to be estimated, it must be accounted for and can lead to tremendously more complicated and challenging modeling. This approach is taken as part of a fully Bayesian analysis in Baines et al. (2012), but there are significant challenges to the method. Most notably, results are very sensitive to the "incompleteness function," which is frequently not known to such high precision. By considering only a subset of aimpoints we focus on a higher SNR subset of the Chandra data that is not subject to issues arising from incompleteness. We do not believe that the subset choice impacts the final conclusion, as the results in the unpublished report of Udaltsova, which models the full data set and accounts for incompleteness, are extremely similar to those presented here. Since the off-axis angle measures the radial distance of the source from the center of the detector, sources with large off-axis angles can be thought of as being "close to the edge of the image." Sources appearing at large off-axis angles appear much larger and at lower resolution than those closer to the center of the detector. The source-specific scaling constant, the effective area A_i, is used to account for variations in the expected number of photons as a function of source location and photon energy. However, at large off-axis angles additional complications such as "confusion" (two or more sources overlapping and appearing as one) and "incompleteness" (possible nondetections of fainter sources) must be considered. For the purposes of our analysis here, we include all sources with an off-axis angle < 8 arcmin to achieve a worst-case completeness of 80%.




FIG. 3. log N–log S plot for the Chandra Deep Field North data with off-axis angle truncation at 8 arcmins. The vertical dotted lines are drawn at \hat{\tau}_1 and \hat{\tau}_2. The red lines correspond to the fitted broken-Pareto model with estimated slopes \hat{\beta}_1 and \hat{\beta}_2.

We also consider thresholding at < 6 and < 7 arcmins, with a full discussion of the sensitivity to this threshold given in Section 6.1. Applying our model selection procedure to the data set with < 8 arcmins yields an estimate of B-hat = 2, with B-hat = 1 for the < 6 and < 7 arcmin subsets. As discussed in detail in Section 6.1, the consistency of the observations in the 6–8 arcmin range suggests that the ability to detect the presence of a breakpoint is limited by the small sample sizes at < 6 and < 7 arcmins. Figure 3 shows the log N–log S plot for the < 8 arcmin data set, depicting the log (base 10) of the empirical survival count as a function of the log flux, using the imputed fluxes from the final E-step of our algorithm. While the plot ignores the uncertainty in the S_i's, it remains the standard plot for the analysis of log N–log S relationships. We note from the plot that the "break" is clearly visible around log10(\hat{\tau}_2) = -15.657, with a change in slope from 0.48 to 0.85. Full parameter estimates and standard error estimates are provided in Table 3. Standard error estimates are obtained using a simple bootstrap resampling procedure. We also note that, by simulating from the model, the seemingly nonlinear behavior of the curve at log(S) = -14.5 is nonetheless seen to be consistent with the piecewise linear model.

TABLE 3
Parameter estimates and standard errors for the CDFN data set

Parameter       Estimate      SE
beta1              0.483    0.060
beta2              0.854    0.224
log10(tau1)      -16.344    0.030
log10(tau2)      -15.657    0.271




Our analysis shows that a two-piece broken power-law model is preferred for this subset, with a breakpoint at a lower flux than shown in Moretti et al. (2003) and with the lower segment at a flatter slope. This differs from what would be expected if point sources were to make up all of the diffuse background [Hickox and Markevitch (2007)], suggesting that a significant proportion of the residual X-ray background is composed of diffuse emission (e.g., hot intergalactic plasma); see also Mateos et al. (2008). The analysis in Hickox and Markevitch (2007) was based on optical sources from the Hubble Space Telescope (HST) which had no X-ray counterparts. By considering various models for the X-ray intensities of these sources, Hickox and Markevitch (2007) compared them to the residual X-ray background from deep Chandra observations. The proportion of the Cosmic X-ray Background (CXB) that can be explained by point sources alone is typically around 70–80%. Connecting to our results, higher values for \beta_1 increase the possibility that deeper observations could be obtained that would explain an additional proportion of the CXB as discrete sources. Alternatively, lower values for \beta_1 signify a flatter log N–log S, suggesting a greater amount of diffuse emission. Figure 8 of Hickox and Markevitch (2007) depicts the relationship between the proportion of the 0.5–2 keV CXB from unresolved HST point sources and the power-law slope. The breakpoint estimated in our analysis translates to 10^{-16} erg s^{-1} cm^{-2} for the passbands used by Hickox and Markevitch (2007). However, in the 2 Msec data set they analyze, they do not detect any breakpoints (see their Figure 7). Our analysis indicates that the log N–log S curve flattens for fluxes less than the breakpoint, thus allowing for a significant proportion of the unresolved residual X-ray background to be due to diffuse emission.

6.1. CDFN source selection. In this section we consider the sensitivity of our analysis to the chosen off-axis angle threshold. As discussed in Section 6, at higher off-axis angles there are additional complications, such as incompleteness and confusion, that must be built into any statistical analysis and that are not covered by the method presented here. Let K denote the maximum off-axis angle, that is, all sources with off-axis angle less than K are retained and all others are excluded from the analysis. The choice of K = 8 for our analysis in Section 6 is motivated by scientific considerations and an estimated completeness above 80% at K = 8. However, by varying the truncation point we obtain additional insight into the sensitivity of our analysis to this decision, as well as into the statistical sensitivity to the sample size required for breakpoint detection. Table 4 shows the results of the analysis for differing values of K. As explained, results for K > 9 are likely to be untrustworthy, although they happen to be similar to those with K = 8. At the other extreme, if we truncate at K = 4 or K = 5, we unnecessarily discard a large number of sources.



TABLE 4
CDFN results by varying off-axis truncation K

K      n    log10(tau1-hat)   log10(tau2-hat)   beta1-hat   beta2-hat
4     77        -16.364              –             0.788        –
5    112        -16.353              –             0.738        –
6    152        -16.329              –             0.691        –
7    192        -16.373              –             0.590        –
8    225        -16.343           -15.668          0.482      0.850
9    257        -16.352           -15.732          0.449      0.850
10   287        -16.378           -15.696          0.450      0.792
11   298        -16.389           -15.702          0.456      0.793
12   303        -16.403           -15.677          0.454      0.802
13   304        -16.429           -15.843          0.412      0.743
We note that at K = 7 we are no longer able to formally detect a break, that is, B-hat = 1. However, upon closer examination the BIC values for B = 1 and B = 2 when K = 7 are very similar (2186.79 vs. 2188.37), indicating that there is little to choose between the B = 1 and B = 2 models. With a few additional data points added at K = 8, our procedure then has enough power to detect the break at K = 8. It is worth noting that all additional data points with off-axis angle between 7 and 8 were manually screened, and are quantitatively very similar to those with K < 7. That is, the detection (or lack thereof) of a breakpoint in this context appears to be primarily determined by the sample size of the data set used. This is consistent with our results from the simulation study in Section 5, where a sample size of approximately 200 was required to reliably detect a break with similar parameter configurations. Indeed, looking at the plot in Figure 3, we note that the break is a rather subtle one, with the estimated slopes differing by approximately 0.37. In summary, for this particular data set there appears to be evidence of a breakpoint, although the sample size required to detect it is not reached until we truncate at K = 8, just before additional modeling considerations such as incompleteness must be accounted for.

7. Theoretical properties. This section deals with the large-sample properties of the proposed procedure. We first establish consistency results for the case when B is known, there is no background contamination (b_i = 0 for all i) and all A_i are assumed to be identical. Then we describe how one could weaken the assumptions of identical A_i's and zero b_i's. However, as explained at the end of this section, the case of unknown B is substantially more difficult and we are unable to provide any theoretical results for this case.




If it is assumed that A_i = A > 0 and b_i = 0 for all i = 1, \dots, n, then Y_1, \dots, Y_n constitute an i.i.d. sample from model (4). Denote the density of Y_1 by

    f(y; \theta) = \int_{\tau_1}^{\infty} e^{-As} \frac{(As)^y}{y!} f_B(s; \beta, \tau)\, ds = \sum_{j=1}^{B} \Bigl[\prod_{l=1}^{j-1} \Bigl(\frac{\tau_l}{\tau_{l+1}}\Bigr)^{\beta_l}\Bigr] \frac{\beta_j (A \tau_j)^{\beta_j}}{y!} \bigl\{\Gamma(y - \beta_j, A \tau_j) - \Gamma(y - \beta_j, A \tau_{j+1})\bigr\}.

The parameter space is defined as \Theta = \{\theta = (\beta, \tau)^T \in \mathbb{R}_+^{2B} : \beta_j \ne \beta_{j+1}, \tau_j < \tau_{j+1}, j = 1, \dots, B-1\}. Let \theta_0 = (\beta_0, \tau_0)^T denote the true parameter value. Notice that \Theta is not compact and that the value of the likelihood does not converge to zero if the parameter approaches the boundary of \Theta. Therefore, standard arguments such as the ones based on Wald (1949) do not apply directly to establish strong consistency of the maximum likelihood estimator \hat{\theta} of \theta_0. Instead, a compactification device is applied in order to subsequently use the results of Kiefer and Wolfowitz (1956). This leads to the following result.

THEOREM 1. Suppose B is known and A_i = A > 0 for all i = 1, \dots, n. Then the maximum likelihood estimator \hat{\theta} is strongly consistent for \theta_0, that is, \hat{\theta} \to \theta_0 with probability one as n \to \infty.

The proof of Theorem 1 is provided in an online supplement [Wong et al. (2014)]. To weaken the restriction of identical A_i, observe that this condition is mainly applied to allow the use of the strong law of large numbers for i.i.d. random variables, as required for the direct application of the results in Wald (1949) and Kiefer and Wolfowitz (1956). Since the arguments used to prove Theorem 1 are still valid if only the assumption A_i > 0 is made, Kolmogorov's version of the strong law of large numbers can be applied to adapt their proof to the present case, imposing additional assumptions such as the Kolmogorov criterion

    \sum_{i=1}^{\infty} \frac{\operatorname{Var}(Y_i)}{i^2} < \infty

or conditions ensuring the validity of Kolmogorov's three-series theorem. The result of Theorem 1 then also holds in this more general setting. The case of nonzero b_i's can be dealt with similarly, but with long and tedious algebra. In the theory developed above, the number of pieces, B, in the broken-Pareto model is assumed to be known. The case of unknown B is, however, substantially more difficult. In fact, even in simpler settings such as the traditional "change in mean" scenario, in which segments of independent observations differ only by their levels, strong distributional assumptions become necessary to show




consistency of an estimator for B. These typically require normality of the observations so that sharp tail estimates of the supremum of certain Gaussian processes are available; see, for example, Yao (1988). These techniques have also been exploited in Aue and Lee (2011) for image segmentation purposes. However, in the current context of the more complex broken-Pareto model, these arguments are not applicable and, in fact, it seems infeasible to derive theoretical properties under a set of practically relevant assumptions.

8. Concluding remarks. We provide a coherent statistical procedure for selecting the number and orientation of "pieces" in an assumed piecewise linear log N–log S relationship. Our framework allows astrophysicists to use a principled approach to reliably select the model order B, and to perform parameter estimation via maximum likelihood in a numerically challenging context. To our knowledge, this is the first statistically rigorous procedure developed for solving this important scientific problem. R code implementing the proposed procedure can be obtained from the authors.

Acknowledgments. The authors are grateful to the Associate Editor and the Editor, Professor Tilmann Gneiting, for their most useful and constructive comments, which substantially improved the paper.

SUPPLEMENTARY MATERIAL

Technical details (DOI: 10.1214/14-AOAS750SUPP; .pdf). We provide technical details of the proof of Theorem 1.

REFERENCES
AUE, A. and LEE, T. C. M. (2011). On image segmentation using information theoretic criteria. Ann. Statist. 39 2912–2935. MR3012396
BAINES, P. D. (2010). Statistics, science and statistical science: Modeling, inference and computation with applications to the physical sciences. Ph.D. thesis.
BAINES, P. D., MENG, X.-L. and XIE, X. (2014). The interwoven EM algorithm. Unpublished manuscript.
BAINES, P. D., UDALTSOVA, I. S., ZEZAS, A. and KASHYAP, V. L. (2012). Bayesian estimation of log N–log S. In Statistical Challenges in Modern Astronomy V (E. D. Feigelson and G. J. Babu, eds.) 469–472. Springer, New York.
CAPPELLUTI, N., HASINGER, G., BRUSA, M., COMASTRI, A., ZAMORANI, G., BÖHRINGER, H., BRUNNER, H., CIVANO, F., FINOGUENOV, A., FIORE, F., GILLI, R., GRIFFITHS, R. E., MAINIERI, V., MATUTE, I., MIYAJI, T. and SILVERMAN, J. (2007). The XMM–Newton wide-field survey in the COSMOS field. II. X-ray data and the log N–log S relations. Astrophys. J., Suppl. Ser. 172 341–352.
DEMPSTER, A. P., LAIRD, N. M. and RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38. MR0501537
FRIEL, N. and PETTITT, A. N. (2008). Marginal likelihood estimation via power posteriors. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 589–607. MR2420416
GUETTA, D., GRANOT, J. and BEGELMAN, M. C. (2005). Constraining the structure of gamma-ray burst jets through the log N–log S distribution. Astrophys. J. 622 482–491.
HEWISH, A. (1961). Extrapolation of the number-flux density relation of radio stars by Scheuer's statistical methods. Monthly Notices of the Royal Astronomical Society 123 167–181.
HICKOX, R. C. and MARKEVITCH, M. (2007). Can Chandra resolve the remaining cosmic X-ray background? Astrophys. J. 671 1523–1530.
JORDÁN, A., CÔTÉ, P., FERRARESE, L., BLAKESLEE, J. P., MEI, S., MERRITT, D., MILOSAVLJEVIĆ, M., PENG, E. W., TONRY, J. L. and WEST, M. J. (2004). The ACS Virgo Cluster Survey. III. Chandra and Hubble Space Telescope observations of low-mass X-ray binaries and globular clusters in M87. Astrophys. J. 613 279–301.
KENTER, A. T. and MURRAY, S. S. (2003). A new technique for determining the number of X-ray sources per flux density interval. Astrophys. J. 584 1016–1020.
KIEFER, J. and WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887–906. MR0086464
KITAYAMA, T., SASAKI, S. and SUTO, Y. (1998). Cosmological implications of number counts of clusters of galaxies: log N–log S in X-ray and submm bands. Publ. Astron. Soc. Jpn. 50 1–11.
KOUZU, T., TASHIRO, M. S., TERADA, Y., YAMADA, S., BAMBA, A., ENOTO, T., MORI, K., FUKAZAWA, Y. and MAKISHIMA, K. (2013). Spectral variation of hard X-ray emission from the Crab Nebula with the Suzaku Hard X-ray Detector. Publ. Astron. Soc. Jpn. 65 74-1–74-11.
MATEOS, S., WARWICK, R. S., CARRERA, F. J., STEWART, G. C., EBRERO, J., DELLA CECA, R., CACCIANIGA, A., GILLI, R., PAGE, M. J., TREISTER, E., TEDDS, J. A., WATSON, M. G., LAMER, G., SAXTON, R. D., BRUNNER, H. and PAGE, C. G. (2008). High precision X-ray log N–log S distributions: Implications for the obscured AGN population. Astron. Astrophys. 492 51–69.
MATHIESEN, B. and EVRARD, A. E. (1998). Constraints on Ω0 and cluster evolution using the ROSAT log N–log S relation. Mon. Not. R. Astron. Soc. 295 769–780.
MORETTI, A., CAMPANA, S., LAZZATI, D. and TAGLIAFERRI, G. (2003). The resolved fraction of the cosmic X-ray background. Astrophys. J. 588 696–703.
RYDE, F. (1999). Smoothly broken power law spectra of gamma-ray bursts. Astrophys. Lett. Commun. 39 281–284.
SCHEUER, P. A. G. (1957). A statistical method for analysing observations of faint radio stars. Proc. Cambridge Philos. Soc. 53 764–773. MR0091847
SEGURA, C., LAZZATI, D. and SANKARASUBRAMANIAN, A. (2013). The use of broken power-laws to describe the distributions of daily flow above the mean annual flow across the conterminous U.S. J. Hydrol. 505 35–46.
TRUDOLYUBOV, S. P., BOROZDIN, K. N., PRIEDHORSKY, W. C., MASON, K. O. and CORDOVA, F. A. (2002). On the X-ray source luminosity distributions in the bulge and disk of M31: First results from the XMM–Newton survey. Astrophys. J. Lett. 571 17–21.
WALD, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat. 20 595–601. MR0032169
WONG, R. K. W., BAINES, P., AUE, A., LEE, T. C. M. and KASHYAP, V. L. (2014). Supplement to "Automatic estimation of flux distributions of astrophysical source populations." DOI:10.1214/14-AOAS750SUPP.
YAO, Y.-C. (1988). Estimating the number of change-points via Schwarz' criterion. Statist. Probab. Lett. 6 181–189. MR0919373
YU, Y. and MENG, X.-L. (2011). To center or not to center: That is not the question--an ancillarity–sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. J. Comput. Graph. Statist. 20 531–570. MR2878987
R. K. W. WONG
P. BAINES
A. AUE
T. C. M. LEE
DEPARTMENT OF STATISTICS
UNIVERSITY OF CALIFORNIA, DAVIS
4118 MATHEMATICAL SCIENCES BUILDING
ONE SHIELDS AVENUE
DAVIS, CALIFORNIA 95616
USA
E-MAIL: rkwwong@ucdavis.edu
        pdbaines@ucdavis.edu
        aaue@ucdavis.edu
        tcmlee@ucdavis.edu

V. L. KASHYAP
HARVARD–SMITHSONIAN CENTER FOR ASTROPHYSICS
60 GARDEN STREET
CAMBRIDGE, MASSACHUSETTS 02138
USA
E-MAIL: kashyap@head.cfa.harvard.edu