Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.mrao.cam.ac.uk/projects/OAS/publications/fulltext/OT19031.pdf
Дата изменения: Wed Feb 20 17:40:26 2013
Дата индексирования: Sat Mar 1 03:09:14 2014
Кодировка:
A Markov-Chain Monte Carlo approach to the design of multilayer thin-film optical coatings
Michael P. Hobson and John E. Baldwin Astrophysics Group, Cavendish Laboratory, Madingley Road, Cambridge, CB3 0HE, U.K. We reconsider the problem of locating the globally optimal solution of a multilayer optical coating design problem, within some predetermined space of parameters, with the aim of obtaining a robust technique that requires a minimum of user intervention. The approach adopted centres on exploring the space of the parameters of interest using a Markov-Chain Monte Carlo sampling algorithm. This technique enables one to locate the global optimum automatically with high confidence, and without the need for a good starting design. It also allows the trivial inclusion of prior constraints on the variables and provides a natural means for investigating the robustness of the optimal solution. c 2004 Optical Society of America

OCIS codes: 310.0310, 310.1210, 310.1620, 310.6860.

1.

Intro duction

Techniques for thin-film coating design are often classified into two categories: refinement and synthesis. In refinement, a starting design, with a given set of materials and order of the layers, is improved progressively by small adjustments of the layer thicknesses to give the best design that can be reached by such a process.1 1
-3

A good


starting design is usually essential to obtaining an acceptable result. It is unlikely that the final design will be the global optimum design within the allowed parameter space except in simple cases. In synthesis, a wider search is carried out of the multidimensional space defined by the materials, the order and number of layers and their thicknessess to find better starting designs which can then be refined.4
-8

The

effectiveness of the wider search in finding a design sufficiently close to the global optimum is evidently crucial to the success of the method. Many algorithms are now available for carrying out the design of multilayer coatings,9
-13

but in practice it is still rarely the case that the design process is com-

pletely automatic, especially when the performance specifications are complicated. User intervention is frequently necessary, for example, to change the tolerances or number of layers, and extend or truncate the overall thickness after inspection of the results. In this paper, we reconsider the problem of locating the global optimum of a thin-film design problem, within some predetermined parameter space, with the aim of obtaining a robust technique that requires a minimum of user intervention. The approach adopted here centres on exploring the space of the parameters of interest using a Markov-Chain Monte Carlo sampling algorithm. This technique enables one to locate the global optimum automatically with high confidence, and without the need for a good starting design. It also allows the trivial inclusion of prior constraints on the parameters and provides a natural means for investigating the robustness of the optimal solution. In principle, the approach can be applied to the general problem in which the refractive indices, thicknessess and total number of layers are 2


allowed to vary. Nevertheless, to illustrate the method in a straightforward manner, we restrict our numerical investigations here to multilayer designs based on a given set of materials, in which the order of materials and the maximum allowed number of layers are specified. One then locates the globally optimal values for the number of layers and their corresponding thicknesses (within predefined limits). In particular, we focus on two specific normal-incidence design problems: a wide-band anti-reflection coating and a wide-band dichroic.

2.

The Optical Coating Design Problem

An optical coating is usually considered to be divided up into a number of thin layers of differing thicknesses and refractive indices, as illustrated in Fig. 1.14 It is further assumed that the material constituting the layers is non-magnetic, that there are no volume charges, and that a single plane wave is incident on the outside of the coating. A. Mathematical formulation of the design problem

Referring to Fig. 1, suppose the thin-film optical coating consists of M layers, which separate the outside medium (`layer' 0) from the substrate (`layer' M + 1). The quantities aj and bj define the complex electric field amplitudes of the forward travelling and reflected waves for j = 0, . . . , M + 1, the values being defined at y = 0 on the surface from which they originate in the figure. The relationship between field amplitudes are determined by imposing the boundary conditions that the tangential components of the electric field E and the magnetic field H are continuous at the layer interfaces.
14

For S -polarisation, the E vector of the incident electromagnetic wave lies along

3


the outward normal to the plane of the paper in Fig. 1. The boundary conditions for E and H respectively at y = 0 at the surface (j - 1) then read aj + bj eij = a (aj - bj eij )nj cos
j

j -1

eij

-1

+b

j -1

, )n
j -1

(1) cos
j -1

= (aj -1 eij

-1

-b

j -1

,

(2)

where j = (2 /)nj dj cos j is the phase thickness of a layer expressed in terms of its refractive index nj and physical thickness dj , the wavelength , and the angle of internal refraction j . From Snell's law, we also have the subsidiary condition nj sin j = n sin = sin 0 , (3)

j -1

j -1

where 0 is the angle of incidence for the incoming wave. For P -polarisation, the H vector of the incident electromagnetic wave lies along the outward normal to the plane of the paper in Fig. 1, and the boundary conditions for E and H at the surface (j - 1) then read (aj + bj eij ) cos (aj - bj eij )n = (a = (a eij eij
-1

j

j -1

+b

j -1

) cos )n
j -1

j -1

(4) (5)

j

j -1

-1

-b

j -1

,

where, once again, we have the subsidiary condition (3). In both the S and P polarisation cases, the above formalism remains valid for complex refractive indices and for angles where total internal reflection and evanescent waves result. Losses and dispersions are ignored in this illustrative discussion, but this is not a theoretical limitation of the approach. For the purposes of calculation, it is convenient to set a0 = 1 as indicated in Fig. 1. We also assume that b
M +1

= 0, so no radiation moves outwards from the 4


substrate. In either polarisation case, for M layers, we thus obtain a set of 2M + 2 equations in 2M + 2 unknowns. These equations are easily assembled into a single matrix-vector equation of the form Mx = c, where the column vector x = (b0 , a1 , b1 , . . . , aM , bM , b
M +1

(6) )T contains the (complex)

unknowns, the constant vector on the right-hand side reads c = (1, 1, 0, 0, . . . , 0, 0, 0)T , and M is a (complex) band-diagonal matrix with non-zero elements only on the leading diagonal and the two adjacent super- and sub-diagonals. Thus, by solving the linear system (6) one obtains all the unknown complex field amplitudes throughout the multilayer optical coating for a given wavelength of the incident radiation. The band-diagonal structure of M allows the derivation of simple recurrence formulae for the amplitudes, which are analytically equivalent to the standard Abel` es `matrix method'.14 Nevertheless, the same solution can also be obtained in a numerically efficient manner by solving the band-diagonal linear system (6) using the appropriate routines from the Lapack linear algebra package. As an example, for a coating consisting of 40 layers, the solution to the corresponding linear system (6), which consists of 82 simultaneous equations, is obtained in few в 10-3 sec of CPU time on an Intel 1.5 GHz Pentium processor. B. Specification of the optical performance of the coating The above formalism is entirely general in its applicability. Here we simply illustrate our discussion by considering the common problem of designing a multilayer optical coating for which the reflectivity R() = |b0 ()|2 is required to follow, as closely as 5


possible, some target reflectivity dependence R() over a given spectral band. Thus we are concerned only with the value of the first component of the vector of unknown complex amplitudes x in (6), as a function of wavelength. In solving (6), all the remaining complex amplitudes are also obtained, but since this calculation requires so little CPU time we consider it better to retain the flexibility of our general approach. In a numerical solution to the design problem, it is impossible to calculate R() at all points of the spectral band. It is therefore standard practice to introduce a fixed set of K wavelengths 1 , 2 , . . . ,
K

at which the values of R and R are compared.

The choice of these target wavelengths is a matter of some debate. There are two conflicting requirements. On one hand, the spectral band should be sampled as finely as possible to guarantee an accurate solution. On the other hand, the computational burden of calculating a solution increases linearly with the number of targets. Various strategies for choosing an appropriate set of (non-homogeneous) target wavelengths have been proposed.
14

Nevertheless, a straightforward criterion for

choosing the separation of targets in wavelength can be obtained by drawing a simple analogy between the optical coating and a transmission line (see Fig. 2. As the length l of the transmission line is increased the apparent input impedance will vary between
2 Z and Z0 /Z , with one full cycle of variation occuring for an increase in length of

l = /2, where is the wavelength of the alternating input signal. Alternatively, if the length l of the line is kept fixed and the wavelength of the input signal is changed, the input impedance will vary through a full cycle for a change in wavelength

6


such that l = =. l 2l The same will be true for an optical coating. If it comprises M layers, each having an average thickness of 0 /4 (quarter-wave layers are the natural starting point for a synthesis of the sort we are considering here) then 2 = . M 0 The apparent impedance of the substrate (the deepest layer as seen from the outer medium) gives the most rapid oscillation in reflectivity with wavelength. To suppress such features in our design problem we need to choose the targets to be separated in wavelength by an amount /4. So, the spacing between targets should be such that . 2M 0 Since is clearly a function of , this produces a non-homogeneous distribution of targets across the spectral band. C. The optimisation problem

The most direct and general techniques for obtaining a suitable thin-film design are based on the numerical optimisation of some merit function. taken to be
K 14

This function is usually

f (p) =
k=1

wk [R(k ; p) - R(k )]2 ,

(7)

where wk are a set of given non-negative weights and the parameter vector p may, in general, include both the refractive indices nj and physical thicknessess dj of the 7


layers (where j = 1, 2, . . . , M ). In many problems the weights wk are all simply set to unity, although differential weighting can prove useful if the target wavelengths are not equally spaced throughout the spectral band. ^ The optimal solution p for the coating design is that which minimises (globally) the value of the merit function. It is, of course, necessary to specify in advance the space of parameters in which the (global) optimum is sought. For example, one might fix the order of layer materials and the number of layers (or at least the maximum allowed number), and place limits on the layer thicknesses. In any case, the key problem in adopting this approach is that the merit function is generally multiextremal, i.e. aside from the global minimum there exist many local minima. Indeed, the number of local extrema typically grows exponentially with the number of layers and can reach many thousands even for M 10. This makes the search for the globally optimum solution very difficult. Most approaches to obtaining the global minimum of the merit function (7) are based on the use of first-order optimisation algorithms. These methods require one to be able to evaluate both the merit function f and its gradient f at any given point in f in

the parameter space p. Different first-order optimisation algorithms use f and

different ways to iterate to the minimum, and several approaches have been explored. Some authors suggest that optimisation from some starting point p0 should begin by employing a steepest descent method and then switching to a conjugate gradient algorithm as the minimum is approached.14 Others advocate the use of the quasiNewton or Levenberg-Marquardt methods.12
, 13

In all cases, however, the simple use

of gradient information made by these methods means that they will converge only 8


to the `nearest' local minimum. Therefore, p0 must lie within the basin of attraction of the global minimum. An additional subtlety of the thin-film coating optimisation problem is that the refractive indices and thicknessess are allowed to vary in only a limited range of values. Thus, optical coating refinement is a constrained optimisation problem, with the most obvious constraint being that the physical thicknesses of the layers must be positive, i.e. dj > 0. It is often the case, however, that one must satisfy a more complicated set of constraints. Even when one is refining just the layer thicknesses, manufacturing constraints may require that each dj is greater than some (non-zero) tolerance, which may depend on the material that comprises the layer. If the refractive indices of the layers are also allowed to vary, the set of constraints can be significantly more complicated. In such cases, it is necessary to use a more sophisticated approach, such as active set strategies, in which variables are frozen when they reach their bounds and are subsequently released when specific conditions are satisfied.
12, 13

The optical coating optimisation problem is thus highly non-trivial. Nevertheless, in Section 4, we present an approach to obtaining the globally optimal solution that is based on exploring the parameter space p by means of a Markov-chain Monte Carlo (MCMC) sampling algorithm. As we will show below, this technique requires only the evaluation of the merit function f itself (and not its gradient). Moreover, the algorithm can locate the global minimum of the merit function without the need for a good starting guess, allows trivially for the inclusion of constraints on the allowed values of the variables, provides a natural means for assessing the robustness of the optimal solution, and also simultaneously identifies other sub-optimal, but perhaps 9


more easily-manufactured, solutions. The approach we present below may, in fact, be considered as the natural generalisation of the technique of simulated annealing.9 Starting from some point p0 , a random change p in the parameters is generated (there exist numerous scheme for generating such a change in an efficient manner). A proposed downhill step is always accepted, but also a proposed uphill one is accepted with probability exp(-f /T ), where f is the proposed change in the value of the merit function and T is the annealing temperature. For a given T -value, the algorithm can thus wander freely among local minima of depth less than about T . If T is reduced sufficiently slow ly according to some annealing schedule, the algorithm should tend towards the region containing the global minimum. In general, however, simulated annealing minimisation techniques can be extremely computationally intensive and are very sensitive to the precise annealing schedule employed and the method used for proposing steps. Moreover, this approach does not allow the simple investigation of the robustness of the optimal solution and does not provide information on subsidiary extrema that are natural products of our MCMC sampling approach. Before discussing the MCMC approach in detail, we first consider how the optical coating design problem can be usefully reformulated in a Bayesian context, irrespective of the algorithm used to locate the globally optimal solution.

3.

Reformulation of the Design Problem

There is a conceptual advantage in reformulating the design problem in a traditional model-fitting context. In the standard merit function (7), the weights wk determine 10


the relative importance at different target wavelengths of the closeness between the actual and required spectral characteristics. A better approach is to replace the set of weights wk by a set of tolerances Rk , which determine the absolute accuracy desired of the solution at each of the target wavelengths. Moreover, by considering the tolerances as the standard deviation of Gaussian distributed uncertainties, one
1 may write down the likelihood function Pr(R|p) exp(- 2 2 (p)), where R and p are

the target and parameter vectors respectively. In this expression 2 is the standard misfit statistic
K

(p) =
k=1

2

Rk (p) - R Rk

2 k

,

(8)

where K is the total number of targets. Clearly, the global maximum of the likelihood function in the multidimensional predetermined space of parameters, p, corresponds to the global minimum of the merit function f (p) in (7). One may also perform a simple anaylsis of the robustness of the optimal solution by investigating the shape of the likelihood function (or 2 -function) in the vicinity of the optimum. For example, one may wish to determine the region of parameter space ^ p contained within the hypersurface 2 (p) - 2 (p) = K . In other words, one may estimate straightforwardly the `uncertainties' in the fitted values of the parameters. i.e. the amount by which the optimal parameter values can be changed while still providing an adequate solution to the design problem within the required tolerances. If one views the model-fitting problem in a Bayesian context, one should properly consider the posterior distribution, which is given via Bayes' theorem as Pr(p|R) = Pr(R|p) Pr(p) , Pr(R) 11


where the prior Pr(p) on the parameters represents our state of knowledge (or prejudices) regarding the values of the parameters. The evidence Pr(R) may be considered here merely as an unimportant normalisation constant. Thus one need only consider Pr(p|R) Pr(R|p) Pr(p). Under the assumption that Pr(p) = constant, the global optimum of the posterior coincides with that of the likelihood function (and the global minimum of 2 ). One may, however, use the Bayesian formulation to impose a non-uniform prior on the parameters p. This provides a natural means of encoding one's intuition (or prejudice) about the likely form of the solution. For example, one could impose a multivariate Gaussian prior centred on some quarter-wave stack design that is expected to be a reasonable approximation to the optimal solution. The dispersion of the Gaussian prior for each parameter would indicate how closely we expect the final optimal value of that parameter to correspond with our approximate model. This has the effect of identifying the broad region of parameter space in which one expects the optimal solution to be located.

4.

Markov-Chain Monte Carlo Sampling

Markov-Chain Monte-Carlo (MCMC) techniques allow one to explore the full posterior distribution Pr(p|R) by sampling, rather than having to evaluate it over a large hypercube spanning the parameter space, which would be computationally impossible. The principles underlying MCMC sampling from some posterior distribution have been discussed extensively elsewhere,15 so we shall give only a brief summary of the basic points. 12


A.

Sampling from the posterior

The ma jor advantage to using a sampling-based approach is that the set of samples {p1 , p2 , . . . , pNs } from the posterior may be used straightforwardly to estimate the posterior mean, mode or median, each of which can equally-well serve as our `best' ^ estimate p, depending on the application. Moreover, provided the posterior has been sampled effectively, the mode should correspond to the global maximum, and the presence of multiple extrema in the posterior is readily observed. Using the samples, it is also trivial to perform marginalisations over any subset of the parameters p. In particular, one may easily obtain the one-dimensional marginalised (unnormalised) posterior distributions on each parameter pi separately, which can be used to place confidence limits on each parameter pi . Alternatively, the samples may be used to obtain a direct estimate of the elements of the covariance matrix of the parameters Cij (pi - pi )(pj - pj ) , ^ ^

(9)

which gives the robustness of the optimal solution to (small) changes in the values pj . ^ The incorporation of constraints on the parameter values is also trivial in a sampling-based exploration of the posterior. Typically, the sampling is performed within some predefined hypercube in parameter space that encompasses the allowed range of the variables. For simple constraints the extent of the hypercube itself may correspond to the allowed parameter ranges. For more complicated constraints, one can simply ignore the samples obtained from outside the allowed regions of parameter space. Equivalently, one may set the prior Pr(p) to be zero there, so that region is not sampled by the MCMC algorithm. 13


B. The MCMC method Sampling-based methods offer enormous advantages in Bayesian parameter estimation, but the difficult question remains of how one efficiently obtains a set of samples {p1 , p2 , . . . , pNs } from a given posterior Pr(p|R) within the predetermined parameter space. Creating a set of independent samples from the posterior is very time consuming, and so instead one constructs a Markov chain whose equilibrium distribution is the required posterior. Thus, after propagating the Markov chain for a given burn-in period (see below), one obtains (correlated) samples from the equilibrium distribution, provided the Markov chain has converged. A Markov chain is characterised by the fact that the state p transition kernel Pr(p
n+1 n+1

is drawn from a

|pn ) that depends only on the previous state of the chain pn ,

and not on any earlier state. The form of the transition kernel is usually assumed not to depend on n, in which case the Markov chain is homogeneous. The standard approach to constructing a homogeneous Markov chain with a given equilibrium distribution (p) is to use the surprisingly simple Metropolis-Hastings (MH) algorithm, which is based on the familiar notion of rejection sampling. At each step n in the chain, the next state p
n+1

is chosen by first sampling a candidate point p from some proposal

distribution q (p|pn ), which may in general depend on the current state of the chain pn . The candidate point p is then accepted with probability (p , pn ) given by (p , pn ) = min 1, (p )q (pn |p ) . (pn )q (p |pn )
n+1

(10)

If the candidate point is accepted, the next state becomes p didate is rejected, the chain does not move, so p 14
n+1

= p , but if the can-

= pn . Remarkably, it is straight-


forward to show that q (p|pn ) can have any form and the stationary distribution of the chain will be (p).15 Although, in theory, the convergence of the chain to the stationary distribution is independent of choice of proposal distribution, this choice is crucial in determining both the efficiency of the MH algorithm and the rate of convergence to the stationary distribution, as is seen immediately by inspection of (10). The choice depends, in general, on the form of the required stationary distribution. Nevertheless, some general principles do exist for choosing a proposal distribution.16 Certain general classes of proposal distribution are commonly employed. For example, in the original Metropolis algorithm, the proposal distribution is symmetric, such that q (p|pn ) = q (pn |p) for all p and pn , and (10) simplifies accordingly. For the special case of the random-walk Metropolis algorithm, the proposal satisfies q (p|pn ) = q (|p - pn |). In this case, a popular choice for the proposal distribution is a multivariate Gaussian distribution centred on pn with a fixed covariance matrix. Finally, an independence sampler is a special case of the MH algorithm for which q (p|pn ) = q (p), so that the proposal distribution does not depend on the current state of the chain pn . In practice, the basic MH algorithm (or variations thereon) can be augmented by numerous speed-ups that allow the stationary distribution to be sampled more efficiently. A particularly useful device for improving the sampling efficiency is periodically to update the covariance matrix of the Gaussian proposal distribution based on the statistical properties of the previous samples. It is also beneficial in terms of sampling efficiency and in diagnosing convergence (see below) to run several chains simultaneously in parallel. These chains may periodically be allowed to interact by, 15


for example, exchanging some subset of parameter values. This ensures good mobility of the ensemble of chains around the equilibrium distribution (p). C. Burn-in, annealing schedules and maximisation As mentioned above, the states of the chain pn can be regarded as samples from the stationary distribution (p) only after some initial burn-in period required for the chain to reach equilibrium. Unfortunately, there exists no formula for determining the length of the burn-in period, or for confirming that a chain has reached equilibrium. Indeed, the topic of convergence is still a matter of ongoing statistical research. Nevertheless, several convergence diagnostics for determining the length of burn-in have been proposed. These employ a variety of theoretical methods and approximations that make use of the output samples from the Markov chain.
16

During the burn-in period it can prove useful not to sample directly from the posterior, but instead from the posterior raised to a non-negative power . One begins sampling from this modified posterior with = 0 and then slowly raises the value according to some annealing schedule until = 1. This allows the chain(s) to sample from remote regions of the posterior distribution and thus fully explore its many extrema. Indeed, by adopting an annealing schedule based on the output of convergence diagnostics, one can arrange for the end of the burn-in period to coincide with the point at which reaches unity, at which point is fixed at this value and the post burn-in samples are drawn from the true posterior distribution. If one is interested only in determining the global maximum of the posterior (i.e. the optimal solution), rather than elucidating the structure of the full posterior

16


distribution (i.e. the robustness of the optimal solution), then an MCMC sampler can still be used as a very efficient method for locating it. This is most easily achieved by allowing the annealing parameter described above to continue increasing gradually beyond unity. This artificially `sharpens' the posterior distribution and, as grows, the samples are concentrated into an increasingly compact region of parameter space around the global maximum. Whether or not one fixes to unity after burn-in, or allows it to continue increasing, the modal sample can be used as the starting point for a more traditional maximisation (minimisation) algorithm. Since this starting point should lie well within the basin of attraction of the global optimum (and is already in fact a very good approximation to it), a simple optimiser can then straightforwardly locate its precise position. We have indeed implemented such a `polishing' stage using Powell's direction set algorithm as our simple (local) optimiser.17 This algorithm has the advantage that it does not require the evaluation of function gradients, but only function values, in a similar manner to the MCMC sampler. D. The Metro sampler

In this paper, we use our own implementation of the MCMC technique, called the Metro sampler. This sampler propagates multiple chains simultaneously, the number of chains being set by the user at run time. For the numerical results discussed in section 5, the number of chains was set equal to the number of layers in the coating under investigation. The use of multiple chains assists both in the exploration of the posterior distribution and in the diagnosis of convergence. By examining the samples

17


produced by each of the chains, one can check that they have all coverged to the same region of the parameter space; this was verified for the numerical results presented in the next section. The sampler uses a geometric annealing schedule during burn-in. The value of the annealing parameter is initially set to some low value. After propagating all the chains by one step, the value of is then increased by a constant factor, which is set by the user at run time. The burn-in period is completed once the annealing parameter reaches unity. For the numerical results presented in the next section, the starting value of the annealing parameter was set to 10-6 , and the factor by which it was increased after propagating all the chains was set to 1.1. Once had reached unity, it was fixed at this value, so that the post burn-in samples are drawn from the required posterior distribution. The chains are propagated by the sampler using the Metropolis-Hastings algorithm. To improve the sampling efficiency, however, at each chain step the proposal distribution used is chosen at random from a choice of three possible `engines'. The first engine is a multivariate Gaussian proposal distribution centred at the current chain position. The width of the Gaussian in each parameter direction is scaled dynamically according to the acceptance rate for proposals in that direction. If the average acceptance rate of in given direction drops below some user defined value (default value: 1/3), it is likely that the proposal width for this parameter is too large and so the corresponding proposal width is decreased by a fixed factor (default value: 2). Similarly, if the acceptance rate is less than the proposal width is probably too 2). We find that the

narrow and is thus increased by a fixed factor (default value: 18


proposal width for each parameter quickly settles down to a stable value appropriate to the target density being sampled. Once burn-in is completed the proposal widths are left unchanged, so that the post burn-in samples obey the Markov chain condition. The second engine is an inter-chain mixing proposal. To propagate a given chain, one first chooses at random one of the other chains, the current position of which is used as the `pivot' point. The current position of the chain to be propagated is then inverted through the pivot point to obtain the new proposed point. This engine is particularly effective in accelerating the motion of chains along narrow valleys in parameter space. The third engine is a genetic inter-chain proposal. To propagate a given chain, one first chooses at random one of the other chains, the current position of which is used as the `partner' point. The current position of the chain to be propagated then has a random selection of its parameter values replaced by those from the partner point to obtain the new proposed point for the chain. Finally, we note that the sampler does not make use of gradient information, so that discrete parameters can be easily accommodated.

5.

Numerical Results

In this section, we illustrate the use of our MCMC approach in designing two separate multilayer optical coatings: a wide-band anti-reflection coating and a wide-band dichroic coating. In each case, we assume that the substrate is a typical glass, with a refractive index of 1.50, and that the outer medium is air with a refractive index of 1.00. For the purpose of providing a straightforward illustration of our technique, we also make the simplifying assumption that these refractive indices, plus those of 19


the coating layers, do not vary with wavelength. In practice, some slight variation will always occur, but we expect that this will not significantly affect the result presented below. We point out that, given the appropriate experimental data regarding such variations, it is a simple matter to generalise the calculation of the complex amplitudes in the layers presented in Section 2A. A. Wide-band anti-reflection coating

Our first design problem is to produce an anti-reflection coating over the wavelength range 600­2300 nm; hence our target reflectivity dependence over this range is R() = 0. We choose the target wavelengths 1 , 2 , . . . ,
K

to be 600­1000 nm in steps of

20 nm, 1040­2000 nm in steps of 40 nm and 2000-2300 nm in steps of 50 nm, giving a total of K = 52 targets. The tolerances { Rk } are taken to be 0.01 at each target wavelength. We fix the order of materials in the layer such that the outer layer consists notionally of MgF2 , with a refractive index of 1.37, and subsequent layers alternate between SiO2 and Ta2 O5 , with refractive indices of 1.45 and 2.10 respectively. We take the maximum possible number of layers in the coating to be 25. To determine whether any layers could be omitted from the coating, the algorithm was first run allowing the layer thicknesses to vary between 0 and 300 nm. To ensure good coverage of the posterior distribution, a total of 106 post burn-in samples were taken, with the annealing parameter capped at unity so that the density of samples follows the value of the posterior distribution. The entire MCMC sampling process, including burn-in, required around 60 mins CPU time on an Intel 1.5 GHz Pentium processor. As described in Section 4C, the modal sample was then taken as the starting

20


point for the `polishing' step, in which a simple Powell's direction set algorithm was used to locate the precise position of the global optimum; this requires only a few seconds of CPU time. The resulting solution was found to contain two layers (numbers 6 and 15) for which the optimal thickness was less than 1 nm. These two layers were therefore removed from the stack. Since the materials in the stack alternate, we also concatonate the layers immediately above and below each removed layer, which leads to a quasi-optimal coating containing 21 layers. To locate the new true optimum, the MCMC algorithm was re-run using the 21-layer quasi-optimal solution to define the peak of a Gaussian prior with an uncertainty of 5 nm in each layer thickness. Since this prior already tightly constrains the problem, only 105 MCMC samples were taken (requiring around 6 mins CPU time) and the thickness of each layer was constrained to lie between 5 and 350 nm. In fact, in this case, the quasi-optimal solution is sufficiently good that a simple polishing step starting at this point converges to a solution almost identical to that found by the MCMC sampler in just a few seconds of CPU time. The final optimal solution for the layer thickness is given in Table 1. We see that, as required, all the layer thicknesses lie between 5 and 350 nm, and so the practical manufacture of the coating is technically feasible. The resulting reflectivity distribution R() is shown as the thick solid line in Fig. 3, from which we see that the solution obtained does indeed provide an effective anti-reflection coating over a wide spectral band. The value of 2 for this solution, as defined in (8), is 29.3; this is somewhat less than the number of target wavelengths (K = 52), indicating that the solution more than satisfies the required tolerances when the performance is averaged 21


over the entire spectral band. In fact, as we see from the figure, the mean reflectivity across the band is around 0.007, and the tolerances are individually satisfied at all but two of the target wavelengths. In addition to locating the optimum solution, it is important to note that, if desired, all the MCMC samples obtained in the process can be written to a file for subsequent analysis. One can then easily examine the list of MCMC samples to find, for example, all those samples for which 2 < K (either in the vicinity of the global maximum, or throughout the parameter space) and hence satisfy the required tolerances when the performance is averaged over the entire spectral band. While not optimal, these solutions may be easier to manufacture or contain very thin layers that can removed from the stack without significantly compromising the performance of the coating. In this example, we instead illustrate the use of sampling in determining the robustness of the optimal solution. Let us assume that the manufacturing process is such that the rms error in layer thicknesses produced is 1 nm. The robustness of the coating to such inaccuracies is easily investigated by sampling from a multivariate Gaussian distribution centred on the optimal solution, with a dispersion of 2 nm in each layer thickness. For this purpose it is perfectly adequate to take 103 samples, which required only a few seconds of CPU time. For each sample, one calculates the reflectivity at each of the target wavelengths. Thus, from the ensemble of samples, one obtains the marginalised distribution in reflectivity at each target wavelength, from which the corresponding 68 per cent central confidence interval is easily calculated. These are plotted as the error-bars in Fig. 3, and show that, at longer wavelengths, 22


the solution is reasonably robust to the assumed level of manufacturing inaccuracy, but the performance of coating can be somewhat poorer at shorter wavelengths. B. Wide-band dichroic coating Our second design problem is to produce a dichroic coating over the wavelength range 600­2300 nm, with the break in reflectivity occuring at 1000 nm; hence our target reflectivity dependence over the spectral band is R() = 1 for = 600­1000 nm and R() = 0 for = 1000­2300 nm. We choose the target wavelengths 1 , 2 , . . . ,
K

to

be 600­800 nm in steps of 5 nm, 810­950 nm in steps of 10 nm and 1025-2300 nm in steps of 25 nm, giving a total of K = 108 targets. The tolerances { Rk } are taken to be 0.01 at each target wavelength. The order of materials in the layer alternates between SiO2 and Ta2 O5 , with refractive indices of 1.45 and 2.10 respectively. We take the maximum possible number of layers in the coating to be 40. As in the previous problem, the MCMC algorithm was run with the layer thicknesses allowed to vary between 0 and 350 nm, in order to determine whether any layers could be omitted from the coating. In this case, a multivariate Gaussian prior was also used, with its peak corresponding to a quarter-wave stack at 800 nm and with an rms uncertainty of 100 nm in each layer thickness. After burn-in, 106 samples were taken with the annealing parameter capped at unity so that the density of samples follows the value of the posterior distribution. The entire MCMC sampling process, including burn-in, required around 3 hours CPU time on an Intel 1.5 GHz Pentium processor, and was followed by an almost instantaneous polishing step. The resulting optimal solution for the layer thickness is given in Table 2. In this

23


case, none of the resulting layer thicknesses is close to zero, and hence the final stack contained 40 layers. Indeed, we see that all the layer thicknesses lie between 30 and 310 nm, and so the practical manufacture of the coating is technically feasible. The resulting reflectivity distribution R() is shown as the thick solid line in Fig. 4, from which we see that the solution obtained does indeed provide an effective dichroic coating over a wide spectral band. The value of 2 for this solution is 104.7; this is close to the number of target wavelengths (K = 108), indicating that the solution satisfies the required tolerances when the performance is averaged over the entire spectral band. Indeed, the individual tolerances are satisfied at over 80 percent of the target wavelengths; only 4 target wavelengths have |R - R| > 0.02 and none has |R - R| > 0.03. As above, we may illustrate the robustness of the coating to manufacturing inaccuracies by sampling from a multivariate Gaussian distribution centred on the optimal solution, with a dispersion of 1 nm in each layer thickness. Once again 103 samples were sufficient, which required only a few seconds of CPU time. The resulting 68 per cent central confidence intervals in the reflectivity at each target wavelength are plotted as the error-bars in Fig. 4. We see that, at all wavelengths in the band, the solution is extremely robust to manufacturing inaccuracies.

6.

Discussion and Conclusions

We have reconsidered the important practical problem of designing multilayer thinfilm optical coatings. As is well-known, the relationship between the field amplitudes in each layer may be written as a band-diagonal linear system of equations, from which 24


one may derive simple recurrence formula for the field amplitudes.14 Nevertheless, one may also rapidly obtain the complex field amplitudes throughout the multilayer, for a given wavelength of the incident radiation, using standard linear algebra packages, which is the method we adopt here. For a 40-layer coating, for example, the corresponding linear system can be solved in few в 10-3 sec of CPU time on an Intel 1.5 GHz Pentium processor. This computational speed is certainly sufficient to adopt a Markov-chain Monte Carlo sampling approach to determining the globally optimal coating design. We reformulate the standard merit function optimisation approach in a Bayesian parameter fitting context, and explore the corresponding posterior distribution using the Metropolis-Hastings algorithm. Our technique requires a minimum of user intervention and is capable of locating the global optimum automatically with high confidence, and without the need for a good starting design. It also allows the trivial inclusion of prior constraints on the variables and provides a natural means for investigating the robustness of the optimal solution. We illustrate the method in design problems in which the coating is based on a given set of materials and the order of materials is specified. The optimal layer thicknesses and number of layers are then derived. In particular, we focus on two specific normal-incidence design problems: a wide-band anti-reflection coating and a wide-band dichroic. In each case an effective and robust solution is obtained. In future investigations, we plan to extend our approach to the case in which the refractive indices of the layers are also included as variables in the parameter space.

25


Acknowledgments MPH thanks Charles McLachlan, John Skilling and Steve Gull for many interesting conversations regarding Markov-chain Monte Carlo sampling techniques.

References 1. P. Baumeister, "Design of multilayer filters by successive approximations", J. Opt. Soc. Am. 48, 955­958 (1958). 2. J.A. Dobrowolski and R.A. Kemp, "Refinement of optical multilayer systems with different optimization procedures", Appl. Opt. 29, 2876­2893 (1990). 3. A. Premoli and M.L. Rastello, "Minimax refining of wideband antireflection coatings for wide angular indicence", Appl. Opt. 33, 2018­2024 (1994). 4. J.A. Dobrowolski, "Completely automatic synthesis of optical thin film systems", Appl. Opt. 4, 937­946 (1965). 5. C.G. Snedaker, "New numerical thin-film synthesis technique", J. Opt. Soc. Am. 72, 1732A (1982). 6. W.H. Southwell, "Coating design using very thin high- and low-index layers", Appl. Opt. 24, 457­460 (1985). 7. A.V. Tikhonravov, M.K. Trubetskov and G.W. DeBell, "Application of the needle optimization technique to the design of optical coatings", Appl. Opt. 35, 5493­ 5508 (1996). 8. P.G. Verly, "Fourier transform technique with refinement in the frequency domain for the synthesis of optical thin films", Appl. Opt. 35, 5148­5154 (1996).

26


9. T. Boudet and P. Chaton, "Thin film design using simulated annealing and the study of filter robustness", in Developments in Optical Component Coatings, I. Reid, ed., Proc. SPIE 2776, 27­38 (1996). 10. S. Martin, J. Rivory and M. Schoenauer, "Synthesis of optical multilayer systems using genetic algorithms", Appl. Opt. 34, 2247­2254 (1995). 11. D.G. Li and A.C. Watson, "Global optimization for optical thin film design using Latin Squares", in Optical Thin Films V: New Developments, R.L. Hall, ed., Proc. SPIE 3133, 8­15 (1997). 12. P.G. Verly, A.V. Tikhonravov and M.K. Trubetskov, "Efficient refinement algorithm for the synthesis of inhomogeneous optical coatings", Appl. Opt. 36, 1487­1495 (1997). 13. P.G. Verly, "Optical coating synthesis by simultaneous refractive-index and thickness refinement of inhomogeneous films", Appl. Opt. 37, 7327­7333 (1998). 14. Sh.A. Furman and A.V. Tikhonravov, Basics of Optics of Multilayer Systems (Editions Frontieres, Gif-sur-Yvette, France, 1992), 1­149. 15. W.R. Gilks, S. Richardson, D.J. Spiegelhalter, Markov-Chain Monte Carlo in Practice (Chapman & Hall, London, UK, 1995). 16. M. Kalos and P. Whitlock, Monte Carlo Methods (Wiley, New York, USA, 1986). 17. W.H. Press, S.A. Teukolski, W.T. Vetterlin and B.P. Flannery, Numerical Recipes, 2nd edition (Cambridge U. Press, Cambridge, UK, 1992), 409­413.

27


List of Figure Captions Fig. 1. The system of complex field amplitudes and internal angles of refraction in a multilayer optical coating sub ject to incoming radiation incident at angle 0 . Fig. 2. A transmission line. Fig. 3. The reflectivity R() of the optimal wide-band anti-reflection coating (solid line) given in Table 1. The solid circles indicate the target wavelengths. The error-bars indicate the robustness of the optimal solution, and denote the 68 per cent confidence intervals in reflectivity corresponding to an rms manufacturing error of 1 nm in the layer thicknesses. Fig. 4. The reflectivity R() of the optimal wide-band dichroic coating (solid line) given in Table 1. The solid circles indicate the target wavelengths. The error-bars indicate the robustness of the optimal solution, and denote the 68 per cent confidence intervals in reflectivity corresponding to an rms manufacturing error of 1 nm in the layer thicknesses.

28


substrate

layer j b
j

layer j -1 b
j -1

outer medium b0

a

M+1

a


j

j

a


j -1

j -1

a0 = 1 surface 0



y=0
0

surface M

surface j -1

29


Z

Z

0

30


31


32


Table 1. The optimal wide-band anti-reflection coating Layer 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 n d (nm) 1.37 189.17 1.45 9.04 2.10 40.02 1.45 51.79 2.10 266.16 1.45 48.26 2.10 47.40 1.45 121.09 2.10 13.80 1.45 208.66 2.10 25.57 1.45 88.20 2.10 63.52 1.45 31.46 2.10 170.30 1.45 12.30 2.10 87.57 1.45 50.90 2.10 49.61 1.45 88.17 2.10 18.16

33


Table 2. The optimal wide-band dichroic coating Layer 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 n d (nm) 1.45 305.99 2.10 100.84 1.45 104.63 2.10 113.60 1.45 158.38 2.10 84.53 1.45 161.60 2.10 104.00 1.45 137.75 2.10 102.04 1.45 157.93 2.10 95.87 1.45 143.35 2.10 106.57 1.45 141.00 2.10 94.16 1.45 153.50 2.10 86.24 1.45 120.03 2.10 79.59 Layer n d (nm) 21 1.45 93.36 22 2.10 77.63 23 1.45 120.79 24 2.10 82.69 25 1.45 130.27 26 2.10 83.40 27 1.45 104.77 28 2.10 75.76 29 1.45 102.75 30 2.10 79.21 31 1.45 128.35 32 2.10 81.97 33 1.45 116.87 34 2.10 81.60 35 1.45 88.46 36 2.10 80.11 37 1.45 107.56 38 2.10 71.42 39 1.45 151.87 40 2.10 33.39

34