Introduction to Spectral Analysis and the Evaluation of a Physical Model
Michael P. Sulzer
Arecibo Observatory
Presented, in a slightly different form, at the 1996 CEDAR Meeting, Boulder, Colorado
There are three parts to this poster:
1. An introduction to spectral analysis, considering the spectrum and the autocorrelation function, and their relationship. It also discusses the random errors on estimates of these functions. pages: 2, 3, 4, 5, 6, 7, 8, 9, 10.
2. The comparison of data to a model, considering how an evaluation of the quality of the model can be made. pages: 11, 12, 13, 14.
3. An application of 2. (using the results of 1.) to the problem of analyzing incoherent scatter radar data. It is shown that in the case of high signal to noise ratio data, fits to the spectrum are easier to evaluate than fits to the ACF. pages: 15, 16.
Throughout this poster we shall receive the advice of Shannon the monkey. Shannon was a member of the team that spent many lifetimes of the universe attempting to duplicate the works of Shakespeare by random typing. Thus she is particularly well qualified to comment on the practical nature of stochastic processes.

-2- (Go to Top)
1. Introduction to Spectral Analysis
About the spectrum and autocorrelation function
One of the most useful concepts in science and engineering is that of the power spectrum. Consider any signal; it might be a function of time, but not necessarily. Construct a set of filters, each of which passes sinusoidal signals with a narrow range about some center frequency. Define the center frequencies so that they cover some larger range of frequencies uniformly. Take the outputs of the filters at some time t, square them. The resulting set of numbers is an estimate of the power spectrum. It is an estimate for two reasons: first, we have found samples of the actual spectrum at only a finite number of frequencies, while the spectrum is actually a continuous function, and second, our signal is statistical. If we repeated this operation many times and averaged the results, we would gradually build up an increasingly accurate estimate of the power spectrum. If we want more points in the spectrum, it takes longer to get each estimate, since the filters are narrower and independent estimates occur less frequently. Thus we already see two problems in practical spectral analysis:
1. determining how often to sample the spectrum,
2. determining the statistical accuracy of the estimates.
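To make the averaging concrete, here is a minimal numerical sketch (NumPy assumed; the signal, section length, and section count are illustrative, not from the poster). Each FFT bin plays the role of one narrow filter, and the squared outputs are averaged over many independent sections:

    import numpy as np

    rng = np.random.default_rng(0)

    def averaged_spectrum(signal, nfft, nsections):
        # Each row is one section; each FFT bin acts as a narrow filter
        # centered on one frequency. Squaring the outputs and averaging
        # over many independent sections builds up the spectral estimate.
        sections = signal[:nfft * nsections].reshape(nsections, nfft)
        return np.mean(np.abs(np.fft.rfft(sections, axis=1)) ** 2, axis=0)

    x = rng.standard_normal(32 * 1000)          # white-noise test signal
    estimate = averaged_spectrum(x, nfft=32, nsections=1000)

Doubling nfft samples the spectrum at twice as many frequencies, but halves the number of independent sections available from the same data: exactly the two problems listed above.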
Consider another way of looking at the same information. For simplicity consider
a spectrum composed of just two frequencies, but fold it about zero frequency so that it has symmetry (convenient for a real spectrum). It looks like this:

Then take its Fourier transform* and get this (positive time only shown):

This is the sum of two cosines, and it is the autocorrelation function (ACF) corresponding to the above spectrum. It is a less compact form, true enough; but suppose the spectrum were a Gaussian: then the ACF would also be a Gaussian, and thus just as "natural" a representation.
*or the cosine transform if we did not fold the spectrum about zero f.
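In symbols, this is a sketch of the Fourier-transform pair just described, with illustrative line amplitudes a_1, a_2 and frequencies f_1, f_2 standing in for the two components in the figure:

    S(f) = a_1 [\delta(f - f_1) + \delta(f + f_1)] + a_2 [\delta(f - f_2) + \delta(f + f_2)]

    R(\tau) = \int S(f)\, e^{2\pi i f \tau}\, df = 2 a_1 \cos(2\pi f_1 \tau) + 2 a_2 \cos(2\pi f_2 \tau)

which is the sum of two cosines shown above.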
-3- (Go to Top)
How to make a digitally computed spectrum
It is very natural to sample a signal digitally and then compute the spectrum of this digital signal using a discrete version of the Fourier transform, usually the FFT. Consider the following question: under what conditions is the spectrum computed in this way a good estimate of the spectrum of the signal?
In order to illustrate the problems that might be encountered, make the ACF of this waveform (decaying cosine) in two ways:

The first way is the direct approach, in which all of the delayed products are computed, giving:

The second way uses the indirect approach, in which the first step is to calculate the power spectrum, and the second step is to transform the spectrum to get the ACF. To calculate the spectrum, take the squared magnitude of the FFT of the waveform:

Only the positive half of the spectrum is shown; the negative half merely repeats the positive part. There is a problem! We got 256 points in the directly computed ACF, but only 129 in the spectrum computed from the FFT. Clearly we cannot get the full ACF from this spectrum. However, if we carry out the calculation, we get the waveform shown on the next page.
-4- (Go to Top)

Shannon says:
The spectral estimate computed directly from the FFT is undersampled by a factor of two. It does not reproduce the shape of the process correctly. However, it might be good enough in some cases, so do not reach for fancy techniques and window shapes without thinking very carefully about what you are doing.
If one wants to do the computation using FFTs (often the most efficient way), it is necessary to pad the data with zeros before computing. Techniques have been developed for the efficient computation of the ACF by means of FFTs.
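A minimal sketch of both approaches (NumPy assumed; the decaying cosine is an illustrative stand-in for the waveform in the figure). Padding to twice the length avoids the circular wrap-around that undersamples the spectrum:

    import numpy as np

    def acf_direct(x):
        # Direct approach: all delayed products r[k] = sum_n x[n] x[n+k]
        n = len(x)
        return np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])

    def acf_via_fft(x, pad=True):
        # Indirect approach: power spectrum first, then transform back.
        n = len(x)
        m = 2 * n if pad else n      # zero-pad to avoid circular wrap-around
        spec = np.abs(np.fft.rfft(x, n=m)) ** 2
        return np.fft.irfft(spec, n=m)[:n]

    t = np.arange(256)
    x = np.exp(-t / 50.0) * np.cos(0.2 * t)     # illustrative decaying cosine

    # With padding the two approaches agree; without it the ACF is folded.
    assert np.allclose(acf_direct(x), acf_via_fft(x, pad=True))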
It is important to realize that even if one does not want to use the ACF for anything, the use of a spectral estimation technique which allows the recovery of the ACF (if wanted) means that one is getting all that one should from the data. Otherwise one might be losing useful information.
When not to worry about zero-padding: Suppose one is in a situation where it is easy to compute with good frequency resolution. This is when each peak in the spectrum is covered by several or more samples so that there are no abrupt changes from one frequency to the next. This is equivalent to saying that the ACF becomes very close to zero before the end of its sampled range. By very close to zero we could mean into the noise, for example. This means that the folding and adding described on the last page would not significantly modify the ACF. In this case, the spectral estimate computed directly from the FFT without zero padding would contain all the useful information about the process.
Shannon concludes:
When the spectral estimates are computed with good resolution, as defined above, it is perfectly acceptable to use an FFT with no zero padding.
An example of a situation when this conclusion applies is radar work in the lower atmosphere where so-called pulse-to-pulse correlation can be used. There is nothing preventing one from computing an FFT over a range of pulses that allows good frequency resolution to be obtained. In fact, one could use the entire period over which one would normally average for a single FFT.
An example of when this conclusion does not apply is incoherent scatter radar in the F region. There is a conflict: the pulse must be long for good frequency resolution, but short for good range resolution. A compromise is necessary, and as a result, one must take care to do the spectral analysis properly. This means computing the ACF, or using FFTs with zero-padding.
-5- (Go to Top)
Spectral estimates based on a uniformly weighted ACF
The FFT technique considered above gives ACFs which are tapered to zero at the longer lags. This is the natural form that results when the FFT is used in the simplest way: there are fewer lag products at the longer delays. However, it is possible to use FFTs to compute ACFs that have the same number of lag products at all delays. Although we do not describe the computational technique used to make such spectra, it was used in the following simulations.
-6- (Go to Top)
Statistical Errors in Spectral Analysis: Using Simulations
When we use spectral analysis to study a process, it is because the data we obtain have no direct meaning; only certain statistical properties of the data are important. The estimates of these quantities always have associated errors, and it is usually important to understand the nature of the errors. This can be done by straightforward analysis, but it is often much more instructive to simulate the random process and look at the statistics "experimentally". We consider a simple random process and then look at the nature of the fluctuations on computed estimates of the spectrum and ACF of the process. To simulate the process, we begin with a sequence of Gaussian random numbers; we will label the independent variable time, although it could be anything.

Shown above are 200 independent points, a tiny fraction of what we need in order to do a good simulation. Shown below are an impulse response and the result of convolving the impulse response with the random sequence above.

The convolution can be thought of as the following operation:
1. Resolve the sequence of random numbers into a linear sum of impulses, where each impulse is scaled by the value of the corresponding random number.
2. Separate the sum into an individual function for each impulse.
3. Replace each impulse by the impulse response, scaled by the value of the impulse.
4. Recombine the individual functions by adding.
The result in this case is a smoothed function in which there is significant correlation between neighboring points.
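A minimal sketch of this construction (NumPy assumed; the exponential impulse response is an illustrative stand-in for the one in the figure):

    import numpy as np

    rng = np.random.default_rng(1)
    white = rng.standard_normal(200)            # independent Gaussian points
    impulse_response = np.exp(-np.arange(10) / 3.0)

    # np.convolve carries out steps 1-4 above: each random number places a
    # scaled copy of the impulse response, and the copies are summed.
    smoothed = np.convolve(white, impulse_response)[:len(white)]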
-7- (Go to Top)
The final step in constructing the sample of the random process is to add some purely random noise. Most experimental systems, radars, optical instruments, etc., have a significant component of additive random noise. The result is shown below.

The next thing to do is to find the spectrum and ACF. 100,000 32-point sections of the process were summed to give these figures:

Note that the spectrum and the ACF display information about the random process in different ways. The large central component of the spectrum results from the correlated part of the process, while the flat part results from the additive random noise. The spectrum of random noise is flat because it contains all frequencies with equal probability. In the ACF the narrow spike at 0 delay is the effect of the random noise, while the rest of the ACF is the result of the correlated part of the process. Note that it resembles the impulse response. The ACF of random noise is zero except at zero delay because each point is unrelated to the others.
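The whole experiment is compact enough to restate as a sketch (NumPy assumed; the impulse response and noise level are illustrative). The ACF is computed with zero-padding, as recommended earlier:

    import numpy as np

    rng = np.random.default_rng(2)
    nsec, npts = 100_000, 32

    # Correlated part plus additive white noise
    impulse_response = np.exp(-np.arange(10) / 3.0)
    process = np.convolve(rng.standard_normal(nsec * npts),
                          impulse_response, mode="same")
    process += 0.5 * rng.standard_normal(nsec * npts)

    sections = process.reshape(nsec, npts)
    spectrum = np.mean(np.abs(np.fft.fft(sections, axis=1)) ** 2, axis=0)

    padded = np.fft.fft(sections, n=2 * npts, axis=1)  # zero-pad each section
    acf = np.mean(np.fft.ifft(np.abs(padded) ** 2, axis=1).real[:, :npts], axis=0)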
-8- (Go to Top)
Next we look at the fluctuations of our spectral and correlation estimates from the means. For the means at each frequency or delay we use the accurate estimates shown on the last page, and compute some new examples using only 1000 sections of the random process. Several examples of these fluctuations are shown below.

On the left are spectral fluctuations from the mean, while those on the right are those of the ACF. Spectral fluctuations appear uncorrelated from point to point; they also appear to have magnitudes that are proportional to the spectral values. The fluctuations on the ACF appear to be composed of the superposition of a correlated part and a smaller uncorrelated part, similar to the process itself.
We compute estimates of the sigmas as a function of frequency for the spectrum and of delay for the ACF, using 100 estimates of each; they are found by squaring and adding the 100 fluctuation estimates, then dividing by 100 and taking the square root, giving an accuracy of about 10%.
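In symbols, a hedged reading of this procedure, with \Delta_k denoting the k-th fluctuation estimate at a given frequency (or delay):

    \hat{\sigma} = \sqrt{ \frac{1}{100} \sum_{k=1}^{100} \Delta_k^2 }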

The sigmas of the spectrum have almost the same functional shape as the spectrum itself, and so they are almost proportional to the spectrum. The sigmas of the ACF are highest where the correlation is large, but vary little over the entire range. Thus the estimates of the sigmas confirm what we see in the plots of the fluctuations above. It is beyond the scope of this poster to explain the properties of the sigmas in detail.
-9- (Go to Top)
Finally, look at the correlations between the spectral fluctuations at different frequencies, and the fluctuations in the ACF at different delays. This is fairly complicated to present, and only a small part of the information is shown. The two plots below show the correlation of the zero-frequency fluctuations with those at other frequencies, and the correlation of the lag at zero delay with lags at other delays.

The spectral fluctuations show very low correlation. It would be expected to be exactly zero only for a perfectly flat spectrum. Consider, for example, the extreme case of a very narrow (nearly sinusoidal) process which is much stronger than the random noise. The spectrum of the process has a sin(x)/x functional shape, and variations in this process would show up (and be correlated) at all frequencies where the sin(x)/x is significant compared to the random noise. In the example above we have sufficient frequency resolution so that the correlation is small.
On the other hand, the correlation on the fluctuations of the ACF between the zero lag and other lags is significant whenever the correlation is large. This is in agreement with the appearance of the plots at the top of the last page. Some of the effects of this correlation are considered in the next section.
-10- (Go to Top)
Shannon's Discussion
Simulations are a useful way to learn the basic properties of a random process. There is nothing here that cannot be derived easily; however, seeing the simulation at each step suggests what is important and how to proceed to the next step.
Summary of the results of this section:
1. The random processes that we need to study are often correlated in time, and the nature of this correlation characterizes the process.
2. Observations of such processes usually contain a component of random noise added to the samples of the process under study.
3. Sometimes the random noise is the dominant source of variation in the process. This leads to very simple statistics, but it is of course an undesirable situation: we want the signal to be strong, since the additive random noise can only make the uncertainties larger. The case where the signal dominates, however, has more complicated statistics.
4. When the signal to noise ratio is high, the fluctuations in the ACF are correlated when the correlation is high. In the next section we see that this leads to some unpleasant effects in evaluating the comparison of a model to data. The spectral estimates of a process are not necessarily free of correlation, but if we have enough frequency resolution to adequately analyze the process, the correlation is low.
-11- (Go to Top)
2. Data and Model: Evaluating the Comparison
About chisquare
Assume that we have a set of data points, forming an estimate of some process which we wish to investigate. Also we have a mathematical model for the process, one that we can evaluate at the same times, frequencies, or whatever, as the data. The data points have something that the model points do not: random errors, or "noise", a natural term for the effect in an audio communications channel. If there were no noise, then one would just subtract the data from the model and form some quantity that characterizes the differences; the sum of the squares is an obvious and common characterization. Then one would adjust the parameters of the model until the sum goes to zero. With noise present, as it is in all real cases, the sum does not go to zero: if the model is perfect, the differences are random, and the sum of the squares of random numbers is always greater than zero. If the variances are known, then the sum has a definite expected value. It is convenient to divide the square of the difference between each data point and the corresponding model point by the variance of the data point. Then the expected value of each ratio is unity, and that of the sum is just the number of data points. This we call chisquare.
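Written out, with d_i the data points, m_i the corresponding model points, and \sigma_i^2 the variance of the i-th data point:

    \chi^2 = \sum_{i=1}^{N} \frac{(d_i - m_i)^2}{\sigma_i^2}, \qquad \langle \chi^2 \rangle = N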
Shown below are two sequences of chisquare values, one resulting from a sequence of spectral estimates of the process described in the last section, and the other from the corresponding ACFs. The models are the very accurate spectral and ACF estimates using 100,000 32-point sections; the data are the estimates using 100 32-point sections. The 33 points in the right half of the spectrum, and in the positive real part of the ACF, were used.
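As a sketch (names illustrative), each value in such a sequence is just:

    import numpy as np

    def chisquare(data, model, sigma):
        # Sum of squared, sigma-normalized differences over the points used;
        # for a correct model the expected value is the number of points.
        return np.sum(((data - model) / sigma) ** 2)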

-12- (Go to Top)
The two sequences of chisquare estimates on the last page are quite different even though they are based on the same data. The explanation is that there are fewer independent samples in the fluctuations of the ACF than of the spectrum. This can be seen clearly in the plots of the fluctuations (Figs. 14 and 15) and the correlations in the fluctuations (Figs. 18 and 19). Both sequences have the same expected or average value, but it is more likely that a chisquare estimate will be found near the expected value if it is made from data that are uncorrelated. This does not mean that spectra are somehow superior to ACFs. It does mean that if we have a process which separates into two parts when displayed as a spectrum, as the process considered here does, then this is probably the better domain to use in the analysis. It is possible to take the correlations into account and do as well as one could regardless of the domain. However, techniques for doing this are more complicated than the relatively simple ones discussed here.
Consider the possibility that the "model" is not correct, but that a new model, with a small bump in the right half of the spectrum, is actually the correct model. This is shown below. If we substitute the new model for the old, what is the difference in chisquare for the spectral and ACF estimates? These are also shown below; the blue lines are for the original model, and the red lines are for the modified model.

-13- (Go to Top)
It is easy to understand why chisquare is sensitive to the bump for the spectrum and not for the ACF.
The Spectrum: The fluctuations at the bump are isolated from the large fluctuations of the lower-frequency part of the spectrum. The sigmas are very small at the bump, and thus the relevant terms in chisquare are given large weight. (We divide by sigma squared.) Thus the bump receives its proper statistical weighting, and we are limited only by the size of the fluctuations at the bump.
The ACF: The ACF of the bump is a signal which oscillates about zero and decays to near zero in a few oscillations. Its fluctuations are similar in nature. They are added to the much larger and highly correlated fluctuations present from the rest of the signal (Fig. 15). The effect of the bump is to make some of these highly correlated numbers larger and some smaller. Thus the effect in the sum of the squares is very small.
Shannon concludes:
We have seen two effects on chisquare as a result of correlation between the data points:
1. Although the mean value of chisquare remains the same, the size of the fluctuations about the mean increases. In the case of very high correlation, the loss in the reliability of chisquare as a measure of the quality of the fit can be large.
2. The more important effect is the loss of sensitivity to changes in the model. In the case considered, the difference between working in the domains with and without correlation is huge. The effect considered is easily measured with the spectrum, but is hardly detectable with the ACF. Note what is not being said:
It has not been shown that the parameters resulting from a least squares fit would be any different in the two domains. That is a different question.
It has not been shown that some estimator other than chisquare would not work well with correlated data. In fact it is obvious that one can find such an estimator. For example, one could transform the ACF fluctuations to a spectrum, and then square and sum.
However, it is advantageous to work in a domain where the statistics are simple and the effect of changes in the model are apparent.
-14- (Go to Top)
About Goodness of Fit: Q
When one compares data to a model, it is necessary to evaluate the goodness of fit. The parameter discussed in the last section, chisquare, is a step in this direction, but by itself it does not provide a quantitative indication. For this, we need something like a probability. We have a model; we have data; and we calculate chisquare. What is the probability that a value for chisquare as large as or larger than a given one could have been obtained by chance, assuming that the model is good? This is a question that we can answer, and the resulting parameter is known as Q. Assuming the sigmas are correct, a very low value of Q indicates that the model is bad. After all, if it is unlikely that the value of chisquare occurred by chance, then we must look for some other cause. Q is defined such that the most likely answer, given a correct model, is .5. Values consistently higher than .5 indicate that the sigmas are bad. Q is a function of chisquare and the number of degrees of freedom. In a comparison to a model, the number of degrees of freedom is the number of data points minus the number of fitted parameters. A family of curves for log10 of Q is plotted below against chisquare; the parameter of the family is the number of degrees of freedom.

How bad is the model with the bump? From figure 23, we see that chisquare is about 80. Following the curve for 30 degrees of freedom, we see that Q is about 10^-6. The value for no bump is about 0.5. For the ACF both models have values of about 0.5, showing the uselessness of the ACF domain for studying this process with chisquare.
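These readings are easy to check numerically. A minimal sketch, assuming SciPy; gammaincc is the regularized upper incomplete gamma function, in terms of which Numerical Recipes defines Q = gammq(nu/2, chisquare/2):

    from scipy.special import gammaincc

    def goodness_of_fit_q(chi2, ndof):
        # Probability of obtaining a chisquare this large or larger by
        # chance, given a correct model and correct sigmas.
        return gammaincc(ndof / 2.0, chi2 / 2.0)

    print(goodness_of_fit_q(80.0, 30))  # about 1e-6: the model with the bump
    print(goodness_of_fit_q(30.0, 30))  # about 0.5: chisquare near its expectation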
-15- (Go to Top)
3. Application to Incoherent Scatter Data
It is the object of this section to demonstrate briefly that the ideas presented in the two previous sections apply to Arecibo incoherent scatter data. The figure on the next page is from the Arecibo topside ionosphere. It has the following characteristics:
1. It is high signal to noise ratio data. (It is plotted on a log scale, and so the fluctuations appear constant and small until the noise baseline is reached.)
2. There are two regions in the spectrum that separate nicely, showing the width of the O+ part of the spectrum and the H+ part; and of course at a third level we see the noise fluctuations.
3. Fits to two different models are shown, with small visible difference but a huge difference in the Q for the two models.
Since the level of the H+ part of the spectrum is much less than that of the O+ part, we have a situation very similar to the one simulated in the last section. If the data were analyzed as an ACF and chisquare was used as the estimator, then the difference between the Q values in the two cases would be much smaller and would be wrong.
Shannon has one quote from Numerical Recipes (Press et al. 1992) that she would like to share with everyone:
To be genuinely useful, a fitting procedure should provide (i) parameters, (ii) error estimates on the parameters, and (iii) a statistical measure of the goodness of fit. When the third item suggests that the model is an unlikely match to the data, then items (i) and (ii) are probably worthless. Unfortunately, many practitioners of parameter estimation never proceed beyond item (i). They deem a fit acceptable if a graph of the data and model "looks good." This approach is known as chi-by-eye. Luckily, its practitioners get what they deserve.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, The Art of Scientific Computing, Second Edition, Cambridge University Press, 1992
-16- (Go to Top)