Astronomical Data Analysis Software and Systems VI, ASP Conference Series, Vol. 125, 1997, Gareth Hunt and H. E. Payne, eds.

A Computer-Based Technique for Automatic Description and Classification of Newly-Observed Data
S. Vasilyev
SOLERC, P.O. Box 59, Kharkiv, 310052, Ukraine

Abstract. A technique is presented that automatically represents data sequences by a relatively small number of independent parameters, based on principal component analysis. In some instances the parameters can serve as an independent description, classification, and compression of the observational results.

1. Introduction

In recent years the range of observed astronomical data has become greatly varied. In particular, this can be explained by the rapid growth in the number of objects studied and the appearance of new data types due to progress in space-based observations. As a result, the problem of initial description and classification of newly observed data becomes most urgent, especially when preliminary observational material and theoretical expectations are lacking. The literature on methods for treating statistical data sequences is ample, but many studies are based on the appearance of the data and have difficulty determining the minimum set of independent parameters appropriate for further analysis. The well-known principal component method of multivariate statistical data treatment can be extended to obtain a tool for determining independent parameters suitable for reliable data representation and further comprehensive analysis.

2. Proposed Approach

The distinctive feature of the approach is the direct use of the observed data records as input in composing the initial and covariance matrices involved in further analysis. Each of the n observational dependencies forms a row in the initial matrix and is represented by a vector in m-dimensional space, where n is the number of observed dependencies and m is the number of points on each curve. These n vectors determine the dimension of the covariance matrix and thus the number of its eigenvectors and eigenvalues. The eigenvectors are orthogonal and, consequently, each row of the covariance matrix, as well as each row of the initial one, can be expressed uniquely as their linear combination. It is advisable to normalize the eigenvectors by dividing them by their lengths, which are equal to the square roots of the corresponding eigenvalues. This puts the eigenvectors on the same scale.
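The construction described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the author's package: the function name `principal_components`, the synthetic demo curves, and the choice to form the n-by-n matrix as X Xᵀ without subtracting a mean curve are all assumptions made for the example. Under that choice, projecting the data onto a unit eigenvector of the n-by-n matrix gives an m-point curve whose length is the square root of the eigenvalue, so dividing by that length yields components on a common scale.

```python
import numpy as np

def principal_components(X, k):
    """Return the k largest eigenvalues, unit-length components, and coefficients.

    X is an (n, m) array: n observed dependencies, m points per curve.
    Assumption: the n-by-n matrix is X @ X.T with no mean curve subtracted.
    """
    X = np.asarray(X, dtype=float)
    C = X @ X.T                               # (n, n) matrix built from the records
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    lam = eigvals[order]
    raw = eigvecs[:, order].T @ X             # (k, m); row j has length sqrt(lam[j])
    components = raw / np.sqrt(lam)[:, None]  # divide by length -> unit vectors
    coeffs = X @ components.T                 # (n, k) weights of each curve
    return lam, components, coeffs

# Hypothetical demo data: 8 curves that mix two smooth shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
shapes = np.vstack([np.sin(2 * np.pi * t), t ** 2])
X = rng.normal(size=(8, 2)) @ shapes

lam, components, coeffs = principal_components(X, k=2)
print("largest eigenvalues:", lam)
print("max reconstruction error:", np.max(np.abs(coeffs @ components - X)))
```

In this sketch k should not exceed the rank of the data matrix, since the division by the square root of a near-zero eigenvalue is not meaningful. Subtracting the mean curve before forming the matrix is a common variant and would change the components.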

© Copyright 1997 Astronomical Society of the Pacific. All rights reserved.



Finding the eigenvalues often makes it possible to represent the initial matrix with the needed accuracy by a linear combination of a relatively small number of principal components, namely those corresponding to the largest eigenvalues and so bearing most of the information on the data (Genderon & Goddard 1966). The other principal components are usually responsible for the random noise in the observations and can be neglected (Lorge & Morrison 1938). The quality of the data representation can be checked against a test matrix composed of linear combinations of the principal components multiplied by the corresponding eigenvalues.

Generally, analytical functions for the selected principal components can be found by fitting, which additionally yields an analytical representation of the data. In this case, all of the initial dependencies can be easily calculated as linear combinations of these functions, which are presumably described by a minimum parameter set.

We have developed an interactive computer package that can automatically describe some kinds of data by principal components. The software also performs preliminary fitting of measurements that are not uniformly tabulated observational records. The procedure for finding the largest eigenvalues and the corresponding eigenvectors, or principal components, is based on the algorithm presented in Simonds (1963).

The approach has been successfully applied to describe the full variety of asteroid polarization phase dependencies (Vasilyev 1994) and tested on some other types of data. We have found two principal components that are adequate to represent any polarization curve belonging to the analyzed assemblage, and even data not involved in the initial analysis.
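The idea of keeping only the leading components and checking the result against a test matrix can be illustrated as follows. This is a hedged sketch, not the algorithm of Simonds (1963) or the package described here: it swaps in NumPy's singular value decomposition, whose squared singular values equal the eigenvalues of X Xᵀ, so the weights in the test matrix are square roots of those eigenvalues. The function name `smallest_adequate_rank`, the tolerance, and the synthetic curves are assumptions made for the example.

```python
import numpy as np

def smallest_adequate_rank(X, tol):
    """Return (k, test): the fewest leading components whose combination
    reproduces every entry of X to within tol, and that test matrix."""
    X = np.asarray(X, dtype=float)
    # SVD stand-in for the eigenproblem: s**2 are the eigenvalues of X @ X.T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    for k in range(1, len(s) + 1):
        # Test matrix built from the k components with the largest weights.
        test = (U[:, :k] * s[:k]) @ Vt[:k]
        if np.max(np.abs(X - test)) <= tol:
            break
    return k, test

# Hypothetical noise-free demo: 8 curves that mix two smooth shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
shapes = np.vstack([np.sin(2 * np.pi * t), t ** 2])
X = rng.normal(size=(8, 2)) @ shapes

k, test = smallest_adequate_rank(X, tol=1e-8)
print("components needed:", k)
print("largest deviation from the data:", np.max(np.abs(X - test)))
```

With real observations the tolerance would naturally be set by the observational errors, so that components carrying only noise are dropped.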
The eigenvalues λ1 and λ2 corresponding to these two principal components can be considered as new parameters replacing the widely used system of four interdependent parameters (Pmin, αmin, α0, and h), and are more suitable for further analysis (Vasilyev 1996). Expressions obtained for the principal components can be used to describe new data and gappy observational dependencies. In the case of asteroids, the method allows the synthesis of polarization curves from only three observations and fits the data better than the other fitting techniques tried.

The power of this method is its intrinsic ability to find the principal features common to all the data under study. It can be used efficiently for restoring truncated data records and for rationally planning further observations. Although the principal component method itself does not imply knowledge of the physical nature of the analyzed data, it often allows the principal components to be connected with physical parameters of the objects. In particular, we have found that both principal components of the family of asteroid polarimetric phase curves have physical meanings, and the fit correlations were determined.

Correlation diagrams of the largest eigenvalues, corresponding to the first principal components, offer additional useful possibilities. These diagrams do not, of course, exhibit any mutual correlation between the eigenvalues, as required by the technique. However, they may reveal differences in properties among the analyzed data and can be used successfully for independent classification of the studied objects (Tholen 1984; Vasilyev 1996).

As the number of principal components and corresponding eigenvalues used for the data representation is usually much smaller than that of the observed dependencies, we obtain an alternative tool for compact data storage. The problem of finding the balance between the required accuracy of the data restoration and the needed compression ratio is the subject of a separate study. Our preliminary results show that the principal component technique can reduce the data volume by up to several times (Vasilyev 1995). For the asteroid polarimetric data the compression ratio reached a factor of five, while the differences between the initial and restored data did not exceed the observational errors. Furthermore, as the principal components preserve the data structure, this ratio can be increased by the subsequent application of any other archiving software.

3. Conclusion

The technique based on principal component analysis of data records may serve as a powerful tool for initial statistical data treatment, allowing data interpolation and extrapolation, analytical representation, classification, and compact storage. Among the advantages of the technique are the stability of the obtained eigenvectors when new data are added and the minimization of the rms errors in data representation. Importantly, the application of this approach does not require any a priori assumption, either about the objects or about the physical mechanisms under study.

To make such a multipurpose application of the technique possible in automatic mode for some types of newly observed data, we are currently developing an integrated program package, PCMAD (Principal Component Method for Astronomical Data), which includes most of the described features. It should be noted that the method makes no special demands on computer performance except during the first stage of its application, when the matrix operations are performed.

References

Genderon, R. G., & Goddard, M. G. 1966, Photogr. Sci. Eng., 10, 77
Lorge, J., & Morrison, N. 1938, Science, 87, 491
Simonds, J. L. 1963, J. Opt. Soc. America, 53, 968
Tholen, D. J. 1984, Ph.D. Thesis, Univ. of Arizona
Vasilyev, S. V. 1994, BAAS, 26, 1173
Vasilyev, S. V. 1995, Vistas in Astronomy, 39, 275
Vasilyev, S. V. 1996, Ph.D. Thesis, Kharkiv St. Univ., Ukraine