Astronomical Data Analysis Software and Systems VII
ASP Conference Series, Vol. 145, 1998
R. Albrecht, R. N. Hook and H. A. Bushouse, eds.
© Copyright 1998 Astronomical Society of the Pacific. All rights reserved.
Linné, a Software System for Automatic Classification
N. Christlieb (E-mail: nchristlieb@hs.uni-hamburg.de) and L. Wisotzki
Hamburger Sternwarte, Gojenbergsweg 112, D-21029 Hamburg, Germany
G. Graßhoff
Institut für Wissenschaftsgeschichte, Georg-August-Universität Göttingen
A. Nelke and A. Schlemminger
Philosophisches Seminar, Universität Hamburg
Abstract. We report on the software system Linné, which has been designed for the development and evaluation of classification models. Linné is used for the exploitation of the Hamburg/ESO survey (HES), an objective prism survey covering the entire southern extragalactic sky.
1. Introduction
The Hamburg/ESO survey (HES) was originally conceived as a wide-angle objective prism survey for bright quasars. It is carried out with the ESO Schmidt telescope and its 4° prism and covers the total southern extragalactic sky. For a description of the survey see Wisotzki et al. (1996).
A few years ago we started to develop methods for the systematic exploitation of the stellar content of the survey by means of automatic spectral classification. A short overview of the scientific objectives is given in Christlieb et al. (1997) and Christlieb et al. (1998), where a detailed description of the classification techniques can also be found. In this paper we report on the software system Linné, which has been designed for the development and evaluation of classification models.
2. Classification models
A classification model (CM) consists of the following components:
Class definitions are given by means of a learning sample, implicitly including
the number N and names of the defined classes.
Class parameters include the a priori probabilities $p(\omega_i)$, $i = 1, \ldots, N$, of the N defined classes and the parameters of the multivariate normal distribution of the class-conditional probabilities $p(\vec{x}\,|\,\omega_i)$.
Classification aim can be one of the following items:
(1) Perform a ``simple'', i.e. Bayes-rule, classification.
(2) Compile a complete sample of class $\omega_{\mathrm{target}} \in \{\omega_1, \ldots, \omega_N\}$ with minimum-cost rule classification.
(3) Detect ``unclassifiable'' spectra, i.e. spectra to which the reject option (Christlieb et al. 1998) applies. Note that e.g. quasar spectra belong to this class.
Feature space The space of features in which the search for the optimal subset is carried out. Note that in certain cases one may want to exclude available features beforehand to avoid biases, so that the feature space is not necessarily identical to the total set of available features.
Optimal feature set for the given classification aim.
Optimal loss factors In the case of classification aim (3), a set of three optimal loss factors, i.e. weights for different kinds of misclassifications, has to be stated (Christlieb et al. 1998).
Once a CM is established, it is straightforward to derive from it a classification rule for the assignment of objects of unknown class to one of the defined classes.
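For orientation, the two decision rules can be written out in the notation introduced above; this is a textbook sketch rather than the exact formulation of Christlieb et al. (1998), with the class means $\vec{\mu}_i$ and covariance matrices $\Sigma_i$ denoting the parameters of the multivariate normal distributions:
\[
  p(\omega_i \,|\, \vec{x}) = \frac{p(\vec{x}\,|\,\omega_i)\,p(\omega_i)}{\sum_{j=1}^{N} p(\vec{x}\,|\,\omega_j)\,p(\omega_j)},
  \qquad
  p(\vec{x}\,|\,\omega_i) = \frac{\exp\bigl(-\tfrac{1}{2}(\vec{x}-\vec{\mu}_i)^{\mathsf{T}}\Sigma_i^{-1}(\vec{x}-\vec{\mu}_i)\bigr)}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}},
\]
where $d$ is the number of selected features. The Bayes rule of aim (1) assigns $\vec{x}$ to the class with the largest a posteriori probability $p(\omega_i\,|\,\vec{x})$, while the minimum-cost rule of aim (2), and likewise the reject option of aim (3), instead chooses the class $\omega_k$ with the smallest expected loss
\[
  R(\omega_k\,|\,\vec{x}) = \sum_{j=1}^{N} c_{\omega_j \to \omega_k}\, p(\omega_j\,|\,\vec{x}),
\]
where the loss factors $c_{\omega_j \to \omega_k}$ are the weights for the different kinds of misclassifications mentioned above.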
The aim of Linné is to permit easy and well-controlled access to the variation of the model components and effective means to evaluate the resulting quality of classification. The performance of a model with classification aim (1) can be evaluated by, e.g., the total number of misclassifications, estimated with the leaving-one-out method (Hand 1981); in the case of aim (3) the model is usually assessed by the number of misclassifications between the target class and the other classes.
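The leaving-one-out estimate can be pictured with a short sketch (Python is used here only for illustration; Linné itself is implemented in Prolog and C, and the functions fit and classify are hypothetical placeholders for the parameter estimation and the classification rule):

    def leave_one_out_errors(X, y, fit, classify):
        """Count misclassifications with the leaving-one-out method: each
        spectrum of the learning sample is classified by a model whose
        parameters were estimated from all remaining spectra.
        X and y are lists of feature vectors and class labels."""
        errors = 0
        for i in range(len(X)):
            X_train = X[:i] + X[i + 1:]          # hold out spectrum i
            y_train = y[:i] + y[i + 1:]
            model = fit(X_train, y_train)        # re-estimate class parameters
            if classify(model, X[i]) != y[i]:    # classify the held-out spectrum
                errors += 1
        return errors

Here fit would estimate the a priori probabilities and the parameters of the multivariate normal distributions, and classify would apply the Bayes or minimum-cost rule.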
3. Description of the system
The core of Linné was implemented in an object-oriented extension to Prolog, with the numerical routines, e.g. for the estimation of the parameters of the multivariate normal distributions, written in C. To facilitate user interaction and to ensure effective control over the model components and performance, a graphical user interface (GUI) for Linné was developed (see Figure 1). After the first implementation, which used Prolog's own graphical library (SWI-Prolog plus XPCE), we recently started to switch to Java for reasons of system independence and remote access via the WWW. At present, Linné has a server-client architecture, with the Prolog server communicating with a Java client through TCP/IP sockets. The server keeps the learning sample data, which are read in from MIDAS via an interface and converted into Prolog-readable terms. It is not yet possible to select all model components interactively via the GUI, so partly pre-designed models, which are also provided from the server side, have to be used.
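The client-server exchange can be pictured with a minimal sketch; only the use of TCP/IP sockets is taken from the description above, while the port number, the command string and the line-based protocol below are purely hypothetical (Python again for brevity):

    import socket

    HOST, PORT = "localhost", 4711               # hypothetical address of the Prolog server

    def send_request(request):
        """Send one request line to the server and return its one-line reply."""
        with socket.create_connection((HOST, PORT)) as sock:
            sock.sendall((request + "\n").encode())
            return sock.makefile().readline().strip()

    # e.g. ask the server to evaluate the currently selected classification model
    print(send_request("evaluate_model."))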

Figure 1. Main control panel of Linné. The three upper text fields show model performance parameters. Below them is the feature selection area (m all5160 ... x hpp2). The automatic feature selection is controlled by the menus above the Prolog server messages window.
The results of the classification model evaluation are presented on the client. A window showing the confusion matrix and the loss matrix assists the user in the analysis of the model. The user may then alter components and repeat the evaluation to improve the model step by step.
The search for the optimal feature set can also be done automatically. Since the set of available features may easily become too large for an exhaustive search among all possible combinations, a hill-climbing-like, stepwise search has been implemented in addition to the exhaustive search. It can be controlled from the client side, using different strategies and branching parameters.
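A greedy forward variant of such a stepwise search can be sketched as follows (an illustrative Python sketch; the evaluation function, e.g. the leaving-one-out error count from above, and the simple stopping criterion stand in for the strategies and branching parameters actually offered by Linné):

    def stepwise_search(all_features, evaluate):
        """Greedy forward selection: repeatedly add the single feature that
        most improves the evaluation score (lower is better, e.g. number of
        misclassifications) and stop when no addition improves it."""
        selected, best_score = [], float("inf")
        while True:
            candidates = [f for f in all_features if f not in selected]
            if not candidates:
                break
            score, feature = min((evaluate(selected + [f]), f) for f in candidates)
            if score >= best_score:              # no further improvement
                break
            selected, best_score = selected + [feature], score
        return selected, best_score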
Linné also provides a tool for the systematic and efficient adjustment of loss factors (see Figure 2).
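The effect of the two critical loss factors can be explored by the kind of grid scan that this tool visualises. In the hypothetical sketch below, classify_min_cost stands for a minimum-cost classifier built from the current CM, target is the target class, and the grid values are arbitrary:

    import itertools

    def scan_loss_factors(spectra, labels, target, classify_min_cost, grid):
        """For each pair of loss factors, count target-class spectra lost to
        other classes and other-class spectra contaminating the target sample;
        the third loss factor is assumed to be held constant at a small value."""
        results = {}
        for c_t2i, c_i2t in itertools.product(grid, grid):
            assigned = [classify_min_cost(x, c_t2i, c_i2t) for x in spectra]
            lost = sum(t == target and a != target for a, t in zip(assigned, labels))
            contam = sum(t != target and a == target for a, t in zip(assigned, labels))
            results[(c_t2i, c_i2t)] = (lost, contam)
        return results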
4. Application of classification models
Once a CM has been established and evaluated, and the evaluation has satisfied the user, its parameters can be exported to MIDAS tables. The classification of spectra of unknown class can then be carried out under MIDAS. The typical computing time for the classification of all spectra on one HES plate, which maps 5° × 5° of the sky and typically yields ≈ 10,000 non-disturbed spectra with S/N > 10, is less than 5 min on a Linux PC with a Pentium 133 MHz processor.
So far Linné has been used for some first test applications, i.e. the compilation of a sample of extremely metal-poor halo stars and a search for FHB/A stars. It

Figure 2. Tool for the interactive adjustment of loss factors. The upper half of the window shows the number of target-class spectra which have been erroneously assigned to one of the other classes, as a function of the loss factors $c_{\mathrm{target}\to\omega_i}$ (abscissa) and $c_{\omega_i\to\mathrm{target}}$ (ordinate). The lower half shows the same for the target-class contamination. The third loss factor, $c_{\omega_i\to\omega_j}$, does not have to be adjusted but can be held constant at a small value.
will be developed further and extended in functionality, and it will be applied to the exploitation of the huge HES database, which will finally consist of ≈ 5,000,000 digitised objective prism spectra.
Acknowledgments. N.C. acknowledges an accommodation grant by the conference organizers. This work was supported by the Deutsche Forschungsgemeinschaft under grants Re 353/40-1 and Gr 968/3-1.
References
Christlieb, N., et al. 1997, in Wide-Field Spectroscopy, ed. E. Kontizas et al. (Dordrecht: Kluwer), 109
Christlieb, N., et al. 1998, in Data Highways and Information Flooding, a Challenge for Classification and Data Analysis, ed. I. Balderjahn et al. (Berlin: Springer), in press
Hand, D. 1981, Discrimination and Classification (New York: Wiley)
Wisotzki, L., et al. 1996, A&AS, 115, 227