Document retrieved from a search engine cache. Address of the original document: http://geo.web.ru/users/pavel/Publications/1992_1998/ARTEN976.DOC
Date modified: Wed Jan 16 17:10:35 2008
Date indexed: Sun Apr 13 10:31:03 2008

INTERNET TECHNOLOGIES AND NEW CONCEPTION OF WDCs SYSTEM

P.Yu.Pletchov, (Computer Center, RAS, Moscow, Russia, E-mail: lym@ccas.ru)
Yu.S.Tyupkin (National Geophysical Committee of the Russian Federation and
Geophysical Center of RAS, Moscow, Russia, E-mail: tyupkin@wdcb.rssi.ru)

In this technical report we discuss the problem of setting up a
distributed information system as part of a new conception of the World Data
Centers system. The idea of creating distributed information systems for
different branches of the geosciences has a long history, but only today does
the INTERNET provide a good technological basis for realizing it. However,
the INTERNET itself does not at present provide convenient access to
distributed information on Earth sciences research.
The following topics are briefly discussed in this report:
1. A virtual thematic network as part of a new conception of the WDC system.
2. General principles of organization of a virtual thematic network.
3. A brief description of the Russian Virtual GeoNet.

1. A virtual thematic network as part of a new conception of the WDC system.
Many new sources of geophysical data besides the WDCs have appeared during
the last two decades. For example, at least five levels of data sources
are now functioning whose archives contain information of interest to
seismologists:
- International (International Seismological Center, U.K.)
- Regional (European-Mediterranean Seismological Center, France; ORFEUS,
Netherlands)
- National (centers of the national seismological services, for example the
Finnish National Data Center at the Institute of Seismology of the
University of Helsinki; the Center at the Joint Institute of Physics of the
Earth, RAS; the National Earthquake Information Center of the USGS in the
USA, etc.)
- World Data Centers: WDC A for Seismology and WDC A for Solid Earth
Geophysics (USA); WDC B for Solid Earth Physics (Russia); WDC D for
Seismology (China).
- Groups of researchers or individual scientists who offer the possibility of
using their datasets and databases. (Examples are the datasets managed by
the Institute of Physics of the Earth of RAS and the database of geofields
of the Urals managed by the Geophysical Institute of the Ural Branch of RAS.)
On the other hand, in recent years the INTERNET has become available to a
large part of the scientific community, making it possible to apply methods,
new and non-traditional for the geosciences, of organizing access to the
information resources accumulated by different organizations in different
countries. In this process, however, we are confronted with new problems. We
shall mention only two of them, both stemming from the extremely democratic
philosophy of the INTERNET.
One of the main problems is the reliability (quality) of the
geophysical data circulating in the INTERNET, because this information is not
subject to any expert assessment before it appears there.
The next important problem is the informational ecology of the
INTERNET. The number of HOMEPAGEs on geosciences grows geometrically.
Unfortunately, information intended for different purposes, such as
advertising, education, general information for the public, and finally
scientific research, is thrown pell-mell into one heap in the INTERNET. As a
result, about 80% of the HOMEPAGEs found by a keyword search related to
geosciences are information noise for scientists seeking access to
experimental geophysical data or to other scientific results.
To reach information for scientific studies it is necessary, as a rule,
to undertake a more or less complicated and time-consuming search in the
INTERNET.
It seems useful and efficient to systematize access to the information
resources of interest to the geophysical community that are available via
the INTERNET, and to create, on the logical level, a favorable information
medium for the dissemination of data. The first step in this direction was
taken some time ago, when thematic HOMEPAGEs began to be organized. For
example, very useful HOMEPAGEs were created by the National Geophysical Data
Center in Boulder (http://www.ngdc.noaa.gov/) and for the WDC system
(http://www.ngdc.noaa.gov/wdc/wdcmain.html). The latter provides convenient
access to the HOMEPAGEs and FTP servers of the World Data Centers. An
extremely useful site for seismologists has been organized by ORFEUS
(http://orfeus.knmi.nl/), etc. But the problem of a local site is that it
is responsible only for the information located on its own server and can
present to the user only "static" links to information resources located on
the servers of other organizations. In other words, it cannot function as a
self-organizing system. As a result, a local HOMEPAGE cannot present to the
user near-real-time information about the available information resources
distributed across different countries. We believe that a thematic virtual
network is a real solution to this problem.
The ICSU Panel on WDCs has already stimulated a discussion: what is the WDC
system in the era of the INTERNET? The basic functions of the present World
Data Centers system are:
- long-term storage of the results of Earth sciences observations;
- free and convenient access for scientists to the information circulating
in the WDC system;
- ensuring the reliability (quality) of the available data. (Formally, WDCs
are not responsible for the quality of data, but in practice users believe
that a WDC gives them correct data.)
We believe that the new WDC system must flexibly join as many sources of
thematic information resources available via the INTERNET as possible.
The geographical location of a dataset available via the INTERNET is not
important for the user. A dataset may be managed by an international data
center, by a national data center, by the staff of a research institute,
etc. It is important only that this body agrees to follow the standards
that we call "World Data Center". This is possible only if these
standards are simple, understandable and flexible.

2. General principles of organization of the virtual thematic network.
A thematic virtual network is composed of information
distributed among several independent servers. The information located on
each of these servers is updated independently of the other servers
and can differ in subject, mode of direct access to information resources,
etc. The persons who maintain these servers or data banks are free to
make any changes without special coordination with the administrator of the
virtual network. The description of the information resources that are
available to users of the network is updated automatically in near real
time.
From our point of view, a thematic virtual network must be an
organic unity with three layers of information resources overlain by
different logical organization schemes. These layers are: an internal
layer, an intermediate (or buffer) layer, and an outer layer. Every
information layer is a set of information resources united by the methods of
organizing information and by the mode of their concordance.
The internal information layer is composed of those information
resources that have passed the reviewing system and correspond to the
internal standard of the virtual network. This internal standard is
intended for strict correlation of information resources deposited in the
virtual network and for their unique representation in the net. The experts
of the virtual network and the owners of information resources are
responsible for the authenticity of the data of this information layer. In
particular, the quality of the data available in this layer has been
assessed by independent experts.
The intermediate (or buffer) information layer is composed of those
information resources that are prepared by their owners for the internal
information layer. This layer has an extremely branched structure, which
offers the owner of an information resource various ways of entering
information about the resource into the central metadata base
of the network. The owner of each information resource is responsible for
the authenticity of the data in this information layer. The transition of
the information resource from the buffer information layer into the
internal one is not associated with reorganization or transfer of data, but
mainly depends on the degree of their reliability and correspondence to the
internal standard of the virtual network.
The outer information layer is composed of those information resources
that are organized and maintained independently of the structure of the
virtual network. In practice, this layer consists of a list of links.
The experts of the thematic virtual network analyze the resources available
via the INTERNET, compile a preliminary list of links to external resources
and send this information to the reference system of the network. The
reference system itself conforms to the internal standard of the virtual
network and enters the internal information layer automatically. The
reference system regularly checks all available external links. If a link
does not respond, this is registered in the central metadata base and an
expert is informed about the problem.
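The regular link check performed by the reference system can be sketched
roughly as follows. This is only an illustration in Python, not the software
of the network: the names http_probe, LinkRegistry and check_links are
hypothetical, the in-memory registry stands in for the central metadata base,
and the list of notices stands in for the message sent to the expert.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers without an error status (assumed check)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = getattr(response, "status", None)
            # Non-HTTP handlers may not report a status; treat a reply as alive.
            return status is None or status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


@dataclass
class LinkRegistry:
    """Hypothetical stand-in for the central metadata base."""
    state: Dict[str, str] = field(default_factory=dict)
    notices: List[str] = field(default_factory=list)  # messages for the expert


def check_links(urls: Iterable[str], registry: LinkRegistry,
                probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Probe every external link, record its state, and collect dead links."""
    dead = []
    for url in urls:
        if probe(url):
            registry.state[url] = "ok"
        else:
            registry.state[url] = "not responding"
            registry.notices.append(f"Link not responding: {url}")
            dead.append(url)
    return dead
```

Passing the probe as a parameter keeps the bookkeeping separate from the
network access, so the registry logic can be exercised without a connection.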
In this way, the scheme of the virtual network described above is a
practically self-organizing system that minimizes the manual operations
needed to maintain large information systems and information projects.

3. The experience of constructing the Russian Virtual GeoNet.
The project "Virtual network for geosciences" is now being carried out in
Russia under the sponsorship of the Russian Basic Research Foundation and of
the National Geophysical Committee of the RF (principal investigator Dr.
P.Yu.Pletchov, E-mail: lym@ccas.ru). The experience gained in this project
and some of its technological solutions can be used to create
the virtual network of the WDC system.

RUSSIAN VIRTUAL GEONET (http://www.geo.web.ru)
The main goal of this service is to give Russian scientists convenient
access to distributed datasets and software related to the
geosciences. Special software has been written for this net, which
services all pages automatically.

Brief description of the Russian Virtual GeoNetwork (RVGN)
1) Common database of resources.
Descriptions of all WWW pages and FTP archives of GeoNet are registered
in the central database. For each URL (Uniform Resource Locator) the
database holds fields such as "title", "description", "author",
"classification" and "keywords". A special cron program regularly scans all
resources of GeoNet. The corresponding records of the central database are
refreshed if any changes in the information resources of GeoNet are detected.
In this way a user has near-real-time information about the resources
available via GeoNet.
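One way to picture the refresh step of the scanning program is the sketch
below. It is an assumption made for illustration only: the report does not
say how changes are detected, so a content hash is used here, a plain
dictionary stands in for the central database, and the names fingerprint and
refresh_record are hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(content: bytes) -> str:
    """Digest of the fetched resource, used only to notice that it changed."""
    return hashlib.sha256(content).hexdigest()


def refresh_record(database: Dict[str, dict], url: str,
                   content: bytes, description: Dict[str, str]) -> bool:
    """Update the record for `url` if its content changed; report whether it did.

    `database` maps a URL to its stored description plus the last digest.
    """
    digest = fingerprint(content)
    record = database.get(url)
    if record is not None and record["digest"] == digest:
        return False  # nothing changed since the previous scan
    database[url] = {"digest": digest, **description}
    return True
```

A periodic job would call refresh_record once per registered URL after
fetching it, so that unchanged resources cost only one hash comparison.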
2) Creating a new web page.
Each user can create pages on his own server or on the central server:
- If a user makes web pages on his own server, he can (optionally) insert
special hidden metadata into his pages. To register pages in GeoNet the user
has to visit the central server and fill in a simple registration form.
The user's URL will be checked automatically and added to the common
database.
- A user can create FREE web pages on the central server. In this case he
has to fill in special forms, and the first (sample) web page will be
created automatically. The server will also insert the description of this
page into the common database automatically. The server creates a new
directory for each new user, but it does not create a new UNIX user. Users
of RVGN are virtual and exist only within this system, which is a helpful
feature for server security. The user is free to manipulate the information
in his directory: he can edit his pages, add new pages, upload zip files, etc.
3) Editing pages by users.
If a page or another resource is located on the central server, one can use
the special FileManager. The FileManager provides a WWW interface for editing
information in the user's directory. Access to the FileManager is permitted
for authorized users only, and the FileManager automatically determines the
directory available for editing for each user.
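The idea of confining each virtual user to his own directory can be
illustrated with the following sketch. It is not the FileManager code: the
function resolve_user_path, the exception AccessDenied, the rule that user
names are alphanumeric, and the layout "one directory per user under a common
root" are all assumptions made for the example.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a request points outside the user's own directory."""


def resolve_user_path(root: Path, user: str, relative: str) -> Path:
    """Map a virtual user's request to a path inside that user's directory."""
    if not user.isalnum():
        # Assumed naming rule; it also keeps "../" out of the user name.
        raise AccessDenied(f"invalid user name: {user!r}")
    home = (root / user).resolve()
    target = (home / relative).resolve()
    # After resolving "..", the target must be the home directory or below it.
    if target != home and home not in target.parents:
        raise AccessDenied(f"{relative!r} is outside the directory of {user!r}")
    return target
```

Because the users are virtual, a check of this kind in the application is what
separates one user's files from another's, rather than UNIX file permissions.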
The FileManager supports the following operations on information in the
user's virtual directory on the central server:
- copying a file to another file,
- removing or deleting files,
- creating new files,
- editing any text file with an HTML-in-HTML editor.
The FileManager also allows the user to fetch files into this directory from
his own machine (by FTP or HTTP mirroring). In this case, an author can edit
the files on his home machine; the server then gets these files or
directories from the home machine and puts them into the directory on the
server. It is a rather sophisticated program, which compares the two
directories and downloads only the changed files.
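The "copy only what changed" behaviour of the mirroring step can be sketched
as below. The sketch works on two local directories and compares file
contents by hash; the real program fetches files by FTP or HTTP, and how it
decides that a file has changed is not stated in this report, so both choices
are assumptions. The names file_digest and mirror_changed are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Content hash of one file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mirror_changed(source: Path, target: Path) -> List[str]:
    """Copy new or modified files from `source` into `target`.

    Returns the relative paths that were copied. Files that exist only in
    `target` are left alone.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source)
        dst = target / relative
        if dst.is_file() and file_digest(dst) == file_digest(src):
            continue  # identical content, nothing to transfer
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(relative.as_posix())
    return copied
```

Running the function twice in a row copies nothing the second time, which is
the property that makes repeated mirroring cheap.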
4) Editing descriptions of an information resource by user.
The descriptions of the properties of each resource (URL) of GeoNet are
stored in the central database (PostgreSQL is used at present). The cron
program scans all resources of GeoNet daily. If a user wants to change the
description or keywords of his resource (software or dataset), he changes
the META fields in the corresponding HTML file, and these changes are
reflected in the central database automatically.
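How a scanner might read such META fields from a page is shown in the sketch
below, which uses the HTML parser of the Python standard library. The META
names are assumed to match the database fields listed earlier ("description",
"author", "classification", "keywords"); the exact names used by GeoNet are
not given in this report, and MetaExtractor and extract_description are
hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict

# Assumed META names, mirroring the fields of the central database.
FIELDS = ("description", "author", "classification", "keywords")


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the selected <meta name=... content=...>."""

    def __init__(self) -> None:
        super().__init__()
        self.record: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in FIELDS:
                self.record[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.record["title"] = self.record.get("title", "") + data


def extract_description(html: str) -> Dict[str, str]:
    """Return the description fields found in one HTML page."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.record
```

The resulting dictionary is what a daily scan would compare with, and write
into, the record kept for the page in the central database.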
5) Protection against junk.
There is some junk in any automatic system. GeoNet has three levels of
information: external, buffer and internal. The buffer level is open to
everybody, but information is moved from the buffer level to the internal
level only on the recommendation of experts. There are also some additional
programs that examine the resources of GeoNet for junk and report the results
to the experts by E-mail.
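The three levels and the expert-gated move from the buffer level to the
internal level can be modelled in a few lines. This is a sketch under
assumptions: the names Layer, Resource and promote are hypothetical, and
recording the recommending expert in the record is an illustrative choice,
not a documented feature of GeoNet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    EXTERNAL = "external"
    BUFFER = "buffer"
    INTERNAL = "internal"


@dataclass
class Resource:
    url: str
    owner: str
    layer: Layer = Layer.BUFFER            # the buffer level is open to everybody
    recommended_by: Optional[str] = None   # expert who recommended the promotion


def promote(resource: Resource, expert: str) -> Resource:
    """Move a resource from the buffer level to the internal level.

    Only the label in the record changes; the data stay where they are.
    """
    if resource.layer is not Layer.BUFFER:
        raise ValueError("only buffer-level resources can be promoted")
    if not expert:
        raise ValueError("promotion requires an expert recommendation")
    resource.layer = Layer.INTERNAL
    resource.recommended_by = expert
    return resource
```

Keeping the level as a field of the record reflects the point made in
section 2 that the transition is not associated with a transfer of data.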