Original document: http://www.adass.org/adass/proceedings/adass03/P2-20/
This paper describes the Dataset Verification and Linking efforts underway among the NASA Archives and Data Centers, the American Astronomical Society (AAS), and the University of Chicago Press (UCP, publisher of ApJ, AJ and PASP). This activity has taken place under the auspices and guidance of the NASA Astrophysics Data Centers Executive Council (ADEC), and aims at fulfilling the promise of further integrating the astronomical literature and the on-line data it is based upon.
The NASA Astrophysics Data System (ADS) is developing the tools needed by publishers and users at large for both dataset verification and linking through stable, top-level services that can be maintained for the foreseeable future. Links from on-line manuscripts to datasets will always refer to a dataset via a URI built from a well-defined identifier, and a central resolver provided by the ADS will turn that URI into one or more URLs in real time. This gives the links a high level of reliability and persistence, and provides an upgrade path into any future Virtual Observatory (VO) efforts in this direction. Dataset citation, verification and linking will work as follows:
The ADS will take responsibility for maintaining services that are aware of all relevant datacenters that may have datasets available on-line, and of the datacenter profiles indicating which datasets are available from each of them.
To allow easy integration of this effort into the emerging VO framework, the ADEC has decided to adopt a syntax for the dataset identifiers that is consistent with the current International Virtual Observatory Alliance (IVOA) Dataset Identifier draft (Plante et al. 2003). This adoption will facilitate the integration of these identifiers, and of the tools that manipulate them, into the VO.
According to the IVOA Identifiers Draft, the general URI format for an individual identifier is a string of the kind: ivo://AuthorityId/ResourceKey#PrivateId. While we refer the reader to the draft for a full explanation of the syntax, a few things are worth pointing out:
Given the fact that much of the VO infrastructure is still under design and development, the ADEC has decided on a specific recommendation for referring to dataset identifiers in the astronomical literature. The general form of these identifiers is: ADS/FacilityId#PrivateId. Comparing these identifiers with the general IVOA syntax we can make the following observations:
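To make the two identifier forms concrete, the following is a minimal sketch of a parser that accepts both the general `ivo://AuthorityId/ResourceKey#PrivateId` form and the short `ADS/FacilityId#PrivateId` form. The names `DatasetId` and `parse_dataset_id` are hypothetical, the sample identifier in the usage note is invented, and the sketch checks only the overall shape of the string, not the character rules laid out in the IVOA draft.

```python
from typing import NamedTuple, Optional


class DatasetId(NamedTuple):
    scheme: Optional[str]  # "ivo" for the general IVOA form, None for the short form
    authority: str         # AuthorityId, e.g. "ADS"
    resource: str          # ResourceKey (IVOA form) or FacilityId (ADEC form)
    private_id: str        # PrivateId, everything after the first '#'


def parse_dataset_id(text: str) -> DatasetId:
    """Split an identifier into its parts; raise ValueError if it is malformed."""
    scheme = None
    rest = text.strip()
    if rest.startswith("ivo://"):
        scheme, rest = "ivo", rest[len("ivo://"):]

    # The private part follows the first '#'.
    path, sep, private_id = rest.partition("#")
    if not sep or not private_id:
        raise ValueError(f"missing '#PrivateId' part: {text!r}")

    # The authority is everything up to the first '/'.
    authority, slash, resource = path.partition("/")
    if not slash or not authority or not resource:
        raise ValueError(f"expected 'Authority/Resource' before '#': {text!r}")

    return DatasetId(scheme, authority, resource, private_id)
```

For example, `parse_dataset_id("ADS/Facility.X#obs/0001")` yields authority `ADS`, resource `Facility.X` and private identifier `obs/0001`; a string without a `#` part is rejected.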
All Data Centers and Archives that provide public access to their data should structure their databases and interfaces so that, when a particular dataset is released to the public, it is uniquely tagged by an identifier created as discussed above. Users who download such a dataset should be made aware of the identifier associated with it and of how it should be referenced in the published literature. For a datacenter to ensure that the identifiers it generates comply with the syntax endorsed by the ADEC, the following must occur:
Once a datacenter has published a dataset ID, it should provide access to the corresponding dataset. This should take the form of a human-readable page on its web server that displays the dataset's relevant metadata and offers the user the option to download the dataset itself in some form. It is left to the datacenter to decide what to do if and when a revised version of a particular dataset is published. In general, however, it is understood that access to the latest revision of a dataset should be an option, if not the default.
To promote an open framework that can be used for the distributed verification of dataset identifiers across data centers, the ADEC ITWG (Interoperability Technical Working Group) has created the specification for a SOAP-based web service. The corresponding WSDL file can be used to generate client and server interfaces to the service. Each datacenter that offers dataset verification should run and maintain a service that abides by this specification.
For the ADS to coordinate the verification and linking of dataset identifiers to the appropriate datacenters, the datacenters must provide some basic metadata about their data holdings and services. While it is expected that the appropriate metadata will one day be made available by a public VO registry, the format and access methods of such a registry are not yet available. As an intermediate solution, we require that the data centers maintain a simple profile which provides the ADS with the metadata necessary to maintain a central verification service that fans out queries to the appropriate datacenters (during the verification phase) and links to the individual datasets (during the link resolution phase).
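The fan-out step can be sketched as follows, assuming the profiles have already been loaded into memory. This is an illustration only: `Profile`, `facility_of` and `resolve` are hypothetical names, and the `verify` callable stands in for a call to a datacenter's SOAP verification service, whose actual operations are defined by the ADEC ITWG WSDL and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A verifier takes a dataset identifier and returns a URL for the dataset,
# or None if the datacenter does not recognise the identifier.
Verifier = Callable[[str], Optional[str]]


@dataclass
class Profile:
    name: str
    facilities: List[str]  # FacilityIds the datacenter holds data for
    verify: Verifier       # assumption: stand-in for the remote web service call


def facility_of(dataset_id: str) -> str:
    """Extract FacilityId from an identifier of the form 'ADS/FacilityId#PrivateId'."""
    path, _, _ = dataset_id.partition("#")
    return path.partition("/")[2]


def resolve(dataset_id: str, profiles: List[Profile]) -> Dict[str, str]:
    """Query only the datacenters whose profile lists the facility.

    Returns a mapping from datacenter name to dataset URL; an empty mapping
    means that no datacenter verified the identifier.
    """
    facility = facility_of(dataset_id)
    links: Dict[str, str] = {}
    for profile in profiles:
        if facility not in profile.facilities:
            continue  # this datacenter does not archive the facility
        url = profile.verify(dataset_id)
        if url is not None:
            links[profile.name] = url
    return links
```

Because more than one datacenter may archive the same facility, the result is a mapping rather than a single URL, which matches the idea of one identifier resolving to one or more URLs.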
The data center profile is a simple XML document that lists the data center name and description, the name and email address of the person responsible for the maintenance of the profile, the URL of the web service to be used for dataset verification, and the list of facilities that the datacenter has data for. The central verifier service will only attempt to verify and link a dataset identifier with a datacenter if its profile indicates that the datacenter archives the appropriate data collection.
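As an illustration of the kind of information such a profile carries, the sketch below parses a made-up profile with Python's standard `xml.etree.ElementTree` module. The element names (`profile`, `maintainer`, `verifier`, `facilities`, and so on), the sample values and the helper names are all assumptions chosen for this example; the actual profile schema is defined by the project and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile document; element names and values are illustrative only.
PROFILE_XML = """\
<profile>
  <name>Example Data Center</name>
  <description>Hypothetical archive used for illustration</description>
  <maintainer>
    <name>A. Maintainer</name>
    <email>maintainer@example.org</email>
  </maintainer>
  <verifier>https://example.org/dv/service</verifier>
  <facilities>
    <facility>Facility.X</facility>
    <facility>Facility.Y</facility>
  </facilities>
</profile>
"""


def load_profile(xml_text: str) -> dict:
    """Read the fields listed in the text out of a profile document."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name", default="").strip(),
        "description": root.findtext("description", default="").strip(),
        "maintainer_name": root.findtext("maintainer/name", default="").strip(),
        "maintainer_email": root.findtext("maintainer/email", default="").strip(),
        "verifier_url": root.findtext("verifier", default="").strip(),
        "facilities": [
            f.text.strip() for f in root.findall("facilities/facility") if f.text
        ],
    }


def archives(profile: dict, facility_id: str) -> bool:
    """True if the profile says the datacenter holds data for this facility."""
    return facility_id in profile["facilities"]
```

A central verifier would call something like `archives(profile, facility_id)` before contacting a datacenter, so that a datacenter is queried only when its profile lists the facility in question.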
To facilitate the deployment of verification services, the ADS has also developed a Perl toolkit that greatly simplifies the creation of a compliant web service. Among other things, by defining a few variables and installing a simple CGI script based on this toolkit, a system manager can automatically generate the site profile described above. For more information, please see the project description available at http://vo.ads.harvard.edu/dv.
Plante, R., et al. 2003, IVOA Identifiers Working Draft v0.2 (30 September 2003), http://www.ivoa.net/Documents/WD/Identifiers/WD-IDs.html