Document retrieved from a search engine cache. Address of the original document: http://www.eso.org/~qc/dfos/calSelector.html
Last modified: Mon Dec 15 16:27:42 2014
Indexed: Sun Apr 10 00:45:35 2016
calSelector

Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO

This documentation refers to calSelector v2.0. It documents the design, architecture and usage of calSelector from the QC perspective.

tools related to calSelector: calselManager | createCalibMap | verifyAB | writeBreakpoint

general overview of calSelector here

OCA: rules | syntax

topics: basics | rules and versions | testing | calSelector and dfos | syntax | calibration maps | virtual products | certified flag | validity

Introduction to calSelector

For the transition from the old to the new version of calSelector, check the migration description here. The design document is also available here.

[ top ] Basics

calSelector is the DFS tool linked to the archive interfaces to deliver associated calibrations for user-selected science files. The tool is based on OCA rules which are provided and managed by the QC group. To support this task, and to continuously monitor the results, the calSelector tool is also installed on the QC machines. There are several tools in the DFOS tool suite related to calSelector. This page documents the high-level functionality of calSelector.

The tool is called for a single dp_id, or a set of dp_ids, and looks for associations in a database. The associations are defined by a set of OCA rules. These OCA rules come in Raw2Master (R2M) syntax, similar to the DFOS_OPS rules. The tool returns a result set in the form of an XML file. The result set carries the flag 'complete' if all associations can be satisfied.

By default the users receive master calibrations (mcalibs) only (plus those raw files configured as RASSOC), and only those that are necessary to execute the science recipe (the last step in the cascade).

If one or more required mcalibs cannot be found, the tool switches to the Raw2Raw (R2R) mode. It still applies the R2M rules, now using either the provenance information in those master calibrations that it can find, or the virtual products mechanism (see below). In R2R mode, all raw calibrations are delivered, including the "calibrations for calibrations", thus supporting the entire calibration and reduction cascade, as in the old calSelector v1. In this mode, the tool returns certified raw files only (plus static calibrations). This certification flag is a new feature related to calSelector v2 and to the dpIngest tool: if a master calibration is ingested by QC, its provenance (the list of parent raw files) is checked, and these raw files get the certified flag. See below for more information.

It is also possible for QC to configure an OCA rule for calSelector as "r2r_only", meaning that the tool is forced (by this database configuration) to go R2R (while in the above case it has decided by itself that R2M is not possible and therefore goes R2R).

Static calibrations ("gencalibs") are always delivered as such. Associations marked as RASSOC always deliver raw files.
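The delivery logic described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calSelector implementation: the Association class, its field names, and the functions select_mode and files_to_deliver are hypothetical. It assumes that the R2M/R2R decision is taken once per science file, that RASSOC associations deliver all their raw files, and it does not model the "calibrations for calibrations" cascade.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Association:
    """One required association for a science file (hypothetical model)."""
    name: str
    product: Optional[str] = None      # mcalib (or static file) if one was found
    raw_files: List[str] = field(default_factory=list)   # matching raw calibrations
    certified: Set[str] = field(default_factory=set)     # raw files with the certified flag
    is_static: bool = False            # static calibration ("gencalib")
    is_rassoc: bool = False            # marked as RASSOC: always deliver raw files
    r2r_only: bool = False             # rule configured by QC as "r2r_only"


def select_mode(associations: List[Association]) -> str:
    """Return 'R2R' if forced by configuration or if a required mcalib is missing."""
    for a in associations:
        if a.is_static or a.is_rassoc:
            continue  # these never trigger the switch in this sketch
        if a.r2r_only or a.product is None:
            return "R2R"
    return "R2M"


def files_to_deliver(associations: List[Association]) -> Tuple[str, List[str]]:
    """Return the selected mode and the list of files to deliver."""
    mode = select_mode(associations)
    delivered: List[str] = []
    for a in associations:
        if a.is_static:
            if a.product is not None:
                delivered.append(a.product)      # static calibrations delivered as such
        elif a.is_rassoc:
            delivered.extend(a.raw_files)        # assumption: all raw files, in both modes
        elif mode == "R2M":
            delivered.append(a.product)          # master calibration only
        else:
            # R2R mode: certified raw files only
            delivered.extend(f for f in a.raw_files if f in a.certified)
    return mode, delivered
```

With one missing mcalib in the input, the sketch returns mode "R2R" and replaces every master calibration by its certified parent raw files, while static calibrations pass through unchanged.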

[ top ] Rules and versions

With calSelector v2, the OCA rules come in R2M syntax, like the DFOS_OPS rules. This makes the maintenance of the OCA rules much easier and safer for QC. In particular, the DFOS_OPS and CALSELECTOR rules can now be merged.

This is the schema for creating CALSELECTOR OCA rules:

With these modifications, the rules become compatible for both DFOS_OPS and CALSELECTOR environments. The rules are then uploaded to web directories and database, and they are displayed graphically for better overview. These tasks are supported by dfos tools calselManager and createCalibMap.

DFOS_OPS                           CALSELECTOR
CURRENT version    unify           CURRENT version
                   from CURRENT    HISTO1 version
                   from HISTO1     HISTO2 version

For historical periods with OCA rules different from the CURRENT one, the corresponding CALSELECTOR version is developed from the CURRENT one.

For future periods with the need of a modification in DFOS_OPS, the CURRENT CALSELECTOR rule is made a historical one, and the new DFOS_OPS rule becomes the CURRENT CALSELECTOR rule.

These steps are supported by calselManager.
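As a rough illustration of how CURRENT and historical rule versions relate to time periods, the following Python sketch keeps rule versions ordered by the start of their period. It is not how calselManager works internally; the class RuleHistory, its methods, and the idea of keying versions by a start date are assumptions made for this example.

```python
from bisect import bisect_right


class RuleHistory:
    """Rule versions ordered by the start of their period (hypothetical model)."""

    def __init__(self):
        self._starts = []   # sorted period start dates, as ISO strings
        self._rules = []    # rule-set identifiers, in the same order

    def add_version(self, start, rule):
        """Register a rule version that applies from `start` onwards."""
        i = bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._rules.insert(i, rule)

    def current(self):
        """The version with the latest start date plays the role of CURRENT."""
        return self._rules[-1]

    def rule_for(self, date):
        """Return the version whose period covers `date`."""
        i = bisect_right(self._starts, date)
        if i == 0:
            raise LookupError("no rule version covers this date")
        return self._rules[i - 1]
```

Adding a version with a later start date makes it the CURRENT one, and the previous CURRENT version automatically becomes a historical one that still answers lookups for its own period.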

The alignment of DFOS_OPS and CALSELECTOR OCA rules is not enforced by any tool; it is a conceptual principle. The QC scientist has to make sure that this alignment is achieved and maintained. In particular, any change on one side must also be reflected on the other side.

[ top ] Testing

With a new OCA rule set, testing the associations is an important step towards the proper formulation of the rules. Remember they come in a complex syntax and cannot be interpreted intuitively. The dfos tool verifyAB is used to expose dp_ids to calSelector OCA rules which are either already ingested, or are local and under development. The user can define reference datasets for regression tests, or typical test cases, or filelists with many dp_ids for performance tests. Also, the tool is used to check routinely all new science datasets in the DFOS workflow for proper associations (in particular completeness). These continuous tests logically replace the historical harvesting. For their own purpose, QC not only creates but also stores all new science ABs for future reference. Currently, the only use case for those ABs is the IDP production.

[ top ] calSelector and dfos tools

The following dfos tools support the QC scientists in their tasks related to calSelector:

[ top ] Syntax

calSelector uses standard and well-known OCA syntax. The only new feature is the 'between' statement, which is also understood by ABbuilder. It is important to make sure that all science data types have a product defined, since this is needed for the creation of vproducts. While this was in place originally, these product definitions were dropped in many cases with the termination of science processing by QC, and now need to be re-introduced.

[ top ] Calibration maps

The dfos tool createCalibMap supports the creation of calib maps, as it already did with calSelector v1. It takes into account that the CALSELECTOR OCA rules now come in R2M syntax. It displays only those raw types that are relevant for the science reduction. See the entire set of calibration maps here.

[ top ] Virtual products

Virtual products ("vproducts") are a new feature of calSelector v2. They represent datasets, as defined in the grouping rules in OCA (organization). Datasets define the smallest reasonable unit for data processing (while the traditional single file is the smallest unit for archiving). A dataset could be: a single file (e.g. UVES), or all files from the template (e.g. IMAGE stack), or a subset of those (e.g. all XSHOOTER OBJECT+SKY frames from the same template and the same arm). Vproducts can exist for both calibrations and science data.
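The three kinds of dataset mentioned above (single file, whole template, subset of a template) differ only in the keys used for grouping. The following Python sketch illustrates that idea; it is not the OCA grouping mechanism itself. The function group_into_datasets and the record keys "tpl_start" and "arm" are hypothetical names chosen for the example; only dp_id is taken from this documentation.

```python
from collections import defaultdict


def group_into_datasets(files, keys):
    """Group file records into datasets by the values of the given keys.

    An empty key list means that every file forms its own dataset.
    Returns a list of datasets, each a list of dp_ids, in first-seen order.
    """
    if not keys:
        return [[f["dp_id"]] for f in files]
    groups = defaultdict(list)
    for f in files:
        groups[tuple(f[k] for k in keys)].append(f["dp_id"])
    return list(groups.values())


files = [
    {"dp_id": "A", "tpl_start": "T1", "arm": "UVB"},
    {"dp_id": "B", "tpl_start": "T1", "arm": "VIS"},
    {"dp_id": "C", "tpl_start": "T1", "arm": "UVB"},
    {"dp_id": "D", "tpl_start": "T2", "arm": "UVB"},
]

single_files = group_into_datasets(files, [])                   # one dataset per file
per_template = group_into_datasets(files, ["tpl_start"])        # all files of a template
per_arm = group_into_datasets(files, ["tpl_start", "arm"])      # same template, same arm
```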

Vproducts are used by calSelector for the following purposes:

Vproducts are created in several ways:

The tool for the creation of vproducts is called the qcproducthandler. It is a component of the calSelector jar package. It is not used by QC but is running on the archive side in the background. For information, here is its workflow:

[ top ] Certified flag

There is an automatic process in the archive that marks all raw files as 'certified' if they have been used to generate an mcalib that got archived. Raw files without the flag are of unknown (not: bad!) quality. They can exist in the very recent data flow, or in old data with no pipeline support, or no good pipeline support (one recipe is known to fail), or no QC support (SM data only, or standard setups only).

calSelector gives preference to certified raw data when it comes to raw data delivery (see above).

The certified flag is set by dpIngest upon ingestion of a new mcalib, and unset if an mcalib is hidden (dpDelete).

The proper mechanism to flag a raw file as having bad quality is to hide it. Then it will always be ignored by calSelector.

The certification flag reflects an important value added by the QC group.
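The bookkeeping behind the certified flag can be pictured with a small Python sketch. It is a simplified model, not the dpIngest or dpDelete code: the class CertificationRegistry and its methods are hypothetical. It also makes one assumption that this documentation does not state: a raw file keeps its flag as long as at least one visible mcalib still lists it in its provenance.

```python
class CertificationRegistry:
    """Tracks which raw files count as certified (hypothetical model)."""

    def __init__(self):
        self._provenance = {}     # visible mcalib -> set of parent raw files
        self._hidden_raw = set()  # raw files hidden because of bad quality

    def ingest_mcalib(self, mcalib, parent_raw_files):
        """On ingestion, the parent raw files become certified."""
        self._provenance[mcalib] = set(parent_raw_files)

    def hide_mcalib(self, mcalib):
        """Hiding an mcalib withdraws the certification it provided."""
        self._provenance.pop(mcalib, None)

    def hide_raw(self, raw_file):
        """Hidden raw files are always ignored."""
        self._hidden_raw.add(raw_file)

    def is_certified(self, raw_file):
        return any(raw_file in parents for parents in self._provenance.values())

    def ranked(self, raw_files):
        """Drop hidden files and list certified files first (stable order)."""
        visible = [f for f in raw_files if f not in self._hidden_raw]
        return sorted(visible, key=lambda f: not self.is_certified(f))
```

The ranked method expresses "preference" as an ordering only; whether uncertified files are delivered at all depends on the mode, as described in the Basics section.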

[ top ] Validity

The OCA rules as maintained by QC do not contain validity information in OCA syntax. For DFOS_OPS and for calChecker, validity is supported in a non-OCA syntax (the commented DELTAT_RULE section) and is evaluated by createAB. That validity concept has three values: OK, NOK, MISS. NOK means a matching calibration exists in the data pool but it is outdated. MISS means that no calibration has been found.
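The three validity values can be illustrated with a minimal Python sketch. It assumes that validity is judged by the time difference between the science file and the closest matching calibration, and that a single maximum difference in days applies. Both the function classify_validity and the parameter max_delta_days are hypothetical; the actual DELTAT_RULE evaluation in createAB may differ.

```python
def classify_validity(science_mjd, calib_mjds, max_delta_days):
    """Classify the best time match as 'OK', 'NOK' or 'MISS'.

    science_mjd    -- MJD-OBS of the science file
    calib_mjds     -- MJD-OBS values of all matching calibrations in the pool
    max_delta_days -- assumed validity window, in days
    """
    if not calib_mjds:
        return "MISS"   # no calibration has been found
    closest = min(abs(science_mjd - m) for m in calib_mjds)
    if closest <= max_delta_days:
        return "OK"     # a calibration exists within the validity window
    return "NOK"        # calibrations exist, but all are outdated
```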

With v2.0, calSelector also supports such a scheme. It is implemented by the use of two between-like statements:

Usually both statements are used, and calSelector then interprets the matches in the following way:

In addition there are the classical time match rules for static mcalibs (the PREVIOUS rule in most cases).

A special case for validities is covered by the concept of breakpoints, where associations are not allowed to cross certain well-defined MJD-OBS values. The breakpoints are stored in the database table calsel_breakpoints (here) and are maintained by QC with the dfos tool writeBreakpoint.
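The breakpoint concept amounts to a simple check: an association is rejected if a breakpoint lies between the science file and the calibration. The Python sketch below illustrates this under assumptions. The functions crosses_breakpoint and allowed_calibs are hypothetical, the breakpoints are passed as a plain list rather than read from calsel_breakpoints, and a file taken exactly at a breakpoint is assumed to belong to the period that starts there.

```python
def crosses_breakpoint(science_mjd, calib_mjd, breakpoints):
    """Return True if any breakpoint separates the two MJD-OBS values.

    Assumption: a breakpoint bp starts a new period, so a file with
    MJD-OBS equal to bp belongs to the later period.
    """
    lo, hi = sorted((science_mjd, calib_mjd))
    return any(lo < bp <= hi for bp in breakpoints)


def allowed_calibs(science_mjd, calib_mjds, breakpoints):
    """Keep only calibrations that can be associated without crossing a breakpoint."""
    return [m for m in calib_mjds
            if not crosses_breakpoint(science_mjd, m, breakpoints)]
```

In this model the breakpoint filter would run before any time matching, so that a calibration on the wrong side of a breakpoint is never considered, however close in time it is.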