dfos = Data Flow Operations System, the common tool set for DFO

MUC blades: architecture, support and backup scheme

Hardware:

The architecture and hardware have been selected after extensive and careful tests executed in August 2012.

7 Dell M620 with 12 cores each (muc01-muc07); 2 Dell M820 with 32 cores each (muc08, muc09); 1 Dell M820 with 48 cores (muc10)
PowerEdge M620: http://www.dell.com/us/enterprise/p/poweredge-m620/pd (stored web page)
PowerEdge M820: http://www.dell.com/us/enterprise/p/poweredge-m820/pd (stored web page)

Data: external storage /fujisan3; 4 disk servers with 7.2 TB each; 2 with 9 TB, configured as RAID10 for speed and redundancy.

[Image: Fuji-san]


Memory: muc01-muc07: 64 GB each; muc08-muc10: 512 GB each
Home: internal storage, 0.9 TB, cross-mounted from OTS home server just as before (no change)
Operating system: 64-bit

Configuration:

All servers are called 'muc<nn>' where nn starts with 01
'muc' stands for 'multi-core processing system for QC'.

muc01…muc04: three accounts per UT (low data-volume instruments)
muc05: three VLTI accounts, plus pre-imaging (low data-volume instruments)
muc06, muc07: survey cameras (muc06: ocam, muc07: vircam)
muc08: science processing (phoenix): accounts sciproc (for UVES), xshooter_ph and giraffe_ph (as of 2015-06)
muc09: muse and muse_ph
muc10: muse_ph2
per account: 1 internal disk with the dfos software, pipelines etc., plus 1 external disk with data and the long-term memory
storage: external disks on the 'fujisan3' server (a quick mount check is sketched below)
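
As an illustration of this layout (kmos is used here only as an example account; the physical data paths follow the 'Included' list in the backup section below), the disks visible on a blade can be inspected with standard commands:

mucXX# df -h /fujisan3/data/kmos      <--- external data disk of the kmos account
mucXX# mount | grep fujisan3          <--- all fujisan3 disks mounted on this blade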

[The super-computer next door is called Super-MUC.]

Note that the assignment of instrument accounts to muc blades reflects the instrument-telescope association as of winter 2013. No attempt is made to re-arrange accounts if an instrument is physically moved on Paranal.

Sketch of architecture

The muc01...muc05 servers have three home accounts and three internal data disks mounted.

The muc servers muc06...muc10 have one internal data disk each.

Technology

The muc systems muc01...muc07 have 12 Intel cores each, arranged in 2 CPUs with 6 cores each. The cores are 'hyperthreaded', which means each core presents two 'virtual cores'. This is why e.g. ganglia reports 24 cores for each of these mucs. The condor setup is deliberately conservative and assigns 8 cores to condor processing, leaving the others for interactive and crontab jobs.
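
The core layout can be verified on any blade with standard Linux tools; the output shown here is only a sketch of what a muc01...muc07 blade should report, based on the numbers above:

mucXX# lscpu | egrep 'Socket|Core|Thread|^CPU\(s\)'
CPU(s):                24    <--- logical (hyperthreaded) cores, the number reported by ganglia
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2     <--- two physical CPUs with 6 cores each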

muc08 and muc09 have 32 cores each, muc10 has 48 cores.

For an overview of the basic parameters of the systems, go to http://qc-ganglia.hq.eso.org/ganglia/?c=qcXX%2BdfoXX&h=muc02.hq.eso.org (replace muc02 by any other muc blade).
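
Purely for convenience, the per-blade URLs can be generated in one go (plain string substitution on the URL above, nothing else assumed):

mucXX# for n in $(seq -w 1 10); do echo "http://qc-ganglia.hq.eso.org/ganglia/?c=qcXX%2BdfoXX&h=muc${n}.hq.eso.org"; done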

Click here for a picture of the 8 muc blades in the data centre, starting at left with muc01 (meanwhile we have 10 mucs). The bigger blade at the very right is muc08.

Accounts.

blade(s)      accounts
muc01-muc05   crires, fors2, kmos, giraffe, uves, xshooter, isaac, sphere, vimos, visir, hawki, naco2, sinfoni, amber, midi2, pionier
muc06         ocam
muc07         vircam
muc08         sciproc, xshooter_ph*, giraffe_ph*, more*
muc09         muse, muse_ph*
muc10         muse_ph2*
muc01         fors1
muc02         qc_shift
muc05         preimg

operational (same account order as above): no yes yes yes yes yes no yes yes yes yes yes yes yes no yes yes yes never never never yes never never no never F S N Vs Vr

*phoenix

Support.

The muc blades fall under Level A support (operations-critical). Alerts, emails or tickets are investigated by SOS 7 days a week, from 8:00 to 16:45 local time.

Backup scheme.

The muc blades are all backed up. The current scheme backs up everything; it will be fine-tuned over time. The homes, which are not on the mucXX machines, are backed up as part of the backups of the otshsr-vip host. From an email by Alexis on 2013-02-18:

Included:
/fujisan3/data/kmos
/fujisan3/data/pacman
/fujisan3/data/sphere
/fujisan3/data21/uves
/fujisan3/data22/naco2
/fujisan3/data23/giraffe
/fujisan3/data24/vimos
/fujisan3/data25/fors1
/fujisan3/data25/xshooter
/fujisan3/data26/fors2
/fujisan3/data27/isaac
/fujisan3/data28/midi2
/fujisan3/data29/sinfoni
/fujisan3/data30/visir
/fujisan3/data31/amber
/fujisan3/data32/crires
/fujisan3/data33/preimg
/fujisan3/qc/hawki
/fujisan3/qc/ocam
/fujisan3/qc/vircam
/fujisan3/data/sciproc

Excluded:

"*.fits"
"*.fits.gz"
"*.fits.Z"
Retrieval:
Submit a ticket (SOS is informed about tickets even on the weekend). The ticket should specify:

1) The host and full path to the file you want restored
(e.g. muc01:/fujisan3/data/kmos/AB).

2) Please try your best to reference the path *without* using symlinks; this keeps SOS from thinking the file is not backed up when actually it is!
A command which might help here is 'readlink', which can turn a path full of symlinks into the real physical path. E.g.:

muc01# cd /datavlt/kmos/ <--- where I thought 'AB' is
muc01# readlink -f AB
/fujisan3/data/kmos/AB <--- where it really is

3) The 'last good version time' (e.g. 12:35 Mon 7 Jan). SOS will attempt to restore the file from the most recent backup *before* that time.

4) Where you would like it restored to (e.g. muc01:/fujisan3/data/kmos/AB.from-backup)

5) The urgency of the request. If you make the request on the weekend and it is not urgent, then we will not restore the file on the weekend.
(This applies to the mucXX machines, because the mucXX machines are "priority 1").

Feel free to gain some confidence about backups using the following procedure (a condensed shell version follows the list):

1) echo "test file" > /some/file/that/does/not/exist/but/should/be/backed/up

2) wait 24 hours

3) rm /some/file/that/does/not/exist/but/should/be/backed/up

4) submit a ticket
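
The same test, condensed into shell commands; the file name and the kmos directory are only examples (any non-FITS file under one of the 'Included' paths will do):

mucXX# echo "test file" > /fujisan3/data/kmos/backup_test.txt
       ... wait 24 hours ...
mucXX# rm /fujisan3/data/kmos/backup_test.txt
       ... then submit a ticket asking for the deleted file to be restored (see above) ...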