Äîêóìåíò âçÿò èç êýøà ïîèñêîâîé ìàøèíû. Àäðåñ îðèãèíàëüíîãî äîêóìåíòà : http://acat02.sinp.msu.ru/presentations/matousek/ACAT2002_Matousek.pdf
Äàòà èçìåíåíèÿ: Fri Jul 12 02:02:26 2002
Äàòà èíäåêñèðîâàíèÿ: Mon Oct 1 20:34:03 2012
Êîäèðîâêà:
Efficient Storing of Multidimensional Histograms Using Advanced Compression Techniques
V. Matousek I. Turzo
1, 3 *, 1, 3

, . Krupa

1, 2

and M. Jandel

1

Institute of Physics, Slovak Academy of Sciences, Bratislava, Slovakia
2

Flerov Laboratory of Nuclear Reactions, JINR, Dubna, Russia
3

Laboratory of High Energies, JINR, Dubna, Russia
*

E-mail: matousek@savba.sk

ACAT - 2002

c

, M. MorhÀ

1, 2

, J. Kliman

1, 2

,

1, 2



June 24 ­ 28, 2002


DATA STORING
Data types
· list of events · histograms (spectra) 1, 2, 3, 4-dimensional

Event formats
Supported event formats for off-line analysis

· Acq format
E E . . E E vent variable 1 vent variable 2

vent variable N nd of event (-1)

· Gammasphere format
It is more complicated and is available on WEB

Practical problems
· enormous amount of information stored as list of events on Exabyte tapes · to read one tape (4 GB) it takes about 2 hours · in one experiment 20, 30 or more tapes of events are usually collected · only to read data from one experiment it takes sometimes several days · to handle data and to create different slices it is necessary that the spectra be available to the experimenter in interactive way · therefore it is desirable to work with histograms ­ spectra, preferably with multidimensional spectra and then to slice these spectra at specified energies to obtain one dimensional slices


· today's resolution of ADCs used in GAMMASPHERE experiments is 16k · let us assume that one channel in a multidimensional spectrum is stored in 2 bytes. Then the storage requirements for multidimensional spectra are as follows: 2-para 3-para 4-para 5-para m. m. m. m. 2x214x214=229B=512MB 2x23x14=243B=8TB 2x24x14=257B=128PB (peta) 2x25x14=271B

· to fit the capacity of co mputer me mory it is unavoidable to decrease the size of multidimensional array, i.e. to compress spectra in a way

Compression
· the aim is to decrease the size of data to the size of me mory while preserving as much information as possible. a. binning ­ channels are summed together ­ decreases resolution - loss o f information b. natural compression ­ utilizes special properties of data. In the case of multidimensional -ray spectra from GAMMASPHERE we can utilize the property of natural symmetry of the data. It holds 2-parameter spectra E(1,2)=E(2,1) · it decreases space by factor of 2 3-parameter spectra E(1,2,3)=E(1,3,2)=E(2,1,3)=E(2,3,1)=E(3,1,2)=E(3,2,1) · decreases space by factor of 6 k-parameter spectra · in general by utilizing the property of symmetry the k-dimensiona l space can be decreased by factor of k!


· me mory requirements for symmetrical histograms N= 2 2D - (N2 +N)/2 3D - (N3 +3N2+2N)/6 4D - (N4+6N3+11N2+6N)/24 5D - (N5+10N4+35N3+50N2+24N)/120
14

256MB 1.25TB 5PB 16EB

c. compression using adaptive transforms ­ off-line block data method · adaptive Walsh, Fourier, Cosine transforms · principle consists in modification of the transform kernel to reference vectors so that the distortion is as small as possible · for reference vectors we have used integral spectra Examples

d. on-line compression method using Walsh adaptive transform · we have to store 2, 3, 4- dimensional histograms in me mory · original uncompressed size of spectrum very frequently exceeds the memory capacity available · therefore it is impossible to store a multidimensional histogram in its origina l form, then transform it as block data and subsequently store only a part o f transformed spectrum


For N = 4 and bit-reversed output vector the Fourier transform matrices are


F

1 1 = C 2 1


1 1 1 -1 1 -1 W -1 -W 1 -W -1 W

1



; .

F

1 -1 = C 2
-j 2 N

11 1 1 1 -1 -W W 1 1 -1 -1 1 -1 W -W

where W = e

.

Let us generalize this transform and replace the multiplication co efficients 1 in the signal flow graph of the FFT by a, b, c, d

Signal flow graphs of the adaptive direct (a) and inverse (b) fast Fourier transforms.


The direct and inverse adaptive Fourier transform matrices for N = 4 are a0a2 a1c2 a b -a1d2 FA = 0 2 b0 a3 b1c3w b0b3 -b1d3w a0a2 b2a0 a c -d2a1 F-1 = 1 2 A a2c0 b2c0 c2c1 -d2c1


c0a2 c1c2 c0b2 -c1d2 , -d0a3 -d1c3w -d0b3 d1d3w a3b0 b3b0 -c3b1w d3b1w . -a3d0 -b3d0 -c3d1w -d3d1w



The matrices FA and F-1 must b e orthonorA mal and therefore it must hold that

FA · F

-1 A=

E,

where E denotes the unit matrix.

The conditions are satisfied if and only if aibi - cidi = 0 a2 + c2 = 1 i i and thus bi = ±ci ai = di.

b2 + d2 = 1, i i


For each butterfly of the signal flow graph, it must hold that the input energy must b e equal to the output energy. One of the ways how to obtain the b est compression ratio is to concentrate the energy from the input into the first no de in the output so that
1

xi =

x2 + x2 , i j

1x = 0. j Since the values 1x2, 1x3 are zeros the co ef-

ficients of the second butterfly in the second iteration step cannot b e computed using this algorithm. Therefore we shall supp ose that 1 a3 = b3 = . 2 The calculation of co efficients is needed only in the butterflies not containing the complex co efficient w. When considering the ab ove equations with the p ositive sign then in general for one butterfly of the adaptive transform one can write axi + bxj = yi, bxi - axj = yj , The solution for co efficients a0,b0,c0,d0 of the first butterfly in the first iteration step is x0 x2 b0 = , a0 = 2 + x2 2 + x2 x0 x0 2 2 d0 = a0 c0 = b0.


Original two-dimensional signal (coincident nuclear sp ectrum)


Two-dimensional nuclear sp ectrum after the compression using classic Walsh-Hadamard transform with CR = 64


Two-dimensional nuclear sp ectrum after the compression using classic cosine transform with CR = 64


Two-dimensional nuclear sp ectrum after the compression using adaptive Walsh-Hadamard transform with CR = 64


Two-dimensional nuclear sp ectrum after the compression using adaptive cosine transform with CR = 64


Co efficients of the two-dimensional nuclear sp ectrum in the transform domain after transformation using classic Walsh-Hadamard transform


Co efficients of the two-dimensional nuclear sp ectrum in the transform domain after transformation using adaptive Walsh-Hadamard transform


Since the energy in the transform domain of two-parameter (also multiparameter) sp ectra is concentrated around the null p oint of the co ordinate system we shall define the energy packing efficiency (EPE). For twodimensional case it is defined as a part of energy contained in the first M â M from N â N transformed co efficients.

M -1

M -1 j =0 N -1 j =0

EP E (M ) =

i=0 N -1 i=0

2 yi,j

· 100 [%].
2 yi,j


EPE functions for the following transforms: 1 - classic Walsh-Hadamard transform, 2 - classic cosine transform, 3 - adaptive Walsh-Hadamard transform, 4 - adaptive cosine transform


Example of the signal flow graph for on-line compression


e. randomizing transformation mode compression · the size of multidimensional arrays for higher dimensions (4,5) is very large · the number of different descriptors, i.e. the values read out from ADCs which actually occur during an experiment is much smaller · therefore the multidimensional space must be almost empty. Very large part of locations of multidimensional arrays are zeros or statistical fluctuations only. · to preserve correspondence between original space and compressed array in this method of compression the descriptors (addresses in multidimensiona l arrays, e.g. values read out fro m ADCs) and counts are stored for eac h channel separately · the descriptors are passed through a transformation · it would be ideal if any distribution of descriptors would produce the uniform distribution throughout the memory · however in practice there exists a possibility of more descriptors being transformed to the same address · because of that in our algorithm this address is used to determine positio n where to start searching · a list of d successive locations, where d is depth of searching are checked · if a location within this depth is empty descriptor is written to this locatio n and counts is set to 1 · if in a location the descriptor coincides with already recorded descriptor the counts in this location are incremented. Otherwise the event is ignored. · we employ modular arithmetic. Let a is address in original space, then b=a 1 (mod M), where M is prime, gives pseudorando m distribution · for 3D compression we have chosen M=601.
-


DATA STORING
Data types
· list of events · histograms (spectra) 1,2,3, 4-dimensional

Event formats
Supported event formats for off-line analysis

· Acq format
Event Event . . Event End o variable 1 variable 2

variable N f event (-1)

· Gammasphere format
It is more complicated and is available on WEB

Practical problems
· enormous amount of information stored as list of events on Exabyte tapes · to read one tape (4 GB) it takes about 2 hours · in one experiment 20, 30 or more tapes of events are usually collected · only to read data from one experiment it takes sometimes several days · to handle data and to create different slices it is necessary that the spectra be available to the experimenter in interactive way · therefore it is desirable to work with histograms ­ spectra, preferably with multidimensional spectra and then to slice these spectra at specified energies to obtain one dimensional slices


# of minicubes

# of minicubes

# of minicubes

# of minicubes 64 113

method of compression Adaptive transforms Radware

Data organization in - - coincidence spectra compression schemes fro m Gammasphere data


Example of 1-parameter slice from 3-parameter sp ectrum (solid line) and slice (dashed line) compressed using non-constant binning (Radware software package, CR 34000)


Example of 1-parameter slice from 3-parameter sp ectrum (solid line) and slice (dashed line) compressed using our adaptive WHT transform (CR 750000)


Example of 1-parameter slice from 4-parameter sp ectrum (solid line) and slice (dashed line) compressed using our adaptive WHT transform (CR 4.5 â 109)


Example of 1-parameter slice from 4-parameter sp ectrum (solid line) and slice (dashed line) compressed using address randomizing transform (CR 750 â 108)


List of some relevant publications:
1. M. MorhÀ , V. Matousek: Multidimensional nuclear data compression using fast adaptive Walsh-Haar transform. Acta Physica Slovaca 51(6), 2001, pp.307-337. 2. M. MorhÀ , V. Matousek: Adaptive Hartley transform and its use in multidimensional data compression. Applied signal processing, Vol. 6, No.3, 1999, pp.165-174. 3. M. MorhÀ , V. Matousek: New adaptive cosine-Walsh transform and its application to nuclear data compression. IEEE Transactions on Signal Processing, Vol. 48, 2000, No. 9, pp.2693-2696. 4. M.MorhÀ , V. Matousek: Fast adaptive Fourier-based transform and its use in multidimensional data compression. Signal Processing 68, pp.141-153, 1998. 5. M. MorhÀ , V. Matousek: Multidimensional data compression using a new fast adaptive cosine transform. Applied signal processing, 4, pp.16-26, 1997. 6. M.MorhÀ , V.Matousek: Data compression using new fast adaptive Cosine-Haar transforms. Digital Signal Processing, Vol.8, 2, 1998, pp.63-81. 7. MorhÀ M., Matousek V., Turzo I.: Multiparameter data acquisition and analysis system with on-line compression. IEEE Transaction on Nuclear Science, Vol 43., No. 1, Feb. 1996, pp.140-148. 8. MorhÀ M., Matousek V.: A new method of on-line multiparameter analysis with compression. NIM Section A A 370, 1996, pp.499-508.

c

c

c

c

c c

c

c