Document taken from a search engine cache. Address of the original document: http://rswater.phys.msu.ru/Assets/Water/ICANN_manuscript.pdf
Date modified: Mon Oct 15 23:26:45 2012
Date indexed: Sat Feb 2 22:11:00 2013
Comparison of Input Data Compression Methods in Neural Network Solution of Inverse Problem in Laser Raman Spectroscopy of Natural Waters
Sergey Dolenko1, Tatiana Dolenko2, Sergey Burikov2, Victor Fadeev2, Alexey Sabirov2, and Igor Persiantsev1

1 D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow, 119991 Russia
dolenko@srd.sinp.msu.ru
2 Physical Department, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow, 119991 Russia

Abstract. In their previous papers, the authors of this study suggested and implemented a method of simultaneous determination of temperature and salinity of seawater using laser Raman spectroscopy with the help of neural networks. Later, the method was improved for determination of temperature and salinity of natural water using Raman spectra in the presence of fluorescence of dissolved organic matter, which appears as a dispersant pedestal under the Raman valence band. In this study, the method has been further improved by compression of input data. This paper presents a comparison of various input data compression methods using feature selection and feature extraction, and their effect on the error of determination of temperature and salinity.

Keywords: neural networks, inverse problems, input data compression, feature selection, feature extraction, Raman spectroscopy.

1 Introduction

Knowledge of seawater parameters such as temperature (T) and salinity (S) is of great importance, because it helps to understand the evolution of climate change and to study energy exchange between the water surface and the atmosphere. The need for global monitoring of T and S arises from a tendency observed in recent years: the shrinking of the ice cap in polar latitudes because of global warming. Melting of ice leads to desalination of the surface layer of the ocean. This can trigger a restructuring of the system of oceanic currents and cause considerable climate changes not only in polar areas but on a planetary scale.

For ecological monitoring of natural waters, i.e. for determination of such key parameters as T and S, one needs rapid non-contact diagnostic methods that can be implemented in real time. Such properties are inherent in the non-contact radiometric method of determination of either S or T of the surface of sea waters [1,2], which is the most widespread in oceanology. Measurement of S with radiometers, based on the dependence of the absorbing capacity of the water surface on the concentration of salts, allows its determination with an error no better than about one practical salinity unit (psu) or tenths of a psu [3]. The error of determination of sea surface T with the radiometric method, e.g. with the Advanced Very High Resolution Radiometer [4,5], is 1 °C. The inaccuracy of determination of T and S is due to the need to distinguish small changes of thermal radiation caused by changes of T and S against the very intense signal caused by surface roughness. One should also account for the influence of weather conditions on the absorbance of the surface layer of water.

Methods of laser spectroscopy represent a more convenient tool for determination of parameters of natural water. Using Raman and fluorescence spectroscopy, one can determine water parameters remotely (using lidar or optical fiber) in real time. The influence of T and S on the water Raman spectrum was found by Walrafen [6,7,8]. Based on the dependence of the water Raman valence band on T and S, a method of determination of these characteristics of sea water was elaborated in [9,10,11]. Using the dependence on T of the ratio of intensities of the high- and low-frequency parts of the water Raman valence band, the authors of these papers obtained an error of determination of T of 0.5 °C in the laboratory and 2 °C in field conditions.

Vibrational (in particular, Raman) spectroscopy can be used for determination of parameters of water because of the sensitivity of Raman spectra to the type and concentration of dissolved salts and to water temperature [12,13,14]. The influence of water temperature and dissolved salts on the water Raman valence band manifests itself in changes of the shape and position of this band [12,13,14] (see also Figs. 1-2 in [15]). With increasing T and/or salt concentration, the intensity of the high-frequency region of the spectrum increases, the intensity of the low-frequency region decreases, and the band narrows and shifts to higher frequencies.

In [16], it was suggested to use decomposition of the water Raman valence band into contours of Gauss or Voigt shape for determination of T. The linear section of the temperature dependence of the ratio of the two most intense components (high- and low-frequency) was used. The error of determination of T was 1 °C [16]. The dependence of the same parameter on salt concentration was used in [17,18] for determination of seawater salinity.

The authors of this paper have demonstrated that T and S can be measured simultaneously using the water Raman valence band [15,19]. Determination of T and S by the three-wavenumber method provided errors of 0.7 °C and 1.0 psu (in vitro), and 1.1 °C and 1.4 psu (in natural conditions). Use of artificial neural networks (NN) made it possible to decrease the error of determination of T and S (0.5 °C and 0.7 psu in the laboratory, [15]).

In this study, determination of T and S was performed in the presence of fluorescence of dissolved organic matter (DOM) in a wide range of concentrations. DOM is always present in natural waters; its concentration is variable and depends on the measurement place (much more DOM is present in river mouths), season, etc. [20]. Fluorescence of DOM overlaps with the Raman spectrum, thus providing an additional source of errors.

A.E.P. Villa et al. (Eds.): ICANN 2012, Part II, LNCS 7553, pp. 443–450, 2012. © Springer-Verlag Berlin Heidelberg 2012


2 Experiment

The stated problem (determination of T and S taking into account fluorescence of DOM) was solved with NN via the experiment-based approach [21]. This means that only experimental spectra were used for NN training. In this case one needs no model constructed a priori, and all specific features of the object are automatically taken into consideration. To perform this study, an array of experimental spectra with different values of the parameters (temperature, salinity and concentration of DOM) was recorded. Solutions were prepared from bidistilled water, river humus and sea salt. Salinity was varied from 0 to 45 psu (step 5 psu), concentration of humus from 0 to 350 mg/l, and temperature from 0 to 35 °C (step 5 °C).

Fig. 1. Scheme of experimental setup: 1 - argon laser (488 nm), 2 - beam splitter, 3 - laser power meter, 4 - focusing lens, 5 - thermo-stabilized cuvette, 6 - system of thermo-stabilization, 7 - system of lenses, 8 - edge-filter, 9 - monochromator, 10 - photomultiplier, 11 - CCD-camera, 12 - computer

A diagram of the Raman spectrometer is presented in Fig. 1. Spectra were measured in the range 800-4000 cm⁻¹ with practical resolution 2 cm⁻¹. An argon laser with output power near 500 mW at wavelength 488 nm was used for excitation of Raman scattering. Spectra were recorded by the CCD camera. The system of thermo-stabilization made it possible to set and control the temperature of samples with accuracy 0.1 °C. Spectra were normalized to the laser power and to the time of data accumulation.

Fig. 2 presents panoramic Raman spectra of solutions in a wide spectral range, 800-3800 cm⁻¹, and in a wide range of temperature, salinity and concentration of DOM. These panoramic spectra were registered with the photomultiplier tube (PMT), because with the CCD one can measure spectra only in a relatively narrow spectral range. That is why the quality of these spectra is lower than that of spectra obtained with the CCD (e.g., Fig. 3). Hence, these spectra were not used for NN training.



Fig. 2. Panoramic Raman spectra. 1 - 25 °C, 0 psu, 0 mg/l; 2 - 25 °C, 45 psu, 0 mg/l; 3 - 25 °C, 45 psu, 175 mg/l; 4 - 25 °C, 45 psu, 350 mg/l.

In the preceding study [22], measurements of the valence band (2220-3870 cm⁻¹, 1024 features, Fig. 3) were supplemented by an additional set of identification features: the low-frequency region of water Raman spectra (from 800 up to 1800 cm⁻¹, also 1024 features), which also depends on T, S and DOM and which can contain Raman bands of anions such as NO₃⁻, SO₄²⁻, PO₄³⁻, HCO₃⁻. It was expected that this would make it easier to measure salinity while treating the DOM fluorescence as noise.

Fig. 3. Raman valence band spectra obtained by CCD: 1 - (0 °C, 25 psu, 0 mg/l); 2 - (25 °C, 15 psu, 175 mg/l); 3 - (15 °C, 45 psu, 350 mg/l)

All spectra used for work with NN were measured with 5 s camera exposure time for valence bands and 10 s for low-frequency bands.

3 Methods

In the preceding study [22], the same experimental data array was used for NN determination of T and S in the presence of DOM. The best results were obtained with perceptrons with three hidden layers. Using only the Raman valence band, the best results obtained were 1.2 °C for the mean absolute error (MAE) of temperature determination and 1.5 psu for the MAE of salinity determination.



Using both the valence band and the low-frequency region, it was possible to reduce the errors to 0.8 °C and 1.1 psu. Recall that the maximum MAE values that can make a method interesting for practical applications are about 1 °C and 1 psu.

The purpose of the present study was to achieve the same level of results or better using only the valence band. Such an opportunity would be important, as recording the low-frequency region of Raman spectra requires more sophisticated and therefore more expensive experimental equipment. It was planned to achieve this goal by reducing the initial dimensionality of the input data (1024 features, i.e. spectral channels). The actual dimensionality of the problem should be much lower, since adjacent channels of a smooth spectral band are strongly interrelated. So, different methods of feature selection and feature extraction were applied to achieve input data compression.

For all NN experiments in this study, a fixed NN architecture was used. It was a perceptron with a single 64-neuron hidden layer, a logistic activation function in the hidden layer and a linear activation function in the output layer. The learning rate was r = 0.01 and the momentum was m = 0.5. Training was stopped 1000 epochs after the minimum error on the test set was reached. The results were estimated on the examination (out-of-sample) set. To account for random factors due to weight initialization, 5 NNs with different initial weights were trained for each experiment.

1) Cross-correlation. The values of cross-correlation (CC) of each of the input features with the output ones were calculated. Then, only the input features with CC exceeding a pre-defined threshold value (0.3) were used to solve the problem. The main shortcoming of this method is that linear correlation can capture only linear relationships between variables, thus failing to find significant input features with nonlinear influence on the determined output variable. The determined dependence of CC on the spectral shift corresponding to each feature is presented in Fig. 4.
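The fixed network configuration described in this section (one hidden layer of 64 logistic neurons, linear output, learning rate 0.01, momentum 0.5, stopping 1000 epochs after the best test-set error) can be illustrated with a minimal standard-library sketch. It is not the NeuroShell 2 implementation used in the study. The function names, the weight initialization range, the use of two outputs (T and S) in one network, per-sample updates and the squared-error loss are all assumptions made for illustration.

```python
import copy
import math
import random

def init_mlp(n_in, n_hidden=64, n_out=2, seed=0):
    """Small random weights; the range (-0.1, 0.1) is an assumption."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": mat(n_out, n_hidden), "b2": [0.0] * n_out}

def forward(net, x):
    """Logistic hidden layer followed by a linear output layer."""
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(net["W1"], net["b1"])]
    y = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(net["W2"], net["b2"])]
    return h, y

def zero_velocity(net):
    """Momentum buffers with the same shapes as the weights and biases."""
    return {k: ([[0.0] * len(r) for r in v] if isinstance(v[0], list)
                else [0.0] * len(v)) for k, v in net.items()}

def sgd_step(net, vel, x, target, lr=0.01, momentum=0.5):
    """One per-sample backpropagation step on squared error, with momentum."""
    h, y = forward(net, x)
    d_out = [yi - ti for yi, ti in zip(y, target)]
    d_hid = [hi * (1.0 - hi) * sum(net["W2"][k][j] * d_out[k]
                                   for k in range(len(d_out)))
             for j, hi in enumerate(h)]
    for name, deltas, inputs in (("W2", d_out, h), ("W1", d_hid, x)):
        for i, d in enumerate(deltas):
            for j, a in enumerate(inputs):
                vel[name][i][j] = momentum * vel[name][i][j] - lr * d * a
                net[name][i][j] += vel[name][i][j]
    for name, deltas in (("b2", d_out), ("b1", d_hid)):
        for i, d in enumerate(deltas):
            vel[name][i] = momentum * vel[name][i] - lr * d
            net[name][i] += vel[name][i]

def mae(net, data):
    """Mean absolute error over all samples and outputs."""
    errs = [abs(yi - ti) for x, t in data for yi, ti in zip(forward(net, x)[1], t)]
    return sum(errs) / len(errs)

def train(net, train_set, test_set, patience=1000, max_epochs=100000):
    """Stop `patience` epochs after the best test-set error; return the best net."""
    vel = zero_velocity(net)
    best_net, best_err, best_epoch = copy.deepcopy(net), mae(net, test_set), 0
    for epoch in range(1, max_epochs + 1):
        for x, t in train_set:
            sgd_step(net, vel, x, t)
        err = mae(net, test_set)
        if err < best_err:
            best_net, best_err, best_epoch = copy.deepcopy(net), err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_net, best_err
```

Repeating `train` with five different `seed` values would mirror the five networks with different initial weights trained for each experiment.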

Fig. 4. Spectral dependences of cross-correlation and cross-entropy coefficients
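As an illustration of cross-correlation feature selection, the following sketch keeps the spectral channels whose Pearson correlation with an output exceeds the threshold 0.3. The paper does not state how the correlations with the two outputs (T and S) were combined or whether the sign was taken into account; taking the largest absolute value over the outputs is an assumption, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant channel carries no linear information
    return cov / math.sqrt(var_a * var_b)

def select_by_correlation(spectra, targets, threshold=0.3):
    """Indices of channels whose |correlation| with any output exceeds threshold.

    spectra: list of samples, each a list of channel intensities.
    targets: list of samples, each a list of output values (e.g. [T, S]).
    """
    n_outputs = len(targets[0])
    selected = []
    for j in range(len(spectra[0])):
        column = [sample[j] for sample in spectra]
        # Assumption: a channel is kept if it correlates with at least one output.
        score = max(abs(pearson(column, [t[k] for t in targets]))
                    for k in range(n_outputs))
        if score > threshold:
            selected.append(j)
    return selected
```

A channel that depends on T only through a symmetric nonlinear relation would score near zero here, which is the shortcoming of linear correlation noted in the text.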

2) Cross-entropy. The values of cross-entropy (CE) of each of the input features with the output ones were calculated. Then, only the input features with CE exceeding a pre-defined threshold value (0.2) were used to solve the problem. While CE can capture non-linear relationships, the precision of its calculation is poor for the small number of samples that can be provided by experiment. The determined spectral dependence of CE is presented in Fig. 4.

3) General Regression NN (GRNN, [23]) with correcting coefficients for the smoothing factor for each input feature, as implemented in the NeuroShell 2 software package [24]. Only input features with a correcting coefficient exceeding a pre-defined threshold value (0.5) were used to solve the problem. As there are obviously interconnections among input features, and as the correcting coefficients are determined using a genetic algorithm, the set of coefficients determined from a single launch of the algorithm is strongly influenced by random factors. Therefore, the procedure was applied repeatedly, each new launch producing a narrower set of significant features. Each of the iteratively obtained sets was used to solve the problem. The dependence of MAE for T and S on the number of selected features is presented in Fig. 5.
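The idea behind GRNN-based selection can be sketched as follows: a GRNN is a kernel-weighted average of training outputs, and a per-feature coefficient scales how strongly each input contributes to the distance. The exact parameterization used in NeuroShell 2 and its genetic search are not reproduced here; the way `coeffs` enters the distance and the function names are assumptions for illustration only.

```python
import math

def grnn_predict(x, train_x, train_y, sigma, coeffs):
    """GRNN output: Gaussian-kernel weighted average of training outputs.

    coeffs[j] scales the contribution of feature j to the distance, so a
    coefficient near zero effectively switches that feature off (assumed form).
    """
    weights = []
    for xi in train_x:
        d2 = sum((c * (a - b)) ** 2 for c, a, b in zip(coeffs, x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to the mean training output.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def keep_features(coeffs, threshold=0.5):
    """Indices of features whose correcting coefficient exceeds the threshold."""
    return [j for j, c in enumerate(coeffs) if c > threshold]
```

In the repeated procedure described above, the search for coefficients would be rerun on the features returned by `keep_features`, giving a progressively narrower set.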

Fig. 5. Mean absolute error for T and S determination vs number of features selected by GRNN

Fig. 6. Mean absolute error for T and S determination vs binary logarithm of the number of features extracted by adjacent channel aggregation

4) Feature Aggregation. The simplest method of feature extraction is aggregating adjacent spectral channels, thus simultaneously reducing the number of input features and the level of noise. Summing up intensities in a fixed number of adjacent channels corresponds to reducing the spectral resolution of the device, thus making the equipment simpler and cheaper. The studied problem was solved for the following numbers of aggregated channels: 2, 4, 8, 16, 32, 64, 128, thus producing 512, 256, 128, 64, 32, 16 or 8 aggregated input features, respectively. The dependence of MAE for T and S on the binary logarithm of the number of extracted features is presented in Fig. 6.
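Adjacent channel aggregation as described above amounts to summing intensities in consecutive groups of fixed size. A minimal sketch follows; the function name is hypothetical, and the requirement that the group size divide the spectrum length is a simplifying assumption.

```python
def aggregate_channels(spectrum, group):
    """Sum intensities over consecutive groups of `group` adjacent channels."""
    if group <= 0 or len(spectrum) % group != 0:
        raise ValueError("group size must be positive and divide the spectrum length")
    return [sum(spectrum[i:i + group]) for i in range(0, len(spectrum), group)]

# Number of aggregated features for a 1024-channel spectrum and each group size.
feature_counts = {g: 1024 // g for g in (2, 4, 8, 16, 32, 64, 128)}
```

For example, `aggregate_channels(spectrum, 16)` turns a 1024-channel valence band spectrum into 64 input features.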


4 Results

The best results obtained in this study for different methods of input data compression are summarized in Table 1. The presented values are mean absolute errors on the out-of-sample set of data.
Table 1. Mean absolute error of problem solution for T and S on the out-of-sample set of data for different methods of input data compression
Method of feature selection/extraction   Number of input features   T, °C         S, psu
None                                     1024                       1.15 ± 0.08   1.37 ± 0.24
Cross-correlation                        375                        0.92 ± 0.06   1.18 ± 0.09
Cross-entropy                            694                        0.91 ± 0.05   1.15 ± 0.07
GRNN-GA                                  319                        0.97 ± 0.04   1.02 ± 0.07
Channel aggregation                      64                         0.69 ± 0.02   0.76 ± 0.04
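Entries of the form "value ± spread" in a table like this can be computed from the MAE of each trained network. The paper does not state what the ± values denote; the sketch below assumes the mean and the sample standard deviation over the five networks trained per experiment, and the function names are hypothetical.

```python
import statistics

def mean_absolute_error(predicted, actual):
    """Mean absolute deviation between predicted and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def summarize(maes):
    """Mean and sample standard deviation of per-network MAE values.

    Assumption: the spread is the standard deviation over networks that
    differ only in their initial weights.
    """
    return statistics.mean(maes), statistics.stdev(maes)
```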

5 Conclusion

This study was devoted to a comparison of various methods of feature selection and extraction for NN solution of the inverse problem of determination of seawater temperature and salinity from the valence band of the Raman spectrum, in the presence of fluorescence of dissolved organic matter in a wide range of concentrations. The best results were obtained for feature extraction by aggregating every 16 adjacent spectral channels, producing 64 input features. With the practical resolution of 2 cm⁻¹, this means that the spectral resolution required to solve the problem can be as coarse as 32 cm⁻¹, which can easily be achieved by inexpensive spectroscopy equipment. The obtained values of mean absolute error on the out-of-sample set of data are 0.69±0.02 °C and 0.76±0.04 psu, which are not much greater than the results obtained by NN solution of the same problem with no dissolved organic matter.

Acknowledgments. This study was supported by RFBR grants No. 11-05-01160-a and 12-01-00958-a. All NN calculations were performed with NeuroShell 2 software [24].

References
1. Font, J., Camps, A., Borges, A., et al.: SMOS: The challenging measurement of sea surface salinity from space. In: P. IEEE, vol. 98 (5), pp. 649–665. IEEE Press, New York (2010)
2. Turiel, A., Nieves, V., Garcia-Ladona, et al.: The multifractal structure of satellite sea surface temperature maps can be used to obtain global maps of streamlines. Ocean Sci. 5, 447–460 (2009)
3. Boutin, J., Waldteufel, P., Martin, N., et al.: Surface salinity retrieved from SMOS measurements over the global ocean: Imprecisions due to sea surface roughness and temperature uncertainties. J. Atmos. Ocean. Technol. 21, 1432–1447 (2004)
4. Eugenio, F., Marcello, J., Hernandez-Guerra, A., Rovaris, E.: Methodology to obtain accurate sea surface temperature from locally received NOAA-14 data in the Canary-Azores-Gibraltar area. Scientia Marina 65(1), 127–137 (2001)
5. Garcia-Santos, V., Valor, E., Caselles, V.: Determination of temperature by remote sensing. J. of Mediterranean Meteorology and Climatology 7, 67–74 (2010)
6. Walrafen, G.E.: Raman Spectral Studies of Water Structure. J. Chem. Phys. 40, 3249–3256 (1964)
7. Walrafen, G.E.: Raman Spectral Studies of the Effects of Temperature on Water and Electrolyte Solutions. J. Chem. Phys. 44, 1546–1558 (1966)
8. Walrafen, G.E.: Raman Spectral Studies of the Effects of Temperature on Water Structure. J. Chem. Phys. 47, 114–126 (1967)
9. Chang, C.H., Young, L.A.: Seawater Temperature Measurement from Raman Spectra. Avco Everett Research Laboratory, Inc., Interim technical report (1972)
10. Leonard, D., Chang, C., Yang, L.: Remote measurement of fluid temperature by Raman scattered radiation. U.S. Patent 3.986.775, Class 356-75 (1974)
11. Leonard, D., Caputo, B., Hoge, F.: Remote sensing of subsurface water temperature by Raman scattering. Applied Optics 18(11), 1732–1745 (1979)
12. Terpstra, P., Combes, D., Zwick, A.: Effect of salts on dynamics of water: A Raman spectroscopy study. J. Chem. Phys. 92(1), 65–70 (1990)
13. Dolenko, T.A., Churina, I.V., Fadeev, V.V., Glushkov, S.M.: Valence band of liquid water Raman scattering: some peculiarities and applications in the diagnostics of water media. J. of Raman Spectroscopy 31(8-9), 863–870 (2000)
14. Sherer, J., Go, M., Kint, S.: Raman spectra and structure of water from 10 to 90. J. Phys. Chem. 78(13), 1304–1313 (1974)
15. Burikov, S.A., Churina, I.V., Dolenko, S.A., et al.: New approaches to determination of temperature and salinity of seawater by laser Raman spectroscopy. In: 3nd EARSeL Workshop on Remote Sensing of the Coastal Zone, pp. 298–305 (2003)
16. Karl, J., Ottmann, M., Hein, D.: Measuring water temperatures by means of linear Raman spectroscopy. In: Proc. of the 9th International Symposium on Application of Laser Techniques to Fluid Mechanics, vol. II, pp. 23.2.1–23.2.8 (1998)
17. Becucci, M., Cavalieri, S., Eramo, R., Fini, L., Materazzi, M.: Raman spectroscopy for water temperature sensing. Laser Physics 9(1), 422–425 (1999)
18. Furić, K., Ciglenečki, I., Ćosović, B.: Raman spectroscopic study of sodium chloride water solutions. J. Mol. Str. 550–551, 225–234 (2000)
19. Bekkiev, A., Gogolinskaya (Dolenko), T., Fadeev, V.: Simultaneous determination of temperature and salinity of seawater by the method of laser Raman spectroscopy. Soviet Physics Doklady 271(4), 849–853 (1983)
20. Shubina, D.M., Patsaeva, S.V., Yuzhakov, V.I., et al.: Fluorescence of organic matter dissolved in natural water. Water: Chemistry and Ecology 11, 31–37 (2009)
21. Gerdova, I.V., Churina, I.V., Dolenko, S.A., et al.: New Opportunities in Solution of Inverse Problems in Laser Spectroscopy Due to Application of Artificial Neural Networks. In: Proc. SPIE, vol. 4749, pp. 157–166 (2002)
22. Dolenko, T.A., Burikov, S.A., Sabirov, A.R., et al.: Remote determination of temperature and salinity in consideration of dissolved organic matter in natural waters using laser spectroscopy. In: EARSeL eProceedings, vol. 10(2), pp. 159–165 (2011)
23. Specht, D.: A General Regression Neural Network. IEEE Trans. on Neural Networks 2(6), 568–576 (1991)
24. NeuroShell 2, http://www.wardsystems.com/neuroshell2.asp