Russian Entomol. J. 21(2): 157–164

© RUSSIAN ENTOMOLOGICAL JOURNAL, 2012

A similarity standard and its use in comparing species compositions with species structures of communities

V.N. Maximov1, N.A. Kuznetsova2

1 Lomonosov Moscow State University, Faculty of Biology, Leninskie Gory 1-12, Moscow, 119991, Russia. E-mail: V_Maximovv@rambler.ru
2 Moscow State Pedagogical University, Kibalchicha Str.6, Build.5, Moscow 129164, Russia. E-mail: mpnk@yandex.ru

KEY WORDS: new method for community comparisons, similarity standard, Jaccard's similarity index, Shorygin's index, Collembola, sea phytoplankton, sea macrobenthos.

ABSTRACT. When the composition and structure of communities are compared using traditional indices, the results are difficult to evaluate adequately because statistical criteria for such an evaluation are lacking. To resolve this problem, a new method for community comparisons is proposed, based on the use of an empirically obtained similarity standard. Soil springtail communities, sea phytoplankton and sea macrobenthos communities serve as model objects. The widely used Jaccard's similarity index and Shorygin's coefficient (the sum of the minimum relative abundances of species in the samples to be compared) are chosen as examples. Empirical distributions of these indices are studied for samples taken both in ecologically remote and in similar communities. Significance levels are determined for deciding on the degree of similarity in species composition and species structure. A rapid method for creating a similarity standard of species structure is developed, based on regular observations in particular ecological conditions. Using springtail populations, we show how to select a standard dataset in order to apply this index when comparing communities from various ecosystems and when analyzing seasonal and between-year changes in communities within a single habitat. The use of a similarity standard makes cluster analyses and dendrogram constructions, which give rise to a diversity of data interpretations, unnecessary.

Introduction
Comparisons between species compositions in samples taken from diverse habitats and/or in different seasons, or subjected to various external impacts, are among the approaches most frequently used in the study of communities. The species composition and/or the community structure are compared by applying one or another similarity index. Then, if the number of communities exceeds two, the results are usually presented in the form of a dendrogram. The choice of a clustering method and of a suitable similarity measure is determined by the study's objectives and the peculiarities of the underlying material. The literature devoted to these problems is highly diverse. A detailed account can be found in the still relevant monograph by Pesenko [1982]. The same problems have been constructively discussed in a more recent publication [Shitikov et al., 2005].
The diversity of methods described in the literature for measuring the similarity or dissimilarity of species lists seems to stem from the very notion of similarity, which is intuitively quite clear but resists an unambiguous definition whenever a mathematically substantiated measure is sought. So the question arises whether it makes sense at all to discuss the advantages of one measure over another, based on the way it is calculated, when we are only vaguely aware of what exactly we are to measure.
From a practical viewpoint, however, another question that arises when we compare species lists from two samples is more important: can the differences found in the lists be considered as evidence of the samples having been taken in different communities, in different seasons, in habitats differing in the degree of pollution etc.? Or are these differences actually related to sampling inaccuracies alone? Or are they due to errors in the abundance estimates (counts) of each of the species?
When analyzing experimental data, this problem is usually formulated in terms of mathematical statistics by posing the following question: at which significance level can a null hypothesis of the absence of differences between samples be rejected, based on the available sample values of their characteristics? In the literature, such problems are only seldom discussed in relation to classification methods, likely because the majority of these methods are inherently non-statistical, i.e. null hypotheses are formulated neither in the theoretical design nor in the software implementation of the respective algorithms, while the characters of the objects to be classified, be they measured on a relative or absolute scale, are regarded as deterministic values. The pattern of their distribution as random values is simply ignored.
The present paper advances the notion of a similarity standard: an independently created standard dataset against which the samples taken in the course of an ecological study are compared in order to decide which of them are similar in species composition and structure. Certainly, because there is no clear definition of what similarity is, it is hardly possible to propose a standard of similarity that would apply to all situations. Instead, to cope with such a particular task as a comparison of species compositions of communities, a similarity standard has long been proposed in the form of a set of parallel samples [Maximov, 1984]. A practical realization of this approach is presented below, with particular situations taken as examples.

Material and methods
Results of analyses of the species composition of soil Collembola, sea phytoplankton and sea macrobenthos served as input material. Despite the apparent differences between the objects, they shared about the same statistical reliability of abundance estimates. In all cases, the overall number of individuals counted in each separate sample ranged from several dozen to 100–300 ex., while the average abundance of a species in a sample, determined as the geometric mean of the species abundances in that sample, did not exceed 5–6. The geometric mean was chosen because the distribution of numbers per species in the samples is known to be close to exponential. In any event, the rank distribution of abundance logarithms very often looks like a simple linear regression. It is easy to see that the arithmetic mean of log values is the logarithm of their geometric mean.
The species composition of soil-dwelling springtails was determined using samples taken in 1983 with square frames of 5 × 5 cm on plots 30 × 30 cm in size in a southern taiga lichen-moss pine forest on Silon Island in the Darwin State Biosphere Nature Reserve. Each plot yielded 36 samples. The plots represented the following microsites: lichens (1,612 ex., 26 springtail species); lichen with a spot of green moss (2,761 ex., 26 species); a diffuse mixture of lichen and green moss (1,641 ex., 20 species); green moss with a spot of lichen (2,290 ex., 21 species); and green moss (6,646 ex., 21 species). The numbers per species per sample on these plots averaged 3.1, 5.8, 3.3, 4.7 and 6.8, respectively. Material was extracted with Tullgren funnels and then fixed using standard techniques [Potapov & Kuznetsova, 2011].
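The relation between the geometric mean and the mean of logarithms mentioned above can be illustrated with a minimal Python sketch. The function name and the species counts are hypothetical and serve only as an illustration; they are not data from the study.

```python
import math


def geometric_mean(counts):
    """Geometric mean of the positive species counts in one sample.

    Computed as exp(mean(log n)), i.e. the logarithm of the result
    equals the arithmetic mean of the log counts.
    """
    present = [n for n in counts if n > 0]  # species absent from the sample are skipped
    if not present:
        raise ValueError("no species with a positive count")
    mean_log = sum(math.log(n) for n in present) / len(present)
    return math.exp(mean_log)


# Hypothetical counts of five species in one sample.
counts = [1, 2, 4, 8, 16]
gm = geometric_mean(counts)  # close to 4, since 1*2*4*8*16 = 1024 = 4**5
mean_log = sum(math.log(n) for n in counts) / len(counts)
print(gm, math.log(gm), mean_log)
```

Skipping zero counts is an assumption of this sketch: the geometric mean is taken only over the species actually found in the sample.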
In addition to the above springtail series used in the chapter "Analysis of the similarity matrix" in Potapov & Kuznetsova [2011], data were also incorporated on springtail numbers obtained by summing the samples taken in lines between two trees, five samples in each line, in five further forest ecosystems. These were bilberry spruce forests and oak woodlands in the Mordovian Nature Reserve and in the environs of the city of Vilnius, as well as a bilberry spruce forest in the south of Arkhangelsk Region sampled during two consecutive years [Kuznetsova, 2005].
Phytoplankton sampling in the White and Kara seas was performed by staff members of the Chair of General Ecology and Hydrobiology of Moscow State University more than 30 years ago. In both cases, 50 samples, each 1 litre in volume, were taken from the surface horizon from an anchored boat. The samples were first fixed using Lugol's solution and then concentrated through sedimentation. Species composition was determined in counting chambers, five chambers being examined from each parallel sample. The counts in each chamber were used as subsamples for calculating the similarity indices [Koltsova et al., 1971; Likhacheva et al., 1979]. In Chupa Bay, White Sea, each sample contained on average 10 species and 60 cells in June, these values in August being 19 and 520, respectively. In the Kara Sea, each subsample comprised on average 12 species and 28 cells in August, the corresponding values for each sample being 30 and 252.
Data concerning macrobenthos were kindly placed at our disposal by N.V. Kucheruk. The material had been collected on the shelf of the Barents Sea, five dredge samples being taken at each of eight stations.

Fig. 1. Ranges of Jaccard's index in parallel samples (box-and-whisker plots). Designations on the abscissa axis: k55, ch51, ch35, k20, ch50 are phytoplankton samples; L, D, LM, ML, M are soil samples on Silon Island; 4L, 4D, 4LM, 4ML, 4M are the same samples combined by fours; BS are benthos samples on the shelf of the Barents Sea.

From among the great variety of similarity indices we chose only two as examples. Similarity in species composition was analyzed using the above-mentioned Jaccard similarity index: $\mathrm{JCR} = c/(a + b - c)$, where $a$ is the number of species in list A, $b$ is that in list B, and $c$ is the number of species shared by both lists. To evaluate similarity in species structure, Shorygin's coefficient, a likewise popular similarity index, was applied: $\mathrm{SHR} = \sum_i \min(p_{i1}, p_{i2})$, where $\min(p_{i1}, p_{i2})$ is the lower of the two relative abundances of species $i$ in the samples compared, $p_{ij} = n_{ij}/N_j$, $n_{ij}$ is the abundance of species $i$ in sample $j$, and $N_j = \sum_i n_{ij}$. This index is easy to calculate using any statistical software package that includes cluster analysis and offers, among other things, the so-called Manhattan distance or City Block Metric, $\mathrm{CBM} = \sum_i |p_{i1} - p_{i2}|$, since $\mathrm{SHR} = 1 - \mathrm{CBM}/2$ [Pesenko, 1982].
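The two indices and the relation between Shorygin's index and the City Block Metric can be sketched in Python as follows. This is only an illustration under stated assumptions: samples are represented as dictionaries mapping a species name to its count, and the function names and the two example samples are hypothetical.

```python
def jaccard(counts_a, counts_b):
    """JCR = c / (a + b - c) for two samples given as {species: count}."""
    species_a = {s for s, n in counts_a.items() if n > 0}
    species_b = {s for s, n in counts_b.items() if n > 0}
    shared = len(species_a & species_b)                # c
    union = len(species_a) + len(species_b) - shared   # a + b - c
    if union == 0:
        raise ValueError("both samples are empty")
    return shared / union


def relative_abundances(counts):
    """p_i = n_i / N for one sample."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no individuals")
    return {s: n / total for s, n in counts.items()}


def shorygin(counts_a, counts_b):
    """SHR = sum over species of min(p_i1, p_i2)."""
    p1, p2 = relative_abundances(counts_a), relative_abundances(counts_b)
    return sum(min(p1.get(s, 0.0), p2.get(s, 0.0)) for s in set(p1) | set(p2))


def city_block(counts_a, counts_b):
    """CBM = sum over species of |p_i1 - p_i2|."""
    p1, p2 = relative_abundances(counts_a), relative_abundances(counts_b)
    return sum(abs(p1.get(s, 0.0) - p2.get(s, 0.0)) for s in set(p1) | set(p2))


# Two hypothetical samples with 10 individuals each.
sample_1 = {"sp_a": 6, "sp_b": 3, "sp_c": 1}
sample_2 = {"sp_b": 5, "sp_c": 1, "sp_d": 4}

jcr = jaccard(sample_1, sample_2)      # 2 shared species out of 4 in total
shr = shorygin(sample_1, sample_2)     # min(0.3, 0.5) + min(0.1, 0.1)
cbm = city_block(sample_1, sample_2)
print(jcr, shr, 1 - cbm / 2)           # the last two values agree up to rounding
```

The identity SHR = 1 - CBM/2 holds because the relative abundances in each sample sum to 1, so the absolute differences and the minima are tied together species by species.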

Results and discussion
We created matrices of the Jaccard and Shorygin indices for each of the above datasets. To examine how similarity depends on sample size, we also built matrices for the sums of every four soil samples taken in each of the five quadrat plots on Silon Island. This operation is obviously analogous to the summation of phytoplankton cell counts in the five chambers (subsamples) taken from each water sample. The ranges of the values obtained for both similarity indices are presented in Figs 1 and 2. Along the abscissa axis, the datasets are arranged in order of increasing mean abundance of species per sample. In the patterns of variation of the similarity indices, phytoplankton samples differ little from soil springtail ones. Nor are any peculiarities observed in the similarity indices for macrobenthos. Because the sampling methods for phytoplankton, soil microarthropods and macrobenthos, and their subsequent laboratory processing, differ considerably in technique, one can conclude that these differences have virtually no influence on the estimates of statistical variation.

Fig. 2. Ranges of Shorygin's index in parallel samples. Designations as in Fig. 1.

No relationship between the mean value and sample size is revealed for Shorygin's index. This is related to the latter's low sensitivity to abundance variations in scarce species. The range of variation of this index is even significantly narrower for the summed samples obtained by combining soil samples by fours (4L, 4D, 4LM, 4ML, 4M) or five phytoplankton subsamples (k20, ch50). In contrast, the mean values of the Jaccard index tend to increase, and its range of variation (the difference between the minimum and maximum values of the index) tends to decrease, with a growing average abundance per sample. Clearly this is related to the increased reliability of species identifications with a growing size of the sample analyzed, because the dispersion of the index's values in this case is linked only to identification errors. Let us recall that errors as understood here include not only the purely technical ones related to taking samples, fixing material etc., but also the differences between parallel samples related to an uneven distribution of individuals within a habitat's space. Because of this, increasing the counts of individuals by summing several separate samples does not bring even the maximum values of the indices, let alone the mean ones, significantly closer to their theoretical value, i.e. 1. When evaluating similarities, uneven distributions of organisms (the formation of aggregations) contribute more to sample errors than do the purely technical errors related to counts per sample. Therefore, an empirical distribution function derived from a sufficiently large series of parallel samples can serve as a similarity standard regardless of the similarity index chosen. We believe that the notion "sufficiently large" should not cause serious doubts. As a matter of fact, it is nothing more (but also nothing less) than an expert judgment.
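How a set of index values for a series of parallel samples, and for the same samples combined by fours, might be obtained can be sketched as below. The sketch assumes that samples are dictionaries of species counts; the eight samples are hypothetical, and the helper names are illustrative, not taken from any software used in the study.

```python
from collections import Counter
from itertools import combinations


def shorygin(counts_a, counts_b):
    """SHR = sum over species of min(p_i1, p_i2), samples given as {species: count}."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("sample contains no individuals")
    species = set(counts_a) | set(counts_b)
    return sum(min(counts_a.get(s, 0) / total_a, counts_b.get(s, 0) / total_b)
               for s in species)


def pairwise_values(samples, index):
    """All n(n-1)/2 index values for a series of n samples."""
    return [index(a, b) for a, b in combinations(samples, 2)]


def pool_samples(samples, group_size):
    """Sum the counts of consecutive groups of `group_size` samples."""
    pooled = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        total = Counter()
        for sample in samples[start:start + group_size]:
            total.update(sample)  # adds the counts species by species
        pooled.append(dict(total))
    return pooled


# Eight hypothetical parallel samples from one plot.
parallel = [
    {"a": 5, "b": 2}, {"a": 3, "c": 1}, {"a": 4, "b": 1, "c": 1}, {"b": 2, "c": 2},
    {"a": 6}, {"a": 2, "b": 3}, {"a": 1, "b": 1, "c": 1}, {"a": 4, "c": 2},
]

single_values = pairwise_values(parallel, shorygin)   # 8 * 7 / 2 = 28 values
combined = pool_samples(parallel, 4)                  # two samples "combined by 4"
combined_values = pairwise_values(combined, shorygin)
print(min(single_values), max(single_values), combined_values)
```

Pooling such value lists over several series gives the empirical distribution that plays the role of the similarity standard.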

The distributions presented above for the Jaccard and Shorygin indices (Figs 1 and 2) are based on similarity matrices calculated for 10 series containing 20–50 samples each. So each matrix had from 200 to 1,200 values of an index. Summing up the frequencies of occurrence of these values over all the similarity matrices of individual samples (series k55, L, LM, D, ch51, BS, ch35, ML), we obtain a total of 2,800 values each for the Jaccard and Shorygin indices. Similarly, for the samples obtained by combining four neighbouring samples for Collembola or five subsamples for phytoplankton, we get 1,100 values of each index. In each individual sample, the geometric mean of species abundances did not exceed 5, with 3 to 15 species involved. In the combined samples, the geometric mean of species abundances ranged from 6 to 12, while the species richness in some samples amounted to 20, never being lower than 9.
To apply each of the indices as a similarity test, it is enough to know only the tails of the respective empirical distribution function. First of all, the frequency of occurrence of the lowest values must be estimated, because it is logical to accept H0: JCR = 0 or H0: SHR = 0 as the null hypothesis. Let us explain this rather strong statement. To verify the reliability of differences in measurement results, a null hypothesis of the absence of differences, i.e. of the difference being equal to 0, is known to be formulated and tested (with a defined probability of a type I error). Most often this is a difference between the arithmetic means found in two independent series of measurements (samples). The differences revealed are taken as evidence that the means estimate the mathematical expectations of two different general populations. At a chosen significance level of 5%, the probability of erring in such an assumption must not exceed 0.05.
For our purposes, a hypothesis of the presence of differences between samples is of no practical interest. It would be naïve to expect that, having taken two samples even in the same habitat, after counts we would get two lists in which the species (species composition) and their relative numbers would be exactly the same. The following question is far more important for solving problems of community classification: is it justified to treat an association of organisms observed at a given moment (a taxocene of springtails in soil samples, a group of benthic organisms on a homogeneous bottom plot, an assemblage of planktonic algae or invertebrates within a hydrologically homogeneous water mass etc.) as a community, or at least as a part of one and the same biocoenosis? To arrive at a conclusion, we need to check the spatio-temporal stability of the species composition and species structure of the group of organisms under study. This means that, having taken samples in similar habitats during the same season, we must test whether all the differences observed are related only to sampling errors and to heterogeneous distributions of the objects. Therefore, a hypothesis of the absence of similarity must be tested, i.e. of the chosen similarity measure being equal to 0. Then exceeding a certain threshold, found from the standard dataset for the respective similarity index, would indicate that the differences between the samples compared (even those taken from different communities and ecosystems) do not exceed the differences between parallel samples, i.e. are related only to aggregated distributions of the species revealed and to technical errors of sampling and analysis. If this standard threshold is not exceeded, we remain in the same uncertain situation as in classical tests for the so-called reliability of differences, yet with the opposite sign.
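The asymmetry of this decision rule can be made explicit in a small Python sketch. The function name and the verdict labels are hypothetical; the threshold of 0.46 and the two index values are taken from Tables 1 and 2 of this paper and are used only as an illustration.

```python
def assess_pair(index_value, threshold):
    """Compare one similarity value with a threshold taken from the standard.

    Exceeding the threshold means the differences between the two samples
    are no larger than those between parallel samples. Not exceeding it
    leaves the question open; it does not prove that the samples come
    from different communities.
    """
    if not 0.0 <= index_value <= 1.0:
        raise ValueError("a similarity index must lie between 0 and 1")
    if index_value > threshold:
        return "similar"
    return "undecided"


# 1% fractile of Shorygin's index for combined samples (Table 1).
THRESHOLD = 0.46

verdict_c1_c2 = assess_pair(0.82, THRESHOLD)  # C1 vs C2 in Table 2
verdict_a1_b1 = assess_pair(0.28, THRESHOLD)  # A1 vs B1 in Table 2
print(verdict_c1_c2, verdict_a1_b1)
```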
If the sample estimate of a similarity index does not exceed the critical value, this cannot be considered a good reason for saying that the samples compared were taken in different ecosystems or different habitats. However, one must keep in mind that the n(n−1)/2 indices in the similarity matrix for n samples cannot be regarded as independent realizations, because they are correlated. This correlation is easy to illustrate as follows. If a dataset contains two samples completely identical in species composition, then their similarities to the remaining n−2 samples are represented by two identical sets of values. Therefore, if a similarity matrix contains at least one index value equal to 1, then n−2 other values occur in the matrix at least twice. For the same reason, the appearance of only a single sample differing anomalously in species composition from the remaining samples results in n−1 anomalously small values of Shorygin's index. Yet this cannot strongly affect the empirical distribution function if, as is usually done, the relative frequencies of occurrence of each value of the index are used in its analysis. It is another matter that we consider it difficult to evaluate confidence probabilities in a mathematically strict way, based on the relative frequencies revealed this way. Therefore, the threshold values of the similarity indices obtained below are only to be considered as expert judgments applicable to preliminary studies, or exploratory data analysis as termed by Tukey [1981].
Table 1 shows fractiles of the distribution functions found, which correspond to the significance levels experimentalists are used to ("thresholds of faultless forecasts", in terms of Plokhinskiy [1970]).

Table 1. Fractiles of the empirical distributions of Jaccard's and Shorygin's indices.

            Jaccard's index,           Shorygin's index,
            sum for samples            sum for samples
Fractile    single    combined         single    combined
                      by 4 and 5                 by 4 and 5
0.0001      0.09      0.30             0.17      0.35
0.001       0.11      0.31             0.21      0.38
0.01        0.14      0.38             0.36      0.46
0.025       0.17      0.42             0.43      0.52
0.05        0.20      0.45             0.49      0.58
0.1         0.23      0.50             0.55      0.64

Consequently, if we have a series of samples in which the geometric mean of species abundances does not exceed 3 specimens (or, which is nearly the same, the total counts do not exceed 200 individuals representing no more than 10–15 species per sample), the similarity matrix may contain 5% of values of Jaccard's index below 0.20 and 1% below 0.14, even though all these samples were taken in the same place and at the same time. Table 1 refers to such samples as single (entomologists simply term them samples, planktonologists subsamples). Because in practical ecological studies the number of samples to compare amounts to dozens, the number of values of the similarity index in the corresponding matrices can reach several hundred. Then 1% of the total number of values no longer looks as small as it does when testing ordinary statistical hypotheses. When exploring similarity in species composition using Jaccard's index, there is hardly any sense in using single samples (in the above sense). Instead, sample sizes must be selected so that the total number of counted individuals considerably exceeds 200, the number of species per sample is not less than 10, and the geometric mean abundance is not less than 8–9 specimens.
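A lower-tail fractile of an empirical distribution, and the number of index values that may fall below it in a matrix for n samples, can be sketched in Python as follows. The fractile definition used here (the smallest observed value whose cumulative relative frequency reaches the given level) is an assumption of the sketch, since the paper does not state which estimator was used; the standard values and the number of samples are hypothetical.

```python
import math


def lower_fractile(values, level):
    """Smallest observed value v such that at least `level` of the values are <= v."""
    if not values:
        raise ValueError("no index values supplied")
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    ordered = sorted(values)
    # round() guards against floating-point noise before taking the ceiling
    rank = max(1, math.ceil(round(level * len(ordered), 9)))
    return ordered[rank - 1]


def expected_low_values(n_samples, level):
    """Number of matrix entries, and how many of them may fall below the fractile."""
    pairs = n_samples * (n_samples - 1) // 2
    return pairs, level * pairs


# Hypothetical standard: 100 index values 0.01, 0.02, ..., 1.00.
standard = [i / 100 for i in range(1, 101)]
threshold_5 = lower_fractile(standard, 0.05)

# With 40 samples the matrix holds 780 values, so about 8 of them
# may lie below the 1% fractile even for parallel samples.
pairs, expected = expected_low_values(40, 0.01)
print(threshold_5, pairs, expected)
```

The second function illustrates why 1% of the values is not negligible once the similarity matrix contains several hundred entries.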
Another approach is also possible. Because the above-mentioned quantitative characteristics become available only after laboratory processing of the samples, the sums of several single samples taken simultaneously close to one another can be used as the initial data for calculating the similarity indices. In our case, this corresponds to summing 4 or 5 single samples and generally agrees with the standards practiced by entomologists and planktonologists.
The introduction of the notion of a similarity standard allows a number of problems to be solved which arise when comparing the species composition and species structure of communities using such traditional methods of multivariate analysis as cluster analysis, multidimensional scaling etc. First of all, these include uncertainty in choosing a measure of similarity for samples taken in natural ecosystems, as well as difficulties arising in the analysis of similarity matrices. The various methods of analysis of similarity matrices, most of which are non-stochastic in the true sense of the word, often lead to fundamentally different results even when the same similarity index is applied.

Table 2. Similarity of species structure using Shorygin's index (SHR, %) for samples from forest ecosystems.

      A1  A2  A3  A4  B1  B2  B3  B4  C1  C2  C3  D1  D2  D3  E1  E2  E3  E4  F1  F2  F3  F4
A1     -  70  50  69  28  37  27  33  46  48  51  40  34  48  30  31  33  31  33  33  33  33
A2    70   -  54  62  19  28  19  24  42  43  47  42  36  56  26  34  37  37  30  40  39  36
A3    50  54   -  45  19  27  18  23  31  34  35  34  30  39  21  21  21  20  20  23  23  20
A4    69  62  45   -  18  26  17  22  46  50  47  41  43  49  24  24  24  20  21  23  23  21
B1    28  19  19  18   -  81  87  85  25  20  25  14  14  15  13  10  11  11  16  17  17  21
B2    37  28  27  26  81   -  78  84  30  26  32  20  19  21  18  16  16  16  21  22  22  26
B3    27  19  18  17  87  78   -  86  27  23  29  18  18  18  15  12  14  13  18  19  17  22
B4    33  24  23  22  85  84  86   -  30  25  31  21  21  22  14  11  12  12  17  18  17  21
C1    46  42  31  46  25  30  27  30   -  82  80  39  47  50  30  33  34  32  39  36  33  36
C2    48  43  34  50  20  26  23  25  82   -  75  43  49  51  31  33  36  31  29  34  33  31
C3    51  47  35  47  25  32  29  31  80  75   -  42  43  55  28  29  31  32  33  34  33  35
D1    40  42  34  41  14  20  18  21  39  43  42   -  75  75  34  32  32  32  35  32  34  33
D2    34  36  30  43  14  19  18  21  47  49  43  75   -  70  23  22  23  20  21  23  23  21
D3    48  56  39  49  15  21  18  22  50  51  55  75  70   -  36  39  44  44  36  44  46  42
E1    30  26  21  24  13  18  15  14  30  31  28  34  23  36   -  77  76  55  73  63  65  64
E2    31  34  21  24  10  16  12  11  33  33  29  32  22  39  77   -  78  59  69  60  60  55
E3    33  37  21  24  11  16  14  12  34  36  31  32  23  44  76  78   -  63  72  67  69  65
E4    31  37  20  20  11  16  13  12  32  31  32  32  20  44  55  59  63   -  57  64  75  61
F1    33  30  20  21  16  21  18  17  39  29  33  35  21  36  73  69  72  57   -  73  74  71
F2    33  40  23  23  17  22  19  18  36  34  34  32  23  44  63  60  67  64  73   -  82  78
F3    33  39  23  23  17  22  17  17  33  33  33  34  23  46  65  60  69  75  74  82   -  76
F4    33  36  20  21  21  26  22  21  36  31  35  33  21  42  64  55  65  61  71  78  76   -

Let us use the similarity matrix calculated with Shorygin's index (Tab. 2) for the numbers of springtails in five forest ecosystems, obtained by combining every five samples taken in lines between two trees. A and B are such summed samples from a bilberry spruce forest and an oak woodland in the Mordovian Nature Reserve, respectively. C and D are similar sums for an oak and a broadleaved-coniferous forest in the environs of Vilnius, respectively. E and F
represent true repetitions in the fullest sense of the word, resulting from sampling in the same bilberry spruce forest at Ramenye in 1980 (E) and 1981 (F).
In Tab. 1, let us find the value SHR = 45%, which roughly corresponds to the 1% fractile. If we accept it as the minimum standard value and then mark in boldface in Tab. 2 all values of SHR > 45%, it becomes apparent that the samples taken in each of the ecosystems cluster into groups clearly isolated from one another. This primarily shows that, within each of the forests, the similarity in springtail species composition between samples corresponds well to the similarity standard. In addition, it is obvious that the species composition in all samples taken in 1980 from the bilberry spruce forest at Ramenye is similar to that in the 1981 samples. Viewed from a different aspect, in species structure the Ramenye collembolan population sampled in 1980 and 1981 is as similar as the samples taken on the same day on Silon Island on a plot not exceeding ? sq. m. Let us recall that it is the pooled set of SHR values for the latter samples that, together with those for phytoplankton and macrobenthos samples, we have accepted as a similarity standard. The conclusion seems quite sound that, during the year that passed between the two sampling repetitions at Ramenye, the species composition of springtails did not alter significantly, although we are unable to provide a strict statistical evaluation of this conclusion's reliability. It is noteworthy, however, that
the traditional methods of multivariate analysis are not capable of giving such an evaluation either.
In addition, Tab. 2 also shows some pairs of samples from different forests in which the values of Shorygin's index exceed, albeit by no more than 10%, the lowest threshold of SHR = 45% that we have accepted as a similarity standard. Interestingly, samples D2 and D3 from the broadleaved-coniferous stand in Lithuania appear to be similar both to samples C1–C3 from the Lithuanian oak forest and to samples A1, A2 and A4 from the Mordovian bilberry spruce forest. Less surprisingly, but importantly enough, the similarity between samples from the two forests near Vilnius is considerably higher than their similarity to samples from the other woodlands.

Fig. 3. Dendrogram obtained by using the method of weighted between-group mean, based on the data in Table 2.

It seems useful to compare the above conclusions, which are based on the similarity matrix in Tab. 2, with what could be obtained using traditional methods of analysis. Fig. 3 shows a dendrogram derived from a matrix of Manhattan distances (CBM) calculated from the same dataset on springtails from five woodlands that forms Tab. 2. The method of weighted between-group mean was chosen from the usual set of linkage methods (nearest neighbour, farthest neighbour, Ward's method etc.), on the advice of A.T.
Terekhin, who is most experienced in applying cluster analysis to ecological problems. As usual in any analysis of dendrograms, the most difficult part is choosing a CBM threshold value to separate one cluster from another. If one sticks to the value of SHR = 45% (i.e. CBM = 1.10) proposed above, a clear-cut differentiation into five clusters can be seen, as in Tab. 2, each cluster corresponding to one of the study forests. But there is no CBM value at which Fig. 3 would show that some samples from the Lithuanian woodlands are similar in species structure to those from the coniferous forests of European Russia.

Fig. 4. MDS diagram for the dataset in Table 2.

With the development and spread of personal computers and their statistical software, multidimensional scaling techniques have gained much popularity, quite undeservedly in our opinion. In the number of methods and algorithms on offer, these techniques are steadily catching up with cluster analysis. Based on our own limited, but mostly negative, experience in using these methods, we shall restrict ourselves to a single example. Fig. 4 depicts a so-called MDS diagram based on the same dataset from Tab. 2. It seems enough to compare the distances between dots in this diagram with the initial SHR values in Tab.
2 to become convinced that the algorithm applied exaggerates both the differences and the similarities. This could also be seen in the so-called Shepard diagram, which we omit as redundant. Thus, cluster E (Ramenye) looks more compact than cluster B (Mordovian oak wood). Meanwhile, the mean CBM distance in group B amounts to 0.33, whereas in group E it is 0.64, i.e. nearly twice as much. If the dot labels were removed from Fig. 4, it would be impossible to recognize the A1–A4 cluster (Mordovian spruce stand) as separate from the C1–C3 and D1–D3 dots. Therefore, even from our deliberately simple example, one can see that both cluster analysis and multidimensional scaling can result either in a loss of useful information or in misleading statistics, or both.
Acknowledgements. We are most grateful to Sergei Golovatch (Moscow, Russia) for his translation of the paper into English. We are also obliged to V.K. Shitikov, G.S. Rozenberg and the late A.T. Terekhin for constructive criticism and useful advice.

References
Koltsova T.I., Konoplya L.A., Maximov V.N., Fedorov V.D. 1971. [To the problem of sample representativeness in phytoplankton sample analysis] // Gidrobiologicheskiy zhurnal. Vol.7. No.3. P.109–116 [in Russian].
Konstantinov A.S. 1969. [Set theory used in biogeographical and ecological analyses] // Uspekhi sovremennoy biologii. Vol.67. No.1. P.99–108 [in Russian].
Kuznetsova N.A. 2005. [Organization of soil-dwelling springtail communities]. Moscow: Prometey. 244 pp. [in Russian].
Likhacheva N.E., Levich A.P., Koltsova T.I. 1979. [On a quantitative treatment of phytoplankton samples. II. Rank distributions of phytoplankton numbers in the Vilkitsky Strait] // Biologicheskie nauki. No.9. P.102–106 [in Russian].
Maximov V.N. 1984. [Metrological properties of similarity indices (as applied to a biological analysis of water quality)] // Komplexnye otsenki kachestva poverkhnostnykh vod. Leningrad: Gidrometeoizdat. P.77–84 [in Russian].
Pesenko Y.A. 1982. [Principles and methods of quantitative analysis in faunistic studies]. Moscow: Nauka. 287 pp. [in Russian].
Plokhinskiy N.A. 1970. [Biometry]. Moscow: Moscow University Press. 367 pp. [in Russian].
Potapov M.B., Kuznetsova N.A. 2011. [Methods of study of microarthropod communities]. Moscow: KMK Sci. Press. 77 pp. [in Russian].
Shitikov V.K., Rozenberg G.S., Zinchenko T.D. 2005. [Quantitative hydrology: methods, criteria, solutions, Book 2]. Moscow: Nauka. 337 pp. [in Russian].
Shorygin A.A. 1939. [Nutrition, selectivity capacities and feeding interrelationships of some Gobiidae in the Caspian Sea] // Zoologicheskiy zhurnal. Vol.18. No.1. P.27–51 [in Russian].
Smirnov N.A., Fedorov V.V., Maximov V.N. 1986. [Distinguishing seasonal groups in the White Sea's phytoplankton, based on a statistical analysis of similarity matrices] // Vestnik Moskovskogo universiteta, Ser.16 (Biologiya). No.3. P.63–70 [in Russian].
Tukey J.W. 1981. Analysis of observation results. Exploratory data analysis. Moscow: Mir Publ. 693 pp. [Russian translation].
Whittaker R. 1980. [Communities and ecosystems]. Moscow: Progress. 129 pp. [Russian translation].