Документ взят из кэша поисковой машины. Адрес
оригинального документа
: http://mouse.belozersky.msu.ru/~evgeniy/cgi-bin/wLake/alg.html
Дата изменения: Fri May 6 13:34:55 2011 Дата индексирования: Tue Oct 2 05:34:05 2012 Кодировка: |
wLake 1.7 algrithm 3D structures of proteins or their complexes are considered as related if their significant parts can be reliably superimposed. Two water molecules from different structures are called equivalent if the distance between the centers of their oxygen atoms is less than a threshold. A set of pairwise equivalent water molecules from different structures is called a cluster of water molecules. An input of the algorithm is a set of superimposed related structures. The goal is to find all ConWMs in the input. Lemma. A graph G of |V| vertices and |E| edges cannot contain any cliques of a size Min or greater if |V| is less than Min or |E| is less than Min * (Min - 1) / 2. G is a complete graph if |E| = |V| * (|V| - 1) / 2. Lemma is used in algorithm to reject graphs that cannot contain cliques and to check the completeness of a graph. At the first stage, a special graph is constructed. Each water molecule corresponds to a vertex in the graph. Two vertices of the graph are joined by an edge if water molecules are equivalent. Thus, a ConWM corresponds to a clique in the graph. To find all cliques of a size exceeding a fixed minimum (Min) the following algorithm was developed. ALGORITHM wLake (Graph G) If G can't contain a clique of >= Min vertices then return NULL If G is complete then return G Vmin := a vertex from G with the minimal valence Vs := {Vmin} {vertices connected with Vmin} Es := {edges connecting any pair of vertices Vs} G' := new Graph () For each V Vs do Add copy of V into G' For each E Es do Add copy of E into G' Delete Vmin from G return wLake (G) wLake (G') This generally exponential algorithm works rather fast in cases of graphs of water molecules (a few seconds of CPU for about thirty superimposed structures). The result of the program is a list of ConWMs. To estimate the reliability of each detected cluster, we developed a special program WLStat3. The input of the program is a list of clusters of size 2 and more detected by wLake 1.7 and a numbers of total and unclusters water molecules in the every experimental structure. WLStat3 randomly put all the water molecules into the different hydration site (the hydration site is a cluster of water molecules or a simple unclustered molecule) 2000 times and count the number of clusters of a size 2, 3, etc observed. Than E-value for each detected SWM cluster is determined. wLake and WLStat programs can be downloaded for off-line usage here.