Документ взят из кэша поисковой машины. Адрес оригинального документа : http://mouse.belozersky.msu.ru/npidb/html/help.htm
Дата изменения: Thu Jun 5 17:58:00 2008
Дата индексирования: Mon Oct 1 19:50:58 2012
Кодировка:

HF clusters: algorithm

Overview of the algorithm

The list of 'non-polar groups' in proteins and nucleic acids.

An 'indivisible unit' of the hydrophobic interaction is an atomic group that includes one carbon or sulfur atom and covalently bound hydrogen atoms. In proteins and nucleic acids those groups are -CH3, -CH2-, -CH=, -SH, and -S- groups.

The program uses two variants of the list of non-polar groups. The extended list includes all such groups in protein and nucleic acids, the strict list includes those groups of extended one, whose main atom is covalently bound only with another carbon or sulfur atoms (and, thus, is not covalently bound with any polar atom).

The center of a non-polar group is the center of the carbon or sulfur atom. Hydrogen atoms are not considered in calculations.

The graph of interactions of non-polar groups in a 3D structure.

Each non-polar group corresponds to one vertex of the graph. Two vertices are connected by an edge (and are suppose to be in interaction) in the case if (a) the distance between the centers of the groups does not exceed the threshold value d; (b) the interaction of these groups is not prevented by any other (polar or non-polar) group. We say that a group C prevents the interaction between non-polar groups A and B, if the ball with the center C and the radius R_C includes the intersection of the balls with the centers A and B and the radii R_A and R_B, respectively. In the program all radii are equal to 2.7 Å, which corresponds to the minimal possible distance from a carbon-generated non-polar group to the oxygen atom of a water molecule. The value of d is a user parameter. Maximal vreasonable value of d is 5.4 Å, which corresponds to the sum of R_A and R_B radii in the case of carbon non-polar groups (roughly speaking, the hydrophobic interaction occurs if a water molecule cannot be placed between two non-polar groups).

Edges of the graph reflect two sufficiently different types of relations between atomic groups: (i) 'fixed' groups (including covalently bound ones); (ii) groups that are nearby located due to the hydrophobic interactions. Two non-polar groups A and B are 'fixed', if the distance between them is fixed due to chemical bonds, or any other interaction with energy exceeding sufficiently the free energy of the hydrophobic interaction between the groups. The examples of fixed non-polar groups are two groups, the central atoms of which are covalently bound with some third atom, or any two groups of aromatic ring. In the described algorithm and program these two different types of edges are not distinguished.

The graph of interactions of non-polar groups is denoted by Γ.

The definition of a (K,L)-cut of the graph Γ

The 1-neighborhood of a subgraph Δ of the graph Γ is the subgraph Δ' constructed from Δ by adding all edges that have at least one vertex belonging to Δ. The L-neighborhood is produced by L iterations of 1-neighborhood. A connected subgraph Δ with ≤K edges is called a (K, L)-cut, if the subgraph obtained by removing the edges of Δ from its L-neighborhood has two or more connected components (Fig. 1).

Fig. 1. Connected 2-edges subgraphs (thick lines) representing (a) (2, 1)-cut; (b) non-(2, 1)-cut. Vertices of 1-neighbourhood of the subgraphs are in dark grey. In (a), after the removal of thick edges from 1-neighbourhood the obtained subgraph is decomposed into 2 connected components. In (b), the analogous procedure gives a connected subgraph.

The algorithm of exhaustion of connected subgraphs of ≤ K edges

Assign indexes 1, 2, ... to all edges of Γ the n-th edge will be denoted by E_n. Let Δ be a connected subgraph of t<K edges, E_n be the edge of Δ with the maximal index n. The subgraph Δ' obtained by adding an edge E_x to Δ is called a 1-extension of Δ if (i) E_x belongs to the 1-neighbourhood of Δ; (ii) either x>n, or x<n and the subgraph obtained from Δ' by deletion of the edge E_n is not connected.

The algorithm starts with the exhaustion of all subgraphs of one edge. Then for each subgraph Δ all 1-extensions of Δ are considered. The procedure iterates K–1 times. It can be proved that each connected subgraph of ≤K edges is considered by the algorithm exactly ones.

The algorithm for the hydrophobic clusters detection

The suggested algorithm tries to solve the problem of the detection of hydrophobic clusters corresponding to spatial areas occupied exclusively or mainly by non-polar groups of atoms, in a given 3D structure of protein or macromolecular complex. It is rather difficult to give a strict definition of hydrophobic cluster, because of the complexity of configuration of non-polar groups in structures. Detection of hydrophobic clusters can be based on the fact that each non-polar group has many interactions inside a hydrophobic cluster, and at the same time between two hydrophobic clusters the interaction is weak. Detection of hydrophobic clusters in a planar variant is illustrated in Fig. 2.

Fig. 2. The planar illustration of (K,L)-cut algorithm. (a) The initial distribution of non-polar groups. (b) The graph of interactions of non-polar groups. (c) All (1,1)-cuts in the graph. The edges of (1,1)-cuts are thick ones. (d) The graph after removal of all (1,1)-cuts. An example of one (2,1)-cut is presented (thick edges). The clusters, found after the removal of (1,1)-cuts and one (2,1)-cut, are subscribed.

The algorithm realizes the above mentioned intuitive concept concerning a hydrophobic cluster as follows: (i) weak interactions between the hydrophobic clusters (yet not found) are searched, in terms of graph of interactions these weak interactions correspond to (K, L)-cuts; (ii) the hydrophobic clusters are detected as components obtained after the removal of weak interactions, in terms of graph these components are connected subgraphs. The parameters of the algorithm are the integers K, L used in the definition of (K, L)-cut, the list of non-polar groups, the threshold for distance between interacting non-polar groups d, and the minimal number m of non-polar groups in a found hydrophobic cluster.

The algorithm works as follows. At the first step the graph Γ of interactions of the non-polar groups is constructed (see Fig. 2b). At the second step all (K, L)-cuts in the graph Γ are found (Fig. 2c). If a subgraph with t<K edges is a (K, L)-cut, then larger subgraphs including this (K, L)-cut are not considered: the process of connected subgraphs exhaustion (see above) stops extending a subgraph if it is proved to be a (K, L)-cut. At the third step all edges of found (K, L)-cuts are removed from the graph (Fig. 2d). At the fourth step the connected components of the obtained graph are found. The vertices of each connected component that contains not less than m non-polar groups form a hydrophobic cluster. For every hydrophobic cluster the additional parameters (see further) are calculated. The found hydrophobic clusters, ordered in accordance to the number of non-polar groups, form the output of the algorithm.

For graphs of interacting non-polar groups the algorithm is linear in the number N of atoms in a given structure, because the number of interactions of a non-polar group with other groups is obviously restricted. The constant in this linear function is rather big and depends on K exponentially. Fortunately, only small K=1, 2, 3 are expected to be sufficient for reasonable hydrophobic cluster detection.

In the program realization the values K=1, L=1, m=3 are fixed.

Parameters of hydrophobic clusters

The following parameters are calculated for each hydrophobic cluster.
(i) The number of non-polar groups in the cluster.
(ii) The mean number of interactions of one non-polar group in the cluster. This parameter indirectly characterizes the form of the cluster. More compact packing of hydrophobic groups, and the form of the cluster closer to an ideal ball are characterized by a higher value of mean number of interactions of a non-polar group.
(iii) Semiaxes of the ellipsoid of inertia of the cluster. In calculating the ellipsoid, the groups of the cluster are considered as material points in space, with unitary masses. Ratios of semiaxes reflex the spatial form of the cluster. For example, if all three semiaxes are approximately equal, then the form of the cluster can be assumed to be close to a ball.

Strict list

Residue	Atom	(in PDB notation)
ALA	CB
ARG	CB
ARG	CG
ASN	CB
ASP	CB
CYS	CB
CYS	SG
GLU	CB
GLU	CG
GLN	CB
GLN	CG
HIS	CB
HIS	CG
HIS	CD2
HIS	CE1
ILE	CB
ILE	CG1
ILE	CD1
ILE	CG2
LEU	CB
LEU	CG
LEU	CD1
LEU	CD2
LYS	CB
LYS	CG
LYS	CD
MET	CB
MET	CG
MET	SD
MET	CE
PHE	CB
PHE	CG
PHE	CD2
PHE	CE2
PHE	CZ
PHE	CE1
PHE	CD1
PRO	CB
PRO	CG
THR	CG2
TRP	CB
TRP	CG
TRP	CD2
TRP	CE3
TRP	CZ3
TRP	CH2
TRP	CZ2
TRP	CE2
TYR	CB
TYR	CG
TYR	CD2
TYR	CE2
TYR	CE1
TYR	CD1
VAL	CB
VAL	CG1
VAL	CG2
N	C2*	N is for any DNA nucleotide
T	C5M
C	C5

Extended list

Residue	Atom	(in PDB notation)
X	C	X is for any amino acid residue
X	CA
ALA	CB
ARG	CB
ARG	CG
ARG	CD
ARG	CZ
ASN	CB
ASN	CG
ASN	ND2
ASP	CB
ASP	CG
CYS	CB
CYS	SG
GLU	CB
GLU	CG
GLU	CD
GLN	CB
GLN	CG
GLN	CD
HIS	CB
HIS	CG
HIS	CD2
HIS	CE1
ILE	CB
ILE	CG1
ILE	CD1
ILE	CG2
LEU	CB
LEU	CG
LEU	CD1
LEU	CD2
LYS	CB
LYS	CG
LYS	CD
LYS	CE
MET	CB
MET	CG
MET	SD
MET	CE
PHE	CB
PHE	CG
PHE	CD2
PHE	CE2
PHE	CZ
PHE	CE1
PHE	CD1
PRO	CB
PRO	CG
PRO	CD
SER	CB
THR	CB
THR	CG2
TRP	CB
TRP	CG
TRP	CD2
TRP	CE3
TRP	CZ3
TRP	CH2
TRP	CZ2
TRP	CE2
TRP	CD1
TYR	CB
TYR	CG
TYR	CD2
TYR	CE2
TYR	CZ
TYR	CE1
TYR	CD1
VAL	CB
VAL	CG1
VAL	CG2
N	C5*	N is for any DNA nucleotide
N	C4*
N	C3*
N	C2*
N	C1*
T	C2
T	C4
T	C5
T	C5M
T	C6
C	C2
C	C4
C	C5
C	C6
A	C8
A	C5
A	C6
A	C2
A	C4
G	C8
G	C5
G	C6
G	C2
G	C4