
Protein sequence/supersecondary structure relationship.
Sequence patterns.

Alexander Kister

Department of Health Informatics, University of Medicine and Dentistry of
New Jersey, School of Health Related Professions, 65 Bergen Street, Newark,
NJ 07107, USA


Abstract


Motivation: Anfinsen's experiments showed that the amino acid sequence of a
protein determines its structure uniquely, so strict regularities must
relate sequence to structure. Knowledge of the principles of the
sequence/structure relationship will allow us to understand the basis of the
protein folding algorithm.

Results:
Proteins with an identical supersecondary structure were shown to share a
common sequence pattern even if their sequence similarity is very low.
To find common sequence regularities for these proteins, a novel
algorithm of supersecondary structure-based multiple sequence alignment
was developed. It is based on the suggestion that the alignments of two
residues connected by a hydrogen bond are interdependent. The
alignment of proteins with the same supersecondary structure reveals that
up to 30-35% of positions in sequences are 'conserved positions'.
Residues at the conserved positions form the pattern of a supersecondary
structure, which could be used for protein classification as a 'sequence
tag' of sandwich proteins with a given supersecondary structure.


Contact: kisterae@umdnj.edu


Supplementary information: Supplementary data are available on
Bioinformatics online




1 INTRODUCTION


Understanding the protein sequence-structure relationship is key to solving
many problems of molecular biology, such as annotation of genome
sequences, protein structure prediction, protein-protein interaction, and
protein evolution, among others. A fundamental insight into the
sequence-structure relationship of proteins is due to Anfinsen, who showed
that all information about the native structure of a protein is encoded in
its amino acid sequence (Anfinsen, 1973). Therefore, it is to be expected
that similar sequences would encode similar structures and that a structure
can be determined by analogy with known structures of proteins with similar
sequences. The idea that sequence similarity translates into structural
similarity underlies most modern high-accuracy algorithms of structure
prediction (Bowie et al., 1991; Wallner and Elofsson, 2005; Dalton and
Jackson, 2007; Misura et al., 2006; Nayeem et al., 2006; Kopp and
Schwede, 2004; Xiang, 2006; Tramontano and Morea, 2003).

A crucial question in the sequence-structure field is: how similar must
sequences be in order for their structures to be similar? To answer this
question it is necessary to define what is meant by sequence and
structure similarity. Similarity among sequences can be measured with
alignment procedures, which are designed to maximize the number of
matching residues that are identical or chemically similar (Chakrabarti
et al., 2006; Edgar and Batzoglou, 2006). To determine the similarity of
structures, one may superpose the Cα atom coordinates of residues in
different structures and obtain the root mean square deviation (RMSD) of
the distances between corresponding Cα atoms (Koehl, 2001; Jewett et al.,
2003; Konagurthu et al., 2006; Vlahovicek et al., 2002; Carugo, 2007).
Using these definitions of sequence and structure similarity, it has been
shown that proteins tend to share similar three-dimensional structures
when their sequence identity exceeds 30% (Ginalski, 2006). An important
corollary of this result is that, though each residue makes some
contribution to 3D structure formation, the relative weights of the
contributions vary greatly. A relatively small number of residues
conserved throughout evolution are expected to be critical to structure
stability. Residues conserved across all proteins with similar 3D
structure are referred to as structure determinants.
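The two similarity measures used above can be made concrete with a minimal
sketch. It assumes that the two sequences are already aligned (gaps written
as "-") and that the Cα coordinates are already superposed; the function
names and the choice of denominator for percent identity (columns without
gaps) are illustrative assumptions, not part of the original method.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns containing a gap in either sequence are ignored.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def rmsd(coords_a, coords_b):
    """RMSD between paired C-alpha coordinates that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

The superposition step itself (finding the rotation and translation that
minimize the RMSD) is deliberately left out of this sketch.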

The concept of structure determinants may help explain various exceptions
to the statement that 30% sequence similarity results in structure
similarity. The exceptions occur in either direction: some sequences with
very low residue similarity have very similar structures (Chothia and
Lesk, 1986; Tian and Skolnick, 2003; Devos and Valencia, 2000; Espadaler
et al., 2005), while others, with very high
sequence similarity, have very different 3D structures (Alexander et al.,
2007). Assuming that just a few key residues play the decisive role in
structure formation, we can explain why very similar sequences are
structurally dissimilar by positing that they do not share structure
determinants. Conversely, even widely dissimilar sequences could fold into
similar structures if they share a set of structure determinants.


This investigation focuses on the relationship between amino acid
sequences and supersecondary structures (SSS) of beta sandwich-like
proteins. We analyze this relationship, rather than the one between
sequences and 3D structures with atomic coordinates, for two reasons: a)
the concept of SSS identity is much more rigorous than the
semi-quantitative notion of 3D structure similarity. By definition, beta
sandwich proteins have identical SSS if they have the same number of
strands in the two beta sheets and the same order (arrangement) of strands
in these beta sheets; b) a structural classification of SSS exists, which
is based on the arrangement of strands in the two main beta sheets (Chiang
et al., 2007). Proteins with identical SSS may differ markedly in the
number and composition of residues within strands, and in the length and
conformation of loops between strands. This makes it possible to compare
very different sequences and to uncover sequence regularities that are
unrelated to family similarity but common to a particular supersecondary
structure.
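The definition of SSS identity given above lends itself to a short sketch.
Here each SSS is written as two sheets, each sheet being the sequence of
strand numbers in their spatial order. Two conventions are assumptions of
this sketch rather than statements of the paper: a sheet read from either
edge is treated as the same arrangement, and the order in which the two
sheets are listed is ignored. The strand orders in the usage note are
hypothetical.

```python
def canonical_sheet(strands):
    """Return a sheet in a canonical reading direction.

    Assumption: (1, 2, 5) and (5, 2, 1) describe the same arrangement
    seen from opposite edges of the sheet.
    """
    forward = tuple(strands)
    return min(forward, forward[::-1])


def same_sss(sss_a, sss_b):
    """True if both descriptions have the same strands per sheet, in the same order."""
    canon_a = sorted(canonical_sheet(sheet) for sheet in sss_a)
    canon_b = sorted(canonical_sheet(sheet) for sheet in sss_b)
    return canon_a == canon_b
```

For example, `same_sss([(1, 2, 5), (4, 3, 6, 7)], [(7, 6, 3, 4), (5, 2, 1)])`
is true under these conventions, whereas a description with sheets
`(1, 2, 3)` and `(4, 5, 6, 7)` is a different SSS.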


This research evaluates the hypothesis that proteins with identical SSS,
regardless of the degree of sequence identity among them, which can be
as low as 10%, share a number of 'conserved positions' occupied either
exclusively by hydrophobic or exclusively by hydrophilic residues.
Residues at the conserved positions are called 'SSS determinants',
because they are expected to be critical to SSS formation.

Another hypothesis that was tested in this work is that of uniqueness of
SSS determinants. It was suggested that each SSS 'archetype' is described
by a distinctive set of SSS determinants that could not be found in
proteins with a different SSS. In other words, each set of determinants
is SSS-specific: it is a kind of a 'tag' that identifies all proteins
with a given SSS. As such, SSS determinants can be used for protein
classification.

Identification of SSS determinants involves multiple sequence alignment
of all proteins with the same SSS. The widely used alignment algorithms,
PSI-BLAST and HMM, are not applicable to sequences with very low sequence
similarity. Therefore, for comparison of sequences of proteins that share
the same SSS, a supersecondary structure-based multiple sequence alignment
algorithm was developed. The proposed method involves 'projecting' common
supersecondary structural features onto the sequence to reveal SSS
determinants.

An essential feature of this algorithm is that the sequences are first
divided into fragments that correspond to secondary structure elements -
strands and loops - which are then aligned individually. Another important
characteristic is that the alignment of residues in strands is centered
on residues that form hydrogen bond contacts between strands. Residues
that form hydrogen bonds serve as the 'nucleus' of the alignment. This
stratagem - alignment of residues that play a role in supersecondary
structure formation - allows one to compare sequences with very low
similarity and variable lengths, which would be impossible with
traditional alignment techniques.

2 MATERIALS AND METHODS




2.1 Study material


Sandwich-like proteins comprise a large group with great sequence diversity
and a variety of biological functions, but all are composed of two beta
sheets packed face-to-face. Recently, a classification of the SSS of
sandwich proteins was developed, which serves as an organizing framework
for the new 'SSS Database' (http://binfs.umdnj.edu/sssdb) (Chiang et al.,
2007). Proteins in the database are grouped in accordance with their SSS.
Classification is based solely on identification of corresponding strands
and the knowledge of inter-strand hydrogen bonds, which determine the
arrangement of strands within a domain. The supersecondary structural
classification does not take into account sequence similarity, and
proteins with the same SSS can have very different amino acid sequences,
with less than 10% residue identity, and may belong to different families
and superfamilies in the SCOP and CATH databases (Murzin et al., 1995;
Orengo et al., 1997).

For multiple sequence alignment, a representative set of protein
sequences was chosen according to the SCOP database classification. The
lowest classification unit in the SCOP database may be called a
'cluster-species'. Usually, all proteins from the same SCOP cluster have
the same number and arrangement of strands in a protein domain, i.e. they
have an identical SSS motif. For the supersecondary structure
classification in the SSS database, all SCOP clusters were classified
according to their SSS motifs. In this research, for an alignment of
proteins with the same SSS motif, one or two proteins were selected at
random from each cluster with the given SSS motif. The goal of this paper
is to uncover and describe conserved sequence characteristics, the SSS
determinants, for proteins of each of the three SSS motifs shown in
Fig. 1a, b, c.

2.2 Algorithm for the alignment of proteins with same supersecondary
structure

The goal of sequence alignment is to maximize the number of conserved
positions that are occupied by identical or similar residues in all
aligned sequences. To achieve this goal one sometimes needs to allow for
'gaps' within sequences so that similar amino acids can be assigned to
the same position despite having different sequential indices within the
sequence.

The search for conserved positions entails comparing amino acids at
aligned positions with respect to their chemical and physical properties
as well as their structural role. The most popular heuristic methods
of sequence alignment, such as PSI-BLAST and HMM, use a dynamic
programming approach to examine numerous variants of alignments and to
estimate the number of conserved positions (Altschul et al., 1997; Durbin
et al., 1999). However, when it comes to widely non-homologous sequences,
standard methods are ineffective in uncovering conserved positions. For
proteins of low sequence similarity, structure-based sequence alignment
can be applied instead (Konagurthu et al., 2006; Yang and Honig, 2000; Kim
and Lee, 2007). The advantage of using structural data for purposes of
alignment is that structure is less susceptible to change than sequence
during evolution. On the other hand, comparison of structures is more
difficult than comparison of sequences because the criteria for assessing
structure similarity are not well defined (Ye and Godzik, 2005).


To test our hypothesis that proteins with identical SSS but with widely
dissimilar sequences share common sequence characteristics, a new
algorithm of sequence alignment was introduced. The essential aspect of
this algorithm is that the alignment procedure is performed separately
for residues in loops and strands.

a) Two rules for alignment of residues in strands. Consider structures A
and B shown in Fig. 2a. In the first structure, residues a and a' in
strands I and II, respectively, form a hydrogen bond between the main
chain atoms, and in the second structure residues b and b', in strands I
and II, respectively, are connected by a hydrogen bond.
Rule 1. If residues a and b are assigned the same position index as the
result of alignment, then residues a' and b' should also be aligned to
each other.
Rule 2. No gaps are allowed in the alignment of residues in strands.

Consider, for example, alignment of residues in strands I, II and III in
structures A and B shown in Fig. 2a. Let us pick a residue in structure A
that forms an inter-strand hydrogen bond with another residue, such as
residue a1 in strand I, which shares a hydrogen bond with residue a'1.
Suppose that residue a1 is aligned with residue b1 in strand I of
structure B. It then follows from Rule 1 that residues a'1 and b'1 are
aligned with each other. Rule 2 states that no gaps are allowed within
the strands, so downstream residues a2 and a3 in strand I of structure A
must be aligned with residues b2 and b3 in strand I of structure B.
Likewise, residues a5 and a'3 in strand II of structure A must be
aligned with residues b8 and b'3 in structure B. Invoking Rules 1 and 2
in this manner allows one to align all residues in strands II and III,
as illustrated in fig. 3a. Thus, the initial alignment of a pair
of H-bond-forming residues in different structures, in combination with
Rules 1 and 2, leads to an unambiguous alignment of all residues within a
beta sheet.
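The way Rules 1 and 2 propagate from one aligned pair of residues to a
whole sheet can be sketched as follows. This is a minimal illustration
under stated assumptions, not the implementation used in this work:
residues are addressed as (strand, index) pairs, hydrogen bonds are given
as pairs of such addresses, strands have no bulges, and all function names
and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def bond_map(hbonds):
    """Index inter-strand hydrogen bonds by residue, in both directions."""
    partners = defaultdict(list)
    for (s, i), (t, j) in hbonds:
        partners[(s, i)].append((t, j))
        partners[(t, j)].append((s, i))
    return partners


def strand_offsets(hbonds_a, hbonds_b, strand, idx_a, idx_b):
    """Return {strand: offset}: residue i of A pairs with residue i + offset of B.

    Rule 2 (no gaps) makes one offset per strand sufficient; Rule 1 carries
    the alignment across each hydrogen bond to the neighbouring strand.
    """
    map_a, map_b = bond_map(hbonds_a), bond_map(hbonds_b)
    offsets = {strand: idx_b - idx_a}
    queue = deque([strand])
    while queue:
        s = queue.popleft()
        d = offsets[s]
        for (s_a, i), partners_a in map_a.items():
            if s_a != s:
                continue
            for t, j in partners_a:
                if t in offsets:
                    continue
                # Residue (s, i) of A is aligned with (s, i + d) of B; if that
                # residue is bonded to strand t as well, Rule 1 fixes strand t.
                for t_b, j_b in map_b.get((s, i + d), []):
                    if t_b == t:
                        offsets[t] = j_b - j
                        queue.append(t)
                        break
    return offsets


def aligned_pairs(strands_a, strands_b, offsets):
    """List the residue pairs implied by the per-strand offsets."""
    pairs = []
    for s, d in sorted(offsets.items()):
        for i, res in enumerate(strands_a[s]):
            if 0 <= i + d < len(strands_b[s]):
                pairs.append((res, strands_b[s][i + d]))
    return pairs


# Toy data (hypothetical): two strands per structure, one hydrogen bond each.
strands_a = {1: "VKV", 2: "TLS"}
strands_b = {1: "AIRV", 2: "EYTF"}
hbonds_a = [((1, 0), (2, 2))]
hbonds_b = [((1, 1), (2, 2))]
offsets = strand_offsets(hbonds_a, hbonds_b, strand=1, idx_a=0, idx_b=1)
pairs = aligned_pairs(strands_a, strands_b, offsets)
```

In the toy data, aligning residue 0 of strand 1 in A with residue 1 of
strand 1 in B fixes the offset of strand 1, and the hydrogen bonds then fix
the offset of strand 2 without any further choice.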

It should be noted that the above-described procedure of sequence
alignment works well only for hydrogen bond networks that define a beta
sheet whose strands do not have bulges, such as the one shown in fig. 2a.
However, if inter-strand hydrogen bonds cause a bulge in a strand (see
strand 3 in structure B, fig. 2b), then the strand needs to be divided
into two pieces in order to preserve the 'no gap within each strand'
rule. Residues of each strand piece are aligned independently. For
example, strand 3 in structure B is divided into two parts: residues
[b'7, b10, b'8] in one, and residues [b11, b12, b13] in the other.
It is clear that if strand 3 were not broken up into two, there would
have been a gap within this strand as a result of the alignment of
residues from structures A and B.

Several variants of alignment are possible, depending on the initial
choice of the pair of H-bonded residues. The variant that maximizes the
number of conserved positions is the one to be preferred. Let us consider
an alternative variant of aligning structures A and B, in which residue
a1 is matched with residue b3. This initial choice of matched residue
pair leads to the alignment presented in fig. 3b. The number of possible
variants is, however, quite limited, since there are only a few initial
choices of residue pairs around which all other residues are aligned.
Usually strands are connected by 2-4 hydrogen bonds; consequently there
are only 2-4 variants of alignment of residues in a beta sheet.
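The enumeration of variants can be illustrated for a single pair of strands.
The sketch below keeps one H-bonded residue of structure A fixed, tries each
H-bonded residue of structure B as its partner, builds the gapless alignment
for each choice, and keeps the variant with the most columns in which both
residues belong to the same group. The residue groups are those used in the
Results; treating A and Y as matching neither group, the function names and
the example strands are assumptions made only for illustration.

```python
HYDROPHOBIC = frozenset("VILMFWC")
HYDROPHILIC = frozenset("QNEDRKHTSGP")


def same_class(a, b):
    """True if both residues are hydrophobic or both are hydrophilic."""
    return ((a in HYDROPHOBIC and b in HYDROPHOBIC)
            or (a in HYDROPHILIC and b in HYDROPHILIC))


def gapless_columns(strand_a, strand_b, anchor_a, anchor_b):
    """Pair residues at equal distance from the two anchor residues (no gaps)."""
    first = -min(anchor_a, anchor_b)
    last = min(len(strand_a) - anchor_a, len(strand_b) - anchor_b)
    return [(strand_a[anchor_a + k], strand_b[anchor_b + k])
            for k in range(first, last)]


def best_variant(strand_a, strand_b, anchor_a, candidates_b):
    """Return (score, anchor_b) for the variant with most same-class columns."""
    best = None
    for anchor_b in candidates_b:
        columns = gapless_columns(strand_a, strand_b, anchor_a, anchor_b)
        score = sum(1 for a, b in columns if same_class(a, b))
        if best is None or score > best[0]:
            best = (score, anchor_b)
    return best
```

With the hypothetical strands `"VKVTL"` and `"SIRVEF"`, anchor index 1 in the
first and candidate indices 0 and 2 in the second, the second candidate
gives five same-class columns against four for the first.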

b) Alignment of residues in the loops. The multiple sequence alignment is
performed independently for the sequence fragments of each loop. Thus, all
sequence fragments that correspond to the loops between strands 1 and 2
are aligned among themselves, likewise for the loops between strands 2 and
3, and so forth. Because the conformation of loops may vary greatly among
proteins, no structural data are used in loop alignment.

2.3 Selection of conserved positions.

The optimal sequence alignment is the one with the maximum number of
conserved positions over the whole beta sandwich sequence. It combines the
best variants of alignment of residues in each of the two beta sheets and
the best local alignments for each loop. If the numbers of conserved
positions are identical or very close in several variants, priority
is given to the variant in which one or several of the conserved
positions are occupied by a single residue or by very few similar
residues. Another important parameter that is taken into consideration in
selecting the best alignment is the RMSD of the Cα atoms of
aligned residues in the strands.
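The selection criteria above can be written down as a small ranking
procedure. The text does not state how 'very close' is quantified or how
the RMSD is weighed against the other criteria, so the tolerance of one
position and the use of RMSD as the last tie-breaker are assumptions of
this sketch; the class, its field names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    conserved: int   # number of conserved positions in this variant
    narrow: int      # conserved positions held by one or very few residues
    rmsd: float      # RMSD of C-alpha atoms of aligned strand residues


def select_variant(variants, tolerance=1):
    """Pick the preferred alignment variant.

    1. Keep variants whose conserved count is within `tolerance` of the best.
    2. Among those, prefer more 'narrow' conserved positions.
    3. Break remaining ties by lower RMSD (assumed ordering).
    """
    top = max(v.conserved for v in variants)
    near_best = [v for v in variants if top - v.conserved <= tolerance]
    return min(near_best, key=lambda v: (-v.narrow, v.rmsd))


# Hypothetical numbers, for illustration only.
candidates = [
    Variant("a", conserved=27, narrow=3, rmsd=1.8),
    Variant("b", conserved=26, narrow=5, rmsd=2.0),
    Variant("c", conserved=20, narrow=9, rmsd=0.5),
]
chosen = select_variant(candidates)
```

Here variant "c" is excluded because it has far fewer conserved positions,
and "b" is preferred over "a" because more of its conserved positions are
occupied by a single residue or a few similar residues.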

In sequence alignments the conserved positions are occupied by chemically
similar residues with similar structural properties. The precise
definition of what constitutes 'residue similarity' for purposes of
alignment can vary: in some instances conserved positions are occupied
only by identical residues, in others only by charged residues, and in
others only by hydrophilic or only by hydrophobic residues. Selection
of the most appropriate definition of residue similarity in this research
was guided by the following considerations. Sequences of proteins with the
same SSS are so diverse that it is exceptionally rare to find conserved
positions occupied by a single residue in all sequences. Many
investigations have demonstrated the critical importance of the
distribution of hydrophobic and hydrophilic amino acids in defining
secondary structures (Hennetin et al., 2003; Xiong et al., 1995; Eudes et
al., 2007; Mandel-Gutfreund and Gregoret, 2002). Generalizing the latter
idea to the level of SSS, hydrophobicity and hydrophilicity of residues
were chosen as the criterion for selection of conserved positions in the
alignment procedure.

3 RESULTS

As the result of the multiple sequence alignment, positions were classified
as conserved hydrophobic or hydrophilic if all residues at a given
position, with one exception allowed, belong to either the hydrophobic (V,
I, L, M, F, W, and C) or the hydrophilic (Q, N, E, D, R, K, H, T, S, G and
P) group of residues. Two residues, A and Y, according to our observation
of the extent to which they are conserved in sandwich proteins, have a
roughly equal chance of being either in hydrophobic conserved positions in
strands or in hydrophilic conserved positions in loops. The residues at all
conserved positions constitute the defining sequence pattern of the SSS. To
test the specificity of the pattern, the EMBOSS/Preg program (Rice et al.,
2000) was applied to 50,577 sequences of proteins found in the SCOP
database.
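The classification rule for a single alignment position can be sketched as
follows. The two residue groups and the allowance of one exception are taken
from the text; treating A and Y as compatible with either group (so that
they never count as the exception) and counting any other symbol, including
a gap, as an exception are assumptions of this sketch, as are the function
names.

```python
HYDROPHOBIC = frozenset("VILMFWC")
HYDROPHILIC = frozenset("QNEDRKHTSGP")
AMBIGUOUS = frozenset("AY")  # assumption: neutral, compatible with either group


def classify_column(column, max_exceptions=1):
    """Label one alignment column 'hydrophobic', 'hydrophilic' or 'variable'."""
    considered = [r for r in column if r not in AMBIGUOUS]
    n_phobic = sum(1 for r in considered if r in HYDROPHOBIC)
    n_philic = sum(1 for r in considered if r in HYDROPHILIC)
    if n_phobic >= n_philic:
        label, hits = "hydrophobic", n_phobic
    else:
        label, hits = "hydrophilic", n_philic
    exceptions = len(considered) - hits
    # The majority group must dominate and leave at most max_exceptions residues.
    if hits > exceptions and exceptions <= max_exceptions:
        return label
    return "variable"


def conserved_positions(aligned_sequences):
    """Return [(column index, label)] for all conserved columns of an alignment."""
    result = []
    for index, column in enumerate(zip(*aligned_sequences)):
        label = classify_column(column)
        if label != "variable":
            result.append((index, label))
    return result
```

For instance, the column `VILKFMV` is still labelled hydrophobic (one
exception), whereas `VIKKFMV` is variable.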

The pattern for SSS motif 2E (fig. 2 a). According to the SSS database, 418
proteins belong to this SSS motif (designated as the 2E motif). The SSS
motif 2E describes proteins of 2 folds, 4 superfamilies, 4 families and 6
clusters. Results of the alignment of 7 representative sequences are
presented in Table 1 (see Supplementary Material). In total, 27 conserved
positions were found: 14 hydrophilic and 13 hydrophobic. Residues at these
positions form the defining pattern of motif 2E (Fig. 3). The search for
protein sequences with this pattern picked up 340 'true positives' (out of
a total of 418 contained within the 50,577 sequences). Allowing a mismatch
at one position in the pattern gives an additional 75 true positive
sequences (415 out of 418). The remaining 3 proteins with motif 2E (false
negatives) that were not picked up in the search have 2 or 3 mismatching
positions. The search for the pattern without mismatches revealed 8
proteins that are not described by the 2E motif (false positives). All
these proteins are sandwich-like proteins with 7 strands, but have
different SSS motifs.

The pattern for SSS motif 2U (fig. 2 b). According to the SSS database, 15
proteins belong to this SSS motif (2U). The SSS motif 2U contains proteins
of 2 folds, 4 superfamilies, 4 families and 6 clusters. Results of the
alignment of 6 representative sequences are presented in Table 2 (see
Supplementary Material). Residues at 15 hydrophobic and 16 hydrophilic
conserved positions form the pattern for motif 2U (see the sequence pattern
of this SSS motif in Fig. 3). Although proteins described by this motif
belong to different superfamilies, several of the conserved positions are
occupied by just a few similar residues. The search for sequences
containing the pattern characteristic of motif 2U uncovered all 15 proteins
of this motif and no sequences with other motifs (100% specificity and
sensitivity).

The pattern for motif 3D (fig. 2 c). In the SSS database, 48 proteins belong
to the 3D motif, which encompasses proteins from 3 superfamilies, 4
families and 12 clusters. Proteins from different superfamilies have very
variable sequences with less than 10% residue identity. Results of the
alignment of representative sequences are presented in Table 3 (see
Supplementary Material). Residues at 14 hydrophobic and 21 hydrophilic
conserved positions form the pattern for this motif. The search for protein
sequences with this pattern disclosed 17 true positives and 1 false
positive. Allowing a mismatch at any one position in the pattern picked up
an additional 28 true positives (for a total of 45 out of 48 proteins with
this motif) and 26 false positive sequences. The remaining 3 proteins with
motif 3D had 2 mismatching positions.
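The search results reported above can be restated as sensitivities, that
is, the fraction of proteins with a given motif that the pattern retrieves.
The short calculation below uses only the counts given in the text; the
function name is illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of proteins with the motif that the pattern search retrieves."""
    return true_positives / (true_positives + false_negatives)


# Counts taken from the text: (found, missed) for each search.
searches = {
    "2E, exact match": (340, 418 - 340),
    "2E, one mismatch allowed": (415, 418 - 415),
    "2U, exact match": (15, 0),
    "3D, exact match": (17, 48 - 17),
    "3D, one mismatch allowed": (45, 48 - 45),
}

for label, (found, missed) in searches.items():
    print(f"{label}: {100 * sensitivity(found, missed):.1f}%")
```

For motif 2E the exact pattern thus retrieves about 81% of the proteins, and
allowing one mismatch raises this to about 99%; for motif 3D the
corresponding values are about 35% and 94%.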

4 DISCUSSION

The existence of 'motif patterns' that are characteristic of, and specific
to, a very diverse group of sequences describing the same SSS supports the
idea that a number of key residues play the decisive role in SSS formation.

Residues in a protein domain can be conceptually divided into two groups:
a small select set of SSS determinants - about 25-35% of all residues -
that are critically responsible for the design (arrangement of strands) of
an SSS; and a larger group of remaining residues, with 'supporting roles'
in structure formation.

Substitution (mutation) of SSS determinants is generally limited to
residues that belong to the same group - either hydrophilic or hydrophobic
- as the residue being substituted. We typically observed no more than one
or two mutations at the conserved positions of a protein that were not
'same-type' mutations (Tables 1, 2 and 3, see Supplementary Material). By
contrast, mutations of the 'supporting' residues are much more
variable, and exchanges of a hydrophobic for a hydrophilic amino acid and
vice versa are common.

Sequences can be analyzed with respect to their structural determinants and
support residues. Four scenarios are possible: 1) Both SSS determinants and
supporting residues are similar across all analyzed proteins. These
proteins are relatively similar to each other on the sequence level, and
have similar SSS and most likely similar 3D structures. 2) SSS determinants
are similar, while among supporting residues there is a large degree of
variability. This results in low total sequence similarity. Most likely,
these proteins have identical SSS and many variations in 3D
structure. Proteins of this kind were studied in this work; they have
very diverse sequences (high variability among support residues). 3) Large
variability is observed both among the SSS determinants and among the
supporting residues. These proteins have very low total sequence similarity
and most likely belong to different SSS motifs. 4) Large
variability among SSS determinants, but a high degree of similarity among
most of the residues in the support group: here the total sequence
similarity of the proteins is high. The proteins likely belong to different
folds, since they do not share SSS determinants. As an example, consider
two very different tertiary structures: a 3-α helix fold and an α/β fold
which share 88% sequence identity (Alexander et al., 2007). This example
illustrates the idea that a fold can be encoded by only 12% of the amino
acids (7 SSS determinants) and led the authors to conclude that 49 residues
in these proteins ('a support group') "provide a relatively neutral
sequence background".

Our work suggests that the key point of sequence/structure relationship
analysis is to reveal the conserved positions, whose residues serve as SSS
determinants. In order to overcome the problem of aligning residues in very
dissimilar sequences, we employed a so-called "blind" residue alignment
procedure in the first step of alignment. In the blind alignment the
chemical properties of the residues to be aligned are not taken into
consideration; only their structural role - their participation in H-bond
formation - counts. Once hydrogen bond-forming residues in the sequences are
aligned to each other, alignment of all other residues follows. Only then,
at the next step, the selection of the optimal variant of the alignment, is
the residue content at each position analyzed to determine the conserved
positions.

A notable feature of the alignment method proposed in our work is that
alignment of just one pair of H-bonded residues automatically generates
the alignment of all other residues that make up the beta sheet. The choice
of which pair of residues should serve as the linchpin of the alignment is
made retroactively, by comparing the variants of alignment with respect to
how many conserved positions each one results in. The choice of the initial
pair can sometimes be guided by commonsense considerations, for example
that 'structurally significant' residues such as Cys or Trp should probably
be aligned in all sequences. These residues suggest a particular alignment
variant that can later be tested against others.

Another advantage of this alignment procedure is that it is based mostly on
hydrogen bond contacts (fig. 2a). In fact, this alignment can be considered
a kind of superposition of rigid bodies - the networks of hydrogen bonds
between residues - which sharply decreases the number of possible variants
of alignment.


ACKNOWLEDGEMENT

This work was partially supported by a UMDNJ research grant. I thank Dr.
I. Gelfand for very useful discussions, Drs. M. Shibata and V. Sobolev
for critical comments, and Mrs. M. Goldman for continuous encouragement of
the research project.

References

Alexander,P.A., He,Y., Chen,Y., Orban,J., Bryan,P.N. The design and
characterization of two proteins with 88% sequence identity but different
structure and function. PNAS (2007) 104: 11963-11968

Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z.,
Miller,W., Lipman,D.J. Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Research (1997) 25:
3389-3402

Anfinsen, C.B. "Principles that Govern the Folding of Protein Chains"
Science, (1973) 181, : 223-230.

Bowie JU, Luthy R, Eisenberg D. "A method to identify protein sequences
that fold into a known three-dimensional structure". Science (1991) 253, :
164-170.

Carugo,O. Recent progress in measuring structural similarity between
proteins. Curr Protein Pept Sci.(2007), 8(3), : 219-241.


Chakrabarti S, Lanczycki CJ, Panchenko AR, Przytycka TM, Thiessen PA,
Bryant SH State of the art: refinement of multiple sequence alignments. BMC
Bioinformatics. (2006) 7: 499-508.


Chiang,Y-S., Gelfand,T.I., Kister,A.E., Gelfand,I.M. New classification of
supersecondary structures of sandwich-like proteins uncovers strict
patterns of strand assemblage Proteins (2007), 68, : 915-921

Chothia C, Lesk A.M. The relation between the divergence of sequence and
structure in proteins. EMBO J. (1986) 5(4):823-826


Dalton,J.A.R., Jackson,R.M. An evaluation of automated homology modelling
methods at low target template sequence similarity. Bioinformatics (2007)
23: 1901-1908.


Devos D, Valencia A Practical limits of function prediction. Proteins
(2000) 41(1): 98-107

Durbin,R., Eddy,S.R., Krogh,A., Mitchison,G. Biological Sequence
Analysis: Probabilistic Models of Proteins and Nucleic Acids. (1999)
Cambridge University Press.

Edgar,R.C., Batzoglou,S. Multiple sequence alignment. Current Opinion in
Structural Biology (2006) 16: 368-373

Espadaler,J., Aragüés,R., Eswar,N., Marti-Renom,M.A., Querol,E.,
Avilés,F.X., Sali,A., Oliva,B. Detecting remotely related proteins by their
interactions and sequence similarity. Proc Natl Acad Sci U S A. (2005)
102(20): 7151-7156


Eudes R, Le Tuan K, Delettré J, Mornon JP, Callebaut I. A generalized
analysis of hydrophobic and loop clusters within globular protein
sequences. BMC Struct Biol. (2007) 7: 2-24

Ginalski,K. Comparative modeling for protein structure prediction. Curr
Opin Struct Biol. (2006) 16(2): 172-177.

Hennetin J, Le Tuan K, Canard L, Colloc'h N, Mornon JP, Callebaut I. Non-
intertwined binary patterns of hydrophobic/non hydrophobic amino acids are
considerably better markers of regular secondary structures than
nonconstrained patterns. Proteins (2003), 51: 236-44.

Jewett AI, Huang CC, Ferrin TE MINRMS: an efficient algorithm for
determining protein structure similarity using root-mean-squared-distance.
Bioinformatics. (2003) 19(5),: 625-634

Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic
methods (2007) BMC Bioinformatics, 8:355-372


Koehl, P."Protein structure similarities" Curr. Opin. Struct. Biol.
(2001), 11, 348-353.


Konagurthu A.S., Whisstock J. C., Stuckey P.J. and Lesk AM MUSTANG: A
Multiple Structural Alignment Algorithm PROTEINS (2006), 64,:559-574

Kopp J., Schwede T. Automated protein structure homology modeling: A
progress report. Pharmacogenomics (2004) 5: 405-416


Mandel-Gutfreund Y, Gregoret LM On the significance of alternating patterns
of polar and non-polar residues in beta-strands J Mol Biol. (2002), 323,:
453-461

Misura,K.M.S., Chivian,D., Rohl,C.A., Kim,D.E., Baker,D.
Physically realistic homology models built with ROSETTA can be more
accurate than their templates. PNAS (2006) 103(14): 5361-5366.

Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a
structural classification of proteins database for the investigation of
sequences and structures. J. Mol. Biol. 247, 536-540.

Nayeem A, Sitkoff D, Krystek S Jr "A comparative study of available
software for high-accuracy homology modeling: From sequence alignments to
structural models". Protein Sci (2006) 15: 808-824

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B.,
Thornton,J.M. CATH - A Hierarchic Classification of Protein Domain
Structures. Structure (1997) 5: 1093-1108.

Rice,P. Longden,I. and Bleasby,A. EMBOSS: The European Molecular Biology
Open Software Suite Trends in Genetics (2000) 16, : 276-277

Tian,W., Skolnick,J. How Well is Enzyme Function Conserved as a Function of
Pairwise Sequence Identity? J. Molec. Biol. (2003): 863-882

Tramontano A., Morea V. Assessment of homology-based predictions in CASP5
Proteins (2003) 53: 352-368.

Vlahovicek,K., Carugo,O., Pongor,S The PRIDE server for protein three-
dimensional similarity J. Appl. Cryst. (2002). 35,: 648-649



Wallner B., Elofsson, A. All are not equal: A benchmark of different
homology modeling programs Protein Science (2005), 14,:1315-1327

Xiang, Z., Advances in homology protein structure modeling. Curr. Protein
Pept Sci. (2006); 7(3),: 217-27.

Xiong H, Buckwalter BL, Shieh H-M, Hecht MH. Periodicity of polar and non
polar amino acids is the major determinant of secondary structure in self-
assembling oligomeric peptides. Proc. Natl Acad Sci USA (1995), 92: 6349-
6353

Yang A.S., Honig B. An integrated approach to the analysis and modeling of
protein sequences and structures. III. A comparative study of sequence
conservation in protein structural families using multiple structural
alignments. J Mol Biol.(2000), 301,: 691-711

Ye,Y., Godzik,A. Multiple flexible structure alignment using partial order
graphs Bioinformatics (2005) 21,: 2362-2369

Legend

Figure 1. Schematic presentation of the arrangement of strands in SSS.
The numbers represent the strands that make up sheets I and II. Hydrogen
bonds are indicated with lines between strands. Structural and sequence
characteristics, and SSS classification are presented in the SSS database.
SSS describes proteins of a) 1 fold, 3 superfamilies, 4 families and 11
domains - SSS motif 2E; b) 4 superfamilies, 4 families and 6 domains - SSS
motif 3D; c) 2 folds, 4 superfamilies, 4 families and 6 domains - SSS motif
2U.

Fig. 2. The beta sheets with 3 anti-parallel strands in structure I and II.
The strands are shown schematically by arrows. The hydrogen bonds between
residues are shown by dotted lines. a) "Regular" beta sheet, with
standard H-bond contacts between residues in strands; b) strands with a
bulge (strand 3) may create "non-regular" H-bond contacts: one residue
(b13) forms hydrogen bonds with two residues (b'1 and b9).

Fig. 3. Sequence alignments based on hydrogen bond contacts. The positions
that are occupied by matching residues are numbered. The alignment in
variant a) corresponds to 11 pairs of corresponding residues and 10
mismatching positions (marked by "-"); variant b) corresponds to 13 pairs
of corresponding residues and 6 mismatching positions.

Table 1. The hydrophobic/hydrophilic sequence patterns for proteins of the
motifs 2E, 2U and 3D.
Patterns are shown in PROSITE format. Residues that occupy the conserved
positions are shown in brackets. The variable positions are marked as x.
The expression {d - r} x shows that the distance (number of residues)
between two consecutive conserved positions varies between d and r.
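A pattern of the kind described in this legend can be turned into a regular
expression for scanning sequences. The sketch below is only an
illustration of that conversion with Python's re module: the list
representation of the pattern, the function name and the example pattern
are assumptions, and it is not the EMBOSS/Preg search used in this work. It
handles exact matches only; a search that tolerates one mismatching
position would need additional logic.

```python
import re


def pattern_to_regex(elements):
    """Compile a pattern given as a list of elements.

    Each element is either a string of residues allowed at a conserved
    position, e.g. "VILMFWC", or a tuple (d, r) meaning that between d and
    r variable residues separate two consecutive conserved positions.
    """
    parts = []
    for element in elements:
        if isinstance(element, tuple):
            d, r = element
            parts.append("[A-Z]{%d,%d}" % (d, r))
        else:
            parts.append("[%s]" % element)
    return re.compile("".join(parts))


# Hypothetical pattern: hydrophobic, 2-4 variable residues, hydrophilic,
# exactly 1 variable residue, hydrophobic.
example = pattern_to_regex(
    ["VILMFWC", (2, 4), "QNEDRKHTSGP", (1, 1), "VILMFWC"]
)
match = example.search("MKVAAGDSLTT")
```

In this toy sequence the pattern matches the fragment `VAAGDSL` starting at
index 2.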