This document was taken from a search engine cache. Address of the original document: http://kodomo.cmm.msu.su/FBB/year_04/doc/term2/help_pattern.doc
Last modified: Fri Apr 22 13:43:19 2005
Indexed: Tue Oct 2 00:22:38 2012

Pattern syntax rules

Pattern syntax used in the PROSITE database:

1. The standard IUPAC one-letter codes for the amino acids are used.
2. The symbol `x' is used for a position where any amino acid is
accepted.
3. Ambiguities are indicated by listing the acceptable amino acids for a
given position, between square brackets `[ ]'. For example: [ALT]
stands for Ala or Leu or Thr.
4. Ambiguities are also indicated by listing between a pair of curly
brackets `{ }' the amino acids that are not accepted at a given
position. For example: {AM} stands for any amino acid except Ala and
Met.
5. Each element in a pattern is separated from its neighbor by a `-'.
6. Repetition of an element of the pattern can be indicated by following
that element with a numerical value or, if it is a gap ('x'), by a
numerical range between parentheses.
Examples:
x(3) corresponds to x-x-x
x(2,4) corresponds to x-x or x-x-x or x-x-x-x
A(3) corresponds to A-A-A
Note: You can only use a range with 'x', i.e. A(2,4) is not a valid
pattern element.
7. When a pattern is restricted to either the N- or C-terminal end of a
sequence, it starts with a `<' symbol or ends with a `>' symbol,
respectively. In some rare cases (e.g. PS00267 or PS00539), '>' can also
occur inside square brackets for the C-terminal element:
'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or
'F-[GSTV]-P-R-L>' is accepted.
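
The rules above map fairly directly onto regular expressions. The
following Python code is a minimal sketch of that mapping, not the PROSITE
or ScanProsite implementation. The names ELEMENT, element_to_regex and
prosite_to_regex are hypothetical, and the sketch assumes well-formed,
upper-case patterns and uses '.' to stand for "any amino acid".

```python
import re

# One pattern element: x, a residue, [...] or {...},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z>]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def element_to_regex(element: str) -> str:
    """Translate a single PROSITE pattern element into a regex fragment."""
    match = ELEMENT.fullmatch(element)
    if match is None:
        raise ValueError(f"invalid pattern element: {element!r}")
    body, low, high = match.groups()
    if high is not None and body != "x":
        # Rule 6: a range such as A(2,4) is not a valid element.
        raise ValueError(f"a range is only valid with 'x': {element!r}")

    if body == "x":                       # rule 2: any amino acid
        atom = "."
    elif body.startswith("["):            # rule 3: accepted residues
        residues = body[1:-1].replace(">", "")
        options = [f"[{residues}]"] if residues else []
        if ">" in body:                   # rule 7: '>' inside brackets
            options.append("$")
        atom = options[0] if len(options) == 1 else "(?:" + "|".join(options) + ")"
    elif body.startswith("{"):            # rule 4: excluded residues
        atom = f"[^{body[1:-1]}]"
    else:                                 # rule 1: a single residue
        atom = body

    if low is None:
        return atom
    if high is None:
        return f"{atom}{{{low}}}"         # e.g. x(3) becomes .{3}
    return f"{atom}{{{low},{high}}}"      # e.g. x(2,4) becomes .{2,4}


def prosite_to_regex(pattern: str) -> str:
    """Translate a whole pattern; '<' and '>' become the anchors ^ and $."""
    pattern = pattern.strip()
    at_start = pattern.startswith("<")
    at_end = pattern.endswith(">")
    core = pattern[int(at_start):len(pattern) - int(at_end)]
    parts = [element_to_regex(element) for element in core.split("-")]  # rule 5
    return ("^" if at_start else "") + "".join(parts) + ("$" if at_end else "")
```

With these assumptions, re.search(prosite_to_regex("x(2,4)-A(3)"), sequence)
looks for two to four arbitrary residues followed by A-A-A, while
prosite_to_regex("A(2,4)") raises a ValueError.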

Extended syntax allowed in the ScanProsite tool:

1. If your pattern consists of one-letter amino acid codes only, without
any ambiguous residues, you need not specify the '-', i.e. you can
directly copy/paste peptide sequences into the text field.
Example: M-A-S-K-E can be written as MASKE.
2. To search for all sequences that do not contain a certain amino acid,
e.g. Cys, you can use <{C}*>.
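
The two extensions can be illustrated with a short Python sketch. It is
only an illustration under stated assumptions: expand_peptide and NO_CYS
are hypothetical names, the set of accepted letters is taken to be the 20
standard amino acids, and `*' is read as "zero or more repetitions", so
that <{C}*> corresponds to the regular expression ^[^C]*$.

```python
import re

# Assumption: only the 20 standard one-letter amino acid codes are accepted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_peptide(text: str) -> str:
    """Insert '-' between residues of a plain peptide; leave other input unchanged."""
    text = text.strip()
    if re.fullmatch(f"[{AMINO_ACIDS}]+", text):
        return "-".join(text)
    return text


# Assumed regex equivalent of <{C}*>: the whole sequence, from its
# N-terminal to its C-terminal end, consists of residues other than Cys.
NO_CYS = re.compile(r"^[^C]*$")
```

For example, expand_peptide("MASKE") returns "M-A-S-K-E", and
NO_CYS.search("MASKE") finds a match whereas NO_CYS.search("MACKE") does
not.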

Examples:

[AC]-x-V-x(4)-{ED}

This pattern is translated as:
[Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

<A-x-[ST](2)-x(0,1)-V

This pattern, which must be at the N-terminal end of the sequence (`<'),
is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val

<{C}*>

This pattern describes all sequences that do not contain any cysteine
residues.

IIRIFHLRNI

This pattern describes all sequences which contain the subsequence
'IIRIFHLRNI'.
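
To try the four example patterns on actual sequences, they can be written
by hand as regular expressions. The Python sketch below does this; the
mapping in EXAMPLES and the helper find_matches are assumptions made for
illustration, and the test sequences in the usage note are made up.

```python
import re

# Hand-written regex equivalents of the four example patterns (assumed mapping).
EXAMPLES = {
    "[AC]-x-V-x(4)-{ED}": r"[AC].V.{4}[^ED]",
    "<A-x-[ST](2)-x(0,1)-V": r"^A.[ST]{2}.{0,1}V",
    "<{C}*>": r"^[^C]*$",
    "IIRIFHLRNI": r"IIRIFHLRNI",
}


def find_matches(pattern: str, sequence: str) -> list:
    """Return (start, end, matched_text) for each non-overlapping match.

    Positions are 0-based and 'end' is exclusive, as in Python slicing.
    """
    regex = re.compile(EXAMPLES[pattern])
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(sequence)]
```

For instance, find_matches("IIRIFHLRNI", "MKIIRIFHLRNIQ") returns
[(2, 12, "IIRIFHLRNI")], and find_matches("<A-x-[ST](2)-x(0,1)-V", "MAKSTV")
returns an empty list because the match would not start at the N-terminal
end.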