allpy
view allpy/base.py @ 822:d87129162eb4
Implemented & tested new markup API. See #95
1) Sequences, Alignments and Blocks now have two new methods:
- add_markup(name, markup_class=optional, **kwargs=optional)
- remove_markup(name)
name refers to the same name as in aln.markups[name] or sequence[i].name
Creating markups any other way is now explicitly forbidden.
2) Markups now have a `remove()` method that means 'release all memory that would
not be released otherwise if we just removed the markup from the dictionary'. For
sequence markups it removes the markup attribute from each monomer.
3) Added the necessary `__delitem__` method, so that `del sequence_markup[monomer]` works.
4) Many base classes have an attribute `kind`: for Alignments and Blocks it is
'alignment', for Sequences it is 'sequence', for AlignmentMarkups it is
'alignment_markup', and for SequenceMarkups it is 'sequence_markup'. This attribute
is crucial for the new alignment construction API.
5) Common stuff for MarkupContainers (Alignments and Sequences) is in
MarkupContainerMixin.
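The markup API from points 1, 2 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the allpy code: `SketchMarkup` and `SketchContainer` are hypothetical names, and the default value of `markup_class` stands in for the lookup of standard markups by `name`, which is not reproduced here.

```python
class SketchMarkup(dict):
    """Hypothetical markup: behaves like a dict of element -> value."""

    def __init__(self, container, name, **kwargs):
        super().__init__()
        if name in container.markups:
            raise KeyError("markup %r already exists" % name)
        self.container = container
        self.name = name
        # The markup registers itself in the container's dictionary.
        container.markups[name] = self

    def remove(self):
        """Release whatever dropping the dictionary entry would leave behind."""
        self.clear()


class MarkupContainerMixin:
    """Shared markup handling for sequence-like and alignment-like containers."""

    def _init(self):
        # Hook meant to be called from __init__ of the actual class.
        self.markups = {}

    def add_markup(self, name, markup_class=SketchMarkup, **kwargs):
        """Create a markup, add it to self, and return it.

        Assumption: the default class replaces the by-name lookup of
        standard markups.
        """
        return markup_class(self, name, **kwargs)

    def remove_markup(self, name):
        """Let the markup clean up, then forget it."""
        self.markups[name].remove()
        del self.markups[name]


class SketchContainer(MarkupContainerMixin, list):
    """Hypothetical container: a list of elements that can carry markups."""

    def __init__(self, items=()):
        super().__init__(items)
        self._init()
```

In this sketch the only supported way to obtain a markup is `container.add_markup(name)`, and `container.remove_markup(name)` calls the markup's own `remove()` before deleting the dictionary entry.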
author | Daniil Alexeyevsky <dendik@kodomo.fbb.msu.ru> |
---|---|
date | Fri, 15 Jul 2011 16:43:03 +0400 |
parents | 91e73fb1ac79 |
children | 0192c5c09ce8 |
line source
8 # import this very module as means of having all related classes in one place
12 """Set of characters to recognize as gaps when parsing alignment."""
15 """Monomer object."""
18 """Either of 'dna', 'rna', 'protein'."""
21 """Mapping of related types. SHOULD be redefined in subclasses."""
24 """A mapping from 1-letter code to Monomer subclass."""
27 """A mapping from 3-letter code to Monomer subclass."""
30 """A mapping from full monomer name to Monomer subclass."""
32 @classmethod
34 """Create new subclass of Monomer for given monomer type."""
36 pass
47 # Save the class in data.monomers so that it can be pickled
48 # Some names are not unique, we append underscores to them
49 # in order to fix it.
58 # We duplicate distinguished long names into Monomer itself, so that we
59 # can use Monomer.from_code3 to create the relevant type of monomer.
64 @classmethod
66 """Create all relevant subclasses of Monomer."""
70 @classmethod
72 """Create new monomer from 1-letter code."""
77 @classmethod
79 """Create new monomer from 3-letter code."""
82 @classmethod
84 """Create new monomer from full name."""
91 """Returns one-letter code"""
95 """Monomers within same monomer type are compared by code1."""
105 """Common functions for alignment and sequence for dealing with markups.
106 """
109 """Hook to be called from __init__ of actual class."""
113 """Create a markup object, add to self. Return the created markup.
115 - `name` is name for markup in `self.markups` dictionary
116 - optional `markup_class` is class for created markup
117 - optional keyword arguments are passed on to the markup constructor
119 For user markups you have to specify `name` and `markup_class`,
120 for the standard automatic markups just `name` is enough.
121 """
122 # We have to import markups here, and not in the module header
123 # so as not to create bad import loops.
124 # `base` module is used extensively in `markups` for inheritance,
125 # so breaking the loop here seems a lot easier.
136 """Remove markup."""
141 """Sequence of Monomers.
143 This behaves like list of monomer objects. In addition to standard list
144 behaviour, Sequence has the following attributes:
146 * name -- str with the name of the sequence
147 * description -- str with description of the sequence
148 * source -- str denoting source of the sequence
150 Any of them may be empty (i.e. hold empty string)
151 """
154 """Mapping of related types. SHOULD be redefined in subclasses."""
157 """Description of object kind."""
167 @classmethod
169 """Create sequence from a list of monomer objects."""
179 @classmethod
181 """Create sequences from string of one-letter codes."""
193 """Returns sequence of one-letter codes."""
197 """Hash sequence by identity."""
201 """Alignment. It is a list of Columns."""
204 """Mapping of related types. SHOULD be redefined in subclasses."""
207 """Ordered list of sequences in alignment. Read, but DO NOT FIDDLE!"""
210 """Description of object kind."""
213 """Initialize empty alignment."""
218 # Alignment grow & IO methods
219 # ==============================
222 """Add sequence to alignment. Return self.
224 If sequence is too short, pad it with gaps on the right.
225 """
234 """Add row from a string of one-letter codes and gaps. Return self."""
242 ]
249 """Add row from row_as_list representation and sequence. Return self."""
258 """Pad alignment with empty columns on the right to width n."""
263 """Append sequences from file to alignment. Return self.
265 If sequences in file have gaps (detected as characters belonging to
266 `gaps` set), treat them accordingly.
267 """
272 """Write alignment in FASTA file as sequences with gaps."""
276 # Data access methods for alignment
277 # =================================
280 """Return list of rows (temporary objects) in alignment.
282 Each row is a dictionary of { column : monomer }.
284 For gap positions there is no key for the column in row.
286 Each row has attribute `sequence` pointing to the sequence the row is
287 describing.
289 Modifications of row have no effect on the alignment.
290 """
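The row representation described above can be sketched as follows. This is an illustrative sketch rather than the module's code: `IdentityHashed`, `Row` and the stand-alone `rows` function are hypothetical, and plain strings stand in for monomer objects. Sequences and columns are hashed by identity so that they can serve as dictionary keys.

```python
class IdentityHashed:
    """Hash and compare by identity (assumed helper, not from allpy)."""

    def __hash__(self):
        return id(self)

    def __eq__(self, other):
        return self is other


class Sequence(IdentityHashed, list):
    """Stand-in for a sequence: a list of monomers."""


class Column(IdentityHashed, dict):
    """Stand-in for a column: {sequence: monomer}, gaps are missing keys."""


class Row(dict):
    """Temporary {column: monomer} mapping with a `sequence` attribute."""


def rows(sequences, columns):
    """Return one Row per sequence; gap positions have no key."""
    result = []
    for sequence in sequences:
        row = Row()
        row.sequence = sequence
        for column in columns:
            if sequence in column:
                row[column] = column[sequence]
        result.append(row)
    return result
```

Because each `Row` is built fresh from the columns, changing it leaves the columns (and therefore the alignment) untouched.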
291 # For now, the function returns a list rather than iterator.
292 # It remains to be seen whether memory performance here becomes critical,
293 # or whether random access is useful.
305 """Return list of rows (temporary objects) in alignment.
307 Each row here is a list of either monomer or None (for gaps).
309 Each row has attribute `sequence` pointing to the sequence of row.
311 Modifications of row have no effect on the alignment.
312 """
323 """Return list of string representation of rows in alignment.
325 Each row has attribute `sequence` pointing to the sequence of row.
327 `gap` is the symbol to use for gap.
328 """
343 """Return representation of row as list with `Monomers` and `None`s."""
347 """Return string representation of row in alignment.
349 String will have gaps represented by `gap` symbol (defaults to '-').
350 """
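The list and string forms of a row relate to each other roughly as in the sketch below. It works on plain characters instead of monomer objects, both function names are hypothetical, and the contents of `GAPS` are an assumption, since the actual gap character set is not visible in this listing.

```python
GAPS = ("-", ".", "~")  # assumed gap characters, for illustration only


def row_from_string(string, gaps=GAPS):
    """Parse one-letter codes into a list; gaps become None."""
    return [None if char in gaps else char for char in string]


def row_as_string(row, gap="-"):
    """Render a list of codes and None values, writing `gap` for None."""
    return "".join(gap if item is None else item for item in row)
```

Round-tripping a string through both functions normalizes every recognized gap character to the single `gap` symbol.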
359 """Return list of columns (temporary objects) in alignment.
361 Each column here is a list of either monomer or None (for gaps).
363 Items of column are sorted in the same way as alignment.sequences.
365 Modifications of column have no effect on the alignment.
366 """
376 # Alignment / Block editing methods
377 # =================================
380 """Remove all gaps from alignment and flush results to one side.
382 `whence` must be one of 'left', 'right' or 'center'
383 """
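Flushing a single row to one side can be sketched as below. This is a simplified illustration on a list with `None` for gaps, not the method itself: the name `flush_row` is hypothetical, and placing the extra gap on the right when centering an odd number of gaps is an assumption.

```python
def flush_row(row, whence="left"):
    """Move all non-gap items of `row` to one side, keeping its width."""
    monomers = [item for item in row if item is not None]
    n_gaps = len(row) - len(monomers)
    if whence == "left":
        return monomers + [None] * n_gaps
    if whence == "right":
        return [None] * n_gaps + monomers
    if whence == "center":
        left = n_gaps // 2  # assumed rounding: the extra gap goes right
        return [None] * left + monomers + [None] * (n_gaps - left)
    raise ValueError("whence must be 'left', 'right' or 'center'")
```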
395 """Remove all empty columns."""
401 """Turn all row positions into gaps (but keep sequences intact)."""
407 """Replace contents of `dst` with those of `new`.
409 Replace contents of elements using function `merge(dst_el, new_el)`.
410 """
417 """Replace contents of sequences with those of `new` alignment."""
418 # XXX: we manually copy sequence contents here
419 # XXX: we only copy overlapping parts and link to the rest
429 """Replace column contents with those of `new` alignment.
431 In other words: copy gap patterns from `new` to `self`.
433 `self.sequences` and `new.sequences` should have the same contents.
434 """
443 ]
448 """Replace alignment contents with those of other alignment."""
454 """Apply function to the alignment (or block); inject results back.
456 - `function(block)` must return block with same line order.
457 - if `copy_descriptions` is False, ignore new sequence names.
458 - if `copy_contents` is False, don't copy sequence contents too.
460 `function` (object) may have attributes `copy_descriptions` and
461 `copy_contents`, which override the same named arguments.
462 """
471 """Realign self.
473 I.e.: apply function to self to produce a new alignment, then update
474 self to have the same gap patterns as the new alignment.
476 This is the same as process(function, False, False)
477 """
482 """Column of alignment.
484 Column is a dict of { sequence : monomer }.
486 For sequences that have a gap in this column, the sequence key is not present in
487 the column.
488 """
491 """Mapping of related types. SHOULD be redefined in subclasses."""
494 """Return hash by identity."""
498 """Block of alignment.
500 Block is an intersection of several rows & columns. (The collections of
501 rows and columns are represented as ordered lists, to retain display order
502 of Alignment or add ability to tweak it). Most of blocks look like
503 rectangular part of alignment if you shuffle alignment rows the right way.
504 """
507 """Alignment the block belongs to."""
510 """List of sequences in block."""
513 """List of columns in block."""
515 @classmethod
517 """Build new block from alignment.
519 If sequences are not given, the block uses all sequences in alignment.
521 If columns are not given, the block uses all columns in alignment.
523 In both cases we use exactly the list used in alignment, thus, if new
524 sequences or columns are added to alignment, the block tracks this too.
525 """
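The tracking behaviour described above follows from sharing the list objects rather than copying them. A minimal sketch, with hypothetical `SketchAlignment` and `SketchBlock` classes standing in for the real ones:

```python
class SketchAlignment:
    """Hypothetical alignment: just the two ordered lists."""

    def __init__(self):
        self.sequences = []
        self.columns = []


class SketchBlock:
    """Hypothetical block over an alignment."""

    @classmethod
    def from_alignment(cls, alignment, sequences=None, columns=None):
        block = cls()
        block.alignment = alignment
        # No copy is made: when nothing is given, the block holds the very
        # same list objects as the alignment and sees later additions.
        block.sequences = alignment.sequences if sequences is None else sequences
        block.columns = alignment.columns if columns is None else columns
        return block
```

Passing an explicit list for `sequences` or `columns` decouples that axis of the block from the alignment, while the other axis keeps tracking it.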
537 """Base class for sequence and alignment markups.
539 We shall call either sequence or alignment a container. And we shall call
540 either monomers or columns elements respectively.
542 Markup behaves like a dictionary of [element] -> value.
544 Every container has a dictionary of [name] -> markup. It is Markup's
545 responsibility to add itself to this dictionary and to avoid collisions
546 while doing it.
547 """
550 """Name of markup elements"""
553 """Markup takes mandatory container and name and optional kwargs.
555 Markups should never be created by the user. They are created by
556 Sequence or Alignment.
557 """
563 """Recalculate markup values (if they are generated automatically)."""
564 pass
567 """Remove the traces of markup object. Do not call this yourself!"""
568 pass
570 @classmethod
572 """Restore markup from `record`. (Used for loading from file).
574 `record` is a dict of all metadata and data related to one markup. All
575 keys and values in `record` are strings, markup must parse them itself.
577 Markup values should be stored in `record['markup']`, which is a list
578 of items separated with either `record['separator']` or a comma.
579 """
583 """Save markup to `record`, for saving to file.
585 For description of `record` see docstring for `from_record` method.
586 """
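The `record` layout described for `from_record` can be illustrated with the sketch below. The two helper functions and the `'name'` key are hypothetical; only `record['markup']` and `record['separator']` come from the docstring. Values stay strings, since each markup is expected to parse them itself.

```python
def values_from_record(record):
    """Split record['markup'] on record['separator'], or on a comma."""
    separator = record.get("separator", ",")
    return record["markup"].split(separator)


def values_to_record(name, values, separator=","):
    """Build a record dict whose keys and values are all strings."""
    return {
        "name": name,  # assumed key, for illustration only
        "separator": separator,
        "markup": separator.join(str(value) for value in values),
    }
```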
590 """Return list of elements in the container in proper order."""
594 """Return list of markup values in container."""
598 """Markup for sequence.
600 Behaves like a dictionary of [monomer] -> value. Value may be anything
601 or something specific, depending on subclass.
603 Actual values are stored in monomers themselves as attributes.
604 """
613 """Remove the traces of markup object. Do not call this yourself!"""
618 """Return list of monomers."""
622 """Return list of markup values, if every monomer is marked up."""
626 """Part of Mapping collection interface."""
632 """Part of Mapping collection interface."""
636 """Part of Mapping collection interface."""
640 """Part of Mapping collection interface."""
644 """Part of Mapping collection interface."""
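Storing markup values on the monomers themselves, as described for sequence markups, could look roughly like this. The sketch is hypothetical (`SketchMonomer` and `SequenceMarkupSketch` are not allpy names) and covers only part of the Mapping interface; it shows why `remove()` has to delete an attribute from each monomer.

```python
class SketchMonomer:
    """Hypothetical monomer: a plain object that can take attributes."""


class SequenceMarkupSketch:
    """Dictionary-like view of the attribute `name` on each monomer."""

    def __init__(self, sequence, name):
        self.sequence = sequence
        self.name = name

    def __contains__(self, monomer):
        return hasattr(monomer, self.name)

    def __getitem__(self, monomer):
        try:
            return getattr(monomer, self.name)
        except AttributeError:
            raise KeyError(monomer)

    def __setitem__(self, monomer, value):
        setattr(monomer, self.name, value)

    def __delitem__(self, monomer):
        try:
            delattr(monomer, self.name)
        except AttributeError:
            raise KeyError(monomer)

    def remove(self):
        """Delete the markup attribute from every monomer that has it."""
        for monomer in self.sequence:
            if monomer in self:
                del self[monomer]
```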
648 """Markup for alignment.
650 Is a dictionary of [column] -> value. Value may be anything or something
651 specific, depending on subclass.
652 """
661 """Return a list of columns."""
665 """Return a list of markup values, if every column is marked up."""
668 # vim: set ts=4 sts=4 sw=4 et: