allpy

changeset 817:835efa2a8c71
optimization of rasmol_homology: keep only the structures of two sequences loaded

One of the steps of this program superimposes every sequence onto the main sequence and saves all structures to a PDB file. This does not require the structures of all sequences to be loaded at once: at any moment only the structure of the main sequence and that of the sequence being superimposed are needed. This optimization results in substantial memory savings. Output files should be identical to those of the previous revision.

To implement this optimization, the superimpose and save_pdb methods of alignment were replaced with the methods of the same names on sequence. As a result, some of the new code duplicates code from the alignment methods.

Note: this behaves as before (with the superimpose and save_pdb methods of alignment): the model was returned by those methods but never used when generating the spt script. This can cause collisions between RasMol selections when the number of sequences exceeds the maximum number of chains in one model (26 here, one per letter A-Z).
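A minimal sketch of the loading pattern described above, written in Python like the script itself. The `Structure` class and the `process` function are hypothetical stand-ins, not allpy's `Sequence` or `Alignment` API; superimposition and PDB writing are left as comments. Class-level counters check that no more than two structures are loaded at any moment, and residue numbers are cached before each structure is unloaded, in the same spirit as `monomer2id` in the diff. The last key of the input dict is treated as the main sequence (this relies on dicts keeping insertion order, Python 3.7+).

```python
class Structure:
    """Hypothetical stand-in for a parsed PDB structure (not allpy's API).

    Class attributes count how many structures are loaded at the same time.
    """
    loaded = 0
    peak = 0

    def __init__(self, name, residue_numbers):
        self.name = name
        self.residue_numbers = residue_numbers  # residue index -> PDB resi
        Structure.loaded += 1
        Structure.peak = max(Structure.peak, Structure.loaded)

    def unload(self):
        Structure.loaded -= 1


def process(sequences):
    """Superimpose each sequence onto the main (last) one, two at a time.

    `sequences` maps a name to a list of PDB residue numbers. Returns a dict
    (name, residue index) -> resi that later steps (e.g. writing selections)
    can use without any structure being loaded.
    """
    names = list(sequences)
    resi_cache = {}

    def save(structure):
        # Cache residue numbers while the structure is still loaded.
        for index, resi in enumerate(structure.residue_numbers):
            resi_cache[(structure.name, index)] = resi
        # ... write the structure to the output PDB file here ...

    main = Structure(names[-1], sequences[names[-1]])  # stays loaded throughout
    for name in names[:-1]:
        other = Structure(name, sequences[name])  # only main + this one loaded
        # ... superimpose `other` onto `main` here ...
        save(other)
        other.unload()  # free it before loading the next sequence
    save(main)
    main.unload()
    return resi_cache


cache = process({"seq1": [10, 11, 12], "seq2": [5, 6], "main": [1, 2, 3]})
print(Structure.peak)  # → 2
```

However many sequences are processed, the peak stays at two loaded structures, which is where the memory saving comes from.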
author: boris (kodomo) <bnagaev@gmail.com>
date: Fri, 15 Jul 2011 02:23:27 +0400
parents: d137df18a8bf
children: 0b888869a4d8
files: utils/rasmol_homology.py
diffstat: 1 files changed, 23 insertions(+), 5 deletions(-)

utils/rasmol_homology.py 28
--- a/utils/rasmol_homology.py	Fri Jul 15 02:06:23 2011 +0400
+++ b/utils/rasmol_homology.py	Fri Jul 15 02:23:27 2011 +0400
@@ -44,7 +44,7 @@
 alignment = Alignment().append_file(open(options.markup), format='markup')
 markups.AlignmentNumberMarkup(alignment)
 
-for sequence in alignment.sequences:
+def pdb_loader(sequence):
     sequence.__class__ = Sequence
     sequence.markups['pdb_resi'].add_pdb(download_pdb=download_pdb)
     markups.SequenceNumberMarkup(sequence)
@@ -59,8 +59,26 @@
     columns |= set(block.columns)
 columns = list(columns)
 
-alignment.superimpose(columns, extra_columns=True)
-idmap = alignment.save_pdb(open(options.output_pdb, 'w'))
+chains = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+pdb_file = open(options.output_pdb, 'w')
+monomer2id = {}
+idmap = {}
+def process_sequence(sequence, i):
+    for monomer in sequence:
+        monomer2id[monomer] = monomer.pdb_residue.get_id()[1]
+    chain = chains[i % len(chains)]
+    model = i // len(chains)
+    idmap[sequence] = (chain, model)
+    sequence.save_pdb(pdb_file, chain, model)
+main_sequence = alignment.sequences[-1]
+pdb_loader(main_sequence)
+for i, sequence in enumerate(alignment.sequences[:-1]):
+    pdb_loader(sequence)
+    sequence.superimpose(main_sequence, columns, extra_columns=True)
+    process_sequence(sequence, i)
+    sequence.pdb_unload()
+process_sequence(main_sequence, len(alignment.sequences)-1)
+main_sequence.pdb_unload()
 
 for sequence, chain_model in idmap.items():
     chain, model = chain_model
@@ -85,8 +103,8 @@
     parts = []
     for sequence in block.sequences:
         chain, model = idmap[sequence]
-        resi_from = block.columns[0][sequence].pdb_residue.get_id()[1]
-        resi_to = block.columns[-1][sequence].pdb_residue.get_id()[1]
+        resi_from = monomer2id[block.columns[0][sequence]]
+        resi_to = monomer2id[block.columns[-1][sequence]]
         parts.append('%i-%i:%s' % (resi_from, resi_to, chain))
 
     selection = ','.join(parts)
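The selection collision mentioned in the commit note can be reproduced in isolation. The two helpers below are hypothetical and only mirror the chain/model arithmetic of `process_sequence` and the `'%i-%i:%s'` selection format used in the diff; they are not part of allpy, and the residue range 10-20 is an arbitrary example.

```python
CHAINS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def chain_and_model(i):
    """Map the i-th saved sequence to (chain, model), as process_sequence does."""
    return CHAINS[i % len(CHAINS)], i // len(CHAINS)


def selection_part(resi_from, resi_to, chain):
    """Build one range selection in the same '%i-%i:%s' form as the script.

    The model number is not part of the string, which is the source of the
    collision described in the commit note.
    """
    return '%i-%i:%s' % (resi_from, resi_to, chain)


a = selection_part(10, 20, chain_and_model(0)[0])   # sequence 0: chain A, model 0
b = selection_part(10, 20, chain_and_model(26)[0])  # sequence 26: chain A, model 1
print(a, b, a == b)  # 10-20:A 10-20:A True
```

Sequences 0 and 26 receive the same chain letter and differ only in model number, which the selection string drops, so both select the same residues.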