allpy
changeset 817:835efa2a8c71
optimization of rasmol_homology: keep structures of only two sequences loaded
One step of this program superimposes every sequence onto the main
sequence and saves all structures to a PDB file.
The structures of all sequences do not have to be loaded at once to do this.
At any moment only the structures of the main sequence and of the sequence
being superimposed are needed.
This optimization saves a significant amount of memory.
Output files should be the same as in the previous revision.
To implement this optimization, the superimpose and save_pdb methods
of the alignment were replaced with the same-named methods of the sequence.
As a result, some code in the script duplicates code of the alignment methods.
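
The load, superimpose, save, unload pattern described above can be sketched with toy objects. This is only an illustration under stated assumptions: `ToySequence`, `process_all` and the `save` callback are hypothetical names and are not part of allpy; the counter merely shows that at most two structures are held at once.

```python
class ToySequence:
    """Hypothetical stand-in for a sequence with a loadable structure."""
    loaded = 0  # structures currently held in memory
    peak = 0    # highest value `loaded` has reached

    def __init__(self, name):
        self.name = name

    def load(self):
        ToySequence.loaded += 1
        ToySequence.peak = max(ToySequence.peak, ToySequence.loaded)

    def unload(self):
        ToySequence.loaded -= 1


def process_all(sequences, save):
    """Process every sequence against the last one, two structures at a time."""
    main = sequences[-1]
    main.load()
    for seq in sequences[:-1]:
        seq.load()
        # A real script would superimpose `seq` onto `main` at this point.
        save(seq)
        seq.unload()  # release the structure before loading the next one
    save(main)
    main.unload()


saved = []
seqs = [ToySequence("s%d" % i) for i in range(5)]
process_all(seqs, lambda s: saved.append(s.name))
print(ToySequence.peak)  # 2
```

The main sequence is saved last, which matches the order used in the script.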
Note: the script behaves as it did before, when the superimpose and save_pdb
methods of the alignment were used.
The model was returned by these methods but was never used when generating
the spt script.
This can result in collisions of rasmol selections when the number of sequences
is greater than the maximum number of chains in one model.
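
The collision can be reproduced in isolation. The sketch below reuses the chain alphabet, the `i % len(chains)` / `i // len(chains)` assignment and the `'%i-%i:%s'` selection format from the diff; the helper names `chain_and_model` and `selection` are hypothetical and exist only for this illustration.

```python
CHAINS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def chain_and_model(i):
    """Chain letter and model number for the i-th sequence, as in the script."""
    return CHAINS[i % len(CHAINS)], i // len(CHAINS)


def selection(resi_from, resi_to, chain):
    """Selection fragment in the script's format; the model is not included."""
    return '%i-%i:%s' % (resi_from, resi_to, chain)


first = chain_and_model(0)
later = chain_and_model(26)
print(first, later)  # ('A', 0) ('A', 1)

# Sequences 0 and 26 share chain 'A' and differ only in the model number,
# so identical residue ranges give identical selection strings.
print(selection(10, 20, first[0]) == selection(10, 20, later[0]))  # True
```

With 26 or fewer sequences every chain letter is used once and the selections stay distinct.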
author | boris (kodomo) <bnagaev@gmail.com> |
---|---|
date | Fri, 15 Jul 2011 02:23:27 +0400 |
parents | d137df18a8bf |
children | 0b888869a4d8 |
files | utils/rasmol_homology.py |
diffstat | 1 files changed, 23 insertions(+), 5 deletions(-) |
line diff
--- a/utils/rasmol_homology.py	Fri Jul 15 02:06:23 2011 +0400
+++ b/utils/rasmol_homology.py	Fri Jul 15 02:23:27 2011 +0400
@@ -44,7 +44,7 @@
 alignment = Alignment().append_file(open(options.markup), format='markup')
 markups.AlignmentNumberMarkup(alignment)
 
-for sequence in alignment.sequences:
+def pdb_loader(sequence):
     sequence.__class__ = Sequence
     sequence.markups['pdb_resi'].add_pdb(download_pdb=download_pdb)
     markups.SequenceNumberMarkup(sequence)
@@ -59,8 +59,26 @@
     columns |= set(block.columns)
 columns = list(columns)
 
-alignment.superimpose(columns, extra_columns=True)
-idmap = alignment.save_pdb(open(options.output_pdb, 'w'))
+chains = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+pdb_file = open(options.output_pdb, 'w')
+monomer2id = {}
+idmap = {}
+def process_sequence(sequence, i):
+    for monomer in sequence:
+        monomer2id[monomer] = monomer.pdb_residue.get_id()[1]
+    chain = chains[i % len(chains)]
+    model = i // len(chains)
+    idmap[sequence] = (chain, model)
+    sequence.save_pdb(pdb_file, chain, model)
+main_sequence = alignment.sequences[-1]
+pdb_loader(main_sequence)
+for i, sequence in enumerate(alignment.sequences[:-1]):
+    pdb_loader(sequence)
+    sequence.superimpose(main_sequence, columns, extra_columns=True)
+    process_sequence(sequence, i)
+    sequence.pdb_unload()
+process_sequence(main_sequence, len(alignment.sequences)-1)
+main_sequence.pdb_unload()
 
 for sequence, chain_model in idmap.items():
     chain, model = chain_model
@@ -85,8 +103,8 @@
     parts = []
     for sequence in block.sequences:
         chain, model = idmap[sequence]
-        resi_from = block.columns[0][sequence].pdb_residue.get_id()[1]
-        resi_to = block.columns[-1][sequence].pdb_residue.get_id()[1]
+        resi_from = monomer2id[block.columns[0][sequence]]
+        resi_to = monomer2id[block.columns[-1][sequence]]
         parts.append('%i-%i:%s' % (resi_from, resi_to, chain))
 
     selection = ','.join(parts)
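
The last hunk reads residue numbers from monomer2id instead of from pdb_residue, presumably because the structure has already been unloaded by the time selections are built. A minimal sketch of that caching idea follows. `ToyResidue` and `ToyMonomer` are hypothetical classes, and the assumption that unloading removes the `pdb_residue` attribute is made only for the illustration; it is not confirmed by the changeset.

```python
class ToyResidue:
    """Hypothetical residue whose id is a (hetero flag, number, insertion code) tuple."""

    def __init__(self, number):
        self._id = (' ', number, ' ')

    def get_id(self):
        return self._id


class ToyMonomer:
    """Hypothetical monomer that can drop its link to the structure."""

    def __init__(self, number):
        self.pdb_residue = ToyResidue(number)

    def unload(self):
        # Assumption: unloading makes the residue unavailable afterwards.
        del self.pdb_residue


monomers = [ToyMonomer(n) for n in (5, 6, 7)]

# Cache the residue numbers while the structure is still loaded.
monomer2id = {}
for monomer in monomers:
    monomer2id[monomer] = monomer.pdb_residue.get_id()[1]
    monomer.unload()

# Later code builds the selection from the cache only.
resi_from = monomer2id[monomers[0]]
resi_to = monomer2id[monomers[-1]]
print('%i-%i:%s' % (resi_from, resi_to, 'A'))  # 5-7:A
```

The cache costs one integer per monomer, which is small compared with keeping every structure loaded.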