[chimerax-users] incomplete output from matchmaker embedded in python script
Borchert, Milan
milan.borchert at tu-braunschweig.de
Thu Jan 19 01:07:22 PST 2023
Hello Eric,
Thank you for your swift answer. This helps a lot for my analysis.
Best wishes
Milan
________________________________
From: Eric Pettersen <pett at cgl.ucsf.edu>
Sent: Wednesday, January 18, 2023 19:44:19
To: Borchert, Milan
Cc: chimerax-users at cgl.ucsf.edu
Subject: Re: [chimerax-users] incomplete output from matchmaker embedded in python script
Hi Milan,
The return value from matchmaker is a list of dictionaries, one per chain pairing. The atomic.molarray.Atoms objects in them are efficient containers for multiple Atom objects that allow vector-like operations for getting and setting attribute values. These objects can also be treated just like lists (iterated over, etc.), so you can use len() to find out how many atoms there are. The 'full' Atoms objects are all the paired atoms from the initial sequence alignment, and the 'final' Atoms objects are those remaining after iterative pruning to discard poorly matching regions.
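As a rough illustration of the point above, the following sketch reduces such a list of dictionaries to plain numbers using only len() and dictionary lookups. The key names are the ones visible in the printed output quoted further down in this thread; the function name summarize_pairings and the stand-in data are hypothetical, with ordinary Python lists taking the place of Atoms objects so that the snippet also runs outside ChimeraX.

```python
def summarize_pairings(pairings):
    """Reduce a matchmaker-style return value to plain numbers.

    `pairings` is a list of dictionaries, one per chain pairing.
    Only len() is applied to the atom containers, so anything
    list-like works here.
    """
    rows = []
    for pairing in pairings:
        rows.append({
            "full_atoms": len(pairing["full match atoms"]),
            "final_atoms": len(pairing["final match atoms"]),
            "full_rmsd": pairing["full RMSD"],
            "final_rmsd": pairing["final RMSD"],
        })
    return rows


# Stand-in data (made up for illustration): lists replace Atoms objects.
demo_pairings = [{
    "full match atoms": ["a1", "a2", "a3", "a4"],
    "final match atoms": ["a1", "a2"],
    "full RMSD": 63.07,
    "final RMSD": 1.37,
}]

for row in summarize_pairings(demo_pairings):
    print(row)
```

Inside ChimeraX, the value returned by the matchmaker command would presumably be passed in place of demo_pairings.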
--Eric
Eric Pettersen
UCSF Computer Graphics Lab
On Jan 18, 2023, at 4:33 AM, Borchert, Milan via ChimeraX-users <chimerax-users at cgl.ucsf.edu> wrote:
Hello ChimeraX community,
I have been trying to use ChimeraX to sequentially check one query PDB file against multiple (later up to 10000) PDB files. For this I have been working with the Python scripting functionality. With the following code I was able to save the resulting data (number of matched atoms and resulting RMSD) as HTML log files:
import os
from chimerax.core.commands import run as rc

# change to folder with data files
os.chdir("/home/ubuntu/Desktop/Chimerax_search/N_crassa_pdb")

# gather the names of .pdb files in the folder
file_names = [fn for fn in os.listdir(".") if fn.endswith(".pdb")]

# give path to query file to be searched against pdb library
query = "/home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb"

# open the query file as #1 in ChimeraX
rc(session, "open /home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb")

# loop through the files
for fn in file_names:
    # open first target pdb as #2
    rc(session, "open " + fn)
    # match both structures
    rc(session, "match #1 to #2")
    # save log with ID matching input filename, close #2 and clear log
    rc(session, "log save " + fn.split(".")[0] + ".log")
    rc(session, "close #2")
    rc(session, "log clear")
Instead of trying to gather the relevant data from thousands of rather large HTML output files, I would prefer to get only the resulting values from matchmaker. I learned that the command actually returns all the values I want, so I tried to write a modified script that saves these values in a variable and collects them for every target PDB file in a dict:
import os
from chimerax.core.commands import run as rc

# change to folder with data files
os.chdir("/home/ubuntu/Desktop/Chimerax_search/N_crassa_pdb")

# gather the names of .pdb files in the folder
file_names = [fn for fn in os.listdir(".") if fn.endswith(".pdb")]

# give path to query file to be searched against pdb library
query = "/home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb"

# open the query file as #1 in ChimeraX
rc(session, "open /home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb")

# create empty dict to collect matchmaker output
output = {}

# loop through the files, opening, processing, and closing each in turn
for fn in file_names:
    id = fn.split("/")[-1].split(".")[0]
    rc(session, "open " + fn)
    match = rc(session, "match #1 to #2")
    output.update({id : match})
    rc(session, "log save " + fn.split(".")[0] + ".log")
    rc(session, "close #2")
    rc(session, "log clear")

# display dict
print(output)
For now I used just 4 target PDBs with my query and only printed the result to quickly check whether it fits. However, the result is a bit baffling to me. I do get the RMSD values, but not the other data:
{'NCU01382': [{'full match atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1944492130>, 'full ref atoms': <chimerax.atomic.molarray.Atoms object at 0x7f194448f610>, 'final match atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1946694430>, 'final ref atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1946694550>, 'full RMSD': 63.065617285514556, 'final RMSD': 1.365365439813019, 'transformation matrix': <chimerax.geometry.place.Place object at 0x7f194669c130>, 'aligned ref seq': <chimerax.atomic.molobject.Sequence object at 0x7f19f7b8f8b0>, 'aligned match seq': <chimerax.atomic.molobject.StructureSeq object at 0x7f19f7b8f610>}
Here is one of the resulting hits, for my first target PDB (NCU01382). As you can see, the RMSD scores are given correctly, but for the other values only the internal object representations are displayed, not the resulting values. Any help with figuring this out would be greatly appreciated.
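Since the goal is a compact table rather than thousands of log files, one possible approach is to keep only plain numbers per target and write them to a single CSV file. The sketch below is a minimal example under stated assumptions: the function name write_match_table, the column names, and the output file name are hypothetical, and the atom count in the demo record is made up (only the RMSD is taken from the output above). It may also be safest to take len() of the Atoms containers before the target model is closed, since they refer to atoms of that model; that is an assumption here, not something confirmed in this thread.

```python
import csv
import os
import tempfile


def write_match_table(records, path):
    """Write (target_id, final_atom_count, final_rmsd) tuples to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["target", "final_atoms", "final_rmsd"])
        for target_id, n_atoms, rmsd in records:
            # Round the RMSD to three decimals for readability.
            writer.writerow([target_id, n_atoms, f"{rmsd:.3f}"])


# Demo record: the atom count (120) is invented for illustration only.
records = [("NCU01382", 120, 1.365365439813019)]

# Hypothetical output location in the system temp directory.
csv_path = os.path.join(tempfile.gettempdir(), "matchmaker_summary.csv")
write_match_table(records, csv_path)
print("wrote", csv_path)
```

In the loop above, one record per target could be appended right after the match command and before "close #2", for example using len() of the 'final match atoms' entry and the 'final RMSD' value of each pairing.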
Best Wishes
Milan Borchert