[chimerax-users] incomplete output from matchmaker embedded in python script

Wed Jan 18 04:33:36 PST 2023

Hello ChimeraX community,

I have been trying to use ChimeraX to compare one query PDB file sequentially against multiple (later up to 10000) PDB files. For this I have been working with the Python scripting functionality. With the following code I was able to save the resulting data (number of matched atoms and resulting RMSD) as an HTML log file:

import os
from chimerax.core.commands import run as rc

# change to folder with data files
os.chdir("/home/ubuntu/Desktop/Chimerax_search/N_crassa_pdb")

# gather the names of .pdb files in the folder
file_names = [fn for fn in os.listdir(".") if fn.endswith(".pdb")]

# give path to query file to be searched against pdb library
query = "/home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb"

# open the query file as #1 in ChimeraX
rc(session, "open " + query)

# loop through the files
for fn in file_names:
    # open the next target pdb as #2
    rc(session, "open " + fn)
    # match both structures
    rc(session, "match #1 to #2")
    # save log with ID matching input filename, close #2 and clear log
    rc(session, "log save " + fn.split(".")[0] + ".log")
    rc(session, "close #2")
    rc(session, "log clear")

Instead of gathering the relevant data from thousands of rather large HTML output files, I would prefer to get only the resulting values from matchmaker. I learned that the command actually returns all the values I want, so I wrote a modified script that stores the return value in a variable and collects it for every target PDB file in a dict:

import os
from chimerax.core.commands import run as rc

# change to folder with data files
os.chdir("/home/ubuntu/Desktop/Chimerax_search/N_crassa_pdb")

# gather the names of .pdb files in the folder
file_names = [fn for fn in os.listdir(".") if fn.endswith(".pdb")]

# give path to query file to be searched against pdb library
query = "/home/ubuntu/Desktop/Chimerax_search/query/BDFB_007497.pdb"

# open the query file as #1 in ChimeraX
rc(session, "open " + query)

# create empty dict to collect matchmaker output
output = {}

# loop through the files, opening, processing, and closing each in turn
for fn in file_names:
    # use the file name without its extension as the target ID
    target_id = fn.split(".")[0]
    rc(session, "open " + fn)
    # keep the return value of the match command for this target
    match = rc(session, "match #1 to #2")
    output[target_id] = match
    rc(session, "log save " + target_id + ".log")
    rc(session, "close #2")
    rc(session, "log clear")

# display dict
print(output)

For now I just used 4 target PDBs with my query and only printed the result to quickly check whether it fits. However, the result is a bit baffling to me. I do get the RMSD values, but not the other data:

{'NCU01382': [{'full match atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1944492130>, 'full ref atoms': <chimerax.atomic.molarray.Atoms object at 0x7f194448f610>, 'final match atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1946694430>, 'final ref atoms': <chimerax.atomic.molarray.Atoms object at 0x7f1946694550>, 'full RMSD': 63.065617285514556, 'final RMSD': 1.365365439813019, 'transformation matrix': <chimerax.geometry.place.Place object at 0x7f194669c130>, 'aligned ref seq': <chimerax.atomic.molobject.Sequence object at 0x7f19f7b8f8b0>, 'aligned match seq': <chimerax.atomic.molobject.StructureSeq object at 0x7f19f7b8f610>}

Here is one of the resulting hits, for my first target PDB (NCU01382). As you can see, the RMSD values are given correctly, but for the other entries only the internal object representations are displayed rather than the resulting values. Any help with figuring this out would be greatly appreciated.
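
A minimal sketch of how each returned entry might be reduced to plain numbers, using only the dictionary keys visible in the printed output. It assumes that the Atoms collections support len(), so that the number of matched atoms is simply the length of the collection. The helper names summarize_match and summarize_all are hypothetical and not part of ChimeraX.

```python
# Hypothetical helpers (not part of ChimeraX): reduce the values returned
# by the match command to plain numbers that can be printed or saved.

def summarize_match(result):
    """Summarize one result dict as returned for a single chain pairing.

    Assumption: the atom collections stored under 'full match atoms' and
    'final match atoms' support len(), giving the number of matched atoms.
    """
    return {
        "full_atom_pairs": len(result["full match atoms"]),
        "final_atom_pairs": len(result["final match atoms"]),
        "full_rmsd": result["full RMSD"],
        "final_rmsd": result["final RMSD"],
    }


def summarize_all(output):
    """Summarize the whole dict of target ID -> list of result dicts."""
    return {
        target_id: [summarize_match(result) for result in results]
        for target_id, results in output.items()
    }
```

With the output dict from the script above, print(summarize_all(output)) would then show atom counts and RMSD values per target instead of object representations, provided the assumption about len() holds.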

Best Wishes
Milan Borchert
