[chimerax-users] label_seq_id in mmcif files for Chimera
Greg Couch
gregc at cgl.ucsf.edu
Tue Jul 7 17:01:47 PDT 2020
On 7/7/2020 3:50 PM, Marcin Wojdyr wrote:
> On Tue, 7 Jul 2020 at 22:38, Greg Couch <gregc at cgl.ucsf.edu> wrote:
>> As you know, the entity_poly_seq table says which polymeric residues are
>> adjacent to each other (entity_poly_seq.num values are the
>> atom_site.label_seq_id values).
> Yes, it's an enhanced equivalent of the SEQRES record in the PDB
> format. Files from the wwPDB always have it, but files from other
> sources will often miss it. I think currently almost all the mmCIF
> files in circulation originate from the wwPDB. But this is slowly
> changing and people are starting to use mmCIF as a working format.
The sequence information is extremely useful for eliminating ambiguities
in the atom_site data. And it is frequently known by the user, even
when it is missing from the mmCIF file.
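
As a rough illustration of the kind of ambiguity meant here, below is a
minimal Python sketch. It is not ChimeraX's reader code; the data and
the helper names (unmodeled, sequence_neighbors) are made up for the
example. It assumes one entity whose entity_poly_seq.num values match
the atom_site.label_seq_id values, as described above.

```python
# Hypothetical entity_poly_seq rows for one entity: num -> mon_id.
entity_poly_seq = {1: "MET", 2: "GLY", 3: "GLY", 4: "SER", 5: "ALA"}

# Hypothetical label_seq_id values that actually appear in atom_site,
# in file order. Residue 3 has no atoms (it was not modeled).
observed_seq_ids = [1, 2, 4, 5]


def unmodeled(poly_seq, observed):
    """Return sequence positions that have no atoms in atom_site."""
    seen = set(observed)
    return [num for num in sorted(poly_seq) if num not in seen]


def sequence_neighbors(observed):
    """Return consecutive observed residues that are adjacent in the sequence."""
    return [(a, b) for a, b in zip(observed, observed[1:]) if b == a + 1]


missing = unmodeled(entity_poly_seq, observed_seq_ids)
neighbors = sequence_neighbors(observed_seq_ids)
```

With the sequence available, the jump from 2 to 4 is clearly a gap
(residue 3 is missing), so residues 2 and 4 should not be treated as
neighbors in the polymer.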
>> Without that information, ChimeraX
>> still needs the atom_site.label_seq_id to imply the polymeric ordering.
> Ok. It would be useful, though, if ChimeraX could infer the ordering
> and connectivity in the same way as it does for PDB files.
>
> Thanks,
> Marcin
Adding an atom_site.label_seq_id isn't different from supplying a residue
number in a PDB file. When there are adjacent residues of the same type,
does the PDB reader see a duplicate atom and generate a new residue?
Merge the residues? Generate an error? I haven't tested the PDB reader,
but a residue number helps it too.
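
To make the adjacent-residue problem concrete, here is a small Python
sketch with made-up atom rows. It is only an illustration, not how any
particular PDB or mmCIF reader is implemented; group_residues is a
hypothetical helper that splits consecutive rows into residues, either
with or without the sequence id.

```python
from itertools import groupby

# Hypothetical, simplified atom_site rows:
# (label_asym_id, label_comp_id, label_seq_id, label_atom_id)
atoms = [
    ("A", "GLY", 1, "N"), ("A", "GLY", 1, "CA"),
    ("A", "GLY", 1, "C"), ("A", "GLY", 1, "O"),
    ("A", "GLY", 2, "N"), ("A", "GLY", 2, "CA"),
    ("A", "GLY", 2, "C"), ("A", "GLY", 2, "O"),
    ("A", "ALA", 3, "N"), ("A", "ALA", 3, "CA"),
    ("A", "ALA", 3, "C"), ("A", "ALA", 3, "O"),
    ("A", "ALA", 3, "CB"),
]


def group_residues(rows, use_seq_id):
    """Group consecutive rows into residues; return atom names per residue."""
    if use_seq_id:
        key = lambda r: (r[0], r[1], r[2])  # chain, residue type, sequence id
    else:
        key = lambda r: (r[0], r[1])        # chain and residue type only
    return [[r[3] for r in grp] for _, grp in groupby(rows, key=key)]


with_ids = group_residues(atoms, use_seq_id=True)
without_ids = group_residues(atoms, use_seq_id=False)
```

With the sequence id, the two glycines come out as separate residues.
Without it, they collapse into one "residue" with duplicate atom names,
and the reader has to guess what was intended.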
Learning to use mmCIF effectively will take time. Programmers should
use the PDB's mmCIF Dictionary Suite to validate the mmCIF their
programs output -- it won't validate completely (because it's for mmCIF
files served by the PDB), but it will show where the mmCIF output can be
improved.
For more information about ChimeraX reading mmCIF, see
https://www.rbvi.ucsf.edu/chimerax/docs/devel/bundles/mmcif/src/mmcif_guidelines.html.
ChimeraX's mmCIF reader tries really hard to show you the data exactly
as it is specified. So if there's badly specified data, you'll be able
to see it. Over time, we will improve ChimeraX's ability to handle
underspecified data as developers start generating mmCIF files. But we
don't plan to modify the mmCIF reader if we can help it. Adding code
to infer information that should be explicitly given in the mmCIF file
slows down the mmCIF reader for everyone. This is especially true for
large structures with over a million atoms.
-- Greg