[chimerax-users] label_seq_id in mmcif files for Chimera

Tue Jul 7 17:01:47 PDT 2020

On 7/7/2020 3:50 PM, Marcin Wojdyr wrote:
> On Tue, 7 Jul 2020 at 22:38, Greg Couch <gregc at cgl.ucsf.edu> wrote:
>> As you know, the entity_poly_seq table says which polymeric residues are
>> adjacent to each other (entity_poly_seq.num values are the
>> atom_site.label_seq_id values).
> Yes, it's an enhanced equivalent of the SEQRES record in the PDB
> format. Files from the wwPDB always have it, but files from other
> sources will often miss it. I think currently almost all the mmCIF
> files in circulation originate from the wwPDB. But this is slowly
> changing and people are starting to use mmCIF as a working format.
The sequence information is extremely useful for eliminating ambiguities 
in the atom_site data.  And it is frequently known by the user, even 
when it is missing from the mmCIF file.
>> Without that information, ChimeraX
>> still needs the atom_site.label_seq_id to imply the polymeric ordering.
> Ok. It would be useful, though, if ChimeraX could infer the ordering
> and connectivity in the same way as it does for PDB files.
>
> Thanks,
> Marcin

Adding an atom_site.label_seq_id is no different from supplying a residue 
number in a PDB file.  When there are adjacent residues of the same type, 
does the PDB reader see a duplicate atom and generate a new residue?  
Merge the residues?  Generate an error?  I haven't tested the PDB reader, 
but a residue number helps it too.
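
As a rough illustration of that ambiguity (a sketch only, not ChimeraX's
or any PDB reader's actual code), the Python below groups atoms into
residues with and without a sequence id.  The atom tuples, the function
names, and the "split on a repeated atom name" fallback are all
assumptions made up for this example.

```python
# Hypothetical atom records, in file order:
# (label_asym_id, label_comp_id, label_seq_id, label_atom_id)
COMPLETE = [
    ("A", "GLY", 1, "N"), ("A", "GLY", 1, "CA"),
    ("A", "GLY", 1, "C"), ("A", "GLY", 1, "O"),
    ("A", "GLY", 2, "N"), ("A", "GLY", 2, "CA"),
    ("A", "GLY", 2, "C"), ("A", "GLY", 2, "O"),
]

# Two adjacent, partially modeled glycines: no atom name repeats.
TRUNCATED = [
    ("A", "GLY", 1, "N"), ("A", "GLY", 1, "CA"),
    ("A", "GLY", 2, "C"), ("A", "GLY", 2, "O"),
]


def group_with_seq_id(atoms):
    """Group atoms by (chain, seq id, residue type): no guessing needed."""
    residues = {}
    for asym, comp, seq, name in atoms:
        residues.setdefault((asym, seq, comp), []).append(name)
    return list(residues.values())


def split_on_duplicate_atom(atoms):
    """Assumed fallback that ignores the seq id: start a new residue when
    the chain or residue type changes, or when an atom name repeats."""
    residues = []
    current_key = None
    for asym, comp, _seq, name in atoms:
        key = (asym, comp)
        if key != current_key or name in residues[-1]:
            residues.append([])
            current_key = key
        residues[-1].append(name)
    return residues


if __name__ == "__main__":
    for label, atoms in (("complete", COMPLETE), ("truncated", TRUNCATED)):
        print(label,
              len(group_with_seq_id(atoms)),
              len(split_on_duplicate_atom(atoms)))
```

With complete residues both approaches find two glycines.  With the
truncated pair, only the version that uses the sequence id keeps them
apart; the fallback silently merges them into one residue.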

Learning to use mmCIF effectively will take time.  Programmers should 
use the PDB's mmCIF Dictionary Suite to validate the mmCIF their 
programs output -- it won't validate completely (because it's for mmCIF 
files served by the PDB), but it will show where the mmCIF output can be 
improved.

For more information about ChimeraX reading mmCIF, see 
https://www.rbvi.ucsf.edu/chimerax/docs/devel/bundles/mmcif/src/mmcif_guidelines.html.

ChimeraX's mmCIF reader tries to show you the data exactly as it is 
specified, so if the data is badly specified, you'll be able to see 
that.  Over time, as developers start generating mmCIF files, we will 
improve ChimeraX's ability to handle underspecified data.  But we don't 
plan to modify the mmCIF reader if we can help it.  Adding code to infer 
information that should be given explicitly in the mmCIF file slows down 
the mmCIF reader for everyone.  This is especially true for large 
structures with over a million atoms.

     -- Greg