[Chimera-users] Thanks, Re: mmcif, auth_xx information versus label_xx information

Fri Dec 7 09:28:38 PST 2018

So it's not so simple.  The auth_... fields are not required; they just 
happen to be in the files that the PDB provides.  Other programs cannot 
be expected to provide them.  The RCSB has software that can validate 
mmCIF files against various dictionaries: the MMCIF Dictionary Suite, 
https://sw-tools.rcsb.org/apps/MMCIF-DICT-SUITE/index.html. Programs 
that generate mmCIF files should verify that the generated files are 
complete.  And the dictionary used for validation should be documented 
in the audit_conform table.
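
As a rough illustration of why a reader cannot count on the auth_... 
fields, here is a minimal Python sketch that prefers auth_asym_id and 
auth_seq_id when a file has them and falls back to the label_... values 
otherwise.  It is not how Chimera or ChimeraX read mmCIF.  The function 
names (parse_atom_site, residue_key) and the two-atom sample data are 
made up for this example, and the parser assumes a simple 
whitespace-separated atom_site loop with no quoted values.

```python
MISSING = {".", "?"}  # mmCIF markers for "not applicable" and "unknown"


def parse_atom_site(text):
    """Return atom_site rows as dicts (hypothetical helper, simple loops only)."""
    tags, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            tags.append(line.split(".", 1)[1])
        elif tags and line.startswith(("#", "loop_", "_", "data_")):
            break  # the atom_site category has ended
        elif tags and line:
            values = line.split()  # assumption: no quoted values with spaces
            if len(values) == len(tags):
                rows.append(dict(zip(tags, values)))
    return rows


def residue_key(row):
    """Return (chain, number), preferring auth_ values, else label_ values."""
    def pick(auth_tag, label_tag):
        value = row.get(auth_tag, ".")
        return row.get(label_tag, ".") if value in MISSING else value

    return (pick("auth_asym_id", "label_asym_id"),
            pick("auth_seq_id", "label_seq_id"))


# Made-up sample: one protein atom and one water, with auth_ columns.
WITH_AUTH = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
_atom_site.auth_asym_id
ATOM   1 CA ALA A 1 27  A
HETATM 2 O  HOH B . 301 A
#
"""

# The same two atoms as another program might write them, without auth_ columns.
WITHOUT_AUTH = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
ATOM   1 CA ALA A 1
HETATM 2 O  HOH B .
#
"""

keys_with_auth = [residue_key(r) for r in parse_atom_site(WITH_AUTH)]
keys_without_auth = [residue_key(r) for r in parse_atom_site(WITHOUT_AUTH)]
```

With the auth_ columns present, the keys are the author's chain and 
numbering; without them, the water's number comes back as "." because 
there is no label_seq_id to fall back on.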

If your data is in the mmCIF format, you are better off using ChimeraX 
than Chimera.  The reader is orders of magnitude faster. And I am 
working on fixing any bugs that come up :-).  I have documented the 
parts of mmCIF files that are important for ChimeraX.  For example, the 
Cartesian coordinates are required for atoms.  See 
https://www.cgl.ucsf.edu/chimerax/docs/devel/bundles/mmcif/src/mmcif.html 
-- in particular, read "ChimeraX Fast mmCIF Guidelines" and "Writing 
mmCIF Files in ChimeraX".

     -- Greg

On 12/7/2018 12:47 AM, moocow at mindless.com wrote:
> Thanks for the explanation. The water-numbering argument is convincing 
> as a reason to use author_... fields. I think these also correspond 
> with the fields in the old pdb format.
> I can see the PDB's logic in using a flexible format with room to put 
> in every conceivable piece of information, but one can wonder if they 
> really did the world a favour. It is almost guaranteed that this 
> program will use author_.. fields and that program will use label_... 
> fields.
>
> Yes, Chimera uses the author's numbering because that is what is used 
> in the author's publication. It turns out that this is especially 
> important for waters since the PDB/RCSB refuses to number waters in 
> the label_seq_id field which breaks the one-to-one mapping of 
> (label_seq_id, label_asym_id) to a residue.
>
> In Chimera, the mapping with label_seq_id is discarded after it is 
> used.  In ChimeraX, there is no UI for label_seq_id, but, in Python 
> code, the nth residue in a Chain instance is label_seq_id n.  And 
> residues have a mmcif_chain_id attribute which is the label_asym_id.  
> So it is possible to reconstruct the mapping if need be.
>
>     HTH,
>
>     Greg
>
> On 12/6/18 4:03 AM, moocow at mindless.com wrote:
>
>     An mmcif / numbering question...
>     When specifying residues, chimera seems to use "auth_seq_id" (the
>     authors' numbering), as well as the authors' naming of chains
>     (auth_asym_id).
>     Is there any way to ask chimera to find residues using
>     "label_seq_id" and "label_asym_id" (the numbering the protein data
>     bank gives) ?
>     Is the label_xxx information discarded when chimera reads an mmcif
>     file or is it hidden in an attribute somewhere ?
>
>
> _______________________________________________
> Chimera-users mailing list: Chimera-users at cgl.ucsf.edu
> Manage subscription: http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users