[chimerax-users] CIF writer renumbers residues

Tue Jun 23 12:39:30 PDT 2020

It's not that simple.  The label_seq_id values are sequential positive 
integers, and if the structure is edited, then those numbers can 
change.  The auth_seq_id is a much better fit.  Here's part of the 
description of atom_site.auth_seq_id:

> The author may assign values to _atom_site.auth_seq_id in any desired 
> way. For instance, the values may be used to relate this structure to 
> a numbering scheme in a homologous structure, including sequence gaps 
> or insertion codes. Alternatively, a scheme may be used for a 
> truncated polymer that maintains the numbering scheme of the full 
> length polymer. In all cases, the scheme used here must match the 
> scheme used in the publication that describes the structure.

And the auth_seq_id is used for a consistent numbering scheme in 
homologous structures in the PDB for easy comparison.

     -- Greg
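
To make the difference between the two numbering schemes concrete, here is a minimal sketch in plain Python (not ChimeraX code). The atom_site fragment is hypothetical: it imagines a chain whose author numbering is 10, 11, 15, written without sequence information so that label_seq_id simply counts from 1. The helper name seq_id_pairs and the simplified whitespace parsing are assumptions for illustration; real mmCIF files need a proper parser that handles quoting.

```python
# Hypothetical atom_site fragment: author residue numbers 10, 11 and 15,
# with label_seq_id counting sequentially from 1 (no sequence information).
ATOM_SITE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
ATOM 1 CA GLY A 1 10
ATOM 2 CA ALA A 2 11
ATOM 3 CA SER A 3 15
"""


def seq_id_pairs(text):
    """Return unique (label_seq_id, auth_seq_id) pairs from a simple loop.

    Assumes whitespace-separated values with no quoting, which real
    mmCIF files do not guarantee.
    """
    columns = []
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            # Collect column names in the order they are declared.
            columns.append(line[len("_atom_site."):])
        elif line.startswith(("ATOM", "HETATM")):
            row = dict(zip(columns, line.split()))
            pair = (int(row["label_seq_id"]), int(row["auth_seq_id"]))
            if pair not in pairs:
                pairs.append(pair)
    return pairs


pairs = seq_id_pairs(ATOM_SITE)
for label, auth in pairs:
    print(f"label_seq_id={label}  auth_seq_id={auth}")
```

In this sketch only the auth_seq_id column still shows the gap between 11 and 15, which is why it is the safer key when matching residues between structures.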

On 6/23/2020 11:55 AM, Tom Goddard wrote:
> Hi Greg,
>
> The mmCIF writer should preserve both the author numbering (auth_seq_id) and the mmCIF numbering (label_seq_id), since both are essential for easy comparison of different structures. You say ChimeraX preserves author numbering, but it should also preserve label_seq_id.  Other software may use only label_seq_id and not author numbering, and the user will have no control over that.
>
> 	Tom
>
>
>> On Jun 23, 2020, at 11:40 AM, Greg Couch <gregc at cgl.ucsf.edu> wrote:
>>
>> If your mmCIF input file is incomplete and missing the sequence information, then the mmCIF writer does not write it out either. There are too many ways to get it wrong and mislead the scientist. That said, I can envision adding a "best guess" option to the mmCIF writer someday.
>>
>> As for the residue renumbering, there are two sets of residue numbers in a mmCIF file: the internal label_seq_id, which is used to link the atom_site table entries with other mmCIF tables, and the auth_seq_id, which is the author-assigned value.  The auth_seq_id written is the same as the one read in.
>>
>> In your case, you should use the auth_seq_id for matching (assuming it's present; if not, you could add it to your original mmCIF file with the same value as the label_seq_id).  Or, the original mmCIF file needs to supply the sequence information (the entity, entity_poly, and entity_poly_seq tables), so the gaps are known instead of guessed.  Or, perhaps ISOLDE could help you by inserting "unknown" gap residues into the chain to preserve the numbering.
>>
>>      HTH,
>>
>>      Greg
>>
>> On 6/23/2020 10:47 AM, Daniel Asarnow wrote:
>>> Hi all,
>>> When I save a CIF from ChimeraX (while using ISOLDE), the warning "Not saving entity_poly_seq for non-authoritative sequences" is produced, and all the residues have been sequentially renumbered from 1. When there are missing residues, each segment has to be renumbered manually afterwards. Is there some way to avoid this with PDB or CIF inputs? Is it a bug?
>>>
>>> Best,
>>> -da
>>>