[chimerax-users] label_seq_id in mmcif files for Chimera

Wed Jul 8 04:27:56 PDT 2020

Hi Greg,

> Adding an atom_site.label_seq_id isn't different from supplying a residue
> number in PDB file.  When there are adjacent residues of the same type,
> does the PDB reader see a duplicate atom and generate a new residue?
> merge the residues?  generate an error? I haven't tested the PDB reader,
> but a residue number helps it too.

The sequence number/id from the PDB format tells which atoms are in
the same residue, but it doesn't imply connectivity between residues,
because the numbers don't need to be consecutive. In the mmCIF format
it is stored as _atom_site.auth_seq_id (+pdbx_PDB_ins_code for the
full id), whereas label_seq_id must point to a position in the full
sequence. This makes conversion from PDB to mmCIF problematic. If
SEQRES is present I do sequence alignment to determine label_seq_id.
If SEQRES is missing I could ask the user to supply the full sequence,
but then the user may think that (since this was not obligatory when
working with PDB files) moving to mmCIF is a step backward. I could
infer gaps and increase label_seq_id by 2 if there is a gap, but the
resulting mmCIF file can be used for any purpose, not only by Chimera.
The apparent gap may actually be caused by misplaced atoms, and then
the guessed gap in numbering could cause different problems later
on. It's not clear to me what's better here - leaving
label_seq_id null or filling it with the best guesses (which sometimes
will be wrong).
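
To make the two options concrete, here is a minimal Python sketch.
It is only an illustration, not the code of any actual converter:
the function names are made up, residues are reduced to plain lists
of names and numbers, and difflib.SequenceMatcher stands in for a
real sequence alignment.

```python
from difflib import SequenceMatcher


def label_seq_ids_from_seqres(observed, seqres):
    """Place observed residues in the full sequence (SEQRES present).

    Both arguments are lists of residue names, e.g. ["ALA", "GLY"].
    Returns one entry per observed residue: a 1-based position in
    seqres, or None if the residue could not be placed.
    """
    # Assumption: SequenceMatcher is a simple stand-in for alignment.
    matcher = SequenceMatcher(a=observed, b=seqres, autojunk=False)
    ids = [None] * len(observed)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ids[block.a + k] = block.b + k + 1
    return ids


def guess_label_seq_ids(auth_seq_ids):
    """Best guess when SEQRES is missing.

    Increase the id by 1 for each residue, or by 2 when the author
    numbering jumps, so that the gap stays visible. The true gap
    length is unknown, so these ids can be wrong.
    """
    ids = []
    prev = None
    current = 0
    for num in auth_seq_ids:
        jump = prev is not None and num - prev > 1
        current += 2 if jump else 1
        ids.append(current)
        prev = num
    return ids


observed = ["ALA", "GLY", "SER", "LEU"]
seqres = ["MET", "ALA", "GLY", "PRO", "PRO", "SER", "LEU", "LYS"]
print(label_seq_ids_from_seqres(observed, seqres))  # [2, 3, 6, 7]
print(guess_label_seq_ids([10, 11, 12, 15, 16]))    # [1, 2, 3, 5, 6]
```

In the second call the guess records a gap of one residue where the
author numbering suggests two, which is the kind of wrong guess I'm
worried about.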

Marcin