[Chimera-users] Questions about Matchmaker and GDT
Elaine Meng
meng at cgl.ucsf.edu
Fri Dec 10 16:35:45 PST 2021
On Dec 10, 2021, at 3:46 PM, Ralph Loring <rhloring at gmail.com> wrote:
> Hi,
> I don't know if there's a special user group for Matchmaker, but this also involves other aspects of Chimera.
> 1) I'm looking at AlphaFold structures of RIC-3, a chaperone for nicotinic receptors that is also an intrinsically disordered protein. There are PDB files for six species in the AlphaFold database: human, mouse, rat, fish, fly, and worm. These differ tremendously in sequence, size, and predicted secondary structure, but they all have a putative signal sequence and a transmembrane domain. I used Matchmaker to align the human structure to each of the others. The result of combining all 6 looks like a bowl of spaghetti until the parts before the putative signal sequence and the C-terminal tails are deleted. Then I can see that the putative signal sequences and the transmembrane domains lie in the same plane, as if they were in a membrane. I just want to confirm that this is merely a lucky coincidence and that Matchmaker has no preference for aligning hydrophobic alpha helices into a planar space corresponding to a membrane. The coincidence, if it is one, is striking and I just want to make sure. Note that I have a similar question for AlphaFold, the source of the PDB files.
> 2) Does Chimera do Global Distance Test (GDT) scoring? For that matter, does ChimeraX (I've been thinking about switching, and if it does, this may be what it takes to get over the activation energy barrier)? I can't find GDT anywhere in the documentation. When Matchmaker runs, for instance comparing human and fly RIC-3, the sequence alignment score is only 270 but the RMSD is 1.6 angstroms (but only for 7 pruned atom pairs; when all 233 atom pairs are considered, the RMSD is 55 angstroms, so clearly there is not much overlap). I've heard that GDT compensates somewhat better than RMSD for imprecise superimposition. I don't know whether it would make much difference in this case, but I'd like to find out. The mystery is that Drosophila RIC-3 works almost as well as human RIC-3 in helping assemble human pentameric nicotinic receptors even though the structures are so different.
> Thanks for your help,
> Ralph Loring
Hi Ralph,
You got the right address, at least if you're using Chimera (not ChimeraX).
If a protein is intrinsically disordered, then the AlphaFold prediction is probably very low-confidence in the disordered regions, so it will not be very meaningful to include those parts in any structural superposition. Did you get the predictions from the AlphaFold database? ChimeraX (not Chimera) can automatically fetch the structures from the AlphaFold database and color each part by confidence, where orange to red means very low confidence.
Here is the ChimeraX AlphaFold documentation:
<https://rbvi.ucsf.edu/chimerax/docs/user/tools/alphafold.html>
Both Chimera and ChimeraX have matchmaker tools and commands. MatchMaker fitting is rigid. If you have multidomain proteins with flexible linkers between them, it is highly unlikely you could superimpose all the domains well at the same time. Flexible linker regions in AlphaFold predictions also tend to have very low confidence values.
The matchmaker algorithm is fully described in the help (shown if you click the Help button on the dialog or use Chimera command "help matchmaker") and a paper cited therein also describes how it was tested.
<https://www.rbvi.ucsf.edu/chimera/docs/ContributedSoftware/matchmaker/matchmaker.html>
Tools for integrated sequence-structure analysis with UCSF Chimera. Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339.
<https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-339>
However, I'll try to answer your questions here, which requires describing the method somewhat.
For each superposition, Matchmaker simply makes a sequence alignment and then does a least-squares fit of the alpha carbons of residues in the same column of the sequence alignment. By default it also does iterative pruning, so that spatially far-apart pairs are eliminated and you get a tighter fit on the pairs that are left. That way you may get a better superposition of the spatially conserved core while floppy parts go their own ways. I had never heard of GDT before, but as I understand it, Matchmaker does not use it. Matchmaker doesn't try to put hydrophobic segments in a membrane-like region, but a sequence alignment generally scores better when like or similar types of residues are aligned in the same columns.
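To make the fit-then-prune idea concrete, here is a minimal Python/NumPy sketch. It is not Matchmaker's code: it assumes the coordinates are alpha carbons already paired by a sequence alignment, fits them by least squares (the Kabsch SVD method), and then repeatedly drops the farthest 10% of pairs and refits until no kept pair is more than 2.0 angstroms apart. The 10% step, the 2.0-angstrom cutoff, and the synthetic coordinates are assumptions for illustration; the Matchmaker help page gives the actual pruning rule and defaults.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rotation R and translation t so that P @ R.T + t best matches Q.

    P and Q are (N, 3) arrays of paired coordinates, e.g. alpha carbons of
    residues that share a column in the sequence alignment.
    """
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                  # 3x3 covariance of the centered pairs
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would mean a reflection; correct it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    return R, t


def rmsd(A, B):
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def prune_fit(P, Q, cutoff=2.0, frac=0.1, iterate=True):
    """Fit P onto Q, then repeatedly drop the farthest pairs and refit.

    Stops when no kept pair is farther apart than `cutoff` (angstroms) or only
    3 pairs remain. The 10%-per-cycle rule is an assumption for illustration.
    """
    keep = np.arange(len(P))
    while True:
        R, t = kabsch(P[keep], Q[keep])
        dist = np.linalg.norm(P[keep] @ R.T + t - Q[keep], axis=1)
        if not iterate or dist.max() <= cutoff or len(keep) <= 3:
            return R, t, keep
        n_drop = min(max(1, int(len(keep) * frac)), len(keep) - 3)
        keep = keep[np.argsort(dist)[: len(keep) - n_drop]]   # keep the closest pairs


# Synthetic example: a rigidly moved 40-residue core plus a 10-residue tail moved 20 A.
rng = np.random.default_rng(0)
Q = rng.normal(scale=8.0, size=(50, 3))        # reference CA coordinates
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([5.0, -3.0, 2.0])      # same coordinates, rotated and translated
P[:40] += rng.normal(scale=0.3, size=(40, 3))  # small noise on the conserved core
P[40:] += np.array([20.0, 0.0, 0.0])           # the "floppy" tail goes its own way

R, t, keep = prune_fit(P, Q)
moved = P @ R.T + t
print(f"{len(keep)} pruned pairs: RMSD {rmsd(moved[keep], Q[keep]):.2f} A")
print(f"all {len(P)} pairs: RMSD {rmsd(moved, Q):.2f} A")
```

The pruned-pair RMSD comes out small while the all-pairs RMSD is large because the tail was moved away, the same kind of contrast as 1.6 angstroms over 7 pairs versus 55 angstroms over 233 pairs in the question. Passing iterate=False fits on every pair at once without pruning.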
There are lots of user-adjustable parameters in every step of the algorithm. We chose defaults based on our tests detailed in the paper, but one can always find cases where different values work better, and you could try changing them.
NOTE: If the results look like spaghetti and/or only very few pairs are used in the final fit, the result may not be meaningful. I wouldn't try to read much into it if the structural match looks poor.
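Since GDT came up: GDT_TS is commonly defined as the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of alpha carbons that lie within that cutoff of their counterparts after superposition. The full method searches many superpositions to maximize each percentage. The plain-Python sketch below only scores one existing superposition, such as a Matchmaker result, so it gives a lower bound on the true GDT_TS. The helper name gdt_ts_fixed and the coordinates are made up for illustration.

```python
import math


def gdt_ts_fixed(moved, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for one fixed superposition (hypothetical helper).

    moved, ref: equal-length lists of (x, y, z) CA coordinates, paired by index.
    Returns the score (0-100) and the fraction of residues within each cutoff.
    """
    dists = [math.dist(a, b) for a, b in zip(moved, ref)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions), fractions


def rmsd(moved, ref):
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(moved, ref)) / len(ref))


# Made-up example: 10 residues, most close to the reference, two far away.
offsets = [0.5, 0.5, 0.5, 1.5, 1.5, 3.0, 3.0, 6.0, 20.0, 40.0]
ref = [(3.8 * i, 0.0, 0.0) for i in range(10)]
moved = [(x, y + d, z) for (x, y, z), d in zip(ref, offsets)]

score, fractions = gdt_ts_fixed(moved, ref)
print(f"GDT_TS-like score {score:.1f}, fractions {fractions}, RMSD {rmsd(moved, ref):.1f} A")
```

Here the two badly placed residues push the RMSD to about 14 angstroms, while the GDT-style score still reports that most residues are within a few angstroms. That insensitivity to a few outliers is the main advantage usually claimed for GDT over RMSD; if few residues actually match, the score will simply be low.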
Of course, if two structures don't share domains that have both the same fold and sufficient sequence similarity for the sequence alignment step to work, you won't get a useful result.
A few more points:
(1) the sequence alignment uses two kinds of scores added together, one based on amino acid type (your choice of several standard amino acid similarity matrices) and another based on secondary structure (i.e. a better score for aligning helix with helix and strand with strand than for other combinations of helix, strand, and coil). You can adjust the relative weights of these two terms (even setting one weight to zero so that only the other term is used), choose the amino acid similarity matrix, change the values in the secondary-structure scoring matrix to whatever you like, and control the gap penalties. There is an option to show each sequence alignment, which may help in assessing whether the results mean anything.
(2) you can turn off iteration and simply fit on the whole sequence alignment if you want. It may or may not help given your specific structures. Probably not if these are multidomain or intrinsically disordered.
(3) each least-squares fit is pairwise between 2 structures only. However, you can get multiple structures superimposed by using a consistent reference structure. E.g. if you have structures A-F, use the same one, say D, as the reference to which each of the others is matched in a pairwise fit.
I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco