MatchMaker
MatchMaker superimposes protein or nucleic acid structures
by first creating pairwise sequence alignments,
then fitting the aligned residue pairs.
Residue types and/or secondary structure information
can be used to create the initial sequence alignments.
Fitting uses one point per residue.
Optionally, a structure-based multiple sequence alignment can be computed
after the structures have been superimposed.
Note: if the pairing of residue numbers between the two structures
is already known, the structures can instead be superimposed directly
with the command match.
See superimposing
structures for a discussion of the different methods available in Chimera.
See also:
Match -> Align, Multalign Viewer,
the Superpositions and Alignments tutorial, and
Tools for integrated sequence-structure analysis with UCSF Chimera.
Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE.
BMC Bioinformatics. 2006 Jul 12;7:339.
There are several ways to start
MatchMaker, a tool in the Structure Comparison category.
MatchMaker is also implemented as the command
mmaker
(or matchmaker).
The MatchMaker dialog is organized by the main steps to be performed:
- generating pairwise sequence alignments
- matching, i.e.,
superimposing the structures according to those pairwise alignments
- optionally, creating a multiple sequence alignment
from the structural superposition
Save settings writes the current MatchMaker parameters to the
preferences
file. Reset to defaults resets the dialog to the factory default
parameter settings without changing any preferences.
Clicking OK or Apply will start the calculations
with or without closing the dialog, respectively.
Sequence alignment scores, parameter values, and structure RMSDs will be
reported in the Reply Log.
Cancel simply closes the dialog, while Help opens
this manual page in a browser window.
Chain pairing options:
- Best-aligning pair of chains between reference and match structure
(default)
- One reference structure and one or more structures to match
should be chosen. For each structure to be matched, the
reference-match pair of chains with the highest
sequence alignment score will be used.
- Specific chain in reference structure
with best-aligning chain in match structure
- One reference chain and one or more structures to match should be chosen.
Individual lines or blocks of lines can be chosen with the left
mouse button; Ctrl-click toggles the status of a line.
For each structure to be matched,
the chain that aligns to the reference chain with the highest
sequence alignment score will be used.
- Specific chain(s) in reference structure
with specific chain(s) in match structure
- One or more reference chains should be chosen from the list.
For each reference chain chosen, one chain to be matched should
be chosen from the corresponding pulldown menu. To match multiple chains
to the same reference chain, match them in separate steps
(choosing each chain to match and then clicking
Apply). A given chain cannot be matched to two different
reference chains simultaneously, and chains from the same structure
(molecule model)
cannot simultaneously serve as a reference chain and a chain to match.
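The first two pairing modes amount to taking a maximum over pairwise alignment scores. The following is a minimal sketch, not Chimera's implementation: the function names are hypothetical, `score` stands in for whatever computes the sequence alignment score of a reference chain against a match chain, and the numbers in the toy table are made up.

```python
def best_pair(ref_chains, match_chains, score):
    """Best-aligning pair: maximize over every reference/match combination."""
    return max(
        ((r, m) for r in ref_chains for m in match_chains),
        key=lambda pair: score(*pair),
    )


def best_match_for(ref_chain, match_chains, score):
    """Specific reference chain: maximize over the match chains only."""
    return max(match_chains, key=lambda m: score(ref_chain, m))


# Toy alignment scores (made-up numbers, for illustration only).
toy_scores = {
    ("A", "X"): 310.5, ("A", "Y"): 120.0,
    ("B", "X"): 95.2, ("B", "Y"): 402.8,
}


def toy_score(ref_chain, match_chain):
    return toy_scores[(ref_chain, match_chain)]


print(best_pair(["A", "B"], ["X", "Y"], toy_score))
print(best_match_for("A", ["X", "Y"], toy_score))
```

In the third mode no maximization is needed, since the user names both chains of every pair.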
Alignment algorithm:
- Needleman-Wunsch (default) - global
- Smith-Waterman - local
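The difference between global and local alignment can be seen in a small dynamic-programming sketch. This is a deliberately simplified illustration: it scores identities only (match/mismatch) and uses a single linear gap penalty, whereas MatchMaker uses a substitution matrix and separate gap opening and extension penalties. The function name and parameter values are assumptions made for this example.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of sequences a and b.

    local=False: Needleman-Wunsch style (global, end to end).
    local=True:  Smith-Waterman style (best-scoring local segment).
    Simplified: identity scoring and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                cell = max(cell, 0)          # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("XXACGTXX", "ACGT"))              # global
print(alignment_score("XXACGTXX", "ACGT", local=True))  # local
```

The global score is penalized for the unmatched flanking residues, while the local score reflects only the shared segment.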
Sequence alignment scoring can include a residue similarity term,
a secondary structure term, and gap penalties.
- Matrix (default BLOSUM-62) - which
substitution matrix
to use for the residue similarity part of the score.
If an amino acid matrix is chosen, only peptide sequences
will be aligned; if a nucleic acid matrix is chosen, only
nucleic acid sequences will be aligned. An error message will appear
if there are no reference-match pairs of the appropriate type.
- Gap penalties: When secondary structure scoring
is included, the secondary-structure-specific Gap opening penalties
(Intra-helix, Intra-strand, Any other)
are used instead of the single Gap opening penalty.
The same Gap extension penalty is used, however.
- Include secondary structure score (N%)
(default on and N=30)
- whether to include a secondary structure term in the score,
and at what weight relative to the residue similarity term.
Show parameters reveals the secondary structure scoring parameters.
N reflects the relative weights of the terms,
which can be adjusted by moving the slider.
If the weight is 30%, for example,
total score = 0.70 × (residue similarity score) + 0.30 × (secondary structure score) - gap penalties
However, setting the weight to 0% is not the same
as turning the option off (see Notes).
The values in the secondary structure Scoring matrix
(for all pairwise combinations of H helix, S strand, and
O other) and the secondary-structure-specific
Gap opening penalties can be adjusted.
Reset secondary structure scoring parameters to defaults
can be used to restore the default values of all secondary structure
scoring parameters.
- Compute secondary structure assignments
(available when
secondary structure scoring is used; default on)
- whether to first identify helices and strands by running the
ksdssp algorithm,
overwriting any pre-existing secondary structure assignments.
Recomputing can be worthwhile even when assignments already exist,
because applying consistent criteria to all structures tends to improve the results:
pre-existing secondary structure assignments may have been determined
using different methods or different parameters for different structures.
Ksdssp
parameter defaults can be adjusted with the
compute SS
dialog (opened from the
Model Panel).
- Show pairwise alignment(s) (default off)
- whether to display the resulting pairwise
reference-match sequence alignments; each will be shown in a separate
Multalign Viewer window.
When fit iteration is employed,
the pairs used in the final fit will be shown in the alignment as a
region
(colored boxes) named matched residues. The
header
named RMSD is automatically shown and other headers hidden.
This line shows the spatial variation among residues associated with
a column. In the pairwise case, the value is simply the distance between
atoms in the two residues associated with a column.
These pairwise sequence alignments can be considered a by-product
of superposition.
Successful superposition only requires these alignments to be partly
correct, as incorrect portions tend to be omitted during
fit iteration.
If the sequences are easy to align (highly similar),
the sequence alignments are likely to be correct throughout.
However, if the sequences are more distantly related,
parts of the alignments may be incorrect even when a successful
superposition is produced. In those cases, a
structure-based alignment should be superior.
Fitting uses one point per residue: CA atoms in amino acids and
C4' atoms in nucleic acids.
If a nucleic acid residue lacks a C4' atom
(some lower-resolution structures are P traces),
its P atom will be paired with the P atom of the aligned residue.
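The choice of fitting point can be summarized in a few lines. This is a sketch under stated assumptions: residues are represented as plain dictionaries (a hypothetical structure, not Chimera's residue objects), and returning `None` when a required atom is missing or the residue kinds differ is a choice made for this example rather than documented behavior.

```python
def fit_atom_names(res_a, res_b):
    """Return the atom names to pair for two aligned residues, or None.

    Each residue is a dict: {"kind": "amino" or "nucleic", "atoms": set of names}.
    """
    if res_a["kind"] != res_b["kind"]:
        return None
    if res_a["kind"] == "amino":
        name = "CA"
    else:
        # Fall back to P for both residues if either one lacks C4'.
        both_have_c4 = "C4'" in res_a["atoms"] and "C4'" in res_b["atoms"]
        name = "C4'" if both_have_c4 else "P"
    if name in res_a["atoms"] and name in res_b["atoms"]:
        return (name, name)
    return None  # assumption: pairs lacking the needed atom are skipped


ala = {"kind": "amino", "atoms": {"N", "CA", "C", "O", "CB"}}
gly = {"kind": "amino", "atoms": {"N", "CA", "C", "O"}}
nuc_full = {"kind": "nucleic", "atoms": {"P", "C4'", "N1"}}
nuc_trace = {"kind": "nucleic", "atoms": {"P"}}  # P-trace residue

print(fit_atom_names(ala, gly))
print(fit_atom_names(nuc_full, nuc_trace))
```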
Iterate by pruning long atom pairs until no pair exceeds [x]
angstroms
(default on and x=2.0)
- whether to iteratively remove far-apart residue pairs from
the "match list" used to superimpose the structures. This does not
change the initial sequence alignment, but restricts which columns of
that alignment will be used in the final fit.
If iteration is turned off, all of the columns containing both sequences
(i.e., without a gap) will be used. In each cycle of iteration,
atom pairs are removed from the match list and the remaining
pairs are fitted, until no matched pair is
more than x Å apart.
The atom pairs removed are either the 10% farthest apart of all pairs
or the 50% farthest apart of all pairs exceeding the cutoff, whichever
is the lesser number of pairs.
Iteration tends to exclude sequence-aligned but conformationally dissimilar
regions such as flexible loops, allowing a tighter fit of the
best-matching "core" regions.
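The pruning rule can be sketched as follows. Several details are assumptions of this example rather than statements about Chimera's code: fractional counts are truncated, at least one pair is removed per cycle so that the loop terminates, iteration stops if fewer than three pairs would remain, and the "fit" is a translation-only centroid match standing in for a full rigid-body least-squares superposition.

```python
import math


def pairs_to_prune(distances, cutoff):
    """Positions of the pairs to drop this cycle (empty when none exceed cutoff)."""
    over = [i for i, d in enumerate(distances) if d > cutoff]
    if not over:
        return []
    # Lesser of: 10% of all pairs, or 50% of the pairs exceeding the cutoff.
    # Truncation and the floor of one pair are assumptions of this sketch.
    count = max(1, min(int(0.10 * len(distances)), int(0.50 * len(over))))
    farthest_first = sorted(over, key=lambda i: distances[i], reverse=True)
    return farthest_first[:count]


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def iterate_fit(ref, match, cutoff=2.0, min_pairs=3):
    """Prune and refit until no retained pair is more than cutoff apart.

    Returns (translation, kept_indices). Translation-only fit (simplification).
    """
    kept = list(range(len(ref)))
    while True:
        c_ref = centroid([ref[i] for i in kept])
        c_match = centroid([match[i] for i in kept])
        shift = tuple(c_ref[k] - c_match[k] for k in range(3))
        dists = [
            math.dist(ref[i], tuple(match[i][k] + shift[k] for k in range(3)))
            for i in kept
        ]
        drop = set(pairs_to_prune(dists, cutoff))
        if not drop or len(kept) - len(drop) < min_pairs:
            return shift, kept
        kept = [idx for pos, idx in enumerate(kept) if pos not in drop]


# Toy data: eight well-matched pairs plus two pairs from a displaced "loop".
ref = [(float(i), 0.0, 0.0) for i in range(10)]
match = [(i + 5.0, 0.0 if i < 8 else 20.0, 0.0) for i in range(10)]
shift, kept = iterate_fit(ref, match)
print(shift, kept)
```

In the toy data the two displaced pairs initially pull the fit away from the other eight; once they are pruned, the remaining pairs superimpose exactly.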
The entire model to be matched is reoriented, regardless of which
of its chains are aligned in sequence to the reference.
If MatchMaker is used simply to superimpose structures,
the structure-based alignment step described next can be omitted.
However, if one also wants a corresponding structure-based
sequence alignment, this step is recommended,
especially if the sequences are dissimilar.
After superposition, compute structure-based
multiple sequence alignment (default off)
- call Match -> Align
to generate a sequence alignment consistent with the superposition.
If not called with this option,
Match -> Align
can still be started later independently.
Calculating a structure-based alignment can take several minutes,
depending on the number of structures, but there are advantages:
- it can provide better statistics for describing structural similarity
(RMSD, etc.) because more alignment columns are correct
- it can produce a multiple sequence alignment, whereas
the initial sequence alignments are only pairwise
The output sequence alignment is automatically shown in
Multalign Viewer and can be
saved to a file
from that tool. The fully populated columns are highlighted as a
region
(colored boxes). Clicking the region will select the corresponding parts
of the structures, in effect their common cores.
The header
named RMSD shows the spatial variation per column.
Notes
Meaning of 0% secondary structure score.
Turning off Include secondary structure score
is not the same as moving the slider to zero with the option turned on.
When the option is on:
- The secondary-structure-specific gap opening penalties are used
regardless of the slider position.
- If Compute secondary structure assignments
is also turned on,
ksdssp is run and
may change pre-existing secondary structure assignments.
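The distinction between "off" and "on at 0%" can be made concrete with a small sketch of the score weighting and gap-opening selection. The class and function names are hypothetical, the penalty values are illustrative placeholders rather than values taken from this page, and the assumption that the residue similarity term has full weight when the option is off is made for this example.

```python
from dataclasses import dataclass


@dataclass
class ScoringOptions:
    include_ss: bool = True        # "Include secondary structure score"
    ss_weight: float = 0.30        # slider value N / 100
    gap_open: float = 12.0         # illustrative placeholder
    gap_open_helix: float = 18.0   # illustrative placeholder
    gap_open_strand: float = 18.0  # illustrative placeholder
    gap_open_other: float = 6.0    # illustrative placeholder


def column_score(opts, residue_score, ss_score):
    """Weighted score for one aligned column, before gap penalties."""
    if not opts.include_ss:
        return residue_score  # assumption: full weight on residue similarity
    return (1.0 - opts.ss_weight) * residue_score + opts.ss_weight * ss_score


def gap_open_penalty(opts, ss_type):
    """Gap opening penalty at a position of type 'H', 'S', or 'O'."""
    if not opts.include_ss:
        return opts.gap_open
    by_type = {
        "H": opts.gap_open_helix,
        "S": opts.gap_open_strand,
        "O": opts.gap_open_other,
    }
    return by_type[ss_type]


off = ScoringOptions(include_ss=False)
zero = ScoringOptions(include_ss=True, ss_weight=0.0)

# Same column score in both cases, but different gap opening penalties.
print(column_score(off, 4, 6), column_score(zero, 4, 6))
print(gap_open_penalty(off, "H"), gap_open_penalty(zero, "H"))
```

With the option on at 0%, the column score reduces to the residue similarity term, yet gaps inside helices and strands are still penalized with the secondary-structure-specific values, so the resulting alignment can differ from the one obtained with the option off.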
UCSF Computer Graphics Laboratory / July 2009