Alignments Tutorial

In this tutorial, MatchMaker is used to align protein structures (create a superposition) and Match -> Align is used to generate a multiple sequence alignment from the structural superposition. Sequence alignments are displayed in Multalign Viewer, which is covered in more detail in the Sequences and Structures tutorial.

On Windows/Mac, click the Chimera icon; on UNIX, start Chimera from the system prompt:

unix: chimera

A basic Chimera window should appear after a few seconds; resize it as desired. Open the Command Line (choosing Tools... General Controls... Command Line is one way).

Choose Favorites... Add to Favorites/Toolbar to place some icons on the toolbar. This opens the Tools section of the preferences, which recapitulates Chimera's Tools menu. In the On Toolbar column, check the boxes for:

Side View (under Viewing Controls)
MatchMaker (Structure Comparison)
Match -> Align (Structure Comparison)

and if desired, additional tools. If you would like the settings to apply to subsequent uses of Chimera, click Save before closing the preferences. Toolbar location/orientation can be controlled in the General preferences.

In this tutorial, we will align the structures of three distantly related glycoside hydrolases, also used in the Images for Publication tutorial (see scientific background). If you have internet connectivity, the structures can be obtained directly from the Protein Data Bank:

Command: open 1uyp
Command: open 1gyd
Command: open 1oyg

If you do not have internet connectivity, download the files included with this tutorial (1uyp.pdb, 1gyd.pdb, and 1oyg.pdb) into your working directory and then open them in that order as local files (for example, with File... Open).

Move and scale the structures with the mouse in the graphics window and the Side View as desired throughout the tutorial.

Some salient features of the structures:

PDB ID	enzyme	family	chains	conserved residues
1uyp	invertase, T. maritima	GH32	A-F	Asp17	Asp138	Glu190
1gyd	arabinanase A, C. japonicus	GH43	B	Asp38	Asp158	Glu221
1oyg	levansucrase, B. subtilis	GH68	A	Asp86	Asp247	Glu342

Simplify the display by deleting unwanted (for our purposes) portions of the structures such as extra chains and solvent, and then displaying just the alpha-carbon traces.

Command: delete #0:.b-f
Command: delete solvent | ions
Command: chain @ca
Command: linewidth 2

Superimposing the structures will allow examination of their commonalities and differences. The residue numbers listed above (obtained by reading the literature) could be used with the match command to specify atoms to use in a least-squares fit. However, such information is frequently not handy when one has a set of different but related structures to compare.

MatchMaker addresses this situation by first constructing a sequence alignment and then using the alpha-carbons of aligned residue pairs to match the structures. Click the icon for MatchMaker. Lacking specific knowledge of which structure might be the best Reference structure, just use the first one opened, 1uyp. The options for Chain pairing are all equivalent, because at this point, each structure has just one chain.

A variety of parameters control the sequence alignment step:

alignment algorithm: global (Needleman-Wunsch, default) or local (Smith-Waterman)
alignment scoring:
- what substitution matrix should be used in the residue similarity term (default BLOSUM-62)
- whether the score should include a secondary structure term (default: yes, with 30% weighting of this term and thus 70% weighting of the residue similarity term)
- gap penalties
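As a rough illustration of how these parameters interact, the sketch below runs a Needleman-Wunsch global alignment whose pair score mixes a residue similarity term with a secondary structure term. It is not MatchMaker's implementation: the function names, the toy match/mismatch values standing in for a BLOSUM matrix, the toy secondary structure scores, and the single linear gap penalty are all assumptions made for brevity.

```python
def residue_score(x, y):
    # Toy stand-in for a substitution matrix such as BLOSUM-62 (assumption).
    return 4.0 if x == y else -1.0


def ss_score(x, y):
    # Toy secondary structure term: reward identical types, penalize
    # different ones (values are assumptions, not Chimera's).
    return 6.0 if x == y else -6.0


def combined_score(res, ss, ss_fraction=0.3):
    # With ss_fraction = 0.3 the residue term is weighted 70%.
    return (1.0 - ss_fraction) * res + ss_fraction * ss


def global_align(seq_a, seq_b, ss_a, ss_b, ss_fraction=0.3, gap=-2.0):
    """Global alignment with a linear gap penalty (a simplification)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = combined_score(
                residue_score(seq_a[i - 1], seq_b[j - 1]),
                ss_score(ss_a[i - 1], ss_b[j - 1]),
                ss_fraction,
            )
            choices = [
                (score[i - 1][j - 1] + pair, "D"),  # align the two residues
                (score[i - 1][j] + gap, "U"),       # gap in seq_b
                (score[i][j - 1] + gap, "L"),       # gap in seq_a
            ]
            # max returns the first best choice, so ties prefer "D".
            score[i][j], move[i][j] = max(choices, key=lambda c: c[0])
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif step == "U":
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

Setting ss_fraction to 0.0 in this sketch corresponds to turning off the secondary structure term, leaving a purely sequence-based score.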

Click Apply to use the default settings. The two pairwise sequence alignments will appear, and then the structures will be matched using those alignments. In the sequence alignments, orange boxes show the residues used in the final iteration of structure fitting. Keep the sequence alignments for now.

Refocus the view; show and select key conserved residues to help evaluate the fit:

Command: focus
Command: alias key #0:17,138,190 #1:38,158,221 #2:86,247,342
Command: rep sphere key
Command: select key

[Figure: structures superimposed correctly]

The atom specification in the aliasing command should not contain spaces except (optionally) before the model symbols #. As explained elsewhere, chain specifications are not required; although :17.a could be used, :17 is sufficient.
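When the key residues come from a table, the atom specification can be assembled programmatically rather than typed by hand. The following is a small sketch outside Chimera; the dictionary layout and the function name build_atom_spec are hypothetical, and the residue numbers are those listed in the table above.

```python
# Model number -> key residue numbers, taken from the table in this tutorial.
KEY_RESIDUES = {
    0: [17, 138, 190],   # 1uyp
    1: [38, 158, 221],   # 1gyd
    2: [86, 247, 342],   # 1oyg
}


def build_atom_spec(residues_by_model):
    """Return a spec such as '#0:17,138 #1:38' with no spaces inside a model's part."""
    parts = []
    for model in sorted(residues_by_model):
        numbers = ",".join(str(n) for n in residues_by_model[model])
        parts.append("#%d:%s" % (model, numbers))
    # Spaces appear only before the model symbol '#'.
    return " ".join(parts)


print("alias key " + build_atom_spec(KEY_RESIDUES))
```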

The selection is highlighted (by default, with green outlines) on the structures and sequence alignments. The green boxes around the selected residues in the sequence alignments make it easy to see if they have been paired correctly. The correct pairings are indicated by columns in the table above. Mousing over a residue in a sequence alignment shows the corresponding structure residue number at the bottom of the sequence window.
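The same check can be done on a saved alignment without the viewer. The sketch below counts how many key residue pairs fall in the same alignment column. It assumes gapped sequences that use "-" for gaps and residue numbering that is contiguous from a known first residue number, which real PDB files do not guarantee; the function names are hypothetical.

```python
def column_of(gapped, first_number, residue_number):
    """Return the alignment column holding residue_number, or None if absent."""
    number = first_number
    for col, ch in enumerate(gapped):
        if ch == "-":
            continue  # gaps do not consume a residue number
        if number == residue_number:
            return col
        number += 1
    return None


def count_correct_pairs(gapped_a, first_a, gapped_b, first_b, key_pairs):
    """Count (residue_a, residue_b) pairs that share an alignment column."""
    correct = 0
    for res_a, res_b in key_pairs:
        col_a = column_of(gapped_a, first_a, res_a)
        col_b = column_of(gapped_b, first_b, res_b)
        if col_a is not None and col_a == col_b:
            correct += 1
    return correct
```

For the structures in this tutorial, key_pairs for 1uyp versus 1gyd would be (17, 38), (138, 158), and (190, 221).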

If the selection is accidentally changed or erased, just select the key residues again:

Command: select key

The green and orange boxes show that all three key residues are paired correctly in the sequence alignment of 1uyp (white) and 1gyd (magenta) and used in the final fit iteration, while only the second two are paired correctly and used in the final fit of 1uyp and 1oyg (cyan). The structural superposition is still essentially correct (see the figure) because MatchMaker's option to Iterate by pruning excluded misaligned or poorly superimposed parts of the structures from the final fits. Match statistics are given in the Reply Log (Tools... Utilities... Reply Log). Quit from each pairwise alignment.
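The idea behind iterating by pruning can be sketched as follows. This is a minimal illustration, not Chimera's code: fit is a caller-supplied superposition routine (the usage note substitutes a translation-only fit for a real least-squares fit), and the pruning schedule, dropping per cycle the lesser of 10% of all pairs or 50% of the pairs beyond the cutoff, rounded up, is an assumption.

```python
import math


def prune_once(distances, cutoff=2.0):
    """Return indices of pairs to keep after one pruning cycle."""
    n = len(distances)
    over = [i for i in range(n) if distances[i] > cutoff]
    if not over:
        return list(range(n))
    # Assumed schedule: lesser of ceil(10% of all pairs) and
    # ceil(50% of the pairs beyond the cutoff).
    n_drop = min(-(-n // 10), -(-len(over) // 2))
    by_distance = sorted(range(n), key=lambda i: distances[i], reverse=True)
    worst = set(by_distance[:n_drop])
    return [i for i in range(n) if i not in worst]


def iterate_fit(ref, mov, fit, cutoff=2.0):
    """Fit, prune the worst pairs, and refit until all pairs are within cutoff.

    Returns (indices of pairs used in the final fit, RMSD over those pairs).
    """
    keep = list(range(len(ref)))
    while keep:
        r = [ref[i] for i in keep]
        m = fit(r, [mov[i] for i in keep])
        dists = [math.dist(a, b) for a, b in zip(r, m)]
        survivors = prune_once(dists, cutoff)
        if len(survivors) == len(keep):
            rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
            return keep, rmsd
        keep = [keep[j] for j in survivors]
    return [], 0.0


def translation_fit(ref, mov):
    # Hypothetical stand-in for a least-squares fit: match centroids only.
    n = len(ref)
    shift = [sum(p[k] for p in ref) / n - sum(p[k] for p in mov) / n
             for k in range(3)]
    return [tuple(p[k] + shift[k] for k in range(3)) for p in mov]
```

Calling iterate_fit(ref, mov, translation_fit) on two coordinate lists that agree except for one badly placed pair discards that pair and reports the fit over the remaining pairs, which is the behavior the orange boxes summarize.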

Use MatchMaker again, but uncheck Include secondary structure score (leaving other settings as their defaults) and click Apply. This is a conventional, sequence-only alignment: the Needleman-Wunsch algorithm (dynamic programming) with BLOSUM-62 scoring.

The boxes in the resulting pairwise alignments show that none of the key residues were aligned correctly or used in the final fits. The alignment failure is understandable because the sequences are not very similar to one another. Thus, structural information in addition to sequence information is helpful for superimposing this set of structures. Quit from each pairwise alignment.

The BLOSUM-62 matrix was used to score residue similarity in the preceding calculations. Using a BLOSUM matrix intended for more distantly related structures might be expected to improve the results. However, if secondary structure scoring is not used, BLOSUM-30 superimposes only one of the match structures (1oyg) correctly onto the reference structure, 1uyp. (Note that it may be necessary to select the key residues again to bring the green boxes in front of the orange boxes where they overlap.) When combined with 30% secondary structure scoring, BLOSUM-30 and BLOSUM-62 are about equally successful:

scoring parameters^a	1uyp vs. 1gyd: # of pairs, RMSD (Å)^b	1uyp vs. 1oyg: # of pairs, RMSD (Å)^b	1uyp vs. 1gyd: key residue positions aligned correctly	1uyp vs. 1oyg: key residue positions aligned correctly
BLOSUM-62, 30% secondary structure (MatchMaker defaults)	71, 1.164	76, 1.056	3	2
BLOSUM-62, no secondary structure	6, 1.276	7, 1.217	0	0
BLOSUM-30, no secondary structure	49, 1.259	33, 0.805	0	1
BLOSUM-30, 30% secondary structure	65, 1.192	78, 1.059	3	2

^a other parameters at their defaults
^b after fit iteration with cutoff 2.0 Å

Of the pairwise superpositions in this table, only those with at least one key residue position aligned correctly in the sequence alignment are correct by visual inspection.

Before proceeding with the tutorial, make sure all three structures are superimposed in a reasonably correct way, such as in the figure. Cancel the MatchMaker dialog and Quit from any existing sequence alignments.

Whereas MatchMaker produces only pairwise sequence alignments, Match -> Align can construct a multiple sequence alignment from a superposition of structures. It does not matter how a superposition was created, and only the distances between alpha-carbons are used, not residue types. Click the icon for Match -> Align and click OK to use the default settings. With these parameters, only sets of residues whose alpha-carbons are within 5.0 Å of each other can be placed in the same column of the output sequence alignment.
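The distance criterion can be illustrated with a short sketch that uses only alpha-carbon coordinates, never residue types. The text does not spell out whether "within 5.0 Å of each other" means every pair or just one neighbor, so the sketch exposes both readings through a hypothetical require_all flag; the function name and flag are assumptions, not Match -> Align options.

```python
import math


def can_share_column(ca_coords, cutoff=5.0, require_all=True):
    """Decide whether residues (one alpha-carbon position per structure)
    are close enough to occupy a single alignment column."""
    n = len(ca_coords)
    if n < 2:
        return True
    # near[i][j] is True when residues i and j are within the cutoff.
    near = [
        [i != j and math.dist(ca_coords[i], ca_coords[j]) <= cutoff
         for j in range(n)]
        for i in range(n)
    ]
    if require_all:
        # Every residue must be within the cutoff of every other one.
        return all(near[i][j] for i in range(n) for j in range(n) if i != j)
    # Looser reading: each residue is within the cutoff of at least one other.
    return all(any(row) for row in near)
```

Three alpha-carbons spaced 4 Å apart along a line pass the looser test but fail the stricter one, because the two end positions are 8 Å apart.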

The calculation may take several seconds. When the new sequence alignment appears, check it to see if the key residues have been aligned correctly. Because fit iteration in MatchMaker improves superpositions by excluding wrong portions of the initial pairwise alignments, sequence alignments subsequently created with Match -> Align generally have more positions aligned correctly than the initial alignments.

The sequence alignment can be saved to a file by choosing File... Save As from the Multalign Viewer (alignment window) menu.

Also in Multalign Viewer:

Use Tools... Percent Identity to compute pairwise sequence identities (<20% for these glycoside hydrolases).
Choose Structure... Select by Conservation and move the sliders to select only the residues that all three have in common (100% mavPercentConserved), then click OK.

Display these residues:

Command: disp sel
Command: rep bs sel
Command: ~sel
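The Percent Identity and Select by Conservation steps above can also be approximated on a saved alignment. The sketch below is an assumption-laden illustration rather than Multalign Viewer's code: it treats "-" as the only gap character, divides identities by the length of the shorter ungapped sequence (one of several possible denominators), and reports columns in which every sequence has the same residue.

```python
def percent_identity(a, b):
    """Percent identity of two gapped sequences of equal aligned length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # Assumed denominator: the shorter of the two ungapped sequences.
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    return 100.0 * identical / shorter if shorter else 0.0


def fully_conserved_columns(seqs):
    """Return indices of columns where all sequences share one residue."""
    return [
        i for i, column in enumerate(zip(*seqs))
        if "-" not in column and len(set(column)) == 1
    ]
```

Applied to the three-sequence alignment from this tutorial, fully_conserved_columns would list the positions that the 100% conservation setting selects, under the assumptions stated above.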

When finished, end the Chimera session:

Command: stop

meng@cgl.ucsf.edu / January 2008