Visualizing Deep Mutational Scan Scores in ChimeraX

Tom Goddard
June 10, 2024

Here are some commands added to ChimeraX to visualize deep mutational scan data. This data consists of scores obtained from experimental assays for almost all possible mutations of every residue in a protein. The ideas here were developed with Willow Coyote-Maestas, Matt Howard, and Aashish Manglik at UCSF. We show the data by coloring atomic models, labeling residues, and showing histograms, scatter plots, and UMAP projections.

Example

We will look at deep mutational scan data for the proton-sensing G protein-coupled receptor GPR68, described in this bioRxiv preprint:

Molecular basis of proton-sensing by G protein-coupled receptors
Matthew K. Howard, Nicholas Hoppe, Xi-Ping Huang, Christian B. Macdonald, Eshan Mehrotra, Patrick Rockefeller Grimes, Adam Zahm, Donovan D. Trinidad, Justin English, Willow Coyote-Maestas, Aashish Manglik

Here are the scores gpr68_scores_processed.csv as a comma-separated values file and an AlphaFold predicted structure for the protein ClassA_ogr1_human_Active_AF_2022-08-16_GPCRdb.pdb. The first few lines of the scores file, showing a header of field names and lines for 3 mutations of residue alanine 122, look like this:

hgvs,SE_ph55,epsilon_ph55,score_ph55,SE_ph65,epsilon_ph65,score_ph65,SE_surface,epsilon_surface,score_surface,pos,len,mutation_type,variants,is.wt
p.(A122A),0.118655667,6.98E-21,-0.153383837,0.454753988,1.11E-16,0.338088558,0.080569858,1.02E-75,-0.335911984,122,1,S,A,TRUE
p.(A122C),0.381316703,5.40E-19,-1.801114,0.919180067,2.22E-16,-0.190261006,1.127075425,4.47E-45,0.50327328,122,1,M,C,FALSE
p.(A122D),0.220801473,3.72E-50,-1.445750881,0.482410836,1.11E-16,-0.053051142,0.25624828,6.47E-70,-2.255614715,122,1,M,D,FALSE
...
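
For readers who want to work with the scores outside ChimeraX, here is a minimal Python sketch of reading such a file. It is not how ChimeraX parses the data. The function name read_scores, the trimmed sample text, and the treatment of empty or "NA" entries as missing scores are all assumptions made for illustration.

```python
import csv
import io
import re

# Two rows copied from the listing above, trimmed to a few columns for brevity.
SAMPLE = """hgvs,score_ph55,score_ph65,score_surface,pos,mutation_type
p.(A122A),-0.153383837,0.338088558,-0.335911984,122,S
p.(A122C),-1.801114,-0.190261006,0.50327328,122,M
"""

# Matches names such as p.(A122C): wild-type letter, position, new letter.
HGVS = re.compile(r"p\.\(([A-Z])(\d+)([A-Z])\)")

def read_scores(text, column):
    """Return {(position, from_aa, to_aa): score} for one score column."""
    scores = {}
    for row in csv.DictReader(io.StringIO(text)):
        match = HGVS.fullmatch(row["hgvs"])
        # Assumption: an empty or "NA" entry means the score is missing.
        if match is None or row[column] in ("", "NA"):
            continue
        from_aa, pos, to_aa = match.groups()
        scores[(int(pos), from_aa, to_aa)] = float(row[column])
    return scores

ph55 = read_scores(SAMPLE, "score_ph55")
print(ph55)
```

To read the real file, the text of gpr68_scores_processed.csv would be passed in place of SAMPLE.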
    

Opening scores in ChimeraX

Open the atomic model ClassA_ogr1_human_Active_AF_2022-08-16_GPCRdb.pdb using the ChimeraX menu entry File / Open.... Then open the scores file gpr68_scores_processed.csv. This requires a ChimeraX daily build from June 11, 2024 or newer (it is not in ChimeraX 1.8). The ChimeraX Log will show

   Opened deep mutational scan data for 7128 mutations of 364 residues,
   assigned to 364 of 365 residues of chain /A,
   score column names score_ph55, score_ph65, score_surface.

Histogram of mutation scores

This data has 3 assays assessing GPCR function at pH 5.5 (score_ph55), at pH 6.5 (score_ph65) and how much of the protein makes it to the membrane (score_surface). To show a histogram of score_ph55 values use ChimeraX command

      dms histogram /A column score_ph55
    

The /A indicates chain A of the atomic model. This atomic model has only one chain.

Coloring the atomic model by mutation scores

We can look at where in the protein mutations increase GPCR activity and where mutations decrease activity. The following command computes a gain in activity score for each residue by summing the scores for mutations of that residue with activity score > 1.5.

      dms attribute /A column score_ph55 type sum above 1.5 name ph55_gain
    

Then using the Render by Attribute panel (menu Tools / Depiction / Render or Select by Attribute) we can color blue the residues that have mutations that cause gain in activity. The brown residues keep their color because no mutations had score > 1.5. Likewise we can compute a loss of activity score for each residue by summing the scores for mutations with activity score < -1.5 and render them as red.

      dms attribute /A column score_ph55 type sum below -1.5 name ph55_loss
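
The per-residue sums computed by the two commands above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ChimeraX implementation. The function name residue_sums, the strict inequalities, and the toy scores are assumptions.

```python
from collections import defaultdict

def residue_sums(scores, above=None, below=None):
    """Sum mutation scores per residue position, keeping only scores past a threshold.

    scores maps (position, to_aa) -> score.  Residues with no qualifying
    mutation are absent from the result, like residues that get no attribute.
    """
    totals = defaultdict(float)
    for (pos, _to_aa), score in scores.items():
        if above is not None and score <= above:
            continue  # assumption: the threshold itself does not count
        if below is not None and score >= below:
            continue
        totals[pos] += score
    return dict(totals)

# Made-up scores for two residues, for illustration only.
toy = {
    (10, "A"): 2.0, (10, "C"): 1.75, (10, "D"): 0.25,
    (11, "A"): -2.5, (11, "C"): -1.75, (11, "D"): 1.0,
}
gain = residue_sums(toy, above=1.5)    # {10: 3.75}
loss = residue_sums(toy, below=-1.5)   # {11: -4.25}
both = set(gain) & set(loss)           # residues with both gain and loss; empty here
```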
    

Surprisingly, the residues with mutations that gain a lot of activity and the residues with mutations that lose a lot of activity are never the same, as we can see by trying to select the residues that have both a high total gain score and a high total loss score.

	select ::ph55_gain>=4 & ::ph55_loss<=-10
	Nothing selected

	select ::ph55_gain>=4
	34 residues

	select ::ph55_loss<=-10
	52 residues
      

Coloring by alanine mutation scores

Similarly we can color all residues according to the score when mutated to alanine.

      dms attribute /A column score_ph55 type ala name ala_score
      color byattribute r:ala_score palette -2.0,red:0,white:2.0,blue
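
To make the palette in the color command concrete, here is a small Python sketch of how a score could be mapped to a color between the three palette stops. It assumes linear interpolation of RGB components and clamping of values outside the -2 to 2 range. The function name palette_color is hypothetical and this is not the ChimeraX code.

```python
def palette_color(value, stops):
    """Linearly interpolate an RGB color; stops is a sorted list of (value, (r, g, b))."""
    if value <= stops[0][0]:
        return stops[0][1]      # clamp below the first stop (assumption)
    if value >= stops[-1][0]:
        return stops[-1][1]     # clamp above the last stop (assumption)
    for (v0, c0), (v1, c1) in zip(stops, stops[1:]):
        if v0 <= value <= v1:
            f = (value - v0) / (v1 - v0)
            return tuple(a + f * (b - a) for a, b in zip(c0, c1))

RED, WHITE, BLUE = (1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0)
STOPS = [(-2.0, RED), (0.0, WHITE), (2.0, BLUE)]

light_red = palette_color(-1.0, STOPS)   # (1.0, 0.5, 0.5), halfway between red and white
```

With this mapping a residue whose alanine mutation has score near 0 appears white, while strong loss or gain of activity appears saturated red or blue.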
    

Support for using the dms attribute command with any of the standard twenty 3-letter amino acid codes was added to ChimeraX daily builds dated July 3, 2024 and later.

Labeling residues

To show the mutation scores on the atomic model we can create a label for each residue with 20 colored squares showing the activity change for the 20 possible amino acids at that position with red indicating loss of activity and blue gain of activity. Residues H269, H20, E174 and Y102 are known to be important in GPR68 activity. To show labels and side chain atoms for the residues within 3 Angstroms of the two histidines use ChimeraX command:

      dms label /A:269,20 :< 3 column score_ph55 range -4,4
      show /A:269,20 :< 3 atoms
    

Hide the labels with

      label delete
    

Scatter plots

To compare the pH 5.5 activity to the pH 6.5 activity of the mutations we can make a scatter plot where each point in the plot corresponds to a specific mutation of a specific residue.

      dms scatterplot /A xcolumn score_ph55 ycolumn score_ph65 correlation true
    

If there were no difference in activity between the two pH values then all the points would lie exactly on a diagonal line. As seen in the plot the pH 6.5 and pH 5.5 activity are only weakly correlated. The "correlation true" option shows a least squares fit of the points, and the Log reports an R-squared value close to 0.

      Plotted 7073 mutations in chain /A with
      score_ph55 on x-axis and score_ph65 on y-axis,
      least squares fit slope 0.225, intercept 0.0311, R squared 0.0564      
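
For reference, the slope, intercept and R squared of a least squares line can be computed as in the following Python sketch. It uses the standard textbook formulas on made-up points. The function name least_squares_fit is hypothetical and the sketch is not taken from the ChimeraX source.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R squared equals the squared correlation coefficient.
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared

# Made-up points: a perfect line, then a weakly correlated set.
perfect = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])   # slope 2, intercept 1, R squared 1
weak = least_squares_fit([0, 1, 2, 3], [0, 1, 0, 1])      # R squared about 0.2
```

An R squared near 0, as in the Log output above, means the fit line explains very little of the variation in the y-axis scores.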
    

Clicking on any point in the plot brings up a menu identifying the mutation and allowing coloring, selecting (green outline), or zooming on the residue in the atomic structure. For example mutation N104V has very high activity at pH 6.5 (score 8), but modest activity at pH 5.5 (score 1).

Mean and standard deviation

The deep mutational scan data contains synonymous mutations where the DNA triplet changed but the amino acid remains the same. To show a histogram of just the synonymous mutation scores use:

      dms histogram /A column score_ph55 type synonymous
    

To measure the mean and standard deviation of the synonymous scores and all mutation scores use

      dms statistics /A column score_ph55 type synonymous

        Column score_ph55, 234 synonymous mutations,
        mean = 0.0857, standard deviation = 0.618,
        mean -/+ 2*SD = -1.15 to 1.32

      dms statistics /A column score_ph55 type all

        Column score_ph55, 7092 all mutations,
        mean = -0.298, standard deviation = 1,
        mean -/+ 2*SD = -2.3 to 1.7
    

Non-synonymous mutations tend to decrease GPR68 activity when compared to synonymous mutations.
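
The quantities reported by the statistics command can be sketched with the Python standard library as follows. Whether ChimeraX reports the population or the sample standard deviation is not stated here, so the use of the population form is an assumption, as are the function name summarize and the toy numbers.

```python
import statistics

def summarize(scores):
    """Return (mean, standard deviation, (mean - 2*SD, mean + 2*SD))."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)   # population SD (assumption, see above)
    return mean, sd, (mean - 2 * sd, mean + 2 * sd)

# Made-up scores, for illustration only.
mean, sd, two_sd_range = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean = {mean:.3g}, standard deviation = {sd:.3g}, "
      f"mean -/+ 2*SD = {two_sd_range[0]:.3g} to {two_sd_range[1]:.3g}")
```

Mutations whose scores fall outside the mean -/+ 2*SD range of the synonymous mutations are candidates for having a real effect on activity rather than reflecting assay noise.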

Mutation activity patterns with UMAP

Are there patterns in how the activity varies across the 20 mutations of a residue? For instance changing a hydrophobic residue to a hydrophilic one may more often than not reduce activity. To look for such patterns we can think of the activity scores for a residue as a 20 element vector and project it to two dimensions using UMAP, uniform manifold approximation and projection, to see if there are clusters representing patterns of activity variation.

      dms umap /A column score_ph55

            211 of 364 have 20 mutations
    

The command currently colors the 20 different amino acids in 20 randomly chosen colors to help see if a specific amino acid type clusters in the plot.

Clusters are not evident. The GPR68 deep scan data has all 20 mutations for about 60% of the residues so only those 211 residues are plotted. It is possible that there are many patterns (e.g. 20 amino acid types times 5 patterns of activity variation per amino acid type = 100 different patterns) so we don't have enough data to reveal the clusters.
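
The step of assembling one 20 element vector per residue, and dropping residues that lack a score for some amino acid, can be sketched in Python as follows. The function name residue_vectors, the alphabetical ordering of amino acids within a vector, and the toy scores are assumptions. The projection itself is not shown. The resulting rows could be given to a UMAP implementation, for example the UMAP class of the umap-learn package.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # one-letter codes; the ordering is an assumption

def residue_vectors(scores):
    """scores maps (position, to_aa) -> score.

    Returns {position: [20 scores]} for residues that have all 20 scores.
    """
    by_pos = {}
    for (pos, to_aa), score in scores.items():
        by_pos.setdefault(pos, {})[to_aa] = score
    return {
        pos: [aa_scores[aa] for aa in AMINO_ACIDS]
        for pos, aa_scores in sorted(by_pos.items())
        if all(aa in aa_scores for aa in AMINO_ACIDS)
    }

# Made-up data: residue 5 has all 20 scores, residue 6 lacks tryptophan (W).
toy = {(5, aa): 0.5 * i for i, aa in enumerate(AMINO_ACIDS)}
toy.update({(6, aa): 1.0 for aa in AMINO_ACIDS if aa != "W"})
vectors = residue_vectors(toy)   # only residue 5 is kept
```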

Subtracting correlated scores

In the GPR68 cell assays, reduced expression of the protein on the membrane reduces the pH 5.5 and 6.5 scores. We could try to correct for this by making a least squares linear fit of score_ph55 versus score_surface. Assuming that line estimates the change in activity due to surface expression level, we can subtract that estimate from the pH 5.5 score. Here is how to show a histogram with that subtraction for the synonymous mutations.

      dms histogram /A column score_ph55 type synonymous subtractFit score_surface
    

It might be reasonable to suspect that the synonymous mutation activity variation is due to variation of surface expression levels.

      dms statistics /A column score_ph55 type synonymous subtractFit score_surface

        Column score_ph55, 234 synonymous mutations,
        mean = 0.164, standard deviation = 0.459,
        mean -/+ 2*SD = -0.754 to 1.08
    

We see the standard deviation of the corrected activity scores is 0.459 which is less than for the uncorrected activity 0.618 (shown above in the Mean and standard deviation section).
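
The idea behind the subtractFit option can be sketched in Python as below: fit a line predicting activity from surface expression, then subtract the predicted activity from each score. Which mutations ChimeraX uses for the fit, and whether it subtracts the intercept as well as the slope term, are not stated here, so both choices in this sketch are assumptions. The function names fit_line and subtract_fit and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def subtract_fit(activity, surface, fit_activity, fit_surface):
    """Subtract from each activity score the value predicted from its surface score.

    The line is fit on (fit_surface, fit_activity), which may be a larger set
    of mutations than the ones being corrected (assumption).
    """
    slope, intercept = fit_line(fit_surface, fit_activity)
    return [a - (slope * s + intercept) for a, s in zip(activity, surface)]

# Made-up scores: the fit set lies on the line activity = 2 * surface + 1.
all_surface = [0.0, 1.0, 2.0, 3.0]
all_activity = [1.0, 3.0, 5.0, 7.0]
corrected = subtract_fit([3.5, 4.0], [1.0, 2.0], all_activity, all_surface)
# corrected == [0.5, -1.0]
```

If the surface expression explains part of the variation in activity, the corrected scores have a smaller spread than the uncorrected ones, which is what the standard deviations above show.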

Command syntax

To see all the options of the dms (deep mutational scan) commands use the "usage" command:

      usage dms

        Subcommands are:
          dms attribute
          dms histogram
          dms label
          dms scatterplot
          dms statistics
          dms umap
      
      usage dms histogram

        dms histogram chain columnName a text string
                         [subtractFit a text string] [bins an integer]
                         [curve true or false] [smoothWidth a number]
                         [type type] [above a number] [below a number]
                         [replace true or false]
          - Show histogram of deep mutational scan scores
          type: one of all_mutations, sum, sum_absolute, or synonymous
    

When running commands they can be abbreviated, for example

      dms hist /A col score_ph55 type syn sub score_surface
    

Column names currently have to be fully spelled out in the commands.

What next?

The dms commands are a very preliminary effort to allow looking at deep mutational scan data together with atomic models. So there are countless additional things to try.

  1. Select plot regions. Would be nice to drag a box on scatter plots or histograms to select all residues on the structure.
  2. Show atomic structure selections in plots. If residues in the atomic model are selected (e.g. near active site) it would be nice to see those outlined in green on the scatter plots.
  3. Context menu on histograms. Want to be able to click on histogram bars to get a menu to show, color, select the residues involved on the atomic model.
  4. Plot mutations for specific amino acid types. Would like to be able to show scatter plots and histograms for specific amino acid types.
  5. Use color on scatter plots and histograms. Could color scatter plot points to show amino acid types. Or could show specific mutations, e.g. aromatic to non-aromatic. Likewise could make each histogram bar a stack of colored segments.
  6. Error bars. Would be interesting to see scatter plots of replicas of deep scan data to see how big the error bars should be on the scores.