Virus Capsid Poster

Poster for NCRR Principal Investigator meeting, June 2004

New Approaches for Visualizing Virus Capsids

Tom Goddard, Conrad Huang, Thomas Ferrin

Resource for Biocomputing, Visualization and Informatics
University of California, San Francisco

Introduction

We are developing software for interactive visualization of virus capsids and other large molecular assemblies. Here we describe capabilities for visualizing large assemblies that go beyond what current molecular graphics programs provide. We also discuss data quality issues with virus structures, which we are working with the Protein Data Bank to remedy, and we present approaches to making virus assemblies more accessible to users through a variety of web technologies.

Current molecular graphics packages for analyzing individual macromolecules or small complexes perform well with up to a few hundred thousand atoms. Large virus capsids contain millions of atoms and run into memory and CPU speed limitations on desktop computers. With some optimizations, responsive interactive visualization is possible on such systems. In addition to optimizations, new capabilities such as cartoon representations for macromolecules and facilities for naming and navigating quaternary structure are basic ingredients for visualizing large assemblies.


Figure 1. Bluetongue virus capsid with viral RNA bound to the outside of the capsid. The capsid has 2 protein layers, and half of the outer layer is cut away in the picture. Atomic interactions between the capsid and RNA are shown. Residues involved in contacts of less than 5 angstroms are shown in black. These are PDB models 2btv and 1h1k.


Software Optimizations

Figure 1 shows an atomic resolution virus capsid structure containing 3 million atoms. Current molecular graphics programs use about 1 to 2 kilobytes of memory per atom. At that rate, building the 3 gigabytes or more of data structures needed to open this model in a visualization program would take minutes. To avoid this high overhead we have developed code that loads only the atoms needed for atomic resolution display styles. Low resolution surfaces representing entire protein subunits can be shown using atomic information for just one copy of a multimeric subunit. This allows the structure shown to be loaded in several seconds using modest amounts of memory (90,000 atoms for the unique subunits).
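The memory figures above can be checked with a short calculation. This is only a back-of-envelope sketch: it assumes decimal kilobytes and assumes the quoted 1 to 2 kilobytes per atom also applies to the 90,000 atoms of the unique subunits, which the text does not state. The function and variable names are illustrative.

```python
# Back-of-envelope memory estimate from the figures quoted in the text.
BYTES_PER_KB = 1000  # assumption: decimal kilobytes, for round numbers


def memory_gigabytes(atom_count, kilobytes_per_atom):
    """Estimated size of per-atom data structures, in gigabytes."""
    return atom_count * kilobytes_per_atom * BYTES_PER_KB / 1e9


full_capsid_atoms = 3_000_000  # every atom of the Figure 1 capsid
unique_atoms = 90_000          # atoms of the unique subunits only

full_low = memory_gigabytes(full_capsid_atoms, 1)
full_high = memory_gigabytes(full_capsid_atoms, 2)
unique_low = memory_gigabytes(unique_atoms, 1)
unique_high = memory_gigabytes(unique_atoms, 2)

print(f"full capsid:     {full_low:.2f} to {full_high:.2f} GB")
print(f"unique subunits: {unique_low:.2f} to {unique_high:.2f} GB")
```

Under these assumptions, loading only the unique subunits reduces the atom count by a factor of about 33 (3,000,000 / 90,000).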

All contacts between the RNA (20,000 atoms) and capsid (3,000,000 atoms) were computed with a maximum range of 5 angstroms. A naive calculation of distances for all 60 billion atom pairs would take a few minutes on current desktop computers. A more sophisticated algorithm that places all atoms in 5 angstrom bins and limits comparisons to atoms in adjoining bins runs in seconds.
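The binning idea can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the code used in our software: the function name find_contacts, the use of cubic bins keyed by integer triples, and the inclusive distance test are all assumptions made for the example. Atoms are plain (x, y, z) tuples in angstroms.

```python
from collections import defaultdict
from itertools import product
import math


def find_contacts(atoms_a, atoms_b, cutoff=5.0):
    """Return index pairs (i, j) where atoms_a[i] lies within cutoff of atoms_b[j]."""

    def bin_of(point):
        # Integer bin coordinates for cubic bins with edge length equal to cutoff.
        return tuple(math.floor(c / cutoff) for c in point)

    # Place every atom of the second set in its bin.
    bins = defaultdict(list)
    for j, point in enumerate(atoms_b):
        bins[bin_of(point)].append(j)

    cutoff_sq = cutoff * cutoff
    contacts = []
    for i, point in enumerate(atoms_a):
        bx, by, bz = bin_of(point)
        # Partners within cutoff can only be in the atom's own bin
        # or one of the 26 bins adjoining it.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in bins.get((bx + dx, by + dy, bz + dz), ()):
                other = atoms_b[j]
                dist_sq = sum((p - q) ** 2 for p, q in zip(point, other))
                if dist_sq <= cutoff_sq:
                    contacts.append((i, j))
    return contacts


rna = [(0.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
capsid = [(3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (41.0, 1.0, 0.0)]
print(find_contacts(rna, capsid))
```

Because each atom is compared only with atoms in 27 bins, the work grows roughly with the number of atoms rather than with the 20,000 x 3,000,000 = 60 billion pairs of the naive approach.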

These optimizations add complexity to writing and maintaining the software and to adding subsequent features. The calculation of atomic contacts has to account for the fact that not all atoms in the structure have been loaded into memory. It uses copies of atoms from structurally identical subunits to avoid loading atoms unnecessarily. Even so, the optimizations are relatively simple, requiring a few thousand lines of code beyond the simple techniques used in molecular graphics programs aimed at small molecular systems.


Figure 2. Satellite tobacco mosaic virus capsid with about 60% of the viral single-stranded RNA genome. A single capsid protein makes contacts with many neighboring capsid proteins and several resolved segments of RNA.


New Visualization Features

Visualizing virus capsids requires capabilities not found in current molecular graphics software. The virus shown in figure 1 is studded with 780 copies of the outer capsid protein. Showing any substructure within these proteins provides an overwhelming level of detail when viewing the whole capsid. We have implemented a variable resolution surface depiction for sets of atoms that produces low resolution (e.g. 10 angstrom) cartoon depictions. Molecular surfaces used for studying individual macromolecules have much higher resolution and could not be used to render the virus in figure 1 at an interactive frame rate on even the fastest graphics hardware.

Most available virus capsid structures are icosahedral, with exact 60-fold symmetry. Sixty copies of the same atomic coordinates are used to form the full shell. The positions of the copies are specified using sixty 3 by 4 matrices, each expressing a rotation and a translation. Code to read these matrices and generate the multimeric structure is a basic requirement for visualizing large multimeric structures.
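Applying a 3 by 4 matrix to a set of coordinates can be sketched as follows. The layout assumed here, with the rotation in the first three columns and the translation in the fourth, is a common convention but is an assumption of this example, as are the function names. The two toy matrices stand in for the sixty matrices of a real icosahedral capsid.

```python
def apply_matrix(matrix, point):
    """Apply a 3x4 matrix (rotation in columns 0-2, translation in column 3) to a point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def build_assembly(matrices, subunit_coords):
    """Return one transformed copy of the subunit coordinates per matrix."""
    return [[apply_matrix(m, p) for p in subunit_coords] for m in matrices]


# Toy placement matrices (hypothetical values, not taken from any PDB entry).
identity = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
half_turn_about_z = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 10.0],  # also shifts the copy 10 angstroms along z
]

subunit = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
copies = build_assembly([identity, half_turn_about_z], subunit)
for copy in copies:
    print(copy)
```

Only the subunit coordinates and the matrices need to be stored; each copy can be generated on demand when a display style requires its atoms.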

The virus capsid in figure 1 has two protein layers. The outer layer is made of 13 slight structural variants of a single protein grouped in trimers, while the inner layer consists of two structural forms of another protein. Our software organizes the structure levels in a tree. The root of the tree is the full virus particle, which has two children: the inner and outer protein layers. The outer layer has the 260 trimers as its children, and the leaves of the tree are the protein monomers.
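A tree of this kind can be sketched with a small node class. The sketch assumes the 260 trimers belong to the outer layer, which is consistent with the 780 outer capsid proteins mentioned earlier (260 x 3 = 780). The subunits of the inner layer are omitted because the text gives no count for them. The class and function names are illustrative, not those of our software.

```python
class Node:
    """One level of the assembly hierarchy: particle, layer, trimer or monomer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def leaves(self):
        """Return all nodes below (or equal to) this one that have no children."""
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


def build_capsid_tree(trimer_count=260):
    """Build particle -> layers -> trimers -> monomers for the outer layer."""
    particle = Node("virus particle")
    Node("inner layer", particle)  # inner layer subunits omitted in this sketch
    outer = Node("outer layer", particle)
    for t in range(trimer_count):
        trimer = Node(f"trimer {t}", outer)
        for m in range(3):
            Node(f"monomer {t}.{m}", trimer)
    return particle


particle = build_capsid_tree()
outer_layer = particle.children[1]
print(len(outer_layer.children), "trimers,", len(outer_layer.leaves()), "monomers")
```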

The cutaway view in figure 1 was made by selecting the outer layer and then slicing it in half by dragging a box with the mouse. To select the outer layer, a single protein is selected with the mouse and then promoted by pressing a button, once to select the containing trimer and again to select the whole outer layer. Without the hierarchical organization, selecting hundreds of individual proteins with a mouse would be extremely cumbersome. Another useful selection method is to promote a selection to all copies of the currently selected subunits.
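Promotion amounts to walking one level up the hierarchy. The following sketch represents the hierarchy with two dictionaries and assumes 260 trimers in the outer layer; the names promote, monomers_under, parent_of and children_of are hypothetical and chosen only for this example.

```python
def build_hierarchy(trimer_count=260):
    """Return (children_of, parent_of) dictionaries for the outer layer hierarchy."""
    children_of = {"virus particle": ["inner layer", "outer layer"], "outer layer": []}
    for t in range(trimer_count):
        trimer = f"trimer {t}"
        children_of["outer layer"].append(trimer)
        children_of[trimer] = [f"monomer {t}.{m}" for m in range(3)]
    parent_of = {child: parent
                 for parent, kids in children_of.items() for child in kids}
    return children_of, parent_of


def promote(node, parent_of):
    """Return the group containing node, or node itself if it is the root."""
    return parent_of.get(node, node)


def monomers_under(node, children_of):
    """Return every monomer contained in the given group."""
    kids = children_of.get(node, [])
    if not kids:
        return [node] if node.startswith("monomer") else []
    found = []
    for kid in kids:
        found.extend(monomers_under(kid, children_of))
    return found


children_of, parent_of = build_hierarchy()
picked = "monomer 5.1"                 # a single protein picked with the mouse
trimer = promote(picked, parent_of)    # first button press
layer = promote(trimer, parent_of)     # second button press
print(trimer, "|", layer, "|", len(monomers_under(layer, children_of)), "proteins selected")
```

Two promotions replace what would otherwise be hundreds of individual mouse picks.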


Figure 3. Ribgrass virus is helical. All available atomic resolution virus capsid structures are icosahedral or helical.

Figure 4. Sindbis virus has a lipid bilayer sandwiched between an inner protein nucleocapsid and outer glycoprotein layer.


Making Virus Structures Accessible

Our software for virus visualization has only basic capabilities, yet to our knowledge it is far more capable than any other distributed program. Because few programs can display virus capsids, the matrix data contained in the Protein Data Bank virus entries has not been extensively used. About half of the 200 virus entries do not describe the matrices needed to produce a full particle in a machine-readable and correct form. We are working with the Protein Data Bank to identify the problem files and correct them.
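One kind of automated check is sketched below. It assumes the matrices are given as REMARK 350 BIOMT records of the PDB file format, separated by whitespace, and tests only that the rotation part of each matrix is a proper rotation. Both the choice of record type and the choice of test are assumptions of this example; the text does not say which checks are applied to the problem files, and entries that give matrices in other forms would need different parsing.

```python
def parse_biomt(lines):
    """Collect complete 3x4 matrices from REMARK 350 BIOMT records, keyed by serial number."""
    rows = {}
    for line in lines:
        fields = line.split()
        if (len(fields) == 8 and fields[0] == "REMARK" and fields[1] == "350"
                and fields[2] in ("BIOMT1", "BIOMT2", "BIOMT3")):
            row_index = int(fields[2][5]) - 1
            serial = int(fields[3])
            rows.setdefault(serial, {})[row_index] = [float(v) for v in fields[4:8]]
    # Keep only matrices for which all three rows were found.
    return {s: [r[0], r[1], r[2]] for s, r in rows.items() if len(r) == 3}


def is_proper_rotation(matrix, tol=1e-3):
    """True if the 3x3 part has orthonormal rows and determinant +1."""
    r = [row[:3] for row in matrix]
    for i in range(3):
        for j in range(3):
            dot = sum(r[i][k] * r[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
           - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
           + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))
    return abs(det - 1.0) <= tol


# Made-up records for illustration: matrix 1 is the identity, matrix 2 has a bad row.
example = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2  0.500000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
matrices = parse_biomt(example)
for serial, matrix in sorted(matrices.items()):
    print(serial, "ok" if is_proper_rotation(matrix) else "not a proper rotation")
```

For an icosahedral capsid, a further check under the same assumptions would be that exactly sixty valid matrices are found.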

It is difficult for researchers to explore the existing rich data set of over 200 virus structures (figures 2-5). Although our software, UCSF Chimera, has the basic virus display functionality, using it effectively requires reading extensive documentation. To make Chimera more accessible to researchers we are working with the Multiscale Modeling Tools for Structural Biology NCRR resource, which develops the Virus Particle Explorer (ViPER) web site presenting curated structure data for about 100 icosahedral viruses. We are trying three web technologies to let users explore this data beyond a fixed set of images: offering Virtual Reality Markup Language three-dimensional models created by Chimera and viewable with a web browser plugin, launching our Chimera program directly from web browser links to show virus structures, and running Chimera on a web server to render custom virus images in response to user settings specified in a browser.


Satellite Panicum Mosaic Virus, 1stm
Satellite Tobacco Mosaic Virus, 1a34
Satellite Tobacco Necrosis Virus, 2stv
Alfalfa Mosaic Virus, 1amv
Cowpea Chlorotic Mottle Virus, ccmv_t2_672919
Densovirus, 1dnv
Bacteriophage Fr, 1fr5
Bacteriophage FR, 1frs
Bacteriophage MS2, 2ms2
Canine Parvovirus Strain D, 4dpv
Bacteriophage ms2, 1aq3
Bacteriophage ms2, 1bms
Feline Panleukopenia Virus Empty Capsid, 1c8e
Bacteriophage GA, 1gav
Porcine Parvovirus Capsid, 1k3v
Bacteriophage Ms2, 1mst
Feline Panleukopenia Virus, 1fpv
Brome Mosaic Virus, 1js9
Murine Minute Virus, 1mvm
Canine Parvovirus (CPV) Empty, 2cas
Cowpea Chlorotic Mottle Virus, 1cwp
Bacteriophage Qb, 1qbe
Tobacco Necrosis Virus, 1tnv
Adeno-Associated Virus, 1lp3
Tobacco Necrosis Virus, 1c8n
Tobacco Ringspot Virus, 1a6c
Bacteriophage PP7, 1dwn
Red Clover Mottle Virus, 1rcmv
Sesbania Mosaic Virus, 1smv
Cucumber Mosaic Virus, 1f15
Rice Yellow Mottle Virus, 1f2n
Cowpea Mosaic Virus, 1ny7
Turnip Yellow Mosaic Virus, 1auy
Desmodium Yellow Mottle Virus, 1ddl
Cocksfoot Mottle Virus, 1ng0
Southern Bean Mosaic Virus, 4sbv
Human Papillomavirus 16 L1 capsid, 1dzl
Tomato Aspermy Virus, 1laj
Foot and Mouth Disease Virus (FMDV), 1bbt
Bovine Enterovirus, 1bev
Bean Pod Mottle Virus, 1bmv
Physalis Mottle Virus, 1qjz
Physalis mottle virus, 1e57
Human Rhinovirus 16, 1aym
Echovirus 1 (FAROUK Strain), 1ev1
Human Rhinovirus 1A, 1r1a
Human Rhinovirus 3, 1rhi
Coxsackievirus B3, 1cov
Echovirus 11 (Strain 207), 1h8t
Chimeric Human Rhinovirus 14 w/ HIV1-V3 Loop, 1k5m
Swine Vesicular Disease Virus, 1oop
Human rhinovirus 14, 1r08
Human rhinovirus 14, 1rmu
Human Rhinovirus 14, 4rhv
Human Rhinovirus 2, 1fpn
Theiler's Murine Encephalomyelitis (DA Strain), 1tme
Mengo Encephalomyocarditis Virus, 2mev
Cricket Paralysis Virus, 1b35
Poliovirus Type 1 (Mahoney Strain) at -170c, 1asj
Bacteriophage G4, 1gff
Poliovirus Type 1 (Mahoney Strain), 1hxs
Poliovirus Type 3 (Sabin Strain), 1pvc
Poliovirus Type 1 (Mahoney Strain) Empty Capsid, 1pov
Bacteriophage phix174, 2bpa
Human poliovirus 1 strain mahoney, 2plv
Theiler's Murine Encephalomyelitis (BeAn Strain), 1tmf
Poliovirus-2 Lansing Complexed with SCH48973, 1eah
Cowpea Chlorotic Mottle Virus, Swollen Form (C), ccmv_swln_c
Cowpea Chlorotic Mottle Virus, Swollen Form (D), ccmv_swln_d
Black Beetle Virus, 2bbv
d75n
c74
Flock House Virus, 1fhv
Flock House Virus, 2fhv
Hepatitis B Virus, 1qgt
Carnation Mottle Virus, 1opo
Pariacoto Virus, 1f8v
Bacteriophage phix174+scaffold, 1al0
Tomato Bushy Stunt Virus, 2tbv
Nodamura Virus, 1nov
Norwalk Virus Capsid, 1ihm
Nudaurelia Capensis w Virus, 1nwv
Sindbis virus, sindbis
Human Rhinovirus 16 ICAM1 complex, 1d3e
L-A Virus, 1m1c
Simian Virus 40 (SV40), 1sva
Murine Polyomavirus, 1sid
Dengue virus, 1k4r
pro1
Bacteriophage HK97 ProheadII, 1if0
Bacteriophage HK97 HeadII, 1fh6
Human Papillomavirus 16 L1, 1l0t
Bacteriophage PRD1 Model, 1hb5
Bacteriophage PRD1, 1gw7
Bacteriophage PRD1 SUS607 mutant model, 1gw8
Bacteriophage PRD1 SUS1 mutant model, 1hb7
Bacteriophage PRD1 Wt Model, 1hb9
Bluetongue Virus, 2btv
Bluetongue Virus inner layer, 2btv_t2
Rice Dwarf Virus, 1uf2
Rice Dwarf Virus inner layer, 1uf2_t2
Reovirus core, 1ej6
Reovirus core inner layer, 1ej6_inner
Paramecium bursaria chlorella virus type 1, 1m4x28

Figure 5. Automatically generated images of 100 virus capsids using data from the Virus Particle Explorer (ViPER) web site. Viruses are shown to scale. Different colors represent unique structural components (PDB chains).


Download

The capabilities described in this poster have been implemented as the Multiscale extension to UCSF Chimera. Chimera is available on Windows, Mac, Linux, SGI and HP/Alpha computers. It is free for non-commercial use and can be downloaded at

	http://www.cgl.ucsf.edu/chimera

The feature for calculating atomic contacts is not yet available as of our May 2004 release (Chimera 1.1951).

Acknowledgement

This work builds on the extensible UCSF Chimera molecular modeling system developed by Greg Couch, Tom Goddard, Dan Greenblatt, Conrad Huang, Elaine Meng, and Eric Pettersen. It is funded by the National Center for Research Resources grant P41 RR-01081.