[chimerax-users] AlphaFold model for large proteins

Tom Goddard goddard at sonic.net
Thu Sep 9 16:10:23 PDT 2021


Hi Yunsik,

  I've seen some Twitter messages saying AlphaFold might handle 2000 amino acids if you have, say, 4 GPUs, let it use the memory of all of them, and possibly apply other tricks.  But the limitation is that graphics processors have only so much memory, and AlphaFold needs a lot of it, more for longer sequences.  So I don't think it is currently possible to compute a longer sequence as a single structure.  It certainly is possible to combine the 1400 amino acid AlphaFold chunks in more sensible ways that try to avoid clashes, but I don't have time to pursue that.
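Why memory grows so quickly with length can be seen from a single tensor. AlphaFold keeps a pair representation with one feature vector for every pair of residues, so its size grows with the square of the sequence length. The sketch below assumes 128 channels per pair (the pair width reported for AlphaFold2) and 4-byte float32 values, and the function name is hypothetical. It sizes only one such tensor, so real peak usage, with many tensors and intermediates alive at once, is considerably higher.

```python
def pair_tensor_bytes(n_residues, channels=128, bytes_per_value=4):
    """Bytes for one N x N x C array (assumed: 128 channels, float32)."""
    return n_residues * n_residues * channels * bytes_per_value


if __name__ == "__main__":
    # Sequence lengths mentioned in this thread.
    for n in (700, 1400, 2958, 5005):
        print(f"{n:5d} residues: {pair_tensor_bytes(n) / 1e9:6.2f} GB")
```

This prints roughly 0.25, 1.00, 4.48 and 12.83 GB. A 5005-residue chain needs about 12.8 times the memory of a 1400-residue chunk for this tensor alone, since (5005/1400)² ≈ 12.8.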

  I figured you might not have wanted to mention your protein, but if you say it has 5005 amino acids then the cat is out of the bag.

	Tom

> On Sep 9, 2021, at 3:39 PM, Yunsik Kang <kangy at ohsu.edu> wrote:
> 
> Hi Tom,
>  
> Thank you so much! It is great that I inspired you to write this command.
>  
> Since I have you (and I am no expert in structural biology), do you think AlphaFold will soon handle larger proteins? Is there still a limitation? In your YouTube videos you say the system crashes a lot beyond 700 aa.
>  
> If I know a protein that might have a similar structure to mine, would that help resolve these clashing problems?
>  
> As you can see, I tried to hide what my protein of interest was, but it seems there are not a lot of 5005 amino acid proteins =).
>  
> Thank you again!
> Yunsik
>  
>  
> From: Tom Goddard <goddard at sonic.net> 
> Sent: Thursday, September 9, 2021 3:17 PM
> To: Yunsik Kang <kangy at ohsu.edu>
> Cc: ChimeraX Users Help <chimerax-users at cgl.ucsf.edu>
> Subject: [EXTERNAL] AlphaFold model for large proteins
>  
> Hi Yunsik, 
>  
>   Inspired by your question and Tristan Croll's comment that the AlphaFold database has human proteins longer than 1400 amino acids computed as separate 1400 amino acid chunks, I made a ChimeraX command, "bigalpha", that loads and aligns these chunks.
>  
> https://rbvi.github.io/chimerax-recipes/big_alphafold/bigalpha.html
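The chunking described above can be pictured with a short Python sketch of overlapping residue windows. It assumes fixed 1400-residue windows advanced by a step of 200 residues (the step is an assumption for illustration, not something stated in this thread), with a final window pinned to the C-terminus so every residue is covered. The helper names are hypothetical and this is not the bigalpha code. The residues shared by two consecutive windows are what a superposition of one chunk onto the next would typically be fitted on.

```python
def chunk_windows(length, window=1400, step=200):
    """Return 1-based inclusive (start, end) ranges covering residues 1..length.

    Assumed scheme: windows of `window` residues advanced by `step`,
    plus a last window ending exactly at the C-terminus.
    """
    if length <= window:
        return [(1, length)]
    windows = []
    start = 1
    while start + window - 1 < length:
        windows.append((start, start + window - 1))
        start += step
    windows.append((length - window + 1, length))
    return windows


def shared_residues(a, b):
    """Inclusive residue range shared by windows a and b, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    wins = chunk_windows(5005)
    print(len(wins), wins[0], wins[-1])
    print(shared_residues(wins[0], wins[1]))
```

Under these assumptions the 5005-residue protein gets 20 windows, from (1, 1400) to (3606, 5005), and most consecutive windows share 1200 residues.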
>  
> I attach two images and a PDB model made by running that command in ChimeraX for the 5005 amino acid transmembrane protein
>  
> Q2LD37 | K1109_HUMAN Transmembrane protein KIAA1109 OS=Homo sapiens OX=9606 GN=KIAA1109 PE=1 SV=2
>  
> Here are also a few other examples I looked at yesterday.
>  
> https://twitter.com/UCSFChimeraX/status/1435870388043411458
>  
> https://twitter.com/UCSFChimeraX/status/1435760774824169474
>  
> https://twitter.com/UCSFChimeraX/status/1435859111053193216
>  
> Keep in mind that the pieces of these models are not aligned in a reliable way and probably clash badly with each other because AlphaFold did not compute them as part of one structure.  So these structures should only give you a very rough idea about the protein.
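How badly independently predicted pieces clash can be checked with a crude distance test. The sketch below counts atom pairs, one from each chunk, that lie closer than a cutoff. The 3.0 angstrom cutoff and the function name are assumptions for illustration; it also assumes the residues shared by both chunks have already been removed from one of them (otherwise the overlap itself would count as clashes), and it uses a brute-force double loop. ChimeraX's own clashes command uses a van der Waals overlap criterion rather than a fixed distance.

```python
import math


def count_close_pairs(coords_a, coords_b, cutoff=3.0):
    """Count pairs (one atom from each chunk) closer than `cutoff` angstroms.

    Brute force, O(len(a) * len(b)); a spatial grid or k-d tree is the
    usual choice for whole models with tens of thousands of atoms.
    """
    count = 0
    for p in coords_a:
        for q in coords_b:
            if math.dist(p, q) < cutoff:
                count += 1
    return count


if __name__ == "__main__":
    # Toy coordinates in angstroms, not taken from any real model.
    chunk_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    chunk_b = [(1.0, 0.5, 0.0), (20.0, 0.0, 0.0)]
    print(count_close_pairs(chunk_a, chunk_b))  # prints 2
```

A nonzero count between two chunks placed side by side is a sign that they occupy the same space, which is what the warning above is about.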
>  
>   Tom
>  
> This is the confidence coloring determined by AlphaFold: red is low confidence, blue is high.  To reproduce this coloring with a ChimeraX daily build, open the PDB model and type the command "color bfactor #1 palette alphafold".  (The confidence value is saved as the B-factor in the PDB file.)
> 
>  
> Different AlphaFold chunks have different colors and chain identifiers in the PDB model.
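Because the model stores per-residue confidence (pLDDT) in the B-factor column and puts each AlphaFold chunk in its own chain, per-chunk confidence can also be summarized outside ChimeraX. The minimal Python sketch below reads fixed-column PDB ATOM records (atom name in columns 13 to 16, chain ID in column 22, B-factor in columns 61 to 66), keeps one value per residue by using only CA atoms, and bins values with the AlphaFold database's pLDDT bands (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The function names and the script and file names in the usage comment are hypothetical.

```python
import sys
from collections import defaultdict


def ca_plddt_by_chain(pdb_text):
    """Map chain ID -> list of pLDDT values read from CA-atom B-factors.

    Fixed PDB columns (1-based): atom name 13-16, chain ID 22,
    B-factor 61-66, i.e. slices [12:16], [21] and [60:66] in Python.
    """
    scores = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[line[21]].append(float(line[60:66]))
    return dict(scores)


def confidence_band(plddt):
    """AlphaFold database pLDDT bands; exact boundary values fall into the lower band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python plddt_by_chain.py model.pdb  (both names are placeholders)
    with open(sys.argv[1]) as f:
        for chain, values in ca_plddt_by_chain(f.read()).items():
            mean = sum(values) / len(values)
            print(f"chain {chain}: {len(values)} residues, "
                  f"mean pLDDT {mean:.1f} ({confidence_band(mean)})")
```

Averaging per chain gives one number per chunk, which can point to chunks that are mostly low confidence before looking at how they fit together.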
> 
>  
> 
> 
> On Sep 8, 2021, at 2:45 PM, Yunsik Kang via ChimeraX-users <chimerax-users at cgl.ucsf.edu> wrote:
>  
> Hello,
>  
> My name is Yunsik Kang, and I am a postdoc in Marc Freeman’s lab at the Vollum Institute.
>  
> I would love to use ChimeraX to predict the structure of my protein of interest. I watched all the YouTube videos and tried to run the program.
>  
> Unfortunately, my protein is 5005 amino acids long in humans and 2958 aa in yeast. I get the message “Please use the full AlphaFold system for long sequences.”
>  
> My question is: what is the best way to approach this problem? Should I cut the protein in half and run the program? One of the videos mentions that it will have problems beyond 700 aa. Will it work if I get Colab Pro? Or would the server crash no matter what?
>  
> I am not a structural biologist, but I hope a predicted structure can help me with my research.  
>  
> Thank you,
> Yunsik
>  
>  
> _______________________________________________
> ChimeraX-users mailing list
> ChimeraX-users at cgl.ucsf.edu
> Manage subscription:
> https://www.rbvi.ucsf.edu/mailman/listinfo/chimerax-users
>  
