[chimerax-users] Could you please inform the best advice to solve a structure via colabfold when GPU is an issue? I tried some routes already.

Fri Apr 14 16:03:30 PDT 2023

Hi Josh,

  Predicting 1300 residues is at the limit of what the ChimeraX AlphaFold tool can do because it uses Google Colab GPUs, which have only 12 or 16 GB of memory.  With Colab Pro ($10/month) I never got a GPU with more than 16 GB, though there may be a way to get a larger one if you research it online.  A GPU with 24 GB would easily handle 1300 amino acids.
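
  To see which GPU a Colab session was actually assigned, a minimal Python sketch along these lines can be run in a notebook cell.  It assumes the nvidia-smi utility is available on the runtime (it normally is on GPU runtimes); the function names parse_gpu_line and assigned_gpus are hypothetical, chosen only for illustration.

```python
import subprocess

def parse_gpu_line(line):
    """Parse one 'name, memory_in_MiB' line of nvidia-smi CSV output."""
    name, mem_mib = line.rsplit(",", 1)
    return name.strip(), int(mem_mib.strip()) / 1024  # MiB -> GiB

def assigned_gpus():
    """Return a list of (GPU name, total memory in GiB) for visible GPUs.

    Assumes nvidia-smi is on the PATH; raises an exception otherwise.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

  Calling assigned_gpus() before starting a prediction shows whether the session got a 12 or 16 GB card or something larger, so an undersized session can be restarted before a long job is launched.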

  You might try the ColabFold web page (not ChimeraX) since it uses AlphaFold 2.3, which is supposed to handle larger structures with the same memory.  ChimeraX uses an older ColabFold version based on AlphaFold 2.2 or 2.1, but it will be updated in the coming months.

  Setting up your own AlphaFold on a Linux machine with a GPU with more than 16 GB takes some effort because it uses a few TBytes of databases.

  In the future I plan to allow ChimeraX to run AlphaFold jobs on your own paid cloud GPU account (e.g. AWS, Google Cloud, Lambda Cloud) so you could just pay to use a GPU with more memory.  But that is not likely to be working until the end of 2023.

  If ChimeraX or ColabFold does not predict your 1300 residue structure with reasonable confidence in 2 pieces, it is very unlikely to do better when run as a single piece.
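
  For the two-piece approach, a minimal Python sketch of cutting a sequence into pieces and writing them as FASTA might look like the following.  The function names, the default of equal-sized pieces, and the 50-residue overlap are all assumptions made for illustration and are not part of ChimeraX or ColabFold; in practice the cut would be placed at the expected domain boundary rather than at the midpoint.

```python
def split_sequence(seq, n_pieces=2, overlap=50):
    """Split seq into n_pieces roughly equal segments.

    Adjacent segments share `overlap` residues (an illustrative choice,
    useful for superimposing the pieces afterwards).
    Returns (start, end, subsequence) tuples, 1-based and inclusive.
    """
    seq = "".join(seq.split()).upper()  # drop whitespace and newlines
    if n_pieces < 1 or overlap < 0:
        raise ValueError("n_pieces must be >= 1 and overlap >= 0")
    n = len(seq)
    size = -(-n // n_pieces)  # ceiling division
    pieces = []
    for i in range(n_pieces):
        # Extend each piece by about half the overlap on each interior side.
        start = max(0, i * size - overlap // 2)
        end = min(n, (i + 1) * size + overlap - overlap // 2)
        pieces.append((start + 1, end, seq[start:end]))
    return pieces

def to_fasta(pieces, name="query"):
    """Format the pieces as FASTA text, one record per piece."""
    lines = []
    for start, end, sub in pieces:
        lines.append(f">{name}_{start}-{end}")
        lines.append(sub)
    return "\n".join(lines) + "\n"
```

  For a 1300-residue sequence with the defaults, this gives residues 1-675 and 626-1300, each of which can be submitted as a separate prediction.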

	Tom

> On Apr 14, 2023, at 3:17 PM, Elaine Meng via ChimeraX-users <chimerax-users at cgl.ucsf.edu> wrote:
> 
> Hi Josh,
> I'm no expert in this area but here are a few thoughts:
> 
> (1) make sure that your protein is not already available in the AlphaFold Database, e.g. did you try the Fetch option on the tool?
> <https://rbvi.ucsf.edu/chimerax/docs/user/tools/alphafold.html#match>
> 
> (2) for parameter settings, you might consider asking the ColabFold developers (see authors on the ColabFold publication in the ChimeraX AlphaFold documentation), 
> <https://rbvi.ucsf.edu/chimerax/docs/user/tools/alphafold.html#predict>
> 
> ...but perhaps some of these parameters are not adjustable via the ChimeraX tool.  If so, you'd need to run ColabFold directly instead.  I don't know if the UCSF professor you're collaborating with is in one of the groups with access to the UCSF Wynton cluster, but there is information about setup and running on the Wynton cluster here:
> <https://www.rbvi.ucsf.edu/chimerax/data/singularity-apr2022/afsingularity.html>
> 
> (3) as mentioned in the ChimeraX AlphaFold documentation (top link above), you can pay Google for Colab accounts that provide greater computing resources.  Sounds like you already tried that, but with high predicted error.  It is unclear, however, whether this structure would have high predicted error regardless of whether you solved it in multiple pieces or a single piece.  Some proteins may simply have lots of disordered/unstructured regions, and/or maybe there is not much sequence data available for homologs of this protein to help constrain the problem.
> 
> (4) consider trying ESMFold, for which there is also a tool in ChimeraX analogous to the AlphaFold one.  There are database fetch and new-prediction options.  Still, I don't know if this approach would work in your specific case either.
> <https://rbvi.ucsf.edu/chimerax/docs/user/tools/esmfold.html>
> 
> I hope this helps,
> Elaine
> -----
> Elaine C. Meng, Ph.D.                       
> UCSF Chimera(X) team
> Department of Pharmaceutical Chemistry
> University of California, San Francisco
> 
>> On Apr 14, 2023, at 2:52 PM, Josh Gutierrez via ChimeraX-users <chimerax-users at cgl.ucsf.edu> wrote:
>> 
>> Hello, 
>> 
>> I recently submitted a job through ChimeraX and used the "prediction" task for it to be solved via AlphaFold2; the sequence is about 1300 amino acids in length. However, the protein could not be solved by AlphaFold2, and ColabFold reports that the GPU consumption is too high for the structure to be solved. To resolve the problem, I tried breaking the sequence into two to solve this expected bimodal protein individually (as two separate structural parts), and also purchased "compute units" for more memory and storage, but the structure can't be solved without high predicted error.
>> 
>> Please let me know what you think may resolve this issue. Currently, I am working with an external professor who may be able to connect me with a collaborator at UCSF if there is a cluster or core we can use to expedite solving this structure. Please let me know your thoughts at your earliest convenience. Please also let me know if you need any more information!
>> 
>> Many thanks, 
>> Josh Gutierrez