Tom Goddard
goddard@sonic.net
November 18, 2022
updated to AlphaFold 2.3.2 on January 21, 2024
Tip: Instead of the instructions on this page, I suggest you use AlphaFold 3 on Wynton for faster, better-quality prediction of proteins, nucleic acids, ligands, and ions, or ColabFold on Wynton, which makes predictions 5 times faster than AlphaFold 2 with similar quality.
Here is how UCSF researchers can run AlphaFold 2.3.2 on the UCSF Wynton cluster. You will need a Wynton account.
Log in to Wynton and submit the AlphaFold prediction job using the command
$ qsub /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold232.py --fasta_paths=seq_7p8x_A.fasta
Your job 171002 ("alphafold") has been submitted
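If you submit many predictions from a script, the job id can be read from the qsub reply shown above. Below is a minimal Python sketch, assuming qsub prints a line of exactly that form. The function names parse_job_id and submit_alphafold are hypothetical and are not part of the AlphaFold scripts.

```python
import re
import subprocess

# Run script location given on this page.
RUN_SCRIPT = "/wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold232.py"

# Matches the qsub reply shown above, e.g.
#   Your job 171002 ("alphafold") has been submitted
JOB_REPLY = re.compile(r'Your job (\d+) \("([^"]*)"\) has been submitted')


def parse_job_id(qsub_reply: str) -> int:
    """Return the numeric job id from qsub's reply, or raise ValueError."""
    match = JOB_REPLY.search(qsub_reply)
    if match is None:
        raise ValueError(f"unexpected qsub reply: {qsub_reply!r}")
    return int(match.group(1))


def submit_alphafold(fasta_path: str, *extra_args: str) -> int:
    """Hypothetical helper: qsub the run script for one FASTA file, return the job id."""
    cmd = ["qsub", RUN_SCRIPT, f"--fasta_paths={fasta_path}", *extra_args]
    reply = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_job_id(reply)
```

On Wynton, submit_alphafold("seq_6z03.fasta", "--model_preset=multimer") would submit the multimer example described further down and return its job id.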
To check whether the job has started, use the Wynton qstat command
$ qstat
job-ID  priority  name       user     state  submit/start at      queue              slots  ja-task-ID
------------------------------------------------------------------------------------------------------
171002  0.60943   alphafold  goddard  r      11/18/2022 11:20:22  gpu.q@qb3-atgpu24  1
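In the qstat listing the state column shows r once the job is running; a job still waiting for a GPU shows qw, and a job that has finished no longer appears. A sketch of looking up one job's state from Python, assuming the plain qstat layout above, where the first five whitespace-separated columns are job-ID, priority, name, user and state (the helper names are hypothetical):

```python
import subprocess
from typing import Optional


def job_state(qstat_output: str, job_id: int) -> Optional[str]:
    """Return the state column (e.g. 'qw' or 'r') for job_id, or None if absent.

    Header and dashed separator lines never match because their first
    field is not the job id.
    """
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == str(job_id):
            return fields[4]
    return None  # not listed: finished (check the log files) or never submitted


def current_job_state(job_id: int) -> Optional[str]:
    """Run qstat (must be on PATH, as on Wynton) and look up job_id."""
    out = subprocess.run(["qstat"], capture_output=True, text=True, check=True).stdout
    return job_state(out, job_id)
```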
Output and error log files will appear in files named by the job id in the directory where you submitted the job.
alphafold.e171002 alphafold.o171002
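While a job runs you can watch the end of its logs. AlphaFold's progress messages usually go to standard error, so the .e file is typically the more informative one. A minimal sketch, assuming the log names follow the alphafold.o<job id> and alphafold.e<job id> pattern above (log_paths and tail are hypothetical helper names):

```python
from collections import deque
from pathlib import Path


def log_paths(job_id: int, job_name: str = "alphafold", directory: str = "."):
    """Return (stdout_log, stderr_log) paths, e.g. alphafold.o171002 and alphafold.e171002."""
    d = Path(directory)
    return d / f"{job_name}.o{job_id}", d / f"{job_name}.e{job_id}"


def tail(path, n: int = 10) -> list:
    """Return the last n lines of a text file, keeping at most n lines in memory."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For example, print("\n".join(tail(log_paths(171002)[1], 20))) shows the last 20 lines of the error log for the job above.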
The job will take 1 to 30 hours depending on sequence length (100 to 3000 residues); example run times here. Output will appear in a directory called "output" that AlphaFold creates in the directory where you ran the qsub job submission, inside a subdirectory named after the FASTA file. It produces 5 predicted structures (unrelaxed_model_*.pdb) using 5 differently trained neural networks, energy-minimized versions of those structures (relaxed_model_*.pdb), the same structures ordered by confidence (ranked_*.pdb, with ranked_0.pdb the best), and predicted aligned error files (result_model_*.pkl) that can be viewed in ChimeraX.
$ ls output/seq_7p8x_A
features.pkl
msas
ranked_0.pdb
ranked_1.pdb
ranked_2.pdb
ranked_3.pdb
ranked_4.pdb
ranking_debug.json
relaxed_model_1_ptm_pred_0.pdb
relaxed_model_2_ptm_pred_0.pdb
relaxed_model_3_ptm_pred_0.pdb
relaxed_model_4_ptm_pred_0.pdb
relaxed_model_5_ptm_pred_0.pdb
result_model_1_ptm_pred_0.pkl
result_model_2_ptm_pred_0.pkl
result_model_3_ptm_pred_0.pkl
result_model_4_ptm_pred_0.pkl
result_model_5_ptm_pred_0.pkl
timings.json
unrelaxed_model_1_ptm_pred_0.pdb
unrelaxed_model_2_ptm_pred_0.pdb
unrelaxed_model_3_ptm_pred_0.pdb
unrelaxed_model_4_ptm_pred_0.pdb
unrelaxed_model_5_ptm_pred_0.pdb
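The file ranking_debug.json records which model ended up as ranked_0.pdb. A sketch that reads it, assuming the layout written by AlphaFold 2.3's run_alphafold.py: an "order" list of model names, best first, plus a score dictionary keyed "plddts" (mean pLDDT, monomer presets) or "iptm+ptm" (multimer preset). The function name best_model is hypothetical.

```python
import json
from pathlib import Path


def best_model(output_dir: str):
    """Return (model_name, score) for the top-ranked model in an AlphaFold output directory."""
    with open(Path(output_dir) / "ranking_debug.json") as f:
        ranking = json.load(f)
    top = ranking["order"][0]  # the model written out as ranked_0.pdb
    # Multimer runs rank by 0.8*ipTM + 0.2*pTM; monomer runs by mean pLDDT.
    score_key = "iptm+ptm" if "iptm+ptm" in ranking else "plddts"
    return top, ranking[score_key][top]
```

For the example above, best_model("output/seq_7p8x_A") would return the name of the top model (something like model_2_ptm_pred_0) together with its mean pLDDT.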
The maximum sequence length that can be predicted is about 3500 residues and is limited by GPU memory. The Wynton job will use an Nvidia A40 GPU with 48 Gbytes of GPU memory.
The FASTA file seq_7p8x_A.fasta has these contents
>7P8X_1|Chain A|Leucotoxin LukEv|Staphylococcus aureus (1280)
MSVGLIAPLASPIQESRANTNIENIGDGAEVIKRTEDVSSKKWGVTQNVQFDFVKDKKYNKDALIVKMQGFINSRTSFSDVKGSGYELTKRMIWPFQYNIGLTTKDPNVSLINYLPKNKIETTDVGQTLGYNIGGNFQSAPSIGGNGSFNYSKTISYTQKSYVSEVDKQNSKSVKWGVKANEFVTPDGKKSAHDRYLFVQSPNGPTGSAREYFAPDNQLPPLVQSGFNPSFITTLSHEKGSSDTSEFEISYGRNLDITYATLFPRTGIYAERKHNAFVNRNFVVRYEVNWKTHEIKVKGHNKHHHHHH
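Because the length limit above is set by GPU memory, it can be worth checking a FASTA file before queuing a job. A sketch, assuming the ~3500 residue limit applies to the total length of all chains in the file (a multimer is predicted as one complex); read_fasta and check_length are hypothetical helpers.

```python
from pathlib import Path

# Approximate limit for the 48 Gbyte A40 GPUs, as stated on this page.
MAX_RESIDUES = 3500


def read_fasta(path: str) -> list:
    """Parse a FASTA file into a list of (header, sequence) pairs.

    The leading '>' is dropped from headers and wrapped sequence lines are
    joined. A list (not a dict) keeps chains that share the same header.
    """
    records = []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif not records:
            raise ValueError(f"{path}: sequence data before the first '>' header")
        else:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def check_length(path: str, limit: int = MAX_RESIDUES) -> int:
    """Return the total residue count over all chains, raising ValueError above limit."""
    total = sum(len(seq) for _, seq in read_fasta(path))
    if total > limit:
        raise ValueError(f"{path}: {total} residues exceeds the ~{limit} residue limit")
    return total
```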
And here is an example of running a multimer prediction with two proteins
$ qsub /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold232.py --fasta_paths=seq_6z03.fasta --model_preset=multimer
with the two sequences in FASTA file seq_6z03.fasta containing
>6Z03_1|Chains A|DNA topoisomerase I|Caldiarchaeum subterraneum (311458)
MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV
>6Z03_2|Chains B|DNA topoisomerase I|Caldiarchaeum subterraneum (311458)
MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV
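The 6Z03 file is a homodimer: the same chain listed twice. A sketch for generating such a file, assuming AlphaFold only needs one '>' header line followed by the sequence for each chain, as in the files on this page. The helper names and the header format (imitating the 6Z03 example) are illustrative only.

```python
from pathlib import Path


def homomer_chains(name: str, sequence: str, copies: int) -> list:
    """Build (header, sequence) pairs for identical chains A, B, C, ..."""
    return [(f"{name}_{i + 1}|Chains {chr(ord('A') + i)}", sequence)
            for i in range(copies)]


def write_multimer_fasta(path: str, chains: list) -> None:
    """Write one FASTA record per (header, sequence) pair, one line per sequence."""
    lines = []
    for header, sequence in chains:
        if not sequence.isalpha():
            raise ValueError(f"sequence for {header!r} is empty or has non-letter characters")
        lines.append(f">{header}")
        lines.append(sequence)
    Path(path).write_text("\n".join(lines) + "\n")
```

Given the full chain sequence in a variable sequence, write_multimer_fasta("seq_6z03.fasta", homomer_chains("6Z03", sequence, 2)) would write a two-chain file laid out like the one above (with shorter headers).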
The prediction uses AlphaFold 2.3.2, which I packaged as a Singularity container. The Python script run_alphafold232.py loads the Singularity image alphafold232.sif, and both files are located in my home directory
/wynton/home/ferrin/goddard/alphafold_singularity
You can use the versions directly in my home directory, or you can copy one or both files to your own directory. The only reason to copy them would be to modify the Python script to change how it runs AlphaFold. The comment lines at the top of the Python script set parameters for the Wynton queuing system, such as the type of GPU, how much memory to request, and how long the job may run. If you copy the Singularity image alphafold232.sif, you will need to edit the Python script to use the path to your copy.
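If you do copy the script, a quick way to review its queue settings is to list the embedded option lines, and the image path can be swapped with a plain text replacement. The sketch below assumes those comment lines use Grid Engine's default #$ prefix and that the script spells out the full path of alphafold232.sif; both helper names are hypothetical, and point_to_image refuses to edit a script in which the old path is not found.

```python
from pathlib import Path

# Grid Engine's default prefix for qsub options embedded in a job script.
DIRECTIVE_PREFIX = "#$"


def qsub_directives(script_path: str) -> list:
    """Return the embedded qsub option strings (text after '#$') from a job script."""
    return [line[len(DIRECTIVE_PREFIX):].strip()
            for line in Path(script_path).read_text().splitlines()
            if line.startswith(DIRECTIVE_PREFIX)]


def point_to_image(script_path: str, old_sif: str, new_sif: str) -> None:
    """Replace every occurrence of old_sif with new_sif in the script, in place."""
    text = Path(script_path).read_text()
    if old_sif not in text:
        raise ValueError(f"{old_sif} not found in {script_path}; edit it by hand")
    Path(script_path).write_text(text.replace(old_sif, new_sif))
```

For example, point_to_image("run_alphafold232.py", "/wynton/home/ferrin/goddard/alphafold_singularity/alphafold232.sif", "/path/to/your/alphafold232.sif"), where the new path is a placeholder for your own copy.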
I made the AlphaFold Singularity image on a different Linux computer where I had root access, following the instructions here.
AlphaFold uses 2 Tbytes of sequence databases to compute multiple sequence alignments. They are installed on Wynton in the directory
/wynton/group/databases/alphafold_CASP14_v2.3.0