Tom Goddard
goddard@sonic.net
December 4, 2024
Here is how UCSF researchers can run AlphaFold 3 on the UCSF Wynton cluster. You will need a Wynton account. You can also run AlphaFold 2 on Wynton.
Log in to Wynton (host log1.wynton.ucsf.edu) and create an AlphaFold 3 input file named nipah_zmr.json that contains the protein sequence and the ligand's 3-letter Chemical Component Dictionary (CCD) code. This example includes two zanamivir molecules, chain names B and C.
{ "name": "nipah_zmr", "modelSeeds": [1], "sequences": [ {"protein": {"id":"A", "sequence": "MPTESKKVRFENTASDKGKNPSKVIKSYYGTMDIKKINEGLLDSKILSAFNTVIALLGSIVIIVMNIMIIQNYTRCTDNQAMIKDALQSIQQQIKGLADKIGTEIGPKVSLIDTSSTITIPANIGLLGSKISQSTASINENVNEKCKFTLPPLKIHECNISCPNPLPFREYKPQTEGVSNLVGLPNNICLQKTSNQILKPKLISYTLPVVGQSGTCITDPLLAMDEGYFAYSHLEKIGSCSRGVSKQRIIGVGEVLDRGDEVPSLFMTNVWTPSNPNTVYHCSAVYNNEFYYVLCAVSVVGDPILNSTYWSGSLMMTRLAVKPKNNGESYNQHQFALRNIEKGKYDKVMPYGPSGIKQGDTLYFPAVGFLVRTEFTYNDSNCPIAECQYSKPENCRLSMGIRPNSHYILRSGLLKYNLSDEENSKIVFIEISDQRLSIGSPSKIYDSLGQPVFYQASFSWDTMIKFGDVQTVNPLVVNWRDNTVISRPGQSQCPRFNKCPEVCWEGVYNDAFLIDRINWISAGVFLDSNQTAENPVFTVFKDNEVLYRAQLASEDTNAQKTITNCFLLKNKIWCISLVEIYDTGDNVIRPKLFAVKIPEQCT" } }, {"ligand": {"id": ["B", "C"], "ccdCodes":["ZMR"] } } ], "dialect": "alphafold3", "version": 1 }
Submit the AlphaFold 3 job to the Wynton GPU queue:
$ qsub /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold3.py --json_path=nipah_zmr.json
To check if the job has started use the Wynton qstat command. "qw" means waiting to run and "r" means running. No output means the job has completed.
$ qstat
job-ID   prior    name       user      state  submit/start at      queue  slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1118294  0.24570  alphafold  goddard   qw     12/04/2024 13:47:06          1
The job will take about 30 minutes to several hours after it starts running. Run time depends on the length of the sequences, on how the Wynton network file system is performing, and on the load from other jobs on the same compute node.
Unfortunately the AlphaFold 3 speed on Wynton is much worse than optimal because of the slow file system. In the above example the sequence search took 17 minutes and the structure calculation took 2 minutes (A40 GPU). On an Ubuntu 24 desktop with the databases on an NVMe drive and an Nvidia RTX 4090, the sequence search took 5 minutes and the structure calculation took 1 minute. If the ~400 Gbytes of sequence databases were in a RAM disk with fast DDR5 memory (32 GB/s) it should be possible to reduce the sequence search to 30 seconds, so the prediction would take 1.5 minutes, about 10 times faster than Wynton's 19 minutes. Here are notes on how to optimize MSA calculation speed.
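As a rough sketch of the idea, the --db_dir option lets you point the run script at a copy of the databases on a faster drive instead of the default Wynton network location. The local path below is hypothetical, only for illustration:

# /local/nvme/alphafold3 is a hypothetical local copy of the ~630 Gbytes of AlphaFold 3 databases
$ qsub /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold3.py --json_path=nipah_zmr.json --db_dir=/local/nvme/alphafold3

The same option works if you run the script on your own workstation with the databases on a fast NVMe drive or RAM disk.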
Output and error log files named by the job id will appear in the directory where you submitted the job.
alphafold.e1118294 - AlphaFold 3 stderr log
alphafold.o1118294 - AlphaFold 3 stdout log
Prediction output will appear in a subdirectory with the name specified in the input JSON file.
$ ls nipah_zmr
nipah_zmr_confidences.json          - pLDDT and PAE per-residue and per-atom confidence scores
nipah_zmr_data.json                 - Input specification with computed multiple sequence alignment added
nipah_zmr_model.cif                 - Best scoring atomic model prediction
nipah_zmr_summary_confidences.json  - Summary confidence scores PTM, chain pair iPTM and minimum PAE, ...
ranking_scores.csv                  - Ranking score
seed-1_sample-0/                    - First of 5 predictions
    confidences.json
    model.cif
    summary_confidences.json
seed-1_sample-1/                    - Second of 5 predictions
seed-1_sample-2/
seed-1_sample-3/
seed-1_sample-4/
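A quick way to inspect the results from the command line (the top-ranked model is nipah_zmr_model.cif, as noted above):

# Ranking score for each of the 5 predicted samples
$ cat nipah_zmr/ranking_scores.csv

# Pretty-print the summary confidence scores (PTM, chain pair iPTM, ...)
$ python3 -m json.tool nipah_zmr/nipah_zmr_summary_confidences.json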
AlphaFold 3 can predict complexes of proteins, nucleic acids, ligands and ions, and it can handle modified residues. See the AlphaFold 3 documentation for details about how to handle all these cases.
Here are a few common cases; a combined example follows them. To predict a complex with multiple proteins, include a block of the following form for each distinct sequence, as shown in the example above.
{"protein": {"sequence":..., "id":...}}
To repeat a sequence, list multiple chain ids within a single "protein" block for that sequence:
{"protein": {"sequence":..., "id": ["A", "B"]}}
To add multiple different ligands, repeat the ligand block for each new ligand:
{"ligand": {"ccdCodes":..., "id":...}}
To add multiple copies of the same ligand, specify multiple ids:
{"ligand": {"ccdCodes":..., "id":["D", "E", "F"]}
If a ligand does not have a Chemical Component Dictionary 3-letter code (those are available only for ligands that appear in PDB structures), you can instead specify the ligand using a SMILES string (e.g. from PubChem):
{"ligand": {"smiles":"CC(=O)OC1=CC=CC=C1C(=O)O", "id":["G"]}
The AlphaFold 3 performance documentation says that on a GPU with 80 GB of graphics memory (Nvidia H100 or A100) it can predict 5120 tokens, where a token is a standard residue, or an individual atom for ligands, ions and modified residues. Wynton has a few Nvidia A100 80 GB GPUs, but many more A40 48 GB GPUs. The documentation says that a 40 GB A100 can handle 4352 tokens. The run_alphafold3.py script on Wynton by default asks for 40 GB of graphics memory.
The prediction uses AlphaFold 3 that I packaged as a singularity container. It is launched by a Python script run_alphafold3.py that uses the singularity image alphafold3_40gb.sif; both files are located in my home directory
/wynton/home/ferrin/goddard/alphafold_singularity
You can use the versions directly from my home directory, or you can copy one or both files to your own directory. Copy the Python run_alphafold3.py script if you want to change how it runs AlphaFold. The comment lines at the top of the Python script set parameters for the Wynton queueing system, such as the type of GPU, how much memory, and how long to allow the job to run. If you copy the singularity image alphafold3_40gb.sif, you will also need to edit the Python script to use your copy.
Options to the run_alphafold3.py script, which launches the singularity image, can be listed using the --help option. Only the --json_path or --input_dir option is required.
$ /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold3.py --help
usage: run_alphafold3.py [-h] [--json_path JSON_PATH] [--input_dir INPUT_DIR]
                         [--output_dir OUTPUT_DIR] [--model_dir MODEL_DIR]
                         [--flash_attention_implementation {triton,cudnn,xla}]
                         [--run_data_pipeline RUN_DATA_PIPELINE]
                         [--run_inference RUN_INFERENCE] [--db_dir DB_DIR]
                         [--jackhmmer_n_cpu JACKHMMER_N_CPU]
                         [--nhmmer_n_cpu NHMMER_N_CPU]
                         [--jax_compilation_cache_dir JAX_COMPILATION_CACHE_DIR]
                         [--gpu_devices GPU_DEVICES]
                         [--singularity_image_path SINGULARITY_IMAGE_PATH]
                         [--use_a100_80gb_settings USE_A100_80GB_SETTINGS]

Run AlphaFold structure prediction using singularity image.

optional arguments:
  -h, --help            show this help message and exit
  --json_path JSON_PATH
                        Paths to the input JSON file
  --input_dir INPUT_DIR
                        Paths to the directory containing input JSON files
  --output_dir OUTPUT_DIR
                        Paths to a directory where the results will be saved
  --model_dir MODEL_DIR
                        Path to the model to use for inference.
  --flash_attention_implementation {triton,cudnn,xla}
                        Flash attention implementation to use. 'triton' and
                        'cudnn' uses a Triton and cuDNN flash attention
                        implementation, respectively. The Triton kernel is
                        fastest and has been tested more thoroughly. The
                        Triton and cuDNN kernels require Ampere GPUs or later.
                        'xla' uses an XLA attention implementation (no flash
                        attention) and is portable across GPU devices.
  --run_data_pipeline RUN_DATA_PIPELINE
                        Whether to run the data pipeline on the fold inputs.
  --run_inference RUN_INFERENCE
                        Whether to run inference on the fold inputs.
  --db_dir DB_DIR       Path to the directory containing the databases.
  --jackhmmer_n_cpu JACKHMMER_N_CPU
                        Number of CPUs to use for Jackhmmer. Default to
                        min(cpu_count, 8). Going beyond 8 CPUs provides very
                        little additional speedup.
  --nhmmer_n_cpu NHMMER_N_CPU
                        Number of CPUs to use for Nhmmer. Default to
                        min(cpu_count, 8). Going beyond 8 CPUs provides very
                        little additional speedup.
  --jax_compilation_cache_dir JAX_COMPILATION_CACHE_DIR
                        Path to a directory for the JAX compilation cache.
  --gpu_devices GPU_DEVICES
                        Comma separated list GPU identifiers to set
                        environment variable CUDA_VISIBLE_DEVICES.
  --singularity_image_path SINGULARITY_IMAGE_PATH
                        Path to the AlphaFold singularity image.
  --use_a100_80gb_settings USE_A100_80GB_SETTINGS
                        Use AlphaFold 3 settings for A100 80 GB graphics. If
                        not set use A100 40 GB settings.
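For example, to predict every JSON input file in one directory and put the results in a separate directory, pass the options on the qsub line the same way as the single-file example above. The directory names here are placeholders:

# my_json_inputs/ and my_predictions/ are placeholder directory names
$ qsub /wynton/home/ferrin/goddard/alphafold_singularity/run_alphafold3.py --input_dir=my_json_inputs --output_dir=my_predictions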
I made the AlphaFold singularity image on a different Linux computer (quillian.cgl.ucsf.edu) where I had root access. I followed the AlphaFold 3 installation instructions to make a Docker image. Then I built a singularity image from the Docker image in the same way as for AlphaFold 2 using these commands:
$ cd alphafold3
$ sudo docker build -t alphafold3 -f docker/Dockerfile . >& docker_build.out
$ sudo docker image ls
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
alphafold3    latest    25d42630b205   3 minutes ago   9.06GB
$ sudo docker save 25d42630b205 -o alphafold3_docker.tar
$ sudo singularity build alphafold3.sif docker-archive://alphafold3_docker.tar
AlphaFold 3 uses 394 Gbytes of sequence databases to compute multiple sequence alignments and 234 Gbytes of PDB mmCIF files as template models. These are installed on Wynton in directory
/wynton/group/databases/alphafold3
AlphaFold 3 predictions on Wynton issue a warning that the Nvidia PTX compiler version 12.6.77 that AlphaFold 3 wants to use (via JAX) is not supported by Wynton's older Nvidia 12.4 driver. This does not affect the predictions, but it may slow the structure inference stage. The AlphaFold 3 install requires the PyPI package jax[cuda12], which currently includes Nvidia 12.6 driver support independent of which Nvidia drivers are on the installation machine. (I tried Nvidia 12.2 and 12.4 drivers and it made no difference.)
Here is an example of the warning.
2024-11-20 11:29:47.497048: W external/xla/xla/service/gpu/nvptx_compiler.cc:930] The NVIDIA driver's CUDA version is 12.4 which is older than the PTX compiler version 12.6.77. Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.