Tom Goddard
April 6, 2022
Here is how I made an AlphaFold singularity image for the UCSF Wynton cluster. The Wynton cluster does not allow running Docker images because of security issues, so we run it using a singularity image.
The AlphaFold github repository has scripts to build a Docker image for running AlphaFold. We build that, then convert it to a Singularity image.
The UCSF Wynton cluster does not support Docker and does not allow building singularity images since both require root privileges. So I do these steps on a desktop Ubuntu 20.04 system where I have root access.
$ cd ~/ucsf $ git clone git@github.com:deepmind/alphafold.git $ cd ~/ucsf/alphafold $ sudo docker build -f docker/Dockerfile -t alphafold220 . Successfully built 7f7b50fd42c9 Successfully tagged alphafold220:latest $ cd ~/ucsf $ sudo docker save 7f7b50fd42c9 -o alphafold220_docker.tar $ sudo singularity build alphafold220.sif docker-archive://alphafold220_docker.tar $ rsync -av alphafold220.sif plato.cgl.ucsf.edu:alphafold_singularity
The AlphaFold github code repository contains a run_docker.py script for running the AlphaFold Docker image. I wrote a similar script run_alphafold220.py for running the AlphaFold Singularity image.
The script is designed for submitting jobs queued on the Wynton cluster using the Sun Grid Engine (SGE) queueing system. The comment lines at the top of the script give default SGE qsub command options to run in the GPU queue for up to 48 hours on an A40 or A100 GPU.
My script was written starting with the run_docker.py and replacing the docker image invocation with the equivalent singularity image invocation. I made many other changes to the AlphaFold run_docker.py script to improve ease of use:
Here is how to queue an AlphaFold jobs on Wynton.
cd /wynton/home/ferrin/goddard/alphafold_singularity qsub run_alphafold220.py --fasta_paths=seq_7p8x_A.fasta
with FASTA file seq_78px_A.fasta as given here
>7P8X_1|Chain A|Leucotoxin LukEv|Staphylococcus aureus (1280) MSVGLIAPLASPIQESRANTNIENIGDGAEVIKRTEDVSSKKWGVTQNVQFDFVKDKKYNKDALIVKMQGFINSRTSFSDVKGSGYELTKRMIWPFQYNIGLTTKDPNVSLINYLPKNKIETTDVGQTLGYNIGGNFQSAPSIGGNGSFNYSKTISYTQKSYVSEVDKQNSKSVKWGVKANEFVTPDGKKSAHDRYLFVQSPNGPTGSAREYFAPDNQLPPLVQSGFNPSFITTLSHEKGSSDTSEFEISYGRNLDITYATLFPRTGIYAERKHNAFVNRNFVVRYEVNWKTHEIKVKGHNKHHHHHH
And here is an example running a multimer prediction with two proteins
qsub run_alphafold220.py --fasta_paths=seq_6z03.fasta --model_preset=multimer
with the two sequences in FASTA file seq_6z03.fasta containing
>6Z03_1|Chains A|DNA topoisomerase I|Caldiarchaeum subterraneum (311458) MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV >6Z03_2|Chains B|DNA topoisomerase I|Caldiarchaeum subterraneum (311458) MVKWRTLVHNGVALPPPYQPKGLSIKIRGETVKLDPLQEEMAYAWALKKDTPYVQDPVFQKNFLTDFLKTFNGRFQDVTINEIDFSEVYEYVERERQLKADKEYRKKISAERKRLREELKARYGWAEMDGKRFEIANWMVEPPGIFMGRGNHPLRGRWKPRVYEEDITLNLGEDAPVPPGNWGQIVHDHDSMWLARWDDKLTGKEKYVWLSDTADIKQKRDKSKYDKAEMLENHIDRVREKIFKGLRSKEPKMREIALACYLIDRLAMRVGDEKDPDEADTVGATTLRVEHVKLLEDRIEFDFLGKDSVRWQKSIDLRNEPPEVRQVFEELLEGKKEGDQIFQNINSRHVNRFLGKIVKGLTAKVFRTYIATKIVKDFLAAIPREKVTSQEKFIYYAKLANLKAAEALNHKRAPPKNWEQSIQKKEERVKKLMQQLREAESEKKKARIAERLEKAELNLDLAVKVRDYNLATSLRNYIDPRVYKAWGRYTGYEWRKIYTASLLRKFKWVEKASVKHVLQYFAEKLAKDVDKGMQVKAAV
We run AlphaFold on an Ubuntu 20.04 machine in the lab with Nvidia RTX 3090 graphics minsky.cgl.ucsf.edu. For simpler use we modify the AlphaFold run_docker.py script to provide better default values as follows af220_minsky.py