Oldenburg bioinformatics workshop 2017
Deren Eaton
Create a file called .ssh/config
and enter your host and user names:
## here is what mine looks like
cat ~/.ssh/config
Host tinus
    HostName tinus.eeb.yale.internal
    User deren

Host farnam
    HostName farnam.hpc.yale.edu
    User de243

Host oldenburg
    HostName carl.hpc.uni-oldenburg.de
    User adta5102
You can now connect to the login node, where you can submit jobs to the cluster scheduler to be distributed across compute nodes. If you are off-campus you may need to connect through a VPN before using SSH.
## connect to the cluster with ssh
ssh oldenburg
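If an alias does not behave as expected, you can ask the SSH client to print the configuration it resolves for a host. A quick sketch (the -G flag requires OpenSSH 6.8 or newer, and resolves the config without actually connecting):

```shell
## print the settings ssh will actually use for the "oldenburg" alias
ssh -G oldenburg | grep -E '^(hostname|user) '
```

This should echo back the HostName and User you entered in ~/.ssh/config.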
### Check the queue on the Cluster
You can restrict the listing to a specific user with the -u flag.
squeue -u adta5102
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2635677 carl.p test adta5102 R 56:54 1 mpcl009
2635709 carl.p bash adta5102 R 29:02 1 mpcl009
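Because the queue listing is plain text, you can summarize it with standard shell tools. For example, counting your running jobs (a sketch assuming the default squeue columns shown above, where state is the fifth field):

```shell
## count jobs in the running (R) state for a given user
squeue -u adta5102 | awk '$5 == "R"' | wc -l
```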
Starting an interactive job is very useful for debugging. Select the most available queue and request a short walltime so that it connects quickly. Install software or run short tests of your scripts to ensure they work before submitting a long-running job.
srun -p carl.p -t 1:00:00 -N 1 -n 4 --pty /bin/bash
On a shared cluster there is typically software that is installed system-wide by an administrator. Although you cannot install system-wide software yourself, you can ask the administrator to do it for you. The following commands are useful for finding and loading system software.
## shows all available software
module avail
## load a module
module load OpenMPI/2.0.1-GCC-6.2.0
## download and install conda for linux
curl -O https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b
## source the installation (puts it in your $PATH)
source ~/.bashrc
## test that conda is installed.
conda info
Current conda install:

               platform : linux-64
          conda version : 4.3.16
       conda is private : False
      conda-env version : 4.3.16
    conda-build version : 2.1.10
         python version : 2.7.13.final.0
       requests version : 2.12.4
       root environment : /home/deren/miniconda2  (writable)
    default environment : /home/deren/miniconda2
       envs directories : /home/deren/miniconda2/envs
                          /home/deren/.conda/envs
          package cache : /home/deren/miniconda2/pkgs
                          /home/deren/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : /home/deren/.condarc
           offline mode : False
             user-agent : conda/4.3.16 requests/2.12.4 CPython/2.7.13 Linux/4.4.0-75-generic debian/stretch/sid glibc/2.23
                UID:GID : 1000:1000
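If you only want one value from this report, for example to check the version in a script, you can filter it with awk. A small sketch that assumes the "key : value" layout shown above:

```shell
## print just the conda version from the full report
conda info | awk -F' : ' '/conda version/ {print $2}'
```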
Google conda plus the name of the software you are looking for and you will probably find a recipe. One repository with a lot of useful bioinformatics software is the bioconda channel.
## install some recipes from the bioconda channel
conda install raxml -c bioconda
## install some recipes from the ipyrad channel
conda install bpp -c ipyrad
Submit the sbatch script below to the carl partition. This is just a test to confirm that our software can be found from the job scheduler.
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 4:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name test
#SBATCH --output test-conda-%J.txt
## show the location of my software
which conda
which raxml
which bpp
Some HPC systems do not preserve the user's $PATH in the job scheduler. If you see an error in which your software is not being found, simply add a source call to your sbatch script.
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 4:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name test
#SBATCH --output test-conda-%J.txt
## re-source the $PATH
source /user/adta5102/.bashrc
## show the location of my software
which conda
which raxml
which bpp
A powerful way to work on an HPC cluster is through Jupyter notebooks, which allow you to work interactively while also keeping a detailed record of your work. Submit this script below and follow the instructions to launch an SSH tunnel to connect to a remote jupyter server from your laptop.
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 24:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt
## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
Paste this ssh command in a terminal on local host (i.e., laptop)
-----------------------------------------------------------------
ssh -N -L $ipnport:$ipnip:$ipnport {user@host}
Open this address in a browser on local host; see token below.
-----------------------------------------------------------------
localhost:$ipnport (prepend with https:// if using a password)
"
## launch a jupyter server on the specified port & ip
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
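The script picks a random high port with shuf so that two users on the same node are unlikely to collide, and hostname -i supplies the compute node's IP for the tunnel. The port selection is plain shell and can be checked locally:

```shell
## choose a random unprivileged port, as the script above does
ipnport=$(shuf -i8000-9999 -n1)
echo $ipnport
```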
Once the job starts, check the log file it produces, named jupyter-log-{xxx}.txt. It should look something like the output below. On your local computer (i.e., laptop) open a terminal and paste in the ssh tunneling command, replacing {user@host} with your credentials.
## submit the job
sbatch slurm_jupyter.sbatch
## check the log
cat jupyter-log-2637903.txt
Follow the instructions. Paste the ssh command into your local terminal and open the localhost address in a browser.
Paste this ssh command in a terminal on local host (i.e., laptop)
-----------------------------------------------------------------
ssh -N -L 8506:10.151.9.5:8506 {user@host}
Open this address in a browser on local host; see token below.
-----------------------------------------------------------------
localhost:8506 (prepend with https:// if using a password)