SSH server connection tricks

Oldenburg bioinformatics workshop 2017
Deren Eaton

Faster & easier login

Create a file called ~/.ssh/config and enter your host and user names:

In [1]:
## here is what mine looks like
cat ~/.ssh/config
Host tinus
    HostName tinus.eeb.yale.internal
    User deren

Host farnam
    HostName farnam.hpc.yale.edu
    User de243

Host oldenburg
    HostName carl.hpc.uni-oldenburg.de
    User adta5102
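
Note that ssh can be picky about permissions on this file; if it complains about "Bad owner or permissions", restricting the file to your user is the standard fix:

In [ ]:
## ssh may refuse to read a config file that others can write to
chmod 600 ~/.ssh/config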

Simple connection

You can now connect to the login node, where you can submit jobs to the cluster scheduler to be distributed across compute nodes. If you are off-campus you may need to connect through a VPN before using SSH.

In [ ]:
## connect to the cluster with ssh
ssh oldenburg

Check the queue

You can restrict the output to a specific user with the -u flag.

In [ ]:
## check the queue on the cluster
squeue -u adta5102
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
2635677    carl.p     test adta5102  R      56:54      1 mpcl009
2635709    carl.p     bash adta5102  R      29:02      1 mpcl009
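
If you want to watch the queue update in place, squeue can also re-print itself at a fixed interval (the 60 seconds below is just an example):

In [ ]:
## re-print the queue every 60 seconds (Ctrl-C to stop)
squeue -u adta5102 -i 60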

Connect to a compute node interactively

This is very useful for debugging. Select the most available queue and request a short walltime so that the job starts quickly. Use the interactive session to install software or run short tests on your scripts to ensure they work before submitting a long-running job.

In [ ]:
srun -p carl.p -t 1:00:00 -N 1 -n 4 --pty /bin/bash
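
Once the interactive shell starts, a quick sanity check confirms that you are on a compute node and shows which resources you can see:

In [ ]:
## run these inside the interactive session
hostname    ## should print a compute node name (e.g., mpcl009)
nproc       ## number of processors visible to the shell
exit        ## release the allocation when you are done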

Access system-wide software

On a shared cluster there is typically software that has been installed system-wide by an administrator. Although you cannot install system-wide software yourself, you can ask the administrator to do it for you. The following commands are useful for finding and loading system software.

In [ ]:
## shows all available software
module avail

## load a module
module load OpenMPI/2.0.1-GCC-6.2.0
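
Two other subcommands are handy for keeping track of your environment:

In [ ]:
## list the modules you currently have loaded
module list

## unload all modules to start from a clean slate
module purge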

Install local software

Alternatively, you can install software locally, in which case I recommend using conda. This is the default way to install ipyrad, and it is also useful for other software that you want to update frequently.

In [ ]:
## download and install conda for linux
curl -O https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b

## source the installation (puts it in your $PATH)
source ~/.bashrc
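
If conda is still not found after sourcing, note that the batch-mode (-b) installer does not edit your ~/.bashrc; in that case append the miniconda bin directory to your PATH yourself (the path below assumes the default install location):

In [ ]:
## add miniconda to your PATH (default install prefix assumed)
echo 'export PATH="$HOME/miniconda2/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc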
In [2]:
## test that conda is installed. 
conda info
Current conda install:

               platform : linux-64
          conda version : 4.3.16
       conda is private : False
      conda-env version : 4.3.16
    conda-build version : 2.1.10
         python version : 2.7.13.final.0
       requests version : 2.12.4
       root environment : /home/deren/miniconda2  (writable)
    default environment : /home/deren/miniconda2
       envs directories : /home/deren/miniconda2/envs
                          /home/deren/.conda/envs
          package cache : /home/deren/miniconda2/pkgs
                          /home/deren/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : /home/deren/.condarc
           offline mode : False
             user-agent : conda/4.3.16 requests/2.12.4 CPython/2.7.13 Linux/4.4.0-75-generic debian/stretch/sid glibc/2.23
                UID:GID : 1000:1000

Once conda is installed, almost anything is available to you

Google "conda" plus the name of the software you are looking for and you will probably find a recipe. One channel with a lot of useful bioinformatics software is bioconda.

In [ ]:
## install some recipes from the bioconda channel
conda install raxml -c bioconda

## install some recipes from the ipyrad channel
conda install bpp -c ipyrad
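
If you are not sure of a package's exact name, or want to see which versions are available, you can search a channel first (raxml here is just the example from above):

In [ ]:
## list available versions of raxml on the bioconda channel
conda search raxml -c bioconda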

Write a SLURM (sbatch) submission script for carl

This is just a test to confirm that our software can be found from within a job run by the scheduler.

In [ ]:
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 4:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name test
#SBATCH --output test-conda-%J.txt

## show the location of my software
which conda
which raxml
which bpp
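
Save the script above to a file and submit it with sbatch (the filename below is just an example):

In [ ]:
## submit the test script to the scheduler (filename is an example)
sbatch test-conda.sbatch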

If not, just add a source command

Some HPC systems do not pass the user's $PATH to jobs run by the scheduler. If you see an error indicating that your software cannot be found, simply add a source call to your sbatch script.

In [ ]:
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 4:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name test
#SBATCH --output test-conda-%J.txt

## re-source the $PATH
source /user/adta5102/.bashrc

## show the location of my software
which conda
which raxml
which bpp
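
When the job finishes, the output file named in the script (test-conda-%J.txt, where %J becomes the job id) should show paths inside your miniconda2 install:

In [ ]:
## inspect the job output; %J in the script expands to the job id
cat test-conda-*.txt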

Start a jupyter notebook server

A powerful way to work on an HPC cluster is through Jupyter notebooks, which allow you to work interactively while also keeping a detailed record of your work. Submit the script below and follow the instructions it prints to launch an SSH tunnel and connect to the remote jupyter server from your laptop.

In [ ]:
#!/bin/bash
#SBATCH --partition carl.p
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 24
#SBATCH --exclusive
#SBATCH --time 24:00:00
#SBATCH --mem-per-cpu 4000
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt

## avoid jupyter permission errors from an inherited runtime dir
XDG_RUNTIME_DIR=""

## get tunneling info
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)

## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
   Paste this ssh command in a terminal on local host (i.e., laptop)
   -----------------------------------------------------------------
   ssh -N -L $ipnport:$ipnip:$ipnport {user@host}      

   Open this address in a browser on local host; see token below.
   -----------------------------------------------------------------
   localhost:$ipnport  (prepend with https:// if using a password)
   "

## launch a jupyter server on the specified port & ip
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip

Submit the script to the queue

Once the job starts, check the log file it produces, named jupyter-log-{jobid}.txt. It should look something like the output below. On your local computer (i.e., laptop) open a terminal and paste in the ssh tunneling command, replacing {user@host} with your credentials.

In [ ]:
## submit the job
sbatch slurm_jupyter.sbatch
In [ ]:
## check the log
cat jupyter-log-2637903.txt

The jupyter log file

Follow the instructions. Paste the ssh command into your local terminal and open the localhost address in a browser.

   Paste this ssh command in a terminal on local host (i.e., laptop)
   -----------------------------------------------------------------
   ssh -N -L 8506:10.151.9.5:8506 {user@host}      

   Open this address in a browser on local host; see token below.
   -----------------------------------------------------------------
   localhost:8506  (prepend with https:// if using a password)
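
Since we set up aliases in ~/.ssh/config earlier, the {user@host} placeholder can simply be replaced with the oldenburg alias; the port and IP in your own log file will differ:

In [ ]:
## run this on your laptop; take the port and IP from your log file
ssh -N -L 8506:10.151.9.5:8506 oldenburg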

Connecting to the notebook server (video tutorial)