Mod:Hunt Research Group/hpc
General Resources
pqph
- You can check the current queue resources and status here: pqph queue status
- Currently, pqph consists mainly of 40 proc/124GB nodes, plus a couple of 48 proc/256GB nodes.
Express queue
- We can also now submit jobs to the Express queue
- Use this whenever pqph is looking full, or if you have a job you think will take more than a day (the express walltime in the examples below is the 3-day limit)
To run express jobs, use the command line input:
qsub -q express -P exp-00034 -lselect=1:ncpus=48:mem=126gb -lwalltime=72:00:00
Or use this inside a PBS submit script:
# batch processing commands
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=48:mem=126000MB
#PBS -j oe
#PBS -q express -P exp-00034
- Don’t forget to request less memory in the Gaussian .com file, say 125GB, as shown below
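For example, the matching lines at the top of the Gaussian .com file could be (125GB, written in MB since Gaussian prefers MB, is just the figure from the note above, kept below the 126GB PBS request):
%nprocs=48
%mem=125000MB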
Gaussian jobs
Recommended job specifications
For Gaussian jobs on pqph it is recommended to use just two job sizings. These mean that either a full 40 proc node is used or just half of the node, leaving the other half free for a second job. They apply only to Gaussian jobs, which can't be run across nodes; for codes which are parallelised across nodes you may want to use multiple nodes or alternative job sizings.
Small/medium jobs:
- Run jobs using half of a 40 processor node and half the memory allowance (64GB).
- PBS script input:
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=20:mem=64000MB
- Gaussian .com file input:
%nprocs=20
%mem=60000MB
Medium/large jobs:
- Run jobs using a full 40 processor node and the full memory allowance (128GB).
- PBS script input:
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=40:mem=128000MB
- Gaussian .com file input:
%nprocs=40
%mem=122000MB
If you need to use the larger (48 proc) nodes for more expensive calculations:
- Run jobs using a full 48 processor node and the full memory allowance (256GB).
- PBS script input:
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=48:mem=256000MB
- Gaussian .com file input:
%nprocs=48
%mem=250000MB
Runscripts
Standard job script
An example Gaussian runscript for a 20 processor job:
#!/bin/sh
# Submit jobs to the queue with this script using the following command:
#
#   qsub -N jobname -v in=name rs20
#
# Where: rs20 is the name of this runscript
#        jobname is a name you will see in the qstat command
#        name is the actual file minus .com etc; it is passed into this script as ${in%.com}

# batch processing commands
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=20:mem=64000MB
#PBS -j oe
#PBS -q pqph
#PBS -m a

# Load relevant modules
module load gaussian/g16-a03

# Check for a checkpoint file to copy to the temp directory
# variable PBS_O_WORKDIR = directory from which the job was submitted
if [[ -e $PBS_O_WORKDIR/${in%.com}.chk ]]
then
    echo "$PBS_O_WORKDIR/${in%.com}.chk located"
    cp $PBS_O_WORKDIR/${in%.com}.chk $TMPDIR/.
else
    echo "no checkpoint file $PBS_O_WORKDIR/${in%.com}.chk"
fi

# Execute Gaussian
g16 $PBS_O_WORKDIR/${in}

# Once job is finished copy across the .chk file
cp $TMPDIR/${in%.com}.chk $PBS_O_WORKDIR/.

exit
- Edit the PBS lines to create runscripts for other job specifications.
- If you are not sure what the PBS commands are or what the runscript does, check out the introduction to the HPC page: Getting Started on the HPC
- If you need to copy back any other output files, you can either run the job in ${EPHEMERAL} instead of ${TMPDIR}, as all output files will remain in ephemeral storage for a period of time, or, if you know the extension of the additional desired output file, you can use a modified version of the code:
# Check for the existence of other possible output files and copy them back if located
if ls $TMPDIR/*.extension > /dev/null 2>&1
then
    cp $TMPDIR/*.extension $PBS_O_WORKDIR/.
fi
- Replace *.extension with the correct file extension that you want to copy back.
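For example, to copy back wavefunction files (the .wfn extension here is only an illustration; use whatever extension your job actually writes):
# Check for .wfn output files and copy them back if located
if ls $TMPDIR/*.wfn > /dev/null 2>&1
then
    cp $TMPDIR/*.wfn $PBS_O_WORKDIR/.
fi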
Array jobs
If you have a large number of small jobs which are only slightly different, e.g. optimising a large number of conformers of a molecule/system that vary only in the input structure, then you should use an array job.
An example array job runscript for a 20 processor job is:
#!/bin/sh
# batch processing commands
#PBS -l walltime=72:00:00
#PBS -lselect=1:ncpus=20:mem=64000MB
#PBS -J 1-X
#PBS -j oe
#PBS -q pqph
#PBS -m a
#PBS -N arrayJobName

in=$( sed -n ${PBS_ARRAY_INDEX}p ${PBS_O_WORKDIR}/inputFiles.txt )
echo ${in}

# Load relevant modules
module load gaussian/g16-a03

# Check for a checkpoint file to copy to the temp directory
# variable PBS_O_WORKDIR = directory from which the job was submitted
if [[ -e $PBS_O_WORKDIR/${in%.com}.chk ]]
then
    echo "$PBS_O_WORKDIR/${in%.com}.chk located"
    cp $PBS_O_WORKDIR/${in%.com}.chk $TMPDIR/.
fi

# Execute Gaussian
g16 $PBS_O_WORKDIR/${in}

# Once job is finished copy across the .chk file
cp $TMPDIR/${in%.com}.chk $PBS_O_WORKDIR/.

exit
To use the script:
- Set up all your input .com files in the same directory
- Edit the line in the runscript that sets the number of jobs in the array:
#PBS -J 1-X
Change X to the number of input files you have to run.
- The runscript works by running X separate jobs within the array. For each job, a PBS variable (PBS_ARRAY_INDEX) is set to the job's number within the array, e.g. for the first job to run, PBS_ARRAY_INDEX = 1.
- Change the job name using the -N flag in the script or by the command line option
- Save your changes to the runscript and exit
- Create a text file with the names of all the input .com files. An easy way to do this is by the command line:
ls *.com > inputFiles.txt
- You will notice that inputFiles.txt is read by the line in the runscript which sets the variable in. It uses the array job number (PBS_ARRAY_INDEX) as an index to select the corresponding line in the text file, so each job in the array picks up a different input file (see the short example after this list).
- Submit the array job using the command:
qsub rs_ja
- The job runs as a single job on the queue and gets a single job id number (e.g. 1096738); each of the separate jobs within the array is then given an index (e.g. 1096738[4] for job 4 of the array)
- The qstat information for the array job now tells you how many jobs are in the array, how many are queued and how many are finished.
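As a short illustration of how the input file selection works (the file names here are made up for the example): suppose inputFiles.txt contains the three lines conformer1.com, conformer2.com and conformer3.com. For the job with PBS_ARRAY_INDEX = 2 the sed line in the runscript is equivalent to
sed -n 2p inputFiles.txt
which prints the second line, so in is set to conformer2.com and that job optimises the second conformer.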
Extra information/troubleshooting
- Add tmpspace=400 only for large disk jobs, to ensure you are put on a node with enough disk (see the example after this list)
- Note that this requires you to include maxdisk=400GB in your Gaussian input.
- NOTE: the queuing system does not check disk allocations. When requesting large disk jobs, remember to request all of the processors on a node even if you are not using all of them. For large jobs the maximum disk space you can request is 800GB on the 12 processor nodes.
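For example, a large disk request could look like the following (the exact tmpspace syntax and units should be checked against the current RCS documentation; the 40 processor sizing and the route line are only placeholders):
#PBS -lselect=1:ncpus=40:mem=128000MB:tmpspace=400gb
with the matching keyword in the route section of the Gaussian input:
# MP2/cc-pVTZ freq maxdisk=400GB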
Memory needed to run
- Gaussian is greedy and will exceed the allocated memory
- Each proc needs a Gaussian executable, which takes about 8MW (or 12MW for MP2 frequencies)
- MW is a megaword, the unit in which Gaussian allocates memory
- 1MW is about 8.4MB
- so each proc needs 1*8*8.4 approximately 68MB just to run
- so 12 proc jobs require 12*68=816MB just to run
- so 16 proc jobs require 16*68=1088MB just to run
- so 20 proc jobs require 20*68=1360MB just to run
- so 24 proc jobs require 24*68=1632MB just to run
- so 40 proc jobs require 40*68=2720MB just to run
- so 48 proc jobs require 48*68=3264MB just to run
- so when allocating memory inside the Gaussian job you must reduce the memory by at least this amount
- thus it is best to reduce the memory by about 100MB * no. of processors inside the Gaussian input file (see the worked example below)
- you also need some overhead within the PBS script
- the memory can be given in binary units: for example 251 GB (binary) is really 251000 * 1,048,576 bytes ≈ 264GB (decimal)
- For larger jobs (>50 atoms or >500 basis functions) a good rule of thumb is a minimum of 4GB per processor
- so 20proc is 80GB minimum
- for mp2 frequency and ccsd you should leave enough memory to buffer the large disk files
- so only give the gaussian job 50-70% of the total memory
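As a worked example of the rules above, take the recommended 20 processor job sizing: the PBS request is mem=64000MB; the Gaussian overhead is roughly 100MB * 20 procs = 2000MB, so %mem should be at most about 62000MB, and the recommended %mem=60000MB leaves a further safety margin for PBS overhead. For an MP2 frequency or CCSD job on the same half node, giving Gaussian 50-70% of the 64000MB corresponds to %mem in the region of 32000-45000MB.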
More details if you seem to be having memory or disk issues
- normal jobs
- will need 2*N^2 W of memory; multiply the value in MW by 8.4 to convert to MB (1,048,576 B = 1MB)
- so 300 basis functions will need 180000W =0.18MW =1.5MB in addition to the above requirements
- require 2*O*N^2 W of disk to run, where O = number of occupied orbitals and N = number of basis functions
- MP2 jobs
- work best with %mem and maxdisk defined
- in-core requires N^4/4 W of memory, i.e. N^4/4 divided by 1,000,000 MW
- so 400 basis functions will need 6400MW=53760MB=54GB memory per node, which is unlikely!
- semi-direct requires 2*O*N^2 W of memory and N^3 W of disk (see the quick estimate after this list)
- so N=476 basis functions O=56 occupied orbitals will need
- 25.4MW=214MB of memory
- and 108MW = 906MB of disk (in practice it will need much more, probably around 1800MB of disk per processor!)
- so total memory for MP2 freq 8proc will be
- 12*8*8.4 ≈ 807MB to run, plus 8*214 = 1712MB for the calculation, plus some extra ~500MB, giving about 3019MB ≈ 3.0GB
- Gaussian does not like the GB directive, so give %mem in MB
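As a quick sanity check on the semi-direct example above, the same formulas can be evaluated with a short awk snippet (N and O are just the example values from above, and 8.4 MB per MW is the conversion used throughout this page; the results agree with the numbers above to within rounding):
# Estimate semi-direct MP2 memory (2*O*N^2 W) and disk (N^3 W), converted to MB at 8.4 MB/MW
N=476
O=56
awk -v N=$N -v O=$O 'BEGIN {
    mem_mw  = 2*O*N*N / 1e6;   # memory in megawords
    disk_mw = N*N*N / 1e6;     # disk in megawords
    printf "memory: %.1f MW = %.0f MB\n", mem_mw, mem_mw*8.4
    printf "disk:   %.1f MW = %.0f MB\n", disk_mw, disk_mw*8.4
}'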
Checkpoint and other files
- checkpoint files should have exactly the same name as the input file
- for jobs that may exceed the wall time specify the full path of the checkpoint file, for example
- %chk=/work/phunt/tmp/filename.chk
- this means the checkpoint file will be written into your personal work directory; it may slow the job down
- this is also the reason /work is sometimes very slow on CX1 so only do this as an exception!
Extra links
The Imperial Research Computing Service has an HPC wiki with useful information, including an intro to shell scripting, modules and job management:
The RCS also runs several courses throughout the year, including introductions to Linux, HPC, Python and more advanced topics. Upcoming courses can be viewed from:
Next steps
- Mount
- Alias shortcut for logging in
- Keypair page
- Once you are comfortable and understand the job submission process, the automatic job script which ... can be used
Other information (may be out of date)
- 3.1 CPMD:
- 3.2 DL-POLY:
- https://www.ch.ic.ac.uk/wiki/index.php/Image:Mpirun.sh
- Note: You'll not be able to see the output until the job finishes: the directory /tmp/pb.XXX isn't accessible to you because it is on the private disk of the node running the job.
- To get DLPOLY to terminate before the job hits the walltime limit and is killed, you need to run it through a program called pbsexec, for example:
- pbsexec mpiexec DLPOLY.X
- This will kill DLPOLY 15 minutes before the walltime limit, giving your script time to transfer files back to $work.