This project contains step-by-step instructions of how to calculate ion pair dimer structures, and how to analyse their similarity. The instructions are divided in two parts:

optimising some initial guess structures
calculating the RMSD as a measure of similarity.

The required tasks are spread amongst many little scripts, which might be useful by themselves or even just serve as an inspiration or 'skeleton' for your own modifications. These are the languages used, together with an example:

Fortran code needs to be compiled before use, e.g. with gfortran center_of_mass_withinput.f90 -o center.out, and you can then directly invoke the produced executable. In this case, an input filename is required: ./center.out example.xyz
Python scripts are usually interpreted python remove_lowfreq.py. If you haven't already done so, you should set up your own python environment to play around with for this project.
Bash is a frowned upon, but nevertheless useful language - especially to handle files, e.g. to them or perform simple file operations. Bash scripts are invoked as follows: bash ./add_header.sh
Please use a PBS script to submit tedious jobs to HPC using qsub gau_general.pbs - otherwise you'll block the login nodes.

All the scripts and files described in this page can be found in the group's gitlab repository [1].

Part 1: Optimising the structures

You need a number of 'initial guess' structures, for example ion pairs in random orientation to each other. I used 512 close contact ion pairs I dumped from a classical MD simulation trajectory using the prealpha tool.
Once you have these initial guess structures, you need to put them in a common working directory, ideally a temporary location. They need to be in xyz format, but without the 'number of atoms' and 'comment' lines, i.e. just cartesian coordinates.
Prepare the input for gaussian. All you need to change are the HEADER.gjf and FOOTER.gjf files. In the example, the first two lines of the Header file instructs gaussian to use 32 CPUs and 50 gb of memory (always specify less than you have available, since gaussian sometimes uses a bit more). The important part is the 'route section', starting with a '#'. This line specifies an optimisation at the B3LYP-GD3BJ/6-311+g(d,p) level of theory with tightened convergence criteria and integration grid. Importantly, an SMD solvent model is used here.
The solvent model is further specified in the footer file. the parameters given in the example are those for [BMIM][NTf2], which is exactly the system studied here (J. Phys. Chem. B 2012, 116, 9122–9129).
Header, Footer and xyz files are now combined to gaussian input files using the add_header.sh script, by typing 'bash add_header.sh' in the command line. Note that this script also changes the 'XXXX' placeholders to the names of the xyz files. Important: if you are using your own system, you need to change the number of atoms ('40' in my case).
It is now time to submit the gaussian calculation to the HPC. This can be done using the pbs script for an array job, in the example for 10 files. Conveniently, add_header.sh also writes a 'list' file with all the input files, so the number of array elements N in the job (specified in the line #PBS -J 1-N) must match the number of lines in your 'list' file.
Sometimes things go wrong, and we now need to find the successful calculation that we would like to keep. Make a new directory, such as 'xyz_initial' for the first round, and move all xyz files in there (e.g. with <nowiwki>mv *.xyz ./xyz_initial/</nowiki>).
Make sure there are no xyz files in your working directory, and also make a new directory called 'successful'.
Type <nowiwki>bash new_dimer_round.sh</nowiwki>. The bash script 'new_dimer_round.sh' identifies those calculations with a normal termination, and copies the corresponding structures (.xyz) and log files (.log) into separate directories. You will notice that line 54 in 'new_dimer_round.sh' is commented out. You can use this line to subtract the center of mass of your xyz files, for example using the center of mass Fortran script. Note that you will need to compile first, e.g. with <nowiwki>gfortran center_of_mass_withinput.f90 -o center </nowiki>
Now you can copy the log and xyz files to a permanent directory, if you like.
So far, we have a collection of stationary points, but some of those structures may have imaginary frequencies, which you might want to remove. To this end, first collect the lowest frequencies: <nowiwki>grep "Frequencies --" dimer_around_type-index_1-*.log -m1 | sed '{s/.log: Frequencies --//g}' > ../FREQUENCIES </nowiki>
Then remove the lowest ones with <nowiwki>python remove_lowfreq.py</nowiki>. You need to create the target directories, or rename them in remove_lowfreq.py

Part 2: Identifying similar structures

The instructions from part 1 gave me 389 dimer structures. First, read out the energies with the get_energies.sh bash script. You will need to change '389' in the example to the number of dimers in your case.
If your xyz files are still in the gaussian format (which they will be if you use newzmat), then you need to add the first two lines with make_xyz_readable.sh.
Now we would like to calculate the RMSD for each of our possible combinations to see which ones are similar. We will use an external python script, which is invoked with the PBS script python_array_parallel.pbs. There is a separate section explaining how to set up the python environment.
Now, we can get a matrix of all combinations, their energy difference, and their RMSD using the combine_RMSD_DELTAE.f90 Fortran script (if necessary, amend ndimers and filename in the fortran script).
The fortran script also writes a file double.dat based on those pairs that have similar energy and RMSD. You can print the matrix for combinations, energy difference, and RMSD without the double ones with the unique.py python script.
You can also get a file with just the unique energies using the unique_energy.py python script. Useful to see the range of interaction.

Preparation: Installing the RMSD python script

We want to use the external RMSD tool. It can be set up relatively easily in a new conda environment following the following steps:

conda create --name RMSD
# type 'y' for 'proceed'
source activate RMSD
pip install rmsd

Then, you can just test it with the two example structures: (don't forget to add the first two lines - atom number and comment)

~/rmsd/calculate_rmsd --no-hydrogen --use-reflections --reorder example.xyz example2.xyz

... which should give you an RMSD of 1.893, irrespective what xyz file you specify first.