Building ML potentials with AML

From ChemWiki

This page includes a short tutorial on how to install the AML package on Imperial's cluster and set up all the necessary tools to run it.

The AML package relies on n2p2 to build the neural network potentials, so n2p2 has to be downloaded and compiled first. Compiling it requires quite a few tools, which are cumbersome to download and install individually. To make this easier, we install CP2K first: its toolchain provides many of the tools n2p2 needs, so we don't have to install them manually.

Installation of CP2K version 8.1

Installation of OpenMPI

The first step is to install OpenMPI. Here is a fantastic tutorial on how to do that. After step 7 in that tutorial, run the following command in addition to the two commands given there.

echo "export OMPI_HOME=$HOME/opt/openmpi/" >> ~/.bashrc

This should set up OpenMPI correctly. To let the current shell know that you have OpenMPI, run the following command, and run it again in any new terminal session that does not load ~/.bashrc automatically.

source ~/.bashrc
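
As a quick sanity check after sourcing ~/.bashrc, something like the following can confirm that the MPI tools are visible. This is only a sketch: require_tool is a hypothetical helper name, and it assumes the OpenMPI bin directory was added to PATH by the tutorial's own commands.

```shell
# Hypothetical helper: report whether a command is visible on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# mpicc and mpirun are the usual OpenMPI compiler wrapper and launcher.
for tool in mpicc mpirun; do
    require_tool "$tool" || true
done
```

If either tool is reported as missing, the PATH line from the OpenMPI tutorial is the first thing to re-check.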

Installation of CP2K

Now we can install CP2K. The first step is to clone the GitHub repository.

git clone -b support/v8.1 --recursive https://github.com/cp2k/cp2k.git cp2k

Now that the repository has been cloned into a directory called cp2k, go to the toolchain directory inside it and run the install script.

cd cp2k/tools/toolchain/
./install_cp2k_toolchain.sh --no-check-certificate --with-elpa=no

When the script finishes, it prints instructions (and/or errors) at the end. Follow them; if the script succeeded, they will ask you to do something like this:

source install/setup
cp install/arch/* ~/cp2k/arch/
cd ~/cp2k
make ARCH=local VERSION="ssmp sdbg psmp pdbg" &> make.log

This whole process can take hours so be patient. With CP2K installed and compiled, we can use the tools included to build n2p2.
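
Because the make output is redirected to make.log, it can be worth scanning the log once the build stops. The sketch below is an assumption about how one might do that, not part of the CP2K instructions: count_log_errors is a hypothetical helper, the "Error" pattern is only a rough heuristic, and the ~/cp2k location follows the commands above.

```shell
# Hypothetical helper: count lines in a build log that mention "Error"
# (case-sensitive, so compiler flags such as -Werror are not counted).
count_log_errors() {
    grep -c 'Error' "$1" || true
}

log="$HOME/cp2k/make.log"
if [ -f "$log" ]; then
    echo "lines mentioning Error: $(count_log_errors "$log")"
    # CP2K places its executables under exe/<ARCH>/, here exe/local/.
    ls "$HOME/cp2k/exe/local/" 2>/dev/null || echo "no executables found yet"
fi
```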

Installation of n2p2

The n2p2 documentation gives instructions on how to build the software. To actually build it, however, we need some of the packages that were installed with CP2K. We make them visible by adding their paths to the ~/.bashrc file using the same 'echo' command as before.

For example, when compiling we need the header file gsl/gsl_rng.h, which is located in the CP2K toolchain directory. The path to the folder containing gsl/gsl_rng.h on my computer is

/rds/general/user/yl4619/home/cp2k-8.1/tools/toolchain/install/gsl-2.6/include

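So I add this to the CPATH variable by doing the following (the backslash keeps $CPATH unexpanded, so it is resolved when ~/.bashrc is read rather than when the line is written)

echo "export CPATH=\$CPATH:/rds/general/user/yl4619/home/cp2k-8.1/tools/toolchain/install/gsl-2.6/include" >> $HOME/.bashrc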

We will need to link several other libraries, binaries, and header files too. The practical approach is to try to compile, read the error message to see which package is missing, search for it in the CP2K toolchain directory, and link it using the method above. Three kinds of directories need to be linked: use $PATH for bin folders, $LIBRARY_PATH for lib folders, and $CPATH for include folders.
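
The search-and-link loop described above can be sketched with two small shell helpers. Both names (find_include_dir and append_path_once) are hypothetical, and the toolchain path in the usage comments is an assumption based on the clone location used earlier; adjust it to your own install.

```shell
# Hypothetical helper: given a toolchain root and a relative header path
# (e.g. gsl/gsl_rng.h), print each directory that should go on CPATH.
find_include_dir() {
    root="$1"
    rel="$2"
    find "$root" -path "*/$rel" 2>/dev/null | while IFS= read -r hit; do
        # Strip the trailing "/<rel>" to get the include directory itself.
        printf '%s\n' "${hit%/"$rel"}"
    done
}

# Hypothetical helper: append "export VAR=$VAR:DIR" to an rc file, once.
# The \$ keeps $VAR literal, so it is expanded when the rc file is read.
append_path_once() {
    var="$1"
    dir="$2"
    rcfile="$3"
    line="export $var=\$$var:$dir"
    grep -qxF -- "$line" "$rcfile" 2>/dev/null || echo "$line" >> "$rcfile"
}

# Example usage (commented out so nothing is modified by accident):
# find_include_dir "$HOME/cp2k/tools/toolchain/install" gsl/gsl_rng.h
# append_path_once CPATH "/path/printed/above" "$HOME/.bashrc"
```

The same append_path_once call works for PATH and LIBRARY_PATH; only the variable name and the directory change.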

One special case is the OpenBLAS package. Some components of n2p2 require BLAS when compiling and do not recognise OpenBLAS as a variant of BLAS. To work around this, manually edit the makefile in n2p2/src/application: change $(PROJECT_LDFLAGS_BLAS) in the file to -lopenblas -lgslcblas, which should solve the problem.
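
The makefile change can also be made non-interactively. The following is a sketch under the assumption that the variable appears literally as $(PROJECT_LDFLAGS_BLAS) in the file; patch_blas_flags is a hypothetical helper name, and it keeps a .bak copy so the edit is easy to undo.

```shell
# Hypothetical helper: replace the BLAS linker-flag variable in a makefile
# with the OpenBLAS flags, keeping a .bak copy of the original file.
patch_blas_flags() {
    sed -i.bak 's/\$(PROJECT_LDFLAGS_BLAS)/-lopenblas -lgslcblas/g' "$1"
}

# Example usage (commented out so nothing is modified by accident):
# patch_blas_flags n2p2/src/application/makefile
```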

All components in the n2p2 package should compile successfully after this. For both CP2K and n2p2, the compiled applications may not run on CPUs whose instruction set differs from that of the machine they were compiled on, so make sure you submit jobs to a queue with compatible CPUs. For example, I compiled on the login node of the cluster, which uses Rome CPUs, so when submitting jobs I specify the CPU type with

#PBS cpu_type=rome

Installation of AML

The AML package can be installed by following the instructions in its GitHub repository.

Running the AML package

Follow the example given in the AML package.