CP2K Tutorial

The main purpose for this tutorial is to provide a brief introduction on how to perform simulations, especially electrochemistry simulations, with the software package CP2K. This tutorial will contain sections of introducing the basic physics background of CP2K, the general settings for input files for setting up calculations, the interpretation of output files, the methods of processing the output files, making diagrams, convergence test for various systems, and some sample calculations.

CP2K background

CP2K is an implementation of the Quickstep method.

Quickstep

Quickstep (QS) is an implementation of Gaussian plane wave (GPW) method, which is a combination of Gaussian-type function and plane wave. CP2K is developed from QS. Gaussian-type functions and plane waves are used for expanding the Kohn-Sham orbitals. Plane waves are independent from atomic positions and it is possible to use fast Fourier transform method for quick calculation of Hartree potentials. However, the strong variation of wave functions close to the atomic nuclei requires large number of plane waves; thus, it is necessary to use pseudopotentials to alleviate the problem. Also, the homogeneous plane waves are sometimes inefficient for describing low density systems. Gaussian-type functions are atomic centered so there is no need to use pseudopotentials. However, Gaussian-type functions require many multi-centered integrals and consist basis set superposition errors. By combining these two method, it is possible to construct the Kohn-Sham operator matrix with a computational cost scaling linearly for a growing system size. As a result, CP2K is fast and is suitable for computing large systems.

Gaussian and plane wave method

The energy functional for a molecular or crystalline system in the framework of GPW using the Kohn-Sham formulation of density functional theory is defined as

E [n] = E^{T} [n] + E^{V} [n] = E^{H} [n] + E^{X C} [n] + E^{I I}

where E^T[n] is the kinetic energy

E^{T} [n] = \sum_{μ v} P_{μ v} ⟨ ϕ_{μ} (\vec{r}) | - \frac{1}{2} \nabla^{2} | ϕ_{v} (\vec{r}) ⟩

E^v[n] is the electronic interaction with the ionic cores. This interaction is described by norm-conserving pseudopotentials with a potential split in a local part V_{loc}^{PP}(r) and a fully non-local part V_{nl}^{PP}(r, r'):

E^{V} [n] = \sum_{μ v} P_{μ v} ⟨ ϕ_{μ} (\vec{r}) | V_{l o c}^{P P} (r) | ϕ_{v} (\vec{r}) ⟩ + \sum_{μ v} P_{μ v} ⟨ ϕ_{μ} (\vec{r}) | V_{n l}^{P P} (r) | ϕ_{v} (\vec{r}) ⟩

E^{H}[n] is the electronic Hartree (Coulomb) energy:

Failed to parse (syntax error): {\displaystyle E^H[n]=2\pi\Omega\sum_{'''G'''}\frac{\ñ^{\ast}('''G''')ñ('''G''')}{'''G'''^2} ''E^{XC}[n]'' is the exchange-correlation energy: : <math> E^{XC}[n] = \int\bar{n}(\vec{r})\epsilon_{XC}[\bar{n}]d\vec{r} }

and the interaction energies of the ionic cores with charges Z_I and position R_I is denoted by E^{II}:

Failed to parse (syntax error): {\displaystyle E^{II}= \frac{1}{2}\sum_{I≠J}\frac{Z_IZ_J}{|R_I-R_J|} }

The electronic density:

n (\vec{r}) = \sum_{μ v} P) μ v ϕ_{μ} (\vec{r}) ϕ_{v} (\vec{r})

is expanded in a set of contracted Gaussian functions:

ϕ_{μ} (\vec{r}) = \sum_{i} d_{i} μ g_{i} (\vec{r})

where P_{\mu v} is a density matrix element, g_i(\vec{r}) is a primitive Gaussian function, and d_{i\mu} is the corresponding contraction coefficient. The density matrix P fulfills normalization and idempotency condition:

T r (^{'''''} P S^{'''''}) = N

and

P S^{'''''} = (^{'''''} P S^{'''''}) (^{'''''} P S^{'''''})

where S is the overlap matrix of the Gaussian basis functions

S_{μ v} = ⟨ ϕ_{μ} (\vec{r}) | | ϕ_{v} (\vec{r}) ⟩

and N is the number of electrons.

Pseudopotential

Considering that constructing electronic wave functions close to the atomic nuclei using plane waves are costly and the fact that most chemical reactions only involve valence electrons, it is possible to replace the core electrons' wave functions with an approximation. Pseudopotential is such an implementation.

File:Sketch Pseudopotentials.png

Comparison of a wavefunction in the Coulomb potential of the nucleus (blue) to the one in the pseudopotential (red). The real and the pseudo wavefunction and potentials match above a certain cutoff radius

r_{c}

.

The first pseudopotential implementation on CP2K is the Goedecker, Teter, and Hutter (GTH) pseudopotential. The separable dual-space GTH pseudopotentials consist of a local part

V_{l o c}^{P P} (r) = - \frac{Z_{i o n}}{r} e r f (α^{P P} r) + \sum_{i = 1}^{4} C_{i}^{P P} (\sqrt{2} α^{P P} r)^{2 i - 2} e x p [- (α^{P P} r)^{2}]

with

α^{P P} = \frac{1}{\sqrt{2} r_{l o c}^{P P}}

and a non-local part

V_{n l}^{P P} (\vec{r}, \vec{r^{'}}) = \sum_{l m} \sum i j ⟨ \vec{r} | p_{i}^{l m} ⟩ h_{i j}^{l} ⟨ p_{j}^{l m} | \vec{r^{'}} ⟩

with the Gaussian-type projectors

⟨ \vec{r} | p_{i}^{l m} ⟩ = N_{i}^{l} Y^{l m} (\hat{r}) r^{l + 2 i - 2} e x p [- \frac{1}{2} (\frac{r}{r_{l}})^{2}]

For GTH potentials in CP2K, the number of valence electrons is denoted in the form of GTH-XXXX-qN, where N is the number of valence electrons.

Basis sets

Installing CP2K

In general, it is fun but a demanding task for one to install CP2K on a personal PC (if it is by one's own wish, it is recommended to try first with a Unix operating system, such as any computer that runs Linux or a MacBook), since some configurations will be hard on a Windows machine. In general, most of the calculations, other than the example files that are small, are computational resource demanding. Even when performing on cluster, one calculation could take hours to days. As a result, most calculations will be performed not on a personal PC. CP2K is readily installed on most clusters, including CX1 of Imperial College and ARCHER from the National Supercomputing Service; however, their CP2K could be outdated. Thus, it is recommended for one to install the latest version of CP2K on their own home directory.

For installation, gitclone the installation package to your home directory. Before installation, add permission for the installation package using chmod 777 install_cp2k.sh. Then, create a directory and name it as 'SOFTWARE' using mkdir SOFTWARE to create a new directory under your home directory. Copy the installation command to the newly created directory using cp install_cp2l.sh SOFTWARE. Then, run the command ./install_cp2k.sh. After this, a new directory named 'cp2k' should appear. Check the contents of the directory to see if everything is correctly installed.

Input files

There are three basic files for setting up a calculation: an input file to specify all parameters that are associated with the calculation, a coordinate file of the structure that is subjected to simulation, and a shell script to run the calculation on any cluster. An input file contains several sections and each section contains several subsections and few keywords. The complete section and keyword list can be found in the CP2K manual, which is also a good tool for learning and understanding the meaning of each specific keyword. This section will be divided into several subsections corresponding to each of the important keywords and what to expect. This section is not designed for introducing the meaning of each keyword, since that can be readily found under the CP2K manual. The purpose of this section is to show which keywords are expected to be changed during the simulation and some important features that are associated with some specific clusters.

GLOBAL

GLOBAL is the section of specifying the overall type of calculation RUN_TYPE, the amount of time allowed for the calculation WALLTIME, and some other specifications. In principal, there is not much to change within this section; however, one need to pay attention to the time set under WALLTIME, since this is not the only 'WALLTIME' one will encounter through the setup process. There is another 'WALLTIME' under the shell script that is used for submitting the calculation. It is necessary to set the 'WALLTIME' in the input file smaller than the one used in the shell script since the shell script is directly talking to the cluster for how long the machine is asked for running the code. Once the 'WALLTIME' specified in the shell script is reached, the cluster will kill the process and any unfinished process won't be saved. However, if the 'WALLTIME' within the input is less than that in the shell script, the program will save everything before it closes the program. It is helpful if one calculation requires to be submitted several times (when it is long).

FORCE EVALUATION

This section in the input is for specifying parameters needed to calculate energy and forces and describe the system one want to analyze.

In the DFT subsection, one needs to specify the locations of the basis set file and the pseudopotential file (after the installation of CP2K, they can be found under 'cp2k/data'). It is quite important to choose the desire basis sets for the calculation. In principal, higher level basis sets provides a better accuracy; however, they are also very costly.^{Citation Needed} For example, the energy calculation for a single water molecule using different basis set shows that ... .

CP2K performs self-consistent field (SCF) calculation. SCF calculation in CP2K is divided into two loops: the outer SCF loop and the inner SCF loop. The per-conditioner for orbital transformation is used only apply at the first SCF calculation at the start of each outer SCF loop. Applying pre-conditioner could be quite costly. The amount of time required for applying pre-conditioner could be 2 to 3 times more than the time needed for a regular SCF calculation. As a result, one may converge the number of inner SCF calculations to have the optimized computational efficiency. EPS_SCF sets the threshold of convergence. Typically, for molecular dynamics (MD) calculation, EPS_SCF is set for 1E-06 and for static energy calculation, EPS_SCF is set for 1E-07. Please be aware that for CP2K to converge from 1E-06 to 1E-07 could be quite difficult and requires a very good preconditioner to be applied. SCF calculation could start from ATOMIC or RESTART. RESTART calculation uses the previous saved wavefunction to apply the preconditioner. This is useful when one wishes to perform hybrid functional calculation with CP2K. This will be mentioned later in the tutorial.

Preconditioner

Orbital transformation

Multigrid

Hybrid functional

Traditional density functional calculation (DFT) is known for suffering for electron delocalization problem, which result in an underestimation of HOMO-LUMO band gap and unreliable fractional charge. In principle, if we know the exact exchange-correlation energy, we can know the exact ground-state energy of a system. However, the exact exchange-correlation energy is not obtained so far. As a result, we need to use a series of approximations for the exchange-correlation energy term. As mentioned, traditional DFT underestimates the fractional charges; however, exact Hartree-Fock energy always overestimate the fractional charges. As a result, a combination of traditional DFT (GGA level calculation) with exact Hartree-Fock could in principle result in an proper description of the total energy. This results in the hybrid functional.

One of the earliest and most commonly used hybrid functional is the PBE0 functional, which takes a mixing (1/4) of the Hartree-Fock exchange energy and exchange energy from PBE,

E_{x c}^{P B E 0} = a E_{x}^{H F} + (1 - a) E_{x}^{P B E} + E_{c}^{P B E}

where the mixing coefficient a=1/4 is determined by perturbation theory. Although PBE0 has proved to be good, it is still inefficient in metallic systems since the convergence of HF exchange-correlation energy with distance in metallic system is very slow and full convergence of HF calculation is extremely hard to achieve. As a result, hybrid functionals that use a truncated short-range potential come to rescue. In our project, we use HSE (Heyd, Scuseria, Ernzerhof) functional for calculating metal-water interfaces.

The HSE functional uses a screened Coulomb potential to the exchange interaction to screen the hard-to-converge long-range HF exchange term. The Coulomb operator is split into short-range (SR) and long-rage (LR) components:

\frac{1}{r} = \frac{e r f c (ω r)}{r} + \frac{e r f (ω r)}{r}

where the first part is the SR and the second part is the LR, erfc(/omega r)=1-erf(\omega r), /omega is an adjustable parameter. The choice of error function is somewhat arbitrary since it can be integrate analytically when using Gaussian basis functions. As a result, the exchange-correlation energy of HSE takes the following form:

E_{x c}^{H S E} = a E_{x}^{H F, S R} (ω) + (1 - a) E_{x}^{P B E, S R} (ω) + E_{x}^{P B E, L R} (ω) + E_{c}^{P B E}

In HSE functional is equivalent to PBE0 for ω=0 and asymptotically reaches PBE for ω approaches infinity. The computational effort necessary to calculate the SR HF exchange decreases drastically with increasing ω. However, in order to reproduce reliable values for the band gap in semiconducting solid, it is necessary to choose ω value smaller than 0.15. This should be taken special attention.

Submitting Calculations

Sample convergence

Outputs and postprocessing

PDOS

There is a set of already-written algorithm for analyzing the projected density of states (PDOS) from CP2K calculation. The required files are get_smearing_pdos.py, smear_sort.py and a sample PDOS file. These files can be pulled from the Github repository. Make sure you have at least / python 2.0 installed on your PC.

Optimization of parameters: convergence tests