Introduction to Molecular Dynamics Simulation

Numerical Integration

TASK: Open the file HO.xls. In it, the velocity-Verlet algorithm is used to model the behaviour of a classical harmonic oscillator. Complete the three columns "ANALYTICAL", "ERROR", and "ENERGY": "ANALYTICAL" should contain the value of the classical solution for the position at time $t$ , "ERROR" should contain the absolute difference between "ANALYTICAL" and the velocity-Verlet solution (i.e. ERROR should always be positive -- make sure you leave the half step rows blank!), and "ENERGY" should contain the total energy of the oscillator for the velocity-Verlet solution. Remember that the position of a classical harmonic oscillator is given by $x (t) = A \cos (ω t + ϕ)$ (the values of $A$ , $ω$ , and $ϕ$ are worked out for you in the sheet).

TASK: For the default timestep value, 0.1, estimate the positions of the maxima in the ERROR column as a function of time. Make a plot showing these values as a function of time, and fit an appropriate function to the data.

Exact analytical solutions for the atomic position can only be found in simple systems such as the classic harmonic oscillator. For many-body systems with chaotic dynamics, numerical integration methods such as the velocity-Verlet algorithm are necessary, and only an approximation can be made for the atomic position at time $t$ .

On the chart below, both the classic harmonic oscillator solution and the velocity-Verlet algorithm are shown, with the classical result, $x (t) = A \cos (ω t + ϕ)$ using the values of $A = 1$ , $ω = 1$ and $ϕ = 0$ .

**Displacement as a function of time, "ANALYTICAL" being the classical solution and $x (t)$ the velocity-Verlet algorithm**

The total energy for the velocity-Verlet solution is a sum of the potential and kinetic energy components, $E = U + K$ , where $U = \frac{1}{2} k x^{2}$ and $V = \frac{1}{2} m v^{2}$ , and values of $k = m = 1$ are used. (Remember that $x (t)$ and $v (t)$ are obtained from the velocity-Verlet algorithm.)

The absolute difference between the classic harmonic oscillator solution and the velocity-Verlet Algorithm for the atomic position is given below, with an additional line plotted through the estimated error maxima.

**Absolute error between the classical solution and the velocity-Verlet Solution**

TASK: Experiment with different values of the timestep. What sort of a timestep do you need to use to ensure that the total energy does not change by more than 1% over the course of your "simulation"? Why do you think it is important to monitor the total energy of a physical system when modelling its behaviour numerically?

As the timestep is increased from 0.1, the amplitude and frequency of the total energy for the velocity-Verlet solution is increased (greater and more frequent fluctuations), as the $x (t)$ and $v (t)$ approximations obtained from the numerical-integration are now weaker as $\partial t$ is greater. The total energy does not fluctuate by more than 1% (0.005) when a timestep of 0.2 is used - the plot of the energy using this timestep is given below. When modelling a physical system numerically as done here, it is important to monitor the total energy throughout the simulation to check if it stays constant, as the simulation should obey the law of conservation of energy, that energy cannot be created or destroyed - only transformed.

**Energy as a function of time using the timestep 0.2, where the energy does not fluctuate by more than 1%**

Atomic Forces

TASK: For a single Lennard-Jones interaction, $ϕ (r) = 4 ϵ (\frac{σ^{12}}{r^{12}} - \frac{σ^{6}}{r^{6}})$ , find the separation, $r_{0}$ , at which the potential energy is zero. What is the force at this separation? Find the equilibrium separation, $r_{e q}$ , and work out the well depth ( $ϕ (r_{e q})$ ). Evaluate the integrals $\int_{2 σ}^{\infty} ϕ (r) d r$ , $\int_{2.5 σ}^{\infty} ϕ (r) d r$ , and $\int_{3 σ}^{\infty} ϕ (r) d r$ when $σ = ϵ = 1.0$ .

The separation, $r_{0}$ , at which the potential energy is zero:	The force at this separation $r_{0}$ :	The equilibrium separation, $r_{e q}$ :	The well depth ( $ϕ (r_{e q})$ ):
$ϕ (r_{0}) = 0 = 4 ϵ (\frac{σ^{12}}{r_{0}^{12}} - \frac{σ^{6}}{r_{0}^{6}})$ $0 = \frac{σ^{12}}{r_{0}^{12}} - \frac{σ^{6}}{r_{0}^{6}}$ $0 = σ^{6} - r_{0}^{6}$ $r_{0}^{6} = σ^{6}$ $r_{0} = σ$	$F_{0} = - \frac{d ϕ (r^{N})}{d r_{0}}$ Remembering $ϕ (r_{0}) = 4 ϵ (\frac{σ^{12}}{r_{0}^{12}} - \frac{σ^{6}}{r_{0}^{6}})$ $F_{0} = - 4 ϵ (- 12 \frac{σ^{12}}{r_{0}^{13}} + 6 \frac{σ^{6}}{r_{0}^{7}})$ $F_{0} = 48 ϵ \frac{σ^{12}}{r_{0}^{13}} - 24 \frac{σ^{6}}{r_{0}^{7}}$ As $r_{0} = σ$ $F_{0} = \frac{48 ϵ}{σ} - \frac{24 ϵ}{σ}$ $F_{0} = \frac{24 ϵ}{σ}$	A system is in equilibrium when the net force acting on it is 0. $- \frac{d ϕ (r^{N})}{d r_{e q}} = 0$ $48 ϵ \frac{σ^{12}}{r_{e q}^{13}} - 24 \frac{σ^{6}}{r_{e q}^{7}} = 0$ $48 ϵ \frac{σ^{12}}{r_{e q}^{13}} = 24 \frac{σ^{6}}{r_{e q}^{7}}$ $2 σ^{6} = r_{e q}^{6}$ $r_{e q} = 2^{1 / 6} σ$	$ϕ (r_{e q}) = 4 ϵ (\frac{σ^{12}}{r_{e q}^{12}} - \frac{σ^{6}}{r_{e q}^{6}})$ As $r_{e q} = 2^{1 / 6} σ$ : $ϕ (r_{e q}) = 4 ϵ (\frac{σ^{12}}{{(2^{1 / 6} σ)}^{12}} - \frac{σ^{6}}{{(2^{1 / 6} σ)}^{6}})$ $ϕ (r_{e q}) = 4 ϵ (\frac{1}{4} - \frac{1}{2})$ $ϕ (r_{e q}) = 4 ϵ (- \frac{1}{4})$ $ϕ (r_{e q}) = - ϵ$

Evaluate $\int_{2 σ}^{\infty} ϕ (r) d r$ when $σ = ϵ = 1.0$	Evaluate $\int_{2.5 σ}^{\infty} ϕ (r) d r$ when $σ = ϵ = 1.0$	Evaluate $\int_{3 σ}^{\infty} ϕ (r) d r$ when $σ = ϵ = 1.0$
$\int_{2 σ}^{\infty} 4 ϵ (\frac{σ^{12}}{r^{12}} - \frac{σ^{6}}{r^{6}}) d r$ $= 4 ϵ {[- \frac{σ^{12}}{11 r^{11}} + \frac{σ^{6}}{5 r^{5}}]}_{2 σ}^{\infty}$ $= 4 ϵ (0) - 4 ϵ (- \frac{σ^{12}}{11 {(2 σ)}^{11}} + \frac{σ^{6}}{5 {(2 σ)}^{5}})$ $= 0 - 4 (- \frac{1}{11 {(2)}^{11}} + \frac{1}{5 {(2)}^{5}})$ $= - 0.0248$	$\int_{2.5 σ}^{\infty} 4 ϵ (\frac{σ^{12}}{r^{12}} - \frac{σ^{6}}{r^{6}}) d r$ $= 4 ϵ {[- \frac{σ^{12}}{11 r^{11}} + \frac{σ^{6}}{5 r^{5}}]}_{2.5 σ}^{\infty}$ $= 4 ϵ (0) - 4 ϵ (- \frac{σ^{12}}{11 {(2.5 σ)}^{11}} + \frac{σ^{6}}{5 {(2.5 σ)}^{5}})$ $= 0 - 4 (- \frac{1}{11 {(2.5)}^{11}} + \frac{1}{5 {(2.5)}^{5}})$ $= - 0.00818$	$\int_{3 σ}^{\infty} 4 ϵ (\frac{σ^{12}}{r^{12}} - \frac{σ^{6}}{r^{6}}) d r$ $= 4 ϵ {[- \frac{σ^{12}}{11 r^{11}} + \frac{σ^{6}}{5 r^{5}}]}_{3 σ}^{\infty}$ $= 4 ϵ (0) - 4 ϵ (- \frac{σ^{12}}{11 {(3 σ)}^{11}} + \frac{σ^{6}}{5 {(3 σ)}^{5}})$ $= 0 - 4 (- \frac{1}{11 {(3)}^{11}} + \frac{1}{5 {(3)}^{5}})$ $= - 0.00329$

TASK: Estimate the number of water molecules in 1ml of water under standard conditions. Estimate the volume of $10000$ water molecules under standard conditions.

Estimation of the number of water molecules in 1 mL under standard conditions	Estimation of the volume of 10000 water molecules under standard conditions
$ρ_{H_{2} O} = 0.9982 g m L^{- 1}$ Mass of $H_{2} O$ in 1 mL = $ρ \times V = 1 g m L^{- 1} \times 1 m L = 0.9982 g$ Molar mass of $H_{2} O$ = $18.0153 g m o l^{- 1}$ Number of moles of $H_{2} O$ in 1mL = $\frac{m}{M_{r}}$ = $\frac{0.9982 g}{18.0153 g m o l^{- 1}} = 0.05540845837 m o l$ Number of $H_{2} O$ molecules in 1mL = $n \times N_{A}$ = $0.05540845837 m o l \times 6.022 \times 1 0^{23} m o l^{- 1} = 3.3 \times 1 0^{22} m o l e c u l e s$	Number of $H_{2} O$ molecules in moles = $\frac{N}{N_{A}}$ = $\frac{10000}{6.022 \times 1 0^{23} m o l^{- 1}}$ = $1.660577881 \times 1 0^{- 20} m o l$ Mass of 10000 $H_{2} O$ molecules = $n \times M_{r}$ = $1.660577881 \times 1 0^{- 20} m o l \times 18.0153 g m o l^{- 1}$ = $2.99158087 \times 1 0^{- 19} g$ Volume of 10000 $H_{2} O$ molecules = $\frac{m}{ρ}$ = $\frac{2.99158087 \times 1 0^{- 19} g}{0.9982 g m L^{- 1}}$ = $3.00 \times 1 0^{- 19} m L$

TASK: Consider an atom at position $(0.5, 0.5, 0.5)$ in a cubic simulation box which runs from $(0, 0, 0)$ to $(1, 1, 1)$ . In a single timestep, it moves along the vector $(0.7, 0.6, 0.2)$ . At what point does it end up, after the periodic boundary conditions have been applied?.

The new position of the atom in a cubic simulation box that runs from $(0, 0, 0)$ to $(1, 1, 1)$ is:

 $(0.5, 0.5, 0.5)$  +  $(0.7, 0.6, 0.2)$  =  $(0.2, 0.1, 0.7)$  after the periodic boundary conditions have been applied.

TASK: The Lennard-Jones parameters for argon are $σ = 0.34 n m, ϵ / k_{B} = 120 K$ . If the LJ cutoff is $r^{*} = 3.2$ , what is it in real units? What is the well depth in ${k J m o l}^{- 1}$ ? What is the reduced temperature $T^{*} = 1.5$ in real units?

*If the LJ cutoff is $r^{} = 3.2$ , what is it in real units?**	What is the well depth in ${k J m o l}^{- 1}$ ?	*What is the reduced temperature $T^{} = 1.5$ in real units?**
$r^{} = \frac{r}{σ}$ $r = r^{} \times σ = 3.2 \times 0.34 \times 1 0^{- 9} = 1.088 \times 1 0^{- 9} m = 1.088 n m$	$\frac{ϵ}{k_{B}} = 120 K$ $ϵ = k_{B} \times 120 K = 1.381 \times 1 0^{- 23} J K^{- 1} \times 120 K = 1.6572 \times 1 0^{- 21} J$ ${k J m o l}^{- 1}$ : $ϵ = \frac{1.6572 \times 1 0^{- 21} J \times 6.022 \times 1 0^{23} m o l^{- 1}}{1000}$ = $0.998 {k J m o l}^{- 1}$	$T^{*} = \frac{k_{B} T}{ϵ}$ $T = \frac{1.5 \times ϵ}{k_{B}} = \frac{1.5 \times 1.6572 \times 1 0^{- 21} J}{1.381 \times 1 0^{- 23} J K^{- 1}} = 180 K$

Equilibration

TASK: Why do you think giving atoms random starting coordinates causes problems in simulations? Hint: what happens if two atoms happen to be generated close together?

If atoms are given random starting coordinates, it is possible the atoms could be in the repulsive regime of the Lennard-Jones Potential, as well as virtually on top of each other(superimposed), which is physically impossible. The small values of the separation r that could be randomly generated also give rise to unrealistic potential energies $ϕ_{r}$ .

TASK: Satisfy yourself that this lattice spacing corresponds to a number density of lattice points of $0.8$ . Consider instead a face-centred cubic lattice with a lattice point number density of 1.2. What is the side length of the cubic unit cell?

Proof that a lattice spacing of 1.07722 in a simple cubic unit cell corresponds to a lattice point number density of 0.8	What is the side length of a face centred cubic unit cell when the lattice point number density is 1.2
Volume occupied by 1 simple cubic unit cell = $1.077 2^{3}$ = $1.24993962$ In a simple cubic lattice, there is 1 lattice point per unit cell. Therefore 1 lattice point occupies a volume of 1.24993962. Lattice point number density = (Unit Volume / Volume occupied by 1 lattice point) $\times$ No' of lattice points in unit cell = $\frac{1}{1.249939632} \times 1$ = $0.8$	Lattice point number density = (Unit Volume / Volume occupied by 1 lattice point) $\times$ No' of lattice points in unit cell There are 4 lattice points in a FCC unit cell. $1.2 = 4 \times \frac{1}{V}$ $V = \frac{4}{1.2}$ Side length of unit cell = ${(\frac{4}{1.2})}^{1 / 3} = 1.49380$

TASK: Consider again the face-centred cubic lattice from the previous task. How many atoms would be created by the create_atoms command if you had defined that lattice instead?

If the initial lattice commands in the input file had lattice fcc written instead of lattice sc 4000 atoms would be created, as there are 4 lattice points per FCC unit cell, and the simulation has been instructed in the succeeding script to run as a box made from (10x10x10) unit cells.

region box block 0 10 0 10 0 10
create_box 1 box

The output log file from the simulation will contain the script:

Created 4000 atoms

TASK: Using the LAMMPS manual, find the purpose of the following commands in the input script:

mass 1 1.0
pair_style lj/cut 3.0
pair_coeff * * 1.0 1.0

mass 1 1.0	pair_style lj/cut 3.0	*pair_coeff * 1.0 1.0**
This command defines the mass for all atoms, of all types, that will be used in the simulation. In this particular instance, the command instructs that all atoms of type 1 in this simulation, will have a mass of 1.0; the first number defining the atom type, and the second the mass in reduced units.	This command tells LAMMPS how to compute the interaction energies in the simulation by defining the type of interactions between pairs of atoms, and the distance in which these types of interactions can exist in (the cut-off distance). All other possible pairwise interactions that are not mentioned in this command are ignored by LAMMPS, as well as 'long range' interactions of the specified type (interactions that exceed the cut off distance), making the computation easier. In the command written here, it is instructed that in this simulation there will be a Lennard-Jones potential between pairs of atoms that are within a distance of 3.0 of each other.	The pairwise force field coefficient is another predefined property required by LAMMPS to compute the energy in the simulation. The number and meaning of the coefficients depends on the pair style. The script pair_coeff * 1.0 1.0* means that the pairwise forcefield coefficient between all types of atoms is 1.0.

TASK: Given that we are specifying $x_{i} (0)$ and $v_{i} (0)$ , which integration algorithm are we going to use?

As we are defining the initial position and velocity, we are using the velocity-Verlet algorithm for this integration.

TASK: Look at the lines below.

### SPECIFY TIMESTEP ###
variable timestep equal 0.001
variable n_steps equal floor(100/${timestep})
variable n_steps equal floor(100/0.001)
timestep ${timestep}
timestep 0.001

### RUN SIMULATION ###
run ${n_steps}
run 100000

The second line (starting "variable timestep...") tells LAMMPS that if it encounters the text ${timestep} on a subsequent line, it should replace it by the value given. In this case, the value ${timestep} is always replaced by 0.001. In light of this, what do you think the purpose of these lines is? Why not just write:

timestep 0.001
run 100000

In the second script, the script writer will have to calculate the required number of simulation runs by his or herself, and input this value every time the timestep is changed, in order to keep the total simulation time constant. The first script is the more useful script to use, because it allows the script writer to leave the mathematics to LAMMPS, which automatically adjusts the number of simulation runs when the timestep is changed through the variable n_steps, to keep the total simulation time constant.

TASK: make plots of the energy, temperature, and pressure, against time for the 0.001 timestep experiment (attach a picture to your report). Does the simulation reach equilibrium? How long does this take? When you have done this, make a single plot which shows the energy versus time for all of the timesteps (again, attach a picture to your report). Choosing a timestep is a balancing act: the shorter the timestep, the more accurately the results of your simulation will reflect the physical reality; short timesteps, however, mean that the same number of simulation steps cover a shorter amount of actual time, and this is very unhelpful if the process you want to study requires observation over a long time. Of the five timesteps that you used, which is the largest to give acceptable results? Which one of the five is a particularly bad choice? Why?

**Plots of Energy, Temperature and Pressure vs Time respectively using a Timestep of 0.001**

The plots above do show that equilibrium has been reached, as the energy minimizes and reaches a value around which it fluctuates and at the same time, temperature and pressure maximise and reach a value around which they fluctuate. The time at which this occurs, (approximately 0.4) is the point of equilibration. The synchronized timing of energy and temperature reaching a fluctuating value in particular is an indicator of equilibration, as it implies fulfillment of the Clausius inequality for the Helmholtz free energy, dA=0.

The timestep that minimizes the energy to the lowest value within the set simulation time (100) is the most accurate, as the lower the energy value in the simulation, the closer the simulation is to physical reality. Therefore, it can be seen from the plots above that simulation accuracy increases with timestep shortening. However, with shorter timesteps, computing the simulation takes longer, therefore a compromise is logical. In the graph above, as the plot for the 0.0025 timestep is almost superimposed on the 0.001 timestep plot (the shortest timestep and lowest energy output), it is feasible to say that 0.0025 is the largest timestep to give acceptable results. 0.015 is a particularly bad choice as the energy output from this timestep never actually reaches a fluctuating value within the set simulation time, and most importantly is continuously increasing not decreasing.

Running Simulations under Specific Conditions

TASK: Choose 5 temperatures (above the critical temperature $T^{*} = 1.5$ ), and two pressures (you can get a good idea of what a reasonable pressure is in Lennard-Jones units by looking at the average pressure of your simulations from the last section). This gives ten phase points — five temperatures at each pressure. Create 10 copies of npt.in, and modify each to run a simulation at one of your chosen $(p, T)$ points. You should be able to use the results of the previous section to choose a timestep. Submit these ten jobs to the HPC portal. While you wait for them to finish, you should read the next section.

Temperatures of 1.75, 2.0, 2.25, 2.5 and 2.75 and pressures of 2.6 and 2.8 were chosen. The pressures chosen were based around the average value for equilibration found in the previous section of this report (approximately 2.6). The timestep chosen for all these simulations was 0.0025, based also on the findings in the previous section of this report, of looking for a balance between accuracy of results and time taken for computation when choosing a timestep.

TASK: We need to choose $γ$ so that the temperature is correct $T = 𝔗$ if we multiply every velocity $γ$ . We can write two equations:

$\frac{1}{2} \sum_{i} m_{i} v_{i}^{2} = \frac{3}{2} N k_{B} T$

$\frac{1}{2} \sum_{i} m_{i} {(γ v_{i})}^{2} = \frac{3}{2} N k_{B} 𝔗$

Solve these to determine $γ$ .

$\frac{1}{2} \sum_{i} m_{i} v_{i}^{2} = \frac{3}{2} N k_{B} T$

$\sum_{i} m_{i} v_{i}^{2} = 3 N k_{B} T$

$\frac{1}{2} \sum_{i} m_{i} {(γ v_{i})}^{2} = \frac{3}{2} N k_{B} 𝔗$

$(\frac{γ^{2}}{2}) \sum_{i} m_{i} v_{i}^{2} = \frac{3}{2} N k_{B} 𝔗$

$(\frac{γ^{2}}{2}) 3 N k_{B} T = \frac{3}{2} N k_{B} 𝔗$

$γ^{2} T = 𝔗$

$γ = \pm \sqrt{\frac{𝔗}{T}}$

TASK: Use the manual page to find out the importance of the three numbers 100 1000 100000. How often will values of the temperature, etc., be sampled for the average? How many measurements contribute to the average? Looking to the following line, how much time will you simulate?

The three numbers 100 1000 100000 represent the three arguments Nevery Nrepeat Nfreq which specify on what timesteps the input values of the simulation will be used to used to contribute to the final average.

100 i.e. Nevery

This number specifies the jump between timesteps used to compute the time average.

1000 i.e. Nrepeat

This number specifies the number of timesteps used to compute the time average.

100000 i.e. Nfreq

This number defines how often time averages will be taken.

Therefore every 100 timesteps, values of properties such as temperature and pressure will be sampled, and a 1000 measurements will contribute to the average. As the average will be calculated after 100000 timesteps, and the timestep being used is 0.0025, the time the simulation will run for is 250.

TASK: When your simulations have finished, download the log files as before. At the end of the log file, LAMMPS will output the values and errors for the pressure, temperature, and density $(\frac{N}{V})$ . Use software of your choice to plot the density as a function of temperature for both of the pressures that you simulated. Your graph(s) should include error bars in both the x and y directions. You should also include a line corresponding to the density predicted by the ideal gas law at that pressure. Is your simulated density lower or higher? Justify this. Does the discrepancy increase or decrease with pressure?

**Simulation Density Output vs Ideal Gas Law Density Output**

The simulated density is lower than the density generated from the Ideal Gas Law, as the Ideal Gas Law ignores any interactions between atoms. Therefore as there are no repulsive forces according to the Ideal Gas Law, the atoms can get in very close proximity to each other, and hence the density predicted by it is considerably higher than that of real physical systems. The LAMMPS simulation density is lower and closer to reality because it includes Lennard-Jones pair potentials between atoms within a certain proximity.

The graph above also shows that the discrepancy slightly increases with pressure. This could be due to repulsive potentials being greater in the simulation, when the atoms are pushed closer together due to a higher pressure; this in turn causes further deviations from the Ideal Gas Law, which predicts a linear increase in density with an increase in pressure.

Calculating Heat Capacities using Statistical Physics

TASK: As in the last section, you need to run simulations at ten phase points. In this section, we will be in density-temperature $(ρ^{*}, T^{*})$ phase space, rather than pressure-temperature phase space. The two densities required at $0.2$ and $0.8$ , and the temperature range is $2.0, 2.2, 2.4, 2.6, 2.8$ . Plot $C_{V} / V$ as a function of temperature, where $V$ is the volume of the simulation cell, for both of your densities (on the same graph). Is the trend the one you would expect? Attach an example of one of your input scripts to your report.

**$\frac{C_{v}}{V}$ output in the Density-Temperature Phase Space**

The heat capacity $\frac{d U}{d T}$ decreases with increasing temperature because at higher temperatures, you are populating states that are closer in energy to each other; as you go higher in energy, the density of states increases and therefore $d U$ becomes smaller. So for the same $d T$ at higher temperatures, $C_{v}$ is smaller than that at lower temperatures.

The heat capacity increasing with density can be purely explained by using statistical mechanics; it can be derived that the internal energy $U$ at constant volume as in the system here = $\frac{3}{2} N k_{B} T$ .

$\frac{d U}{d T}$ = $\frac{3}{2} N k_{B}$

Thus the heat capacity increases with the number of atoms in the system, N and therefore with density $\frac{N}{V}$ .

A more physical description would be that as the total energy is constant, in a simulation box with more atoms (larger N), the internal energy per atom is lower, and also there are greater energy losses due to more degrees of freedom, therefore to raise the temperature by a certain amount $d T$ requires a greater input of internal energy $d U$ , causing the heat capacity of the system $\frac{d U}{d T}$ to be higher for a system with higher N.

The input script for the simulation with density 0.2 and temperature 2.0 is given below as an example.

###DEFINE DENSITY###
variable density equal 0.2

### DEFINE SIMULATION BOX GEOMETRY ###
lattice sc ${density}
region box block 0 15 0 15 0 15
create_box 1 box
create_atoms 1 box

### DEFINE PHYSICAL PROPERTIES OF ATOMS ###
mass 1 1.0
pair_style lj/cut/opt 3.0
pair_coeff 1 1 1.0 1.0
neighbor 2.0 bin

### SPECIFY THE REQUIRED THERMODYNAMIC STATE ###
variable T equal 2.0
variable timestep equal 0.0025

### ASSIGN ATOMIC VELOCITIES ###
velocity all create ${T} 12345 dist gaussian rot yes mom yes

### SPECIFY ENSEMBLE ###
timestep ${timestep}
fix nve all nve

### THERMODYNAMIC OUTPUT CONTROL ###
thermo_style custom time etotal temp press
thermo 10

### RECORD TRAJECTORY ###
dump traj all custom 1000 output-1 id x y z

### SPECIFY TIMESTEP ###

### RUN SIMULATION TO MELT CRYSTAL ###
run 10000
reset_timestep 0
unfix nve

### BRING SYSTEM TO REQUIRED STATE ###
variable tdamp equal ${timestep}*100
fix nvt all nvt temp ${T} ${T} ${tdamp}
run 10000
reset_timestep 0
unfix nvt
fix nve all nve

### MEASURE SYSTEM STATE ###
thermo_style custom step etotal temp density atoms
variable temper equal temp
variable dens equal density
variable n equal atoms
variable energytotal equal etotal
variable energytotal2 equal etotal*etotal
fix aves all ave/time 100 1000 100000 v_temper v_dens v_energytotal v_energytotal2
run 100000

variable avetemp equal f_aves[1]
variable avedens equal f_aves[2]
variable heatcapacoverv equal ${n}*${dens}*(f_aves[4]-f_aves[3]*f_aves[3])/(f_aves[1]*f_aves[1])

print "Averages"
print "--------"
print "Temperature: ${avetemp}"
print "Density: ${avedens}"
print "Heat Capacity over v: ${heatcapacoverv}"

Structural Properties and the Radial Distribution Function

TASK: perform simulations of the Lennard-Jones system in the three phases. When each is complete, download the trajectory and calculate $g (r)$ and $\int g (r) d r$ . Plot the RDFs for the three systems on the same axes, and attach a copy to your report. Discuss qualitatively the differences between the three RDFs, and what this tells you about the structure of the system in each phase. In the solid case, illustrate which lattice sites the first three peaks correspond to. What is the lattice spacing? What is the coordination number for each of the first three peaks?

**$g (r)$ *(left)* and $\int g (r) d r$ *(right)* for a solid, liquid and gas**

The radial distribution functions above provide us with the probability of finding an atom at a particular distance r from the reference atomic centre at r=0; the peaks visible in the radial distribution functions correspond to coordination shells (i.e. atomic centres). At a particular value of r, $\int g (r) d r$ gives the total number of atoms within that radius. It is clear that the solid has the greatest long range order with the atoms packed closely together by strong interactions, due to it having the highest frequency and amplitude of peaks out of the three RDF plots as well as the highest integral at the end of the simulation. The particularly sharp nature of the peaks in the solid RDF is indicative of the solid's well packed structure; the regions of very low atomic density in between the peaks is because the atoms are packed so efficiently and regularly to fill the space, it is highly unlikely atoms will fit into these specific regions. The presence of some peaks in the liquid RDF at small values of r gives evidence of some short range order; the smoother nature and more curvature of the peaks for the liquid RDF compared to that for the solid RDF indicate that atomic density is less fixed at certain positions and therefore the atoms have more freedom to move around in a liquid. Meanwhile, just a single maximum in the gas RDF highlights how this phase is completely disordered with virtually no bonding interactions, supported by the fact that its integral increases extremely slowly over the course of the simulation, indicating that very few atoms are found over long range distances from the reference atom as well.

The lattice sites a, b and c correspond to the first three peaks on the solid RDF -a is the nearest neighbour to the reference atom and corresponds to the first peak, c is the third nearest neighbour to the reference atom and corresponds to the third peak

As the face centred cubic lattice is two interpenetrating simple cubic lattices, the lattice spacing is defined by the distance between the reference atom and the second nearest neighbour, b - this equals 1.525 and can be seen approximately by the position of the second peak on the solid RDF. If we consider 8 unit cells, 4 on top and 4 on bottom, then it can be understood that the coordination to a lattice sites is 12 as there are $\frac{3}{2}$ a coordination sites per unit cell and there are 8 unit cells. Similarly, the coordination to lattice site b is 6 as there are $\frac{3}{4}$ b coordination sites per unit cell and 8 unit cells, and the coordination to c is 24 as there are 3 coordination sites per unit cell and 8 unit cells.

Dynamical properties and the diffusion coefficient

Mean Squared Displacement

TASK: make a plot for each of your simulations (solid, liquid, and gas), showing the mean squared displacement (the "total" MSD) as a function of timestep. Are these as you would expect? Estimate $D$ in each case. Be careful with the units! Repeat this procedure for the MSD data that you were given from the one million atom simulations.

**Total MSD as a function of Timestep**
	Solid	Liquid	Gas
Small System
Large System (1 million atoms)

The MSD is higher for phases with greater disorder as expected, with the gas phase having the highest MSD at the end of the simulation, and the solid phase the lowest. Interestingly, the gas phase MSD initially climbs exponentially, but then shifts to a more linear behaviour towards the end of the simulation, suggesting that a limiting value of the diffusion coefficient has been reached in the simulation.

The diffusion coefficient is given by: $D = \frac{1}{6} \frac{\partial ⟨ r^{2} (t) ⟩}{\partial t}$

The liquid phase MSD holds a linear relationship throughout the simulation, therefore the limiting value for D is reached very quickly in this phase.

The sharp initial increase and then drop to an average value for the solid phase MSD, can be explained by the atoms initially exchanging positions and rearranging, in order to find the most energetically favourable lattice arrangement; once this arrangement is reached, the atoms' degrees of freedom are limited (mostly to vibrations) and the MSD drops slightly and reaches the value around which it fluctuates.

The liquid phase MSD plots for the large and small system are very similar, but there are differences between large and small for the solid and gas phase; for the solid phase, there are no fluctuations about an average value in the large system - the MSD effectively reaches a constant here (possible outlier effects are reduced in this system due to a larger population being sampled), and the gas phase MSD plot is much more linear for the large system, meaning the limiting value of D is reached much sooner here than in the small system.

Diffusion coefficients follow suit, with the more disordered phase having the higher D value, as diffusion originates from random motion:

**Estimated Diffusion Coefficients**
	Solid	Liquid	Gas
Small System	$4.167 \times 1 0^{- 6}$	0.0833	4.658
Large System (1 million atoms)	$4.167 \times 1 0^{- 6}$	0.0833	2.542

Velocity Autocorrelation Function

TASK: In the theoretical section at the beginning, the equation for the evolution of the position of a 1D harmonic oscillator as a function of time was given. Using this, evaluate the normalised velocity autocorrelation function for a 1D harmonic oscillator (it is analytic!):

$C (τ) = \frac{\int_{- \infty}^{\infty} v (t) v (t + τ) d t}{\int_{- \infty}^{\infty} v^{2} (t) d t}$

Be sure to show your working in your writeup. On the same graph, with x range 0 to 500, plot $C (τ)$ with $ω = 1 / 2 π$ and the VACFs from your liquid and solid simulations. What do the minima in the VACFs for the liquid and solid system represent? Discuss the origin of the differences between the liquid and solid VACFs. The harmonic oscillator VACF is very different to the Lennard Jones solid and liquid. Why is this? Attach a copy of your plot to your writeup.

$C (τ) = \frac{\int_{- \infty}^{\infty} v (t) v (t + τ) d t}{\int_{- \infty}^{\infty} v^{2} (t) d t}$

The equation for the evolution of the position of a 1D harmonic oscillator as a function of time is $x (t) = A \cos (ω t + ϕ)$

Therefore the velocity for a 1D harmonic oscillator is $v (t) = \frac{d x (t)}{d t} = - ω A \sin (ω t + ϕ)$

$C (τ) = \frac{\int_{- \infty}^{\infty} v (t) v (t + τ) d t}{\int_{- \infty}^{\infty} v^{2} (t) d t}$

$= \frac{\int_{- \infty}^{\infty} (- ω A \sin (ω t + ϕ)) (- ω A \sin (ω (t + τ) + ϕ)) d t}{\int_{- \infty}^{\infty} {(- ω A \sin (ω t + ϕ))}^{2} d t}$

$= \frac{\int_{- \infty}^{\infty} (- ω A \sin (ω t + ϕ)) (- ω A \sin (ω t + ω τ + ϕ)) d t}{\int_{- \infty}^{\infty} {(- ω A \sin (ω t + ϕ))}^{2} d t}$

$= \frac{\int_{- \infty}^{\infty} \sin (ω t + ϕ) (s i n (ω t + ϕ) \cos (ω t) + c o s (ω t + ϕ) \sin (ω t)) d t}{\int_{- \infty}^{\infty} \sin^{2} (ω t + ϕ) d t}$

$\frac{\cos (ω t) \int_{- \infty}^{\infty} \sin^{2} (ω t + ϕ)}{\int_{- \infty}^{\infty} \sin^{2} (ω t + ϕ)} + \frac{\sin (ω t) \int_{- \infty}^{\infty} (\sin (ω t + ϕ) \cos (ω t + ϕ))}{\int_{- \infty}^{\infty} \sin^{2} (ω t + ϕ)}$

The integral, $\int_{- \infty}^{\infty} (\sin (ω t + ϕ) \cos (ω t + ϕ)) d t = 0$

Therefore,

$C (τ) = \cos (ω t)$

**Comparing VACFs of the Harmonic Oscillator, solid phase, and liquid phase**

The VACF for the Harmonic Oscillator oscillates around 0 due to the cosine function. The Harmonic Oscillator assumes periodicity and ignores the possibility of collisions with particles over a long range, that can cause significant changes from the initial velocity. It can be seen that the liquid phase's VACF minimizes to 0 quite quickly and becomes constant (flattens out), within 100 timesteps, without any oscillation around 0, indicating that there is not much long range order in the liquid phase. On the other hand, the solid phase VACF continues to oscillate around 0 up to 350 timesteps, where it eventually also flattens out and hits a constant value of 0, indicating a much greater extent of long range order in this state. The Harmonic Oscillator is not an accurate representation of real physical systems, and the deviations from the model increase as the phases become more disordered and random collisions become more likely.

TASK: Use the trapezium rule to approximate the integral under the velocity autocorrelation function for the solid, liquid, and gas, and use these values to estimate $D$ in each case. You should make a plot of the running integral in each case. Are they as you expect? Repeat this procedure for the VACF data that you were given from the one million atom simulations. What do you think is the largest source of error in your estimates of D from the VACF?

**Comparing integrals under the VACF**
	Solid	Liquid	Gas
Small System
Large System (1 million atoms)

**Estimated Diffusion Coefficients using the VACF integral**
	Solid	Liquid	Gas
Small System	$6.992 \times 1 0^{- 4}$	0.0991	8.24
Large System (1 million atoms)	$1.25 \times 1 0^{- 3}$	0.0913	3.270

Like seen with the MSD estimate, the diffusion coefficients estimated here also agree with physical reality, with the coefficients increasing with more disordered phases. For the solid and liquid phases, as their VACF integral plots appear to flatten, it suggests that the diffusion coefficient limit has been reached for these phases within the simulation time. The largest source of error in the estimates of D from the VACF, is the usage of the Trapezium Rule as the method for integration, as at each dt, it could over and/or underestimate the true area under the function, particularly when the VACF has curvature. These errors are cumulative, and could lead to huge deviations from the true integral value at the end of the integration; by observing the curvature of the various VACFs plotted above, it can be seen how the VACF integral and thereby diffusion coefficient of the gas phase will be most affected by this error.