AndyForester
HBDI Calculations
Wed 21st Oct 2009
- Started project work
- Gaussview and Gaussian working
- Working out how to use the command inputs
- Trying to sort VPN out
- Working to get the project proposal written
Tue 27th Oct 2009
Drew up the HBDI molecule in Gaussview, the molecule will be optimised at the HF 3-21G level on the mac (locally).
DFT B3LYP 6-31G (Becke threeparameter hybrid exchange functional5 and the Lee-Yang-Parr correlation functional) method optimisation was run on the resulting file, the MP2 6-31G method optimisation. Each of these had a corresponding freq calculation to check that minima were reached. All calculations were successful, with no negative vibrations.
Wed 28th Oct 2009
Ran DFT B3YLP 6-311G calculations on HBDI (Opt and Freq) since these were the calculations used in a published study of HBDI.
Optimisation job finshed successfully, the frequency calculation did not finish succesfully. Error was:
****Warning!!: The largest alpha MO coefficient is 0.17588883D+02 Symmetrizing basis deriv contribution to polar: IMax=3 JMax=2 DiffMx= 0.00D+00 G2DrvN: will do 29 centers at a time, making 1 passes doing MaxLOS=1. Calling FoFCou, ICntrl= 3107 FMM=F I1Cent= 0 AccDes= 0.00D+00. FoFDir/FoFCou used for L=0 through L=1. End of G2Drv Frequency-dependent properties file 721 does not exist. End of G2Drv Frequency-dependent properties file 722 does not exist. IDoAtm=1111111111111111111111111111 Differentiating once with respect to electric field. with respect to dipole field. Differentiating once with respect to nuclear coordinates. Keep R1 ints in memory in canonical form, NReq=447529180.
Thurs 29th Oct 2009
Learning to use ONIOM
The example on chem wiki was attempted. Drawn up in Gaussview and optimised initially using molecular mechanics AMBER force field.
The ONIOM calculation was set up in Gaussview using DFT B3LYP 6-311G(d):AMBER methods with the benzene ring as the high level and chain at low level. Electronic embedding was NOT used in this initial calculation.
This conformation is NOT the lowest energy conformation, the initial AMBER MM calculation did not find the global minimum probably due to the drawn up starting geometry.
The molecule was drawn up again and DFT B3LYP 6-311G(d):AMBER and pure B3LYP 6-311G(d) calculations run. Absolute energy differences between the 2 conformations was very small at the ONIOM(DFT B3LYP 6-311G(d):AMBER) level. -232.27532671 vs -232.27315709 au respectively, a 0.00271au (4.359 744 17×10-18 J = 1au) difference.
It has already become obvious that using ONIOM decreases calculation time considerably. The table below shows a "CPU time" comparison on the above molecule using the different approaches.
Calculation | Time (1st attempt) | Time (2nd attempt) |
ONIOM(DFT B3LYP 6-311G(d):AMBER) | 4 minutes 22.0 seconds | 8 minutes 5.5 seconds |
DFT 3LYP 6-311G(d) | 2 hours 12 minutes 21.9 seconds | 2 hours 30 minutes 0.8 seconds |
Mon 2nd Nov 2009
Having obtained optimized structures of HBDI at various theory levels and basis set variations, these structures will now be compared to previous literature. Extensive studies have been performed using HF and B3LYP methods previously. It has been noted in Jaspers recent review that HF does not model frequencies very accurately - my computations agree HF 6-31G frequencies are very far out. B3LYP frequencies are much closer to the experimental values.
Tues 3rd Nov 2009
Tabulated frerquency comparison
A.Esposito HF 6-31G(d) | A.Esposito B3LYP 6-31G(d) | X.He B3LYP 6-31G(d) | HF 6-31G(d) | B3LYP 6-31G(d) | |
C=O | 1987 | 1802 | 1802 | 1986 | 1801 |
C=C | 1880 | 1705 | 1704 | 1880 | 1704 |
Phenol | 1809 | 1665 | 1663 | 1812 | 1666 |
Phenol | 1781 | 1635 | 1631 | 1777 | 1635 |
C=N | 1844 | 1620 | 1618 | 1844 | 1618 |
Phenol | 1696 | 1562 | 1558 | 1694 | 1562 |
C-N | 1607 | 1483 | N/A | 1606 | 1482 |
Phenol | 1602 | 1487 | N/A | 1603 | 1488 |
Total Energy(au) | -720.043 | -724.465 | |||
Thurs 5th Nov 2009
Having got Jaspers input files, several things were noticed:
- Solvent models were used - should be in vacum.... I have re run these in G09 with No solvent on both DFT 6-311G(d,p) and HF 6-31+G(d) methods/basis sets for comparison
- The models in his Biophys paper using other residues in contact COO- for example are not present. He has not reported the pure HBDI calculation results.
- More than just HBDI is likely to be required, meaning larger than expected high level region in the GFP calculation. (3 layer ONIOM DFT:HF:AMBER)?
Other things to do:
- Work on a guide to loading .pdb files into Gaussview 5 - how much information is maintained?
- Are residues defined still in Gaussview input files
- AMBER MM requires definitions of charges etc... how easy is it to import from .pdb and get a file that can be run in opt etc...
- Find a model system which is larger than HBDI but incorporates this molecule <=500 atoms (Limit of size for pure DFT calculation)
- Possibilities include a system alread developed by someone else online
- A "custom" system made by me to model this (time intensive)
- "strip" back the GFP pdb file to 500 atoms surrounding the HBDI chromophore.
Fri 6th Nov 2009
Yesterday started by trying to follow the tutorial on loading the GFP molecule from .pdb, this resulted in a problem with Gaussviews addition of hydrogens. It over protonated the nitrogens and carbon in the chromophore leading to over valent parts. I then tried several other molecules (PYP, a scorpion toxin and a strand of RNA) which had no problems with hydrogen addition.
I tried to use ChemBio 3D to load the .pdb and correctly assign hydrogens/connectivity. This resulted in the same protonation problems!
Mon 9th Nov 2009
TINKER - http://dasher.wustl.edu/tinker/ was installed with the Force Field Explorer (FFE) interface on the mac. This was used to load the same .pdb file and resulted in the same protonation around the chromophore. So far the SAME protonation pattern (incorrect) around the chromophore has been found using:
- Gaussview
- ChemBio 3D
- TINKER
Looking into how the .pdb file format works, the molecule is built from a series of residues, it becomes clear as you look at the sequence that is building the GFP protein chromophore (residues 65,66,67 in 1GFL) are not then reproducing the internal cyclisation properly leaving protons attached and thus atoms hypervalent. This accounts for the consistent error loading from the .pdb. As a result must all molecules expressed as a .pdb file must be pure protein structures...?
Comparison of the 2 available wild type GFP structures
The 2 structures being considered here are:
- 1GFL - Yang et al. -http://www.pdb.org/pdb/explore/explore.do?structureId=1GFL
- 1W7S - Jaspers stucture - http://www.pdb.org/pdb/explore/explore.do?structureId=1W7S
These 2 structures differ in that 1GFL contains 2 GFP molecules, whereas 1W7S contains 4. These were both stripped down to 1 unit of GFP, using the "pdb secondary structure" function in Gaussview 5. 1GFL had chain B removed and 1W7S had chains B,C and D removed. Next the residues were investigated that created the chromophore. The results are not consistent because GFL uses 3 residues (65,66,67) to form the chromophore. 1W7S has residue 66 (labelled CSY) as the entire chromophore! Both get protonated the same (incorrectly) which is odd considering how they molecules are built.
The next test was to "modify bond" in Gaussview 5 and create a "cleaned" chromophore structure (no bonding information from the residue) by manually changing the bonds. The chromophore in each .pdb file was altered to look like HBDI in terms of double single bonds and the hydrogens taken out (makes no difference as these are created each time the .pdb is loaded). When the new .pdb files (1GFL_cleaned.pdb and 1W7S_cleaned.pdb) were loaded they were correctly protonated! Success.
1W7S improvement
Chain A (not cleaned) of the 1W7S .pdb was submitted to run a AMBER calculation, as expected it failed with the lines:
Missing atomic parameters for atom 474 IAtTyp= 20000000 Missing atomic parameters.
This error says that the atom type of atom 474 is not defined. One way to counter this is to locate the atom and then change its definition manually (as shown in the example https://www.ch.ic.ac.uk/wiki/index.php/PYP_:_Photoactive_yellow_protein - time intensive!)
Instead the molecule was cleaned (manually changing the bonding in the chromophore), after the bond updates the structure was re submitted, again failing but with a different message:
Include all MM classes Bondstretch undefined between atoms 473 478 CC-N* [H,H] Bondstretch undefined between atoms 475 476 CC-CC [H,H] Bondstretch undefined between atoms 475 486 CC-CM [H,H] Bondstretch undefined between atoms 476 477 CC-O [H,H] Bondstretch undefined between atoms 476 478 CC-N* [H,H] Bondstretch undefined between atoms 486 2559 CM-HC [H,H] Angle bend undefined between atoms 473 474 475 CC-NB-CC [H,H,H] Angle bend undefined between atoms 473 478 479 CC-N*-CT [H,H,H] Angle bend undefined between atoms 473 478 476 CC-N*-CC [H,H,H] Angle bend undefined between atoms 473 482 483 CC-CT-N [H,H,H] Angle bend undefined between atoms 473 482 3857 CC-CT-H1 [H,H,H] Angle bend undefined between atoms 474 475 476 NB-CC-CC [H,H,H] Angle bend undefined between atoms 474 475 486 NB-CC-CM [H,H,H] Angle bend undefined between atoms 474 473 478 NB-CC-N* [H,H,H] Angle bend undefined between atoms 475 476 477 CC-CC-O [H,H,H] Angle bend undefined between atoms 475 476 478 CC-CC-N* [H,H,H] Angle bend undefined between atoms 475 486 487 CC-CM-CA [H,H,H] Angle bend undefined between atoms 475 486 2559 CC-CM-HC [H,H,H] Angle bend undefined between atoms 476 478 479 CC-N*-CT [H,H,H] Angle bend undefined between atoms 476 475 486 CC-CC-CM [H,H,H] Angle bend undefined between atoms 477 476 478 O-CC-N* [H,H,H] Angle bend undefined between atoms 478 479 480 N*-CT-C [H,H,H] Angle bend undefined between atoms 478 473 482 N*-CC-CT [H,H,H] Angle bend undefined between atoms 486 487 488 CM-CA-CA [H,H,H] Angle bend undefined between atoms 486 487 489 CM-CA-CM [H,H,H] Angle bend undefined between atoms 487 486 2559 CA-CM-HC [H,H,H] Angle bend undefined between atoms 488 487 489 CA-CA-CM [H,H,H] Angle bend undefined between atoms 490 492 491 CA-C-CM [H,H,H] Angle bend undefined between atoms 491 492 493 CM-C-OH [H,H,H] MM function not complete
This is a better error message since all the atoms are now defined, only parameters are missing. For parameters that involve only atoms in the chromophore at the high level in the ONIOM calculation, the values can be set to 0 in the input file. This is because the ONIOM extrapolation will cancel MM from this high level region, To do this the keyword "Amber=Hardfirst" and wildcards for the missing parameters which are zero valued are placed in the input file.
The above screeshot shows highlighted the atoms listed in the error... most of them if not all will be involved in the high level calculation and thus may not require parameters defining (other than to be set to 0). The reason the parameters are undefined is that the AMBER force field is very specific to protein bonds and thus the organic chromophore has bonds that are not found in this parameter set.
The atoms in the chromophore were frozen in the atom group editor and the input file had a parameter set added to the end setting chromophore parameters to 0:
HrmStr1 CC N* 0.0 0.0 HrmStr1 CC CC 0.0 0.0 HrmStr1 CC CM 0.0 0.0 HrmStr1 CC O 0.0 0.0 HrmStr1 CM HC 0.0 0.0 HrmBnd1 CC NB CC 0.0 0.0 HrmBnd1 CC N* CC 0.0 0.0 HrmBnd1 CC CT N 0.0 0.0 HrmBnd1 CC CT H1 0.0 0.0 HrmBnd1 NB CC CC 0.0 0.0 HrmBnd1 NB CC CM 0.0 0.0 HrmBnd1 NB CC N* 0.0 0.0 HrmBnd1 CC CC O 0.0 0.0 HrmBnd1 CC CC N* 0.0 0.0 HrmBnd1 CC CM CA 0.0 0.0 HrmBnd1 CC CM HC 0.0 0.0 HrmBnd1 CC CC CM 0.0 0.0 HrmBnd1 O CC N* 0.0 0.0 HrmBnd1 N* CT C 0.0 0.0 HrmBnd1 N* CC CT 0.0 0.0 HrmBnd1 CM CA CA 0.0 0.0 HrmBnd1 CA CM HC 0.0 0.0 HrmBnd1 CA C CM 0.0 0.0 HrmBnd1 CM C OH 0.0 0.0 HrmBnd1 CC N* CT 0.0 0.0 HrmBnd1 CM CA CM 0.0 0.0
This file then ran successfully for an AMBER single point energy calculation. It was then submitted for AMBER optimisation using "amber=hardfirst geom=connectivity" and returned a successful minimum with Forces and Displacements converged:
Item Value Threshold Converged? Maximum Force 0.000089 0.000450 YES RMS Force 0.000004 0.000300 YES Maximum Displacement 0.000089 0.001800 YES RMS Displacement 0.000004 0.001200 YES Predicted change in Energy=-1.156135D-07 Optimization completed. -- Stationary point found.
Cleaned .pdb files and TINKER
- The files that had been cleaned were opened in TINKER, it is necessary to convert the .pdb to a .xyz file to run any calculations within TINKER. This is currently failing due to an "atom missing on SER-228" - checking this there is no SER residue at 228 (in fact a GLU) so a bit puzzled. This is the 1W7S cleaned version.
- PYP was loaded in its .pdb format to TINKER and was immediately successful in being converted to .xyz then a MINIMISE algorithm ran successfully with the energy at its X-ray crystal structure as 261KJ/mol reduced to -3670.5355KJ/mol in 1645 iterations - this took a while (45mins locally)!
- Interestingly the .xyz output from the minimization can be converted back to .pdb (loses hydrogens minimization) and this PYP file was loaded into Gaussview 5 - the file then ran successfully for an AMBER single point energy calculation! cf. the first attempt to get PYP to run was unsuccessful, and the tutorial underwent much more hassle to run this calculation.
What .pdb information is retained?
The only molecule that has been run with AMBER opt (GFP) has lost its secondary structure information but RETAINED its residue information.
Fri 13th Nov 2009
Conversion from.xyz to .com
A protein with 2 residues (Glu-Asp) was built using TINKER, saved as:
- .xyz
- .pdb
- .com
The initial format was .xyz (created in TINEKR as a native format), converted to .pdb using TINKER and then loaded into Gaussview and saved as a .com.
.xyz
36 Protein built with TINKER/Force Field Explorer 1 CT 0.000000 0.000000 0.000000 340 2 4 5 6 2 C 0.000000 0.000000 1.510000 342 1 3 7 3 O 1.028938 0.000000 2.165506 343 2 4 HC 1.056270 0.000000 -0.341166 341 1 5 HC -0.497647 0.931694 -0.341166 341 1 6 HC -0.497647 -0.931694 -0.341166 341 1 7 N -1.236201 0.000000 2.027114 232 2 8 11 8 CT -1.446961 0.000000 3.471822 233 7 9 12 13 9 C -2.509557 -0.992751 3.878558 234 8 10 22 10 O -3.569672 -1.098634 3.284129 236 9 11 H -2.025518 0.000000 1.381066 235 7 12 H1 -0.503221 -0.281395 3.983946 237 8 13 CT -1.910491 1.397334 3.923704 238 8 14 18 19 14 CT -2.132799 1.397334 5.447574 240 13 15 20 21 15 C -2.585014 2.781238 5.848089 242 14 16 17 16 O2 -2.816736 2.973609 7.061266 243 15 17 O2 -2.693194 3.629064 4.935955 243 15 18 HC -2.861794 1.653482 3.412322 239 13 19 HC -1.132567 2.145528 3.664589 239 13 20 HC -1.170306 1.191123 5.960588 241 14 21 HC -2.946052 0.685840 5.701535 241 14 22 N -2.153997 -1.723291 4.944151 210 9 23 26 23 CT -3.047986 -2.739703 5.491225 211 22 24 27 28 24 C -3.120370 -2.663723 6.997574 212 23 25 34 25 O -2.128241 -2.535414 7.695870 214 24 26 H -1.239994 -1.547921 5.361576 213 22 27 H1 -4.072297 -2.585746 5.092246 215 23 28 CT -2.526745 -4.139104 5.114946 216 23 29 32 33 29 C -3.476741 -5.157722 5.698062 218 28 30 31 30 O2 -3.212872 -6.362275 5.493304 219 29 31 O2 -4.454666 -4.718803 6.341112 219 29 32 HC -1.530243 -4.298013 5.577366 217 28 33 HC -2.540086 -4.252948 4.010880 217 28 34 N -4.374888 -2.752228 7.460123 344 24 35 36 35 H -5.140865 -2.858197 6.794954 345 34 36 H -4.534830 -2.711583 8.466684 345 34
.pdb
HEADER Protein built with TINKER/Force Field Explorer COMPND SOURCE ATOM 1 CH3 ACE 1 0.000 0.000 0.000 ATOM 2 C ACE 1 0.000 0.000 1.510 ATOM 3 O ACE 1 1.029 0.000 2.166 ATOM 4 H1 ACE 1 1.056 0.000 -0.341 ATOM 5 H2 ACE 1 -0.498 0.932 -0.341 ATOM 6 H3 ACE 1 -0.498 -0.932 -0.341 ATOM 7 N GLU 2 -1.236 0.000 2.027 ATOM 8 CA GLU 2 -1.447 0.000 3.472 ATOM 9 C GLU 2 -2.510 -0.993 3.879 ATOM 10 O GLU 2 -3.570 -1.099 3.284 ATOM 11 CB GLU 2 -1.910 1.397 3.924 ATOM 12 CG GLU 2 -2.133 1.397 5.448 ATOM 13 CD GLU 2 -2.585 2.781 5.848 ATOM 14 OE1 GLU 2 -2.817 2.974 7.061 ATOM 15 OE2 GLU 2 -2.693 3.629 4.936 ATOM 16 H GLU 2 -2.026 0.000 1.381 ATOM 17 HA GLU 2 -0.503 -0.281 3.984 ATOM 18 HB2 GLU 2 -2.862 1.653 3.412 ATOM 19 HB3 GLU 2 -1.133 2.146 3.665 ATOM 20 HG2 GLU 2 -1.170 1.191 5.961 ATOM 21 HG3 GLU 2 -2.946 0.686 5.702 ATOM 22 N ASP 3 -2.154 -1.723 4.944 ATOM 23 CA ASP 3 -3.048 -2.740 5.491 ATOM 24 C ASP 3 -3.120 -2.664 6.998 ATOM 25 O ASP 3 -2.128 -2.535 7.696 ATOM 26 CB ASP 3 -2.527 -4.139 5.115 ATOM 27 CG ASP 3 -3.477 -5.158 5.698 ATOM 28 OD1 ASP 3 -3.213 -6.362 5.493 ATOM 29 OD2 ASP 3 -4.455 -4.719 6.341 ATOM 30 H ASP 3 -1.240 -1.548 5.362 ATOM 31 HA ASP 3 -4.072 -2.586 5.092 ATOM 32 HB2 ASP 3 -1.530 -4.298 5.577 ATOM 33 HB3 ASP 3 -2.540 -4.253 4.011 ATOM 34 N NH2 4 -4.375 -2.752 7.460 ATOM 35 H1 NH2 4 -5.141 -2.858 6.795 ATOM 36 H2 NH2 4 -4.535 -2.712 8.467 END
.com
%chk=File_Test_Glu-Asp.chk # hf/3-21g geom=connectivity Protein built with TINKER/Force Field Explorer 0 1 C(PDBName=CH3,ResName=ACE,ResNum=1) 0.00000000 0.00000000 0.00000000 C(PDBName=C,ResName=ACE,ResNum=1) 0.00000000 0.00000000 1.51000000 O(PDBName=O,ResName=ACE,ResNum=1) 1.02900000 0.00000000 2.16600000 H(PDBName=H1,ResName=ACE,ResNum=1) 1.05600000 0.00000000 -0.34100000 H(PDBName=H2,ResName=ACE,ResNum=1) -0.49800000 0.93200000 -0.34100000 H(PDBName=H3,ResName=ACE,ResNum=1) -0.49800000 -0.93200000 -0.34100000 N(PDBName=N,ResName=GLU,ResNum=2) -1.23600000 0.00000000 2.02700000 C(PDBName=CA,ResName=GLU,ResNum=2) -1.44700000 0.00000000 3.47200000 C(PDBName=C,ResName=GLU,ResNum=2) -2.51000000 -0.99300000 3.87900000 O(PDBName=O,ResName=GLU,ResNum=2) -3.57000000 -1.09900000 3.28400000 C(PDBName=CB,ResName=GLU,ResNum=2) -1.91000000 1.39700000 3.92400000 C(PDBName=CG,ResName=GLU,ResNum=2) -2.13300000 1.39700000 5.44800000 C(PDBName=CD,ResName=GLU,ResNum=2) -2.58500000 2.78100000 5.84800000 O(PDBName=OE1,ResName=GLU,ResNum=2) -2.81700000 2.97400000 7.06100000 O(PDBName=OE2,ResName=GLU,ResNum=2) -2.69300000 3.62900000 4.93600000 H(PDBName=H,ResName=GLU,ResNum=2) -2.02600000 0.00000000 1.38100000 H(PDBName=HA,ResName=GLU,ResNum=2) -0.50300000 -0.28100000 3.98400000 H(PDBName=HB2,ResName=GLU,ResNum=2) -2.86200000 1.65300000 3.41200000 H(PDBName=HB3,ResName=GLU,ResNum=2) -1.13300000 2.14600000 3.66500000 H(PDBName=HG2,ResName=GLU,ResNum=2) -1.17000000 1.19100000 5.96100000 H(PDBName=HG3,ResName=GLU,ResNum=2) -2.94600000 0.68600000 5.70200000 N(PDBName=N,ResName=ASP,ResNum=3) -2.15400000 -1.72300000 4.94400000 C(PDBName=CA,ResName=ASP,ResNum=3) -3.04800000 -2.74000000 5.49100000 C(PDBName=C,ResName=ASP,ResNum=3) -3.12000000 -2.66400000 6.99800000 O(PDBName=O,ResName=ASP,ResNum=3) -2.12800000 -2.53500000 7.69600000 C(PDBName=CB,ResName=ASP,ResNum=3) -2.52700000 -4.13900000 5.11500000 C(PDBName=CG,ResName=ASP,ResNum=3) -3.47700000 -5.15800000 5.69800000 O(PDBName=OD1,ResName=ASP,ResNum=3) -3.21300000 -6.36200000 5.49300000 O(PDBName=OD2,ResName=ASP,ResNum=3) -4.45500000 -4.71900000 6.34100000 H(PDBName=H,ResName=ASP,ResNum=3) -1.24000000 -1.54800000 5.36200000 H(PDBName=HA,ResName=ASP,ResNum=3) -4.07200000 -2.58600000 5.09200000 H(PDBName=HB2,ResName=ASP,ResNum=3) -1.53000000 -4.29800000 5.57700000 H(PDBName=HB3,ResName=ASP,ResNum=3) -2.54000000 -4.25300000 4.01100000 N(PDBName=N,ResName=NH2,ResNum=4) -4.37500000 -2.75200000 7.46000000 H(PDBName=H1,ResName=NH2,ResNum=4) -5.14100000 -2.85800000 6.79500000 H(PDBName=H2,ResName=NH2,ResNum=4) -4.53500000 -2.71200000 8.46700000 1 2 1.0 4 1.0 5 1.0 6 1.0 2 3 2.0 7 1.5 3 4 5 6 7 16 1.0 8 1.0 8 9 1.0 11 1.0 17 1.0 9 10 2.0 22 1.5 10 11 18 1.0 12 1.0 19 1.0 12 21 1.0 13 1.0 20 1.0 13 14 2.0 15 2.0 14 15 16 17 18 19 20 21 22 23 1.0 30 1.0 23 26 1.0 24 1.0 31 1.0 24 34 1.5 25 2.0 25 26 27 1.0 33 1.0 32 1.0 27 28 2.0 29 2.0 28 29 30 31 32 33 34 35 1.0 36 1.0 35 36
The issue going from .xyz straight to .com is that ALL protein information would be lost ie. residue information that is written into the .pdb at .xyz to .pdb comversion.
PYP files as downloaded compared to run through TINKER
- TINKER removes much of the header information
- TINKER only preserves the 1st model of a .pdb file (may be multiple models)
- TINKER removes far right 3 columns of information (not essential to the .pdb file)
- A virtual residue in the original .pdb comprising the chromophore was omitted from the TINKER output (number 126)
What each TINKER search algorithm does
There are several programs within TINKER that may be able to help locate a global minimum (not just one near the published crystal structure) by way of searching the potential energy surface. All of these are MM methods for use in TINKER.
SCAN
A program for general conformational search of an entire potential energy surface via a basin hopping method. The program takes as input a TINKER .xyz coordinates file which is then minimized to find the first local minimum for a search list. A series of activations along various normal modes from this initial minimum are used as seed points for additional minimizations. Whenever a previously unknown local minimum is located it is added to the search list. When all minima on the search list have been subjected to the normal mode activation without locating additional new minima, the program terminates. The individual local minima are written to cycle files as they are discovered. While the SCAN program can be used on standard undeformed potential energy surfaces, we have found it to be most useful for quickly ‘‘scanning’’ a smoothed energy surface to enumerate the major basins of attraction spaning the entire surface.
SNIFFER
A program that implements the Sniffer global optimization algorithm of Butler and Slaminka, a discrete version of Griewank’s global search trajectory method. The program takes an input TINKER .xyz coordinates file and shakes it vigorously via a modified dynamics trajectory before, hopefully, settling into a low lying minimum. Some trial and error is often required as the current implementation is sensitive to various parameters and tolerances that govern the computation. At present, these parameters are not user accessible, and must be altered in the source code. However, this method can do a good job of quickly optimizing conformation within a limited range of convergence.
PSS
Implements our version of a potential smoothing and search algorithm for the global optimization of molecular conformation. An initial structure in .xyz format is first minimized in Cartesian coordinates on a series of increasingly smoothed potential energy surfaces. Then the smoothing procedure is reversed with minimization on each successive surface starting from the coordinates of the minimum on the previous surface. A local search procedure is used during the backtracking to explore for alternative minima better than the one found during the current minimization. The final result is usually a very low energy conformation or, in favorable cases, the global energy minimum conformation. The minimum energy coordinate sets found on each surface during both the forward smoothing and backtracking procedures are placed in sequentially numbered cycle files.
TINKER also contains programs that are able to perform MM optimisations:
MINIMIZE
The MINIMIZE program performs a limited memory L-BFGS minimization of an input structure over Cartesian coordinates using a modified version of the algorithm of Jorge Nocedal. The method requires only the potential energy and gradient at each step along the minimization pathway. It requires storage space proportional to the number of atoms in the structure. The MINIMIZE procedure is recommended for preliminary minimization of trial structures to an rms gradient of 1.0 to 0.1 kcal/mole/Å. It has a relatively fast cycle time and is tolerant of poor initial structures, but converges in a slow, linear fashion near the minimum. The user supplies the name of the TINKER .xyz coordinates file and a target rms gradient value at which the minimization will terminate. Output consists of minimization statistics written to the screen or redirected to an output file, and the new coordinates written to updated .xyz files or to cycle files.
OPTIMIZE
The OPTIMIZE program performs a optimally conditioned variable metric minimization of an input structure over Cartesian coordinates using an algorithm due to William Davidon. The method does not perform line searches, but requires computation of energies and gradients as well as storage for an estimate of the inverse Hessian matrix. The program operates on Cartesian coordinates from a TINKER .xyz file. OPTIMIZE will typically converge somewhat faster and more completely than MINIMIZE. However, the need to store and manipulate a full inverse Hessian estimate limits its use to structures containing less than a few hundred atoms on workstation class machines. As with the other minimizers, OPTIMIZE needs input coordinates and an rms gradient cutoff criterion. The output coordinates are saved in updated .xyz files or as cycle files.
NEWTON
A truncated Newton minimization method which requires potential energy, gradient and Hessian information. This procedure has significant advantages over standard Newton methods, and is able to minimize very large structures completely. Several options are provided with respect to minimization method and preconditioning of the Newton equations. The default options are recommended unless the user is familiar with the math involved. This program operates in Cartesian coordinate space and is fairly tolerant of poor input structures. Typical algorithm iteration times are longer than with nonlinear conjugate gradient or variable metric methods, but many fewer iterations are required for complete minimization. NEWTON is usually the best choice for minimizations to the 0.01 to 0.000001 kcal/mole/Å level of rms gradient convergence. Tests for directions of negative curvature can be removed, allowing NEWTON to be used for optimization to conformational transition state structures (this only works if the starting point is very close to the transition state). Input consists of a TINKER .xyz coordinates file; output is an updated set of minimized coordinates and minimization statistics.
Tues 17th Nov 2009
Comparing 2 .pdb files:
- As downloaded from PDB
- Converted to .xyz in TINKER then back to .pdb
These files were then cleaned up to compare in a terminal using the "diff" command:
- Header information and end information removed
- 3 columns to the right in PDB version were removed to become consistent with the TINEKR file
- The first column "ATOM" was removed effectively making the line numbers equal to the atom numbers in both files
Wed 18th Nov 2009
1st diff was on line 52, a HYDROGEN ATOM:
52 HE2 HIS A 3 13.349 -10.803 9.527
added to the TINKER output, not in PDB file.
2nd diff was on line 657, a HYDROGEN ATOM:
657 HE2 GLU A 46 -0.867 -5.216 -0.822
added to the PDB file, not in TINKER output
3rd diff was on line 1007, a HYDROGEN ATOM (number 1008):
5HG CYS A 69 -8.909 -5.194 5.526
added to the TINKER output, not in PDB file.
4th diff was on line 1621, a HYDROGEN ATOM (number 1623):
HE2 HIS A 108 4.663 7.459 6.879
added to the TINKER output, not in PDB file.
Following these reconciliations, both files were exactly the same with 1911 atoms present in each. 3 atoms were removed from the TINKER file and 1 was removed from the PDB file.
- NOTE - loading the original .pdb from PDB to Gaussview then converting to a .gjf results in an error "1 or more PDB atom names has a ' or * character which can cause problems for Gaussian" - this is because the PDB file has extra atoms on the end of the file to define a highlighted chromophore... After these were removed in a txt editor the .gjf save went smoothly. Infact after this removal the file will run an AMBEr single point energy calculation fine - cf running when the extra info is in the bottom.
All differences in files were due to protonation states of atoms... not the AA backbone.
TINKER vs PDB
The highlighted fragments represent where the respective file version is protonated and the other is not.
Fri 20th Nov 2009
In an attempt to load the 1W7S structure to TINKER and successfully convert it it a .xyz format the following procedure was undertaken:
Mon 23rd Nov 2009
Protonation of GFP at various pH conditions
With 230 amino acid residues, each possessing a unique pKa in GFP it is obvious that there is a huge spectrum of protonation states available at a single solvation pH. Since protons cannot be detected using X-ray diffraction, protonation states must be uncovered via a difference method. Several questions arise when thinking about protonation of the protein:
- Which residues have the ability to be protonated/deprotonated
- Is the PDB structure protonated for a specific pH (Bio pH)
- How does the protonation of the protein change through a variety of solvation pHs
- Does the protonation state affect the chromophore photophysics (electronic embedding)
Answering these questions:
- Polar amino acids are those that can be protonated/deprotonated, these are identified by Gaussview and shown in the atom list of the molecule with the option of setting them to either protonated/deprotonated.
- Dont know if the current structure is protonated to any specific pH... this is unclear until protonate at certain pHs
- The protonation will be tested across a range of pHs to see how protonation changes
Tues 24th Nov 2009
The pKa value of a residue will indicate protonation state as a function of pH. If you know the pH of your solution then if the pKa of a given functional group is lower than this pH then the functional group will be unprotonated. If the pKa is higher than the pH then the functional group will be protonated. Beware that pKa values can be distorted by local conformational effects. The structure is protonated as though it were in pH 7 as it is loaded into Gaussview. Since hydrogens are defined in the PDB file it could be that the author added protons under these conditions.
Histadine has 2 protonation points on the imidizole R chain:
Yes-Yes | Yes-No | No-Yes | ||||||
The imidazole sidechain of histidine has a pKa of approximately 6, and overall, the amino acid has a pKa of 7.6. This means that at physiologically relevant pH values, relatively small shifts in pH will change its average charge. Below a pH of 6, the imidazole ring is mostly protonated as described by the Henderson–Hasselbalch equation. When protonated, the imidazole ring bears two NH bonds and has a positive charge. The positive charge is equally distributed between both nitrogens and can be represented with two equally important resonance structures.
Protonation
At pH 7 the PDB file protonations are correct for all residues. Moving to pH 5 the histadine residues become protonated (pKa 6)- this is the only change. At more alkaline pHs it would take pH 9.1 to make any effect (deprotonating CYS and then TYR at 9.7)
Mini-Biomolecule
To test ONIOM calculations vs pure DFT methods and other calculation types on a bio system that is smaller than GFP (computationally less expensive, can streamline the real calcs)
Trying a molecule analogous to the example given in the ONIOM tutorial...
Thurs 26th Nov 2009
Checking the GFP input will run
Procedure to get 1WS7 to run an AMBER energy calculation:
- Load the PDB file without water
- Try opt amber but will fail due to missing parameters (all in the chromophore)
- Set these parameters to 0 since they cancel in ONIOM
- Freeze the atotms involving missing paramters
- Energy AMBER will now run for the file - NO CHARGES - 9.07881632au
- Open the Atom Editor, Edit, MM charges, Assign all atoms
- Manually edit any missing charges
- Energy AMBER will now run including charges (basic AMBER94 charges) - 0.260391165au
Charges in this file did not sum to an integer number - the only residue with a non integer charge was no.64 with a charge of 0.61160000
MM sanity checks: All charges sum to: -5.38840000 Charges of atoms sum to: -5.38840000 Layer 1 (S) has 34 atoms and charge 0.61160000 sum= 0.61160000 Layer 2 (R) adds 3574 atoms and charge -6.00000000 sum= -5.38840000
WARNING -- The following centers have non-zero partial charges, but zero Vanderwaals parameters. These centers can collapse onto other centers with non-zero charge Center Charge 1833 0.423900 1890 0.410200 2029 0.427500 2043 0.427500 2080 0.410200 2089 0.399200 2123 0.410200 2179 0.410200 2186 0.410200 2260 0.410200 2287 0.410200 2294 0.410200 2347 0.427500 2369 0.399200 2469 0.427500 2508 0.399200 2551 0.410200 2619 0.399200 2639 0.410200 2714 0.410200 2924 0.399200 2939 0.399200 2950 0.427500 2981 0.399200 3180 0.427500 3233 0.399200 3262 0.410200 3361 0.399200 3377 0.427500 3384 0.410200 3397 0.427500 3418 0.427500 3562 0.410200 3593 0.410200 3608 0.410200
- oniom(pm6:amber=hardfirst) geom=connectivity was run successfully, resulting in an energy of 0.210085730788au
This still resulted in some charge problems:
MM sanity checks: All charges sum to: -5.38840000 Charges of atoms sum to: 0.79320000 Residue 63 PDB Number 64_A PHE has 1 atoms, and a charge of 0.59730000 Residue 64 PDB Number 66_A CSY has 34 atoms, and a charge of 0.61160000 Residue 65 PDB Number 68_A VAL has 1 atoms, and a charge of -0.41570000 WARNING -- The following centers have non-zero partial charges, but zero Vanderwaals parameters. These centers can collapse onto other centers with non-zero charge Center Charge 3608 0.410200 MMInit generated parameter data with length LenPar= 40470. Standard basis: Dummy (5D, 7F) There are 36 symmetry adapted basis functions of A symmetry. Integral buffers will be 262144 words long. Raffenetti 1 integral format. Two-electron integral symmetry is turned on. 36 basis functions, 36 primitive gaussians, 36 cartesian basis functions 27 alpha electrons 27 beta electrons nuclear repulsion energy 200.9124124811 Hartrees. NAtoms= 3608 NActive= 36 NUniq= 36 SFac= 7.50D-01 NAtFMM= 80 NAOKFM=F Big=F AMBER calculation of energy. Energy= 0.110474559 NIter= 0. Dipole moment= 3.390901 5.889448 -2.505476 ONIOM: saving gridpoint 1 ONIOM: restoring gridpoint 3 ONIOM: calculating energy. ONIOM: gridpoint 1 method: low system: model energy: 0.110474559177 ONIOM: gridpoint 2 method: high system: model energy: 0.060169124949 ONIOM: gridpoint 3 method: low system: real energy: 0.260391165016 ONIOM: extrapolated energy = 0.210085730788 ONIOM: calculating electric field derivatives. ONIOM: Integrating ONIOM file 4 number 619 ONIOM: Dipole =-2.76174164D+01-1.34762322D+02 3.99239712D+01 ONIOM: Dipole moment (Debye): X= -70.1965 Y= -342.5316 Z= 101.4766 Tot= 364.0782 ONIOM: Integrating ONIOM file 5 number 695
An ONIOM calculation was run using a DFT method on the high level and AMBER on the low level: oniom(b3lyp/3-21g:amber=hardfirst) geom=connectivity which ran successfully...
Charges
Charges in the protein are important for the calculation that will be run on GFP eventually since they provide the electrostatic field or "atmosphere" which surrounds the chromophore, polarizing the electron density used by the DFT method to evaluate the chromophore. Gaussian offers 2 types of influence which the MM portion can exert on the QM part:
- Mechanical Embedding
- Electronic Embedding
Mechanical embedding does not use any electronic components to affect the central high region. Electronic embedding does use the charge generated in the MM region to polarize the high region.
At pH 7 the GFP molecule is in a 6- charge state according to assigning AMBER charges and summing the residue charges (available in the out file). 3 options were considered at this stage for treating charges:
- Ignore charges on the entire molecule
- Use AMBER94 assigned partial charges on the low region
- Use Qeq to assign charges based on an overall charge of -6
The first option can be discarded since the electronic atmosphere is of particular interest here. The second 2 options are to be investigated. See below (on a small Biomolecule)
ON a 4 residue chain (in mini biomolecule folder) - As in the GFP tutorial on the wiki, a energy calculation was submitted with the keyword amber=qeq to get back the charges across the molecule. (not entirely sure what this does yet). AS a test of the effect of charges on a molecule a test molecule was compiled in TINKER consisting of 4 residues with 0 overall charges....
No Charges | 0.01940311au |
Amber Charges | -0.18117151au |
Qeq Charges | -0.23038199au |
Clearly the Qeq shows the lowest energy - the considerable differences in energy show that charges are an important part of the input.
Testing Charges and EE on a small system
Energies of the optimisations using various charge approximations and with/without EE
Method | Mechanical Embedding | Electronic Embedding | Computational Time (mechanical embedding) | Computational Time (electronic embedding) |
---|---|---|---|---|
B3LYP 6-31G* | -1406.04 | N/A | 11 hours 18 minutes 44.7 seconds | 11 hours 18 minutes 44.7 seconds |
ONIOM B3LYP 6-31G*:AMBER (No Charges) | -783.2225 | -783.2223 | 2 hours 13 minutes 4.1 seconds | |
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) | ||||
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) | -783.2467 | 2 hours 6 minutes 8.6 seconds |
NB.
- B3LYP 6-31G* and ONIOM B3LYP 6-31G*:AMBER (No Charges) were from the same starting geometry.
- ONIOM B3LYP 6-31G*:AMBER (Amber Charges) and ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) were from the same starting geometry.
- All Electronic embedding is from the same starting point - Can compare times.
Dipole moments using various charge approximations and with/without EE
Method | Dipole moment (No EE) | Dipole moment (With EE) |
---|---|---|
B3LYP 6-31G* | 7.4914 | N/A |
ONIOM B3LYP 6-31G*:AMBER (No Charges) | 3.2328 | 3.2241 |
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) | 5.2794 | 5.2865 |
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) | 8.173 | 8.1654 |
2nd Dec 2009
A single point energy of 1w7s was run with the additional qeq keyword to return partial charges (overall charge set to -6) This new molecule was submitted for AMBER opt but failed after around 700 iterations... this could be due to the error:
WARNING -- The following centers have non-zero partial charges, but zero Vanderwaals parameters. These centers can collapse onto other centers with non-zero charge
When the atoms defined in this error had their partial charges set to 0.00, the result was a fully converged AMBER optimisation:
- Frozen chromophore.
- pH 7 giving a charge of -6
- Charges assigned using Qeq method
- Zero Van der Waals parameter atoms partial charges set to 0.00
Looking at what atoms are not defined in terms of Van der Waals paramters it is ALWAYS an aloholic proton that is not parameterised...
Tutorial used AMBER charges then ran Qeq for the missing charges... i think this is what she thought she did... gaussian "amber=qeq" will overwrite all charges.. need to use "amber=uncharged" so that we dont overwrite the amber charges...
In effect full Qeq charges were assigned in the tutorial. Also overall charge is 0 - will check this.
The GFP tutorial
- AMBER charges (AMBER94 charges) were assigned to the GFP molecule in the tutorial
- she notices correctly that some are missing in the chromophore and so decides to assign them via Qeq
- Runs Qeq on the GFP molecule with a 0 charge which is wrong! When run the AMBER charges and look at the log file we can see that the overall charge should be +4 for that protonation state.
- Since Qeq is an equilibrium method, partial charges are solved with the constraint that their sum must equal the overall charge.
- These Qeq charges should therefore not fit with an integer sum when pasted into the AMBER charges.
Qeq versus AMBER94 charges
Ogawa et al. state:
Geometry optimizations of a crambin protein were performed using force fields combined with the CQEq. The CQEq charges were found to be more comparable to the Mulliken charges obtained by the ab initio molecular orbital calculation than the fixed charges used in the AMBER force field. Furthermore, the CQEq charges give more realistic charge redistribution between amino acids in the crambin.
Mini Biomolecule calculations:
Method | Dipole moment (No EE) | Dipole moment (With EE) |
---|---|---|
B3LYP 6-31G* | 7.4914 | N/A |
ONIOM B3LYP 6-31G*:AMBER (No Charges) | 3.2328 | 3.2241 |
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) | 5.2794 | 5.2865 |
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) | 8.173 | 8.1654 |
Clearly the Qeq charges are in better agreement with the DFT method than no or AMBER charges.
These frequencies are calulated using mechanical embedding!
Method | C=O | C=C |
---|---|---|
B3LYP 6-31G* | 1799.65 | 1701.86 |
ONIOM B3LYP 6-31G*:AMBER (No Charges) | 1802.25 | 1703.63 |
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) | 1803.37 | 1701.67 |
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) | 1800.18 | 1702.86 |
As a result it seems obvious that a Qeq charge assignment in constraint by the correct total charge gives results most comparable to the pure QM method (both dipoles and frequencies)
3rd Dec 2009
ESP Charges
ESP calculates the expectation values of the electrostatic potential of a molecule on a uniform distribution of points. The resultant ESP surface is then fit to atom centered charges that best reproduce the electron distribution in a least-squares method. The set of points defining the default surface is generated according to the algorithm of Connolly.
ESP charges thus require a QM calculation to create the molecular electrostatic potential used to fit atom centered charges. The size of GFP means that this is not an efficient method for use in this case?
Qeq Charges
The charge equilibration (QEq) method treats the atomic charge variation through the equilibration of electronegativities at atomic sites. The advantage of the QEq method is that the polarization effect in MM is taken into account by variation of the charge distribution in the molecules. The atomic charge variation relates to the intramolecular geometrical changes.
AMBER charges
8th Dec 2009
A input creation page is now avaliable:
https://www.ch.ic.ac.uk/wiki/index.php/AndyForesterGFP
The final input file had protein information pasted in using text to columns in excel and now displays PDB residue information and PDB secondary structure information (nice cartoon pictures in ChemBio3D!)
Interesting is a paper that uses TD-DFT on GFP (in my papers folder on the mac), the group uses a 3 stage minimisation method from the 1GFL PDB structure:
- Freeze all atoms in the backbone of the protein and optimise after adding hydrogens
- Optimise the chromophore at a semi-empirical level
- They cut out a chromophore section but i could then perform a full QM/MM geometry optimisation
A more charged mini molecule to test electronic embedding
The molecule was submitted with 3 different job lines:
- b3lyp 6-31g(d)
- oniom(b3lyp 6-31g(d):amber=(hardfirst,qeq))
- oniom(b3lyp 6-31g(d):amber=(hardfirst,qeq))=embedcharge
Opt and Freq jobs were completed successfully. The charge breakdown input was:
- Low Real 2 1
- High Model 0 1
- Low Model 0 1
The aim of the calculations was to test electronic embedding versus the mechanical embedding. A paper suggests that there is a large difference in accuracy between the two methods. Qeq charge equilibration was used in both cases as it has already been proved to be much more effective than AMBER assigned charges. It has consistently overestimated the dipole moment though.
Method | Dipole moment |
---|---|
B3LYP 6-31G* | 8.02 |
ONIOM B3LYP 6-31G*:AMBER (ME) | 12.22 |
ONIOM B3LYP 6-31G*:AMBER (EE) | 12.23 |
These results are odd since we expect the EE to have some effect on the charge distribution in the molecule (it is polarising the QM region) comparing the frequencies as a further check:
Method | C=O | C=C |
---|---|---|
B3LYP 6-31G* | 1764 | 1636 |
ONIOM B3LYP 6-31G*:AMBER (ME) | 1797 | 1697 |
ONIOM B3LYP 6-31G*:AMBER (EE) | 1797 | 1697 |
Again the EE has made no difference to the frequency of various strong vibrations in the molecule. The conclusion is thus either:
- EE has no real effect on the moleucule
- EE is not working properly
Assuming the latter further investigation discovered that the meechanism of EE is determined using scaling parameters for MM charges during electronic embedding in the QM calculations. As a default the charges are scaled 555500 using the nomenclature ScaleCharge=ijklmn where n is closest to the QM region (massively anti intuitive). The input is read left to right and not all scaling factors need be defined. Further to this, the value given is then multiplied by 0.2 to arrive at the actual scaling coefficient. The result is thus for an atom:
- 1 bond from the QM region (n) = 0.2n Default is 0
- 2 bonds from the QM region (m) = 0.2m Default is 0
- 3 bonds from the QM region (l) = 0.2l Default is 5
- 4 bonds from the QM region (k) = 0.2k Default is 5
- 5 bonds from the QM region (j) = 0.2j Default is 5
- 6 bonds from the QM region (i) = 0.2i Default is 5
So all the atoms (and partial charges) in the MM region in the test molecule were being scaled to 0 when performing the EE. This explains the fact that no difference was uncovered in the previous calculation set.
A new calculation was submitted with the command scalecharge=555 so that all charges in the MM part will be scaled to 1 (0.2xn,m,l)
Charges...again
So since i deleted the charges on the atoms where they can collapse, the overall charge is now -19.6 from the original -6 minus the 13.6 in charge deleted. REALLY ANNOYING! This is a huge difference in charges so will have to be solved. One easy option would be to set the involved atoms to 0 then run QEQ on the atoms with unassigned charges... essentially distributing the -6 charge through all atoms except those with 0 VdW parameters.
Tues 15th Dec 2009
The more charged molecule failed once the charges were scaled to 1 since SCF convergence was not attained.
Mon 25th Jan 2010
TAO tool publication was very good read... has a good suggestion of an optimisation procedure:
- Take the crystal structure
- Add Hydrogens
- Determine protonation
- MM geometry optimisation
- (MD?)
- (Look at several samples of the MD)
- Optimise a final structure
Thurs 28th Jan 2010
Tao Tools requires preprocess within AMBER tools to run the scripts correctly. Does not have HIS on the residue database... Amber defines the protonation state within its residue naming scheme... HIS is either HIE, HIP, HID depending on its protonation state. [1]
Thurs 4th Jan 2010
Preprocessing
To get TAO tools and also for completeness it is necessary to preprocess the PDB file before attempting to run ONIOM calculations, this includes:
- Adding hydrogens
- Create a custom residue file for the chromophore
- Convert from PDB to gjf ONIOM input
Ambertools are extremely helpful with this preprocessing especially since amber will be the low level theory used throughout this project. Within ambertools are a number of useful programs:
- antechamber - has a parameterlookup function within amber and GAFF parameter library files.
- xLeap - builds the protein from its constituent residues - adds hydrogens according to library files - allows a parm file to be built to add custom residues.
I will be working on using these 2 programs to create a full MM input file which is fully defined and correct.
xLeap
To load xLeap we must use X11 to get the graphical interface and the ssh -Y command to allow tunneling back to X11. Also the command line to start xLeap should be: xleap -s -f /apps/ambertools/amber11/dat/leap/cmd/leaprc.ff03. This loads the ff03 parameter file and results in the start up of the graphical interface with:
Welcome to LEaP! Sourcing leaprc: /amber/dat/leap/cmd/leaprc Log file: ./leap.log Loading parameters: /amber/dat/leap/parm/parmME.dat Loading library: /amber/dat/leap/lib/all_nucleic94.lib Loading library: /amber/dat/leap/lib/all_aminoME.lib Loading library: /amber/dat/leap/lib/all_aminoctME.lib Loading library: /amber/dat/leap/lib/all_aminontME.lib Loading library: /amber/dat/leap/lib/ions94.lib Loading library: /amber/dat/leap/lib/water.lib
This shows that the residue libraries have been loaded. We are then free to load a pdb file which will be read by xLeap and built according to the residues! In addition to the residue names xleap also requires information about the chain connectivity. Residues can be connected to other residues from both sides, only one side or not at all. xleap is fairly simple in that it expects that all residues in the pdb file are connected in the order in which they are listed unless they are separated by a "TER" card. In most cases where a single protein chains is being investigated like here - the TER card should end the residue information.
To load a pdb use (ignore the <>) <name for use in xLeap>=loadpdb "<name of file>". Make sure that the file is in your PATH else fully define the path in the filename.
The pdb should then be read into xLeap and built. At the end of the output we get (I loaded ChainA of the 1W7S without any water or hydrogens):
Leap added 1765 missing atoms according to residue templates: 1 Heavy 1764 H / lone pairs The file contained 21 atoms not in residue templates
This shows that Leap added hydrogens automatically and also a missing "heavy" atom (one which is not a H or lp):
Added missing heavy atom: .R<CHIE 229>.A<OXT 18>
For amino acids, the traditional order of atom records within a residue is N, CA, C, O, followed by the sidechain atoms (CB, CG1, CG2 . . . ) in order first of increasing remoteness, and then branch. The extra oxygen at the carboxyl terminal is called " OXT", and appears after all sidechain atoms.
Now that we have established that there are 21 atoms that are in the non standard residue region we need to define a custom residue which describes this region (chromophore). This involves creating a prep file....
I first drew up the undefined section of GFP (chromophore) in Gaussview5 and saved as a.pdb then:
- Load in antechamber and convert to mol2 format using antechamber -i <input> -fi pdb -o <output> -fo mol2 -c bcc
- Use parmchk to make sure the molecule was defined in the GAFF force field - kicks out a .frcmod file with missing parameters (including dihedrals) - nice :D!
- Manually add the "1CSY" where a standalone "0" is in the mol2 file so Leap can read it
- Load the gaff parameter file to Leap using source leaprc.gaff
- Load the chromophore_CSY_full.mol2 to Leap using CSY=loadmol2 "ChromophoreGFP_CSY_full.mol2"
- Check the residue using check CSY - Notice missing parameters are there.
- Load the ChromophoreGFP_CSY_full.frcmod file created by parmchk using loadamberparams ChromophoreGFP_CSY_full.frcmod
- Now check the CSY residue again using check CSY - No Problems! woo hoo! Now I can create amber parameters :D
Tues 9th Feb 2010
RESP Fitting
To get charges that are like those of the standard amber force field an RESP fitting method has to be used. This involves 3 steps:
- First, the molecule studied is optimized to determine a stable minimum (using Gaussian).
- Then, this minimized structure is used to calculate a Molecular Electrostatic Potential (MEP) on a three-dimensional grid (using Gaussian).
- Finally, this grid is exported into the "RESP program" which is used to fit atom-centered charges to the MEP
To assist in this procedure - I am using a program called RED - [2] which can be used in all 3 stages...
- Ante_RED is used to prepare Gaussian input files for the RESP fit
- RED III is used to fit atom centered charges
The procedure used to create the CSY file for RESP fitting is as follows:
- Cut the CSY residue from the 1W7S protein structure and cap terminal ends with methyl groups
- Optimise this in Gaussian at PM6 level
- Run the optimised CSY geometry through ante_RED to create the Gaussian input file involving minimisation at HF level and calculation of the MEP - also get the .P2N file required for RED III
- Run this Gaussian input file
Input file for optimisation:
#P hf/6-31G* Opt=(Tight,CalcFC) Freq SCF(Conver=8) Test
This will run a HF level optimisation and frequency analysis. Opt=tight tightens the cutoffs on forces and step size that are used to determine convergence. An optimization with Opt=Tight will take several more steps than with the default cutoffs. For molecular systems with very small force constants (low frequency vibrational modes), this may be necessary to ensure adequate convergence and reliability of frequencies computed in a subsequent job step. This option can only be used with Berny optimizations. Opt = CalcFC causes the calculation of an analytical Hessian (in HF). SCF(Conver=8) sets the SCF convergence criterion to 10^-8.
Parameter files for use in Gaussian (Gaussian only uses amber94)
Using external parameter files. It is possible to include MM function entries within an external file by replacing the entries within the input file with the usual Gaussian include file mechanism: e.g., @oplsaa.prm. Parameter files for various force fields are included within the main Gaussian directory (i.e., $g09root/g09). If you want to specify the entire force field via an external file, use an input file like the following:
- UFF=SoftOnly
Read-in force field example
molecule specification
@$g09root/g09/oplsaa.prm
Note that the choice of a molecular mechanics keyword is arbitrary as the entire force field depends on the read-in functions and parameters and not on any internal ones (in other words, specifying Amber would yield the same result as UFF).
Tues 16th Feb 2010
RESP Fitting
Following optimisation of the fragment for RESP fitting as seen before. We now have to fit the RESP to a MEP, this involves a few calculations:
- ESP calcs in gaussian
- Fitting calcs by resp in ambertools
The result is a .mol2 containing the fitted charges. The process has 2 input files:
- .log
- .p2n
The p2n is created by ante_RED and controls the RESP fitting process. (See RED site). To get a fit that excludes the capping methyls we add a REMARK line:
REMARK INTRA-MCC 0.0 | 35 36 37 38 39 40 41 42 | R
This must be formatted EXACTLY as shown here (2 spaces between the numbers). The numbers correspond to atoms numbers, the 0.0 constrains these atoms to 0.0 charge and the R tells RESP to remove these atoms from the 2nd round fit and the subsequent mol2 output. Leaving a fragment.
The RED program can be run on CX1! I used the following Jobscript (REDjobscript_4):
#PBS -l ncpus=1 #PBS -l mem=6000mb #PBS -l walltime=01:00:00 #PBS -q pqchem #PBS -joe module load ambertools module load gaussian export GAUSS_SCRDIR=$TMPDIR echo $GAUSS_SCRDIR cd $HOME pwd perl /home/af706/RED-vIII.3.pl > /home/af706/RED-vIII.log
So here we are using the chemistry node pqchem:
- loading the ambertools (for RESP) and gaussian modules
- defining a scratch directory for the RESP program to read gaussian scratch
- TEMPDIR is used since the scratch must be empty upon calculation start. The temporary directory will be created fresh on each node as the calculation is started, and it's unlikely that two calculations will run on the same node.
- Moving to my HOME directory /home/af706 because the script must be run in the same directory as the input files (.log and .p2n)
- RUN!
The fitting process runs single point energy calcs and fits the RESP - overall it took 3mins for my fragment.
Going from xLeap out to Gaussian in
2 ways to do this:
- Load in Gaussview
- Use Pengs scripts
Gaussview
- The file loads and all residues are recognised (apart from the chromophore). The file contains the same number of atoms as the saved Leap file.
- Charges are assigned from amber94 - not the chromophore still
- Amber atom types are assigned correctly to ALL atoms
- All that remains is to add charges from the RESP fit.
- PDB information is retained in input
TAO
- Atoms are added so atom count is different to the file leaving Leap
- Fails to treat terminal residues correctly
- Lose residue information
- Charges are missing from chromophore and atoms in terminal residues
22nd Feb 2010
Links to HOW TO for Ambertools + Gaussview/ Ambertools + TAO tools + Gaussview
- Ambertools + Gaussview - https://www.ch.ic.ac.uk/wiki/index.php/AndyForester_Ambertools_Gaussview
- Ambertools + TAO tools + Gaussview - https://www.ch.ic.ac.uk/wiki/index.php/AndyForester_Ambertools_TAO
- Running Amber only Calculations - https://www.ch.imperial.ac.uk/wiki/index.php/AndyForesterAmber
Running ONIOM calculations - https://www.ch.ic.ac.uk/wiki/index.php/AndyForesterONIOM
26th Feb 2010
Picking out model Modes
Integers and integer ranges without a keyword are interpreted as mode numbers, although the [not]mode keywords may be used. The keywords atoms and notatoms can be used to define an atom list whose modes should be included/excluded (respectively). Atoms can also be specified by ONIOM layer via the [not]layer keywords, which accept these values: real for the real system, model for the model system in a 2-layer ONIOM, middle for the middle layer in a 3-layer ONIOM, and small for the model layer of a 3-layer ONIOM. Atoms may be similarly included/excluded by residue with residue and notresidue, which accept lists of residue names or numbers. Both keyword sets function as shorthand forms for atom lists.
TAO tools
The expected differences to the one without using his tools are:
- Different partial charges since his scripts use all_amino03 to assign charges,
Gaussview uses amber94 charges (Net charge is the same for both files - good thing)
- Potential for some changes in atom-types due to different versions of amber being used.
What actually happened:
- Difference in individual partial charges but same overal net charge
- Different atom type assignment led to some more undefined parameters.
- Defining these and running an energy led to significantly different energy than the non tools
method:
WITHOUT tools
AMBER calculation of energy. Energy per function type: Direct non-bonded 122152.193595037 Non-bonded pair -122162.319396395 Harmonic stretch I 0.354921205 Harmonic bend I 0.989669436 Amber torsion 1.587390468 Improper torsion 0.013641893 Energy per function class: Coulomb -9.200657698 Vanderwaals -0.925143661 Stretching 0.354921205 Bending 0.989669436 Torsion 1.587390468 Out-of-plane 0.013641893 Energy= -7.18017836 NIter= 0. Dipole moment= -25.836974 -115.466231 -34.635294
WITH TAO tools
AMBER calculation of energy. Energy per function type: Direct non-bonded 137343.707022741 Non-bonded pair -137347.284026475 Harmonic stretch I 0.354921205 Harmonic bend I 1.082982995 Amber torsion 1.589487307 Improper torsion 0.013645198 Energy per function class: Coulomb -6.890960120 Vanderwaals 3.313956385 Stretching 0.354921205 Bending 1.082982995 Torsion 1.589487307 Out-of-plane 0.013645198 Energy= -0.535967030 NIter= 0. Dipole moment= -29.988370 -117.918540 -32.409355
Obvious where the differences are.