Jump to content

AndyForester

From ChemWiki

HBDI Calculations

Wed 21st Oct 2009

- Started project work
- Gaussview and Gaussian working
- Working out how to use the command inputs
- Trying to sort VPN out
- Working to get the project proposal written


Tue 27th Oct 2009

Drew up the HBDI molecule in Gaussview, the molecule will be optimised at the HF 3-21G level on the mac (locally).
DFT B3LYP 6-31G (Becke threeparameter hybrid exchange functional5 and the Lee-Yang-Parr correlation functional) method optimisation was run on the resulting file, the MP2 6-31G method optimisation. Each of these had a corresponding freq calculation to check that minima were reached. All calculations were successful, with no negative vibrations.


Wed 28th Oct 2009

Ran DFT B3YLP 6-311G calculations on HBDI (Opt and Freq) since these were the calculations used in a published study of HBDI.
Optimisation job finshed successfully, the frequency calculation did not finish succesfully. Error was:

****Warning!!: The largest alpha MO coefficient is  0.17588883D+02
Symmetrizing basis deriv contribution to polar:
IMax=3 JMax=2 DiffMx= 0.00D+00
G2DrvN: will do    29 centers at a time, making    1 passes doing MaxLOS=1.
Calling FoFCou, ICntrl=  3107 FMM=F I1Cent=   0 AccDes= 0.00D+00.
FoFDir/FoFCou used for L=0 through L=1.
End of G2Drv Frequency-dependent properties file   721 does not exist.
End of G2Drv Frequency-dependent properties file   722 does not exist.
         IDoAtm=1111111111111111111111111111
         Differentiating once with respect to electric field.
               with respect to dipole field.
         Differentiating once with respect to nuclear coordinates.
         Keep R1 ints in memory in canonical form, NReq=447529180.


Thurs 29th Oct 2009

Learning to use ONIOM

The example on chem wiki was attempted. Drawn up in Gaussview and optimised initially using molecular mechanics AMBER force field.
The ONIOM calculation was set up in Gaussview using DFT B3LYP 6-311G(d):AMBER methods with the benzene ring as the high level and chain at low level. Electronic embedding was NOT used in this initial calculation.

This conformation is NOT the lowest energy conformation, the initial AMBER MM calculation did not find the global minimum probably due to the drawn up starting geometry.
The molecule was drawn up again and DFT B3LYP 6-311G(d):AMBER and pure B3LYP 6-311G(d) calculations run. Absolute energy differences between the 2 conformations was very small at the ONIOM(DFT B3LYP 6-311G(d):AMBER) level. -232.27532671 vs -232.27315709 au respectively, a 0.00271au (4.359 744 17×10-18 J = 1au) difference.

It has already become obvious that using ONIOM decreases calculation time considerably. The table below shows a "CPU time" comparison on the above molecule using the different approaches.

Calculation Time (1st attempt) Time (2nd attempt)
ONIOM(DFT B3LYP 6-311G(d):AMBER) 4 minutes 22.0 seconds 8 minutes 5.5 seconds
DFT 3LYP 6-311G(d) 2 hours 12 minutes 21.9 seconds 2 hours 30 minutes 0.8 seconds


Mon 2nd Nov 2009

Having obtained optimized structures of HBDI at various theory levels and basis set variations, these structures will now be compared to previous literature. Extensive studies have been performed using HF and B3LYP methods previously. It has been noted in Jaspers recent review that HF does not model frequencies very accurately - my computations agree HF 6-31G frequencies are very far out. B3LYP frequencies are much closer to the experimental values.

Tues 3rd Nov 2009

Tabulated frerquency comparison

A.Esposito HF 6-31G(d) A.Esposito B3LYP 6-31G(d) X.He B3LYP 6-31G(d) HF 6-31G(d) B3LYP 6-31G(d)
C=O 1987 1802 1802 1986 1801
C=C 1880 1705 1704 1880 1704
Phenol 1809 1665 1663 1812 1666
Phenol 1781 1635 1631 1777 1635
C=N 1844 1620 1618 1844 1618
Phenol 1696 1562 1558 1694 1562
C-N 1607 1483 N/A 1606 1482
Phenol 1602 1487 N/A 1603 1488
Total Energy(au) -720.043 -724.465

Thurs 5th Nov 2009

Having got Jaspers input files, several things were noticed:

  • Solvent models were used - should be in vacum.... I have re run these in G09 with No solvent on both DFT 6-311G(d,p) and HF 6-31+G(d) methods/basis sets for comparison
  • The models in his Biophys paper using other residues in contact COO- for example are not present. He has not reported the pure HBDI calculation results.
  • More than just HBDI is likely to be required, meaning larger than expected high level region in the GFP calculation. (3 layer ONIOM DFT:HF:AMBER)?


Other things to do:

  • Work on a guide to loading .pdb files into Gaussview 5 - how much information is maintained?
  • Are residues defined still in Gaussview input files
  • AMBER MM requires definitions of charges etc... how easy is it to import from .pdb and get a file that can be run in opt etc...


  • Find a model system which is larger than HBDI but incorporates this molecule <=500 atoms (Limit of size for pure DFT calculation)
  • Possibilities include a system alread developed by someone else online
  • A "custom" system made by me to model this (time intensive)
  • "strip" back the GFP pdb file to 500 atoms surrounding the HBDI chromophore.


Fri 6th Nov 2009

Yesterday started by trying to follow the tutorial on loading the GFP molecule from .pdb, this resulted in a problem with Gaussviews addition of hydrogens. It over protonated the nitrogens and carbon in the chromophore leading to over valent parts. I then tried several other molecules (PYP, a scorpion toxin and a strand of RNA) which had no problems with hydrogen addition.

I tried to use ChemBio 3D to load the .pdb and correctly assign hydrogens/connectivity. This resulted in the same protonation problems!

Mon 9th Nov 2009

TINKER - http://dasher.wustl.edu/tinker/ was installed with the Force Field Explorer (FFE) interface on the mac. This was used to load the same .pdb file and resulted in the same protonation around the chromophore. So far the SAME protonation pattern (incorrect) around the chromophore has been found using:

  • Gaussview
  • ChemBio 3D
  • TINKER

Looking into how the .pdb file format works, the molecule is built from a series of residues, it becomes clear as you look at the sequence that is building the GFP protein chromophore (residues 65,66,67 in 1GFL) are not then reproducing the internal cyclisation properly leaving protons attached and thus atoms hypervalent. This accounts for the consistent error loading from the .pdb. As a result must all molecules expressed as a .pdb file must be pure protein structures...?

Comparison of the 2 available wild type GFP structures

The 2 structures being considered here are:

These 2 structures differ in that 1GFL contains 2 GFP molecules, whereas 1W7S contains 4. These were both stripped down to 1 unit of GFP, using the "pdb secondary structure" function in Gaussview 5. 1GFL had chain B removed and 1W7S had chains B,C and D removed. Next the residues were investigated that created the chromophore. The results are not consistent because GFL uses 3 residues (65,66,67) to form the chromophore. 1W7S has residue 66 (labelled CSY) as the entire chromophore! Both get protonated the same (incorrectly) which is odd considering how they molecules are built.

The next test was to "modify bond" in Gaussview 5 and create a "cleaned" chromophore structure (no bonding information from the residue) by manually changing the bonds. The chromophore in each .pdb file was altered to look like HBDI in terms of double single bonds and the hydrogens taken out (makes no difference as these are created each time the .pdb is loaded). When the new .pdb files (1GFL_cleaned.pdb and 1W7S_cleaned.pdb) were loaded they were correctly protonated! Success.

1W7S improvement

Chain A (not cleaned) of the 1W7S .pdb was submitted to run a AMBER calculation, as expected it failed with the lines:

Missing atomic parameters for atom   474 IAtTyp=    20000000
Missing atomic parameters.

This error says that the atom type of atom 474 is not defined. One way to counter this is to locate the atom and then change its definition manually (as shown in the example https://www.ch.ic.ac.uk/wiki/index.php/PYP_:_Photoactive_yellow_protein - time intensive!)

Instead the molecule was cleaned (manually changing the bonding in the chromophore), after the bond updates the structure was re submitted, again failing but with a different message:

Include all MM classes
 Bondstretch undefined between atoms    473    478 CC-N* [H,H]  
 Bondstretch undefined between atoms    475    476 CC-CC [H,H]  
 Bondstretch undefined between atoms    475    486 CC-CM [H,H]  
 Bondstretch undefined between atoms    476    477 CC-O [H,H]  
 Bondstretch undefined between atoms    476    478 CC-N* [H,H]  
 Bondstretch undefined between atoms    486   2559 CM-HC [H,H]  
 Angle bend  undefined between atoms    473    474    475 CC-NB-CC [H,H,H]  
 Angle bend  undefined between atoms    473    478    479 CC-N*-CT [H,H,H]  
 Angle bend  undefined between atoms    473    478    476 CC-N*-CC [H,H,H]  
 Angle bend  undefined between atoms    473    482    483 CC-CT-N [H,H,H]  
 Angle bend  undefined between atoms    473    482   3857 CC-CT-H1 [H,H,H]  
 Angle bend  undefined between atoms    474    475    476 NB-CC-CC [H,H,H]  
 Angle bend  undefined between atoms    474    475    486 NB-CC-CM [H,H,H]  
 Angle bend  undefined between atoms    474    473    478 NB-CC-N* [H,H,H]  
 Angle bend  undefined between atoms    475    476    477 CC-CC-O [H,H,H]  
 Angle bend  undefined between atoms    475    476    478 CC-CC-N* [H,H,H]  
 Angle bend  undefined between atoms    475    486    487 CC-CM-CA [H,H,H]  
 Angle bend  undefined between atoms    475    486   2559 CC-CM-HC [H,H,H]  
 Angle bend  undefined between atoms    476    478    479 CC-N*-CT [H,H,H]  
 Angle bend  undefined between atoms    476    475    486 CC-CC-CM [H,H,H]  
 Angle bend  undefined between atoms    477    476    478 O-CC-N* [H,H,H]  
 Angle bend  undefined between atoms    478    479    480 N*-CT-C [H,H,H]  
 Angle bend  undefined between atoms    478    473    482 N*-CC-CT [H,H,H]  
 Angle bend  undefined between atoms    486    487    488 CM-CA-CA [H,H,H]  
 Angle bend  undefined between atoms    486    487    489 CM-CA-CM [H,H,H]  
 Angle bend  undefined between atoms    487    486   2559 CA-CM-HC [H,H,H]  
 Angle bend  undefined between atoms    488    487    489 CA-CA-CM [H,H,H]  
 Angle bend  undefined between atoms    490    492    491 CA-C-CM [H,H,H]  
 Angle bend  undefined between atoms    491    492    493 CM-C-OH [H,H,H]  
 MM function not complete

This is a better error message since all the atoms are now defined, only parameters are missing. For parameters that involve only atoms in the chromophore at the high level in the ONIOM calculation, the values can be set to 0 in the input file. This is because the ONIOM extrapolation will cancel MM from this high level region, To do this the keyword "Amber=Hardfirst" and wildcards for the missing parameters which are zero valued are placed in the input file.


The above screeshot shows highlighted the atoms listed in the error... most of them if not all will be involved in the high level calculation and thus may not require parameters defining (other than to be set to 0). The reason the parameters are undefined is that the AMBER force field is very specific to protein bonds and thus the organic chromophore has bonds that are not found in this parameter set.

The atoms in the chromophore were frozen in the atom group editor and the input file had a parameter set added to the end setting chromophore parameters to 0:

HrmStr1 CC N* 0.0 0.0
HrmStr1 CC CC 0.0 0.0
HrmStr1 CC CM 0.0 0.0
HrmStr1 CC O 0.0 0.0
HrmStr1 CM HC 0.0 0.0
HrmBnd1 CC NB CC 0.0 0.0
HrmBnd1 CC N* CC 0.0 0.0
HrmBnd1 CC CT N 0.0 0.0
HrmBnd1 CC CT H1 0.0 0.0
HrmBnd1 NB CC CC 0.0 0.0
HrmBnd1 NB CC CM 0.0 0.0
HrmBnd1 NB CC N* 0.0 0.0
HrmBnd1 CC CC O  0.0 0.0
HrmBnd1 CC CC N* 0.0 0.0
HrmBnd1 CC CM CA 0.0 0.0
HrmBnd1 CC CM HC 0.0 0.0
HrmBnd1 CC CC CM 0.0 0.0
HrmBnd1 O CC N*  0.0 0.0
HrmBnd1 N* CT C  0.0 0.0
HrmBnd1 N* CC CT 0.0 0.0
HrmBnd1 CM CA CA 0.0 0.0
HrmBnd1 CA CM HC 0.0 0.0
HrmBnd1 CA C CM 0.0 0.0
HrmBnd1 CM C OH 0.0 0.0
HrmBnd1 CC N* CT 0.0 0.0
HrmBnd1 CM CA CM 0.0 0.0

This file then ran successfully for an AMBER single point energy calculation. It was then submitted for AMBER optimisation using "amber=hardfirst geom=connectivity" and returned a successful minimum with Forces and Displacements converged:

        Item               Value     Threshold  Converged?
 Maximum Force            0.000089     0.000450     YES
 RMS     Force            0.000004     0.000300     YES
 Maximum Displacement     0.000089     0.001800     YES
 RMS     Displacement     0.000004     0.001200     YES
 Predicted change in Energy=-1.156135D-07
 Optimization completed.
    -- Stationary point found.

Cleaned .pdb files and TINKER

  • The files that had been cleaned were opened in TINKER, it is necessary to convert the .pdb to a .xyz file to run any calculations within TINKER. This is currently failing due to an "atom missing on SER-228" - checking this there is no SER residue at 228 (in fact a GLU) so a bit puzzled. This is the 1W7S cleaned version.
  • PYP was loaded in its .pdb format to TINKER and was immediately successful in being converted to .xyz then a MINIMISE algorithm ran successfully with the energy at its X-ray crystal structure as 261KJ/mol reduced to -3670.5355KJ/mol in 1645 iterations - this took a while (45mins locally)!
  • Interestingly the .xyz output from the minimization can be converted back to .pdb (loses hydrogens minimization) and this PYP file was loaded into Gaussview 5 - the file then ran successfully for an AMBER single point energy calculation! cf. the first attempt to get PYP to run was unsuccessful, and the tutorial underwent much more hassle to run this calculation.

What .pdb information is retained?

The only molecule that has been run with AMBER opt (GFP) has lost its secondary structure information but RETAINED its residue information.

Fri 13th Nov 2009

Conversion from.xyz to .com

A protein with 2 residues (Glu-Asp) was built using TINKER, saved as:

  • .xyz
  • .pdb
  • .com

The initial format was .xyz (created in TINEKR as a native format), converted to .pdb using TINKER and then loaded into Gaussview and saved as a .com.

.xyz

   36  Protein built with TINKER/Force Field Explorer
    1  CT     0.000000    0.000000    0.000000   340     2     4     5     6
    2  C      0.000000    0.000000    1.510000   342     1     3     7
    3  O      1.028938    0.000000    2.165506   343     2
    4  HC     1.056270    0.000000   -0.341166   341     1
    5  HC    -0.497647    0.931694   -0.341166   341     1
    6  HC    -0.497647   -0.931694   -0.341166   341     1
    7  N     -1.236201    0.000000    2.027114   232     2     8    11
    8  CT    -1.446961    0.000000    3.471822   233     7     9    12    13
    9  C     -2.509557   -0.992751    3.878558   234     8    10    22
   10  O     -3.569672   -1.098634    3.284129   236     9
   11  H     -2.025518    0.000000    1.381066   235     7
   12  H1    -0.503221   -0.281395    3.983946   237     8
   13  CT    -1.910491    1.397334    3.923704   238     8    14    18    19
   14  CT    -2.132799    1.397334    5.447574   240    13    15    20    21
   15  C     -2.585014    2.781238    5.848089   242    14    16    17
   16  O2    -2.816736    2.973609    7.061266   243    15
   17  O2    -2.693194    3.629064    4.935955   243    15
   18  HC    -2.861794    1.653482    3.412322   239    13
   19  HC    -1.132567    2.145528    3.664589   239    13
   20  HC    -1.170306    1.191123    5.960588   241    14
   21  HC    -2.946052    0.685840    5.701535   241    14
   22  N     -2.153997   -1.723291    4.944151   210     9    23    26
   23  CT    -3.047986   -2.739703    5.491225   211    22    24    27    28
   24  C     -3.120370   -2.663723    6.997574   212    23    25    34
   25  O     -2.128241   -2.535414    7.695870   214    24
   26  H     -1.239994   -1.547921    5.361576   213    22
   27  H1    -4.072297   -2.585746    5.092246   215    23
   28  CT    -2.526745   -4.139104    5.114946   216    23    29    32    33
   29  C     -3.476741   -5.157722    5.698062   218    28    30    31
   30  O2    -3.212872   -6.362275    5.493304   219    29
   31  O2    -4.454666   -4.718803    6.341112   219    29
   32  HC    -1.530243   -4.298013    5.577366   217    28
   33  HC    -2.540086   -4.252948    4.010880   217    28
   34  N     -4.374888   -2.752228    7.460123   344    24    35    36
   35  H     -5.140865   -2.858197    6.794954   345    34
   36  H     -4.534830   -2.711583    8.466684   345    34

.pdb

HEADER    Protein built with TINKER/Force Field Explorer
COMPND
SOURCE
ATOM      1  CH3 ACE     1       0.000   0.000   0.000
ATOM      2  C   ACE     1       0.000   0.000   1.510
ATOM      3  O   ACE     1       1.029   0.000   2.166
ATOM      4  H1  ACE     1       1.056   0.000  -0.341
ATOM      5  H2  ACE     1      -0.498   0.932  -0.341
ATOM      6  H3  ACE     1      -0.498  -0.932  -0.341
ATOM      7  N   GLU     2      -1.236   0.000   2.027
ATOM      8  CA  GLU     2      -1.447   0.000   3.472
ATOM      9  C   GLU     2      -2.510  -0.993   3.879
ATOM     10  O   GLU     2      -3.570  -1.099   3.284
ATOM     11  CB  GLU     2      -1.910   1.397   3.924
ATOM     12  CG  GLU     2      -2.133   1.397   5.448
ATOM     13  CD  GLU     2      -2.585   2.781   5.848
ATOM     14  OE1 GLU     2      -2.817   2.974   7.061
ATOM     15  OE2 GLU     2      -2.693   3.629   4.936
ATOM     16  H   GLU     2      -2.026   0.000   1.381
ATOM     17  HA  GLU     2      -0.503  -0.281   3.984
ATOM     18  HB2 GLU     2      -2.862   1.653   3.412
ATOM     19  HB3 GLU     2      -1.133   2.146   3.665
ATOM     20  HG2 GLU     2      -1.170   1.191   5.961
ATOM     21  HG3 GLU     2      -2.946   0.686   5.702
ATOM     22  N   ASP     3      -2.154  -1.723   4.944
ATOM     23  CA  ASP     3      -3.048  -2.740   5.491
ATOM     24  C   ASP     3      -3.120  -2.664   6.998
ATOM     25  O   ASP     3      -2.128  -2.535   7.696
ATOM     26  CB  ASP     3      -2.527  -4.139   5.115
ATOM     27  CG  ASP     3      -3.477  -5.158   5.698
ATOM     28  OD1 ASP     3      -3.213  -6.362   5.493
ATOM     29  OD2 ASP     3      -4.455  -4.719   6.341
ATOM     30  H   ASP     3      -1.240  -1.548   5.362
ATOM     31  HA  ASP     3      -4.072  -2.586   5.092
ATOM     32  HB2 ASP     3      -1.530  -4.298   5.577
ATOM     33  HB3 ASP     3      -2.540  -4.253   4.011
ATOM     34  N   NH2     4      -4.375  -2.752   7.460
ATOM     35  H1  NH2     4      -5.141  -2.858   6.795
ATOM     36  H2  NH2     4      -4.535  -2.712   8.467
END

.com

%chk=File_Test_Glu-Asp.chk
# hf/3-21g geom=connectivity

Protein built with TINKER/Force Field Explorer

0 1
C(PDBName=CH3,ResName=ACE,ResNum=1)                   0.00000000    0.00000000    0.00000000
C(PDBName=C,ResName=ACE,ResNum=1)                     0.00000000    0.00000000    1.51000000
O(PDBName=O,ResName=ACE,ResNum=1)                     1.02900000    0.00000000    2.16600000
H(PDBName=H1,ResName=ACE,ResNum=1)                    1.05600000    0.00000000   -0.34100000
H(PDBName=H2,ResName=ACE,ResNum=1)                   -0.49800000    0.93200000   -0.34100000
H(PDBName=H3,ResName=ACE,ResNum=1)                   -0.49800000   -0.93200000   -0.34100000
N(PDBName=N,ResName=GLU,ResNum=2)                    -1.23600000    0.00000000    2.02700000
C(PDBName=CA,ResName=GLU,ResNum=2)                   -1.44700000    0.00000000    3.47200000
C(PDBName=C,ResName=GLU,ResNum=2)                    -2.51000000   -0.99300000    3.87900000
O(PDBName=O,ResName=GLU,ResNum=2)                    -3.57000000   -1.09900000    3.28400000
C(PDBName=CB,ResName=GLU,ResNum=2)                   -1.91000000    1.39700000    3.92400000
C(PDBName=CG,ResName=GLU,ResNum=2)                   -2.13300000    1.39700000    5.44800000
C(PDBName=CD,ResName=GLU,ResNum=2)                   -2.58500000    2.78100000    5.84800000
O(PDBName=OE1,ResName=GLU,ResNum=2)                  -2.81700000    2.97400000    7.06100000
O(PDBName=OE2,ResName=GLU,ResNum=2)                  -2.69300000    3.62900000    4.93600000
H(PDBName=H,ResName=GLU,ResNum=2)                    -2.02600000    0.00000000    1.38100000
H(PDBName=HA,ResName=GLU,ResNum=2)                   -0.50300000   -0.28100000    3.98400000
H(PDBName=HB2,ResName=GLU,ResNum=2)                  -2.86200000    1.65300000    3.41200000
H(PDBName=HB3,ResName=GLU,ResNum=2)                  -1.13300000    2.14600000    3.66500000
H(PDBName=HG2,ResName=GLU,ResNum=2)                  -1.17000000    1.19100000    5.96100000
H(PDBName=HG3,ResName=GLU,ResNum=2)                  -2.94600000    0.68600000    5.70200000
N(PDBName=N,ResName=ASP,ResNum=3)                    -2.15400000   -1.72300000    4.94400000
C(PDBName=CA,ResName=ASP,ResNum=3)                   -3.04800000   -2.74000000    5.49100000
C(PDBName=C,ResName=ASP,ResNum=3)                    -3.12000000   -2.66400000    6.99800000
O(PDBName=O,ResName=ASP,ResNum=3)                    -2.12800000   -2.53500000    7.69600000
C(PDBName=CB,ResName=ASP,ResNum=3)                   -2.52700000   -4.13900000    5.11500000
C(PDBName=CG,ResName=ASP,ResNum=3)                   -3.47700000   -5.15800000    5.69800000
O(PDBName=OD1,ResName=ASP,ResNum=3)                  -3.21300000   -6.36200000    5.49300000
O(PDBName=OD2,ResName=ASP,ResNum=3)                  -4.45500000   -4.71900000    6.34100000
H(PDBName=H,ResName=ASP,ResNum=3)                    -1.24000000   -1.54800000    5.36200000
H(PDBName=HA,ResName=ASP,ResNum=3)                   -4.07200000   -2.58600000    5.09200000
H(PDBName=HB2,ResName=ASP,ResNum=3)                  -1.53000000   -4.29800000    5.57700000
H(PDBName=HB3,ResName=ASP,ResNum=3)                  -2.54000000   -4.25300000    4.01100000
N(PDBName=N,ResName=NH2,ResNum=4)                    -4.37500000   -2.75200000    7.46000000
H(PDBName=H1,ResName=NH2,ResNum=4)                   -5.14100000   -2.85800000    6.79500000
H(PDBName=H2,ResName=NH2,ResNum=4)                   -4.53500000   -2.71200000    8.46700000

1 2 1.0 4 1.0 5 1.0 6 1.0
2 3 2.0 7 1.5
3
4
5
6
7 16 1.0 8 1.0
8 9 1.0 11 1.0 17 1.0
9 10 2.0 22 1.5
10
11 18 1.0 12 1.0 19 1.0
12 21 1.0 13 1.0 20 1.0
13 14 2.0 15 2.0
14
15
16
17
18
19
20
21
22 23 1.0 30 1.0
23 26 1.0 24 1.0 31 1.0
24 34 1.5 25 2.0
25
26 27 1.0 33 1.0 32 1.0
27 28 2.0 29 2.0
28
29
30
31
32
33
34 35 1.0 36 1.0
35
36

The issue going from .xyz straight to .com is that ALL protein information would be lost ie. residue information that is written into the .pdb at .xyz to .pdb comversion.

PYP files as downloaded compared to run through TINKER

  • TINKER removes much of the header information
  • TINKER only preserves the 1st model of a .pdb file (may be multiple models)
  • TINKER removes far right 3 columns of information (not essential to the .pdb file)
  • A virtual residue in the original .pdb comprising the chromophore was omitted from the TINKER output (number 126)

What each TINKER search algorithm does

There are several programs within TINKER that may be able to help locate a global minimum (not just one near the published crystal structure) by way of searching the potential energy surface. All of these are MM methods for use in TINKER.
SCAN
A program for general conformational search of an entire potential energy surface via a basin hopping method. The program takes as input a TINKER .xyz coordinates file which is then minimized to find the first local minimum for a search list. A series of activations along various normal modes from this initial minimum are used as seed points for additional minimizations. Whenever a previously unknown local minimum is located it is added to the search list. When all minima on the search list have been subjected to the normal mode activation without locating additional new minima, the program terminates. The individual local minima are written to cycle files as they are discovered. While the SCAN program can be used on standard undeformed potential energy surfaces, we have found it to be most useful for quickly ‘‘scanning’’ a smoothed energy surface to enumerate the major basins of attraction spaning the entire surface.

SNIFFER
A program that implements the Sniffer global optimization algorithm of Butler and Slaminka, a discrete version of Griewank’s global search trajectory method. The program takes an input TINKER .xyz coordinates file and shakes it vigorously via a modified dynamics trajectory before, hopefully, settling into a low lying minimum. Some trial and error is often required as the current implementation is sensitive to various parameters and tolerances that govern the computation. At present, these parameters are not user accessible, and must be altered in the source code. However, this method can do a good job of quickly optimizing conformation within a limited range of convergence.

PSS
Implements our version of a potential smoothing and search algorithm for the global optimization of molecular conformation. An initial structure in .xyz format is first minimized in Cartesian coordinates on a series of increasingly smoothed potential energy surfaces. Then the smoothing procedure is reversed with minimization on each successive surface starting from the coordinates of the minimum on the previous surface. A local search procedure is used during the backtracking to explore for alternative minima better than the one found during the current minimization. The final result is usually a very low energy conformation or, in favorable cases, the global energy minimum conformation. The minimum energy coordinate sets found on each surface during both the forward smoothing and backtracking procedures are placed in sequentially numbered cycle files.

TINKER also contains programs that are able to perform MM optimisations:
MINIMIZE
The MINIMIZE program performs a limited memory L-BFGS minimization of an input structure over Cartesian coordinates using a modified version of the algorithm of Jorge Nocedal. The method requires only the potential energy and gradient at each step along the minimization pathway. It requires storage space proportional to the number of atoms in the structure. The MINIMIZE procedure is recommended for preliminary minimization of trial structures to an rms gradient of 1.0 to 0.1 kcal/mole/Å. It has a relatively fast cycle time and is tolerant of poor initial structures, but converges in a slow, linear fashion near the minimum. The user supplies the name of the TINKER .xyz coordinates file and a target rms gradient value at which the minimization will terminate. Output consists of minimization statistics written to the screen or redirected to an output file, and the new coordinates written to updated .xyz files or to cycle files.

OPTIMIZE
The OPTIMIZE program performs a optimally conditioned variable metric minimization of an input structure over Cartesian coordinates using an algorithm due to William Davidon. The method does not perform line searches, but requires computation of energies and gradients as well as storage for an estimate of the inverse Hessian matrix. The program operates on Cartesian coordinates from a TINKER .xyz file. OPTIMIZE will typically converge somewhat faster and more completely than MINIMIZE. However, the need to store and manipulate a full inverse Hessian estimate limits its use to structures containing less than a few hundred atoms on workstation class machines. As with the other minimizers, OPTIMIZE needs input coordinates and an rms gradient cutoff criterion. The output coordinates are saved in updated .xyz files or as cycle files.

NEWTON
A truncated Newton minimization method which requires potential energy, gradient and Hessian information. This procedure has significant advantages over standard Newton methods, and is able to minimize very large structures completely. Several options are provided with respect to minimization method and preconditioning of the Newton equations. The default options are recommended unless the user is familiar with the math involved. This program operates in Cartesian coordinate space and is fairly tolerant of poor input structures. Typical algorithm iteration times are longer than with nonlinear conjugate gradient or variable metric methods, but many fewer iterations are required for complete minimization. NEWTON is usually the best choice for minimizations to the 0.01 to 0.000001 kcal/mole/Å level of rms gradient convergence. Tests for directions of negative curvature can be removed, allowing NEWTON to be used for optimization to conformational transition state structures (this only works if the starting point is very close to the transition state). Input consists of a TINKER .xyz coordinates file; output is an updated set of minimized coordinates and minimization statistics.

Tues 17th Nov 2009

Comparing 2 .pdb files:

  • As downloaded from PDB
  • Converted to .xyz in TINKER then back to .pdb

These files were then cleaned up to compare in a terminal using the "diff" command:

  • Header information and end information removed
  • 3 columns to the right in PDB version were removed to become consistent with the TINEKR file
  • The first column "ATOM" was removed effectively making the line numbers equal to the atom numbers in both files

Wed 18th Nov 2009

1st diff was on line 52, a HYDROGEN ATOM:
52 HE2 HIS A 3 13.349 -10.803 9.527
added to the TINKER output, not in PDB file.

2nd diff was on line 657, a HYDROGEN ATOM:
657 HE2 GLU A 46 -0.867 -5.216 -0.822
added to the PDB file, not in TINKER output

3rd diff was on line 1007, a HYDROGEN ATOM (number 1008):
5HG CYS A 69 -8.909 -5.194 5.526
added to the TINKER output, not in PDB file.

4th diff was on line 1621, a HYDROGEN ATOM (number 1623):
HE2 HIS A 108 4.663 7.459 6.879
added to the TINKER output, not in PDB file.

Following these reconciliations, both files were exactly the same with 1911 atoms present in each. 3 atoms were removed from the TINKER file and 1 was removed from the PDB file.

  • NOTE - loading the original .pdb from PDB to Gaussview then converting to a .gjf results in an error "1 or more PDB atom names has a ' or * character which can cause problems for Gaussian" - this is because the PDB file has extra atoms on the end of the file to define a highlighted chromophore... After these were removed in a txt editor the .gjf save went smoothly. Infact after this removal the file will run an AMBEr single point energy calculation fine - cf running when the extra info is in the bottom.

All differences in files were due to protonation states of atoms... not the AA backbone.

TINKER vs PDB




The highlighted fragments represent where the respective file version is protonated and the other is not.

Fri 20th Nov 2009

In an attempt to load the 1W7S structure to TINKER and successfully convert it it a .xyz format the following procedure was undertaken:

Mon 23rd Nov 2009

Protonation of GFP at various pH conditions

With 230 amino acid residues, each possessing a unique pKa in GFP it is obvious that there is a huge spectrum of protonation states available at a single solvation pH. Since protons cannot be detected using X-ray diffraction, protonation states must be uncovered via a difference method. Several questions arise when thinking about protonation of the protein:

  • Which residues have the ability to be protonated/deprotonated
  • Is the PDB structure protonated for a specific pH (Bio pH)
  • How does the protonation of the protein change through a variety of solvation pHs
  • Does the protonation state affect the chromophore photophysics (electronic embedding)

Answering these questions:

  • Polar amino acids are those that can be protonated/deprotonated, these are identified by Gaussview and shown in the atom list of the molecule with the option of setting them to either protonated/deprotonated.
  • Dont know if the current structure is protonated to any specific pH... this is unclear until protonate at certain pHs
  • The protonation will be tested across a range of pHs to see how protonation changes

Tues 24th Nov 2009

The pKa value of a residue will indicate protonation state as a function of pH. If you know the pH of your solution then if the pKa of a given functional group is lower than this pH then the functional group will be unprotonated. If the pKa is higher than the pH then the functional group will be protonated. Beware that pKa values can be distorted by local conformational effects. The structure is protonated as though it were in pH 7 as it is loaded into Gaussview. Since hydrogens are defined in the PDB file it could be that the author added protons under these conditions.

Histadine has 2 protonation points on the imidizole R chain:

Yes-Yes Yes-No No-Yes

The imidazole sidechain of histidine has a pKa of approximately 6, and overall, the amino acid has a pKa of 7.6. This means that at physiologically relevant pH values, relatively small shifts in pH will change its average charge. Below a pH of 6, the imidazole ring is mostly protonated as described by the Henderson–Hasselbalch equation. When protonated, the imidazole ring bears two NH bonds and has a positive charge. The positive charge is equally distributed between both nitrogens and can be represented with two equally important resonance structures.

Protonation

At pH 7 the PDB file protonations are correct for all residues. Moving to pH 5 the histadine residues become protonated (pKa 6)- this is the only change. At more alkaline pHs it would take pH 9.1 to make any effect (deprotonating CYS and then TYR at 9.7)

Mini-Biomolecule

To test ONIOM calculations vs pure DFT methods and other calculation types on a bio system that is smaller than GFP (computationally less expensive, can streamline the real calcs)

Trying a molecule analogous to the example given in the ONIOM tutorial...

Thurs 26th Nov 2009

Checking the GFP input will run

Procedure to get 1WS7 to run an AMBER energy calculation:

  • Load the PDB file without water
  • Try opt amber but will fail due to missing parameters (all in the chromophore)
  • Set these parameters to 0 since they cancel in ONIOM
  • Freeze the atotms involving missing paramters
  • Energy AMBER will now run for the file - NO CHARGES - 9.07881632au
  • Open the Atom Editor, Edit, MM charges, Assign all atoms
  • Manually edit any missing charges
  • Energy AMBER will now run including charges (basic AMBER94 charges) - 0.260391165au

Charges in this file did not sum to an integer number - the only residue with a non integer charge was no.64 with a charge of 0.61160000

MM sanity checks:
All charges sum to:               -5.38840000
Charges of atoms sum to:          -5.38840000
Layer 1 (S) has     34 atoms and charge   0.61160000 sum=   0.61160000
Layer 2 (R) adds  3574 atoms and charge  -6.00000000 sum=  -5.38840000


WARNING -- The following centers have non-zero partial
charges, but zero Vanderwaals parameters. These centers
can collapse onto other centers with non-zero charge
Center    Charge
  1833  0.423900
  1890  0.410200
  2029  0.427500
  2043  0.427500
  2080  0.410200
  2089  0.399200
  2123  0.410200
  2179  0.410200
  2186  0.410200
  2260  0.410200
  2287  0.410200
  2294  0.410200
  2347  0.427500
  2369  0.399200
  2469  0.427500
  2508  0.399200
  2551  0.410200
  2619  0.399200
  2639  0.410200
  2714  0.410200
  2924  0.399200
  2939  0.399200
  2950  0.427500
  2981  0.399200
  3180  0.427500
  3233  0.399200
  3262  0.410200
  3361  0.399200
  3377  0.427500
  3384  0.410200
  3397  0.427500
  3418  0.427500
  3562  0.410200
  3593  0.410200
  3608  0.410200
  • oniom(pm6:amber=hardfirst) geom=connectivity was run successfully, resulting in an energy of 0.210085730788au

This still resulted in some charge problems:

MM sanity checks:
All charges sum to:               -5.38840000
Charges of atoms sum to:           0.79320000
Residue  63 PDB Number    64_A PHE has     1 atoms, and a charge of   0.59730000
Residue  64 PDB Number    66_A CSY has    34 atoms, and a charge of   0.61160000
Residue  65 PDB Number    68_A VAL has     1 atoms, and a charge of  -0.41570000
WARNING -- The following centers have non-zero partial
charges, but zero Vanderwaals parameters. These centers
can collapse onto other centers with non-zero charge
Center    Charge
  3608  0.410200
MMInit generated parameter data with length LenPar=       40470.
Standard basis: Dummy (5D, 7F)
There are    36 symmetry adapted basis functions of A   symmetry.
Integral buffers will be    262144 words long.
Raffenetti 1 integral format.
Two-electron integral symmetry is turned on.
   36 basis functions,    36 primitive gaussians,    36 cartesian basis functions
   27 alpha electrons       27 beta electrons
      nuclear repulsion energy       200.9124124811 Hartrees.
NAtoms= 3608 NActive=   36 NUniq=   36 SFac= 7.50D-01 NAtFMM=   80 NAOKFM=F Big=F
AMBER calculation of energy.
Energy=    0.110474559     NIter=   0.
Dipole moment=       3.390901       5.889448      -2.505476
ONIOM: saving gridpoint  1
ONIOM: restoring gridpoint  3
ONIOM: calculating energy.
ONIOM: gridpoint  1 method:  low   system:  model energy:     0.110474559177
ONIOM: gridpoint  2 method:  high  system:  model energy:     0.060169124949
ONIOM: gridpoint  3 method:  low   system:  real  energy:     0.260391165016
ONIOM: extrapolated energy =       0.210085730788
ONIOM: calculating electric field derivatives.
ONIOM: Integrating ONIOM file  4 number   619
ONIOM: Dipole        =-2.76174164D+01-1.34762322D+02 3.99239712D+01
ONIOM: Dipole moment (Debye):
   X=   -70.1965    Y=  -342.5316    Z=   101.4766  Tot=   364.0782
ONIOM: Integrating ONIOM file  5 number   695

An ONIOM calculation was run using a DFT method on the high level and AMBER on the low level: oniom(b3lyp/3-21g:amber=hardfirst) geom=connectivity which ran successfully...

Charges

Charges in the protein are important for the calculation that will be run on GFP eventually since they provide the electrostatic field or "atmosphere" which surrounds the chromophore, polarizing the electron density used by the DFT method to evaluate the chromophore. Gaussian offers 2 types of influence which the MM portion can exert on the QM part:

  • Mechanical Embedding
  • Electronic Embedding

Mechanical embedding does not use any electronic components to affect the central high region. Electronic embedding does use the charge generated in the MM region to polarize the high region.
At pH 7 the GFP molecule is in a 6- charge state according to assigning AMBER charges and summing the residue charges (available in the out file). 3 options were considered at this stage for treating charges:

  • Ignore charges on the entire molecule
  • Use AMBER94 assigned partial charges on the low region
  • Use Qeq to assign charges based on an overall charge of -6

The first option can be discarded since the electronic atmosphere is of particular interest here. The second 2 options are to be investigated. See below (on a small Biomolecule)

ON a 4 residue chain (in mini biomolecule folder) - As in the GFP tutorial on the wiki, a energy calculation was submitted with the keyword amber=qeq to get back the charges across the molecule. (not entirely sure what this does yet). AS a test of the effect of charges on a molecule a test molecule was compiled in TINKER consisting of 4 residues with 0 overall charges....


No Charges 0.01940311au
Amber Charges -0.18117151au
Qeq Charges -0.23038199au

Clearly the Qeq shows the lowest energy - the considerable differences in energy show that charges are an important part of the input.

Testing Charges and EE on a small system


Energies of the optimisations using various charge approximations and with/without EE

Method Mechanical Embedding Electronic Embedding Computational Time (mechanical embedding) Computational Time (electronic embedding)
B3LYP 6-31G* -1406.04 N/A 11 hours 18 minutes 44.7 seconds 11 hours 18 minutes 44.7 seconds
ONIOM B3LYP 6-31G*:AMBER (No Charges) -783.2225 -783.2223 2 hours 13 minutes 4.1 seconds
ONIOM B3LYP 6-31G*:AMBER (Amber Charges)
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) -783.2467 2 hours 6 minutes 8.6 seconds

NB.

  • B3LYP 6-31G* and ONIOM B3LYP 6-31G*:AMBER (No Charges) were from the same starting geometry.
  • ONIOM B3LYP 6-31G*:AMBER (Amber Charges) and ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) were from the same starting geometry.
  • All Electronic embedding is from the same starting point - Can compare times.

Dipole moments using various charge approximations and with/without EE

Method Dipole moment (No EE) Dipole moment (With EE)
B3LYP 6-31G* 7.4914 N/A
ONIOM B3LYP 6-31G*:AMBER (No Charges) 3.2328 3.2241
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) 5.2794 5.2865
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) 8.173 8.1654

2nd Dec 2009

A single point energy of 1w7s was run with the additional qeq keyword to return partial charges (overall charge set to -6) This new molecule was submitted for AMBER opt but failed after around 700 iterations... this could be due to the error:

WARNING -- The following centers have non-zero partial
charges, but zero Vanderwaals parameters. These centers
can collapse onto other centers with non-zero charge

When the atoms defined in this error had their partial charges set to 0.00, the result was a fully converged AMBER optimisation:

  • Frozen chromophore.
  • pH 7 giving a charge of -6
  • Charges assigned using Qeq method
  • Zero Van der Waals parameter atoms partial charges set to 0.00

Looking at what atoms are not defined in terms of Van der Waals paramters it is ALWAYS an aloholic proton that is not parameterised...

Tutorial used AMBER charges then ran Qeq for the missing charges... i think this is what she thought she did... gaussian "amber=qeq" will overwrite all charges.. need to use "amber=uncharged" so that we dont overwrite the amber charges...

In effect full Qeq charges were assigned in the tutorial. Also overall charge is 0 - will check this.

The GFP tutorial

  • AMBER charges (AMBER94 charges) were assigned to the GFP molecule in the tutorial
  • she notices correctly that some are missing in the chromophore and so decides to assign them via Qeq
  • Runs Qeq on the GFP molecule with a 0 charge which is wrong! When run the AMBER charges and look at the log file we can see that the overall charge should be +4 for that protonation state.
  • Since Qeq is an equilibrium method, partial charges are solved with the constraint that their sum must equal the overall charge.
  • These Qeq charges should therefore not fit with an integer sum when pasted into the AMBER charges.

Qeq versus AMBER94 charges

Ogawa et al. state:

Geometry optimizations of a crambin protein were performed using force fields combined with the
CQEq. The CQEq charges were found to be more comparable to the Mulliken charges obtained by the ab initio molecular orbital
calculation than the fixed charges used in the AMBER force field. Furthermore, the CQEq charges give more realistic charge redistribution
between amino acids in the crambin.

Mini Biomolecule calculations:

Method Dipole moment (No EE) Dipole moment (With EE)
B3LYP 6-31G* 7.4914 N/A
ONIOM B3LYP 6-31G*:AMBER (No Charges) 3.2328 3.2241
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) 5.2794 5.2865
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) 8.173 8.1654

Clearly the Qeq charges are in better agreement with the DFT method than no or AMBER charges.

These frequencies are calulated using mechanical embedding!

Method C=O C=C
B3LYP 6-31G* 1799.65 1701.86
ONIOM B3LYP 6-31G*:AMBER (No Charges) 1802.25 1703.63
ONIOM B3LYP 6-31G*:AMBER (Amber Charges) 1803.37 1701.67
ONIOM B3LYP 6-31G*:AMBER (Qeq Charges) 1800.18 1702.86

As a result it seems obvious that a Qeq charge assignment in constraint by the correct total charge gives results most comparable to the pure QM method (both dipoles and frequencies)

3rd Dec 2009

ESP Charges

ESP calculates the expectation values of the electrostatic potential of a molecule on a uniform distribution of points. The resultant ESP surface is then fit to atom centered charges that best reproduce the electron distribution in a least-squares method. The set of points defining the default surface is generated according to the algorithm of Connolly.

ESP charges thus require a QM calculation to create the molecular electrostatic potential used to fit atom centered charges. The size of GFP means that this is not an efficient method for use in this case?

Qeq Charges

The charge equilibration (QEq) method treats the atomic charge variation through the equilibration of electronegativities at atomic sites. The advantage of the QEq method is that the polarization effect in MM is taken into account by variation of the charge distribution in the molecules. The atomic charge variation relates to the intramolecular geometrical changes.

AMBER charges

8th Dec 2009

A input creation page is now avaliable:

https://www.ch.ic.ac.uk/wiki/index.php/AndyForesterGFP

The final input file had protein information pasted in using text to columns in excel and now displays PDB residue information and PDB secondary structure information (nice cartoon pictures in ChemBio3D!)

Interesting is a paper that uses TD-DFT on GFP (in my papers folder on the mac), the group uses a 3 stage minimisation method from the 1GFL PDB structure:

  • Freeze all atoms in the backbone of the protein and optimise after adding hydrogens
  • Optimise the chromophore at a semi-empirical level
  • They cut out a chromophore section but i could then perform a full QM/MM geometry optimisation

A more charged mini molecule to test electronic embedding


The molecule was submitted with 3 different job lines:

  • b3lyp 6-31g(d)
  • oniom(b3lyp 6-31g(d):amber=(hardfirst,qeq))
  • oniom(b3lyp 6-31g(d):amber=(hardfirst,qeq))=embedcharge

Opt and Freq jobs were completed successfully. The charge breakdown input was:

  • Low Real 2 1
  • High Model 0 1
  • Low Model 0 1

The aim of the calculations was to test electronic embedding versus the mechanical embedding. A paper suggests that there is a large difference in accuracy between the two methods. Qeq charge equilibration was used in both cases as it has already been proved to be much more effective than AMBER assigned charges. It has consistently overestimated the dipole moment though.

Method Dipole moment
B3LYP 6-31G* 8.02
ONIOM B3LYP 6-31G*:AMBER (ME) 12.22
ONIOM B3LYP 6-31G*:AMBER (EE) 12.23

These results are odd since we expect the EE to have some effect on the charge distribution in the molecule (it is polarising the QM region) comparing the frequencies as a further check:

Method C=O C=C
B3LYP 6-31G* 1764 1636
ONIOM B3LYP 6-31G*:AMBER (ME) 1797 1697
ONIOM B3LYP 6-31G*:AMBER (EE) 1797 1697

Again the EE has made no difference to the frequency of various strong vibrations in the molecule. The conclusion is thus either:

  • EE has no real effect on the moleucule
  • EE is not working properly

Assuming the latter further investigation discovered that the meechanism of EE is determined using scaling parameters for MM charges during electronic embedding in the QM calculations. As a default the charges are scaled 555500 using the nomenclature ScaleCharge=ijklmn where n is closest to the QM region (massively anti intuitive). The input is read left to right and not all scaling factors need be defined. Further to this, the value given is then multiplied by 0.2 to arrive at the actual scaling coefficient. The result is thus for an atom:

  • 1 bond from the QM region (n) = 0.2n Default is 0
  • 2 bonds from the QM region (m) = 0.2m Default is 0
  • 3 bonds from the QM region (l) = 0.2l Default is 5
  • 4 bonds from the QM region (k) = 0.2k Default is 5
  • 5 bonds from the QM region (j) = 0.2j Default is 5
  • 6 bonds from the QM region (i) = 0.2i Default is 5

So all the atoms (and partial charges) in the MM region in the test molecule were being scaled to 0 when performing the EE. This explains the fact that no difference was uncovered in the previous calculation set.

A new calculation was submitted with the command scalecharge=555 so that all charges in the MM part will be scaled to 1 (0.2xn,m,l)

Charges...again

So since i deleted the charges on the atoms where they can collapse, the overall charge is now -19.6 from the original -6 minus the 13.6 in charge deleted. REALLY ANNOYING! This is a huge difference in charges so will have to be solved. One easy option would be to set the involved atoms to 0 then run QEQ on the atoms with unassigned charges... essentially distributing the -6 charge through all atoms except those with 0 VdW parameters.

Tues 15th Dec 2009

The more charged molecule failed once the charges were scaled to 1 since SCF convergence was not attained.

Mon 25th Jan 2010

TAO tool publication was very good read... has a good suggestion of an optimisation procedure:

  • Take the crystal structure
  • Add Hydrogens
  • Determine protonation
  • MM geometry optimisation
  • (MD?)
  • (Look at several samples of the MD)
  • Optimise a final structure

Thurs 28th Jan 2010

Tao Tools requires preprocess within AMBER tools to run the scripts correctly. Does not have HIS on the residue database... Amber defines the protonation state within its residue naming scheme... HIS is either HIE, HIP, HID depending on its protonation state. [1]

Thurs 4th Jan 2010

Preprocessing

To get TAO tools and also for completeness it is necessary to preprocess the PDB file before attempting to run ONIOM calculations, this includes:

  • Adding hydrogens
  • Create a custom residue file for the chromophore
  • Convert from PDB to gjf ONIOM input

Ambertools are extremely helpful with this preprocessing especially since amber will be the low level theory used throughout this project. Within ambertools are a number of useful programs:

  • antechamber - has a parameterlookup function within amber and GAFF parameter library files.
  • xLeap - builds the protein from its constituent residues - adds hydrogens according to library files - allows a parm file to be built to add custom residues.

I will be working on using these 2 programs to create a full MM input file which is fully defined and correct.

xLeap

To load xLeap we must use X11 to get the graphical interface and the ssh -Y command to allow tunneling back to X11. Also the command line to start xLeap should be: xleap -s -f /apps/ambertools/amber11/dat/leap/cmd/leaprc.ff03. This loads the ff03 parameter file and results in the start up of the graphical interface with:

Welcome to LEaP!
Sourcing leaprc: /amber/dat/leap/cmd/leaprc
Log file: ./leap.log
Loading parameters: /amber/dat/leap/parm/parmME.dat
Loading library: /amber/dat/leap/lib/all_nucleic94.lib
Loading library: /amber/dat/leap/lib/all_aminoME.lib
Loading library: /amber/dat/leap/lib/all_aminoctME.lib
Loading library: /amber/dat/leap/lib/all_aminontME.lib
Loading library: /amber/dat/leap/lib/ions94.lib
Loading library: /amber/dat/leap/lib/water.lib

This shows that the residue libraries have been loaded. We are then free to load a pdb file which will be read by xLeap and built according to the residues! In addition to the residue names xleap also requires information about the chain connectivity. Residues can be connected to other residues from both sides, only one side or not at all. xleap is fairly simple in that it expects that all residues in the pdb file are connected in the order in which they are listed unless they are separated by a "TER" card. In most cases where a single protein chains is being investigated like here - the TER card should end the residue information.

To load a pdb use (ignore the <>) <name for use in xLeap>=loadpdb "<name of file>". Make sure that the file is in your PATH else fully define the path in the filename.

The pdb should then be read into xLeap and built. At the end of the output we get (I loaded ChainA of the 1W7S without any water or hydrogens):

Leap added 1765 missing atoms according to residue templates:
      1 Heavy
      1764 H / lone pairs
The file contained 21 atoms not in residue templates

This shows that Leap added hydrogens automatically and also a missing "heavy" atom (one which is not a H or lp):

Added missing heavy atom: .R<CHIE 229>.A<OXT 18>

For amino acids, the traditional order of atom records within a residue is N, CA, C, O, followed by the sidechain atoms (CB, CG1, CG2 . . . ) in order first of increasing remoteness, and then branch. The extra oxygen at the carboxyl terminal is called " OXT", and appears after all sidechain atoms.

Now that we have established that there are 21 atoms that are in the non standard residue region we need to define a custom residue which describes this region (chromophore). This involves creating a prep file....

I first drew up the undefined section of GFP (chromophore) in Gaussview5 and saved as a.pdb then:

  • Load in antechamber and convert to mol2 format using antechamber -i <input> -fi pdb -o <output> -fo mol2 -c bcc
  • Use parmchk to make sure the molecule was defined in the GAFF force field - kicks out a .frcmod file with missing parameters (including dihedrals) - nice :D!
  • Manually add the "1CSY" where a standalone "0" is in the mol2 file so Leap can read it
  • Load the gaff parameter file to Leap using source leaprc.gaff
  • Load the chromophore_CSY_full.mol2 to Leap using CSY=loadmol2 "ChromophoreGFP_CSY_full.mol2"
  • Check the residue using check CSY - Notice missing parameters are there.
  • Load the ChromophoreGFP_CSY_full.frcmod file created by parmchk using loadamberparams ChromophoreGFP_CSY_full.frcmod
  • Now check the CSY residue again using check CSY - No Problems! woo hoo! Now I can create amber parameters :D

Tues 9th Feb 2010

RESP Fitting

To get charges that are like those of the standard amber force field an RESP fitting method has to be used. This involves 3 steps:

  • First, the molecule studied is optimized to determine a stable minimum (using Gaussian).
  • Then, this minimized structure is used to calculate a Molecular Electrostatic Potential (MEP) on a three-dimensional grid (using Gaussian).
  • Finally, this grid is exported into the "RESP program" which is used to fit atom-centered charges to the MEP

To assist in this procedure - I am using a program called RED - [2] which can be used in all 3 stages...

  • Ante_RED is used to prepare Gaussian input files for the RESP fit
  • RED III is used to fit atom centered charges

The procedure used to create the CSY file for RESP fitting is as follows:

  • Cut the CSY residue from the 1W7S protein structure and cap terminal ends with methyl groups
  • Optimise this in Gaussian at PM6 level
  • Run the optimised CSY geometry through ante_RED to create the Gaussian input file involving minimisation at HF level and calculation of the MEP - also get the .P2N file required for RED III
  • Run this Gaussian input file

Input file for optimisation:

#P hf/6-31G* Opt=(Tight,CalcFC) Freq SCF(Conver=8) Test

This will run a HF level optimisation and frequency analysis. Opt=tight tightens the cutoffs on forces and step size that are used to determine convergence. An optimization with Opt=Tight will take several more steps than with the default cutoffs. For molecular systems with very small force constants (low frequency vibrational modes), this may be necessary to ensure adequate convergence and reliability of frequencies computed in a subsequent job step. This option can only be used with Berny optimizations. Opt = CalcFC causes the calculation of an analytical Hessian (in HF). SCF(Conver=8) sets the SCF convergence criterion to 10^-8.

Parameter files for use in Gaussian (Gaussian only uses amber94)

Using external parameter files. It is possible to include MM function entries within an external file by replacing the entries within the input file with the usual Gaussian include file mechanism: e.g., @oplsaa.prm. Parameter files for various force fields are included within the main Gaussian directory (i.e., $g09root/g09). If you want to specify the entire force field via an external file, use an input file like the following:

  1. UFF=SoftOnly

Read-in force field example

molecule specification

@$g09root/g09/oplsaa.prm

Note that the choice of a molecular mechanics keyword is arbitrary as the entire force field depends on the read-in functions and parameters and not on any internal ones (in other words, specifying Amber would yield the same result as UFF).

Tues 16th Feb 2010

RESP Fitting

Following optimisation of the fragment for RESP fitting as seen before. We now have to fit the RESP to a MEP, this involves a few calculations:

  • ESP calcs in gaussian
  • Fitting calcs by resp in ambertools

The result is a .mol2 containing the fitted charges. The process has 2 input files:

  • .log
  • .p2n

The p2n is created by ante_RED and controls the RESP fitting process. (See RED site). To get a fit that excludes the capping methyls we add a REMARK line:

REMARK INTRA-MCC 0.0 |  35  36  37  38  39  40  41  42 | R

This must be formatted EXACTLY as shown here (2 spaces between the numbers). The numbers correspond to atoms numbers, the 0.0 constrains these atoms to 0.0 charge and the R tells RESP to remove these atoms from the 2nd round fit and the subsequent mol2 output. Leaving a fragment.

The RED program can be run on CX1! I used the following Jobscript (REDjobscript_4):

#PBS -l ncpus=1 
#PBS -l mem=6000mb
#PBS -l walltime=01:00:00
#PBS -q pqchem
#PBS -joe
module load ambertools
module load gaussian
export GAUSS_SCRDIR=$TMPDIR
echo $GAUSS_SCRDIR
cd $HOME
pwd
perl /home/af706/RED-vIII.3.pl > /home/af706/RED-vIII.log

So here we are using the chemistry node pqchem:

  • loading the ambertools (for RESP) and gaussian modules
  • defining a scratch directory for the RESP program to read gaussian scratch
  • TEMPDIR is used since the scratch must be empty upon calculation start. The temporary directory will be created fresh on each node as the calculation is started, and it's unlikely that two calculations will run on the same node.
  • Moving to my HOME directory /home/af706 because the script must be run in the same directory as the input files (.log and .p2n)
  • RUN!

The fitting process runs single point energy calcs and fits the RESP - overall it took 3mins for my fragment.

Going from xLeap out to Gaussian in

2 ways to do this:

  • Load in Gaussview
  • Use Pengs scripts

Gaussview

  • The file loads and all residues are recognised (apart from the chromophore). The file contains the same number of atoms as the saved Leap file.
  • Charges are assigned from amber94 - not the chromophore still
  • Amber atom types are assigned correctly to ALL atoms
  • All that remains is to add charges from the RESP fit.
  • PDB information is retained in input

TAO

  • Atoms are added so atom count is different to the file leaving Leap
  • Fails to treat terminal residues correctly
  • Lose residue information
  • Charges are missing from chromophore and atoms in terminal residues

22nd Feb 2010

Links to HOW TO for Ambertools + Gaussview/ Ambertools + TAO tools + Gaussview

Running ONIOM calculations - https://www.ch.ic.ac.uk/wiki/index.php/AndyForesterONIOM

26th Feb 2010

Picking out model Modes

Integers and integer ranges without a keyword are interpreted as mode numbers, although the [not]mode keywords may be used. The keywords atoms and notatoms can be used to define an atom list whose modes should be included/excluded (respectively). Atoms can also be specified by ONIOM layer via the [not]layer keywords, which accept these values: real for the real system, model for the model system in a 2-layer ONIOM, middle for the middle layer in a 3-layer ONIOM, and small for the model layer of a 3-layer ONIOM. Atoms may be similarly included/excluded by residue with residue and notresidue, which accept lists of residue names or numbers. Both keyword sets function as shorthand forms for atom lists.

TAO tools

The expected differences to the one without using his tools are:

  • Different partial charges since his scripts use all_amino03 to assign charges,
 Gaussview uses amber94 charges (Net charge is the same for both files - good thing)
  • Potential for some changes in atom-types due to different versions of amber being used.

What actually happened:

  • Difference in individual partial charges but same overal net charge
  • Different atom type assignment led to some more undefined parameters.
  • Defining these and running an energy led to significantly different energy than the non tools
 method:

WITHOUT tools

AMBER calculation of energy.
Energy per function type:
Direct non-bonded                                 122152.193595037
Non-bonded pair                                  -122162.319396395
Harmonic stretch I                                     0.354921205
Harmonic bend I                                        0.989669436
Amber torsion                                          1.587390468
Improper torsion                                       0.013641893
Energy per function class:
       Coulomb              -9.200657698
   Vanderwaals              -0.925143661
    Stretching               0.354921205
       Bending               0.989669436
       Torsion               1.587390468
  Out-of-plane               0.013641893
Energy=    -7.18017836     NIter=   0.
Dipole moment=     -25.836974    -115.466231     -34.635294

WITH TAO tools

AMBER calculation of energy.
Energy per function type:
Direct non-bonded                                 137343.707022741
Non-bonded pair                                  -137347.284026475
Harmonic stretch I                                     0.354921205
Harmonic bend I                                        1.082982995
Amber torsion                                          1.589487307
Improper torsion                                       0.013645198
Energy per function class:
       Coulomb              -6.890960120
   Vanderwaals               3.313956385
    Stretching               0.354921205
       Bending               1.082982995
       Torsion               1.589487307
  Out-of-plane               0.013645198
Energy=   -0.535967030     NIter=   0.
Dipole moment=     -29.988370    -117.918540     -32.409355

Obvious where the differences are.

3PHY

https://www.ch.ic.ac.uk/wiki/index.php/AndyForesterPYP