GFP : Green Fluorescent Protein
Creating an input file
This is a step-by-step guide to creating a .com input file for ONIOM calculations using GFP. We've prepared things using amber as the low layer of theory.
1. Download the text .pdb file for GFP from http://www.rcsb.org/pdb/explore/explore.do?structureId=1W7S
2. Open GaussView 4.1 and open the .pbd file (crtl+o). The "drag and drop" open does not work with .pdb files. You must specify that hydrogens are added: GaussView recognizes amino acid sequences from its library and assigns the right atom types and charges accordingly.
3. Save the file as a .com (ctrl+s).
4. Open the .com in TextEdit, and in the route section replace the method and basis set with the keyword "amber".
5. Open this new .com with GaussView 4.1 which will attempt to assign as many atom types and charges as possible.
6. Save this assigned file as a com (ctrl+s).
This was the basic part. Now we want to assign the atom types GaussView missed:
7. Export the .com file from step 6 to a cluster folder.
8. Open an X11 terminal window and navigate to the .com file location. Open using the nedit command.
9. In the nedit window, toggle the statistics line on with alt+a. This will allow lines without atom types or atom charges to be located easily for future use.
10. Search for "- " in nedit. Note down the lines with missing atom types. The atom numbers can be easily deduced from the line numbers.
11. Cut and paste the lines with missing atom types to a new input file.
12. Open the new input file with GaussView 3.0. Manually assign the missing atom types if possible. If it is not possible to tell (e.g.: the atom is terminal but this may be because the input was cut and paste), remember the approximate location and open the .com file from step 6. Manually navigate to the region of interest. Make sure you toggle labels off when navigating to speed the process up!
13. Update the nedit file with the assigned atom types, save.
14. By way of review there are 4 unassigned atom types on lines 480, 4141, 4877, 5964. The correct assignments are N-NB-, O-O-, S-S-, N-NB- respectively.
And finally we want to assign the missing charges:
15. In the route section of the updated .com file on the cluster, specify the route section as "# amber=qeq". Make sure you include a minimum memory of 4000mb and also a suitable location for the .chk file.
16. Run the input file on the cluster.
17. You will encounter an error message concerning unknown bond stretches and bends. To correct this, specify "amber=(qeq,hardfirst) in the route section and input the bond stretches and bends at the end of the input as usual.
18. Now run the input file. Hopefully this will work and you will get the missing charges.
19. Open the .log file in an nedit window using X11 as before and copy all the missing charges. Paste into another nedit window displaying the input file from step 17 and save. You now have a suitable input file for running ONIOM calculations.
Files from different stages of input setup
Raw input files
The files below have no charges on the chromophore.
Here are two files, one with GFP with a surrounding water matrix and one with just GFP (chromophore in high level)
Media:gfp_water_1w7s_ready_to_use.gjf
Media:gfp_nowater_1w7s_ready_to_use.gjf
Ready-to-use input files
Media:gfp_nowater_charges.gjf
Media:gfp_water_charges.gjf
Input file with an additional amino acid in the high region (and 1 water molecule)
Media:gfp_water_charges_amino_acid.gjf
Calculations
- gfp_water_charges_complete_1.com : Chromophore well defined (hydrogens, bonds... OK), with the water environment
Single point energy successful
Input, Output :Media:input_spe_1.gjf
- gfp_water_charges_complete_opt_1.com : Optimisation of the previous system.
Optimisation successful
Input, Output :Media:input_opt_1.gjf
- gfp_water_charges_complete_opt_1_freq_1.com : Frequency calculation of the previous optimisation
Input, Output : Media:input_freq_1.gjf
But one imaginay frequency, so not a minimum but a transition state
- gfp_water_charges_complete_2.com : Chromophore well defined, with the water environment and the additional amino acid plus one water molecule in high level of accuracy.
Single point energy calculation successful
Input, Output :Media:input_spe_2.gjf
- gfp_water_charges_complete_opt_2.com : Optimisation of the previous system
Problem of job killed, calculation requires a memory of 14000mb and a minimum of 500h
Input, Output:Media:input_opt_2.gjf
- gfp_nowater_charges_complete_energy_1.com : chromophore well defined and just the chromophore in the high level.
single point energy successful.
- gfp_nowater_charges_complete_energy_2.com : test of oniom(b3lyp/6-31g(d):amber=hardfirst)=scalecharge=555500 (no change concerning this and embedcharge)
- gfp_nowater_charges_complete_opt_1.com : optimisation of the previous system
opt successful
Media:gfp_nowater_charges_complete_opt_1.gjf
- gfp_nowater_charges_complete_opt_1_freq_1.com : frequency calculation of the previous optimised geometry
Media:gfp_nowater_charges_complete_opt_1_freq_1.gjf
No imaginary frequencies, but ModelModes produced fewer modes than expected (only 40). So, using a script written by David Mendive Tapia, the output can be cut as required to create space to display the remaining modes: Media:jobscript_live.log. This script prints just the geometry and frequency in the output. Change the .log extension to .sh
- gfp_nowater_charges_complete_opt_1_freq_1_red : ModelModes
- gfp_nowater_charges_complete_opt_1_freq_2_red : Freq calculation
- gfp_nowater_charges_complete_opt_model_1.com : Optimisation with the option of printing the geometry of the model system
Successful
Media:gfp_nowater_charges_complete_opt_model_1.gjf
- gfp_nowater_charges_complete_opt_model_1_freq_1.com : Frequency calculation of the model geometry, but failed because of the "page layout", so gaussian did not recognize this way of presenting the geometry.
Media:gfp_nowater_charges_complete_opt_model_1_freq_1.gjf
- gfp_nowater_charges_complete_opt_model_1_freq_2.com : Optimised geometry changed, 9 imaginary frequencies obtained
Media:gfp_nowater_charges_complete_opt_model_1_freq_2.gjf
To run some frequency calculation on the model, use the IOp(1/33=1) keyword in order to print the model geometry.
Aurelie's comments
Optimisation
First I run optimisations with the file without water : optimisation = no problem, frequency calculation = problem...
Once the optimised geometry found I checked if there was some imaginary frequency, so I ran with the keyword Freq=Modelmodes which enables me to get a file with a reasonable size and the information that I want to get. I was lucky because I had no imaginary frequency, but I just got 40 modes with the keyword Freq=ModelModes, instead of 3*36-6 = 102, so there is a problem.
I imagined two possibilities : because the calculation was very big it crashed when I wanted to print the final output even if I restrained what it had to print. Or there is a limitation in he code and so just the 40 last modes can be printed.
So I tried to run an other calculation with the optimised geometry by asking the geometry of the model with IOp(1/33=1), once the file obtained, I had a look at the output and the geometry of the model was behind the part below :
At end of L120: --------------------------------------------------------------------- Center Atomic Atomic Coordinates (Angstroms) Number Number Type X Y Z --------------------------------------------------------------------- 1 7 6 23.050591 3.469428 -1.008981
So that was strange because when I did this previously (using IOp(1/33=1)) I got the atoms outside the model with a -1, and here I have for the atoms outside the model 6 and for the model some big number(as a code), as below
--------------------------------------------------------------------- Center Atomic Atomic Coordinates (Angstroms) Number Number Type X Y Z --------------------------------------------------------------------- . . . . 473 6 20001005 -0.375481 1.652856 0.482589 474 7 20001016 -0.273771 1.012963 1.610015 475 6 20001005 0.594142 -0.055349 1.376775 476 6 20001005 1.027366 -0.007835 -0.040053 477 8 20001024 1.707451 -0.776975 -0.709324 478 7 20001018 0.428530 1.145189 -0.536877 479 6 20001001 0.563225 1.567653 -1.931383
So just take the geometry and hope for a frequency calculation on the model did not work, so I took the geometry and delete all the atoms which did not correspond to the model system, and replace the big code number by 0.
Back to ONIOM for biomolecules
Back to ONIOM tutorial (G03)