Posts Tagged ‘gromacs’

Gromacs Genion Fix

Monday, January 25th, 2010

Gromacs is a wonderful open source molecular dynamics simulation package that I will be using for some of my research here at Imperial. However, as is oftentimes the case with scientific packages (both open and closed source), there are some minor quirks that I’ve had to learn to deal with. One of those quirks was elusive and irritating enough that I’ve decided to write a short howto so that someone elsewhere might not have to spend as long as I did trying to solve it.

Warning: I am not a computer scientist or programmer, and barely even a scientist. There is no guarantee that this code will not destroy your data, so be sure to backup all files before attempting this fix (which you should be doing anyways)! Having said that, it worked for me, and I can only hope for the best.

Version: This behaviour is seen on Gromacs 4.0.7. The proposed fix uses standard unix command line utilities (and is a very ugly but working hack).

Symptoms: While trying to neutralize the charge on a solvated system using genion to add X ions, genion insists on instead adding kX ions, where k is an integer. Additionally, attempts to use genion’s own “-neutral” option also fail, with it calculating the correct number of ions to add, but then adding it multiple times.

Cause: The reason this happens is due to the way the topology files are constructed. Near the end of the topology file is a section that details the type and number of molecules in the system. For instance, it might look something like this:

[ molecules ]
; Compound #mols
Protein_A 1
Protein_B 1
SOL 27550

When genion runs, if one selects option 12, “solvent”, genion will replace some of the solvent “SOL” molecules with the specified ions, perhaps resulting in the following, where we specified the addition of 8 NA+ ions:

[ molecules ]
; Compound #mols
Protein_A 1
Protein_B 1
SOL 27542
NA+ 8
CL- 0

However, sometimes, due to prior processing with other tools, there will be multiple “SOL” lines, like so:

[ molecules ]
; Compound #mols
Protein_A 1
Protein_B 1
SOL 72
SOL 75
SOL 27403

Notice that the total number of solvent molecules is the same in both examples. However, genion gets confused when multiple solvent lines are present, and seemingly acts on each line! This causes the resulting topology file to look like this:

[ molecules ]
; Compound #mols
Protein_A 1
Protein_B 1
SOL 64
NA+ 8
CL- 0
SOL 67
NA+ 8
CL- 0
SOL 27395
NA+ 8
CL- 0

Notwithstanding the fact that the resulting topology file is decidedly not neutral, the topology file also then for some reason does not match up with the .gro file, which causes additional problems down the line.

Of course, by now, you’ve probably figured out how to solve the problem yourself: just make sure to consolidate multiple SOL lines before running genion, and you’re golden! Since I am lazy and do not like doing that by hand (and because I’ll probably need to bulk process many proteins sometime in the future), I hacked together a really ugly but working bash script to do just that. Following:

#!/bin/bash
# Consolidate the solvent molecules in the topology file.
filename=example.top
sol_mol=0
cp $filename $filename-new
while read line; do
sed "/$line/d" $filename-new > $filename-new2
mv $filename-new2 $filename-new
add=$(echo $line | sed 's/SOL//')
sol_mol=$(($sol_mol+$add))
done < <( tail -n $(( $(cat $filename | wc -l) - $(grep " molecules " $filename -n | sed 's/:\[ molecules \]//') + 1 )) $filename | grep SOL )
echo "SOL $sol_mol" >> $filename-new
mv $filename-new $filename

If anyone actually stumbles across this page and finds it useful (which I rather doubt, but you never know), I’d love to hear about it.

~William~