## Monday, May 19, 2014

### Towards a new version of the manuscript

In the opening post of the nmrlipids project, published simultaneously with the accompanying manuscript, it was written that:
"The ultimate goal of this blog is to find an atomistic (preferably united-atom) force field that reproduces the experimental properties discussed in the manuscript. Naturally the optimal situation would be that some of the already available force fields would fulfill this goal. If this, however, turns out not to be the case, the goal will be to find the appropriate modifications."
As discussed in The lipid forcefield comparison post, none of the tested united-atom force fields fulfills the goal$$^1$$. The same post concluded that three all-atom force fields (CHARMM36, GAFFlipids and MacRog) gave a reasonable agreement with experiments for a fully hydrated pure bilayer, when the absolute values of headgroup and glycerol order parameters were compared. Since then, the comparison has been refined to include the sign of the order parameters as well as the R/S labeling of hydrogens. Fig. 1 shows an up to date comparison of the top three all-atom force fields, the Berger force field, and the Lamberg parameters (currently in development in this blog) to experiments.
 Figure 1. A set of simulation results for the POPC headgroup and glycerol order parameters reported in the blog is compared to experiments.

With these refinements (sign and R/S labeling) it becomes clear that the order parameters for the g$$_1$$ group in the GAFFlipid force field are not in a reasonable  agreement with experiments. Thus there are only two all-atom models left to consider: CHARMM36 and MacRog. Interestingly, as discussed in the post Response of headgroup and glycerol order parameters to changing conditions: Results, the responses of these two force fields to changing conditions (dehydration, ion concentration and cholesterol content) are in qualitative agreement with experiments. However, the Na$$^+$$ association with the bilayer is clearly too strong for MacRog; the association is less for CHARMM36, but more simulations are needed to complete the comparison to experiments.

In conclusion, CHARMM36 is a force field that relatively well satisfies the original goal of the nmrlipids project. In this respect, we could now just write up the manuscript, submit and finish the project.

Having said that, I still think that it would be very useful to have also a united-atom model that fulfills the original goal. Regarding the studies of pure lipid bilayers, CHARMM36 appears significantly slower than united-atom models and also slower than other all-atom models. Based on my personal experience (I have not investigated it further) I think the slowness is due to the Lennard-Jones interactions of the water hydrogens in the CHARMM water model$$^2$$. In addition to speed, there will probably be interest in using a lipid model with correct molecular structure in combination with proteins (or other molecules) that lack a model combatible with the CHARMM36 lipids.

If we manage to get correct structures for one united atom model, the same procedure could probably be used for other models as well. A promising method to tune the potentials to get the correct order parameters was already reported in this blog by Antti Lamberg in March. However, at the time, we had not considered the signs and the R/S labeling and thus the order parameters were matched to wrong values.

On the basis of these perpectives, I suggest that we make one further attempt to correct the order parameters for the Berger model in the following way:
1. We calculate the dihedral distributions from the CHARMM36 (and MacRog) for the glycerol and headgroup region.
2. We set up dihedral potentials for the Berger model such that they resemble the ones in the CHARMM36 (and/or MacRog).
3. We use these potentials as a starting point for a Lamberg-style optimization.
I think that this procedure has potential to produce a united-atom model that will have the correct glycerol and headgroup structures. If this procedure does not succeed, I suggest we finish this project and publish the results; further attempts to fix united-atom model can be continued in another project in any case. I will already start to prepare the next version of the manuscript.

First results for step 1 of the procedure. As preliminary work for the above procedure I plotted 40 consecutive lipid structures observed in Berger and CHARMM36 simulations such that the glycerol groups were aligned. The results are shown in Figs. 2 and 3.

 Figure 2. 40 consecutive ($$\Delta$$t=1.25 ns) structures observed in a Berger simulations (POPC, T=298K). Hydrogens have been added to the visualization by the Gromacs protonate program in the same way as for the order parameter calculation.

 Figure 3. 40 consecutive ($$\Delta$$t=1.25 ns) structures observed in a CHARMM36 simulation (POPC, T=303K).
Comparing the Figs. 2 and 3 one can clearly see that the conformations sampled by the g$$_1$$ and g$$_3$$ groups significantly differ between the models - which is no surprise as the models predict different order parameters. In all configurations of the CHARMM36 model the hydrogens of the g$$_1$$ group are pointing to the same direction, indicating a rather rigid dihedral in the g$$_1$$-g$$_2$$ bond. In the Berger model, the g$$_1$$-g$$_2$$ bond seems to be significantly more flexible. The situation is opposite for the g$$_3$$ group: the hydrogens are pointing in the same direction in almost all configurations for the Berger model, while CHARMM36 seems to be significantly more flexible. This difference seems to have significant effect to the conformations of the whole PC headgroup. (When looking at the figures, it should be noted that the molecules are shown from different angles to make the above points more clear.)

I am currently processing some ideas how to proceed to the step 2 in the procedure. I will report the progress in the near future. It would be also interesting to see structural figures with aligned glycerol from the MacRog model.

$$^1$$The recent united atom CHARMM has not been tested yet. The authors of the model state that "The headgroup is still an AA potential in C36-UA, so only the chain SCDs were compared." Thus similar results than in CHARMM36 are expected. However, the united atom CHARMM might still be quite slow due to the water model. Some results with the model would be interesting.

$$^2$$Using regular TIP3P model does not seem to be a solution since significant changes in phase behaviour was reported by Piggot et al.

1. I'm asking for a bit of a clarification as to your 3-step process. Just to recap, in the procedure I was using, I modified some of the dihedrals of the Berger model and simulated whole bilayers for ~50 ns. I then computed the order parameters of said simulations, and tried to parametrize a function f(dihedrals) = order parameters, so that I could then predict a set of dihedrals that would give the experimental order parameters.

We were unsure of whether all atom models work very well at the time, so this seemed like the reasonable thing to do. However, now that we have established that CHARMM seems to reproduce the experimental values well for almost all use cases that we have tried, one could directly parametrize a united atom model to reproduce the dihedral distributions of CHARMM. This is, I think, what you are saying but I want to explicitly point out that this can, and probably should, be done for single molecules in vacuum (or some other minimally small systems). This means a _lot_ less computational effort.

As for the dihedrals, one can take the Berger definitions and see whether by tuning the parameters CHARMM distributions can be replicated or not. I had assumed some symmetries in the parameter values, but I think this is probably not necessary if the simulations only contain single molecules.

2. Here is what I have done so far:

I started to work on the dihedral between g1 and g2 which seemed to be somehow too loose in the Berger. My goal was to find dihedral potentials which would reproduce the dihedral distributions observed in CHARMM. I think that it would be difficult to use the CHARMM potentials directly since it has hydrogens involved which are not present in Berger (I did not think this further though). So I calculated the dihedral angle distribution observed in CHARMM for the atoms which are present in the Berger, i.e. C12-C13-C32-O33 and O14-C13-C32-O33 with the Berger notation. The observed distributions from CHARMM simulation are here:

https://www.dropbox.com/s/skvy3tinbdl7dxj/g1-g2dih_C12-C13-C32-O33fromCHARMM.xvg
https://www.dropbox.com/s/pft3l48shaa6q2u/g1-g2dih_O14-C13-C32-O33fromCHARMM.xvg

First I thought that I would use Gromacs proper dihedrals to construct potential which would have minima and maxima at the locations corresponding to the maxima and minima at the observed distribution. I think that this would be possible but one would need a quite large number of terms since the observed distributions are asymmetric and the peaks are quite narrow (I did not think this further though).

Then I decided to use tabulated potentials. I constructed tabulated potentials for the C12-C13-C32-O33 and O14-C13-C32-O33 dihedrals roughly with the following steps 4 steps:

1. I took running average to smooth the observed dihedral distribution.
2. I multiplied the distribution with (-1) and added a constant to make values positive.

I ran simulations directly with this kind of potential. However, this did not work since with some angles the distributions had values of zero (no observed angles) which led to potential with flat maximum having zero derivative through several angles. Then in the simulation system was happy to have these angles since there was no force, even though the energy maximum. To get rid of this problem I added the step 3:

3. I added gaussian functions to the potential at the locations of flat maximum.

As a consequence, I got the following tabulated potentials for the C12-C13-32-O33 and O14-C13-C32-O33 dihedrals:

https://www.dropbox.com/s/a7ltepbas2kz6hu/potential_C12-C13-C32-O33.xvg
https://www.dropbox.com/s/3qd3blwuhdw62y8/potential_O14-C13-C32-O33.xvg

Finally,

[ exclusions ]
12 33
14 33
11 14

When I ran a simulation with these potentials I get structures which looks like these:

https://www.dropbox.com/s/p7mnub6j9wxu9kq/FFtestSNAP.pdf

Now the glycerol structure is closer to the CHARMM.

My plan was to automatically run the above procedure for all the dihedrals in the headgroup and glycerol to make new dihedral potentials, and then these potentials would be modified with the same style as Antti did directly for the Berger potentials previously to give perfect order parameters.

Now there are some potential problems though:

1. I am not sure how I can automize the addition of gaussian function. I will think about this. This can be done manually as well, but it just takes a bit more time.

2. When Antti modified berger potentials, the modification was done for proper dihedral parameters. Modifying tabulated potentials automatically might be more difficult? What do you think?

In conclusion, I am trying a brutal coarse graining approach to make United Atom dihedral potentials from All Atom simulations. To do something in vacuum, as Antti suggested might be also reasonable, especially if the current approach does not work.

3. The problem with tabulated potentials is that they are slow. I would much prefer using nontabulated potentials. Of course dihedtals are intramolecular, so this is not a very big deal. Your procedure, however, is somewhat arbitrary. I suggest using iterative Boltzmann inversion (which as I have understood it is somewhat close in spirit to what you are doing anyway) or force matching or some other well established way of parametrization. These all have thein own problems, espeially when it comes to the generalization of systems*, which is why I would again prefer using parametrized potentials (instead of tabulated ones) if at all possible. With these methods, though, the arbitrary functional form is not a problem. From what I remember VOTCA interfaces with GROMACS and is relatively easy to use for these types of computations. To add custom Gaussians you might consider PLUMED. This is a GROMACS extension meant for free energy calculations, but the way you are parametrizing your potential sound awfully lot like metadynamics (dropping Gaussians), which is what PLUMED does.

*Suppose you manage to parametrize POPC and DPPC with pretty much any generalized function method (especially Boltzmann inversion). In a mixture of DPPC and POPC, the dihedrals would need to be reparametrized for both molecules. Physically this seems nonsensical.

1. I do not think that introducing ~10 tabulated dihedral potentials leads to a significant computational cost. The way I do it is extremely simple and fast to do. I think that I will do it like this, and then, depending on the results consider more sophisticated approaches.

The nonsymmetric, sharp peaked functional form is not a problem for the "coarse graining" method. For me it seems to be a problem for gromacs proper dihedrals which are given as series of trigonometric functions (see eq. 4.61 in manual). I think that to make a potential with the required shape one needs quite many terms into the sum (correct if I am wrong).

Adding gaussians to the potential function is very easy:
cat popcTST4_d1.xvg | awk '{print $1" "$2+0.01*exp(-($1+60)*($1+60)/(2*20*20))}'
To automize the decision where to add it, i.e., in my case finding the places of flat potential maximum, is difficult. This is currently the only "arbitrary" part.

4. The computational cost was more of a side note, and the reason I suggested VOTCA/PLUMED was to point out that they use methods which have some theoretical justification and whose side effects have been studied. Metadynamics in its standard form tries to flatten out the histogram (distribution), but I see no reason why you could not in principle use the same methodology to fit to known dihedral distributions. If I have time, I might have a go at this later. But again, because it is about dropping Gaussians, I am expecting to get results similar to your procedure; metadynamics just drops them to wherever the current angle is at predetermined timesteps and is theoretically guaranteed to give a working functional form (for the mean field), and you decide these positions by hand.

The side effect that I am talking about is basically overfitting to data, and this should have been the emphasis of my last post. If you take a curve that traverses through all of your data points, it is by definition a perfect fit to the data, but it can easily misstep much worse than a linear fit when you inter/extrapolate from the data. The more parameters you have in your fit (and each of your Gaussian insertion will essentially add parameters/degrees of freedom), the more likely you are to be overfitting. Now as a practical example this extrapolation would happen in different salt conditions, for example: Your parametrization procedure can easily overfit to the salt free data and give complete nonsense when salt is added.

There are some related examples in the current literature. Martini uses "more" physically inspired potentials than the SDK model, and only the former qualitatively captures the gel-fluid transition in bilayers (even though the fluid phase is arguably more accurately described in the latter). Another example might be the one I mentioned earlier: iterative Boltzmann inversion will give pair-pair interactions that work for molecule types A and B separately, but should you mix A and B, the A-A and B-B interactions, too, have to be recomputed, which clearly does not make much physical sense. This is why I would prefer a more physically motivated procedure (using predetermined functional forms). Having said this, I think it is worthwhile testing as many different ideas as possible and seeing which gets you better results.

I am aware that I am overfitting. However, it is not clear if it matters in our case and this can be checked against the response data (dehydration, ions etc.).

5. A progress report on the Berger modification. I ran a bunch more simulations (64, to be exact), and as I had predicted (somewhere on this blog, after we discovered that the signs of the order parameters were in fact important and wrong in Berger), I have been unable to satisfactorily reproduce the signs of the g1 order parameters. Granted, I sometimes do get the sign itself correct, but this is a still far cry from getting the order up to -0.15.

While the procedure I was using is by no means exhaustive (as in, it does not sample the whole parameter space), it does prompt the question of whether the dihedrals I was modifying can in fact produce the correct behaviour at all. One might trying modifying more of the dihedrals, without using the symmetry considerations that I was using, or by trying to fit them to CHARMM distributions in vacuum. My guess, though, is that even these might not be enough, but rather Berger dihedrals (or charges, or a combination thereof) are more fundamentally flawed and that one would need to add other dihedrals into the mix to have any chance of succeeding. This is speculation, of course, and I'd be happy to see someone show me wrong. If I have time, I'll later look into doing some of the things that I proposed.

1. I have been trying to make dihedral potentials which would reproduce similar dihedral distributions observed in CHARMM by using the approach I mentioned above (tabulated potentials). The potentials I have now gives the following order parameters:
beta -0.118127
beta 0.0921357
alpha 0.217337
alpha 0.0414373
g3 -0.14162
g3 -0.300409
g2 -0.344162
g1 -0.329405
g1 -0.184324

and the structure which looks like this: https://www.dropbox.com/s/kzstx1unjaxbuin/BergerTST6.png

Compared to the structure in CHARMM (shown above), the structure of the glycerol is pretty similar except that the g2-g3 dihedral seems too rigid. However, the order parameters for the glycerol are clearly too negative in the current model. I am now suspecting that this is due to the dihedrals between glycerol and acyl chains which looks different between models but I have not tuned those yet. I have been and will be quite a lot out of office during the summer, but I will try to test this during the next week. After this it might be useful to try Antti's approach by starting from these pontentials which are quite different compared to the original ones.

I am also suspecting that the too rigid g2-g3 might reflect to the alpha and beta.

2. I tried to tune the dihedrals between glycerol and acyl chains which I mentioned above. It did not fix the problem and the order parameters are pretty similar as above. I am now sharing the parameters which I have made by trying to directly reproduce the CHARMM dihedral distributions in the Berger model:
https://www.dropbox.com/s/ylgsczvh11qwuj8/share7.tar

The file contains also dihedral distrubutions calculated from the CHARMM model.

The structure looks currently like this:
https://www.dropbox.com/s/cavebz57cbwea4t/fftestGLYplot.jpg

It seems, at least, too rigid compared to CHARMM.

I am sharing the parameters now since I will not work on this for a while now, and someone might be willing to play or improve these.

3. Before I forget, I saw this paper on dihedral fitting that was just published which could be interesting (didn't have time yet to read it though): http://pubs.acs.org/doi/abs/10.1021/ci500112w.

4. I think that this might be useful paper for us. In their webpage they are sharing the code which can be used to fit asymmetric potentials by using trigonometric functions:

I think that I can get rid of tabulated potentials and gaussian peak addition with this. I will try this as soon as I have time.

6. Matti Javanainen sent me unpublished trajectory for POPC bilayer, ran with MacRog model. I used it to make a similar plot for the structure as shown above for the Berger and CHARMM:
https://www.dropbox.com/s/e14hh6xq67d2nju/MacRogGLYplot.jpg

Regarding the glycerol, it is slightly different than CHARMM:
g1 is more "flexible" changing between two configurations
g3 is more "rigid" sampling only two conformations (three in CHARMM)

With visual inspection it seems that beoynd the glycerol there are possibly some larger differences.

When lookng at the order parameter results, the CHARMM is closer to experiments for g3 and MacRog overestimates the forking. This would indicate that the CHARMM sampling would be better for g3. Also beta is better in CHARMM.

On the other, g2 and g1 order parameters are better in MacRog, indicating that the sampling of this part may be better in MacRog. However, when I calculated the order parameters for the first carbons in acyl chains in MacRog I got the following results:

sn1:
C2 0.0670417
C2 0.0828269
C3 -0.137857
C3 -0.131714
C4 -0.151883
C4 -0.153766
C5 -0.172636
C5 -0.172115

sn2:
C2 -0.146994
C2 -0.172383
C3 -0.186539
C3 -0.18673
C4 -0.163145
C4 -0.152277
C5 -0.176759
C5 -0.167419

Especially the sn-1 looks weird. In constrast, CHARMM is in reasonable agreement with experiments.

I did not fully understand the MacRog paper (http://dx.doi.org/10.1021/jp5016627) in this respect. In the text they say:
" Deuterium magnetic resonance studies demonstrated that the two C–D bonds of carbon 2 (C22, Figure 1) of the sn-2 chain are characterized by different values of SCD, while in the case of the sn-1 chain (C32, Figure 1) the values are the same.(88) In our simulations, differences in the SCD values of the C–D bonds in carbon 2 are observed in both chains (C22 and C32)."
However, in my eye the results shown in Fig. 7 are not what is written in the text, and also different what I got.

I think that I will continue using CHARMM as the basis for the above discussed procedure (tabulated potentials), at least for now.