https://github.com/cabb99/openfit

Molecular Density Fitting with 3D Gaussian Functions

Last synced: 10 months ago · JSON representation

Repository

Molecular Density Fitting with 3D Gaussian Functions

Basic Info

Host: GitHub
Owner: cabb99
License: mit
Language: Python
Default Branch: main
Size: 43 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License Code of conduct

OpenFit

Molecular Density Fitting Using 3D Gaussian Functions

Description

This project implements a molecular density fitting approach that optimizes the correlation between an experimental density map and a synthetic density map originating from molecular dynamics using 3D Gaussian functions. The algorithm generates a force in the direction that increases the correlation between the simulated and effective maps. It also allows adjusting anisotropically the spread parameter (sigma) in each direction.

Features

Efficient simulation of density maps using 3D Gaussian functions.
Optimization of molecular coordinates and sigma parameters to improve fit.
Calculation of the correlation coefficient to measure fit quality.
Use of numerical and analytical methods to compute derivatives.
Implementation optimized with Numba for high performance.

Installation

Prerequisites

Python 3.6+
Numpy
SciPy
Numba

Usage

Quick Start

Here's a quick example to get started with the Fit class:

Fit particles without a forcefield

```python import numpy as np from openfit import Fit

Define your parameters: coordinates, sigma, experimentalmap, and voxelsize

coordinates = np.array([...]) # Particle coordinates (n, 3) sigma = np.array([...]) # Standard deviation (n, 3) densitymap = np.array([...]) # Density map voxelsize = [1, 1, 1] # Voxel size

Initialize the Fit object

mdfit = Fit(densitymap, voxel_size)

Calculate the derivative of the correlation over the coordinates and sigma

diff = mdfit.dcorrcoef(coordinates, sigma)

Perform the fitting process just with the coordinates

md_fit.fit(coordinates, sigma)

Access the optimized coordinates and sigma

optimizedcoordinates = mdfit.coordinates optimizedsigma = mdfit.sigma ```

Build a simulated map from coordinates

```python import openfit import numpy as np

Parse the pdb

scene = openfit.Scene.frompdb('pdbfile.pdb')

Corrects missing element names

scene['element'] = scene['name'].str[0] scene['mass'] = scene['element'].replace({'N': 14, 'C': 12, 'O': 16, 'S': 32})

Creates an empty voxel array

coords = scene.getcoordinates() voxelsize = (coords.max() - coords.min()) / 40 # ~40x40x40 Fit = openfit.Fit.fromdimensions(mincoords=coords.min() - 10, maxcoords=coords.max() + 10, voxelsize=voxel_size)

Sets the coordinates

Fit.set_coordinates(coords.values(), sigma=np.ones(coords.shape) * 2, epsilon=scene['mass'].values)

Saves the density map

Fit.savemrc('samplemap.mrc') ```

Derivation

Energy

This class implements V_fit, a forcefield potential that can be included in an OpenMM molecular dynamics simulation.

$$ V = V{ff} + V{Fit} $$

The potential is defined in terms of the correlation coefficient (c.c.), which is a function of the experimental and simulated densities at each voxel $(i, j, k)$:

$$ V_{Fit} = k (1 - \text{c.c.}) $$

The correlation coefficient (c.c.) is defined as:

$$ \text{c.c.} = \frac{\sum{ijk} \rho{\text{exp}}(i,j,k) \rho{\text{sim}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2} \sqrt{\sum{ijk} \rho_{\text{sim}}(i,j,k)^2}} $$

where $\rho{\text{exp}}(i,j,k)$ and $\rho{\text{sim}}(i,j,k)$ represent the experimental and synthetically simulated density of each voxel $(i,j,k)$, respectively. The synthetically simulated density $\rho_{\text{sim}}(i,j,k)$ is obtained by integrating the three-dimensional Gaussian function over each voxel:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \int{V{ijk}} g(x,y,z;xn,yn,z_n) \, dx \, dy \, dz $$

with the Gaussian function for the particle $n$, $g(x,y,z;xn,yn,z_n)$, defined as:

$$ g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n},\epsilonn) = \frac{\epsilonn}{(2\pi)^{\frac{3}{2}}\sigma{x,n}\sigma{y,n}\sigma{z,n}} \exp\left( -\frac{1}{2} \left[ \frac{(x-xn)^2}{\sigma{x,n}^2} + \frac{(y-yn)^2}{\sigma{y,n}^2} + \frac{(z-zn)^2}{\sigma_{z,n}^2} \right] \right) $$

Then the integral for each box can be written as:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \left( \epsilonn \int{xi^{min}}^{xi^{max}} \int{yj^{min}}^{yj^{max}} \int{zj^{min}}^{zj^{max}} g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n}) \, dz \, dy \, dx \right )$$

Where $xi^{min}$, $xi^{max}$, $yj^{min}$, $yj^{max}$, $zj^{min}$, and $zj^{max}$ are the boundaries in x, y, and z for the voxel $(i,j,k)$. The solution to this integral involves the error function ($\text{erf}$). For a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, the integral over a finite range can be expressed using the error function as follows:

$$ \Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right] $$

Where $\Phi(x; \mu, \sigma)$ represents the cumulative distribution function (CDF) for a normal distribution at point $x$. The error function ($\text{erf}(x)$) is defined as:

$$ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt $$

To compute the integral of the 3D Gaussian over the finite box, we apply the CDF for each dimension:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \epsilonn \times \left( \Phi(xi^{max}; xn, \sigma{x,n}) - \Phi(xi^{min}; xn, \sigma{x,n}) \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma_{z,n}) \right) $$

Derivatives

To compute the derivative of $V{Fit}$ with respect to the coordinates of the nth atom $(xn, yn, zn)$, we need to apply the chain rule to the derivative of $V{Fit}$ in terms of c.c. and then the derivative of c.c. in terms of $(xn, yn, zn)$. Like $V_{Fit}$ with respect to a variable $v$ is:

$$ \frac{\partial V{Fit}}{\partial v} = \frac{\partial V{Fit}}{\partial \text{c.c}} \left( \frac{\partial \text{c.c}}{\partial \rho{sim}} \left( \frac{\partial \rho{sim}}{\partial x_n} \right) \right) $$

Where the first derivative is a constant:

$$ \frac{d V_{Fit}}{d v} = k \frac{d\text{c.c.}}{dv} $$

The second derivative is a function of $\rho_{sim}$:

$$ \frac{d\text{c.c.}}{dv} = \frac{\sum{ijk} \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \rho{\text{exp}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{sim}}(i,j,k)^2} \cdot \sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2}} - \frac{\sum{ijk} 2 \cdot \rho{\text{sim}}(i,j,k) \cdot \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \sum{lmn} \rho{\text{sim}}(l,m,n) \cdot \rho{\text{exp}}(l,m,n)}{2 \cdot \left( \sum{ijk} \rho{\text{sim}}(i,j,k)^2 \right)^{\frac{3}{2}} \cdot \sqrt{\sum{ijk} \rho_{\text{exp}}(i,j,k)^2}} $$

Where $\rho{\text{exp}}(i,j,k)$ represents the experimental density at the voxel located at indices (i, j, k), $\rho{\text{sim}}(i,j,k)$ is the simulated density at the voxel $(i, j, k)$, which is a function of the molecular coordinates and parameters like sigma, and $\frac{\partial \rho_{\text{sim}}(i,j,k)}{\partial v}$ denotes the partial derivative of the simulated density at voxel $(i, j, k)$ with respect to a variable $v$ which can be the coordinates or sigma.

The derivatives of the function $\Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right]$ are as follows:

In terms of $\mu$:

$$ \frac{\partial \Phi}{\partial \mu} = -\frac{e^{-\frac{(x -\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} $$

In terms of $\sigma$:

$$ \frac{\partial \Phi}{\partial \sigma} = -\frac{(x - \mu)e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma^2} $$

Then the derivative of $\frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn}$ would be for example:

$$ \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn} = \epsilonn \left( \frac{e^{\frac{-(xi^{max} - xn)^2}{2\sigma{x,n}^2}} - e^{\frac{-(xi^{min} - xn)^2}{2\sigma{x,n}^2}}}{\sqrt{2\pi}\sigma{x,n}} \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma{z,n}) \right) $$

Copyright

Acknowledgements

Carlos Bueno was supported by the MolSSI Software Fellowship, under the mentorship of Jessica Nash. We thank AMD (Advanced Micro Devices, Inc.) for the donation of high-performance computing hardware and HPC resources. This project is also supported by the Center for Theoretical Biological Physics (NSF Grants PHY-2019745 and PHY-1522550), with additional support from the D.R. Bullard Welch Chair at Rice University (Grant No. C-0016 to PGW). Project based on the Computational Molecular Science Python Cookiecutter version 1.10.

Owner

Name: Carlos Bueno
Login: cabb99
Kind: user
Location: Houston, Texas
Company: Rice University

Repositories: 7
Profile: https://github.com/cabb99

Research Assistant. Center for Theoretical Biophysics (CTBP)

GitHub Events

Total

Push event: 1
Create event: 2

Last Year

Push event: 1
Create event: 2

https://github.com/cabb99/openfit

Science Score: 13.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

OpenFit

Description

Features

Installation

Prerequisites

Usage

Quick Start

Fit particles without a forcefield

Define your parameters: coordinates, sigma, experimentalmap, and voxelsize

Initialize the Fit object

Calculate the derivative of the correlation over the coordinates and sigma

Perform the fitting process just with the coordinates

Access the optimized coordinates and sigma

Build a simulated map from coordinates

Parse the pdb

Corrects missing element names

Creates an empty voxel array

Sets the coordinates

Saves the density map

Derivation

Energy

Derivatives

Copyright

Acknowledgements

Owner

GitHub Events

Total

Last Year

Dependencies