https://github.com/cabb99/openfit
Molecular Density Fitting with 3D Gaussian Functions
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.5%) to scientific vocabulary
Repository
Molecular Density Fitting with 3D Gaussian Functions
Basic Info
- Host: GitHub
- Owner: cabb99
- License: mit
- Language: Python
- Default Branch: main
- Size: 43 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
OpenFit
Molecular Density Fitting Using 3D Gaussian Functions
Description
This project implements a molecular density fitting approach that optimizes the correlation between an experimental density map and a synthetic density map originating from molecular dynamics using 3D Gaussian functions. The algorithm generates a force in the direction that increases the correlation between the simulated and effective maps. It also allows adjusting anisotropically the spread parameter (sigma) in each direction.
Features
- Efficient simulation of density maps using 3D Gaussian functions.
- Optimization of molecular coordinates and sigma parameters to improve fit.
- Calculation of the correlation coefficient to measure fit quality.
- Use of numerical and analytical methods to compute derivatives.
- Implementation optimized with Numba for high performance.
Installation
Prerequisites
- Python 3.6+
- Numpy
- SciPy
- Numba
Usage
Quick Start
Here's a quick example to get started with the Fit class:
Fit particles without a forcefield
```python import numpy as np from openfit import Fit
Define your parameters: coordinates, sigma, experimentalmap, and voxelsize
coordinates = np.array([...]) # Particle coordinates (n, 3) sigma = np.array([...]) # Standard deviation (n, 3) densitymap = np.array([...]) # Density map voxelsize = [1, 1, 1] # Voxel size
Initialize the Fit object
mdfit = Fit(densitymap, voxel_size)
Calculate the derivative of the correlation over the coordinates and sigma
diff = mdfit.dcorrcoef(coordinates, sigma)
Perform the fitting process just with the coordinates
md_fit.fit(coordinates, sigma)
Access the optimized coordinates and sigma
optimizedcoordinates = mdfit.coordinates optimizedsigma = mdfit.sigma ```
Build a simulated map from coordinates
```python import openfit import numpy as np
Parse the pdb
scene = openfit.Scene.frompdb('pdbfile.pdb')
Corrects missing element names
scene['element'] = scene['name'].str[0] scene['mass'] = scene['element'].replace({'N': 14, 'C': 12, 'O': 16, 'S': 32})
Creates an empty voxel array
coords = scene.getcoordinates() voxelsize = (coords.max() - coords.min()) / 40 # ~40x40x40 Fit = openfit.Fit.fromdimensions(mincoords=coords.min() - 10, maxcoords=coords.max() + 10, voxelsize=voxel_size)
Sets the coordinates
Fit.set_coordinates(coords.values(), sigma=np.ones(coords.shape) * 2, epsilon=scene['mass'].values)
Saves the density map
Fit.savemrc('samplemap.mrc') ```
Derivation
Energy
This class implements V_fit, a forcefield potential that can be included in an OpenMM molecular dynamics simulation.
$$ V = V{ff} + V{Fit} $$
The potential is defined in terms of the correlation coefficient (c.c.), which is a function of the experimental and simulated densities at each voxel $(i, j, k)$:
$$ V_{Fit} = k (1 - \text{c.c.}) $$
The correlation coefficient (c.c.) is defined as:
$$ \text{c.c.} = \frac{\sum{ijk} \rho{\text{exp}}(i,j,k) \rho{\text{sim}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2} \sqrt{\sum{ijk} \rho_{\text{sim}}(i,j,k)^2}} $$
where $\rho{\text{exp}}(i,j,k)$ and $\rho{\text{sim}}(i,j,k)$ represent the experimental and synthetically simulated density of each voxel $(i,j,k)$, respectively. The synthetically simulated density $\rho_{\text{sim}}(i,j,k)$ is obtained by integrating the three-dimensional Gaussian function over each voxel:
$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \int{V{ijk}} g(x,y,z;xn,yn,z_n) \, dx \, dy \, dz $$
with the Gaussian function for the particle $n$, $g(x,y,z;xn,yn,z_n)$, defined as:
$$ g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n},\epsilonn) = \frac{\epsilonn}{(2\pi)^{\frac{3}{2}}\sigma{x,n}\sigma{y,n}\sigma{z,n}} \exp\left( -\frac{1}{2} \left[ \frac{(x-xn)^2}{\sigma{x,n}^2} + \frac{(y-yn)^2}{\sigma{y,n}^2} + \frac{(z-zn)^2}{\sigma_{z,n}^2} \right] \right) $$
Then the integral for each box can be written as:
$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \left( \epsilonn \int{xi^{min}}^{xi^{max}} \int{yj^{min}}^{yj^{max}} \int{zj^{min}}^{zj^{max}} g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n}) \, dz \, dy \, dx \right )$$
Where $xi^{min}$, $xi^{max}$, $yj^{min}$, $yj^{max}$, $zj^{min}$, and $zj^{max}$ are the boundaries in x, y, and z for the voxel $(i,j,k)$. The solution to this integral involves the error function ($\text{erf}$). For a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, the integral over a finite range can be expressed using the error function as follows:
$$ \Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right] $$
Where $\Phi(x; \mu, \sigma)$ represents the cumulative distribution function (CDF) for a normal distribution at point $x$. The error function ($\text{erf}(x)$) is defined as:
$$ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt $$
To compute the integral of the 3D Gaussian over the finite box, we apply the CDF for each dimension:
$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \epsilonn \times \left( \Phi(xi^{max}; xn, \sigma{x,n}) - \Phi(xi^{min}; xn, \sigma{x,n}) \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma_{z,n}) \right) $$
Derivatives
To compute the derivative of $V{Fit}$ with respect to the coordinates of the nth atom $(xn, yn, zn)$, we need to apply the chain rule to the derivative of $V{Fit}$ in terms of c.c. and then the derivative of c.c. in terms of $(xn, yn, zn)$. Like $V_{Fit}$ with respect to a variable $v$ is:
$$ \frac{\partial V{Fit}}{\partial v} = \frac{\partial V{Fit}}{\partial \text{c.c}} \left( \frac{\partial \text{c.c}}{\partial \rho{sim}} \left( \frac{\partial \rho{sim}}{\partial x_n} \right) \right) $$
Where the first derivative is a constant:
$$ \frac{d V_{Fit}}{d v} = k \frac{d\text{c.c.}}{dv} $$
The second derivative is a function of $\rho_{sim}$:
$$ \frac{d\text{c.c.}}{dv} = \frac{\sum{ijk} \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \rho{\text{exp}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{sim}}(i,j,k)^2} \cdot \sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2}} - \frac{\sum{ijk} 2 \cdot \rho{\text{sim}}(i,j,k) \cdot \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \sum{lmn} \rho{\text{sim}}(l,m,n) \cdot \rho{\text{exp}}(l,m,n)}{2 \cdot \left( \sum{ijk} \rho{\text{sim}}(i,j,k)^2 \right)^{\frac{3}{2}} \cdot \sqrt{\sum{ijk} \rho_{\text{exp}}(i,j,k)^2}} $$
Where $\rho{\text{exp}}(i,j,k)$ represents the experimental density at the voxel located at indices (i, j, k), $\rho{\text{sim}}(i,j,k)$ is the simulated density at the voxel $(i, j, k)$, which is a function of the molecular coordinates and parameters like sigma, and $\frac{\partial \rho_{\text{sim}}(i,j,k)}{\partial v}$ denotes the partial derivative of the simulated density at voxel $(i, j, k)$ with respect to a variable $v$ which can be the coordinates or sigma.
The derivatives of the function $\Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right]$ are as follows:
- In terms of $\mu$:
$$ \frac{\partial \Phi}{\partial \mu} = -\frac{e^{-\frac{(x -\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} $$
- In terms of $\sigma$:
$$ \frac{\partial \Phi}{\partial \sigma} = -\frac{(x - \mu)e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma^2} $$
Then the derivative of $\frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn}$ would be for example:
$$ \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn} = \epsilonn \left( \frac{e^{\frac{-(xi^{max} - xn)^2}{2\sigma{x,n}^2}} - e^{\frac{-(xi^{min} - xn)^2}{2\sigma{x,n}^2}}}{\sqrt{2\pi}\sigma{x,n}} \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma{z,n}) \right) $$
Copyright
© 2024, Carlos Bueno
Acknowledgements
Carlos Bueno was supported by the MolSSI Software Fellowship, under the mentorship of Jessica Nash. We thank AMD (Advanced Micro Devices, Inc.) for the donation of high-performance computing hardware and HPC resources. This project is also supported by the Center for Theoretical Biological Physics (NSF Grants PHY-2019745 and PHY-1522550), with additional support from the D.R. Bullard Welch Chair at Rice University (Grant No. C-0016 to PGW). Project based on the Computational Molecular Science Python Cookiecutter version 1.10.
Owner
- Name: Carlos Bueno
- Login: cabb99
- Kind: user
- Location: Houston, Texas
- Company: Rice University
- Repositories: 7
- Profile: https://github.com/cabb99
Research Assistant. Center for Theoretical Biophysics (CTBP)
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2