https://github.com/cabb99/openfit

Molecular Density Fitting with 3D Gaussian Functions

https://github.com/cabb99/openfit

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Molecular Density Fitting with 3D Gaussian Functions

Basic Info
  • Host: GitHub
  • Owner: cabb99
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 43 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Code of conduct

README.md

OpenFit

GitHub Actions Build Status codecov

Molecular Density Fitting Using 3D Gaussian Functions

Description

This project implements a molecular density fitting approach that optimizes the correlation between an experimental density map and a synthetic density map originating from molecular dynamics using 3D Gaussian functions. The algorithm generates a force in the direction that increases the correlation between the simulated and effective maps. It also allows adjusting anisotropically the spread parameter (sigma) in each direction.

Features

  • Efficient simulation of density maps using 3D Gaussian functions.
  • Optimization of molecular coordinates and sigma parameters to improve fit.
  • Calculation of the correlation coefficient to measure fit quality.
  • Use of numerical and analytical methods to compute derivatives.
  • Implementation optimized with Numba for high performance.

Installation

Prerequisites

  • Python 3.6+
  • Numpy
  • SciPy
  • Numba

Usage

Quick Start

Here's a quick example to get started with the Fit class:

Fit particles without a forcefield

```python import numpy as np from openfit import Fit

Define your parameters: coordinates, sigma, experimentalmap, and voxelsize

coordinates = np.array([...]) # Particle coordinates (n, 3) sigma = np.array([...]) # Standard deviation (n, 3) densitymap = np.array([...]) # Density map voxelsize = [1, 1, 1] # Voxel size

Initialize the Fit object

mdfit = Fit(densitymap, voxel_size)

Calculate the derivative of the correlation over the coordinates and sigma

diff = mdfit.dcorrcoef(coordinates, sigma)

Perform the fitting process just with the coordinates

md_fit.fit(coordinates, sigma)

Access the optimized coordinates and sigma

optimizedcoordinates = mdfit.coordinates optimizedsigma = mdfit.sigma ```

Build a simulated map from coordinates

```python import openfit import numpy as np

Parse the pdb

scene = openfit.Scene.frompdb('pdbfile.pdb')

Corrects missing element names

scene['element'] = scene['name'].str[0] scene['mass'] = scene['element'].replace({'N': 14, 'C': 12, 'O': 16, 'S': 32})

Creates an empty voxel array

coords = scene.getcoordinates() voxelsize = (coords.max() - coords.min()) / 40 # ~40x40x40 Fit = openfit.Fit.fromdimensions(mincoords=coords.min() - 10, maxcoords=coords.max() + 10, voxelsize=voxel_size)

Sets the coordinates

Fit.set_coordinates(coords.values(), sigma=np.ones(coords.shape) * 2, epsilon=scene['mass'].values)

Saves the density map

Fit.savemrc('samplemap.mrc') ```

Derivation

Energy

This class implements V_fit, a forcefield potential that can be included in an OpenMM molecular dynamics simulation.

$$ V = V{ff} + V{Fit} $$

The potential is defined in terms of the correlation coefficient (c.c.), which is a function of the experimental and simulated densities at each voxel $(i, j, k)$:

$$ V_{Fit} = k (1 - \text{c.c.}) $$

The correlation coefficient (c.c.) is defined as:

$$ \text{c.c.} = \frac{\sum{ijk} \rho{\text{exp}}(i,j,k) \rho{\text{sim}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2} \sqrt{\sum{ijk} \rho_{\text{sim}}(i,j,k)^2}} $$

where $\rho{\text{exp}}(i,j,k)$ and $\rho{\text{sim}}(i,j,k)$ represent the experimental and synthetically simulated density of each voxel $(i,j,k)$, respectively. The synthetically simulated density $\rho_{\text{sim}}(i,j,k)$ is obtained by integrating the three-dimensional Gaussian function over each voxel:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \int{V{ijk}} g(x,y,z;xn,yn,z_n) \, dx \, dy \, dz $$

with the Gaussian function for the particle $n$, $g(x,y,z;xn,yn,z_n)$, defined as:

$$ g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n},\epsilonn) = \frac{\epsilonn}{(2\pi)^{\frac{3}{2}}\sigma{x,n}\sigma{y,n}\sigma{z,n}} \exp\left( -\frac{1}{2} \left[ \frac{(x-xn)^2}{\sigma{x,n}^2} + \frac{(y-yn)^2}{\sigma{y,n}^2} + \frac{(z-zn)^2}{\sigma_{z,n}^2} \right] \right) $$

Then the integral for each box can be written as:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \left( \epsilonn \int{xi^{min}}^{xi^{max}} \int{yj^{min}}^{yj^{max}} \int{zj^{min}}^{zj^{max}} g(x,y,z;xn,yn,zn,\sigma{x,n},\sigma{y,n},\sigma{z,n}) \, dz \, dy \, dx \right )$$

Where $xi^{min}$, $xi^{max}$, $yj^{min}$, $yj^{max}$, $zj^{min}$, and $zj^{max}$ are the boundaries in x, y, and z for the voxel $(i,j,k)$. The solution to this integral involves the error function ($\text{erf}$). For a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, the integral over a finite range can be expressed using the error function as follows:

$$ \Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right] $$

Where $\Phi(x; \mu, \sigma)$ represents the cumulative distribution function (CDF) for a normal distribution at point $x$. The error function ($\text{erf}(x)$) is defined as:

$$ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt $$

To compute the integral of the 3D Gaussian over the finite box, we apply the CDF for each dimension:

$$ \rho{\text{sim}}(i,j,k) = \sum{n=1}^N \epsilonn \times \left( \Phi(xi^{max}; xn, \sigma{x,n}) - \Phi(xi^{min}; xn, \sigma{x,n}) \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma_{z,n}) \right) $$

Derivatives

To compute the derivative of $V{Fit}$ with respect to the coordinates of the nth atom $(xn, yn, zn)$, we need to apply the chain rule to the derivative of $V{Fit}$ in terms of c.c. and then the derivative of c.c. in terms of $(xn, yn, zn)$. Like $V_{Fit}$ with respect to a variable $v$ is:

$$ \frac{\partial V{Fit}}{\partial v} = \frac{\partial V{Fit}}{\partial \text{c.c}} \left( \frac{\partial \text{c.c}}{\partial \rho{sim}} \left( \frac{\partial \rho{sim}}{\partial x_n} \right) \right) $$

Where the first derivative is a constant:

$$ \frac{d V_{Fit}}{d v} = k \frac{d\text{c.c.}}{dv} $$

The second derivative is a function of $\rho_{sim}$:

$$ \frac{d\text{c.c.}}{dv} = \frac{\sum{ijk} \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \rho{\text{exp}}(i,j,k)}{\sqrt{\sum{ijk} \rho{\text{sim}}(i,j,k)^2} \cdot \sqrt{\sum{ijk} \rho{\text{exp}}(i,j,k)^2}} - \frac{\sum{ijk} 2 \cdot \rho{\text{sim}}(i,j,k) \cdot \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial v} \cdot \sum{lmn} \rho{\text{sim}}(l,m,n) \cdot \rho{\text{exp}}(l,m,n)}{2 \cdot \left( \sum{ijk} \rho{\text{sim}}(i,j,k)^2 \right)^{\frac{3}{2}} \cdot \sqrt{\sum{ijk} \rho_{\text{exp}}(i,j,k)^2}} $$

Where $\rho{\text{exp}}(i,j,k)$ represents the experimental density at the voxel located at indices (i, j, k), $\rho{\text{sim}}(i,j,k)$ is the simulated density at the voxel $(i, j, k)$, which is a function of the molecular coordinates and parameters like sigma, and $\frac{\partial \rho_{\text{sim}}(i,j,k)}{\partial v}$ denotes the partial derivative of the simulated density at voxel $(i, j, k)$ with respect to a variable $v$ which can be the coordinates or sigma.

The derivatives of the function $\Phi(x; \mu, \sigma) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right]$ are as follows:

  • In terms of $\mu$:

$$ \frac{\partial \Phi}{\partial \mu} = -\frac{e^{-\frac{(x -\mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} $$

  • In terms of $\sigma$:

$$ \frac{\partial \Phi}{\partial \sigma} = -\frac{(x - \mu)e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma^2} $$

Then the derivative of $\frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn}$ would be for example:

$$ \frac{\partial \rho{\text{sim}}(i,j,k)}{\partial xn} = \epsilonn \left( \frac{e^{\frac{-(xi^{max} - xn)^2}{2\sigma{x,n}^2}} - e^{\frac{-(xi^{min} - xn)^2}{2\sigma{x,n}^2}}}{\sqrt{2\pi}\sigma{x,n}} \right) \times \left( \Phi(yj^{max}; yn, \sigma{y,n}) - \Phi(yj^{min}; yn, \sigma{y,n}) \right) \times \left( \Phi(zk^{max}; zn, \sigma{z,n}) - \Phi(zk^{min}; zn, \sigma{z,n}) \right) $$

Copyright

© 2024, Carlos Bueno

Acknowledgements

Carlos Bueno was supported by the MolSSI Software Fellowship, under the mentorship of Jessica Nash. We thank AMD (Advanced Micro Devices, Inc.) for the donation of high-performance computing hardware and HPC resources. This project is also supported by the Center for Theoretical Biological Physics (NSF Grants PHY-2019745 and PHY-1522550), with additional support from the D.R. Bullard Welch Chair at Rice University (Grant No. C-0016 to PGW). Project based on the Computational Molecular Science Python Cookiecutter version 1.10.

Owner

  • Name: Carlos Bueno
  • Login: cabb99
  • Kind: user
  • Location: Houston, Texas
  • Company: Rice University

Research Assistant. Center for Theoretical Biophysics (CTBP)

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

pyproject.toml pypi