careless

Merge X-ray diffraction data with Wilson's priors, variational inference, and metadata

https://github.com/rs-station/careless

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, nature.com
  • Committers with academic emails
    3 of 8 committers (37.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.2%) to scientific vocabulary

Keywords from Contributors

interactive serializer packaging network-simulation shellcodes hacking autograding observability embedded optim
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rs-station
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 19.1 MB
Statistics
  • Stars: 17
  • Watchers: 6
  • Forks: 7
  • Open Issues: 14
  • Releases: 33
Created over 5 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.md

Careless

Merging crystallography data without much physics.

Installation

As described in the TensorFlow docs, it is best practice to install careless in a fresh anaconda environment to avoid conflicts with previously installed dependencies.

Create a new environment using the following commands.

    conda create -yn careless python=3.12
    conda activate careless
    pip install --upgrade pip

Now install careless for CPU,

    pip install careless

or for NVIDIA GPUs

    pip install careless[cuda]

You may run careless devices to check whether GPU support was successfully installed. If you run into issues, please file an issue.

Installation with GPU Support

Careless supports GPU acceleration on NVIDIA GPUs through the CUDA library. We strongly encourage users to take advantage of this feature. To streamline installation, we maintain a script which installs careless with CUDA support. The following section will guide you through installing careless for the GPU.

Dependencies

careless is likely to run on any operating system and python version which is compatible with TensorFlow. careless uses mostly tools from the conventional scientific python stack plus
  • optimization routines from TensorFlow
  • statistical distributions from TensorFlow Probability
  • crystallographic computing resources from
    • ReciprocalSpaceship
    • GEMMI

careless does not require, but may take advantage of, various accelerator cards supported by TensorFlow.

Get Help

For help with command line arguments, type careless mono --help for monochromatic or careless poly --help for Laue processing options.

For usage examples and data from the careless preprint and paper, check out careless-examples. For a detailed case study of careless processing, including information about cross-validation measures, see our preprint and paper on a time-resolved study of DJ-1.

Still confused? File an issue! Issues help us improve our code base and leave a public record for other users.

Core Model

careless uses approximate Bayesian inference to merge X-ray diffraction data. The model which is implemented in careless tries to scale individual reflection observations such that they become consistent with a set of prior beliefs. During optimization of a model, careless trades off between consistency of the merged structure factor amplitudes with the data and consistency with the priors. In essence, the optimizer tries to strike a compromise which maximizes the likelihood of the observed data while not straying far from the prior distributions.

The implementation breaks the model down into four types of objects.

Variational Merging Model

The VariationalMergingModel is the central object, which houses the estimates of the merged structure factors. In careless, merged structure factors are represented by truncated normal distributions which have support on (0, ∞). According to French and Wilson [2], this is the appropriate parameterization for acentric reflections, which are by far the majority in most space groups. These distributions are stored in the VariationalMergingModel.surrogate_posterior attribute. They serve as a parametric approximation of the true posterior, which cannot easily be calculated. The model also has utility methods for training and contains an instance of each of the other objects. During optimization, the loss function is constructed by sampling values for the merged structure factors and scales. These samples are combined with the prior and likelihood to compute the Evidence Lower BOund (ELBO). Gradient ascent is used to maximize the ELBO.
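The ELBO construction can be sketched for a single acentric reflection as a Monte Carlo average of log-likelihood plus log-prior minus log-surrogate density. This is an illustration, not the careless implementation. It uses only the Python standard library, holds the scale factor k fixed instead of sampling it from a scaling model, and assumes a normal likelihood on intensities and a Wilson prior on normalized amplitudes. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the CDF and inverse CDF


def sample_truncated_normal(q_loc, q_scale, rng):
    """Draw one sample from Normal(q_loc, q_scale) truncated to (0, inf)."""
    lower = STD_NORMAL.cdf(-q_loc / q_scale)      # probability mass below zero
    u = lower + (1.0 - lower) * rng.random()      # uniform over the retained mass
    u = min(max(u, 1e-12), 1.0 - 1e-12)           # inv_cdf requires 0 < u < 1
    return max(q_loc + q_scale * STD_NORMAL.inv_cdf(u), 1e-12)


def log_surrogate(f, q_loc, q_scale):
    """Log density of the truncated normal surrogate q(F)."""
    z = (f - q_loc) / q_scale
    log_norm = math.log(q_scale) + math.log(STD_NORMAL.cdf(q_loc / q_scale))
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - log_norm


def log_wilson_acentric(f):
    """Log of the acentric Wilson prior p(F) = 2 F exp(-F^2), normalized amplitudes."""
    return math.log(2.0 * f) - f * f


def log_likelihood(i_obs, sig_i, f, k):
    """Normal log-likelihood of the observed intensity given the prediction k * F^2."""
    z = (i_obs - k * f * f) / sig_i
    return -0.5 * z * z - math.log(sig_i) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(i_obs, sig_i, q_loc, q_scale, k=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_q[log p(I|F) + log p(F) - log q(F)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        f = sample_truncated_normal(q_loc, q_scale, rng)
        total += (log_likelihood(i_obs, sig_i, f, k)
                  + log_wilson_acentric(f)
                  - log_surrogate(f, q_loc, q_scale))
    return total / n_samples


# A surrogate centered near sqrt(I_obs) versus one far from it.
print(elbo_estimate(i_obs=1.0, sig_i=0.1, q_loc=1.0, q_scale=0.05))
print(elbo_estimate(i_obs=1.0, sig_i=0.1, q_loc=3.0, q_scale=0.05))
```

A surrogate that agrees with the data yields the larger estimate, which is the direction gradient ascent would move the surrogate parameters.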

Priors

The simplest priors which careless implements are the popular priors [1] derived by A. J. C. Wilson from the random atom model. These are relatively weak priors, but they are sufficient in practice for many types of crystallographic data.
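For reference, the textbook Wilson densities for structure factor amplitudes can be written out in a few lines. This sketch assumes the standard parameterization with a mean intensity and a multiplicity factor epsilon; the function names are hypothetical and the code is not taken from careless. A trapezoidal integral is included only to check that each density is normalized.

```python
import math


def wilson_acentric_pdf(f, epsilon=1.0, mean_intensity=1.0):
    """Acentric Wilson density: (2F / s) * exp(-F^2 / s), with s = epsilon * mean_intensity."""
    if f < 0.0:
        return 0.0
    s = epsilon * mean_intensity
    return (2.0 * f / s) * math.exp(-f * f / s)


def wilson_centric_pdf(f, epsilon=1.0, mean_intensity=1.0):
    """Centric Wilson density: sqrt(2 / (pi * s)) * exp(-F^2 / (2 * s))."""
    if f < 0.0:
        return 0.0
    s = epsilon * mean_intensity
    return math.sqrt(2.0 / (math.pi * s)) * math.exp(-f * f / (2.0 * s))


def integrate(func, lower, upper, n_steps=20000):
    """Trapezoidal rule on a uniform grid."""
    step = (upper - lower) / n_steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, n_steps):
        total += func(lower + i * step)
    return total * step


# Both densities integrate to one and have unit mean intensity when s = 1.
print(integrate(wilson_acentric_pdf, 0.0, 10.0))
print(integrate(wilson_centric_pdf, 0.0, 10.0))
```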

careless now includes support for use of multivariate priors as described in our preprint. See the dw-examples repo for use examples. Support for reference priors will be addressed in a future release.

Likelihoods

The quality of the current structure factor estimates during optimization is judged by a likelihood function. The likelihoods are symmetric probability distributions centered at each reflection observation. careless includes normally distributed and robust, t-distributed likelihoods.
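To illustrate why a t-distributed likelihood is called robust, the sketch below compares the log-density each likelihood assigns to a well-predicted observation and to an outlier. The function names and the choice of 4 degrees of freedom are assumptions for the example, not careless defaults.

```python
import math


def normal_loglik(i_pred, i_obs, sig_i):
    """Log density of a normal centered at the observation, evaluated at the prediction."""
    z = (i_pred - i_obs) / sig_i
    return -0.5 * z * z - math.log(sig_i) - 0.5 * math.log(2.0 * math.pi)


def student_t_loglik(i_pred, i_obs, sig_i, dof=4.0):
    """Log density of a Student t with `dof` degrees of freedom, same center and scale."""
    z = (i_pred - i_obs) / sig_i
    return (math.lgamma(0.5 * (dof + 1.0)) - math.lgamma(0.5 * dof)
            - 0.5 * math.log(dof * math.pi) - math.log(sig_i)
            - 0.5 * (dof + 1.0) * math.log1p(z * z / dof))


# Prediction that matches the observation, then one that is 10 sigma away.
for i_pred in (5.0, 15.0):
    print(i_pred, normal_loglik(i_pred, 5.0, 1.0), student_t_loglik(i_pred, 5.0, 1.0))
```

The normal penalty grows quadratically with the misfit, while the t penalty grows only logarithmically, so a few badly measured reflections pull on the model far less.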

Scaling Models

Right now the only model which careless explicitly implements is a sequential neural network model. This model takes reflection metadata as input and outputs a Gaussian distribution of likely scale values for each reflection.
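The idea of mapping metadata to a Gaussian over scale values can be sketched with a tiny fully connected network in pure Python. Everything here is an assumption made for illustration: the layer width and depth, the leaky ReLU activation, the softplus used to keep the standard deviation positive, the random initialization, and all function names. It is not the architecture careless uses.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real number to a positive one."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def leaky_relu(x, slope=0.01):
    return x if x > 0.0 else slope * x


def init_layers(n_inputs, width, depth, seed=0):
    """Random weights for `depth` hidden layers plus a two-unit output layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [width] * depth + [2]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dense(inputs, weights, biases):
    """One affine layer: weights @ inputs + biases."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def scale_distribution(metadata, layers):
    """Return (mean, standard deviation) of the Gaussian over the scale for one reflection."""
    hidden = list(metadata)
    for weights, biases in layers[:-1]:
        hidden = [leaky_relu(v) for v in dense(hidden, weights, biases)]
    loc, raw_scale = dense(hidden, *layers[-1])
    return loc, softplus(raw_scale) + 1e-6


# Hypothetical metadata vector for one reflection, e.g. [dHKL, Hobs, Kobs, Lobs].
layers = init_layers(n_inputs=4, width=8, depth=2)
print(scale_distribution([0.04, 1.0, 2.0, 3.0], layers))
```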

Special metadata keys for scaling. careless will parse any existing metadata keys in the input Mtz(s). During configuration some new metadata keys will be populated that are useful in many instances.
  • dHKL : The inverse square of the reflection resolution. Supplying this key is a convenient way to parameterize isotropic scaling.
  • file_id : An integer ID unique to each input Mtz.
  • image_id : An integer ID unique to each image across all input Mtzs.
  • {H,K,L}obs : Internally, careless refers to the original Miller indices from indexing as Hobs, Kobs, and Lobs. Supplying these three keys is the typical method to enable anisotropic scaling.
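As a small worked example of why the inverse square of the resolution is a convenient input for isotropic scaling: a conventional isotropic B-factor correction is exp(-B / (4 d^2)), so its logarithm is linear in 1/d^2. The sketch below assumes an orthorhombic cell to keep the resolution formula short. The function names are hypothetical and this is not how careless computes its metadata.

```python
import math


def inverse_square_resolution(h, k, l, a, b, c):
    """1/d^2 for an orthorhombic cell with edges a, b, c in angstroms."""
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2


def isotropic_scale(inv_d2, b_factor):
    """Conventional isotropic B-factor falloff, exp(-B / (4 d^2))."""
    return math.exp(-0.25 * b_factor * inv_d2)


inv_d2 = inverse_square_resolution(2, 0, 0, a=10.0, b=10.0, c=10.0)
print(inv_d2)                                 # 1/d^2 for a 5 angstrom reflection
print(isotropic_scale(inv_d2, b_factor=20.0))
```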

Considerations when choosing metadata.

  • Polarization correction : Careless does not apply a specific polarization correction. In order to be sure the model accounts for polarization, it is important to supply the x,y coordinates of each reflection observation.
  • Isotropic scaling : This is easily accounted for by supplying the 'dHKL' metadata key.
  • Interleaved rotation series : Most properly formatted Mtzs have a "Batch" column which contains a unique id for each image. Importantly, these are usually in order. If you have time-resolved data with multiple timepoints per angle, you may want to use the "Batch" key in conjunction with the "file_id" key. This way images from the same rotation angle will be constrained to scale more similarly.
  • Multi-crystal scaling : For scaling multiple crystals, it is best if image identifiers in the metadata do not overlap. Therefore, use the 'image_id' key.
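The difference between a per-file batch number and an identifier that is unique across files can be illustrated with a short sketch. The helper below is hypothetical and only mimics the idea behind the file_id and image_id keys; careless populates these keys itself during configuration.

```python
def assign_image_ids(batches_per_file):
    """Label every reflection with (file_id, image_id).

    batches_per_file holds one list per input file, each containing the
    per-reflection batch numbers. The returned image_id values never
    repeat between files, even when batch numbers do.
    """
    lookup = {}
    labelled = []
    for file_id, batches in enumerate(batches_per_file):
        rows = []
        for batch in batches:
            key = (file_id, batch)
            if key not in lookup:
                lookup[key] = len(lookup)  # next unused integer
            rows.append((file_id, lookup[key]))
        labelled.append(rows)
    return labelled


# Two crystals whose batch numbers both start at 1.
print(assign_image_ids([[1, 1, 2], [1, 2, 2]]))
# [[(0, 0), (0, 0), (0, 1)], [(1, 2), (1, 3), (1, 3)]]
```

Batch 1 of the first file and batch 1 of the second file receive different image identifiers, which is the property multi-crystal scaling relies on.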

1: Wilson, A. J. C. “The Probability Distribution of X-Ray Intensities.” Acta Crystallographica 2, no. 5 (October 2, 1949): 318–21. https://doi.org/10.1107/S0365110X49000813.

2: French, S., and K. Wilson. “On the Treatment of Negative Intensity Observations.” Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 34, no. 4 (July 1, 1978): 517–25. https://doi.org/10.1107/S0567739478001114.

Owner

  • Name: Reciprocal Space Station
  • Login: rs-station
  • Kind: organization

Open source crystallography software for exploring reciprocal space

GitHub Events

Total
  • Create event: 11
  • Issues event: 6
  • Release event: 3
  • Watch event: 2
  • Delete event: 7
  • Issue comment event: 15
  • Push event: 19
  • Pull request review comment event: 1
  • Pull request review event: 3
  • Pull request event: 14
Last Year
  • Create event: 11
  • Issues event: 6
  • Release event: 3
  • Watch event: 2
  • Delete event: 7
  • Issue comment event: 15
  • Push event: 19
  • Pull request review comment event: 1
  • Pull request review event: 3
  • Pull request event: 14

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 612
  • Total Committers: 8
  • Avg Commits per committer: 76.5
  • Development Distribution Score (DDS): 0.611
Past Year
  • Commits: 99
  • Committers: 4
  • Avg Commits per committer: 24.75
  • Development Distribution Score (DDS): 0.222
Top Committers
Name Email Commits
Kevin Dalton k****n@g****m 238
Kevin Dalton k****n@f****u 167
Kevin Dalton k****n@p****n 138
Kevin Dalton k****n 43
Jack Greisman J****n@g****m 22
Egor Marin m****n@p****u 2
dependabot[bot] 4****] 1
Doris Mai h****i@c****u 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 69
  • Total pull requests: 99
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 4 days
  • Total issue authors: 14
  • Total pull request authors: 8
  • Average comments per issue: 3.35
  • Average comments per pull request: 0.94
  • Merged pull requests: 91
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 3
  • Pull requests: 5
  • Average time to close issues: 5 days
  • Average time to close pull requests: 2 days
  • Issue authors: 3
  • Pull request authors: 3
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kmdalton (25)
  • DHekstra (22)
  • gyuhyeokcho (5)
  • dennisbrookner (3)
  • JBGreisman (3)
  • hkwang (2)
  • DorisMai (2)
  • ElkeDeZitter (2)
  • marinegor (1)
  • apeck12 (1)
  • maggie-klureza (1)
  • james-vincent (1)
  • biochem-fan (1)
  • kiwhite (1)
Pull Request Authors
  • kmdalton (96)
  • JBGreisman (8)
  • marinegor (2)
  • DorisMai (2)
  • dennisbrookner (1)
  • PrinceWalnut (1)
  • hkwang (1)
  • dependabot[bot] (1)
  • DHekstra (1)
Top Labels
Issue Labels
enhancement (10) bug (4) wontfix (1)
Pull Request Labels
documentation (3) bug (3) enhancement (1) dependencies (1)

Dependencies

setup.py pypi
  • matplotlib *
  • reciprocalspaceship >=0.9.16
  • tensorflow >=2.7.0rc1
  • tensorflow-probability *
  • tqdm *
.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • codecov/codecov-action v1 composite
.github/workflows/publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite