Recent Releases of GEMMI

GEMMI - 0.7.1

Library

  • reading mmcif: added reading of TLS information
  • writing mmcif: added a few new items
  • using Logger also in classes Ddl and CifToMtz (Logger is now used in all library functions that output warnings or messages)
  • improved and documented mmCIF validation (with DDL2)
  • added read_ccp4_header() for reading only map header when a map is a huge file
  • added a few functions related to TLS (will be documented later)
  • documented: working with XDS files, normalization of amplitudes (F->E)
  • calculating merging statistics (R-merge, R-meas, R-pim, CC1/2) in various ways: Gemmi can calculate R-merge (and other R-*) in 3 different ways that are present in the literature and other programs; for CC1/2, the sigma-tau method is used
  • internal refactoring of file reading
  • misc bug fixes
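The three R-factors named above have standard textbook definitions. Each is built from the absolute deviations of unmerged intensities from their mean. The following is a minimal sketch in plain Python, not gemmi's code. The function name merging_r_factors is hypothetical. Miller indices are assumed to be already mapped to the asymmetric unit. Reflections observed only once are skipped, which is just one of several conventions in use, and the sketch does not claim to reproduce any particular variant supported by gemmi.

```python
import math
from collections import defaultdict


def merging_r_factors(observations):
    """Return (R_merge, R_meas, R_pim) from (hkl, intensity) pairs.

    Assumes hkl is already reduced to the asymmetric unit. Unique
    reflections with fewer than two observations are ignored.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    sum_merge = sum_meas = sum_pim = 0.0
    sum_intensity = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        sum_merge += abs_dev
        sum_meas += math.sqrt(n / (n - 1)) * abs_dev  # multiplicity-corrected
        sum_pim += math.sqrt(1 / (n - 1)) * abs_dev   # precision-indicating
        sum_intensity += sum(intensities)

    if sum_intensity == 0:
        raise ValueError("no reflection was measured more than once")
    return (sum_merge / sum_intensity,
            sum_meas / sum_intensity,
            sum_pim / sum_intensity)
```

The three values differ only by the per-reflection factors sqrt(n/(n-1)) and sqrt(1/(n-1)). The first tends to 1 at high multiplicity, while the second keeps shrinking. This is why R-pim reflects the precision of the averaged intensities.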

Python

  • functions Mtz.filtered() and XdsAscii.filtered()
  • a number of other additions and a few small changes/fixes
  • all cif-reading functions consistently read gzipped files (previously, cif.read_file() and gemmi.read_small_structure() didn't)

Program

  • gemmi fprime supports ranges of energies
  • gemmi merge – added new options, most importantly --stats to print quality metrics
  • gemmi convert – option --add-tls to convert "residual" B-factors (from Refmac) to full B-factors
  • gemmi mask – solvent masking that takes into account alternative conformers and atom occupancy (experimental)

Scientific Software - Peer-reviewed - C++
Published by wojdyr about 1 year ago

GEMMI - 0.7.0

C++14 (or later) is required to build the library, C++17 (or later) to build Python bindings. Expect breaking changes, especially in Python bindings. The lists below are not complete, but should cover most of the changes.

Library

  • Added unified logging of warnings/errors from various gemmi functions (class Logger)
  • replaced string Model::name with int Model::num
  • mmcif: better handling of null auth_comp_id
  • fixes for mmJSON
  • Removed deprecated functions:
    • UnitCell.fractionalization_matrix and orthogonalization_matrix – use frac.mat and orth.mat
    • count_hydrogen_sites() – use has_hydrogen() or count_atom_sites(gemmi.Selection('[H,D]'))
    • Grid::resample_to() – use interpolate_grid()
  • unified API of Grid interpolation functions. They now have a parameter order that can be 0 (nearest value), 1 (linear interpolation), or 3 (cubic). In C++ there are also functions such as trilinear_interpolation() to ensure no overhead.
  • to_pdb: write HET records
  • Extended selection syntax with: [metals] and [nonmetals].
  • Added function set_is_metal() intended for debatable metalloids
  • improved interoperability with MMDB (a CCP4 library)
  • MonLib: removed read_cif args
  • mtz: fixed writing BATCH records
  • hydrogen placement: fixes needed for new files with metals in CCP4 Monomer Library
  • pdb: fixed reading TLS S tensor
  • Structure metadata: expanded RefinementInfo
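The unified order parameter above selects nearest-value (0), trilinear (1) or cubic (3) interpolation. The sketch below covers orders 0 and 1 only. It samples a periodic 3D grid stored as nested Python lists. It is a hypothetical helper written for illustration, not gemmi's Grid API, and it omits the cubic case.

```python
import math


def interpolate(grid, frac, order=1):
    """Sample a periodic 3D grid at fractional coordinates.

    grid is a nested list indexed as grid[u][v][w]; grid points sit at
    fractional positions (u/nu, v/nv, w/nw). order 0 = nearest value,
    order 1 = trilinear interpolation (cubic is not covered here).
    """
    if order not in (0, 1):
        raise ValueError("this sketch supports only order 0 and 1")
    nu, nv, nw = len(grid), len(grid[0]), len(grid[0][0])
    pos = (frac[0] * nu, frac[1] * nv, frac[2] * nw)

    if order == 0:
        # floor(x + 0.5) avoids Python's round-half-to-even behaviour
        u, v, w = (math.floor(p + 0.5) for p in pos)
        return grid[u % nu][v % nv][w % nw]

    base = [math.floor(p) for p in pos]
    t = [p - b for p, b in zip(pos, base)]
    total = 0.0
    for du in (0, 1):
        for dv in (0, 1):
            for dw in (0, 1):
                weight = ((t[0] if du else 1 - t[0]) *
                          (t[1] if dv else 1 - t[1]) *
                          (t[2] if dw else 1 - t[2]))
                value = grid[(base[0] + du) % nu][(base[1] + dv) % nv][(base[2] + dw) % nw]
                total += weight * value
    return total
```

Indices wrap with the modulo operator because a crystallographic map is periodic over the unit cell. Trilinear interpolation reads the 8 surrounding grid points, whereas a tricubic scheme reads 64.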

Python

  • Python bindings migrated from pybind11 to nanobind.
    • Much lower runtime overhead, faster build times, better error diagnostics.
    • Built-in typing stubs.
    • Only Python 3.8+.
    • Sadly, no support for Buffer Protocol. It was replaced with NumPy __array__ methods. For NumPy, you can also use .array properties that were also available in previous releases.
    • No implicit conversions from list to ndarray, and from bytes to string (let me know where it causes problems)
    • gemmi.ValueSigmaAsuData.value_array has now shape (N,2)
  • Added pickling support for Structure, Model, Chain, Residue, Atom, cif.Document, cif.Block.
  • Added function interpolate_position_array (#323).
  • Python extension module is now installed into site-packages/gemmi/ (this change should be invisible to the user)

Program

  • gemmi convert --sifts-num is now more customizable
  • gemmi sf2map: added option --check (see docs)
  • gemmi cif2mtz: add a rule to spec to convert pdbx_F_calc_with_solvent to F-model (+phase)
  • gemmi xds2mtz: handles merged files from XSCALE
  • gemmi mtz2cif and merge: recognize extension .ahkl as XDS file

Published by wojdyr over 1 year ago

GEMMI - 0.6.7

This is primarily a bug-fix release. New Python bindings are not included yet.

Enhancements:

  • New subcommand gemmi set for changing coordinates, B-factors and occupancies in coordinate files (mmCIF and PDB). Unlike other tools, it replaces numbers while leaving the rest of the file intact. An alternative to CCP4 PDBSET keywords: BFACTOR, OCCUPANCY, SHIFT, NOISE. Note that gemmi convert offers overlapping capabilities. For instance, gemmi convert --apply-symop=x+0.123,y,z shifts the coordinates similarly to gemmi set --shift='9.3 0 0' (the latter takes the shift in Angstroms).
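How many Angstroms a fractional shift corresponds to depends on the unit cell. In the usual PDB orthogonalization convention, the first column of the fractional-to-Cartesian matrix is (a, 0, 0). A shift of 0.123 along fractional x therefore moves atoms by 0.123·a along Cartesian x, whatever the cell angles. The sketch below assumes a hypothetical monoclinic cell with a = 75.6 Å, for which 0.123·a ≈ 9.3 Å. Both function names are illustrative and are not part of gemmi.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix (as rows), standard PDB convention.

    Lengths in Angstroms, angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def fractional_shift_to_angstroms(cell, shift):
    """Convert a fractional translation into a Cartesian shift in Angstroms."""
    m = orthogonalization_matrix(*cell)
    return [sum(m[i][j] * shift[j] for j in range(3)) for i in range(3)]


# Hypothetical monoclinic cell with a = 75.6 A (not taken from the release notes).
cell = (75.6, 50.0, 60.0, 90.0, 100.0, 90.0)
dx, dy, dz = fractional_shift_to_angstroms(cell, (0.123, 0.0, 0.0))
print(f"shift = ({dx:.2f}, {dy:.2f}, {dz:.2f}) A")
```

For a cell with a different a, the same fractional symop gives a different shift in Å. Hence the two commands are only "similar": gemmi set takes the shift directly in Angstroms.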

  • Improved anisotropic scaling of structure factors. More work is planned in this area.

Fixes:

  • fixed reading of mmCIF files without _atom_site.auth_seq_id
  • in Topology preparation: fixed a couple of bugs, peptide links are now assumed to be CIS for ω=0±60° (previously, ω=0±30°)
  • fixed re-assignment of ATOM/HETATM record types (gemmi convert --assign-records)
  • fixed gemmi convert --sifts-num for UniProt sequence numbers >5000
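The cis/trans rule above depends on the peptide torsion ω, defined by CA(i), C(i), N(i+1), CA(i+1). The sketch below computes that dihedral and applies the ±60° window from the release note. It is plain Python with hypothetical function names and is not gemmi's topology code. Whether the boundary itself counts as cis is an assumption here, since the sketch uses a strict inequality.

```python
import math


def _sub(u, v):
    return [u[i] - v[i] for i in range(3)]


def _dot(u, v):
    return sum(u[i] * v[i] for i in range(3))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_len
    return math.degrees(math.atan2(y, x))


def is_cis_peptide(ca_prev, c_prev, n_next, ca_next, tolerance=60.0):
    """Classify a peptide link as cis when omega is within +/-tolerance of 0."""
    omega = dihedral(ca_prev, c_prev, n_next, ca_next)
    return abs(omega) < tolerance
```

Using atan2 rather than acos keeps the sign of the angle and stays numerically stable near 0° and 180°. These are exactly the two regions that matter for cis/trans assignment.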

And various minor changes that are hard to describe concisely.

Published by wojdyr over 1 year ago

GEMMI - 0.6.6

Library

  • SmallStructure: changed how the space group is read and accessed. Relying on H-M space group names alone was not always sufficient. The new mechanism uses the list of operations and the Hall symbol in preference to the H-M symbol; the order is configurable.
  • symmetry triplets: parse decimal fractions (small molecule files may use notation such as x+0.25 instead of x+1/4)
  • tabulated space groups: a few more settings: B 1 2 1, B 1 21 1, F 1 m 1, F 1 d 1, F 1 2 1
  • X-ray scattering coefficients: changed the default value of IT92::ignore_charge to true (i.e. charges are now ignored by default; before version 0.6.3 they were always ignored)
  • cif::Table: added method ensure_loop() that converts tag-value pairs into a loop; it might be needed before calling append_row()
  • place_hydrogens(): fix for NH3-like configurations
  • improved gemmi->mmdb conversion
  • Grid: tweaked good_grid_size() to ensure that when creating a grid up to a certain dmin, all reflections up to dmin are in the grid (it matters when no oversampling is applied)
  • DensityCalculator: deprecated function set_grid_cell_and_spacegroup(), use grid.setup_from()
  • fixed TNT-compatible reciprocal space ASU calculation for non-standard settings
  • infer_polymer_end(): made the heuristic even more elaborate, to detect files that have HETATM incorrectly used for standard residues in a polymer (such files were reported; they are either a result of mutating from non-standard residues, or of a buggy program)
  • added function assign_het_flags() to re-set ATOM/HETATM flags
  • Model: added functions calculate_b_iso_range() and calculate_b_aniso_range(); the first one can be used to detect if pLDDT is in the range 0-100 (like from AlphaFold) or 0-1 (like from ESMFold)
  • writing mmCIF: write _entity_poly_seq.hetero
  • added flag Entity::reflects_microhetero that shows if sequences were read from SEQRES (and don't account for point mutations) or from _entity_poly_seq; new function add_microhetero_to_sequences() changes the former to the latter
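Accepting decimal fractions in symmetry triplets means that "x+0.25" and "x+1/4" must give the same operation. The sketch below parses one component of a triplet with Python's fractions.Fraction, which accepts both "0.25" and "1/4" as strings. The function and its return format are hypothetical, not gemmi's parser. It handles only unit coefficients of x, y and z plus a constant translation.

```python
import re
from fractions import Fraction

# a term is an optional sign followed by anything up to the next sign
TERM = re.compile(r"([+-]?)([^+-]+)")


def parse_triplet_part(expr):
    """Parse one triplet component such as '-y+0.25' or 'x+1/4'.

    Returns ({'x': coeff, 'y': coeff, 'z': coeff}, translation),
    with all numbers as exact Fractions.
    """
    coeffs = {"x": Fraction(0), "y": Fraction(0), "z": Fraction(0)}
    translation = Fraction(0)
    for sign, body in TERM.findall(expr.replace(" ", "").lower()):
        value = -1 if sign == "-" else 1
        if body in coeffs:
            coeffs[body] += value
        else:
            # Fraction() accepts both decimal ('0.25') and ratio ('1/4') strings
            translation += value * Fraction(body)
    return coeffs, translation
```

Keeping translations as exact fractions makes it easy to compare operations. A decimal such as 0.3333 would become 3333/10000 rather than 1/3, so a real parser may need to snap such values to a nearby simple fraction, for example with Fraction.limit_denominator(). The sketch does not do this.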

Program

  • gemmi sfcalc: added a few more options
  • gemmi convert: added options --assign-records[=A|H], improved --sifts-num, adding microheterogeneities to _entity_poly_seq when converting from PDB
  • gemmi cifdiff: added option -t for basic comparison of values for a single tag

Other

  • minimal WebAssembly port (C++ code compiled with emscripten) of Structure, as a proof-of-concept and for reading mmCIF files in UglyMol
  • examples/to_rdkit.py: example of conversion of gemmi ChemComp to RDKit Mol

and a number of less important changes

Published by wojdyr about 2 years ago

GEMMI - 0.6.5

Library

  • gemmi can now be built with zlib-ng, a faster fork of zlib (good for working with large, compressed files)
  • experimental: binary serialization of Structure (contained objects, such as Model, Chain or UnitCell, can also be serialized separately)
  • finalized handling of 5-character monomer names; uses the tilde-hetnam extension (ABCDE~DE) for PDB files
  • when atom names in the coordinate file match previous names (_chem_comp_atom.alt_atom_id) from the monomer library (the names in the CCD, and therefore also in the ML, change occasionally), print a better diagnostic; added function MonLib::update_old_atom_names() to update the names in a Structure
  • topology: fixed handling of two bonds between the same two residues
  • options for handling mmCIF files with incorrect entities (modified add_entity_ids() when called with overwrite=true)
  • added function Intensities::prepare_merged_mtz()
  • a few bug fixes (for instance, in handling of negative residue numbers in the selection syntax)

Python bindings

  • generating type stubs (see #293)
  • cif.Loop.val() has been replaced with __getitem__/__setitem__
  • fixed Mtz.Batch.ints and Mtz.Batch.floats

Program

  • subcommand diff has been renamed to cifdiff
  • subcommand prep has been renamed to crd
  • validate: more options for checking monomer files
  • gemmi-grep: added option --extended-regexp
  • mtz2cif: added column names Iplus/Iminus (used by ccp4i2) to the default conversion spec

Note: this list is meant to show important changes only.

Published by wojdyr over 2 years ago

GEMMI - 0.6.4

Library

  • completely changed the build system for the Python module, from setuptools to scikit-build-core
  • optimized electron density calculation: the single-precision version is now about 2x faster and slightly less exact; some other grid-based calculations also got optimized in the process
  • as part of the above optimizations, some of the grid computations require that the model is in the standard orientation (conventional axis directions); in other cases (which are very rare after the remediation of non-standard coordinate frames in the PDB) call standardize_crystal_frame()
  • CIF output: more flexible formatting
  • mmCIF writing: category _entity_poly is included by default, with pdbx_strand_id and pdbx_seq_one_letter_code
  • minor changes in reading mmCIF coordinate files
  • cif: added functions Loop::add_columns(), Loop::remove_column(), Column::erase()
  • MRC map format: ORIGIN record is ignored (previously, if ORIGIN was non-zero, Ccp4::full_cell() returned false and some map properties were not set)
  • new function Grid::symmetrize_avg()
  • fixed bug in ReciprocalGrid::prepare_asu_data()
  • added function read_pir_or_fasta() for reading sequences (previously it was undocumented and more limited)
  • added function pdbx_one_letter_code() which returns a string like AA(MSE)H…, for _entity_poly.pdbx_seq_one_letter_code
  • new functions expand_one_letter() and expand_one_letter_sequence() that take ResidueKind.AA/RNA/DNA as an argument replaced expand_protein_one_letter*()
  • adjusted weights in align_sequence_to_polymer()
  • added function assign_best_sequences()
  • PDB reading: added Structure::ter_status flag to indicate if TER records were: absent, present, clearly in wrong places
  • experimental (not documented yet) new functions: Model::get_cra(), Model::get_parent_of()
  • Topo::Bond stores a flag for bonds between different symmetry images
  • ChemComp::Atom: store _chem_comp_atom.alt_atom_id as old_id, use it in new function update_old_atom_names()
  • riding hydrogens: added H had wrong occupancy in special, rare cases
  • added Vec3f: Vec3 with single-precision numbers
  • minor API changes: Binner::setup() doesn't return anything, changed argument types of Scaling::scale_data(), align_sequences()
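The one-letter format described above writes standard residues as single letters and wraps anything else in parentheses, giving strings like AA(MSE)H. The plain-Python sketch below shows that convention. The function name is hypothetical, and the lookup table covers only the 20 standard amino acids. It is not gemmi's pdbx_one_letter_code(), which may recognise more residue types.

```python
# The 20 standard amino acids; anything else is written as "(NAME)".
AA_CODES = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def one_letter_code_with_parens(residue_names):
    """Join residue names into a pdbx_seq_one_letter_code-style string."""
    return "".join(AA_CODES.get(name, f"({name})") for name in residue_names)
```

For example, ["ALA", "ALA", "MSE", "HIS"] gives "AA(MSE)H". Selenomethionine (MSE) stays in parentheses instead of being silently mapped to M.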

Program

  • new tool gemmi-diff that compares categories and tags in two (mm)CIF files
  • gemmi-align prints a vertical list with option --verbose
  • gemmi-residues has new options: -e, -sss, --chains
  • gemmi-rmsz: added option --missing to print missing atoms
  • gemmi-validate: more options for validating monomer files
  • gemmi-h: more options
  • gemmi-mtz: prints info about SYMM records

Published by wojdyr over 2 years ago

GEMMI - 0.6.3

  • new: normalization of amplitudes using the so-called "Karle" approach, similar to that in the CCP4 program ECALC
  • added X-ray scattering coefficients for ions (previously, the charge of the atom was ignored)
  • pdb: reading CONECT records, and an option to also write them
  • when reading pdb, if any chain has 2+ TER records, all TER records are ignored
  • more configuration options for writing pdb files
  • added functions Mtz::expand_to_p1() and Mtz::read_file_gz()
  • cif::Block::find_value(tag) now also returns the value from the corresponding loop if that loop has only one row
  • changes in gemmi-validate related to validation with DDL2
  • gemmi-sfcalc: added option --sigma-cutoff
  • gemmi sf2map --mapmask: if the unit cell in the coordinate file differs from the one in the SF file, only the latter is used
  • improved transform_to_assembly(), expand_ncs() and rename_chain()
  • cif2mtz: Mtz column for pdbx_DELPHWT has now label PHDELWT (#272)
  • fixed ensure_asu(): phase-shift (for phases and H-L coefficients) was wrong
  • fixed UnitCell::find_nearest_image() for non-crystals with NCS
  • fixed DensityCalculator::requested_grid_spacing()
  • changes and enhancements in add_chemcomp_to_block(), in solvent masking, in mtz2cif, and in several other places
  • added python bindings to MtzToCif, cif::Ddl, PdbWriteOptions, changed how options for PDB writing are passed, more bindings for Mtz::Batch
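Normalization converts amplitudes F into normalized amplitudes E with E² = F²/(ε⟨F²/ε⟩). Here ε is the epsilon (statistical weight) factor of the reflection, and the average is taken over reflections of similar resolution. The sketch below only illustrates that definition, using the crudest estimate of the average: bins of equal count in 1/d². It is not a reimplementation of the "Karle" approach mentioned above, and the function name is hypothetical.

```python
import math


def normalize_amplitudes(f_values, inv_d2, epsilons, n_bins=10):
    """Return E values using E^2 = F^2 / (eps * <F^2/eps>) in resolution bins.

    inv_d2 holds 1/d^2 for each reflection; bins contain (nearly) equal
    numbers of reflections, sorted by resolution.
    """
    n = len(f_values)
    order = sorted(range(n), key=lambda i: inv_d2[i])
    e_values = [0.0] * n
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        mean_f2 = sum(f_values[i] ** 2 / epsilons[i] for i in members) / len(members)
        if mean_f2 == 0:
            continue  # leave E = 0 for an all-zero bin
        for i in members:
            e_values[i] = f_values[i] / math.sqrt(epsilons[i] * mean_f2)
    return e_values
```

By construction, <E²> = 1 within each bin. The step-like bin boundaries are the main weakness of this approach, which is why smoother fits of the resolution dependence are used in practice.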

Published by wojdyr over 2 years ago

GEMMI - 0.6.2

  • a number of fixes, mostly in topology preparation
  • support for extended (longer) CCD and PDB codes that are about to be introduced by the PDB
  • gemmi-convert: added option to rename a monomer
  • a few changes and additions in cif2mtz, including:
    • anomalous data written as separate rows for F+ and F- is now converted as expected
    • refln.Fsquaredmeas is now a synonym for Fsquared_meas
  • gemmi-grep: new option --only-tags
  • gemmi-validate: a couple of new checks and options
  • pdb and mmCIF: convert MODRES <-> _pdbx_struct_mod_residue
  • cif.Block: blocks with no name (just data_) used to have the name set to "#", now it's " "

Published by wojdyr about 3 years ago

GEMMI - 0.6.1

  • changed how CISPEP is stored: previously, it was assumed that a link between two residues is either TRANS or CIS; if the residues have atoms with alternative conformations, both link types can be present at the same time
  • riding hydrogens: previously, hydrogens had the same altlocs as the parent atom; now if the parent atom has a single conformation, but it has neighbors in multiple conformations, the hydrogens will be also added in multiple conformations
  • major changes in NearestNeighbor: now it's possible to search atoms not only in the first nearest cells, but in any number of nearest cells; find_nearest_atom() was changed to find, by default, the nearest atom at any distance
  • changes in Mtz.reindex(), primarily to fix determination of the new space group
  • gemmi-convert: added option --all-auth to write _atom_site.auth_atom_id and auth_comp_id, which are skipped by default (because they are always the same as label_atom_id and label_comp_id)
  • added more options to gemmi-rmsz, gemmi-xds2mtz, gemmi-cif2mtz
  • fix a recent regression in check_polymer_type(): RNA was returned instead of DNA
  • improved heuristic of detecting where the polymer ends (if TER record is missing)
  • selection syntax: fix parsing a single sequence id such as "A/208" (it was parsed as A/208-/)
  • removed SMat33::calculate_eigenvector() (use eigen_decomposition() instead), SoftwareItem::pdbx_ordinal, and NeighborSearch::Mark::x,y,z (use ::pos)
  • more code was moved from headers to src/*.cpp
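Searching "any number of nearest cells" is the usual cell-list idea. Points are binned into cubic cells, and a query scans as many layers of cells as the search radius requires. The non-periodic sketch below illustrates that idea. The class CellList and its method find_atoms are hypothetical names, not gemmi's neighbor-search API, which also has to handle crystal symmetry and periodic images.

```python
import math
from collections import defaultdict
from itertools import product


class CellList:
    """Bins points into cubic cells of edge cell_size (non-periodic sketch)."""

    def __init__(self, points, cell_size):
        self.points = points
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for idx, p in enumerate(points):
            self.cells[self._key(p)].append(idx)

    def _key(self, p):
        return tuple(math.floor(c / self.cell_size) for c in p)

    def find_atoms(self, pos, radius):
        """Indices of points within radius of pos, scanning as many cells as needed."""
        reach = math.ceil(radius / self.cell_size)  # layers of cells to scan
        cx, cy, cz = self._key(pos)
        found = []
        for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
            for idx in self.cells.get((cx + dx, cy + dy, cz + dz), ()):
                if math.dist(self.points[idx], pos) <= radius:
                    found.append(idx)
        return sorted(found)
```

Scanning ceil(radius / cell_size) layers is enough to cover every point within the radius. Small cells with a large radius therefore still give correct results, only with more cells visited.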

Published by wojdyr about 3 years ago

GEMMI - 0.6.0

  • C++ library is no longer header-only; several functions were moved from headers to src/ to make compilation faster
  • major changes in cmake build, requiring now cmake 3.15+
  • improvements in calculating riding hydrogen positions
  • changed again the scheme of automatically assigned subchain names (A-p -> Axp, because PDB software can't handle non-alphanumeric characters there)
  • a function for calculating polarization correction for XDS INTEGRATE.HKL
  • improvements in xds2mtz, converting more data and filling more records in MTZ batch headers
  • added SpaceGroup::change_of_hand_op()
  • various bug fixes and small improvements

Published by wojdyr about 3 years ago

GEMMI - 0.5.8

  • gemmi program has new subcommand xds2mtz that converts from XDS_ASCII to multirecord MTZ
  • subcommand gemmi-residues has new option -s (--short) for shorter overview of model chains (can be used twice)
  • cif2mtz: more flexible spec for converting symbols to numbers
  • preparation of Refmac intermediate files – it can be used to substitute a part of Refmac
  • MonLib and Topo: a number of changes related to reading a monomer library and in prepare_topology()
  • changed the scheme of automatically assigned subchain names (Apoly -> A-p)
  • read_structure(): added optional arg `save_doc` that stores cif.Document if the read file is mmCIF or mmJSON
  • reading PDB files: more metadata is read by default
  • writing mmCIF files: _atom_site.group_PDB is written by default
  • support for mmCIF extension _atom_site.ccp4_deuterium_fraction
  • added function copy_from_mmdb(mmdb::Manager* manager) -> Structure
  • improved check_polymer_type()
  • Grid::set_size_from_spacing() with different rounding modes
  • rename src/ to prog/; in the next version the library won't be fully header-only, cpp files will go into src/

Published by wojdyr over 3 years ago

GEMMI - 0.5.7

  • new functions for working with Structure: assign_serial_numbers(), assign_cis_flags(), has_hydrogen()
  • enhanced transform_to_assembly()
  • functions count_atom_sites() and count_occupancies() now take Selection as an optional arg
  • deprecated count_hydrogen_sites(): can be replaced with has_hydrogen() or count_atom_sites(Selection("[H,D]"))
  • selection syntax extended with ";polymer" and ";solvent" (it can also be "!polymer,solvent")
  • improved preparation of the intermediate file (crd) for Refmac
  • gemmi program: several options were added to subprograms
  • Ccp4.setup() now returns void (previously – number)
  • python: cif.Block.find_pair() and Item.pair were changed to return tuple (previously – list)

Published by wojdyr over 3 years ago

GEMMI - 0.5.6

  • calculating ASU brick for given space group settings
  • add reciprocal ASU definitions used by TNT
  • add option style to Op.triplet()
  • add cif.Block.set_pairs() – it adds name-value pairs to a given category
  • various bug fixes, as usual
  • Python 2.7 is no longer listed as supported

Published by wojdyr almost 4 years ago

GEMMI - 0.5.5

  • determining lattice symmetry and (pseudo-)merohedral twinning laws
  • added GroupOps::derive_symmorphic() and GroupOps::add_inversion()
  • getting change-of-basis in the Niggli reduction
  • added transform_to_assembly() – it does a bit more than make_assembly()
  • removed Op::negated()
  • python: add bindings to UnitCell.orth and UnitCell.frac
  • additional options in gemmi-mtz2cif and gemmi-cif2mtz
  • bug fixes (in Ccp4::set_extent(), merge_atoms_in_expanded_model(), status column in mtz2cif)

Published by wojdyr almost 4 years ago

GEMMI - 0.5.4

  • new sub-program under development: gemmi-prep
  • parse pdb AUTHOR record and corresponding mmCIF category
  • new function flood_fill_above()
  • mtz2cif: rhombohedral groups in hexagonal settings start with H now
  • renamed MonLib::find_link() to get_link(), modified MonLib::match_link()
  • CIF files can be written with left-aligned columns
  • bug fixes and minor changes (in particular in class Topo)

Published by wojdyr about 4 years ago

GEMMI - 0.5.3

  • API change: changed arguments for Ccp4.setup()
  • added Binner class for analyzing reflections in resolution bins
  • added support for neutron scattering in gemmi-sfcalc and the library
  • new functions: seitz_to_op, normalize_grid, Grid.get_subarray, Grid.set_subarray
  • minor improvements in programs mtz2cif, cif2mtz, map
  • fix: CIF tags are now case insensitive
  • changes in preparing topology accounting for changes in Refmac and CCP4 monomer library

Published by wojdyr about 4 years ago

GEMMI - 0.5.2

  • various bug fixes and small improvements, many of them for the MTZ -> mmCIF conversion
  • new and documented functions: cif.Table.move_row(), ResidueInfo.fasta_code(), GroupOps.add_missing_elements(), NeighborSearch.add_chain()
  • added customizable form factors for unknown element (X) #167
  • improved reading of unusual PDB files #169
  • calculate_superposition(): added option to use backbone only
  • calculating RMSD without superposition is a separate function now: calculate_current_rmsd()
  • simplified MaskedGrid, Grid.asu() renamed to masked_asu()
  • renamed Intensities.Type to DataType
  • a few experimental functions (not documented and not sufficiently tested yet):
    • read_first_block_gz() – read only the first block of a CIF file without reading the whole file
    • Mtz.ensure_asu() – changes Miller indices together with phase shifts and anomalous data swapping
    • interpolate_grid_of_aligned_model() – for grid interpolation based on model alignment (currently global alignment only; local alignment such as in PanDDA is not implemented yet)
    • check_data_type_under_symmetry() – for checking if the mmCIF refln category contains mean, anomalous or unmerged data
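calculate_current_rmsd() is described above as RMSD without superposition, that is, computed from the coordinates as they are, without first fitting one set onto the other. The plain-Python sketch below shows that quantity for two equally long lists of matched positions. The name current_rmsd is hypothetical and is not gemmi's signature.

```python
import math


def current_rmsd(pos1, pos2):
    """Root-mean-square deviation between matched positions, no fitting."""
    if len(pos1) != len(pos2) or not pos1:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                for a, b in zip(pos1, pos2))
    return math.sqrt(total / len(pos1))
```

Because no rotation or translation is fitted, a model shifted rigidly by 1 Å gives an RMSD of 1 Å here, whereas a superposition-based RMSD would be 0.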

Published by wojdyr over 4 years ago

GEMMI - 0.5.1

  • various bug fixes
  • support for huge MTZ files (>2GB) using the format extension added in CCP4 8.0 – implemented by Claus
  • better handling of NaNs in maps
  • option to detect the format of a coordinate file (format=CoorFormat.Detect)
  • major changes in class Topo (which is undocumented yet)
  • store _atom_site.label_entity_id from mmCIF as Residue::entity_id
  • two more tabulated space group settings: C 4 2 2 and C 4 2 21
  • new functions Mtz.copy_column() and Mtz.replace_column(), SmallStructure.make_cif_block()
  • and a number of minor additions including: Grid.clone(), Grid.change_values(), Model.remove_*(), transform_pos_and_adp(), MonLib.mon_lib_list, SmallStructure.wavelength, UnitCell.approx(), SMat33.multiply(), Structure.input_format
  • new options in the command-line program: gemmi mtz --histogram, gemmi cif2mtz --add

Published by wojdyr over 4 years ago

GEMMI - 0.5.0

  • bug fixes
  • Selling-Delaunay reduction
  • gemmi program: enhancements in mtz2cif, cif2mtz and merge
  • SymImage renamed to NearestImage; its properties have also changed
  • calculate_superposition(): optional outlier rejection (inspired by PyMOL)

Published by wojdyr over 4 years ago

GEMMI - 0.4.9

  • centred to primitive lattice transformation
  • Niggli reduction and Buerger reduction for lattice basis
  • added UnitCell::is_compatible_with_spacegroup()
  • Topology-related improvements (Keitaro)
  • functions SMat33::elements() and change_basis() were replaced – each was replaced with two new functions
  • more options in gemmi-merge
  • more Python bindings
  • various fixes, in particular in validation in gemmi mtz2cif --depo.

Published by wojdyr over 4 years ago

GEMMI - 0.4.8

  • reindexing MTZ files,
  • tricubic map interpolation,
  • extended selection syntax,
  • merge_atoms_in_expanded_model() – sort out atoms on special positions after expanding NCS or making bioassembly,
  • IT92::normalize() – makes exactly the same scattering coefficients that Refmac uses,
  • Mtz::remove_column(),
  • understanding triplets with x/N ("h/2+k/2, -h/2+k/2, l"),
  • better inference of elements in a PDB file with missing element field,
  • Topo::Force was renamed to Topo::Rule,
  • various other additions and, most importantly, bug fixes.

Published by wojdyr almost 5 years ago

GEMMI - 0.4.7

Python porting note: CIF parsing may raise ValueError; opening a file may raise IOError. Previously, both raised RuntimeError.

Published by wojdyr almost 5 years ago