Releases | Open Source Science

semibin - Version 2.1.0

The main new feature is support for using the output of strobealign-aemb. Using the SemiBin command (instead of SemiBin2) will continue to work, but it now prints a warning and pauses briefly to prompt users to upgrade.

Full ChangeLog

SemiBin: Support running SemiBin with strobealign-aemb (--abundance/-a)
citation: Add citation subcommand
SemiBin1: Introduce separate SemiBin1 command
internal: Code simplification and refactor
deprecation: Deprecate --orf-finder=fraggenescan option
Update abundance normalization
SemiBin: do not use more processes than can be taken advantage of (#155)
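The idea behind the last item is that a process pool larger than the number of independent jobs only adds start-up cost, because the extra workers sit idle. A minimal sketch of that cap, assuming a hypothetical helper `effective_workers` (not SemiBin's actual code):

```python
def effective_workers(requested: int, n_tasks: int) -> int:
    """Number of worker processes to start for `n_tasks` independent jobs.

    Workers beyond the number of jobs would sit idle, so the user-requested
    count is capped at `n_tasks` (and never drops below 1).
    """
    return max(1, min(requested, n_tasks))


# Hypothetical usage: 16 threads requested but only 3 samples to process,
# so only 3 processes are started (e.g. multiprocessing.Pool(processes=3)).
print(effective_workers(16, 3))  # 3
```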

- Python
Published by luispedro about 2 years ago

semibin - Version 2.0.2

Minor bugfix release (#128)

- Python
Published by luispedro over 2 years ago

semibin - Version 2.0.1

Fix bugs in v2.0.0, mainly with the SemiBin2 command and argument passing.

Full ChangeLog:

train_self: Fix bug with --mode
concatenate_fasta: Fix bug with compression
bin_short: Make alias work

- Python
Published by luispedro over 2 years ago

semibin - Version 2.0.0

Effectively a minor release, turning the SemiBin2 beta into a full SemiBin2 release and soft-deprecating SemiBin1.

Full ChangeLog:

SemiBin: Better error checking throughout
SemiBin: Write a log file
concatenate_fasta: support compression
concatenate_fasta: slightly better error message when contig ID already contains separator
SemiBin: add bin_short as alias for bin
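concatenate_fasta combines per-sample assemblies into one file for multi-sample binning, which is why a separator already present in a contig ID is a problem: the sample prefix could no longer be split off unambiguously. A minimal sketch of that renaming step, assuming IDs of the form `<sample><separator><contig>` and `:` as the separator (both assumptions, as are the helper names `concatenate_samples` and `to_fasta`); compression is left out:

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (contig ID, sequence)


def concatenate_samples(
    samples: Dict[str, Iterable[Record]], separator: str = ":"
) -> Iterator[Record]:
    """Prefix every contig ID with its sample name (separator is an assumption)."""
    for sample, records in samples.items():
        for contig_id, seq in records:
            if separator in contig_id:
                # Splitting "<sample><sep><contig>" back apart would be ambiguous
                raise ValueError(
                    f"Contig ID {contig_id!r} in sample {sample!r} already contains "
                    f"the separator {separator!r}; choose a different separator"
                )
            yield f"{sample}{separator}{contig_id}", seq


def to_fasta(records: Iterable[Record]) -> str:
    """Serialize (ID, sequence) pairs as FASTA text."""
    return "".join(f">{cid}\n{seq}\n" for cid, seq in records)


print(to_fasta(concatenate_samples({"S1": [("c1", "ACGT")], "S2": [("c1", "GG")]})))
```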

- Python
Published by luispedro over 2 years ago

semibin - Version 1.5.1

Fix use of --no-recluster with multi_easy_bin, see #128

- Python
Published by luispedro about 3 years ago

semibin - Version 1.5.0: SemiBin2 beta

The big change is the addition of a SemiBin2 script, which is still experimental but should provide a slightly nicer interface.

User-visible improvements since v1.4.0

Added a new option for ORF finding, called fast-naive, which is a very fast internal implementation.
Added the possibility of bypassing ORF finding altogether by providing prodigal outputs directly (or any other gene prediction in the right format)
Command line argument checking is more exhaustive instead of exiting at first error
Added --quiet flag to reduce the amount of output printed
Better --help (group required arguments separately)
Add --output-compression option to compress outputs
Add --tag-output option, which allows control of the output filenames (and also makes the output anvi'o compatible; see discussion at #123)
Add contig->bin mapping table (#123)
SemiBin.main.main1 and SemiBin.main.main2 can now be called as a function with command line arguments (main1 corresponds to SemiBin1 and main2 corresponds to SemiBin2)

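```python
import SemiBin.main

...

# main2 takes the same arguments as the SemiBin2 command line
# (main1 does the same for SemiBin1)
SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
```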
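The notes do not describe how the fast-naive ORF finder works. For orientation, this is what a generic naive ORF scan looks like: walk each reading frame, open an ORF at ATG and close it at the next stop codon. It is a sketch of the general technique only (forward strand, standard stop codons, a hypothetical `min_codons` threshold), not SemiBin's implementation:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def naive_orfs(seq: str, min_codons: int = 2):
    """Return sorted (start, end) ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. A full finder would also
    scan the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # i + 3 <= len(seq) so every slice is a complete codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(naive_orfs("ccATGAAATAGcc"))  # [(2, 11)]
```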

- Python
Published by luispedro over 3 years ago

semibin - Version 1.4.0: long read binning

The big change is a new binning algorithm for assemblies from long-read datasets.

The overall structure of the pipeline is still similar to what was described in the manuscript, but the clustering step does not use Infomap; instead, it uses another procedure (an iterative version of DBSCAN).

Use the flag --sequencing-type=long_read to enable an alternative clustering that works better with long reads.
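The notes only say that long-read clustering uses "an iterative version of DBSCAN". One plausible reading, sketched below under that assumption: run DBSCAN with a series of `eps` values, keep the clusters that pass a quality check, remove their contigs, and re-cluster what is left. The `accept` predicate and the `eps` schedule are hypothetical placeholders; the actual SemiBin criteria are not described here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns one label per point, -1 for noise."""
    labels: List = [None] * len(points)  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighbourhood (includes the point itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_samples:  # core point: keep expanding
                queue.extend(j_seeds)
    return labels


def iterative_dbscan(
    points: Sequence[Point],
    eps_values: Sequence[float],
    min_samples: int,
    accept: Callable[[List[int]], bool],
):
    """Re-run DBSCAN with each eps on the points that are not yet binned.

    `accept` is a hypothetical quality check deciding whether a cluster
    becomes a bin; accepted points are removed before the next round.
    """
    remaining = list(range(len(points)))
    bins = []
    for eps in eps_values:
        if not remaining:
            break
        labels = dbscan([points[i] for i in remaining], eps, min_samples)
        accepted = set()
        for label in sorted(set(labels) - {-1}):
            members = [remaining[k] for k, lab in enumerate(labels) if lab == label]
            if accept(members):
                bins.append(members)
                accepted.update(members)
        remaining = [i for i in remaining if i not in accepted]
    return bins, remaining
```

Tight groups are picked up with the small `eps`; looser groups only form in a later round, once the tight ones have been removed.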

Other user-visible improvements

Better error checking at multiple steps in the pipeline so that processes that will crash are caught as early as possible
Add --allow-missing-mmseqs2 flag to check_install subcommand (eventually, self-supervision will be the default and mmseqs2 will be an optional dependency)

Command line parameter deprecations

The previous arguments should continue to work, but going forward, the newer arguments are probably a better API.

Selecting self-supervised learning is now done with the --self-supervised flag (instead of --training-type=self)
Training from multiple samples is now enabled with the --train-from-many flag (instead of --mode=several)
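Keeping old flags working while steering users to the new ones can be done with argparse by accepting both spellings and folding the old one into the new destination. A minimal sketch using the flag names above; the parser itself and the `choices` for the old flags (`semi`/`self`, `single`/`several`) are assumptions for illustration, not SemiBin's actual parser:

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sketch")
    # New-style flags
    parser.add_argument("--self-supervised", action="store_true")
    parser.add_argument("--train-from-many", action="store_true")
    # Deprecated spellings, hidden from --help but still parsed
    parser.add_argument("--training-type", choices=["semi", "self"], help=argparse.SUPPRESS)
    parser.add_argument("--mode", choices=["single", "several"], help=argparse.SUPPRESS)
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    # Translate the old spellings into the new flags, with a warning
    if args.training_type == "self":
        warnings.warn("--training-type=self is deprecated; use --self-supervised",
                      DeprecationWarning)
        args.self_supervised = True
    if args.mode == "several":
        warnings.warn("--mode=several is deprecated; use --train-from-many",
                      DeprecationWarning)
        args.train_from_many = True
    return args
```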

Bugfixes

The output table sometimes had the wrong path in v1.3; this has been fixed
Prodigal is now run in a more robust manner when using multiple threads (#106)

- Python
Published by luispedro over 3 years ago

semibin - Version 1.3.1

Version 1.3.0 erroneously made --training-type mandatory.

We had intended to keep backwards compatibility with previous versions and v1.3.1 fixes that.

- Python
Published by luispedro over 3 years ago

semibin - Version 1.3.0

Introduces self-supervised learning! This is optional for now (it will become the default in SemiBin2), but can achieve better results. See the docs on training SemiBin models for more information.

It also fixes a few minor bugs (namely, bin names in the output table) and renames one misspelled command-line argument to --epochs.

- Python
Published by luispedro over 3 years ago

semibin - Version 1.2.0

The big change is a new chicken caecum prebuilt model (courtesy of Florian Plaza Oñate), but there are also better outputs.

Full ChangeLog

Pretrained model from chicken caecum
Output table with basic information on bins (including N50 & L50)
When reclustering is used (default), output the unreclustered bins into a directory called output_prerecluster_bins
Added --verbose flag and silenced some of the output when it is not used
Use coloredlogs (if package is available)
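N50 and L50 follow the usual assembly-statistics definitions: sort contigs from longest to shortest and walk down the list until half of the total length is covered. A minimal sketch of those definitions (the function name `n50_l50` is hypothetical, and this is not necessarily how SemiBin builds its table):

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a bin given its contig lengths.

    N50 is the length of the contig at which the running total (longest
    first) reaches half of the bin size; L50 is how many contigs that took.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("a bin needs at least one contig")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable for non-negative lengths")


print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (8, 3)
```

Here the total is 54 bp, so half is 27; the three longest contigs (10 + 9 + 8) reach it, giving N50 = 8 and L50 = 3.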

- Python
Published by luispedro over 3 years ago

semibin - Version 1.1.1

Completely remove use of atomicwrites package.

- Python
Published by luispedro over 3 years ago

semibin - Version 1.1.0

User-visible improvements

Support .cram format input (#104)
Support using depth file from Metabat2 (#103)
More flexible specification of prebuilt models (case-insensitive, with - and _ treated as equivalent)
Better output message when no bins are produced
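The flexible model names amount to mapping user input onto one canonical spelling before the lookup. A minimal sketch, assuming lowercase names with underscores are the canonical form; `KNOWN_MODELS` is an illustrative subset, not the full list, and `normalize_model_name` is a hypothetical helper:

```python
# Illustrative subset of prebuilt model names (not the full list)
KNOWN_MODELS = {"human_gut", "dog_gut", "ocean", "soil", "built_environment"}


def normalize_model_name(name: str) -> str:
    """Map input such as 'Human-Gut' or 'BUILT-environment' to a canonical name."""
    canonical = name.lower().replace("-", "_")
    if canonical not in KNOWN_MODELS:
        raise ValueError(
            f"Unknown prebuilt model {name!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return canonical


print(normalize_model_name("Human-Gut"))  # human_gut
```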

Bugfixes

Fix bug using atomicwrite on certain network filesystems (#97)

Internal improvements

Remove torch version restriction (and test on Python 3.10)

- Python
Published by luispedro over 3 years ago

semibin - Version 1.0.3

Bugfix release

Fix coverage parsing when value is not an integer (#103)
Fix multi_easy_bin with taxonomy file given on the command line

Full Changelog: https://github.com/BigDataBiology/SemiBin/compare/v1.0.2...v1.0.3

- Python
Published by luispedro almost 4 years ago

semibin - Version 1.0.2

Bugfix release

Completely fixes (#93) (see also #101)

- Python
Published by luispedro almost 4 years ago

semibin - Version 1.0.1

Bugfix release (fixes #93)

- Python
Published by luispedro about 4 years ago

semibin - Version 1.0.0

Released April 29 2022

This coincides with the publication of the manuscript.

User-visible improvements

Files are split more evenly when calling Prodigal in parallel, which should make better use of multiple threads
Fix bug when long stretches of Ns are present (#87)
Better error messages (#90 & #91)
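The notes do not say how the split is balanced. A common way to balance by sequence length rather than by contig count is greedy longest-first assignment: each contig goes to the chunk with the fewest base pairs so far. The sketch below uses that swapped-in technique with a hypothetical `balanced_split` helper; it is not SemiBin's actual code:

```python
import heapq
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (contig ID, sequence)


def balanced_split(contigs: Sequence[Record], n_chunks: int) -> List[List[Record]]:
    """Split contigs into n_chunks groups with similar total length (bp).

    Longest contigs are placed first, each into the currently lightest chunk,
    so no single worker ends up with all the long contigs.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bp assigned so far, chunk index)
    for record in sorted(contigs, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append(record)
        heapq.heappush(heap, (total + len(record[1]), idx))
    return chunks
```

Each chunk could then be written to its own FASTA file and handed to a separate Prodigal process.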

Bugfixes

Fix bugs in training from multiple samples
Fix bug in incorporating CAT results

Full Changelog: https://github.com/BigDataBiology/SemiBin/compare/v0.7.0...v1.0.0

- Python
Published by luispedro about 4 years ago

semibin - Version 0.7.0

Full support for Mac OS X is the big change.

Full ChangeLog:

Improve check_install command by printing out paths and correctly handling optionality of FragGeneScan/prodigal
Reuse markers.hmmout to make the training from several samples faster
Add option --tmpdir to set temporary directory
Substitute FragGeneScan with Prodigal (FragGeneScan can still be used with --orf-finder parameter)
Add 'concatenate_fasta' command to combine fasta files for multi-sample binning

- Python
Published by luispedro about 4 years ago

semibin - Version 0.6.0

Version 0.6

Released February 7 2022

User-visible improvements

Provide pretrained models from soil, cat gut, human oral, pig gut, mouse gut, built environment, wastewater and global (trained on all samples).
Users can now pass in the output of running mmseqs2 directly and SemiBin will use that instead of calling mmseqs itself (use option --taxonomy-annotation-table).
The subcommand to generate cannot links is now called generate_cannot_links. The old name (predict_taxonomy) is kept as a deprecated alias.
Similarly, sequence features (k-mer and abundance) are generated using the commands generate_sequence_features_single and generate_sequence_features_multi (for single- and multi-sample modes, respectively). The old names (generate_data_single/generate_data_multi) are kept as deprecated aliases.
Add the check_install command and run it automatically before the easy binning commands
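Renamed subcommands with deprecated aliases map naturally onto argparse's `aliases=` argument for subparsers. A minimal sketch using the command names above; the parser layout, the `invoked_as`/`command` attribute names and the warning text are assumptions for illustration, not SemiBin's actual CLI code:

```python
import argparse
import warnings

# Old name -> new name, as listed in the release notes
DEPRECATED_ALIASES = {
    "predict_taxonomy": "generate_cannot_links",
    "generate_data_single": "generate_sequence_features_single",
    "generate_data_multi": "generate_sequence_features_multi",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sketch")
    # `invoked_as` records the name actually typed (new name or old alias)
    sub = parser.add_subparsers(dest="invoked_as", required=True)
    for new_name in sorted(set(DEPRECATED_ALIASES.values())):
        aliases = [old for old, new in DEPRECATED_ALIASES.items() if new == new_name]
        p = sub.add_parser(new_name, aliases=aliases)
        p.set_defaults(command=new_name)  # canonical name, whichever was typed
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.invoked_as in DEPRECATED_ALIASES:
        warnings.warn(f"'{args.invoked_as}' is deprecated; use '{args.command}'",
                      DeprecationWarning)
    return args
```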

Bugfixes

Fix bug with non-standard characters in sample names (#68).

New Contributors

@SvetlanaUP made their first contribution in https://github.com/BigDataBiology/SemiBin/pull/60

- Python
Published by luispedro over 4 years ago

semibin - Version 0.5.0

Version 0.5

Released January 7 2022

User-visible improvements

Reclustering is now the default (use --no-recluster to disable it; the option --recluster is deprecated and ignored) as the computational costs are much lower
GTDB lazy downloading is now performed even if a non-standard directory is used
The CACHEDIR.TAG protocol was implemented (this is supported by several tools that perform tasks such as backups).
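Under the Cache Directory Tagging convention, a directory is marked as a cache by placing a file named CACHEDIR.TAG in it whose content starts with the fixed signature line below; tools that honor the convention (such as some backup programs) then skip it. A minimal sketch with hypothetical helpers `mark_as_cache` and `is_cache_dir`; where SemiBin writes its tag (e.g. the GTDB directory) is not shown here:

```python
from pathlib import Path

CACHEDIR_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def mark_as_cache(directory) -> Path:
    """Create `directory` if needed and drop a CACHEDIR.TAG file into it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    tag = directory / "CACHEDIR.TAG"
    if not tag.exists():
        # Only the leading signature matters; the rest is free-form comments
        tag.write_text(CACHEDIR_SIGNATURE + "\n# This file is a cache directory tag.\n")
    return tag


def is_cache_dir(directory) -> bool:
    """True if `directory` contains a CACHEDIR.TAG starting with the signature."""
    tag = Path(directory) / "CACHEDIR.TAG"
    try:
        with open(tag, "rb") as f:
            return f.read(len(CACHEDIR_SIGNATURE)) == CACHEDIR_SIGNATURE.encode("ascii")
    except FileNotFoundError:
        return False
```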

Bugfixes

Fix bug with --min-len (minimal length). Previously, only contigs greater than the given minimal length were used (instead of greater-equal to the minimal length).
GTDB downloading was inconsistent in a few instances; these have been fixed

Internal improvements

Much more efficient code (including lower memory usage) for binning, especially if a pretrained model is used. As an example, using a deeply sequenced ocean sample, generating the data (generate_data_single step) goes down from 14 to 9 minutes, while binning (bin step, using --recluster) goes down from 10m17s (using 20 GB of RAM at peak) to 4m33s (using 4.5 GB at peak). Thus, the total time from BAM file to bins went down from 25 to 14 minutes (using 4 threads), and peak RAM is now 4.5 GB, making it usable on a typical laptop.

- Python
Published by luispedro over 4 years ago

semibin - Version 0.4.0

Add support for .xz FASTA files as inputs
Removed BioPython dependency
Fixed bug in FASTA unzipping
Fixed bug in multi-sample data splitting
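Reading compressed FASTA without BioPython needs only the standard library: pick a decompressor from the file extension, then parse headers and sequence lines. A minimal sketch with hypothetical helpers `open_fasta` and `read_fasta`; the release notes mention only .xz explicitly, so the .gz and .bz2 branches are assumptions added for completeness:

```python
import bz2
import gzip
import lzma


def open_fasta(path: str):
    """Open a (possibly compressed) FASTA file as text, choosing the codec by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    return open(path, "rt")


def read_fasta(path: str):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open_fasta(path) as f:
        for line in f:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```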

- Python
Published by luispedro over 4 years ago

semibin - Version 0.3.0

Version 0.3

Released 10 August 2021

User-visible improvements

Support training from several samples
Remove output_bin_path if output_bin_path exists
Make several internal parameters configurable: (1) the minimum length of contigs to bin (--min-len parameter); (2) the minimum length of contigs to break up in order to generate must-link constraints (--ml-threshold parameter); (3) the ratio used to choose the minimum length automatically: if the ratio of base pairs in contigs of 1000-2500 bp is smaller than this value, the minimum length is set to 1000 bp, otherwise to 2500 bp (--ratio parameter).
Add -p argument for predict_taxonomy mode
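Rule (3) above can be written out as a short function. A minimal sketch, assuming that "ratio" means the fraction of all assembled base pairs found in contigs of 1000-2500 bp (lower bound inclusive, upper bound exclusive); both the interpretation and the helper names are assumptions. The length filter uses a greater-or-equal comparison, matching the --min-len fix described in v0.5.0:

```python
from typing import List, Sequence


def choose_min_len(lengths: Sequence[int], ratio: float) -> int:
    """Pick 1000 or 2500 bp as the minimum contig length (the --ratio rule).

    Assumption: the ratio is computed over all assembled base pairs, counting
    contigs with 1000 <= length < 2500.
    """
    total = sum(lengths)
    mid_bp = sum(n for n in lengths if 1000 <= n < 2500)
    fraction = mid_bp / total if total else 0.0
    return 1000 if fraction < ratio else 2500


def contigs_to_bin(lengths: Sequence[int], min_len: int) -> List[int]:
    # Inclusive cut-off: contigs exactly min_len long are kept
    return [n for n in lengths if n >= min_len]
```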

Internal improvements

Better code overall
Fix np.concatenate warning
Remove redundant matrix when clustering
Better pretrained models
Faster depth calculation using NumPy
Use correct number of threads in kneighbors_graph()

Bugfixes

Respect number of threads (-p argument) when training (issue 34)

- Python
Published by luispedro almost 5 years ago

semibin - Version 0.2.0

First release under the name SemiBin

Full ChangeLog (see also https://semibin.readthedocs.io/en/latest/whatsnew/):

Change name to SemiBin
Add support for training with several samples
Test with Python 3.9
Download mmseqs database with --remove-tmp-file 1
Better output names
Fix bugs when paths have spaces
Fix installation issues by listing all the dependencies
Add download_GTDB command
Add --no-recluster option
Add --environment option
Add --mode option
All around more robust code by including more error checking & testing
Better built-in models

- Python
Published by luispedro about 5 years ago

semibin - Release 0.1.1

First release.

Can perform both single-sample and multi-sample binning.

- Python
Published by luispedro about 5 years ago
