btllib
btllib: A C++ library with Python interface for efficient genomic sequence processing - Published in JOSS (2022)
Science Score: 98.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 8 DOI references in the README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in the Journal of Open Source Software
Repository
📚 Bioinformatics Technology Lab common code library
Basic Info
Statistics
- Stars: 25
- Watchers: 13
- Forks: 6
- Open Issues: 33
- Releases: 44
README.md
Bioinformatics Technology Lab common code library in C++ with Python wrappers.
Platforms
- Linux
- macOS
Installation for users
The recommended way to install btllib is with the Conda package manager:
conda install -c bioconda -c conda-forge btllib
Alternatively, you can compile the code from source. Download btllib-$VERSION.tar.gz from the latest GitHub release, where $VERSION is the latest btllib version, and do the following:
- Run tar xzf btllib-$VERSION.tar.gz to extract the source code.
- Have the dependencies ready:
* GCC 6+ or Clang 5+ (with OpenMP and C++17 support)
* Python 3.9+
* Meson and Ninja Python 3 packages, and CMake (if these are not available, they will be installed automatically to a temporary directory)
- Run btllib/compile
* This will install btllib in the btllib/install directory. You can provide the --prefix parameter to change this.
* The C++ compiler must be the same as the one used to compile Python. For example, if you installed Python using a package manager, use the C++ compiler from the same package manager. You can change the compiler by exporting the CXX environment variable to point to it before running btllib/compile.
* You can optionally run python3 -m pip install $PREFIX/lib/btllib/python afterwards to install the Python package. The Python wrappers are usable even without this step. $PREFIX is the path where btllib is installed.
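Because the C++ compiler must match the one used to build Python, it can help to ask the interpreter which compiler it was configured with before running btllib/compile. The sketch below is a minimal example assuming a POSIX CPython build. It uses the standard-library sysconfig module. The CXX configuration variable is normally recorded on Linux and macOS builds but may be absent elsewhere, so the helper returns None in that case. The helper name build_cxx is hypothetical and is not part of btllib.

```python
import shlex
import sysconfig


def build_cxx():
    """Return the C++ compiler executable CPython was configured with, or None.

    Hypothetical helper: it only reads CPython's build configuration and
    does not interact with btllib.
    """
    cxx = sysconfig.get_config_var("CXX")
    if not cxx:
        return None
    # CXX may carry extra arguments (e.g. "g++ -pthread"); keep only the executable.
    parts = shlex.split(cxx)
    return parts[0] if parts else None


if __name__ == "__main__":
    cxx = build_cxx()
    if cxx is None:
        print("CXX is not recorded in this Python's build configuration")
    else:
        print(f"export CXX={cxx}")
```

The printed export line can be pasted into the shell before running btllib/compile. If the recorded compiler is not installed on the machine, it should come from the same package manager that provided Python.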
Using the library
Run-time dependencies:
- SAMtools for reading SAM, BAM, and CRAM files.
- gzip, tar, pigz, bzip2, xz, lrzip, zip, and/or 7zip for compressing and decompressing files. Only the tools for the compression formats you actually use are needed. Note that lrzip is not available in the btllib conda osx-arm64 build.
- wget for downloading sequences from a URL.
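Which of these run-time tools a given input needs depends on its format. The sketch below maps file suffixes and URL schemes to the external programs listed above, then reports which ones are missing from PATH. The suffix table is an illustrative assumption and is not btllib's internal dispatch logic. In particular, the executable names chosen for zip and 7zip (zip, 7z) are guesses. The function names are hypothetical.

```python
import shutil
from pathlib import PurePosixPath

# Illustrative suffix-to-tool table (an assumption, not btllib's own logic).
# Each entry is a tuple of interchangeable programs; any one of them suffices.
TOOLS_BY_SUFFIX = {
    ".sam": ("samtools",),
    ".bam": ("samtools",),
    ".cram": ("samtools",),
    ".gz": ("pigz", "gzip"),
    ".bz2": ("bzip2",),
    ".xz": ("xz",),
    ".lrz": ("lrzip",),
    ".zip": ("zip",),
    ".7z": ("7z",),
    ".tar": ("tar",),
}


def tools_for(path):
    """List the tool groups needed to read `path` (a file name or URL)."""
    needed = []
    if path.startswith(("http://", "https://", "ftp://")):
        needed.append(("wget",))
    # Walk every suffix so that e.g. "genome.tar.gz" needs both tar and gzip.
    for suffix in PurePosixPath(path).suffixes:
        tools = TOOLS_BY_SUFFIX.get(suffix.lower())
        if tools is not None and tools not in needed:
            needed.append(tools)
    return needed


def missing_tools(path):
    """Return the tool groups for `path` with no member found on PATH."""
    return [group for group in tools_for(path)
            if not any(shutil.which(tool) for tool in group)]


if __name__ == "__main__":
    for name in ("reads.bam", "genome.tar.gz", "https://example.org/reads.fq.xz"):
        print(name, tools_for(name), "missing:", missing_tools(name))
```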
Building C++ code ($PREFIX is the path where btllib is installed):
- Link your code with $PREFIX/lib/libbtllib.a (pass the -L $PREFIX/lib -l btllib flags to the compiler). You can do so by typing the following in your console:
    export CPPFLAGS="-isystem /path/to/btllib/install/include $CPPFLAGS"
    export LDFLAGS="-L/path/to/btllib/install/lib -lbtllib $LDFLAGS"
- #include any header from the $PREFIX/include directory (pass the -I $PREFIX/include flag to the compiler).
- btllib uses C++11 features, so that standard should be enabled at a minimum.
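The flags above can be combined into a single compile-and-link invocation. The sketch below assembles that command from a Python script. The helper name, the default compiler g++ and the C++17 default are illustrative assumptions: the README only requires C++11 or later, and C++17 is chosen because building btllib itself already requires it. Any further flags or libraries your platform needs can be passed through extra. The -lbtllib flag is placed after the source files because linkers resolve static libraries in command-line order.

```python
import shlex


def btllib_compile_command(prefix, sources, output, cxx="g++", std="c++17", extra=()):
    """Build a compiler argv that compiles `sources` against btllib under `prefix`.

    Hypothetical helper: btllib does not ship this function.
    """
    return [
        cxx,
        f"-std={std}",                    # the README requires at least C++11
        "-isystem", f"{prefix}/include",  # btllib headers
        *sources,
        "-L", f"{prefix}/lib",            # directory containing libbtllib.a
        "-lbtllib",                       # after the sources so their symbols get resolved
        *extra,                           # any additional flags or libraries
        "-o", output,
    ]


if __name__ == "__main__":
    cmd = btllib_compile_command("/path/to/btllib/install", ["main.cpp"], "main")
    # Prints a shell-ready command line; pass `cmd` to subprocess.run to compile.
    print(shlex.join(cmd))
```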
Running Python code:
- The Python used to import btllib must be the same as the one used to compile the library. Specifically, btllib uses python3-config to determine the compilation flags. Running python3-config --exec-prefix gives the path to the Python installation that must be used; the python3 executable can be found at $(python3-config --exec-prefix)/bin/python3.
- The wrappers correspond one-to-one with the C++ code, so all functions and classes can be used under the same names. The only exceptions are nested classes, which are prefixed with the outer class name (e.g. btllib::SeqReader::Flag in C++ versus btllib.SeqReaderFlag in Python), and (Kmer)CountingBloomFilter, which is provided as CountingBloomFilter8, CountingBloomFilter16, CountingBloomFilter32, KmerCountingBloomFilter8, KmerCountingBloomFilter16, and KmerCountingBloomFilter32, with counters 8, 16, and 32 bits wide, respectively.
- If you compiled btllib from source and didn't install the Python wrappers, you can use the PYTHONPATH environment variable or sys.path.append() in your Python code to add the $PREFIX/lib/btllib/python/btllib directory, making btllib available to the interpreter.
- Include the library with:
    import btllib
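The naming rules above are mechanical. The sketch below converts a nested C++ class name to its Python wrapper name, then computes the largest value a counter of each width can hold (2**bits - 1). Both helpers are hypothetical illustrations of the stated rules and are not functions provided by btllib. The converter covers only the nested-class rule, not the (Kmer)CountingBloomFilter special case.

```python
import re


def python_name(cpp_name):
    """Map a C++ name such as 'btllib::SeqReader::Flag' to 'btllib.SeqReaderFlag'.

    Hypothetical helper: the namespace becomes the module, and nested class
    names are concatenated onto the outer class name.
    """
    namespace, *classes = cpp_name.split("::")
    return f"{namespace}.{''.join(classes)}"


def max_count(class_name):
    """Largest count a (Kmer)CountingBloomFilter{8,16,32} counter can represent."""
    match = re.fullmatch(r"(?:Kmer)?CountingBloomFilter(8|16|32)", class_name)
    if match is None:
        raise ValueError(f"not a counting Bloom filter class: {class_name}")
    bits = int(match.group(1))
    return 2**bits - 1


if __name__ == "__main__":
    print(python_name("btllib::SeqReader::Flag"))  # btllib.SeqReaderFlag
    for name in ("CountingBloomFilter8", "KmerCountingBloomFilter16", "CountingBloomFilter32"):
        print(name, max_count(name))
```

8-bit counters top out at 255, 16-bit counters at 65,535 and 32-bit counters at 4,294,967,295. Each step up in width doubles the memory used per counter in exchange for a larger range.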
Executables:
- Executables generated by btllib can be found in the $PREFIX/bin directory. Append that path to the PATH environment variable to make them available to your shell.
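When btllib's executables are launched from a Python script rather than an interactive shell, the same PATH adjustment can be applied to the child process's environment. The helper below is a hypothetical sketch. It appends $PREFIX/bin as suggested above, so any same-named executables that appear earlier on PATH still take precedence.

```python
import os


def env_with_btllib_bin(prefix, base_env=None):
    """Return a copy of the environment with $PREFIX/bin appended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(prefix, "bin")
    current = env.get("PATH", "")
    # Only append once, so repeated calls leave PATH unchanged.
    if bin_dir not in current.split(os.pathsep):
        env["PATH"] = current + os.pathsep + bin_dir if current else bin_dir
    return env


if __name__ == "__main__":
    env = env_with_btllib_bin("/path/to/btllib/install")
    print(env["PATH"])
    # Pass env=env to subprocess.run() to launch btllib executables by name.
```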
Documentation
For btllib developers
- Initial setup:
* Run git clone --recurse-submodules https://github.com/bcgsc/btllib to obtain all the code.
* In the btllib dir, run meson build to create a build directory.
- Every time you want to run tests, in the build dir:
* ninja wrap to regenerate the wrappers.
* ninja test to build the wrappers and tests, and run the tests.
- Before making a pull request, in the build dir:
* ninja quality-assurance to make sure all CI tests pass.
* Make a commit after the above step, in case it has made any changes to wrappers or formatting. Don't commit the changes made to the sdsl-lite subproject: the Meson config file adjusts the sdsl-lite config so that it works for btllib, but this is done ad hoc and does not need to be committed. Applying the changes ad hoc keeps an explicit list of the differences from the upstream repository.
- Before making a release, in the build dir:
* Do the same as for a pull request, then run ninja docs to regenerate the docs to reflect the release, and commit the changes.
* Run meson dist --allow-dirty to generate a self-contained package based on the last commit. The --allow-dirty flag permits making a distributable with uncommitted changes, which is necessary because the sdsl-lite dependency has ad hoc changes made during the build process. The resulting distributable is compressed with xz; for easier use, decompress it, recompress it with gzip, and attach the resulting file to the release.
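The last release step, decompressing the xz-compressed distributable and recompressing it with gzip, can also be done with Python's standard lzma and gzip modules. The sketch below shows one way. The function name xz_to_gzip is hypothetical. The equivalent shell pipeline is xz -dc file.tar.xz | gzip > file.tar.gz.

```python
import gzip
import lzma
import shutil
from pathlib import Path


def xz_to_gzip(src, dst=None):
    """Recompress an .xz file (e.g. btllib-X.Y.Z.tar.xz) as .gz and return the new path."""
    src = Path(src)
    if src.suffix != ".xz":
        raise ValueError(f"expected an .xz file, got {src}")
    # Default output: same name with ".xz" replaced by ".gz" (".tar.xz" -> ".tar.gz").
    dst = src.with_suffix(".gz") if dst is None else Path(dst)
    # Stream in chunks so large archives never need to fit in memory.
    with lzma.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, xz_to_gzip("btllib-1.7.5.tar.xz") writes btllib-1.7.5.tar.gz alongside the original archive.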
The following ninja commands are available within the build directory:
- ninja clang-format formats the whitespace in code (requires clang-format 8+).
- ninja wrap wraps C++ code for Python (requires SWIG ≥4.0 and <4.3).
- ninja clang-tidy runs clang-tidy on C++ code and makes sure it passes (requires clang-tidy 8+).
- ninja builds the tests and wrapper libraries, making sure they compile.
- ninja test runs the tests.
- ninja code-coverage checks that the code coverage threshold is met (requires gcovr 3.3+).
- ninja sanitize-undefined runs undefined behavior sanitization.
- ninja test-wrappers tests whether the wrappers work.
- ninja docs generates code documentation from comments (requires Doxygen).
- ninja quality-assurance runs clang-format, wrap, clang-tidy, test, code-coverage, sanitize-undefined, and test-wrappers. All of these are checked in CI.
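The tool version constraints above can be checked before running ninja quality-assurance. The sketch below compares version strings numerically. The requirement table transcribes the constraints listed above; upper bounds are exclusive, as in SWIG's "<4.3". Taking the first dotted number in a tool's version output is a simplifying assumption that ignores pre-release tags. The function names are hypothetical.

```python
import re

# (minimum inclusive, maximum exclusive or None), transcribed from the list above.
REQUIREMENTS = {
    "clang-format": ("8", None),
    "clang-tidy": ("8", None),
    "swig": ("4.0", "4.3"),
    "gcovr": ("3.3", None),
}


def parse_version(text):
    """Turn '4.2.1' (or 'SWIG Version 4.2.1') into (4, 2, 1) using the first dotted number."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(tool, version):
    """True if `version` of `tool` meets the requirement listed above."""
    minimum, below = REQUIREMENTS[tool]
    v = parse_version(version)
    if v < parse_version(minimum):
        return False
    return below is None or v < parse_version(below)
```

For instance, satisfies("swig", "4.3.0") is False because the upper bound excludes 4.3 and later.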
Credits
- Author: Vladimir Nikolic
- Components:
* Hamid Mohamadi and Parham Kazemi for ntHash
* Justin Chu for MIBloomFilter
* Johnathan Wong for aaHash
- Included dependencies:
* Chase Geigle for cpptoml
* Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri for sdsl-lite
Citing
If you use btllib in your research, please cite:
Nikolić et al., (2022). btllib: A C++ library with Python interface for efficient genomic sequence processing. Journal of Open Source Software, 7(79), 4720, https://doi.org/10.21105/joss.04720
If you use aaHash in your research, please cite:
Wong et al., (2023). aaHash: recursive amino acid sequence hashing. Bioinformatics Advances, vbad162, https://doi.org/10.1093/bioadv/vbad162.
Owner
- Name: BC Cancer Canada's Michael Smith Genome Sciences Centre
- Login: bcgsc
- Kind: organization
- Email: rwarren@bcgsc.ca
- Location: Vancouver, BC, Canada
- Website: http://www.bcgsc.ca/
- Repositories: 86
- Profile: https://github.com/bcgsc
Sequencing centre for genomics and bioinformatics research. NOTE 01/2019: FTP now closed. Cited data available: http://www.bcgsc.ca/downloads/supplementary
JOSS Publication
btllib: A C++ library with Python interface for efficient genomic sequence processing
Author affiliations
- Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, BC, Canada
- Bioinformatics Graduate Program, The University of British Columbia, Vancouver, BC, Canada
Tags
C++, bioinformatics, algorithms, data structures, genome, bloom filter
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Nikolić"
    given-names: "Vladimir"
    orcid: "https://orcid.org/0000-0002-2992-9935"
  - family-names: Kazemi
    given-names: Parham
  - family-names: Coombe
    given-names: Lauren
    orcid: "https://orcid.org/0000-0002-7518-2326"
  - family-names: Wong
    given-names: Johnathan
  - family-names: Afshinfard
    given-names: Amirhossein
    orcid: "https://orcid.org/0000-0002-6875-4939"
  - family-names: Chu
    given-names: Justin
  - family-names: Warren
    given-names: René L.
    orcid: "https://orcid.org/0000-0002-9890-2293"
  - family-names: Birol
    given-names: Inanç
    orcid: "https://orcid.org/0000-0003-0950-7839"
title: "btllib"
url: "https://github.com/bcgsc/btllib"
preferred-citation:
  type: article
  authors:
    - family-names: "Nikolić"
      given-names: "Vladimir"
      orcid: "https://orcid.org/0000-0002-2992-9935"
    - family-names: Kazemi
      given-names: Parham
    - family-names: Coombe
      given-names: Lauren
      orcid: "https://orcid.org/0000-0002-7518-2326"
    - family-names: Wong
      given-names: Johnathan
    - family-names: Afshinfard
      given-names: Amirhossein
      orcid: "https://orcid.org/0000-0002-6875-4939"
    - family-names: Chu
      given-names: Justin
    - family-names: Warren
      given-names: René L.
      orcid: "https://orcid.org/0000-0002-9890-2293"
    - family-names: Birol
      given-names: Inanç
      orcid: "https://orcid.org/0000-0003-0950-7839"
  doi: "10.21105/joss.04720"
  url: "https://doi.org/10.21105/joss.04720"
  year: 2022
  publisher: "The Open Journal"
  volume: 7
  number: 79
  pages: 4720
  title: "btllib: A C++ library with Python interface for efficient genomic sequence processing"
  journal: "Journal of Open Source Software"
GitHub Events
Total
- Create event: 9
- Release event: 2
- Issues event: 2
- Watch event: 2
- Delete event: 3
- Member event: 1
- Issue comment event: 5
- Push event: 16
- Pull request review comment event: 5
- Pull request review event: 13
- Pull request event: 12
- Fork event: 1
Last Year
- Create event: 9
- Release event: 2
- Issues event: 2
- Watch event: 2
- Delete event: 3
- Member event: 1
- Issue comment event: 5
- Push event: 16
- Pull request review comment event: 5
- Pull request review event: 13
- Pull request event: 12
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| schutzekatze | v****h@p****m | 399 |
| schutzekatze | v****4@g****m | 195 |
| Parham Kazemi | p****3@g****m | 180 |
| MurathanGoktas | m****s@g****m | 112 |
| Vladimir Nikolić | v****c@b****a | 27 |
| afshinfard | a****d@g****m | 24 |
| Lauren Coombe | l****e@g****m | 14 |
| Vladimir Nikolić | v****0@p****m | 13 |
| JW | 3****e | 12 |
| Johnathan Wong | 3****4 | 2 |
| Vladimir Petko | v****o@c****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 51
- Total pull requests: 80
- Average time to close issues: 5 months
- Average time to close pull requests: 14 days
- Total issue authors: 14
- Total pull request authors: 9
- Average comments per issue: 1.35
- Average comments per pull request: 0.74
- Merged pull requests: 59
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 12
- Average time to close issues: 1 day
- Average time to close pull requests: about 8 hours
- Issue authors: 2
- Pull request authors: 4
- Average comments per issue: 3.33
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- vlad0x00 (21)
- parham-k (6)
- yurivict (5)
- aafshinfard (4)
- lcoombe (3)
- jwcodee (2)
- mxwang66 (2)
- mr-c (1)
- Duda5 (1)
- berkeucar (1)
- chenrui333 (1)
- nileshpatra (1)
- p-linnane (1)
- vpa1977 (1)
Pull Request Authors
- parham-k (30)
- jwcodee (27)
- vlad0x00 (12)
- lcoombe (10)
- aafshinfard (5)
- JackyYiu (4)
- MurathanGoktas (4)
- mr-c (1)
- vpa1977 (1)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
spack.io: btllib
Bioinformatics Technology Lab common code library in C++ with Python wrappers.
- Homepage: https://github.com/bcgsc/btllib
- License: not specified
- Latest release: 1.7.5 (published 11 months ago)