bioframe

Genomic interval operations on Pandas DataFrames

https://github.com/open2c/bioframe

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 21 committers (4.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

bioinformatics dataframes genomic-intervals genomic-ranges genomics ngs-analysis numpy pandas python spatial-join

Keywords from Contributors

sparse ngs hi-c 3d-genome chromatin contact-matrix cooler file-format hdf5 mesh
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: open2c
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.14 MB
Statistics
  • Stars: 184
  • Watchers: 10
  • Forks: 34
  • Open Issues: 32
  • Releases: 30
Created over 9 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Bioframe: Operations on Genomic Interval Dataframes

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of Pandas. Bioframe provides:

  • A variety of genomic interval operations that work directly on dataframes.
  • Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
  • Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the documentation, including the guide, as well as the publication for more information.

Bioframe is an Affiliated Project of NumFOCUS.

Installation

Bioframe is available on PyPI and bioconda:

```sh
pip install bioframe
```

Contributing

Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and at regular developer meetings scheduled there. Anyone can join and participate!

Interval operations

Key genomic interval operations in bioframe include:

  • overlap: Find pairs of overlapping genomic intervals between two dataframes.
  • closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
  • cluster: Group overlapping intervals in a dataframe into clusters.
  • complement: Find genomic intervals that are not covered by any interval from a dataframe.
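
As one illustration of the operations listed above, the idea behind complement can be sketched in plain pandas. This is not bioframe's implementation: the helper name `complement_sketch`, the sample coordinates, and the `chromsizes` mapping are assumptions made for this example, and intervals are treated as half-open `[start, end)`.

```python
import pandas as pd

# Illustrative intervals on a single chromosome (half-open: [start, end)).
df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [2, 4, 10],
    "end": [6, 8, 12],
})
chromsizes = {"chr1": 20}  # hypothetical chromosome length


def complement_sketch(df: pd.DataFrame, chromsizes: dict) -> pd.DataFrame:
    """Return the stretches of each chromosome not covered by any interval."""
    rows = []
    for chrom, size in chromsizes.items():
        cursor = 0  # everything left of the cursor has been accounted for
        ivals = df[df["chrom"] == chrom].sort_values("start")
        for start, end in zip(ivals["start"], ivals["end"]):
            if start > cursor:
                # Uncovered gap between the cursor and this interval.
                rows.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            # Uncovered tail up to the end of the chromosome.
            rows.append((chrom, cursor, size))
    return pd.DataFrame(rows, columns=["chrom", "start", "end"])


gaps = complement_sketch(df, chromsizes)
print(gaps)
```

With these inputs the uncovered stretches are `[0, 2)`, `[8, 10)`, and `[12, 20)`.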

Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
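
To make the idea of composing operations concrete, here is a minimal plain-pandas sketch of a coverage-style calculation built from an overlap step followed by ordinary dataframe operations. It is an illustration only, not how bioframe implements `coverage`: the helper name `coverage_sketch` and the sample data are made up, and the second dataframe is assumed to contain non-overlapping intervals (otherwise shared bases would be counted more than once).

```python
import pandas as pd

df1 = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [0, 10, 50],
    "end": [10, 20, 60],
})
# Assumed to be non-overlapping already (for example, the output of a merge step).
df2 = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [2, 8, 30],
    "end": [4, 12, 40],
})


def coverage_sketch(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Count, for each interval in a, how many of its bases are covered by b."""
    a = a.reset_index(drop=True)
    # Step 1 (overlap): pair intervals sharing a chromosome, keep overlapping pairs.
    pairs = a.reset_index().merge(b, on="chrom", suffixes=("", "_"))
    pairs = pairs[(pairs["start"] < pairs["end_"]) & (pairs["start_"] < pairs["end"])]
    # Step 2 (dataframe ops): length of the shared segment of each pair,
    # summed per interval of a.
    shared = pairs[["end", "end_"]].min(axis=1) - pairs[["start", "start_"]].max(axis=1)
    covered = shared.groupby(pairs["index"]).sum()
    # Intervals with no overlapping partner get a coverage of 0.
    return a.assign(coverage=covered.reindex(a.index, fill_value=0).astype(int))


result = coverage_sketch(df1, df2)
print(result)
```

Here the three intervals of `df1` receive coverages of 4, 2, and 0 bases respectively.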

To overlap two dataframes, call:

```python
import bioframe as bf

bf.overlap(df1, df2)
```

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:
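
As a rough illustration of these semantics, the following plain-pandas sketch computes overlapping pairs for two small dataframes. It is not bioframe's implementation, and its output columns may differ from what `bf.overlap(df1, df2)` returns. The helper name `overlap_pairs` and the coordinates are made up, and intervals are assumed to be half-open `[start, end)` with `chrom`, `start`, and `end` columns.

```python
import pandas as pd

# Illustrative inputs: all intervals on one chromosome.
df1 = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [1, 4, 8],
    "end": [5, 8, 10],
})
df2 = pd.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [4, 20],
    "end": [6, 25],
})


def overlap_pairs(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Return every pair of intervals (one from a, one from b) that overlap."""
    # Pair up all intervals sharing a chromosome; columns from b get a "_" suffix.
    pairs = a.merge(b, on="chrom", suffixes=("", "_"))
    # Two half-open intervals overlap when each starts before the other ends.
    hit = (pairs["start"] < pairs["end_"]) & (pairs["start_"] < pairs["end"])
    return pairs[hit].reset_index(drop=True)


hits = overlap_pairs(df1, df2)
print(hits)
```

Only `[1, 5)` and `[4, 8)` from `df1` overlap `[4, 6)` from `df2`, so two pairs are returned. This pairing is quadratic in the number of intervals per chromosome, which is fine for a demonstration but not for large datasets.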

To merge all overlapping intervals in a dataframe, call:

```python
import bioframe as bf

bf.merge(df1)
```

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:
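
The merging logic can be sketched in plain pandas as follows. This is an illustration under stated assumptions rather than bioframe's own code: the helper name `merge_sketch`, the sample coordinates, and the `n_intervals` column are choices made for this example. The sketch starts a new merged interval only when an interval begins strictly after everything seen before it, so touching intervals are merged as well.

```python
import pandas as pd

df1 = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1", "chr1"],
    "start": [1, 3, 8, 12],
    "end": [5, 6, 10, 14],
})


def merge_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse overlapping intervals into one interval per cluster."""
    df = df.sort_values(["chrom", "start", "end"]).reset_index(drop=True)
    # Furthest end among all earlier intervals on the same chromosome
    # (NaN for the first interval of each chromosome).
    running_max = df.groupby("chrom")["end"].cummax()
    prev_max_end = running_max.groupby(df["chrom"]).shift()
    # A new cluster starts at the first interval of a chromosome, or when an
    # interval begins strictly after everything that came before it.
    new_cluster = prev_max_end.isna() | (df["start"] > prev_max_end)
    cluster_id = new_cluster.cumsum()
    return (
        df.groupby(cluster_id)
        .agg(
            chrom=("chrom", "first"),
            start=("start", "min"),
            end=("end", "max"),
            n_intervals=("start", "count"),
        )
        .reset_index(drop=True)
    )


merged = merge_sketch(df1)
print(merged)
```

For this input, `[1, 5)` and `[3, 6)` collapse into `[1, 6)`, while `[8, 10)` and `[12, 14)` stay separate.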

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.

```python
import bioframe

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
```
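
To show what a schema argument buys you, here is a small plain-pandas sketch of the idea: a schema is treated as a named list of column names applied to a headerless table. The `SCHEMAS` mapping, the `"bed3"` key, and the `read_table_sketch` helper are illustrative stand-ins defined here, not bioframe's actual schema definitions or code, and the data is an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Hypothetical three-column, tab-separated content without a header row.
bed_text = "chr1\t100\t200\nchr1\t300\t400\nchr2\t50\t150\n"

# In this sketch, a schema is just a named list of column names.
SCHEMAS = {"bed3": ["chrom", "start", "end"]}


def read_table_sketch(source, schema: str, **kwargs) -> pd.DataFrame:
    """Read a headerless table and name its columns from a schema."""
    kwargs.setdefault("sep", "\t")
    kwargs.setdefault("header", None)
    # Remaining keyword arguments are passed through to pandas.read_csv.
    return pd.read_csv(source, names=SCHEMAS[schema], **kwargs)


peaks = read_table_sketch(io.StringIO(bed_text), schema="bed3")
print(peaks)
```

The caller names the format once instead of repeating the column list at every call site, while other pandas options (such as `skiprows` in the example above) can still be passed through.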

Tutorials

See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Citing

If you use bioframe in your work, please cite:

```bibtex
@article{bioframe_2024,
  author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
  doi = {10.1093/bioinformatics/btae088},
  journal = {Bioinformatics},
  title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
  year = {2024}
}
```

Owner

  • Name: Open Chromosome Collective
  • Login: open2c
  • Kind: organization
  • Email: open.chromosome.collective@gmail.com

Citation (CITATION.cff)

cff-version: 1.2.0
type: software
title: bioframe
license: MIT
repository-code: 'https://github.com/open2c/bioframe'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Nezar
    family-names: Abdennur
    orcid: 'https://orcid.org/0000-0001-5814-0864'
  - given-names: Geoffrey
    family-names: Fudenberg
    orcid: "https://orcid.org/0000-0001-5905-6517"
  - given-names: Ilya
    family-names: Flyamer
    orcid: "https://orcid.org/0000-0002-4892-4208"
  - given-names: Aleksandra
    family-names: Galitsyna
    orcid: "https://orcid.org/0000-0001-8969-5694"
  - given-names: Anton
    family-names: Goloborodko
    orcid: "https://orcid.org/0000-0002-2210-8616"
  - given-names: Maxim
    family-names: Imakaev
    orcid: "https://orcid.org/0000-0002-5320-2728"
  - given-names: Sergey
    family-names: Venev
    orcid: "https://orcid.org/0000-0002-1507-7460"
abstract: >-
  Bioframe is a library to enable flexible and performant
  operations on genomic interval data frames in Python.
keywords:
  - bioinformatics
  - genomics
  - ranges
  - intervals
  - dataframes
  - pandas
  - numpy
  - Python
identifiers:
  - type: doi
    value: 10.5281/zenodo.3897573
    description: Zenodo
  - type: doi
    value: 10.1101/2022.02.16.480748
    description: bioRxiv preprint
  - type: doi
    value: 10.1093/bioinformatics/btae088
    description: Publication
preferred-citation:
  type: article
  title: "Bioframe: Operations on Genomic Intervals in Pandas Dataframes"
  authors:
    - family-names: Open2C
    - given-names: Nezar
      family-names: Abdennur
      orcid: 'https://orcid.org/0000-0001-5814-0864'
    - given-names: Geoffrey
      family-names: Fudenberg
      orcid: "https://orcid.org/0000-0001-5905-6517"
    - given-names: Ilya
      family-names: Flyamer
      name-suffix: M
      orcid: "https://orcid.org/0000-0002-4892-4208"
    - given-names: Aleksandra
      family-names: Galitsyna
      name-suffix: A
      orcid: "https://orcid.org/0000-0001-8969-5694"
    - given-names: Anton
      family-names: Goloborodko
      orcid: "https://orcid.org/0000-0002-2210-8616"
    - given-names: Maxim
      family-names: Imakaev
      orcid: "https://orcid.org/0000-0002-5320-2728"
    - given-names: Sergey
      family-names: Venev
      orcid: "https://orcid.org/0000-0002-1507-7460"
  journal: Bioinformatics
  year: 2024
  url: "https://doi.org/10.1093/bioinformatics/btae088"
  doi: "10.1093/bioinformatics/btae088"

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 2
  • Watch event: 13
  • Delete event: 2
  • Issue comment event: 9
  • Push event: 62
  • Pull request event: 17
  • Fork event: 10
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 2
  • Watch event: 13
  • Delete event: 2
  • Issue comment event: 9
  • Push event: 62
  • Pull request event: 17
  • Fork event: 10

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 602
  • Total Committers: 21
  • Avg Commits per committer: 28.667
  • Development Distribution Score (DDS): 0.573
Past Year
  • Commits: 31
  • Committers: 6
  • Avg Commits per committer: 5.167
  • Development Distribution Score (DDS): 0.29
Top Committers
Name Email Commits
Nezar Abdennur n****r@g****m 257
Anton Goloborodko g****n@g****m 115
gfudenberg g****g@g****m 88
Geoff Fudenberg g****g@L****l 56
agalitsyna a****a@g****m 30
Sergey Venev s****y@g****m 11
mimakaev m****v@g****m 8
Phlya f****r@g****m 7
pre-commit-ci[bot] 6****] 6
Sameer Abraham s****0@g****m 5
Félix Raimundo g****s@g****m 3
Nilesh Patra n****h@n****o 3
dependabot[bot] 4****] 3
luisdiaz1997 l****3@h****m 2
smit kadvani s****i@g****m 2
George Spracklin g****n@g****m 1
Gökçen Eraslan e****n@g****m 1
Harshit h****6@g****m 1
Isaac Virshup i****p@g****m 1
aafkevandenberg a****g@g****m 1
Thomas Reimonn t****n@u****u 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 84
  • Total pull requests: 114
  • Average time to close issues: 8 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 26
  • Total pull request authors: 20
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.81
  • Merged pull requests: 94
  • Bot issues: 0
  • Bot pull requests: 16
Past Year
  • Issues: 1
  • Pull requests: 12
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 months
  • Issue authors: 1
  • Pull request authors: 5
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.17
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • golobor (12)
  • sergpolly (10)
  • gfudenberg (10)
  • Phlya (9)
  • nvictus (7)
  • ivirshup (5)
  • penguinpee (4)
  • agalitsyna (3)
  • WANGchuang715 (2)
  • vbchavali (2)
  • mimakaev (2)
  • endrebak (2)
  • skytguuu (2)
  • benjaminbauer (2)
  • marade (1)
Pull Request Authors
  • nvictus (45)
  • gfudenberg (17)
  • pre-commit-ci[bot] (12)
  • dependabot[bot] (7)
  • agalitsyna (6)
  • Manas-7854 (6)
  • gamazeps (6)
  • smitkadvani (4)
  • sergpolly (3)
  • Samia35-2973 (2)
  • harshit148 (2)
  • milandvijay (2)
  • Phlya (2)
  • golobor (2)
  • emdann (1)
Top Labels
Issue Labels
enhancement (33) question (5) bug (5)
Pull Request Labels
dependencies (7) github_actions (5) python (2)

Packages

  • Total packages: 14
  • Total downloads:
    • pypi 6,644 last-month
  • Total docker downloads: 354
  • Total dependent packages: 8
    (may contain duplicates)
  • Total dependent repositories: 5
    (may contain duplicates)
  • Total versions: 74
  • Total maintainers: 4
pypi.org: bioframe

Operations and utilities for Genomic Interval Dataframes.

  • Versions: 30
  • Dependent Packages: 8
  • Dependent Repositories: 5
  • Downloads: 6,644 Last month
  • Docker Downloads: 354
Rankings
Docker downloads count: 1.3%
Dependent packages count: 1.9%
Average: 3.8%
Downloads: 5.4%
Dependent repos count: 6.6%
Maintainers (3)
Last synced: 4 months ago
alpine-edge: py3-bioframe-doc

Pandas utilities for tab-delimited and other genomic data files (documentation)

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Average: 7.2%
Dependent packages count: 14.4%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.22: py3-bioframe-pyc

Precompiled Python bytecode for py3-bioframe

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 10.2%
Stargazers count: 19.6%
Forks count: 21.3%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.22: py3-bioframe

Pandas utilities for tab-delimited and other genomic data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 10.2%
Stargazers count: 19.6%
Forks count: 21.3%
Maintainers (1)
Last synced: 4 months ago
alpine-edge: py3-bioframe

Pandas utilities for tab-delimited and other genomic data files

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 13.3%
Average: 14.8%
Stargazers count: 21.2%
Forks count: 24.7%
Maintainers (1)
Last synced: 4 months ago
alpine-edge: py3-bioframe-pyc

Precompiled Python bytecode for py3-bioframe

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 13.3%
Average: 14.8%
Stargazers count: 21.2%
Forks count: 24.7%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.21: py3-bioframe-pyc

Precompiled Python bytecode for py3-bioframe

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.21: py3-bioframe-doc

Pandas utilities for tab-delimited and other genomic data files (documentation)

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.22: py3-bioframe-doc

Pandas utilities for tab-delimited and other genomic data files (documentation)

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.21: py3-bioframe

Pandas utilities for tab-delimited and other genomic data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.19: py3-bioframe

Pandas utilities for tab-delimited and other genomic data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.19: py3-bioframe-pyc

Precompiled Python bytecode for py3-bioframe

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.20: py3-bioframe-pyc

Precompiled Python bytecode for py3-bioframe

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.20: py3-bioframe

Pandas utilities for tab-delimited and other genomic data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • matplotlib *
  • numpy >=1.10
  • pandas >=1.3
  • pyyaml *
  • requests *
  • typing-extensions python_version<'3.9'