Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: BigBuildBench
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 933 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Bioframe: Operations on Genomic Interval Dataframes


Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of Pandas. Bioframe provides:

  • A variety of genomic interval operations that work directly on dataframes.
  • Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
  • Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the documentation, including the guide, as well as the publication for more information.

Bioframe is an Affiliated Project of NumFOCUS.

Installation

Bioframe is available on PyPI and bioconda:

```sh
pip install bioframe
```

Contributing

Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and in regular developer meetings scheduled there. Anyone can join and participate!

Interval operations

Key genomic interval operations in bioframe include:

  • overlap: Find pairs of overlapping genomic intervals between two dataframes.
  • closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
  • cluster: Group overlapping intervals in a dataframe into clusters.
  • complement: Find genomic intervals that are not covered by any interval from a dataframe.
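To make the semantics of two of these operations concrete, here is a minimal plain-Python sketch for a single chromosome. It assumes half-open `[start, end)` coordinates. The helper names `complement_1d`, `interval_distance`, and `closest_1d`, and the explicit `chrom_length` argument, are hypothetical and not part of bioframe, which works on dataframes and across chromosomes.

```python
def complement_1d(intervals, chrom_length):
    """Return the gaps in [0, chrom_length) not covered by any interval."""
    gaps = []
    cursor = 0  # rightmost position covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


def interval_distance(a, b):
    """Gap between two half-open intervals (0 if they overlap or touch)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def closest_1d(queries, targets):
    """For every query interval, return (query, nearest target, distance)."""
    result = []
    for q in queries:
        best = min(targets, key=lambda t: interval_distance(q, t))
        result.append((q, best, interval_distance(q, best)))
    return result


covered = [(10, 20), (15, 30), (50, 60)]
print(complement_1d(covered, chrom_length=100))
# [(0, 10), (30, 50), (60, 100)]
print(closest_1d([(32, 35)], covered))
# [((32, 35), (15, 30), 2)]
```

The sketch only illustrates the definitions; it does not model ties, strand, or any of the options the library functions may accept.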

Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
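As an illustration of how a derived operation can be composed from core ones, the following plain-Python sketch computes a coverage-like quantity: the number of base pairs of each query interval that is covered by a second set of intervals. It first merges the second set, then sums the overlap lengths. The names `merge_1d` and `coverage_1d` are hypothetical, coordinates are assumed to be half-open on one chromosome, and this is not bioframe's implementation.

```python
def merge_1d(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def coverage_1d(queries, others):
    """For each query, count base pairs covered by the merged `others`."""
    merged = merge_1d(others)  # merging first avoids double counting
    out = []
    for q_start, q_end in queries:
        covered = sum(
            max(0, min(q_end, end) - max(q_start, start))
            for start, end in merged
        )
        out.append(((q_start, q_end), covered))
    return out


print(coverage_1d([(0, 10), (8, 25)], [(5, 12), (9, 15), (20, 30)]))
# [((0, 10), 5), ((8, 25), 12)]
```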

To overlap two dataframes, call:

```python
import bioframe as bf

bf.overlap(df1, df2)
```

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:
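A minimal sketch of the overlap rule, using made-up intervals rather than the inputs this README refers to: two half-open intervals are taken to overlap when each starts before the other ends. The function `overlaps_1d` and the sample data are hypothetical; bioframe itself takes dataframes, and its join-style options are not modeled here.

```python
def overlaps_1d(left, right):
    """Return all (left, right) pairs sharing at least one base pair."""
    pairs = []
    for a_start, a_end in left:
        for b_start, b_end in right:
            # Half-open intervals: merely touching at an endpoint is not an overlap.
            if a_start < b_end and b_start < a_end:
                pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


df1_like = [(1, 5), (3, 8), (8, 10), (12, 14)]
df2_like = [(4, 8), (10, 11)]
print(overlaps_1d(df1_like, df2_like))
# [((1, 5), (4, 8)), ((3, 8), (4, 8))]
```

The nested loop is quadratic and is meant only to spell out the definition on a few intervals.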

To merge all overlapping intervals in a dataframe, call:

```python
import bioframe as bf

bf.merge(df1)
```

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:
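The merging rule can be sketched in plain Python on made-up intervals (not the inputs this README refers to). The sketch assumes half-open coordinates on one chromosome and treats touching intervals as mergeable; the function `merge_intervals` and the per-row count it reports are illustrative, not bioframe's implementation.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching intervals and count the members of each."""
    merged = []  # each entry: [start, end, n_intervals]
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += 1
        else:
            merged.append([start, end, 1])
    return [tuple(m) for m in merged]


print(merge_intervals([(1, 5), (3, 8), (8, 10), (12, 14)]))
# [(1, 10, 3), (12, 14, 1)]
```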

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas's read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.

```python
import bioframe

# Remote JASPAR motif-call track for hg38; the first line is skipped.
jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
```
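The idea behind a schema argument can be sketched with the standard library alone: a named schema maps to a list of column names that is applied to a headerless tab-separated file. The `SCHEMAS` table, the function `read_with_schema`, and the sample text below are hypothetical and only mimic the concept; bioframe ships its own schema definitions and returns dataframes.

```python
import csv
import io

# Hypothetical schema table for illustration only.
SCHEMAS = {
    "bed3": ["chrom", "start", "end"],
    "bed4": ["chrom", "start", "end", "name"],
}


def read_with_schema(handle, schema, skiprows=0):
    """Read tab-separated rows and label the columns from a named schema."""
    names = SCHEMAS[schema]
    rows = list(csv.reader(handle, delimiter="\t"))[skiprows:]
    return [dict(zip(names, row)) for row in rows]


text = "track name=example\nchr1\t100\t200\tpeak1\nchr1\t300\t450\tpeak2\n"
records = read_with_schema(io.StringIO(text), schema="bed4", skiprows=1)
print(records[0])
# {'chrom': 'chr1', 'start': '100', 'end': '200', 'name': 'peak1'}
```

Values stay as strings in this sketch; a dataframe-based reader would normally also infer or assign column dtypes.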

Tutorials

See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Citing

If you use bioframe in your work, please cite:

```bibtex
@article{bioframe_2024,
  author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
  doi = {10.1093/bioinformatics/btae088},
  journal = {Bioinformatics},
  title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
  year = {2024}
}
```

Owner

  • Name: BigBuildBench
  • Login: BigBuildBench
  • Kind: organization

abbr. B3, benchmarking the repo-level understanding capability of your LLMs by reconstructing project build-file.

Citation (CITATION.cff)

cff-version: 1.2.0
type: software
title: bioframe
license: MIT
repository-code: 'https://github.com/open2c/bioframe'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Nezar
    family-names: Abdennur
    orcid: 'https://orcid.org/0000-0001-5814-0864'
  - given-names: Geoffrey
    family-names: Fudenberg
    orcid: "https://orcid.org/0000-0001-5905-6517"
  - given-names: Ilya
    family-names: Flyamer
    orcid: "https://orcid.org/0000-0002-4892-4208"
  - given-names: Aleksandra
    family-names: Galitsyna
    orcid: "https://orcid.org/0000-0001-8969-5694"
  - given-names: Anton
    family-names: Goloborodko
    orcid: "https://orcid.org/0000-0002-2210-8616"
  - given-names: Maxim
    family-names: Imakaev
    orcid: "https://orcid.org/0000-0002-5320-2728"
  - given-names: Sergey
    family-names: Venev
    orcid: "https://orcid.org/0000-0002-1507-7460"
abstract: >-
  Bioframe is a library to enable flexible and performant
  operations on genomic interval data frames in Python.
keywords:
  - bioinformatics
  - genomics
  - ranges
  - intervals
  - dataframes
  - pandas
  - numpy
  - Python
identifiers:
  - type: doi
    value: 10.5281/zenodo.3897573
    description: Zenodo
  - type: doi
    value: 10.1101/2022.02.16.480748
    description: bioRxiv preprint
  - type: doi
    value: 10.1093/bioinformatics/btae088
    description: Publication
preferred-citation:
  type: article
  title: "Bioframe: Operations on Genomic Intervals in Pandas Dataframes"
  authors:
    - family-names: Open2C
    - given-names: Nezar
      family-names: Abdennur
      orcid: 'https://orcid.org/0000-0001-5814-0864'
    - given-names: Geoffrey
      family-names: Fudenberg
      orcid: "https://orcid.org/0000-0001-5905-6517"
    - given-names: Ilya
      family-names: Flyamer
      name-suffix: M
      orcid: "https://orcid.org/0000-0002-4892-4208"
    - given-names: Aleksandra
      family-names: Galitsyna
      name-suffix: A
      orcid: "https://orcid.org/0000-0001-8969-5694"
    - given-names: Anton
      family-names: Goloborodko
      orcid: "https://orcid.org/0000-0002-2210-8616"
    - given-names: Maxim
      family-names: Imakaev
      orcid: "https://orcid.org/0000-0002-5320-2728"
    - given-names: Sergey
      family-names: Venev
      orcid: "https://orcid.org/0000-0002-1507-7460"
  journal: Bioinformatics
  year: 2024
  url: "https://doi.org/10.1093/bioinformatics/btae088"
  doi: "10.1093/bioinformatics/btae088"

GitHub Events

Total
  • Pull request event: 1
  • Create event: 5
Last Year
  • Pull request event: 1
  • Create event: 5

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi