open2c_bioframe
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BigBuildBench
- License: mit
- Language: Python
- Default Branch: master
- Size: 933 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Bioframe: Operations on Genomic Interval Dataframes

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.
Bioframe is built directly on top of Pandas. Bioframe provides:
- A variety of genomic interval operations that work directly on dataframes.
- Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
- Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.
Read the documentation, including the guide, as well as the publication for more information.
Bioframe is an Affiliated Project of NumFOCUS.
Installation
Bioframe is available on PyPI and bioconda:
```sh
pip install bioframe
```
Contributing
Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and at the regular developer meetings scheduled there. Anyone can join and participate!
Interval operations
Key genomic interval operations in bioframe include:
- overlap: Find pairs of overlapping genomic intervals between two dataframes.
- closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
- cluster: Group overlapping intervals in a dataframe into clusters.
- complement: Find genomic intervals that are not covered by any interval from a dataframe.
Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: coverage, expand, merge, select, and subtract.
To overlap two dataframes, call:
```python
import bioframe as bf

bf.overlap(df1, df2)
```
For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:

To merge all overlapping intervals in a dataframe, call:
```python
import bioframe as bf

bf.merge(df1)
```
For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:

See the guide for visualizations of other interval operations in bioframe.
File I/O
Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.
```python
import bioframe

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
```
Tutorials
See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.
Citing
If you use bioframe in your work, please cite:
```bibtex
@article{bioframe_2024,
  author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
  doi = {10.1093/bioinformatics/btae088},
  journal = {Bioinformatics},
  title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
  year = {2024}
}
```
Owner
- Name: BigBuildBench
- Login: BigBuildBench
- Kind: organization
- Repositories: 1
- Profile: https://github.com/BigBuildBench
abbr. B3, benchmarking the repo-level understanding capability of your LLMs by reconstructing project build-file.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
type: software
title: bioframe
license: MIT
repository-code: 'https://github.com/open2c/bioframe'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Nezar
    family-names: Abdennur
    orcid: 'https://orcid.org/0000-0001-5814-0864'
  - given-names: Geoffrey
    family-names: Fudenberg
    orcid: "https://orcid.org/0000-0001-5905-6517"
  - given-names: Ilya
    family-names: Flyamer
    orcid: "https://orcid.org/0000-0002-4892-4208"
  - given-names: Aleksandra
    family-names: Galitsyna
    orcid: "https://orcid.org/0000-0001-8969-5694"
  - given-names: Anton
    family-names: Goloborodko
    orcid: "https://orcid.org/0000-0002-2210-8616"
  - given-names: Maxim
    family-names: Imakaev
    orcid: "https://orcid.org/0000-0002-5320-2728"
  - given-names: Sergey
    family-names: Venev
    orcid: "https://orcid.org/0000-0002-1507-7460"
abstract: >-
  Bioframe is a library to enable flexible and performant
  operations on genomic interval data frames in Python.
keywords:
  - bioinformatics
  - genomics
  - ranges
  - intervals
  - dataframes
  - pandas
  - numpy
  - Python
identifiers:
  - type: doi
    value: 10.5281/zenodo.3897573
    description: Zenodo
  - type: doi
    value: 10.1101/2022.02.16.480748
    description: bioRxiv preprint
  - type: doi
    value: 10.1093/bioinformatics/btae088
    description: Publication
preferred-citation:
  type: article
  title: "Bioframe: Operations on Genomic Intervals in Pandas Dataframes"
  authors:
    - family-names: Open2C
    - given-names: Nezar
      family-names: Abdennur
      orcid: 'https://orcid.org/0000-0001-5814-0864'
    - given-names: Geoffrey
      family-names: Fudenberg
      orcid: "https://orcid.org/0000-0001-5905-6517"
    - given-names: Ilya
      family-names: Flyamer
      name-suffix: M
      orcid: "https://orcid.org/0000-0002-4892-4208"
    - given-names: Aleksandra
      family-names: Galitsyna
      name-suffix: A
      orcid: "https://orcid.org/0000-0001-8969-5694"
    - given-names: Anton
      family-names: Goloborodko
      orcid: "https://orcid.org/0000-0002-2210-8616"
    - given-names: Maxim
      family-names: Imakaev
      orcid: "https://orcid.org/0000-0002-5320-2728"
    - given-names: Sergey
      family-names: Venev
      orcid: "https://orcid.org/0000-0002-1507-7460"
  journal: Bioinformatics
  year: 2024
  url: "https://doi.org/10.1093/bioinformatics/btae088"
  doi: "10.1093/bioinformatics/btae088"
```
GitHub Events
Total
- Pull request event: 1
- Create event: 5
Last Year
- Pull request event: 1
- Create event: 5
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pypa/gh-action-pypi-publish release/v1 composite