bioc

Data structures and code to read/write BioC XML and JSON.

https://github.com/bionlplab/bioc

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
    5 of 9 committers (55.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

bioc bionlp json reader writer xml

Keywords from Contributors

interactive serializer packaging network-simulation hacking autograding observability embedded optim standardization
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: bionlplab
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 442 KB
Statistics
  • Stars: 33
  • Watchers: 1
  • Forks: 11
  • Open Issues: 1
  • Releases: 8
Topics
bioc bionlp json reader writer xml
Created over 10 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

bioc - Processing BioC, Brat, and PubTator with Python


BioC XML / JSON format can be used to share text documents and annotations.
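As a rough illustration of the XML side of the format, the sketch below builds a one-document BioC XML collection using only the standard library's xml.etree.ElementTree. It is not how bioc itself works (the biocxml module shown further on handles this for you). The element names reflect our reading of the BioC DTD, and the function name build_bioc_xml, the document ID, the text, and the annotation values are all hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_bioc_xml(doc_id: str, text: str, spans: list[tuple[int, int, str]]) -> str:
    """Return a BioC XML string with one document holding one passage.

    `spans` holds (offset, length, type) triples for the annotations.
    """
    collection = ET.Element("collection")
    # Collection-level metadata (source, date, key); values are placeholders.
    ET.SubElement(collection, "source").text = "example"
    ET.SubElement(collection, "date").text = "20240101"
    ET.SubElement(collection, "key").text = "example.key"

    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id

    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "infon", key="type").text = "title"
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text

    for i, (offset, length, ann_type) in enumerate(spans):
        ann = ET.SubElement(passage, "annotation", id=f"T{i}")
        ET.SubElement(ann, "infon", key="type").text = ann_type
        # Location offsets are document-level; with a passage offset of 0
        # they index directly into `text`.
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        ET.SubElement(ann, "text").text = text[offset:offset + length]

    return ET.tostring(collection, encoding="unicode")


xml_str = build_bioc_xml("12345", "BRCA1 mutations in breast cancer", [(0, 5, "Gene")])
print(xml_str)
```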

The brat standoff format is used by the brat annotation tool to store annotations on disk. Annotations are stored separately from the annotated document text, which the tool never modifies.
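A minimal sketch of reading the text-bound ("T") lines of a brat .ann file with the standard library, assuming the usual layout of ID, tab, "type start end", tab, covered text, with end offsets exclusive. The names TextBound and parse_text_bound_annotations are hypothetical and not part of bioc's brat module; relations, events, notes, and discontinuous spans are simply skipped.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    id: str
    type: str
    start: int
    end: int  # exclusive
    text: str


def parse_text_bound_annotations(ann: str) -> list:
    """Parse contiguous text-bound ("T") lines from brat .ann content."""
    result = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), events (E), notes (#), etc.
        ann_id, type_and_span, text = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous span such as "0 4;5 10"; not handled here
        ann_type, start, end = type_and_span.split(" ")
        result.append(TextBound(ann_id, ann_type, int(start), int(end), text))
    return result


doc_text = "BRCA1 mutations in breast cancer"
ann_text = "T1\tGene 0 5\tBRCA1\nR1\tAssoc Arg1:T1 Arg2:T2\nT2\tDisease 19 32\tbreast cancer\n"
for tb in parse_text_bound_annotations(ann_text):
    # Offsets index into the untouched document text.
    assert doc_text[tb.start:tb.end] == tb.text
```

Keeping the offsets as plain integers into the .txt file is what lets the annotations live apart from the text without ever rewriting it.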

The PubTator format is produced by the PubTator Central system.
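PubTator files are line oriented: a title line (PMID|t|...), an abstract line (PMID|a|...), then one tab-separated annotation line per mention (PMID, start, end, mention, type, and usually a concept ID), with a blank line between documents. The sketch below parses that layout with the standard library, assuming offsets index into the title and abstract joined by a single space. PubTatorDoc and parse_pubtator are hypothetical names, not bioc's pubtator API, and the sample PMID and concept ID are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class PubTatorDoc:
    pmid: str
    title: str = ""
    abstract: str = ""
    # (start, end, mention, type, concept_id or None)
    annotations: list = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumed offset base: title and abstract joined by one space.
        return f"{self.title} {self.abstract}"


def parse_pubtator(data: str) -> list:
    docs = []

    def doc_for(pmid):
        # Start a new document whenever the PMID changes.
        if not docs or docs[-1].pmid != pmid:
            docs.append(PubTatorDoc(pmid))
        return docs[-1]

    for line in data.splitlines():
        if not line.strip():
            continue  # blank line between documents
        head = line.split("|", 2)
        if len(head) == 3 and head[1] in ("t", "a") and "\t" not in head[0]:
            pmid, kind, content = head
            if kind == "t":
                doc_for(pmid).title = content
            else:
                doc_for(pmid).abstract = content
        else:
            cols = line.split("\t")
            pmid, start, end, mention, ann_type = cols[:5]
            concept_id = cols[5] if len(cols) > 5 else None
            doc_for(pmid).annotations.append(
                (int(start), int(end), mention, ann_type, concept_id))
    return docs


sample = (
    "12345|t|BRCA1 mutations\n"
    "12345|a|Variants in BRCA1 raise breast cancer risk.\n"
    "12345\t28\t33\tBRCA1\tGene\t672\n"
    "12345\t40\t53\tbreast cancer\tDisease\n"
)
doc = parse_pubtator(sample)[0]
for start, end, mention, _, _ in doc.annotations:
    assert doc.text[start:end] == mention
```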

bioc exposes an API familiar to users of the standard library marshal and pickle modules.
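For readers less familiar with that convention: marshal and pickle pair dump(obj, fp)/load(fp) for file objects with dumps(obj)/loads(data) for in-memory data. The short sketch below shows the pattern with pickle itself; the record is a hypothetical stand-in, not a bioc object.

```python
import io
import pickle

record = {"id": "12345", "passages": ["BRCA1 mutations"]}  # hypothetical stand-in

# File-object interface: dump(obj, fp) writes, load(fp) reads back.
buf = io.BytesIO()
pickle.dump(record, buf)
buf.seek(0)
from_file = pickle.load(buf)

# In-memory interface: dumps(obj) returns bytes, loads(data) parses them.
from_bytes = pickle.loads(pickle.dumps(record))

assert from_file == from_bytes == record
```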

Development of bioc happens on GitHub: https://github.com/bionlplab/bioc

Getting started

Installing bioc

```shell
$ pip install bioc
```

BioC

Encoding the BioC collection object collection:

```python
from bioc import biocxml

# Serialize collection as a BioC formatted stream to fp.
with open(filename, 'w') as fp:
    biocxml.dump(collection, fp)
```

Decoding the BioC XML file:

```python
from bioc import biocxml

# Deserialize fp to a BioC collection object.
with open(filename, 'r') as fp:
    collection = biocxml.load(fp)
```
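biocxml.load does the parsing for you. To show roughly what it has to walk, here is a standard-library sketch that lists annotation locations from a small BioC XML string. The helper iter_annotations and the sample document are hypothetical; bioc itself lists lxml among its requirements and returns full BioC objects rather than tuples.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<collection>
  <source>example</source>
  <date>20240101</date>
  <key>example.key</key>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRCA1 mutations</text>
      <annotation id="T0">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def iter_annotations(xml_text: str):
    """Yield (document id, annotation type, offset, length, text) tuples."""
    root = ET.fromstring(xml_text)
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        # iter() also reaches annotations nested inside <sentence> elements.
        for ann in document.iter("annotation"):
            ann_type = next(
                (inf.text for inf in ann.findall("infon") if inf.get("key") == "type"),
                None,
            )
            # An annotation may have several locations (discontinuous spans).
            for loc in ann.findall("location"):
                yield (doc_id, ann_type, int(loc.get("offset")),
                       int(loc.get("length")), ann.findtext("text"))


for row in iter_annotations(SAMPLE):
    print(row)
```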

Brat

Encoding the Brat document object doc:

```python
from bioc import brat

# Serialize doc as a brat formatted stream to textfp and annfp.
with open(annpath, 'w') as annfp, open(txtpath, 'w') as textfp:
    brat.dump(doc, textfp, annfp)
```
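For orientation, each entity in the .ann stream is written as a text-bound line. The sketch below uses a hypothetical helper, format_brat (not brat.dump), to produce such lines from (type, start, end) spans over the unmodified document text, assuming every span is contiguous and does not cross a line break.

```python
def format_brat(text: str, spans: list) -> str:
    """Return .ann content for (type, start, end) spans over `text`.

    Assumes each span is contiguous and does not cross a line break.
    """
    lines = []
    for i, (ann_type, start, end) in enumerate(spans, start=1):
        # ID <tab> "type start end" <tab> covered text (end is exclusive).
        lines.append(f"T{i}\t{ann_type} {start} {end}\t{text[start:end]}")
    return "\n".join(lines) + "\n"


print(format_brat("BRCA1 mutations in breast cancer",
                  [("Gene", 0, 5), ("Disease", 19, 32)]))
```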

Decoding the Brat document:

```python
from bioc import brat

# Deserialize two streams (text and ann) to a Brat document object.
with open(annpath) as annfp, open(txtpath) as textfp:
    doc = brat.load(textfp, annfp)
```

PubTator

Encoding the PubTator document object doc:

```python
from bioc import pubtator

# Serialize a list of PubTator documents as a PubTator formatted stream to fp.
with open(filename, 'w') as fp:
    pubtator.dump([doc], fp)
```
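A minimal standard-library sketch of the output layout for one document: title and abstract lines, then tab-separated annotation lines, then a terminating blank line. format_pubtator is a hypothetical helper, not pubtator.dump; the concept ID column is written only when one is provided, and the sample values are illustrative.

```python
def format_pubtator(pmid: str, title: str, abstract: str, annotations) -> str:
    """Return one PubTator record.

    `annotations` yields (start, end, mention, type, concept_id or None).
    """
    lines = [f"{pmid}|t|{title}", f"{pmid}|a|{abstract}"]
    for start, end, mention, ann_type, concept_id in annotations:
        cols = [pmid, str(start), str(end), mention, ann_type]
        if concept_id is not None:
            cols.append(concept_id)
        lines.append("\t".join(cols))
    # A blank line separates records in a multi-document file.
    return "\n".join(lines) + "\n\n"


record = format_pubtator("12345", "BRCA1 mutations", "Variants in BRCA1.",
                         [(0, 5, "BRCA1", "Gene", "672")])
print(record, end="")
```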

Decoding the PubTator file:

```python
from bioc import pubtator

# Deserialize fp to a list of PubTator document objects.
with open(filename, 'r') as fp:
    docs = pubtator.load(fp)
```

Documentation

You will find complete documentation at our Read the Docs site.

Contributing

You can find information about contributing to bioc at our Contribution page.

Reference

If you use bioc in your research, please cite the following paper:

  • Comeau DC, Doğan RI, Ciccarese P, Cohen KB, Krallinger M, Leitner F, Lu Z, Peng Y, Rinaldi F, Torii M, Valencia A, Verspoor K, Wiegers TC, Wu CH, Wilbur WJ. BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford). 2013;2013:bat064. doi: 10.1093/database/bat064. Print 2013. PMID: 24048470; PMCID: PMC3889917

Acknowledgment

This work is supported by the National Library of Medicine under Award No. 4R00LM013001.

License

Copyright BioNLP Lab at Weill Cornell Medicine, 2023.

Distributed under the terms of the MIT license, bioc is free and open source software.

Owner

  • Name: BioNLP Lab
  • Login: bionlplab
  • Kind: organization
  • Email: yip4002@med.cornell.edu
  • Location: New York City

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 274
  • Total Committers: 9
  • Avg Commits per committer: 30.444
  • Development Distribution Score (DDS): 0.526
Past Year
  • Commits: 7
  • Committers: 2
  • Avg Commits per committer: 3.5
  • Development Distribution Score (DDS): 0.143
Top Committers
  • Yifan Peng, y****g@n****v: 130 commits
  • Yifan Peng, y****2@m****u: 85 commits
  • Yifan Peng, y****g@u****u: 36 commits
  • Yifan Peng, p****l@g****m: 12 commits
  • Yifan Peng, y****g@c****u: 5 commits
  • Jake Lever, j****r@g****m: 2 commits
  • dependabot[bot], 4****]: 2 commits
  • Robert Martin, m****o@i****e: 1 commit
  • Nancy Wong, n****g@g****m: 1 commit

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 11
  • Total pull requests: 12
  • Average time to close issues: 6 months
  • Average time to close pull requests: 17 days
  • Total issue authors: 10
  • Total pull request authors: 5
  • Average comments per issue: 1.91
  • Average comments per pull request: 0.08
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • shamikbose (2)
  • jakelever (1)
  • chlor (1)
  • kmurphy902 (1)
  • prasad676 (1)
  • zerogerc (1)
  • sg-wbi (1)
  • raven44099 (1)
  • pyramid20002000 (1)
Pull Request Authors
  • yfpeng (6)
  • jakelever (2)
  • dependabot[bot] (2)
  • rileynwong (1)
  • mart1nro (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 58,502 last-month
  • Total docker downloads: 1,129
  • Total dependent packages: 10
  • Total dependent repositories: 163
  • Total versions: 39
  • Total maintainers: 1
pypi.org: bioc

bioc - Processing BioC, Brat, and PubTator with Python.

  • Versions: 39
  • Dependent Packages: 10
  • Dependent Repositories: 163
  • Downloads: 58,502 Last month
  • Docker Downloads: 1,129
Rankings
Dependent repos count: 1.2%
Dependent packages count: 1.3%
Downloads: 3.7%
Docker downloads count: 4.8%
Average: 5.6%
Forks count: 10.5%
Stargazers count: 12.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

bak/setup.py pypi
  • docutils >=0.15.2
  • jsonlines >=1.2.0
  • lxml >=4.6.3
docs/requirements.txt pypi
  • myst-parser ==0.16.1
  • sphinx ==4.4.0
  • sphinx_rtd_theme ==1.0.0
requirements.txt pypi
  • docopt *
  • intervaltree *
  • jsonlines >=1.2.0
  • lxml >=4.6.3
  • tqdm *
.github/workflows/pytest.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/validate_schema.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
requirements-dev.txt pypi
  • build * development
  • docopt * development
  • intervaltree * development
  • jsonlines >=1.2.0 development
  • lxml >=4.6.3 development
  • pytest * development
  • pytest-cov * development
  • tqdm * development
  • twine * development