scribl

scribl: A system for the semantic capture of relationships in biological literature - Published in JOSS (2024)

https://github.com/amberbiology/scribl

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

computational-biology literature-mining network-biology systems-biology

Last synced: 6 months ago

Repository

Semantic Capture of Relationships In Biological Literature (scribl)

Basic Info
  • Host: GitHub
  • Owner: amberbiology
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://amberbiology.com/
  • Size: 5.86 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 5
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation Zenodo

README.md


scribl

A system for the semantic capture of relationships in biological literature

The scribl language was designed for curating, from scientific articles, the relationships between the various biological agents and processes that they describe, with a view to generating a graph database that captures these relationships as a connected network.

scribl statements are added as tags to articles in a literature database (the scribl codebase currently supports the free, open-source Zotero database). These tags are parsed and used to create a graph data structure that can then be exported for use in a graph database platform such as Neo4j, or Python's NetworkX.

For example:

```
::agent c9orf72 :gene :protein :url https://www.uniprot.org/uniprot/Q96LT7
::agent gtp :tag nucleoside, purine, nucleoside triphosphate
::process exportin releases cargo into cytoplasm @ exportin-1
::process smcr8 mutation > ulk1 phosphorylation < autophagy = smcr8 expression
```

Full details of the language are available in the main documentation, scribl.pdf.
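To get a feel for how a run of scribl tags breaks down into individual statements, here is a minimal sketch in plain Python. It is not the scribl parser (which, per the dependency list, builds on pyparsing); `split_statements` is a hypothetical helper, and it assumes that only the two statement keywords shown in the example, `::agent` and `::process`, occur in the text.

```python
import re


def split_statements(tag_text: str) -> list[tuple[str, str]]:
    """Split text into (kind, body) pairs at each '::agent' / '::process' keyword."""
    # With a capture group, re.split returns [prefix, kind1, body1, kind2, body2, ...]
    parts = re.split(r"::(agent|process)\s+", tag_text)
    kinds, bodies = parts[1::2], parts[2::2]
    return [(kind, body.strip()) for kind, body in zip(kinds, bodies)]


example = (
    "::agent c9orf72 :gene :protein :url https://www.uniprot.org/uniprot/Q96LT7 "
    "::agent gtp :tag nucleoside, purine, nucleoside triphosphate "
    "::process exportin releases cargo into cytoplasm @ exportin-1 "
    "::process smcr8 mutation > ulk1 phosphorylation < autophagy = smcr8 expression"
)

statements = split_statements(example)
for kind, body in statements:
    print(f"{kind:8s} {body}")
```

Each pair keeps the raw statement body, so modifiers such as `:gene`, `:url` or the `>`, `<`, `=` and `@` operators are left untouched for a real parser to interpret.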

How to cite scribl

If you write a paper that uses scribl in your analysis, please cite both:

  • our 2024 article published in the Journal of Open Source Software:

Webster GD, Lancaster AK. (2024) "scribl: A system for the semantic capture of relationships in biological literature." Journal of Open Source Software 9(99):6645. doi:10.21105/joss.06645

  • and the Zenodo record for the software. To cite the correct version, follow these steps:
  1. First visit the DOI for the overall Zenodo record: 10.5281/zenodo.12728362. This DOI represents all versions, and will always resolve to the latest one.

  2. When you are viewing the record, look for the Versions box in the right-sidebar. All versions (including older ones) are listed there.

  3. Select and click the version-specific DOI that matches the specific version of scribl that you used for your analysis.

  4. Once you are viewing the Zenodo record for the specific version, select the citation format you wish to use under the Citation box in the right-sidebar and click to copy the citation. It will contain a link to the version-specific DOI, and be of the form:

    Webster GD, Lancaster AK. (YYYY) "scribl: A system for the semantic capture of relationships in biological literature" (Version X.Y.Z) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.XXXXX

Note that citation metadata for the current Zenodo record is also stored in CITATION.cff.

Quickstart install and test

We recommend installing in a virtual environment.

Install from PyPI

```shell
pip install scribl
```

Install via GitHub repo

```shell
git clone https://github.com/amberbiology/scribl.git
cd scribl
pip install .
```

This will pull in all relevant Python dependencies, including pyparsing, pyzotero and others, within your virtual environment.

Test your installation

When scribl is installed, it provides a command-line program, scribl, that performs some common API tasks and helps you start exploring the system. After installation, here are two example commands you can copy and paste into a terminal to check the installation:

Check version

```shell
scribl --version
```

Check outputs

First create a new, empty directory and change into it, e.g.:

```shell
mkdir /tmp/scribl-test
cd /tmp/scribl-test
```

Then run:

```shell
scribl -g new_graphdb --zotero-library 5251557:group --networkx-fig graphdb-visual.png --graphmlfile graphdb.xml --cyphertextfile graphdb.cypher
```

This reads from a public Zotero group library (ID 5251557) that we created for testing purposes and that includes citations with scribl tags. This example run generates the following outputs:

  • new_graphdb - directory containing the new scribl database

  • graphdb.cypher - a plain text file in the Cypher query language

  • graphdb.xml - an XML file in GraphML format

  • graphdb-visual.png - a visualization of the GraphML graph in PNG format:

Visualization of scribl database via NetworkX

See below for a more detailed description of visualization options.

Visualization of the scribl database

As mentioned previously, a scribl database can be exported for use in a graph database platform for further interactive exploration. One of the most developed graph visualization platforms is Neo4j, which reads output in the Cypher query language. Unfortunately, Neo4j isn't directly available via the Python packaging system. Instructions for installing and configuring it are beyond the scope of this documentation, but they are available for most platforms. The main scribl documentation describes how to load the exported Cypher text generated by a command like the one above into Neo4j and dynamically update the Neo4j database.

In the interest of getting up and running quickly with visualizations while staying purely within the Python ecosystem, scribl also supports outputs other than Cypher. The GraphML backend mentioned above generates visualizations using a combination of NetworkX and matplotlib. This feature produces static visualizations of the current scribl database. Visualizations generated via GraphML/NetworkX are best suited to small networks and to getting a feel for the scribl database structure with a minimum of fuss. For "production"-level visualization and interactive analyses, however, you will likely want to try Neo4j.

Examples of scribl command-line program

Here are some common tasks you can perform with the scribl command-line program:

Create a new database from Zotero library

```shell
scribl -g new_graphdb --zotero-library <LIBRARY_ID>:<TYPE> --zotero-api-key <API_KEY>
```

A scribl database named new_graphdb is created in the current directory from the Zotero library LIBRARY_ID of the specified TYPE (either user or group). An API_KEY is only needed when accessing private libraries or non-public group libraries. The pyzotero library has documentation on how to find your library ID and create an API key.
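As an illustration of the `<LIBRARY_ID>:<TYPE>` form, the sketch below splits and validates such a string in Python. `parse_library_spec` and `ZoteroLibrary` are hypothetical names, not part of scribl, and the check that the ID is purely numeric is an assumption based on the example ID used in this README.

```python
from typing import NamedTuple

VALID_TYPES = {"user", "group"}  # the two library types named in the text


class ZoteroLibrary(NamedTuple):
    library_id: int
    library_type: str


def parse_library_spec(spec: str) -> ZoteroLibrary:
    """Split a '<LIBRARY_ID>:<TYPE>' string into its two parts (hypothetical helper)."""
    library_id, sep, library_type = spec.partition(":")
    # Assumption: library IDs are numeric, as in the 5251557 example
    if not sep or not library_id.isdigit() or library_type not in VALID_TYPES:
        raise ValueError(
            f"expected <LIBRARY_ID>:<TYPE> with TYPE 'user' or 'group', got {spec!r}"
        )
    return ZoteroLibrary(int(library_id), library_type)


library = parse_library_spec("5251557:group")
print(library)
```

Validating the string up front gives a clearer error than passing a malformed value on to the Zotero API.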

Create a new database from a local Zotero CSV file

```shell
scribl -g new_graphdb --zoterofile <ZOTERO_CSV>
```

This populates the database from a local CSV file, ZOTERO_CSV, exported from a Zotero library. Note that the above two commands only create (or read from an existing) database; they generate no output. The new_graphdb database is not overwritten if it already exists unless the --overwrite flag is also supplied.
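Before importing a CSV export, it can be handy to check which articles actually carry scribl tags. The following is a rough sketch using only the standard library. The column names (`Title`, `Manual Tags`) and the `;` tag separator are assumptions about the Zotero CSV export format, so check them against your own file; `scribl_tags_by_title` is a hypothetical helper, not a scribl function.

```python
import csv
import io

# Tiny stand-in for a Zotero CSV export (assumed column names and separator)
SAMPLE_CSV = '''Title,Manual Tags
Paper A,"::agent gtp :tag nucleoside, purine; review"
Paper B,neurodegeneration
'''


def scribl_tags_by_title(csv_file, tag_column="Manual Tags"):
    """Map each article title to its tags that start with '::'."""
    result = {}
    for row in csv.DictReader(csv_file):
        tags = [tag.strip() for tag in (row.get(tag_column) or "").split(";")]
        scribl_tags = [tag for tag in tags if tag.startswith("::")]
        if scribl_tags:
            result[row["Title"]] = scribl_tags
    return result


found = scribl_tags_by_title(io.StringIO(SAMPLE_CSV))
print(found)
```

For a real export, replace the `io.StringIO` object with `open("<ZOTERO_CSV>", newline="", encoding="utf-8")`.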

To generate outputs, there are a number of options. All assume that at least one Zotero database has been imported, either from a local CSV file or via the Zotero library API (i.e. at least one of the import steps above has been run at least once). Outputs are skipped if the database is empty.

Output representation of graph in GraphML format

```shell
scribl -g new_graphdb --graphmlfile <OUTPUT_XML>
```

This generates output in GraphML XML format. OUTPUT_XML should be supplied with the appropriate extension, e.g. graphdb.xml.
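A quick way to sanity-check a GraphML export is to count its nodes and edges. The sketch below does this with the standard library's XML parser and the standard GraphML namespace. `summarize_graphml` is a hypothetical helper, and the sample document is hand-written so the snippet runs on its own; it is not real scribl output.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}  # standard GraphML namespace


def summarize_graphml(path):
    """Return (node_count, edge_count) for a GraphML file."""
    root = ET.parse(str(path)).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# Hand-written placeholder graph, not generated by scribl
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>
"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "graphdb.xml"
    sample_path.write_text(SAMPLE, encoding="utf-8")
    counts = summarize_graphml(sample_path)

print(counts)
```

Pointing `summarize_graphml` at your own graphdb.xml should give a first impression of how large the exported network is.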

Output a NetworkX visualization

```shell
scribl -g new_graphdb --networkx-fig <OUTPUT_IMAGE>
```

This generates the visualization from the current scribl database and saves it as OUTPUT_IMAGE in any format supported by matplotlib (for example, graphdb-visual.pdf would produce a PDF). The visualizations are produced from GraphML XML output that is rendered using Python's NetworkX library.

Output representation of graph as a Cypher file

```shell
scribl -g new_graphdb --cyphertextfile <OUTPUT_CYPHER>
```

This writes a Cypher text representation of the database to the file OUTPUT_CYPHER, a plain text file in the Cypher query language mentioned previously, suitable for import into Neo4j.

For a full description of all command-line arguments, run:

```shell
scribl --help
```
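If you prefer to drive these commands from a script, the sketch below assembles the same argument list in Python. `build_scribl_command` and `run_scribl` are hypothetical helper names, not part of scribl; only flags that appear in the examples above are used.

```python
import shutil
import subprocess


def build_scribl_command(db_dir, zotero_library=None, graphml=None, figure=None, cypher=None):
    """Assemble an argv list from the command-line flags shown above."""
    cmd = ["scribl", "-g", db_dir]
    if zotero_library:
        cmd += ["--zotero-library", zotero_library]
    if graphml:
        cmd += ["--graphmlfile", graphml]
    if figure:
        cmd += ["--networkx-fig", figure]
    if cypher:
        cmd += ["--cyphertextfile", cypher]
    return cmd


def run_scribl(cmd):
    """Run the command, but only if the scribl executable is on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("scribl is not installed in this environment")
    return subprocess.run(cmd, check=True)


cmd = build_scribl_command(
    "new_graphdb",
    zotero_library="5251557:group",
    graphml="graphdb.xml",
    figure="graphdb-visual.png",
    cypher="graphdb.cypher",
)
print(" ".join(cmd))
```

Calling `run_scribl(cmd)` would then execute the command. Passing the arguments as a list rather than a single shell string avoids quoting problems with file names.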

Using the scribl API directly in your programs

You can also write your own Python program using the API directly. For example, the code below generates the same outputs as the "Check outputs" example above. It creates a new database, populates it from the same remote Zotero library, and then generates graphdb.xml (GraphML), graphdb.cypher (Cypher) and graphdb-visual.png (PNG).

```Python
from scribl.manage_graphdb import GraphDBInstance

# Create the database and record its metadata
gdb = GraphDBInstance("new_graphdb")
gdb.set_metadata("Test Scribl DB", "User name", "Small test")

# Populate it from the public Zotero group library used above
gdb.import_zotero_library(5251557, "group")
gdb.load_zotero_csv()

# Export GraphML, a NetworkX figure, and Cypher text
gdb.export_graphml_text(filepath="graphdb.xml")
gdb.export_graphml_figure(filepath="graphdb-visual.png")
gdb.export_cypher_text(filepath="graphdb.cypher")
```

The code for the scribl command contains more examples of use of the API. In addition, more details can be found in the main documentation.

Running unit tests with pytest

  1. Clone the repo:

```shell
git clone https://github.com/amberbiology/scribl.git
```

  2. Install with pip, including the optional test dependencies:

```shell
pip install .[test]
```

  3. Run pytest.

Note that pytest creates output sandbox test files in the system temporary directory, e.g. on Linux pytest creates a directory structure /tmp/pytest-of-USER/pytest-NUM/scribl_sandboxNUM where USER is the current user and NUM is incremented on each run. Only the last three directories are retained.
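To inspect those sandbox directories after a test run, a small helper along the following lines may be useful. It is only a sketch: `list_scribl_sandboxes` is a hypothetical name, and the directory pattern is taken from the Linux example path above. The demonstration builds a throwaway directory tree so that it runs without any prior pytest session.

```python
import getpass
import tempfile
from pathlib import Path


def list_scribl_sandboxes(base=None, user=None):
    """Return pytest sandbox directories under base, sorted by path."""
    base = Path(base or tempfile.gettempdir())
    user = user or getpass.getuser()
    pattern = f"pytest-of-{user}/pytest-*/scribl_sandbox*"
    return sorted(path for path in base.glob(pattern) if path.is_dir())


# Demonstration on a temporary tree that mimics the layout described above
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "pytest-of-alice" / "pytest-0" / "scribl_sandbox0"
    demo.mkdir(parents=True)
    found = [path.name for path in list_scribl_sandboxes(base=tmp, user="alice")]

print(found)
```

Calling `list_scribl_sandboxes()` with no arguments looks in the system temporary directory for the current user.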

Development and contributing to scribl

scribl is completely open-source and is being developed by Amber Biology LLC (@amberbiology).

See CHANGELOG.md for a history of changes.

If you're interested in contributing, please read our CONTRIBUTING guide.

Copyright and license

scribl is Copyright (C) 2023, 2024. Amber Biology LLC

scribl is distributed under the terms of the AGPL-3.0 license.

Acknowledgements

The development of the scribl platform was made possible by the funding and expertise provided by the Association for Frontotemporal Degeneration (AFTD - https://www.theaftd.org/) whose mission is to improve the quality of life of people affected by Frontotemporal Degeneration and drive research to a cure. We are particularly grateful to AFTD leadership team members Debra Niehoff and Penny Dacks for their invaluable direction and guidance, and for getting us access to leading researchers in the field of neurodegenerative disease - all of whose insights and advice were pivotal in the development of this free, open-source research tool.

Owner

  • Name: Amber Biology
  • Login: amberbiology
  • Kind: organization
  • Email: info@amberbiology.com
  • Location: Cambridge MA

Amber Biology's Code Repository

JOSS Publication

scribl: A system for the semantic capture of relationships in biological literature
Published
July 17, 2024
Volume 9, Issue 99, Page 6645
Authors
Gordon D. Webster ORCID
Amber Biology LLC, USA
Alexander K. Lancaster ORCID
Amber Biology LLC, USA, Institute for Globally Distributed Open Research and Education (IGDORE)
Editor
Ana Trisovic ORCID
Tags
systems biology network biology literature mining bioinformatics computational biology Zotero graph database

Citation (CITATION.cff)

cff-version: 1.2.0
abstract: >-
  scribl (Semantic Capture of Relationships in Biological Literature) is a
  Python software pipeline and API that transforms a Zotero literature database
  with entries annotated with tags in the **scribl** language to create a graph
  data structure, suitable for import into a graph database platform, such as
  Neo4j or Python's NetworkX. The scribl language was designed for the curation
  from scientific articles, to capture the relationships between the various
  biological agents and the processes that they describe in a connected network.
authors:
  - family-names: Webster
    given-names: Gordon
    affiliation: Amber Biology LLC
    orcid: https://orcid.org/0009-0009-2862-0467
  - family-names: Lancaster
    given-names: Alexander K.
    orcid: https://orcid.org/0000-0002-0002-9263
    affiliation: >-
      Amber Biology LLC and Institute for Globally Distributed Open Research and
      Education (IGDORE)
title: >-
  scribl: A system for the semantic capture of relationships in biological
  literature
url: https://amberbiology.com
repository-code: https://github.com/amberbiology/scribl
type: software
keywords:
  - systems biology
  - network biology
  - literature mining
  - bioinformatics
  - computational biology
license: AGPL-3.0-or-later
version: v0.8.3
doi: 10.5281/zenodo.16892066
message: >-
  If you use this software, please cite our article in the Journal of Open
  Source Software.
preferred-citation:
  authors:
    - family-names: Webster
      given-names: Gordon D.
      orcid: https://orcid.org/0009-0009-2862-0467
    - family-names: Lancaster
      given-names: Alexander K.
      orcid: https://orcid.org/0000-0002-0002-9263
  date-published: 2024-07-17
  doi: 10.21105/joss.06645
  issn: 2475-9066
  issue: 99
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6645
  title: >-
    scribl: A system for the semantic capture of relationships in biological
    literature
  type: article
  url: https://joss.theoj.org/papers/10.21105/joss.06645
  volume: 9

GitHub Events

Total
  • Release event: 6
  • Watch event: 1
  • Delete event: 42
  • Issue comment event: 5
  • Push event: 93
  • Pull request event: 74
  • Pull request review event: 31
  • Create event: 45
Last Year
  • Release event: 6
  • Watch event: 1
  • Delete event: 42
  • Issue comment event: 5
  • Push event: 93
  • Pull request event: 74
  • Pull request review event: 31
  • Create event: 45

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 99
  • Total Committers: 5
  • Avg Commits per committer: 19.8
  • Development Distribution Score (DDS): 0.616
Past Year
  • Commits: 59
  • Committers: 5
  • Avg Commits per committer: 11.8
  • Development Distribution Score (DDS): 0.627
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alex Lancaster | a****r | 38 |
| dependabot[bot] | 4****] | 35 |
| pre-commit-ci[bot] | 6****] | 19 |
| ci-scribl | u****e | 4 |
| zenodraft/action | | 3 |
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 123
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.13
  • Merged pull requests: 100
  • Bot issues: 0
  • Bot pull requests: 99
Past Year
  • Issues: 0
  • Pull requests: 78
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.1
  • Merged pull requests: 65
  • Bot issues: 0
  • Bot pull requests: 72
Top Authors
Issue Authors
  • benlansdell (1)
Pull Request Authors
  • dependabot[bot] (70)
  • pre-commit-ci[bot] (39)
  • alexlancaster (25)
  • atrisovic (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (71) python (70) skip-changelog (35) documentation (7) enhancement (6) bug (6) github_actions (6)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 147 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
pypi.org: scribl

A system for the semantic capture of relationships in biological literature

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 147 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.4%
Dependent repos count: 60.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

pyproject.toml pypi
  • matplotlib <= 3.8.2
  • networkx <= 3.2.1
  • pandas <= 2.1.4
  • pyparsing <= 3.1.1
  • pyzotero <= 1.5.18
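To see whether an environment respects these upper bounds, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`. `as_tuple` and `check_pins` are hypothetical helpers, and the version comparison is deliberately naive: it handles purely numeric dotted versions and reports anything else as unparsed.

```python
from importlib import metadata

# Upper bounds copied from the pyproject.toml listing above
UPPER_BOUNDS = {
    "matplotlib": "3.8.2",
    "networkx": "3.2.1",
    "pandas": "2.1.4",
    "pyparsing": "3.1.1",
    "pyzotero": "1.5.18",
}


def as_tuple(version):
    """Naive parse of purely numeric dotted versions such as '3.8.2'."""
    return tuple(int(part) for part in version.split("."))


def check_pins(bounds=UPPER_BOUNDS):
    """Return {package: 'ok' | 'too new' | 'missing' | 'unparsed'}."""
    report = {}
    for name, bound in bounds.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        try:
            within_bound = as_tuple(installed) <= as_tuple(bound)
        except ValueError:
            report[name] = "unparsed"  # e.g. pre-release suffixes like '2.0.0rc1'
            continue
        report[name] = "ok" if within_bound else "too new"
    return report


print(check_pins())
```

For production use, a full version parser such as the one in the `packaging` library would be a more robust choice than the tuple comparison shown here.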